One of the goals of the Mancoosi project is to bring together
researchers from various disciplines to advance the state of the art of
package managers. To this end, we organized a solver competition (in
the spirit of the SAT competitions) specifically tailored to
upgrade/installation problems. The winner of the competition was
announced during the LoCoCo workshop hosted at the FLoC conference on
the 10th of July 2010. I spent several hours preparing the
infrastructure for the competition and here I’d like to give a brief
account of my experience. This work was done together with Ralf
Treinen, Roberto Di Cosmo and Stefano Zacchiroli.
The input for the solvers in the competition is a set of documents in
the CUDF format. Over the last year we collected a number of CUDF
documents from the community (namely, with the help of Mandriva, Caixa
Magica and Debian). These documents are stored on the mancoosi.org
servers and can be used to train meta-installers on particularly
difficult problems. We used 20 such documents (from Debian) for the
MISC 2010 competition. Unfortunately, due to problems in the conversion
between DUDF (a distribution-specific format used to collect upgrade
problems) and CUDF, we were not able to use the problems from either
Mandriva or Caixa Magica.
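For readers who have never seen the format, here is a minimal sketch of
what a CUDF document looks like; the package names, versions and request
are made up, and the real documents of course contain tens of thousands
of package stanzas (see the CUDF specification for the complete syntax):

preamble: 

package: car
version: 1
depends: engine, wheel >= 2
installed: true

package: engine
version: 3
conflicts: engine-electric
keep: version
installed: true

request: 
install: gps
remove: car
upgrade: engine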
Besides these real problems, we generated a number of artificial
problems built from Debian repositories. The utility we used is called
randcudf and it is freely available on the mancoosi website as part of
the dose3 framework. We took a number of variables into consideration
in order to generate problems that are difficult, but not too far from
the reality of everyday use.
Among these parameters are:
- the size of the package universe
- the number of packages that are declared as already installed in the universe (the status)
- the number of packages to install / remove / upgrade
- the probability of generating install / remove requests with a version constraint
- the number of packages declared as installed but whose dependencies might not be satisfied
- the number of packages marked as keep, and the type of keep (version or package)
Playing with these variables we were able to produce problems of
different sizes and degrees of complexity. During the competition, for
example, the three categories had universes of 30K, 50K and 100K
packages respectively. Moreover, we discarded all problems that had no
solution at all.
From our experience during the problem selection, when considering over
30K packages it is extremely easy to generate CUDF problems that have no
solution at all. For example, in Debian lenny there are 17K packages
connected by a kernel of 80 conflicts, and this configuration produces
around 5K strong conflicts. This means that if we pick two packages
among these 17K, there is a high probability that they are in conflict.
This is a consequence of the high level of inter-dependency in open
source distributions. With bigger remove/install requests this
probability grows even further. Since the goal was to provide random
problems as close as possible to reality, our documents have requests to
install at most 10 packages and remove at most 10 packages at the same
time.
The five categories used in the competition:
- cudf_set : an encoding of 1-in-3-SAT in CUDF
- debian-dudf : real installation problems, collected via dudf-save
- easy : Debian unstable / desktop installation from unstable / 10 install - 10 remove
- difficult : Debian stable + unstable / server installation from stable / 10 install - 10 remove - 1 upgrade all
- impossible : Debian oldstable, stable, testing, unstable / server installation from oldstable / 10 install - 10 remove - 1 upgrade all
Execution
The execution environment of the competition was set up in a dedicated
virtual machine running on Xen. This VM respects the specification
given in the rules of the competition. We made two small mistakes (to
be fixed in the next edition of the competition). First, we did not
specify the OS running in the virtual machine. To reproduce the
results, everybody should be able to replicate the execution
environment and re-run the competition. For MISC 2010, we used a plain
Debian lenny (plus security updates). We tried to keep additional
software to the strict minimum. Starting from a base system (as
generated by debootstrap) we added:
- subversion (used to update the execution environment from the mancoosi svn)
- git (used to register the various versions of the submitted solvers)
- sun-java (from non-free)
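For the record, recreating a similar environment today would look
roughly like the sketch below. The target path, the mirror URL and the
exact package names are my assumptions rather than part of the official
setup (lenny is now served from archive.debian.org, the lenny package
for git was named git-core, and the Sun JVM required the non-free
section to be enabled):

# build a minimal Debian lenny system in a target directory
debootstrap lenny /srv/misc2010 http://archive.debian.org/debian
# install the few extra packages listed above inside that system
chroot /srv/misc2010 apt-get install subversion git-core sun-java6-jdk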
The second mistake was not to specify the exact Java version. open-java
has subtle differences from sun-java, and it seems these differences
created a few problems for one of the participants. This problem was
quickly rectified.
Running the competition
To run the competition I wrote a few simple bash scripts and test
cases. The test cases were meant to exercise the execution environment
and to make sure that all constraints were correctly enforced. The
execution environment is available in the mancoosi svn. In practice, we
ran the competition in four phases.
Phase One
In the first phase we deployed all solvers in the execution
environment. In order to clean up the solver directory and “start
fresh” after every invocation, I created a git repository for every
solver. After each invocation, the repository was cleaned up using:
git clean -f -d      # remove any files and directories not tracked by git
git reset --hard     # revert any changes to tracked files
find /tmp/ -user misc2010 -exec rm -Rf {} \;
The last line makes sure that no temporary files were left behind in
the temporary directory.
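For this cleanup to work, the pristine state of each solver has to be
committed to its repository first; the per-solver setup boils down to
something like this (the path is made up):

# snapshot the freshly deployed solver so that git clean / git reset
# can always bring the directory back to this pristine state
cd /solvers/some-solver && git init && git add . && git commit -m "pristine snapshot"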
Phase Two
In the second phase, we actually ran the competition. The script used
is runcomp.sh. It takes three arguments: the list of solvers, the list
of problems and a timeout in seconds. Since we used the same set of
problems for the trendy and the paranoid track, we ran the competition
only once for both tracks. The output of the runcomp.sh script is a
directory (e.g. tmp/201007060918) with all the raw results of the
competition. All raw data is publicly available here.
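As an illustration, an invocation following the argument order
described above would look something like this (the file names are
placeholders and the timeout value is just an example, not the one
mandated by the rules):

# run all solvers on all problems with a 300 second timeout
./runcomp.sh solvers.list problems.list 300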
Phase Three
In the third phase we compute the aggregate results per track using the
script recompute.sh. This script takes four arguments: the list of all
solvers in one track, the list of problems (the same used before), the
timestamp of the last run (e.g. 201007060918) and the name of the
track. The output of this script is a file containing all the aggregate
results, one per line, of the form category, problem, solver, time,
result. For example, a snippet from this file looks like:
cudf_set huge1.cudf apt-pbo-paranoid-1.0.5 - ABORT
cudf_set huge1.cudf p2cudf-paranoid-1.6 2.28814 FAIL
cudf_set huge1.cudf uns-paranoid-0.0002 - ABORT
cudf_set huge1.cudf ucl-cprel-1.0 0 FAIL
cudf_set huge1.cudf aspcud-paranoid-1.0 - ABORT
cudf_set huge1.cudf inescp-1.0 1.94812 FAIL
debian-dudf 103c9978-5408-11df-9bc1-00163e7a6f5e.cudf apt-pbo-paranoid-1.0.5 11.75 SUCCESS -18,-21
debian-dudf 103c9978-5408-11df-9bc1-00163e7a6f5e.cudf p2cudf-paranoid-1.6 6.06838 SUCCESS -16,-34
debian-dudf 103c9978-5408-11df-9bc1-00163e7a6f5e.cudf uns-paranoid-0.0002 - ABORT
debian-dudf 103c9978-5408-11df-9bc1-00163e7a6f5e.cudf ucl-cprel-1.0 0 FAIL
debian-dudf 103c9978-5408-11df-9bc1-00163e7a6f5e.cudf aspcud-paranoid-1.0 14.4089 SUCCESS -15,-29
...
Results are generated using the solution checker.
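A nice side effect of this simple whitespace-separated format is that
ad-hoc analysis is easy. For instance, recomputing the aggregates for
the paranoid track and tallying the outcomes per solver could look like
this (the file names are placeholders):

# aggregate the raw results of run 201007060918 for the paranoid track
./recompute.sh paranoid-solvers.list problems.list 201007060918 paranoid
# count how many SUCCESS / FAIL / ABORT outcomes each solver produced
awk '{ count[$3 " " $5]++ } END { for (k in count) print k, count[k] }' paranoid-results.txt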
Phase Four
The last step is the classification of the solutions. The
misc-classifier takes as input the aggregate results and outputs the
HTML tables that will then be published on the web.
Conclusions
Running a solver competition is not as easy as it seems. To get it
right we ran an internal competition in January 2009 that helped us
highlight, understand and solve various problems. It is mostly a matter
of writing down the rules, specifying a clear and understandable
protocol for the solver submission (for example, asking participants to
version their solver and to provide an md5 hash of the binary is a very
good idea in order to avoid mix-ups), and spending some time debugging
the scripts. The runsolver utility from Olivier Roussel (available
here) is a very nice tool that takes care of many delicate details of
process accounting and resource management. I added a small patch to be
able to specify a particular signal as the warning signal. The code is
in my private git repository: git clone
http://mancoosi.org/~abate/repos/runsolver.git/ . This is the actual
code we used for the competition. The 32-bit binary is available in the
svn. All in all it was a great experience.
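For the curious, a typical runsolver invocation looks something like
the sketch below. The option names are the ones I remember from the
runsolver documentation, and the limits, file names and solver command
line are purely illustrative, so double-check them against the manual
and the competition rules before reusing this:

# -C: CPU time limit (s), -W: wall clock limit (s), -M: memory limit (MiB)
# watcher.log collects the accounting data, solver.out the solver output
runsolver -C 300 -W 350 -M 2048 -w watcher.log -o solver.out \
  ./my-solver problem.cudf solution.cudf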
The results of the competition are published here.