One of the goals of the Mancoosi project is to bring together researchers from various disciplines to advance the state of the art of package managers. To this end, we organized a SAT-solving competition specifically tailored to upgrade/installation problems. The winner of the competition was announced during the LoCoCo workshop hosted at the FLoC international conference on 10 July 2010. I spent several hours preparing the infrastructure for the competition, and here I'd like to give a brief account of my experience. This work was done together with Ralf Treinen, Roberto Di Cosmo and Stefano Zacchiroli.
Input: CUDF documents
The input for the solvers in the competition are documents in the CUDF format. Over the last year we collected a number of CUDF documents from the community (namely, with the help of Mandriva, Caixa Mágica and Debian). These documents are stored on the mancoosi.org servers and can be used to train meta-installers on particularly difficult problems. We used 20 such documents (from Debian) for the MISC 2010 competition. Unfortunately, due to problems in the conversion between DUDF (a distribution-specific format used to collect upgrade problems) and CUDF, we were not able to use the problems from either Mandriva or Caixa Mágica.
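To give an idea of the format, here is a minimal, hand-written CUDF document (the package names and versions are invented for illustration): a universe of package stanzas describing versions, dependencies, conflicts and installation status, followed by a request stanza.

```
preamble: 

package: bicycle
version: 7

package: car
version: 1
depends: engine, wheel >= 2
installed: true

package: engine
version: 1
installed: true

package: wheel
version: 2
installed: true

request: demo
remove: car
install: bicycle
```

A solver must produce a new installation status that satisfies the request while respecting all dependencies and conflicts.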
Besides these real problems, we generated a number of artificial problems built from Debian repositories. The utility we used is called randcudf and is freely available on the Mancoosi website as part of the dose3 framework. We took a number of parameters into consideration in order to generate problems that are difficult, yet not too far removed from the reality of everyday use.
Among these parameters are:
- the size of the package universe
- the number of packages declared as already installed in the universe (status)
- the number of packages to install / remove / upgrade
- the probability of generating install / remove requests with a version constraint
- the number of packages declared as installed but whose dependencies might not be satisfied
- the number of packages marked as keep, and the type of keep (version or package)
Playing with these variables we were able to produce problems of different sizes and degrees of complexity. During the competition, for example, the three categories had universes of 30K, 50K and 100K packages respectively. Moreover, we discarded all problems that have no solution at all.
From our experience during problem selection, when considering over 30K packages it is extremely easy to generate CUDF problems that have no solution at all. For example, in Debian Lenny there are 17K packages connected by a kernel of 80 conflicts. This configuration produces around 5K strong conflicts, which means that if we pick two packages among these 17K, there is a high probability that the two are in conflict. This is a consequence of the high level of inter-dependency in open source distributions. With bigger remove/install requests this probability grows even larger. Since the goal was to provide random problems as close as possible to reality, our documents contain requests to install at most 10 packages and remove at most 10 packages at the same time.
The five categories used in the competition:
- cudf_set : an encoding of 1-in-3-SAT in CUDF
- debian-dudf : real installation problems, collected via dudf-save
- easy : Debian unstable / desktop installation from unstable / 10 install - 10 remove
- difficult : Debian stable + unstable / server installation from stable / 10 install - 10 remove - 1 upgrade all
- impossible : Debian oldstable, stable, testing, unstable / server installation from oldstable / 10 install - 10 remove - 1 upgrade all
Execution
The execution environment of the competition was set up in a dedicated virtual machine running on Xen. This VM respects the specification given in the rules of the competition. We made two small mistakes (to be fixed in the next edition of the competition). First, we did not specify the OS running in the virtual machine. To reproduce the results, everybody should be able to replicate the execution environment and re-run the competition. For MISC 2010, we used a plain Debian Lenny (plus security updates). We tried to keep additional software to a strict minimum.
Starting from a base system (as generated by debootstrap) we added:
- subversion (used to update the execution environment from the Mancoosi svn)
- git (used to register various versions of the submitted solvers)
- sun-java (from non-free)
The second mistake was not specifying the exact Java version. open-java has subtle differences from sun-java, and it seems these differences created a few problems for one of the participants. The problem was quickly rectified.
Running the competition
To run the competition I wrote a few simple bash scripts and test cases. The test cases were meant to exercise the execution environment and to make sure that all constraints were correctly enforced. The execution environment is available in the Mancoosi svn. In practice, we ran the competition in four phases.
Phase One
In the first phase we deployed all solvers in the execution environment. In order to clean up the solver directory and “start fresh” after every invocation, I created an empty git repository for every solver. After each invocation, the repository was cleaned up using:
git clean -f -d
git reset --hard
find /tmp/ -user misc2010 -exec rm -Rf {} \;
The last command makes sure that no temporary files were left in the temporary directory.
Phase Two
In the second phase, we actually ran the competition. The script used is runcomp.sh. It takes three arguments: the list of solvers, the list of problems and a timeout in seconds. Since we used the same set of problems for the trendy and paranoid tracks, we ran the competition only once for both tracks. The output of the runcomp.sh script is a directory (e.g. tmp/201007060918) with all the raw results of the competition. All raw data is publicly available here.
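I do not reproduce runcomp.sh here, but its core can be sketched roughly as follows. The variable names, file layout and the use of a plain `timeout` in place of runsolver are my own simplifications, not the actual script:

```shell
#!/bin/sh
# Sketch of a runcomp.sh-style driver: run every solver on every problem
# under a timeout, storing raw output in a timestamped directory.
run_competition () {
    solvers="$1"           # file: one solver executable per line
    problems="$2"          # file: one cudf problem per line
    tmo="$3"               # timeout in seconds
    out="tmp/$(date +%Y%m%d%H%M)"
    mkdir -p "$out"
    while read -r solver; do
        name="$(basename "$solver")"
        while read -r cudf; do
            base="$(basename "$cudf" .cudf)"
            # each solver reads a problem and writes its proposed solution;
            # stdout/stderr are kept as the raw log of the run
            timeout "$tmo" "$solver" "$cudf" "$out/$base.$name.sol" \
                > "$out/$base.$name.log" 2>&1 || true
        done < "$problems"
    done < "$solvers"
    echo "$out"            # e.g. tmp/201007060918
}
```

The real script additionally used runsolver (discussed below) for process accounting and resource limits.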
Phase Three
In the third phase we compute the aggregate results by track using the script recompute.sh. This script takes four arguments: the list of all solvers in one track, the list of problems (the same used before), the timestamp of the last run (e.g. 201007060918) and the name of the track. The output of this script is a file containing the aggregate results, one per line, of the form category, problem, solver, time, result. For example, a snippet from this file looks like:
cudf_set huge1.cudf apt-pbo-paranoid-1.0.5 - ABORT
cudf_set huge1.cudf p2cudf-paranoid-1.6 2.28814 FAIL
cudf_set huge1.cudf uns-paranoid-0.0002 - ABORT
cudf_set huge1.cudf ucl-cprel-1.0 0 FAIL
cudf_set huge1.cudf aspcud-paranoid-1.0 - ABORT
cudf_set huge1.cudf inescp-1.0 1.94812 FAIL
debian-dudf 103c9978-5408-11df-9bc1-00163e7a6f5e.cudf apt-pbo-paranoid-1.0.5 11.75 SUCCESS -18,-21
debian-dudf 103c9978-5408-11df-9bc1-00163e7a6f5e.cudf p2cudf-paranoid-1.6 6.06838 SUCCESS -16,-34
debian-dudf 103c9978-5408-11df-9bc1-00163e7a6f5e.cudf uns-paranoid-0.0002 - ABORT
debian-dudf 103c9978-5408-11df-9bc1-00163e7a6f5e.cudf ucl-cprel-1.0 0 FAIL
debian-dudf 103c9978-5408-11df-9bc1-00163e7a6f5e.cudf aspcud-paranoid-1.0 14.4089 SUCCESS -15,-29
...
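The five-column format makes this file easy to post-process. As an illustration (the helper name is mine, not part of the competition scripts), a few lines of awk can tally the outcomes per solver:

```shell
#!/bin/sh
# Count how many SUCCESS / FAIL / ABORT results each solver obtained.
# Expects lines of the form: category problem solver time result [...]
tally_results () {
    awk '{ count[$3 " " $5]++ }
         END { for (k in count) print k, count[k] }' "$1" | sort
}
```

For instance, running `tally_results` on the aggregate file prints one line per solver/outcome pair with its count.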
Results are generated using the solution checker.
Phase Four
The last step is the classification of the solutions. The misc-classifier takes as input the aggregate results and outputs the HTML tables that are then published on the web.
Conclusions
Running a solver competition is not as easy as it seems. To get it right, we ran an internal competition in January 2009 that helped us to highlight, understand and solve a number of problems. It is mostly a matter of writing down the rules, specifying a clear and understandable protocol for solver submission (for example, asking participants to version their solvers and to provide an md5 hash of each binary is a very good idea to avoid mix-ups) and spending some time debugging the scripts. The runsolver utility from Olivier Roussel (available here) is a very nice tool that takes care of many delicate details of process accounting and resource management. I added a small patch to be able to specify which signal to use as the warning signal. The code is in my personal git repository: git clone http://mancoosi.org/~abate/repos/runsolver.git/ . This is the actual code we used for the competition. The 32-bit binary is available in the svn. All in all, it was a great experience.
The results of the competition are published here.