One of the goals of the Mancoosi project is to bring together researchers from various disciplines to advance the state of the art of package managers. To this end, we organized a SAT-solving competition specifically tailored to upgrade/installation problems. The winner of the competition was announced during the LoCoCo workshop, hosted at the FLoC conference on the 10th of July 2010. I spent several hours preparing the infrastructure for the competition, and here I’d like to give a brief account of my experience. This work was done together with Ralf Treinen, Roberto Di Cosmo and Stefano Zacchiroli.

Input - CUDF documents

The input to the solvers in the competition is a set of documents in the CUDF format. Over the last year we collected a number of CUDF documents from the community (namely, with the help of Mandriva, Caixa Magica and Debian). These documents are stored on the mancoosi.org servers and can be used to train meta-installers on particularly difficult problems. We used 20 such documents (from Debian) for the MISC 2010 competition. Unfortunately, due to problems in the conversion between DUDF (a distribution-specific format used to collect upgrade problems) and CUDF, we were not able to use the problems from Mandriva or Caixa Magica.

Besides these real problems, we generated a number of artificial problems built from Debian repositories. The utility we used is called randcudf and it is freely available on the Mancoosi website as part of the dose3 framework. We took a number of variables into consideration in order to generate problems that are difficult, yet not too far removed from the reality of everyday use.

Among these parameters are the following (a small CUDF example illustrating several of them is shown after the list):

  • the size of the package universe
  • the number of packages that are declared as already installed in the universe (status)
  • the number of packages to install / remove / upgrade
  • the probability of generating install / remove requests with a version constraint
  • the number of packages declared as installed but whose dependencies might not be satisfied
  • the number of packages marked as keep and the type of keep (version or package)
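
To make these parameters concrete, here is a tiny hand-made CUDF document (the package names and numbers are invented for illustration; the real documents are of course much larger):

preamble: 

package: car
version: 1
depends: engine >= 2, wheel
installed: true
keep: package

package: engine
version: 2
conflicts: engine
installed: true

package: engine
version: 3
conflicts: engine

package: wheel
version: 1

request: 
install: wheel
upgrade: engine >= 3

The installed and keep fields correspond to the status and keep parameters above, the self-conflict on engine is the usual CUDF idiom stating that only one version of it may be installed at a time, and the >= constraints show what versioned dependencies and requests look like.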

Playing around with these variables we were able to produce problems of different sizes and different degrees of complexity. During the competition, for example, the three randomly generated categories had universes of 30K, 50K and 100K packages respectively. Moreover, we discarded all problems that had no solution at all.

From our experience during the problem selection, when considering over 30K packages it is extremely easy to generate CUDF problems that have no solution at all. For example, in Debian lenny there are 17K packages connected by a kernel of 80 conflicts; this configuration produces around 5K strong conflicts. This means that if we pick two packages among these 17K, there is a high probability that they are in conflict. This is because of the high level of inter-dependencies in open source distributions. With bigger remove/install requests this probability grows even larger. Since the goal was to provide random problems as close as possible to reality, our documents have a request to install at most 10 packages and remove 10 packages at the same time.

The five categories used in the competition:

  • cudf_set : encoding in CUDF of 1-in-3-SAT
  • debian-dudf : real installation problems, collected via dudf-save
  • easy : Debian unstable / desktop installation from unstable / 10 install - 10 remove
  • difficult : Debian stable + unstable / server installation from stable / 10 install - 10 remove - 1 upgrade all
  • impossible : Debian oldstable, stable, testing, unstable / server installation from oldstable / 10 install - 10 remove - 1 upgrade all

Execution

The execution environment of the competition was set up in a dedicated virtual machine running on Xen. This VM respects the specification given in the rules of the competition. We made two small mistakes (to be fixed in the next edition of the competition). First, we did not specify the OS running in the virtual machine. To reproduce the results, everybody should be able to replicate the execution environment and re-run the competition. For MISC 2010, we used a plain Debian lenny (plus security updates). We tried to keep additional software to a strict minimum.

Starting from a base system (as generated by debootstrap) we added:

  • subversion (used to update the execution environment from the Mancoosi svn)
  • git (used to register the various versions of the submitted solvers)
  • sun-java (from non-free)
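
In practice the setup amounted to something along these lines (a sketch, not the literal commands we typed; the target directory and the exact package names, in particular the Java one, are assumptions):

# create a minimal Debian lenny system (lenny is nowadays served from archive.debian.org)
debootstrap lenny /srv/misc2010 http://archive.debian.org/debian

# inside the new system, add the few extra tools listed above
# (package names are indicative; the Sun JRE came from non-free)
apt-get install subversion git-core sun-java6-jre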

The second mistake was not to specify the exact Java version. OpenJDK has subtle differences from Sun Java, and it seems these differences created a few problems for one of the participants. This problem was quickly rectified.

Running the competition

To run the competition I wrote a few simple bash scripts and test cases. The test cases were meant to test the execution environment and to make sure that all constraints were correctly enforced. The execution environment is available in the Mancoosi svn. In practice, we ran the competition in four phases.

Phase One

In the first phase we deployed all solvers in the execution environment. In order to clean up the solver directory and “start fresh” after every invocation, I created a git repository for every solver. After each invocation, the repository was cleaned up using

# remove untracked files and directories left behind by the solver
git clean -f -d
# restore any tracked files the solver may have modified
git reset --hard
# remove temporary files owned by the competition user
find /tmp/ -user misc2010 -exec rm -Rf {} \;

The last line makes sure that no temporary files are left behind in the temporary directory.
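
For git clean and git reset to restore a pristine state, each solver tree needs at least one commit recording it as submitted. The per-solver setup was therefore roughly the following (a sketch; the directory layout is an assumption and the solver name is just an example):

# one-off setup, repeated for every submitted solver
cd solvers/apt-pbo-paranoid-1.0.5
git init
git add .
git commit -m "pristine solver as submitted"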

Phase Two

In the second phase, we actually ran the competition. The script used is runcomp.sh. It takes three arguments: the list of solvers, the list of problems and a timeout in seconds. Since we used the same set of problems for the trendy and paranoid tracks, we ran the competition only once for both tracks. The output of the runcomp.sh script is a directory (e.g. tmp/201007060918) with all the raw results of the competition. All raw data is publicly available here.
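
The actual runcomp.sh lives in the Mancoosi svn; the loop below only sketches the general shape of such a driver. In particular, the solver calling convention (solver <problem> <output>) and the file names are assumptions.

#!/bin/bash
# sketch of a runcomp.sh-like driver: runcomp.sh <solvers> <problems> <timeout>
solvers="$1"    # file listing one solver directory per line
problems="$2"   # file listing one CUDF problem per line
timeout="$3"    # timeout in seconds

outdir="tmp/$(date +%Y%m%d%H%M)"   # e.g. tmp/201007060918
mkdir -p "$outdir"

while read -r solver; do
  while read -r problem; do
    base=$(basename "$problem" .cudf)
    name=$(basename "$solver")
    # runsolver enforces the CPU time limit and records resource usage
    runsolver -C "$timeout" \
      -o "$outdir/$name-$base.out" \
      -w "$outdir/$name-$base.watcher" \
      "$solver/solver" "$problem" "$outdir/$name-$base.cudf" < /dev/null
    # bring the solver directory back to its pristine state (see Phase One)
    (cd "$solver" && git clean -f -d && git reset --hard)
    find /tmp/ -user misc2010 -exec rm -Rf {} \; 2>/dev/null
  done < "$problems"
done < "$solvers"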

Phase Three

In the third phase we computed the aggregate results by track using the script recompute.sh. This script takes 4 arguments: the list of all solvers in one track, the list of problems (the same used before), the timestamp of the last run (e.g. 201007060918) and the name of the track. The output of this script is a file containing all the aggregate results, one per line, of the form category, problem, solver, time, result. For example, a snippet from this file looks like:

cudf_set huge1.cudf apt-pbo-paranoid-1.0.5 - ABORT
cudf_set huge1.cudf p2cudf-paranoid-1.6 2.28814 FAIL
cudf_set huge1.cudf uns-paranoid-0.0002 - ABORT
cudf_set huge1.cudf ucl-cprel-1.0 0 FAIL
cudf_set huge1.cudf aspcud-paranoid-1.0 - ABORT
cudf_set huge1.cudf inescp-1.0 1.94812 FAIL
debian-dudf 103c9978-5408-11df-9bc1-00163e7a6f5e.cudf apt-pbo-paranoid-1.0.5 11.75 SUCCESS -18,-21
debian-dudf 103c9978-5408-11df-9bc1-00163e7a6f5e.cudf p2cudf-paranoid-1.6 6.06838 SUCCESS -16,-34
debian-dudf 103c9978-5408-11df-9bc1-00163e7a6f5e.cudf uns-paranoid-0.0002 - ABORT
debian-dudf 103c9978-5408-11df-9bc1-00163e7a6f5e.cudf ucl-cprel-1.0 0 FAIL
debian-dudf 103c9978-5408-11df-9bc1-00163e7a6f5e.cudf aspcud-paranoid-1.0 14.4089 SUCCESS -15,-29
...

Results are generated using the solution checker.
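
Such a flat file is handy for quick sanity checks too. For instance, assuming the whitespace-separated layout shown above and a hypothetical file name aggregate.txt, counting how many problems each solver answered with SUCCESS is a one-liner:

# column 3 is the solver, column 5 the outcome
awk '$5 == "SUCCESS" { ok[$3]++ } END { for (s in ok) print ok[s], s }' aggregate.txt | sort -rn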

Phase Four

The last step is the classification of the solutions. The misc-classifier takes the aggregate results as input and outputs the HTML tables that are then published on the web.
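
Just to give an idea of the kind of transformation involved (the real misc-classifier also applies the scoring rules of each track), turning the aggregate file into a bare HTML table could look like this, again assuming the hypothetical file name aggregate.txt:

# naive rendering of the aggregate results as an HTML table
{
  echo "<table>"
  awk '{ printf "<tr><td>%s</td><td>%s</td><td>%s</td><td>%s</td><td>%s</td></tr>\n", $1, $2, $3, $4, $5 }' aggregate.txt
  echo "</table>"
} > results.html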

Conclusions

Running a solver competition is not as easy as it seems. To get it right, we ran an internal competition in January 2009 that helped us to highlight, understand and solve various problems. It is mostly a matter of writing down the rules, specifying a clear and understandable protocol for solver submission (for example, asking participants to version their solvers and to provide an md5 hash of each binary is a very good idea in order to avoid mix-ups) and spending some time debugging the scripts. The runsolver utility from Olivier Roussel (available here) is a very nice tool that takes care of many delicate details of process accounting and resource management. I added a small patch to be able to specify a specific signal as the warning signal. The code is in my git repository: git clone http://mancoosi.org/~abate/repos/runsolver.git/ . This is the actual code we used for the competition. The 32-bit binary is available in the svn. All in all it was a great experience.

The results of the competition are published here.