apt-get / aptitude test upgrades

After reading this interesting blog post from Petter Reinholdtsen, I’ve decided to repeat his experiments and save the results with dudf-save. Using Petter’s script, I created a lenny schroot, installed mancoosi-contest and then ran apt-get and aptitude in simulation mode to create and upload the dudf documents to mancoosi.debian.net.
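
A minimal sketch of that workflow, assuming a lenny schroot named lenny is already in place; the package names come from the post, while the exact commands and the way dudf-save hooks into the package managers are my assumptions:

# install the mancoosi-contest package, which provides the dudf-save hook
schroot -c lenny -u root -- apt-get install mancoosi-contest
# simulate the upgrades; the recorded upgrade problems are saved as dudf
# documents and uploaded to mancoosi.debian.net
schroot -c lenny -- apt-get -s install gnome
schroot -c lenny -- aptitude -s install kde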

For example : http://mancoosi.debian.net/dudf/file/?uid=adf7b774-9af8-11df-bc37-00163e46d37a is the dudf report for the upgrade of gnome and
http://mancoosi.debian.net/dudf/file/?uid=8222799a-9af8-11df-8b50-00163e46d37a for the upgrade of kde from lenny to squeeze (2010-07-28).

I’ll repeat these tests from time to time. The idea is to find upgrade problems, and in particular to compare the apt-get / aptitude results with those of other solvers.


using apt-get and aptitude with cudf

apt-get and aptitude were two missing competitors of the MISC competition. However, it is important and interesting to see how these two tools compete against the other solvers submitted to MISC. In this post I want to present two simple tools to convert cudf documents into something that apt-get based tools can handle. Cudf and debian share many characteristics, but they also have important semantic differences. One important difference concerns installing multiple versions of the same package: since this is allowed in cudf, but not in debian, we can use apt-get and aptitude only to solve cudf problems that respect this constraint, ruling out, for example, all cudfs from the rpm world.

Another difference to take care of is the request semantics. In cudf, a request can contain version constraints: for example, one can ask to upgrade the package wheel to a version greater than 2. Since it is not possible to translate such a request directly into an apt-get request, we are forced to add a dummy package encoding the disjunction of all the packages that satisfy the constraint. This problem does not arise with remove requests, as they always refer to the currently installed package.
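
For reference, a version-constrained request in a cudf document looks roughly like the following hand-written fragment (not taken from any real problem):

request: example
install: bicycle
upgrade: wheel > 2

It is the upgrade line that has no direct apt-get counterpart and triggers the dummy-package encoding described above.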

apt-get needs two files: the Packages file, which contains the list of all packages known to the meta-installer, and the status file, which contains the list of packages currently installed. To generate these files I wrote a small utility using the dose3 framework, imaginatively called cudftodeb. This tool takes a cudf and produces three files: Packages, status and Request, where the Request file contains the list of packages to install or remove in a syntax compatible with apt-get.
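
The cudftodeb invocation below is only illustrative (the exact command line may differ); the point is simply that one cudf document in gives the three files out:

cudftodeb legacy.cudf
ls
# Packages  Request  status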

In order to run apt-get/aptitude with these files, you need a simple bash script. You can find the details here for apt-get and here for aptitude. The most important option is -s, used to simulate an installation.
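
As a rough idea of what such a script looks like, here is a sketch for apt-get; the APT options themselves are standard, but the directory layout, the file names and the flat-repository trick are assumptions of mine, not the original script:

#!/bin/sh
# point apt-get at the generated Packages/status files, leaving the real
# system untouched; -s turns the whole run into a simulation
WORK=$(pwd)
mkdir -p "$WORK/lists/partial" "$WORK/cache/archives/partial"
echo "deb file:$WORK ./" > "$WORK/sources.list"   # flat repository containing Packages

APTOPT="-o Dir::Etc::SourceList=$WORK/sources.list \
        -o Dir::Etc::SourceParts=/dev/null \
        -o Dir::State::Lists=$WORK/lists \
        -o Dir::State::status=$WORK/status \
        -o Dir::Cache=$WORK/cache"

apt-get $APTOPT update
apt-get $APTOPT -s --allow-unauthenticated install $(cat "$WORK/Request")

aptitude accepts the same -o options, so the aptitude variant of the script differs essentially only in the last command.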

With the -V (verbose versions) option of apt-get we can generate a parsable solution. This output is then piped through another tool called aptgetsolutions in order to produce a cudf solution, closing the circle.
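
Closing the circle could then look like the one-liner below; note that the aptgetsolutions command line is an assumption of mine, check the dose3 documentation for the real interface:

# hypothetical pipeline: turn the simulated Inst/Remv lines back into a cudf solution
apt-get $APTOPT -s -V install $(cat Request) | aptgetsolutions legacy.cudf > solution.cudf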

For example, this is the trace produced by aptitude when trying to solve the legacy.cudf problem :

Reading package lists...
Building dependency tree...
Reading extended state information...
Initializing package states...
Reading task descriptions...
The following NEW packages will be installed:
  bicycle dummy_wheel electric-engine{b} glass{a} window{a}
The following packages will be upgraded:
  door wheel
2 packages upgraded, 5 newly installed, 0 to remove and 1 not upgraded.
Need to get 0B of archives. After unpacking 0B will be used.
The following packages have unmet dependencies:
  gasoline-engine: Conflicts: engine which is a virtual package.
  electric-engine: Conflicts: engine which is a virtual package.
The following actions will resolve these dependencies:

     Remove the following packages:
1)     gasoline-engine

The following NEW packages will be installed:
  bicycle dummy_wheel electric-engine glass{a} window{a}
The following packages will be REMOVED:
  gasoline-engine{a}
The following packages will be upgraded:
  door wheel
2 packages upgraded, 5 newly installed, 1 to remove and 0 not upgraded.
Need to get 0B of archives. After unpacking 0B will be used.
WARNING: untrusted versions of the following packages will be installed!

Untrusted packages could compromise your system's security.
You should only proceed with the installation if you are certain that
this is what you want to do.

  wheel bicycle dummy_wheel door glass electric-engine window

*** WARNING ***   Ignoring these trust violations because
                  aptitude::CmdLine::Ignore-Trust-Violations is 'true'!
Remv gasoline-engine [1] [car ]
Inst bicycle (7 localhost) [car ]
Inst glass (2 localhost) [car ]
Inst window (3 localhost) [car ]
Inst door [1] (2 localhost) [car ]
Inst wheel [2] (3 localhost) [car ]
Inst dummy_wheel (1 localhost) [car ]
Inst electric-engine (1 localhost)
Conf bicycle (7 localhost)
Conf glass (2 localhost)
Conf window (3 localhost)
Conf door (2 localhost)
Conf wheel (3 localhost)
Conf dummy_wheel (1 localhost)
Conf electric-engine (1 localhost)

Note the package dummy_wheel used to encode the upgrade request of wheel>>2. This dummy package encodes the request as a dependency:

Package: dummy_wheel
Version: 1
Architecture: i386
Depends: wheel (= 3)
Filename: /var/fakedummy_wheel1

One last remark about apt-get. I just ran into this bug today, using the old version of apt-get shipped with lenny. For our experiments we use only the latest version of apt-get from debian testing.


misc 2010, how to run a solver competition

One of the goals of the Mancoosi project is to bring together researchers from various disciplines to advance the state of the art of package managers. To this end, we organized a solver competition specifically tailored to upgrade/installation problems. The winner of the competition was announced during the LoCoCo workshop, hosted at the international conference FLOC, on the 10th of July 2010. I spent several hours preparing the infrastructure for the competition and here I’d like to give a brief account of my experience. This work was done together with Ralf Treinen, Roberto Di Cosmo and Stefano Zacchiroli.

Input - cudf documents

The input for the solvers in the competition consists of documents in the cudf format. Over the last year we have collected a number of cudf documents from the community (namely, with the help of mandriva, caixa magica and debian). These documents are stored on the mancoosi.org servers and can be used to train meta-installers on particularly difficult problems. We used 20 such documents (from debian) for the MISC 2010 competition. Unfortunately, due to problems in the conversion between dudf (a distribution-specific format used to collect upgrade problems) and cudf, we were not able to use the problems from either mandriva or caixa magica.

Besides these real problems, we generated a number of artificial problems built from debian repositories. The utility we used is called randcudf and it is freely available on the mancoosi website as part of the dose3 framework. We took a number of variables into consideration in order to generate problems that are difficult, but not too far away from the reality of everyday use.

Among these parameters are:

  • the size of the package universe
  • the number of packages declared as already installed in the universe (status)
  • the number of packages to install / remove / upgrade
  • the probability of generating install / remove requests with a version constraint
  • the number of packages declared as installed but whose dependencies might not be satisfied
  • the number of packages marked as keep, and the type of keep (version or package)

Playing around with these variables we were able to produce problems of different sizes and different degrees of complexity. During the competition, for example, the three categories had universes of respectively 30K, 50K and 100K packages. Moreover, we discarded all the problems that had no solution at all.

From our experience during the problem selection, when considering over 30K packages it is extremely easy to generate cudf problems that have no solution at all. For example, in debian lenny there are 17K packages connected by a kernel of 80 conflicts. This configuration produces around 5K strong conflicts. This means that if we pick two packages among these 17K, there is a high probability that these two packages are in conflict. This is because of the high level of inter-dependencies in open source distributions. With bigger remove/install requests this probability grows even further. Since the goal was to provide random problems as close as possible to reality, our documents have requests to install at most 10 packages and remove 10 packages at the same time.

The five categories used in the competition :

  • cudf_set : Encoding in cudf of 1-in-3-SAT
  • debian-dudf : Real installation problems collected via dudf-save.
  • easy : Debian unstable / desktop installation from unstable / 10 install - 10 remove
  • difficult : Debian stable + unstable / server installation from stable / 10 install - 10 remove - 1 upgrade all
  • impossible : Debian oldstable, stable, testing, unstable / server installation from oldstable / 10 install - 10 remove - 1 upgrade all

Execution

The execution environment of the competition was set up in a dedicated virtual machine running on Xen. This VM respects the specification given in the rules of the competition. We made two small mistakes (to be fixed in the next edition of the competition). First, we did not specify the OS running in the virtual machine. To reproduce the results, everybody should be able to replicate the execution environment and re-run the competition. For MISC 2010 we used a plain debian lenny (plus security updates). We tried to keep all additional software to a strict minimum.

Starting from a base system (as generated by debootstrap) we added:

  • subversion (used to update the execution environment from the mancoosi svn)
  • git (used to register the various versions of the submitted solvers)
  • sun-java (from non-free)

The second mistake was not to specify the exact java version. There are subtle differences between open-java and sun-java, and it seems these differences created a few problems for one of the participants. This problem was quickly rectified.

Running the competition

To run the competition I wrote a few simple bash scripts and test cases. The test cases were meant to exercise the execution environment and to make sure that all constraints were correctly enforced. The execution environment is available in the mancoosi svn. In practice, we ran the competition in four phases.

Phase One

In the first phase we deployed all the solvers in the execution environment. In order to clean up the solver directory and “start fresh” after every invocation, I created an empty git repository for every solver. After each invocation, the repository was cleaned up using

git clean -f -d     # remove untracked files and directories
git reset --hard    # undo any modification to tracked files
find /tmp/ -user misc2010 -exec rm -Rf {} \;

The last line makes sure that no temporary files were left behind in the temporary directory.

Phase Two

In the second phase we actually ran the competition. The script used is runcomp.sh. It takes 3 arguments: the list of solvers, the list of problems and a timeout in seconds. Since we used the same set of problems for the trendy and paranoid tracks, we ran the competition only once for both tracks. The output of the runcomp.sh script is a directory (e.g. tmp/201007060918) with all the raw results of the competition. All raw data is publicly available here.
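
A hypothetical invocation, only to fix ideas about the argument order (the file names and the timeout are made up):

./runcomp.sh solvers.txt problems.txt 300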

Phase Three

In the third phase we computed the aggregate results by track using the script recompute.sh. This script takes 4 arguments: the list of all solvers in one track, the list of problems (the same used before), the timestamp of the last run (e.g. 201007060918) and the name of the track. The output of this script is a file containing all the aggregate results, one per line, of the form category, problem, solver, time, result. For example, a snippet from this file looks like:

cudf_set huge1.cudf apt-pbo-paranoid-1.0.5 - ABORT
cudf_set huge1.cudf p2cudf-paranoid-1.6 2.28814 FAIL
cudf_set huge1.cudf uns-paranoid-0.0002 - ABORT
cudf_set huge1.cudf ucl-cprel-1.0 0 FAIL
cudf_set huge1.cudf aspcud-paranoid-1.0 - ABORT
cudf_set huge1.cudf inescp-1.0 1.94812 FAIL
debian-dudf 103c9978-5408-11df-9bc1-00163e7a6f5e.cudf apt-pbo-paranoid-1.0.5 11.75 SUCCESS -18,-21
debian-dudf 103c9978-5408-11df-9bc1-00163e7a6f5e.cudf p2cudf-paranoid-1.6 6.06838 SUCCESS -16,-34
debian-dudf 103c9978-5408-11df-9bc1-00163e7a6f5e.cudf uns-paranoid-0.0002 - ABORT
debian-dudf 103c9978-5408-11df-9bc1-00163e7a6f5e.cudf ucl-cprel-1.0 0 FAIL
debian-dudf 103c9978-5408-11df-9bc1-00163e7a6f5e.cudf aspcud-paranoid-1.0 14.4089 SUCCESS -15,-29
...
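
Again a hypothetical invocation of recompute.sh, following the four arguments just described (all names are made up):

./recompute.sh paranoid-solvers.txt problems.txt 201007060918 paranoid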

Results are generated using the solution checker.

Phase Four

The last step is the classification of the solutions. The misc-classifier takes as input the aggregate results and outputs the html tables that will then be published on the web.

Conclusions

Running a solver competition is not as easy as it seems. To get it right we ran an internal competition in January 2009 that helped us to highlight, understand and solve different problems. It is mostly a matter of writing down the rules, specifying a clear and understandable protocol for the solver submission (for example, asking participants to version their solvers and to provide an md5 hash of each binary is a very good idea to avoid mix-ups) and spending some time debugging the scripts. The runsolver utility from Olivier Roussel (available here) is a very nice tool that takes care of many delicate details in process accounting and resource management. I added a small patch to be able to specify a specific signal as the warning signal. The code is in my private git repository: git clone http://mancoosi.org/~abate/repos/runsolver.git/ . This is the actual code we used for the competition. The 32 bit binary is available in the svn. All in all it was a great experience.

The results of the competition are published here.


rssh and suid mode

I use rssh to allow restricted shell access to my servers. A few weeks ago I noticed a lot of errors in my logs of the form:

Jun 22 11:04:06 dev rssh[13508]: setting log facility to LOG_USER
Jun 22 11:04:06 dev rssh[13508]: setting umask to 022
Jun 22 11:04:06 dev rssh[13508]: line 38: configuring user xxxx
Jun 22 11:04:06 dev rssh[13508]: setting xxxx's umask to 022
Jun 22 11:04:06 dev rssh[13508]: allowing rsync to user xxxx
Jun 22 11:04:06 dev rssh[13508]: chrooting xxxx to /home/xxxx
Jun 22 11:04:06 dev rssh[13508]: chroot cmd line: /usr/lib/rssh/rssh_chroot_helper 5 "rsync --server --sender -lde.L . "
Jun 22 11:04:06 dev rssh_chroot_helper[13508]: new session for xxxx, UID=1000
Jun 22 11:04:06 dev rssh_chroot_helper[13508]: chroot() failed, 5: Operation not permitted

It turned out to be a problem with a recent security update that removed the set-user-id bit from /usr/lib/rssh/rssh_chroot_helper. Dpkg has a nice way to make such changes permanent, namely dpkg-statoverride. It simply boils down to:

# dpkg-statoverride --update --add root root 4755 /usr/lib/rssh/rssh_chroot_helper
# stat  /usr/lib/rssh/rssh_chroot_helper
  File: `/usr/lib/rssh/rssh_chroot_helper'
  Size: 24904       Blocks: 56         IO Block: 4096   regular file
Device: 801h/2049d  Inode: 21326498    Links: 1
Access: (4755/-rwsr-xr-x)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2010-06-22 12:03:17.170460092 +0200
Modify: 2010-04-04 02:07:37.000000000 +0200
Change: 2010-06-22 12:02:23.040115785 +0200

xslt : namespace, move nodes, cdata


Every now and then I have to re-learn xslt to transform xml documents. My mantra is to use the right tool for the right job… so here I am again, struggling with xslt. There is a bit of a love/hate relationship between me and this language. I used it a lot during my master’s thesis, transforming an enormous - industry size - software specification (in SDL) into a formal language. That was a very painful way of learning a new language. After that I have just used it to perform small tasks, but I always manage to forget part of the specification…

Anyway, just to avoid repeating the same mistakes, today's lesson is about xml namespaces. Here you can find a lengthy explanation about namespaces and all the problems they can cause in xslt. I stumbled on a very simple case: why does my xslt stylesheet not work in the presence of a namespace?

A small motivating example. Suppose you have a very simple xml document

<doc xmlns="http://mynamespace.org">
  <a><b>aaa</b><c>bbb</c></a>
</doc>

and you want to transform it into a different xml document such as:

<?xml version="1.0"?>
<newdoc xmlns="http://othernamespace.org">
  <a xmlns="http://mynamespace.org"><c>bbb</c></a>
<b xmlns="http://mynamespace.org"><![CDATA[aaa]]></b></newdoc>

Therefore you want to:

  • change the document root (and namespace)
  • copy all the content of the element <a> except the element <b>
  • move the element <b> below <a>
  • embed the text content of <b> in a CDATA section

We go one step at a time. First we transform the root element and change the default namespace. The header of the xsl file is standard, except for the declaration of the attribute xmlns:ns="http://mynamespace.org". Since our source xml document has a namespace, we have to match elements in this namespace. In order to do so, we associate the label ns to the namespace http://mynamespace.org and then use it to match the element <doc>.

<?xml version="1.0"?>
<xsl:transform
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0"
  xmlns:ns="http://mynamespace.org">

The second part is the standard way of copying nodes and contents…

<xsl:template match="@*|node()">
  <xsl:copy>
    <xsl:apply-templates select="@*|node()"/>
  </xsl:copy>
</xsl:template>

In the third part we match the root element, create a new element and copy everything. Since <xsl:apply-templates/> does not have a select attribute, it applies by default to all the nodes inside the root element.

<xsl:template match="ns:doc">
  <xsl:element name="newdoc" namespace="http://mynamespace.org">
    <xsl:apply-templates/>
  </xsl:element>
</xsl:template>

</xsl:transform>

The result :

<?xml version="1.0"?>
<newdoc xmlns="http://othernamespace.org">
  <a xmlns="http://mynamespace.org"><b>aaa</b><c>bbb</c></a>
</newdoc>

Now we want to move the element <b>. We add a new template to match <a> and copy all of its content except the node <b>:

<xsl:template match="ns:a">
  <xsl:copy>
    <xsl:apply-templates select="*[not(self::ns:b)]"/>
  </xsl:copy>
</xsl:template>

This template, being more specific than the default template we specified at the beginning of the document, will be applied to <a>. Then we have to modify the template for the root element in order to copy the content of <a> and then the content of <b> below it.

<xsl:template match="ns:doc">
  <xsl:element name="newdoc" namespace="http://mynamespace.org">
    <xsl:apply-templates/>
    <xsl:apply-templates select="ns:a/ns:b"/>
  </xsl:element>
</xsl:template>

This will give something like the following:

<?xml version="1.0"?>
<newdoc xmlns="http://othernamespace.org">
  <a xmlns="http://mynamespace.org"><c>bbb</c></a>
<b xmlns="http://mynamespace.org">aaa</b></newdoc>

Finally, we want to embed the content of <b> in a cdata section. This is easy, as we just need to add an output directive at the beginning of the file. The complete example:

<?xml version="1.0"?>
<xsl:transform
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0"
  xmlns:ns="http://mynamespace.org">

<xsl:output method="xml" indent="yes" cdata-section-elements="ns:b" />

<xsl:template match="@*|node()">
  <xsl:copy>
    <xsl:apply-templates select="@*|node()"/>
  </xsl:copy>
</xsl:template>

<xsl:template match="ns:a">
  <xsl:copy>
    <xsl:apply-templates select="*[not(self::ns:b)]"/>
  </xsl:copy>
</xsl:template>

<xsl:template match="ns:doc">
  <xsl:element name="newdoc" namespace="http://othernamespace.org">
    <xsl:apply-templates/>
    <xsl:apply-templates select="ns:a/ns:b"/>
  </xsl:element>
</xsl:template>

</xsl:transform>
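
To reproduce the result below you can run the stylesheet with any XSLT 1.0 processor, for example xsltproc (the file names are mine):

xsltproc transform.xsl doc.xml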

And the final result :

<?xml version="1.0"?>
<newdoc xmlns="http://othernamespace.org">
  <a xmlns="http://mynamespace.org"><c>bbb</c></a>
<b xmlns="http://mynamespace.org"><![CDATA[aaa]]></b></newdoc>