Subsection 6.1 is from [MBC+06], Subsection 6.3 is from [CDL+06]. Subsections 6.2 and 6.4 haven’t been published before.
In this section we show some experimental results that we have gathered by analyzing with our tools two of the most famous GNU/Linux-based distributions: Debian GNU/Linux [Pro] and Mandriva Linux [Man].
The reference package repositories are a snapshot of the complete Debian pool located at http://ftp.debian.org/pool and the packages distributed with the Mandriva 2006 Edition.
Out of the 4211 packages in the main part of the Mandriva 2006 distribution, 42 are not installable. In 38 cases, this is due to a dependency that is not available; in the case of four packages, mozilla-thunderbird-enigmail-{de,es,fr,it}, the problem is a simultaneous dependency and conflict with the package mozilla-thunderbird-enigmail.
The Debian pool snapshot contains 34701 packages, described in a Debian Control File [Theb] of almost 30Mb of metadata defining, among other things, almost 200.000 package relationships (dependencies and conflicts). Inserting new information that can possibly break the consistency of the pool and finding what is the source of the problem that lead to that is a difficult task if not supported by automatic tools for the analysis.
Figure 15 shows an excerpt of the output generated by our debcheck tool after the analysis of the previously described Debian pool snapshot. The result of this analysis is that for 123 packages there are no possible way to install them. In particular, 111 of them are not installable because of a missing dependency (i.e., there is no package available in the pool that could satisfy a dependency relationship). The other 12 are more interesting: they are not installable because the specified dependency relationships induce an unavoidable conflict.
For example, the package cacti-cactid version 0.8.6e-2 depends on libsnmp5, version 5.2.1.2 or greater; but the only appropriate version of libsnmp5 in the pool, 5.2.1.2-2, conflicts with all versions of cacti-cactid up to and including 0.8.6e-2. In this case either there is some problem in the specification of the package relationships or there are some packages that have been removed from the pool (e.g., other versions of libsnmp5) leaving it in an inconsistent state.
Such kind of information gives the distribution editor a better understanding of what is going on when performing operations on the package metadata or on the repositories.
cacti-cactid (= 0.8.6e-2): FAILED The following constraints cannot be satisfied: cacti-cactid (= 0.8.6e-2) conflicts with libsnmp5 (= 5.2.1.2-2) cacti-cactid (= 0.8.6e-2) depends on libsnmp5 (>= 5.2.1.2) {libsnmp5 (= 5.2.1.2-2)}
Figure 15: Excerpt of the debcheck output on the Debian pool packages
The processing speed is very encouraging and this makes the tools a valid and effective aid to the distribution editor (the machine used for the tests is a single-processor Intel Xeon 3.4 GHz machine running Mandriva Linux). Checking the Mandriva package repository took 55 seconds with the SAT solver and 8 seconds with the standalone rpmcheck tool. Checking the Debian pool snapshot took 18 minutes and 13 seconds with the SAT solver and 43 seconds with the standalone debcheck tool. The timing difference is largely due to the fact the rpmcheck and debcheck tools parse the package metadata only once and produce the constraints associated to all the package satisfiability problems in a single run, while the toolchain generate one distinct subproblem, and calls the generic SAT solver, for each package.
This is an indication of how long it takes to complete some operations on our project server, which is a single-processor Intel Xeon 3.4 GHz machine running Mandriva Linux. The figures for rpmcheck/debcheck include time for parsing, the figures for SAT do not.
Operation | User time |
Parsing | |
Mandriva 2006 | 13s |
Debian snapshot | 30s |
Cone extraction | |
Mandriva 2006 | 4m06s |
Debian snapshot | 27m40s |
Installability checks | |
Mandriva 2006 with SAT | 55s |
Mandriva 2006 with rpmcheck | 8s |
Debian snapshot with SAT | 18m13s |
Debian snapshot with debcheck | 43s |
In parallel with our formal complexity and algorithmic investigations, we also performed some empirical measurements on the Debian and Mandriva distributions, to try and grasp the practical complexity of the problems.
The following diagram gives a histogram showing the number of packages as a function of the size of the dependency closure, from the Debian stable, unstable and testing pools on 2005-12-13, which has 31149 packages. The average closure size is 158; 50% of the packages have a closure size of 71 or less, 90% of 372 or less, and 99% of 1077 or less. These numbers show that naive combinatorial algorithms, exponential in the size of the dependency closure, are clearly out of the question.
edos-debcheck is currently used to monitor the state of Debian’s distributions (unstable, testing, stable), as well as some custom distributions (currently Skolelinux) or not officially released distributions (for instance architecture/OS pairs that are based on the kFreeBSD kernel). The results of the analysis are available at http://edos.debian.net/edos-debcheck.
There are different reasons why non-installable packages actually exist in these distributions. One important reason is that most of the binary packages are architecture dependent, that is there is one package per architecture. As a consequence, when accessing the reasons for non-installability of packages we have to take into account a vector of repositories, the entries of which are indexed by the various architectures.
The meta-data of a binary package are generated during the package compilation from the meta-data in the source package, and may depend on the actual compilation environment or conditional code in the source package. As a consequence, the metadata of a package with the same unit and version may vary from architecture to architecture.
As a consequence, transient non-installability errors are normal in the unstable distribution. Persistent errors, however, indicate a potential problem.
This happens typically when a source package generates several binary packages, one of which is architecture dependent and one is not. This is a recommended practice in case a package contains large architecture-independent parts, like data or documentation. In this case splitting off an architecture independent package saves resources all along the deployment chain. Hence, there often is a good technical reason behind a non-installable architecture-independent package.
Packages which aren’t installable on any of the architectures of a distribution are more likely due to an error. This may happen with packages that are installable in some architecture that has been part of a distribution in the past, but which has been removed since then. Another possible reason is dependence on a package that had to be removed from a distribution, for instance due to licensing problems or grave bugs.