6  Experimental results

Subsection 6.1 is from [MBC+06], Subsection  6.3 is from [CDL+06]. Subsections 6.2 and 6.4 haven’t been published before.

In this section we show some experimental results that we have gathered by analyzing with our tools two of the most famous GNU/Linux-based distributions: Debian GNU/Linux [Pro] and Mandriva Linux [Man].

The reference package repositories are a snapshot of the complete Debian pool located at http://ftp.debian.org/pool and the packages distributed with the Mandriva 2006 Edition.

6.1  Non-installable Packages

Out of the 4211 packages in the main part of the Mandriva 2006 distribution, 42 are not installable. In 38 cases, this is due to a dependency that is not available; in the case of four packages, mozilla-thunderbird-enigmail-{de,es,fr,it}, the problem is a simultaneous dependency and conflict with the package mozilla-thunderbird-enigmail.

The Debian pool snapshot contains 34701 packages, described in a Debian Control File [Theb] of almost 30Mb of metadata defining, among other things, almost 200.000 package relationships (dependencies and conflicts). Inserting new information that can possibly break the consistency of the pool and finding what is the source of the problem that lead to that is a difficult task if not supported by automatic tools for the analysis.

Figure 15 shows an excerpt of the output generated by our debcheck tool after the analysis of the previously described Debian pool snapshot. The result of this analysis is that for 123 packages there are no possible way to install them. In particular, 111 of them are not installable because of a missing dependency (i.e., there is no package available in the pool that could satisfy a dependency relationship). The other 12 are more interesting: they are not installable because the specified dependency relationships induce an unavoidable conflict.

For example, the package cacti-cactid version 0.8.6e-2 depends on libsnmp5, version 5.2.1.2 or greater; but the only appropriate version of libsnmp5 in the pool, 5.2.1.2-2, conflicts with all versions of cacti-cactid up to and including 0.8.6e-2. In this case either there is some problem in the specification of the package relationships or there are some packages that have been removed from the pool (e.g., other versions of libsnmp5) leaving it in an inconsistent state.

Such kind of information gives the distribution editor a better understanding of what is going on when performing operations on the package metadata or on the repositories.


      cacti-cactid (= 0.8.6e-2): FAILED
      The following constraints cannot be satisfied:
      cacti-cactid (= 0.8.6e-2) 
      conflicts with libsnmp5 (= 5.2.1.2-2)
      cacti-cactid (= 0.8.6e-2) 
      depends on libsnmp5 (>= 5.2.1.2) 
               {libsnmp5 (= 5.2.1.2-2)}
    
Figure 15: Excerpt of the debcheck output on the Debian pool packages

The processing speed is very encouraging and this makes the tools a valid and effective aid to the distribution editor (the machine used for the tests is a single-processor Intel Xeon 3.4 GHz machine running Mandriva Linux). Checking the Mandriva package repository took 55 seconds with the SAT solver and 8 seconds with the standalone rpmcheck tool. Checking the Debian pool snapshot took 18 minutes and 13 seconds with the SAT solver and 43 seconds with the standalone debcheck tool. The timing difference is largely due to the fact the rpmcheck and debcheck tools parse the package metadata only once and produce the constraints associated to all the package satisfiability problems in a single run, while the toolchain generate one distinct subproblem, and calls the generic SAT solver, for each package.

6.2  Benchmarks

This is an indication of how long it takes to complete some operations on our project server, which is a single-processor Intel Xeon 3.4 GHz machine running Mandriva Linux. The figures for rpmcheck/debcheck include time for parsing, the figures for SAT do not.

OperationUser time
Parsing
Mandriva 200613s
Debian snapshot30s
Cone extraction
Mandriva 20064m06s
Debian snapshot27m40s
Installability checks
Mandriva 2006 with SAT55s
Mandriva 2006 with rpmcheck8s
Debian snapshot with SAT18m13s
Debian snapshot with debcheck43s

6.3  Empirical measurements

In parallel with our formal complexity and algorithmic investigations, we also performed some empirical measurements on the Debian and Mandriva distributions, to try and grasp the practical complexity of the problems.

The following diagram gives a histogram showing the number of packages as a function of the size of the dependency closure, from the Debian stable, unstable and testing pools on 2005-12-13, which has 31149 packages. The average closure size is 158; 50% of the packages have a closure size of 71 or less, 90% of 372 or less, and 99% of 1077 or less. These numbers show that naive combinatorial algorithms, exponential in the size of the dependency closure, are clearly out of the question.





6.4  Experiences with edos-debcheck in the Debian project

edos-debcheck is currently used to monitor the state of Debian’s distributions (unstable, testing, stable), as well as some custom distributions (currently Skolelinux) or not officially released distributions (for instance architecture/OS pairs that are based on the kFreeBSD kernel). The results of the analysis are available at http://edos.debian.net/edos-debcheck.

There are different reasons why non-installable packages actually exist in these distributions. One important reason is that most of the binary packages are architecture dependent, that is there is one package per architecture. As a consequence, when accessing the reasons for non-installability of packages we have to take into account a vector of repositories, the entries of which are indexed by the various architectures.

The meta-data of a binary package are generated during the package compilation from the meta-data in the source package, and may depend on the actual compilation environment or conditional code in the source package. As a consequence, the metadata of a package with the same unit and version may vary from architecture to architecture.

  • The unstable distribution is in fact the staging ground for building releasable distributions. Packages that depend on each other enter this distribution in an arbitrary order which depends on when a developer uploads a package, or on when a package is compiled and uploaded by an autobuilder (these are daemons that compile packages for the various architectures). For instance, package a may depend on package b, and the developer of a uploads a package for the architecture i386 while the developer of b uploads his package for amd64 (he should have tested package b using a locally built binary package of a on amd64). In this case, a is uninstallable in the repository for i386 until the i386 autobuilder daemon uploads the binary package for b.

    As a consequence, transient non-installability errors are normal in the unstable distribution. Persistent errors, however, indicate a potential problem.

  • A package a may depend on package b, but b is not available on all architectures a is available on. This may be due to the fact that there is a problem with compiling b on some architectures, or that a has a too liberal architecture specification.
  • A special case of the latter is that a has its architecture set to all. This indicates a binary package that is in fact the same on all architectures, and hence exists only once in the package pool. Package a may, however, depend on a package b which is architecture dependant but does not exist for every architecture.

    This happens typically when a source package generates several binary packages, one of which is architecture dependent and one is not. This is a recommended practice in case a package contains large architecture-independent parts, like data or documentation. In this case splitting off an architecture independent package saves resources all along the deployment chain. Hence, there often is a good technical reason behind a non-installable architecture-independent package.

    Packages which aren’t installable on any of the architectures of a distribution are more likely due to an error. This may happen with packages that are installable in some architecture that has been part of a distribution in the past, but which has been removed since then. Another possible reason is dependence on a package that had to be removed from a distribution, for instance due to licensing problems or grave bugs.