distcheck vs edos-debcheck
This is the second post about distcheck. I want to give a quick overview of the differences between edos-distcheck and the new version. First, despite using the same SAT solver and the same encoding of the problem, distcheck has been rewritten from scratch. Dose2 has several architectural problems and is not very well documented. Adding new features had become too difficult and error-prone, so a rewrite was a natural choice (at least for me). Hopefully Dose3 will survive the Mancoosi project and provide a base for dependency reasoning. The framework is well documented and its architecture is fairly modular. It is written in OCaml, so sadly I don’t expect many people to join the development team, but we’ll be very open to contributions.
These are the main differences with edos-debcheck.
Performance
distcheck is about two times faster than edos-debcheck (from Dose2), but it is a “bit” slower than debcheck (the original debcheck), the tool written by Jerome Vouillon that was later superseded in Debian by edos-debcheck. The original debcheck was an all-in-one tool that did the parsing, encoding and solving without converting the problem to any intermediate format. distcheck trades a bit of speed for generality: since it is based on CUDF, it can handle different formats and can easily be adapted to a range of situations just by changing how the original problem is encoded into CUDF.
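To make "changing the encoding to CUDF" a little more concrete, here is a minimal Python sketch of how a single Debian stanza could be turned into a CUDF stanza. It is not distcheck's code (which is OCaml): it only handles Depends and Conflicts, ignores Provides, continuation lines and architecture qualifiers, and it assumes a hypothetical precomputed table `versions` mapping every (package, Debian version) pair, including versions that only appear inside constraints, to a positive integer consistent with dpkg ordering, because CUDF versions must be integers. Building that table is the genuinely hard part and is left out.

```python
import re

# Debian relational operators mapped to their CUDF equivalents.
RELOPS = {"<<": "<", "<=": "<=", "=": "=", ">=": ">=", ">>": ">"}

# One dependency atom: a name, optionally followed by "(op version)".
VPKG_RE = re.compile(r"\s*([^\s(]+)\s*(?:\(\s*(<<|<=|=|>=|>>)\s*([^\s)]+)\s*\))?\s*")


def parse_stanza(text):
    """Parse one 822-style stanza (without continuation lines) into a dict."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(":")
        fields[key.strip()] = value.strip()
    return fields


def vpkg_to_cudf(atom, versions):
    """Translate a Debian atom 'name (op ver)' into CUDF 'name op N'."""
    m = VPKG_RE.fullmatch(atom)
    if m is None:
        raise ValueError(f"cannot parse {atom!r}")
    name, op, ver = m.groups()
    if op is None:
        return name
    return f"{name} {RELOPS[op]} {versions[(name, ver)]}"


def deb_to_cudf(stanza_text, versions):
    """Encode one Debian stanza as a CUDF stanza (Depends and Conflicts only)."""
    f = parse_stanza(stanza_text)
    name = f["Package"]
    out = [f"package: {name}", f"version: {versions[(name, f['Version'])]}"]
    for deb_key, cudf_key in (("Depends", "depends"), ("Conflicts", "conflicts")):
        if deb_key in f:
            # Comma-separated clauses; each clause may be a disjunction "a | b".
            clauses = [" | ".join(vpkg_to_cudf(alt, versions) for alt in clause.split("|"))
                       for clause in f[deb_key].split(",")]
            out.append(f"{cudf_key}: {', '.join(clauses)}")
    return "\n".join(out)


# Hypothetical stanza and version table, for illustration only.
stanza = """Package: foo
Version: 1.0-1
Depends: bar (>= 2.0), baz | qux (<< 3)
Conflicts: old-foo"""
versions = {("foo", "1.0-1"): 1, ("bar", "2.0"): 2, ("qux", "3"): 4}
print(deb_to_cudf(stanza, versions))
# package: foo
# version: 1
# depends: bar >= 2, baz | qux < 4
# conflicts: old-foo
```

Since the solver only ever sees the CUDF side, supporting another input format amounts, in this picture, to writing another function like `deb_to_cudf`.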
Below are a couple of tests I’ve performed on my machine (Debian unstable). The numbers speak for themselves.
$time cat tmp/squeeze.packages | edos-debcheck -failures > /dev/null
Completing conflicts... * 100.0%
Conflicts and dependencies... * 100.0%
Solving * 100.0%
real 0m19.515s
user 0m19.193s
sys 0m0.276s
$time ./distcheck.native -f deb://tmp/squeeze.packages > /dev/null
real 0m10.859s
user 0m10.669s
sys 0m0.172s
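Put as ratios, the real and user times above give the following speed-up of distcheck over edos-debcheck on this input:

```latex
\frac{19.515\ \text{s}}{10.859\ \text{s}} \approx 1.80 \quad (\text{real}),
\qquad
\frac{19.193\ \text{s}}{10.669\ \text{s}} \approx 1.80 \quad (\text{user})
```

In other words, distcheck needs a little over half the time of edos-debcheck here, in line with the "about two times faster" figure.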
Input
The second big difference concerns input formats. At the moment we have two different tools in Debian, edos-debcheck and edos-rpmcheck. Despite using the same underlying library, these two tools have different code bases. distcheck is basically a multiplexer that converts different inputs to a common format and then uses it (agnostically) to solve the installation problem. It can be invoked under different names (via symlinks) to behave similarly to its predecessors.
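One common way to implement "behave like its predecessors through symlinks" is for the program to look at the name it was invoked under (`sys.argv[0]` in Python) and pick its defaults from it. The sketch below is hypothetical: the symlink names, the mapping and `default_scheme` are assumptions for illustration, not distcheck's actual table.

```python
import os

# Hypothetical mapping from the name the program is invoked under (for
# example through a symlink) to a default input scheme.
DEFAULT_SCHEME = {
    "debcheck": "deb",
    "rpmcheck": "hdlist",
    "eclipsecheck": "eclipse",
}


def default_scheme(argv0):
    """Choose a default input format from argv[0], ignoring the directory and
    an OCaml build suffix such as .native or .byte. Returns None when the
    name is not recognised, meaning the input scheme must be given
    explicitly."""
    name = os.path.basename(argv0)
    for suffix in (".native", ".byte"):
        if name.endswith(suffix):
            name = name[: -len(suffix)]
    return DEFAULT_SCHEME.get(name)


print(default_scheme("/usr/bin/debcheck"))   # deb
print(default_scheme("./distcheck.native"))  # None
```

A real program would call `default_scheme(sys.argv[0])` once at start-up and fall back to the explicit URI scheme when it returns `None`.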
At the moment we are able to handle 5 different formats:
deb:// the 822-style Packages format used by Debian-based distributions
hdlist:// a binary format used by RPM-based distributions
synth:// a simplified format to describe RPM-based package repositories
eclipse:// an 822-based format that encodes OSGi plugin metadata
cudf:// the native CUDF format
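The scheme prefix in these URIs is what lets one binary multiplex between parsers. A minimal sketch of the dispatch step, assuming only the five schemes listed above; `split_uri` and `FORMATS` are hypothetical names, and the per-format parsers that would build the common representation are omitted.

```python
# The five input schemes listed above, with a short description of each.
FORMATS = {
    "deb": "822 Packages files (Debian-based distributions)",
    "hdlist": "binary hdlist files (RPM-based distributions)",
    "synth": "simplified RPM repository descriptions",
    "eclipse": "822-based OSGi plugin metadata",
    "cudf": "native CUDF documents",
}


def split_uri(uri):
    """Split 'deb://tmp/squeeze.packages' into ('deb', 'tmp/squeeze.packages').

    In a full implementation the scheme would select the parser that turns
    the file into the common representation handed to the solver."""
    scheme, sep, path = uri.partition("://")
    if not sep:
        raise ValueError(f"no scheme in {uri!r}")
    if scheme not in FORMATS:
        raise ValueError(f"unsupported scheme {scheme!r}")
    return scheme, path


print(split_uri("deb://tmp/squeeze.packages"))  # ('deb', 'tmp/squeeze.packages')
```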
distcheck handles gz and bz2 compressed files transparently. However, if you care about performance, you should decompress your input file first and then parse it with distcheck: decompressing the file on the fly often takes more time than running the installability test itself. There is also an experimental database backend that is not compiled by default at the moment.
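Transparent decompression of this kind can be done by sniffing the first bytes of the file (gzip data starts with `1f 8b`, bzip2 data with `BZh`) rather than trusting the extension. This is a sketch under that assumption, not a description of how distcheck detects compression; `open_input` is a hypothetical helper built on Python's standard `gzip` and `bz2` modules.

```python
import bz2
import gzip

GZIP_MAGIC = b"\x1f\x8b"
BZIP2_MAGIC = b"BZh"


def open_input(path, encoding="utf-8"):
    """Open a Packages-style file for reading as text, transparently
    decompressing gzip or bzip2 content detected from its first bytes."""
    with open(path, "rb") as fh:
        head = fh.read(3)
    if head.startswith(GZIP_MAGIC):
        return gzip.open(path, "rt", encoding=encoding)
    if head.startswith(BZIP2_MAGIC):
        return bz2.open(path, "rt", encoding=encoding)
    return open(path, "r", encoding=encoding)
```

The convenience has the cost mentioned above: every run pays for decompression again. For repeated runs it is cheaper to decompress once (for example with gunzip or bunzip2) and pass the plain file.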
Output
Regarding the output, I’ve already explained the main differences in an old post. As a quick reminder, the old edos-debcheck had two output options. The first was a human-readable, unstructured output that was a handy source of information when running the tool interactively. The second was an XML-based format (without a DTD or a schema, I believe) that was used for batch processing.
distcheck has only one output type, in YAML, which aims to be both human- and machine-readable. Hopefully this will cater for both needs. Moreover, I’ve just recently added to the distcheck output a summary of who is breaking what. The output of edos-debcheck was basically a map from packages to the reasons for their breakage. In addition to this information, distcheck also gives a map from each reason (a missing dependency or a conflict) to the list of packages broken by that problem. This additional information is off by default, but I think it is nice to know which missing dependency is responsible for the majority of problems in a distribution…
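The summary is essentially the inverse of the per-package report. A minimal sketch of that inversion, assuming a reason can be modelled as a hashable `(kind, dependency)` tuple; `summarize` and the data layout are hypothetical, and the sample data is loosely based on the outputs shown in this post.

```python
from collections import defaultdict


def summarize(report):
    """Invert a {package: [reason, ...]} report into a list of
    (reason, [packages]) pairs, the reason breaking most packages first."""
    by_reason = defaultdict(list)
    for package, reasons in report.items():
        for reason in reasons:
            by_reason[reason].append(package)
    return sorted(((reason, sorted(pkgs)) for reason, pkgs in by_reason.items()),
                  key=lambda item: (-len(item[1]), item[0]))


# Sample data loosely based on the outputs shown in this post.
report = {
    "enna": [("missing", "libevas-svn-05-engines-x (>= 0.9.9.063)")],
    "enna-dbg": [("missing", "libevas-svn-05-engines-x (>= 0.9.9.063)")],
    "libosgal1": [("missing", "libopenscenegraph56 (>= 2.8.1)")],
    "libosgal-dev": [("missing", "libopenscenegraph56 (>= 2.8.1)")],
    "zope-zms": [("missing", "zope2.9 | zope2.10")],
}
for reason, packages in summarize(report):
    print(reason[1], "->", ", ".join(packages))
```

Sorting by the number of affected packages puts the dependency responsible for most breakage at the top, which is the question the summary is meant to answer.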
For example, calling distcheck with --summary:
$./distcheck.native --summary deb://tests/sid.packages
backgroud-packages: 29589
foreground-packages: 29589
broken-packages: 143
missing-packages: 138
conflict-packages: 5
unique-missing-packages: 52
unique-conflict-packages: 5
summary:
  -
    missing:
      missingdep: libevas-svn-05-engines-x (>= 0.9.9.063)
      packages:
        -
          package: enna-dbg
          version: 0.4.0-4
          architecture: amd64
          source: enna (= 0.4.0-4)
        -
          package: enna
          version: 0.4.0-4
          architecture: amd64
          source: enna (= 0.4.0-4)
  -
    missing:
      missingdep: libopenscenegraph56 (>= 2.8.1)
      packages:
        -
          package: libosgal1
          version: 0.6.1-2+b3
          architecture: amd64
          source: osgal (= 0.6.1-2)
        -
          package: libosgal-dev
          version: 0.6.1-2+b3
          architecture: amd64
          source: osgal (= 0.6.1-2)
Below is a small example of the edos-debcheck output compared to the new YAML-based output.
$cat tests/sid.packages | edos-debcheck -failures -explain
Completing conflicts... * 100.0%
Conflicts and dependencies... * 100.0%
Solving * 100.0%
zope-zms (= 1:2.11.1-03-1): FAILED
zope-zms (= 1:2.11.1-03-1) depends on missing:
- zope2.10
- zope2.9
zope-tinytableplus (= 0.9-19): FAILED
zope-tinytableplus (= 0.9-19) depends on missing:
- zope2.11
- zope2.10
- zope2.9
...
And here is an extract from the distcheck output (the order is different; I cut and pasted parts of the output here…)
$./distcheck.native -f -e deb://tests/sid.packages
report:
  -
    package: zope-zms
    version: 1:2.11.1-03-1
    architecture: all
    source: zope-zms (= 1:2.11.1-03-1)
    status: broken
    reasons:
      -
        missing:
          pkg:
            package: zope-zms
            version: 1:2.11.1-03-1
            architecture: all
          missingdep: zope2.9 | zope2.10
  -
    package: zope-tinytableplus
    version: 0.9-19
    architecture: all
    source: zope-tinytableplus (= 0.9-19)
    status: broken
    reasons:
      -
        missing:
          pkg:
            package: zope-tinytableplus
            version: 0.9-19
            architecture: all
          missingdep: zope2.9 | zope2.10 | zope2.11
...
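The two outputs also differ in how they show a disjunctive dependency: edos-debcheck lists each missing alternative, while distcheck prints the whole `missingdep` clause. A simplified sketch of the underlying check, assuming a dependency counts as missing only when none of its alternatives exists in the universe; version constraints are ignored here, the helper names are hypothetical, and `zope2.12` is an invented package name used only for the example.

```python
def alternatives(dep):
    """Split 'zope2.9 | zope2.10' into ['zope2.9', 'zope2.10'], dropping any
    '(op version)' constraint, which this sketch ignores."""
    return [alt.split("(")[0].strip() for alt in dep.split("|")]


def missing_dependency(dep, universe):
    """A disjunctive dependency is missing only if no alternative exists."""
    return all(name not in universe for name in alternatives(dep))


def debcheck_style(dep):
    """Render a distcheck-style 'missingdep' as the list edos-debcheck printed."""
    return "\n".join(f"- {name}" for name in alternatives(dep))


# Hypothetical universe: zope2.12 is an invented package name for the example.
universe = {"zope-zms", "zope-tinytableplus", "zope2.12"}
print(missing_dependency("zope2.9 | zope2.10", universe))  # True
print(missing_dependency("zope2.9 | zope2.12", universe))  # False
print(debcheck_style("zope2.9 | zope2.10"))
```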
Future
The roadmap to release version 1.0 of distcheck is as follows:
add background and foreground package selection. This feature will allow the user to specify a larger universe (background packages) but check only a subset of this universe (foreground packages). This should allow users to select packages using grep-dctrl and then pipe them to distcheck. At the moment we can select individual packages on the command line, or we can use expressions like bash (<= 2.7) to check all versions of bash in the universe with a version less than or equal to 2.7.
code cleanup and a bit of refactoring between distcheck and buildcheck (a frontend for distcheck that allows us to report broken build dependencies)
consider essential packages while performing the installation test. There are a few things we still have to understand, but the idea would be to detect possible problems related to the implicit presence of essential packages in the distribution. At the moment, distcheck performs the installation test in the empty universe, while ideally the universe should contain all essential packages.
finish the documentation. The effort is underway and we hope to finish it shortly so that we can release the Debian package in experimental.
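The foreground/background idea and a selector such as bash (<= 2.7) can be sketched in Python as follows. This is not distcheck's implementation: `select_foreground`, the sample universe and the selector parsing are hypothetical, and the version comparison is a straightforward port of the ordering rules Debian Policy gives for the Version field (epoch first, then upstream version, then revision; `~` sorts before everything, letters before other symbols, digit runs compared numerically). In the planned feature, the selected foreground packages would then be checked against the whole background universe.

```python
import operator
import re

# Debian relations mapped to comparisons on the result of compare_versions.
RELATIONS = {"<<": operator.lt, "<=": operator.le, "=": operator.eq,
             ">=": operator.ge, ">>": operator.gt}


def _order(c):
    """Weight of a non-digit character: '~' sorts before everything, even
    the end of the string; letters sort before other symbols."""
    if c == "~":
        return -1
    if c.isalpha():
        return ord(c)
    return ord(c) + 256


def _verrevcmp(a, b):
    """Compare two upstream-version (or revision) strings as dpkg does."""
    i = j = 0
    while i < len(a) or j < len(b):
        # Non-digit run: compare character weights; end of string weighs 0.
        while (i < len(a) and not a[i].isdigit()) or (j < len(b) and not b[j].isdigit()):
            ac = _order(a[i]) if i < len(a) and not a[i].isdigit() else 0
            bc = _order(b[j]) if j < len(b) and not b[j].isdigit() else 0
            if ac != bc:
                return ac - bc
            i += 1
            j += 1
        # Digit run: compare numerically; an empty run counts as 0.
        si, sj = i, j
        while i < len(a) and a[i].isdigit():
            i += 1
        while j < len(b) and b[j].isdigit():
            j += 1
        diff = int(a[si:i] or 0) - int(b[sj:j] or 0)
        if diff:
            return diff
    return 0


def parse_version(v):
    """Split '[epoch:]upstream[-revision]' into its three parts."""
    epoch, sep, rest = v.partition(":")
    if not sep:
        epoch, rest = "0", v
    upstream, sep, revision = rest.rpartition("-")
    if not sep:
        upstream, revision = rest, ""
    return int(epoch), upstream, revision


def compare_versions(a, b):
    """Negative, zero or positive as Debian version a is <, = or > b."""
    ea, ua, ra = parse_version(a)
    eb, ub, rb = parse_version(b)
    if ea != eb:
        return ea - eb
    return _verrevcmp(ua, ub) or _verrevcmp(ra, rb)


def select_foreground(background, selector):
    """Return the (name, version) pairs of the background universe that
    match a selector such as 'bash (<= 2.7)'."""
    m = re.fullmatch(r"\s*([^\s(]+)\s*\(\s*(<<|<=|=|>=|>>)\s*([^\s)]+)\s*\)\s*", selector)
    if m is None:
        raise ValueError(f"cannot parse selector {selector!r}")
    name, rel, bound = m.groups()
    return [(n, v) for n, v in background
            if n == name and RELATIONS[rel](compare_versions(v, bound), 0)]


# Hypothetical background universe of (package name, version) pairs.
universe = [("bash", "2.05"), ("bash", "2.7"), ("bash", "3.2-1"),
            ("bash", "4.0~rc1"), ("dash", "0.5.5.1-3")]
print(select_foreground(universe, "bash (<= 2.7)"))
# [('bash', '2.05'), ('bash', '2.7')]
```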