Learning from the Future of Component Repositories - CBSE 2012

Learning from the Future of Component Repositories (Pietro Abate, Roberto Di Cosmo, Ralf Treinen and Stefano Zacchiroli) has been accepted for presentation at CBSE 2012 (26-28 June, Bertinoro, Italy).

Abstract

An important aspect of the quality assurance of large component repositories is the logical coherence of component metadata. We argue that it is possible to identify certain classes of such problems by checking relevant properties of the possible future repositories into which the current repository may evolve. In order to make a complete analysis of all possible futures effective, however, one needs a way to construct a finite set of representatives of this infinite set of potential futures. We define a class of properties for which this can be done.

We illustrate the practical usefulness of the approach with two quality assurance applications: (i) establishing the amount of “forced upgrades” induced by introducing new versions of existing components in a repository, and (ii) identifying outdated components that need to be upgraded in order to ever be installable in the future. For both applications we provide experience reports obtained on the Debian distribution.

The tools presented in this paper (outdated and challenged) are already in Debian as part of the dose-extra package.

Update

For the second year in a row our paper won the Best Paper Award at the CBSE 2012 conference!!!

Update 2

I presented this paper at CBSE 2012. The slides of my presentation are attached.


mancoosi tools in debian testing

The dose3 libraries and tools finally landed in testing this weekend. We have already fixed a couple of bugs and it seems nobody complained too loudly. If you used the edos tools in the past, you might be interested in checking out our new tools in the dose-extra package.

Here at mancoosi we will be delighted to hear about your experience with our tools and how to make them better and more useful. Please drop me a line!

The next major release of dose will be multi-arch aware and provide performance improvements and other minor features.

If you missed it, and you are now curious, I delivered a talk at fosdem regarding our tools:

  • QA tools for FOSS distributions (FOSDEM 2012) - Pietro Abate (video)

A big thanks to Ralf, of course, for packaging everything!


Dependency order matters!

Recently I’ve discovered a subtle consequence of the way dependencies are specified in debian: their order actually matters. While refactoring the code of dose3, I changed the order in which dependencies are considered by our sat solver (of edos fame), and I witnessed a twofold performance loss just by randomizing how variables were presented to the solver. This highlights, on one hand, how strongly our solver depends on the structure of the problem and, on the other hand, the standard practice of debian maintainers of assigning an implicit priority within disjunctive dependencies, where the first alternative is the most preferred package (and maybe the most tested, at least dependency-wise).

The basic idea of distcheck is to encode the dependency information contained in a Packages file in CNF format and then to feed it to a sat solver, in order to find out whether a package has broken dependencies, or whether its dependencies are such that, no matter what, it would be impossible to install the package on a user machine.

Conflicts are encoded as binary clauses. So if package A conflicts with package B, I add a constraint that says “not (A and B)”, that is, A and B cannot be installed together. The dependency encoding associates to each disjunction of the Depends field a clause that says “A implies B”. If a package foo depends on “A, B | C, D”, I’ll add “foo implies A”, “foo implies B or C” and “foo implies D”. This encoding is pretty standard and easy to understand.
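
To make the encoding concrete, here is a minimal sketch in Python. This is not the actual distcheck code (dose3 is written in OCaml), and the tiny example repository is made up for illustration:

    # Map each package to a boolean variable, following the DIMACS
    # convention: the integer n means "installed", -n "not installed".
    packages = ["foo", "A", "B", "C", "D"]
    var = {name: i + 1 for i, name in enumerate(packages)}

    clauses = []

    # Dependencies: each disjunction of the Depends field becomes one
    # implication clause.  "foo Depends: A, B | C, D" yields
    # foo -> A, foo -> (B or C), foo -> D, and each implication
    # p -> (q1 or ... or qn) is the clause (-p, q1, ..., qn).
    depends = {"foo": [["A"], ["B", "C"], ["D"]]}
    for pkg, conjuncts in depends.items():
        for disjunction in conjuncts:
            clauses.append([-var[pkg]] + [var[q] for q in disjunction])

    # Conflicts: one binary clause per conflicting pair.
    # "B Conflicts: D" yields not(B and D), i.e. (-B, -D).
    conflicts = [("B", "D")]
    for p, q in conflicts:
        clauses.append([-var[p], -var[q]])

    print(clauses)  # [[-1, 2], [-1, 3, 4], [-1, 5], [-3, -5]]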

The problem is how the sat solver searches for a solution to the question “is it possible to install package foo in an empty repository?”. The solver we use is very efficient and can easily deal with 100K packages or more, but in general it is not very good at dealing with random CNF instances. The reason edos-debcheck is so efficient lies in the way it exploits the structure of the sat problems.

The goal of a sat solver is to find a model (that is, a variable assignment) that is compatible with the given set of constraints. So if my encoding of the debian repository is a set of constraints R, the installability problem boils down to adding to R the constraint that the variable associated with the package foo must be true, and then asking the solver to find a model. An installation, in sat terms, is just the set of variables that must be true in order to satisfy the given constraints.
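
The whole check then fits in a few lines. A sketch, continuing the encoding above and using the off-the-shelf pysat library purely for illustration (the solver inside distcheck is a different, custom one):

    from pysat.solvers import Minisat22

    # Load the clauses built in the encoding sketch above, then ask for
    # a model under the assumption that foo's variable is true.
    with Minisat22(bootstrap_with=clauses) as solver:
        if solver.solve(assumptions=[var["foo"]]):
            # get_model() returns one signed integer per variable; the
            # positive ones are exactly the packages of one installation.
            model = solver.get_model()
            installation = [p for p in packages if model[var[p] - 1] > 0]
            print("foo is installable:", installation)
            # -> foo is installable: ['foo', 'A', 'C', 'D']
            #    (B conflicts with D, so the solver must pick C in B | C)
        else:
            print("foo is not installable")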

If you look at the logic problem as a truth table, the idea is to find a row of this table that satisfies all the constraints: that row is the solution to your problem. Brute force of course is not an option, and modern sat solvers use a number of strategies and heuristics to guide the search in the most intelligent way possible. Some of them try to learn from previous attempts; some, when they are lost, pick a random variable to proceed.

On problems that have a lot of structure, award-winning sat solvers do not backtrack very much. By exploiting the structure of the problem, their algorithms considerably narrow down the search to only those variables that really matter for finding a solution.

All this long introduction was to get to the solver that is currently used in edos-debcheck and in distcheck (which is a rewrite of edos-debcheck).

So why does dependency order matter? Even though the policy does not specify any order for dependencies, it is common practice to write disjunctive dependencies with the most probable and most tested alternative first, and all other, more esoteric choices later. Moreover, real packages are usually listed before virtual packages. Since every developer seems to be doing the same, some structure ends up hidden in the order in which dependencies are specified.

Part of the efficiency of the solver used in our tools is actually due to the fact that its search strategy strongly depends on the order in which literals are specified in each clause. Saying that package foo depends on A and B is “different” from saying it depends on B and A, even if the two are semantically equivalent.

In my tests, I found about a twofold performance loss when the order of literals is either randomized or inverted. This is clearly specific to our solver; other solvers might not be susceptible to such small structural changes. Sat competitions often employ obfuscation strategies on well-known problems with well-known structure, precisely to make it useless to encode a search strategy that exploits the specific structure of a problem.
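
If you want to reproduce this kind of experiment with an off-the-shelf solver, a simple way is to time the same queries with and without shuffling the literals inside each clause. This is a sketch of the methodology only, again on pysat: the twofold figure above was measured with our custom solver on real repository data, and another solver on a toy instance may well show no difference at all.

    import random
    import time
    from pysat.solvers import Minisat22

    def total_solve_time(clauses, queries):
        # One installability query per package, all on the same solver.
        start = time.perf_counter()
        with Minisat22(bootstrap_with=clauses) as s:
            for v in queries:
                s.solve(assumptions=[v])
        return time.perf_counter() - start

    def shuffle_literals(clauses):
        # Same clauses, literal order randomized inside each clause.
        return [random.sample(c, len(c)) for c in clauses]

    # 'clauses', 'var' and 'packages' come from the encoding sketch above.
    queries = [var[p] for p in packages]
    print("original order:  ", total_solve_time(clauses, queries))
    print("randomized order:", total_solve_time(shuffle_literals(clauses), queries))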

Since we’re not trying to win a sat competition here, but to provide tools that solve a specific problem, we are of course very happy to exploit this structure.


QA tools for FOSS distributions

I’m going to deliver this talk at fosdem 2012, in room H.1301 (CrossDistribution Devroom) at 16:30 on Saturday. If you are interested, please come by. In particular I’d like to talk with all the developers out there who are using our work (of edos fame) and to discuss with them future plans to migrate their programs to the new generation of mancoosi-powered QA tools. Scroll down to get the slides.

fosdem link : http://fosdem.org/2012/schedule/event/distro_qa_tools

Abstract

FOSS distributions are increasingly under pressure to deliver stable releases that include the most up-to-date upstream software. Time-based release strategies have exacerbated this problem, putting even more pressure on QA teams. The recently concluded Mancoosi project has developed a number of tools to automatically analyse large package collections for a range of problems, from installability checks to speculative analysis of software repositories. In this talk I’ll present four command line tools to identify and correct potential problems as early as possible in the release cycle.

In particular:

  • Debcheck: identifies all broken packages within a repository and provides a detailed explanation of each problem. It can be used to prevent shipping releases that contain packages that cannot be installed because of missing or malformed dependencies.
  • Buildcheck: given a Sources file and a set of binary repositories, identifies those source packages that cannot be compiled because their build dependencies cannot be satisfied.
  • Outdated: identifies those broken packages that need special attention because of outdated metadata.
  • Challenged: performs a speculative analysis of the repository to identify those packages that, if upgraded to a specific version, would break a large number of other packages. This tool is particularly useful during the upgrade of a specific component, to evaluate its impact on the software archive.

Most of our tools support both rpm (versions 4 and 5) and deb-based distributions.

The mancoosi team.


new thinkpad x220

This summer my beloved thinkpad x301 died in a cloud of smoke. It was exactly 3 years and 20 days old, while my warranty was valid for only 3 years. Now, don’t tell me this is a coincidence. Anyway. After about 5 months, I finally managed to convince my employer to get me a new thinkpad, the x220. My specs include a 128G SSD, 4G of RAM, a 2.4GHz processor, a camera and a fingerprint reader.

It’s a pity that the x300 series is not in production anymore. They were light, with a solid battery and a large screen. The X1 just doesn’t cut it: even though it can be considered the successor of the x300, its battery life is just not enough for my needs. On the other hand, the x220 is a very nice machine: the screen is a bit smaller than the x300’s, but it is light, with a very powerful processor and a good battery, and it feels very solid. In my opinion lenovo should have packed the new hardware of the x220 into the chassis of the x300, maybe with a small compromise on battery life (I got the big battery and I can squeeze almost 7 hours out of a single charge), but clearly this was not a good business choice…

Installing debian on this laptop is not straightforward, because none of the official debian installers ship with a recent kernel (as in the 3.x series). Since with the official debian installer I could get neither the driver for the ethernet card nor the driver for the wireless card, I opted for a custom installer built by kmuto (http://kmuto.jp/debian/d-i/). With this installer the ethernet card is recognized immediately and it is easy to proceed with the installation as usual. Another option would have been to add the binary blob for the wireless chip, but apparently the debian installer supports only WEP authentication, while all my access points use WPA. I didn’t spend too much time on the wireless setup, so it might well be that it is indeed possible to install using a WPA access point.

The last time I installed a laptop, I used the automatic partitioning option to get lvm on top of a luks-encrypted partition, only to discover later that the default partition sizes were a bit too small for me. For example, by default the boot partition is only 256MB. This is plenty if you want only one kernel image installed at any given time, but if you want more than one kernel, memtest, and for example a grml rescue iso, it’s easy to run out of space.

So I proceeded to manually partition the disk, creating a boot partition of 512MB and using the rest as a luks-encrypted device with lvm on top and 3 logical volumes: sys (15G), swap (4G) and home (the rest). For my daily use, having 15G on my laptop for /usr, /var, etc. should be more than enough…

The next step was to install the system. Since in recent times I got extremely pissed off with gnome 3, I decided to dump it completely and go back to awesome. But since awesome all by itself is a bit sad, I paired it up with xfce. Everything works except automount, and I’m still trying to figure out how to make it work: apparently it is a consolekit problem… I’ll write another post about the xfce4 + awesome setup soon…

Today I’ve also started playing with the fingerprint reader. It seems to work, but I haven’t yet managed to use it in conjunction with pam for authentication… more to come.

And… one last closing remark: during the last 5 months I’ve been using a dell latitude e6410… Gosh, I feel I’m on another planet. The keyboard of a thinkpad gives you pleasure, not pain; from 2 to 4G of RAM is a big jump; and from a conventional HD to an SSD… well… it seems I’m flyinggggg :) I have the impression my productivity just went up 50%!!!

If you work with your laptop every day, get a good one. It is well worth the investment…

Update

Now debian can be installed on this model using the stock installation images.