3  Formalization

This section is mostly based on [MBC+06], with some additional stuff from [CDL+06].

3.1  A Summary of Package Metadata

In this section we briefly summarize the features of DEB and RPM packages that are relevant to understand the result of EDOS for dependency management. Where not explicitly stated, these features are to be intended the same in both DEB and RPM package systems. A more detailed analysis of various packaging systems can be found in Section 1.

A package is a binary bundle that contains a component, all the data needed to its correct functioning and some metadata which describe its attributes and its requirements with respect to the environment in which it will be deployed. Packages not always contain “executable” components. Often documentation and non-executable artifacts are packaged as well. However, to our purposes, this distinction is not relevant because the package format is agnostic with respect to its actual content type.

Both DEB and RPM packages are actually compressed archives. However, while RPM is actually an ad-hoc format explicitly conceived for this purpose, DEB packages can be produced using standard tools (i.e., ar and tar), so they can be easily managed. Nevertheless, the most relevant difference between DEB and RPM package format concerns their metadata specification. While RPM packages encode it in a binary form as a part of its ad-hoc archive format, DEB packages use a textual representation for it. This fact makes its processing easier. Figure 9 shows an excerpt of the metadata taken from the firefox DEB package in version 1.5.dfsg+1.5.0.1-2 for the i386 architecture.


Package: firefox
Version: 1.5.dfsg+1.5.0.1-2 ...
Depends: fontconfig, psmisc, libatk1.0-0 (>= 1.9.0), libc6 (>= 2.3.5-1) ...
Suggests: xprint, firefox-gnome-support (= 1.5.dfsg+1.5.0.1-2), latex-xft-fonts
Conflicts: mozilla-firefox (<< 1.5.dfsg-1)
Replaces: mozilla-firefox
Provides: www-browser...
    
Figure 9: Excerpt of a DEB package metadata

Every package has a version that is used to give a temporal order to the packaged component release. Though the version structure of RPM and DEB packages is the same (i.e., epoch number, version id and revision id), the comparison algorithm is slightly different. Version information is central when specifying relationships between packages.

The kinds of relationships that can be expressed in RPM and DEB package metadata are almost the same, except some very specialized ones that are seldom used or processed (e.g., obsoletes, replaces, suggests, enhances)

In this paper we have taken into account only the three main relationships that are used with the same semantics in both RPM and DEB packages:

  • Depends (DEB), Requires (RPM): used to establish a requirement on the packages that must be present in the system in order to make the current packaged component fully functional.
  • Conflicts (DEB, RPM): used to establish a requirement on the packages that cannot be present at the same time in the system with the current one. A successful installation can be performed only if no conflicting packages are already present in the system.
  • Pre-Depends (DEB), PreReq (RPM): similar to the Depends relationships but used to establish a requirement on the packages that must be already present in the system in order to successfully deploy the packaged component. The difference between Pre-Depends and Depends is that while Depends package might not be present in the system when deploying the packaged component (but only after, so they can be deployed together with the current component), Pre-Depends packages must be already installed even before attempting to deploy the current packaged component.

Relationships are specified by using a list of package names, optionally with version constraints. Each element in the list represents a requirement for the given relationship, and all these requirements must be met (i.e., they are a conjunction) in order to satisfy the dependency relationship. Actually, the DEB package format allows an element of the list to be a disjunction of requirements. This is done by using the | operator. In this case, in order to meet the (disjunctive) requirement it is sufficient that at least one of the constituting requirements is met.

Figure 9 shows that the package firefox version 1.5.dfsg+1.5.0.1-2 has, among the others, a Depends requirement on the package fontconfig and on a package debianutils with a version >= 1.16. This means that once the packaged component firefox is deployed, in order to correctly function, it will need the debianutils component with a version greater than or equal to 1.16 to be deployed as well. When no version constraint is specified, then any packaged component version will do.

The allowed version constraints can be specified by using only the standard comparison operators with their standard semantics: <, <=, =, >= and =.


 Depends: libc6 (>= 2.0), 
 xlibs (>= 4.0) | xlib6g (>= 3.3.3.1), ...
    
Figure 10: DEB disjunctive dependency requirement

Figure 10, on the other hand, shows an example of a disjunctive dependency requirement. In order to correctly function, the current packaged component needs libc6 with a version >= 2.0 and either xlibs with a version >= 4.0 or xlib6g with a version >= 3.3.3.1.

Another essential information is given by the Provides metadata. When specified it allows to declare an identifier that can be used to reference the package. Such an identifier is often called either virtual package or feature. For instance, in Figure 9 the firefox package Provides the identifier web-browser. It is important to point out that this identifier can be used when specifying dependency relationships. So, for example, another package could declare a dependency requirement by specifying web-browser. In this case this requirement should be considered as a disjunctive requirement whose elements are all the packages that provide web-browser.

In RPM metadata, these features are the only way of declaring a relationship between packages; it is not possible to declare a relationship directly between packages, only by way of features. Also, it is worth noting that while RPM does not allow disjunctive dependencies per se, they can be emulated by declaring a dependency on a feature that is provided by two or more packages.

DEB also allows virtual packages, but their usage is strictly limited to a few cases (there is an explicitly maintained list of virtual packages) and using direct relations between packages is the standard practice.

3.2  Formal Model of Package Relations

In this section we formalize the definitions we have described in Section 3.1 and we introduce the notion of package installation that identify the central property we want to guarantee for each package present in a package repository.

Definition 1 (Package, unit)   A package is a pair (u,v) where u is a unit and v is a version. Units are arbitrary strings, and we assume that versions are non-negative integers.

Now we have two choices: either we try to represent formally the version constraints used in the different package formats, and then our model will need to take into account relationships between versions like the >=, =, >>, <<, <= found in the Figure 9; or we keep our model simple and eliminate all version constraints but one, equality (==), by replacing all the other version constraints with the explicitly listing of all packages versions that satisfy the version constraints (we call such replacement an expansion). This second choice has the additional advantage that the model becomes independent on the specific version comparison operators, present and future, of any given package format, but of course such expansions are relative to a given repository, so any change to the repository will make it necessary to perform the expansion phase again.

As an example, consider a repository containing versions 0.9, 1.0 and 1.1 of mozilla-firefox. Then the conflict relationship in Figure 9 would expand as shown in Figure 11.


Package: firefox
Version: 1.5.dfsg+1.5.0.1-2
Conflicts: mozilla-firefox (==0.9), mozilla-firefox (== 1.0), mozilla-firefox (== 1.1) ...
    
Figure 11: Dependencies expansion

Once this expansion has been performed, we are left only with direct dependency or conflict relationships among actual packages, i.e. (unit, version) pairs, and we can then in a straightforward way encode them it into a graph.

In the following we will hence assume that the distribution is fixed, and that the version comparison operators have been expanded. Notice that, due to this assumption, we can also get rid of all the details concerning the particular ordering over version strings used in common FOSS packages: we use the specific version comparison algorithm only to expand the dependencies, and then we can use any set of unique identifiers for representing each version. One reason why directly working with ordering constraints on version numbers is that the version numbering schemas used in DEB and RPM packages are not discrete: for any two version strings v1 and v2 such that v1 < v2, there exists v3 such that v1 < v3 < v2.

For instance, if the repository of our Debian distribution contains the versions 2.15-6, 2.16.1cvs20051117-1 and 2.16.1cvs20051206-1 of the unit binutils, we may encode these versions respectively as 0,1 and 2, giving the packages (binutils,0), (binutils,1), and (binutils,2).

Definition 2 (Repository)   A repository is a tuple R = (P,D,C) where P is a set of packages, D : PP(P(P)) is the dependency function (we write P(X) for the set of subsets of X), and CP × P is the conflict relation. The repository must satisfy the following conditions:
  • The relation C is symmetric, i.e., 12) ∈ C if and only if 21) ∈ C for all π12P.
  • Two packages with the same unit but different versions conflict, that is, if π1 = (u,v1) and π2 = (u,v2) with v1v2, then 12) ∈ C.

The second requirement in the definition of a repository is present in some package management systems, notably Debian’s, but not all. For instance, RPM-based distributions allow simultaneous installation of several versions of the same unit, at least in principle (if, for example, they do not install files in the same location).

In a repository R = (P,D,C), the dependencies of each package p are given by D(p) = { d1, …, dk } which is a set of sets of packages, interpreted as follows. If p is to be installed, then all its k dependency requirements must be satisfied. For di to be satisfied, at least one of the packages of di must be available. In particular, if one of the di is the empty set, it will never be satisfied, and the package p is not installable.

The attentive reader might have noticed that we do not have a separate dependency function for modeling Pre-Depends relationships. In fact, we consider them as if they were normal Depends relationships. This is reasonable because we put ourselves on the distribution editor-side where we have a package repository (and not an installed system). At this level, the two kinds of relationships can be considered equivalent.

Example 1   Let R = (P,D,C) be the repository given by
  
    P = { abcdefghij } 
    D(a) = 

{b}, {c,d}, {d,e}, {d,f

    D(b) = 

{g

   D(c) = 

{ghi

   D(d) = 

{hi

    D(e) = D(f) = 

{j

    C = { (c,e), (e,c), (e,i), (i,e), (g,h), (h,g) }
where a = (ua,0), b = (ub,0), c = (uc,0) and so on. The repository R is represented in Figure 12. For the package a to be installed, the following packages must be installed: b, either c or d, either d or e, and either d or f. Packages c and e, e and i, and g and h cannot be installed at the same time.

Figure 12: The repository of Example 1.

As described in Section 3.1, dependencies are usually conjunctive, that is they are of the form

  a → b1 ∧ b2 ∧ ⋯ ∧ bs

where a is the target and b1, b2, … are its requirements. However, more complex dependencies can be specified, with name disjunctive requirements. So the general form for a dependency specification is a conjunction of disjunctions:

  a → (b11 ∨ ⋯ ∨ b1r1) ∧ ⋯ ∧ (bs1 ∨ ⋯ ∨ bsrs).     (1)

For a to be installed, each term of the right-hand side of the implication 1 must be satisfied. In turn, the term bi1 ∨ ⋯ ∨ biri when 1 ≤ is is satisfied when at least one of the bij with 1 ≤ jri is satisfied. If a is a package in our repository, we therefore have

  D(a) = { { b11, …, b1r1 }, ⋯, { bs1, …, bsrs } }.

In particular, if one of the terms is empty (if ∅ ∈ D(a)), then a cannot be satisfied. This side-effect is useful for modeling repositories containing packages mentioning another package b that is not in that repository. Such a situation may occur because of an error in the metadata, because the package b has been removed, or b is in another repository, maybe for licensing reasons.

Concerning the relation C, two packages π1 = (u1,v1), π2 = (u2,v2) ∈ P conflict when (π1, π2) ∈ C. Since conflicts are a function of presence and not of installation order, the relation C is symmetric.

Definition 3 (Installation)   An installation of a repository R = (P,D,C) is a subset I of P, giving the set of packages installed on a system. An installation is healthy when the following conditions hold:
  • Abundance: Every package has what it needs. Formally, for every π ∈ I, and for every dependency dD(π) we have Id ≠ ∅.
  • Peace: No two packages conflict. Formally, (I × I) ∩ C = ∅.
Definition 4 (Installability and co-installability)   A package π of a repository R is installable if there exists a healthy installation I such that π ∈ I. Similarly, a set of packages Π of R is co-installable if there exists a healthy installation I such that Π ⊆ I.

Note that because of conflicts, every member of a set XP may be installable without the set X being co-installable.

Example 2   Assume a depends on c, b depends on d, and c and d conflict. Then, the set {a,b} is not co-installable, despite each of a and b being installable and not conflicting directly.
Definition 5 (Maximal co-installability)   A set X of co-installable packages of a repository R is maximal if there is no other co-installable subset X of R that strictly contains X. We write maxco(R) for the family of all maximal co-installable subsets of R.
Definition 6 (Dependency closure)   The dependency closure Δ(Π) of a set of packages Π of a repository R is the smallest set of packages included in R that contains Π and that is closed under the immediate dependency function D : P(P) → P(P) defined as
  
D
(Π) = 
 
π ∈ Π
d ∈ D(π)
 
 d.

In simpler words, Δ(Π) contains Π, then all packages that appear as immediate dependencies of Π, then all packages that appear as immediate dependencies of immediate dependencies of Π, and so on. Since the domain of D is a complete lattice, and D is clearly a monotonic function (it is even a continuous function), we immediately get by the Knaster-Tarski theorem that such a smallest set exists and can be actually computed as follows:

The dependency closure Δ(Π) of Π is:
  Δ(Π) = 
 
n ≥ 0
 
D
n(Π).

The notion of dependency closure is useful to extract the part of a repository that pertains to a package or to a set of packages.

Definition 7 (Generated subrepository)   Let R = (P,D,C) be a repository and Π ⊆ P be a set of packages. The subrepository generated by Π is the repository R|Π = (P′,D′,C′) whose set of packages is the dependency closure of Π and whose dependency and conflict relations are those of R restricted to that set of packages. More formally we have P′ = Δ(Π), D′ : P′ → P(P(P′)), π ↦ { dP′  |  dD(π) } and C′ = C ∩ (P′ × P′).

The following graph shows the subrepository generated by the package c of example 1. The dependency closure of c is the set of package nodes of that subrepository, that is {c,g,h,i}.

A larger, real-world example is the subrepository generated by the package sysstat in Debian stable on 2005-12-13, viewed as a boolean graph. Version numbers have been omitted for brevity.



We then have the following property, which allows to consider only the relevant subrepositories when answering questions of installability.

Proposition 1 (Completeness of subrepositories)   A package π is installable w.r.t. R if and only if it is installable w.r.t. R|π.

The desirable property that we want to ensure over a repository R is the following:

Definition 8 (Trimmed repository)   A repository R is trimmed if every package π ∈ R is installable with respect to R itself.

The intuition is that if a repository is not trimmed, then it contains packages that cannot be installed in any configuration: these packages behave as if they were not part of the repository. This means that either they should not be actually there or there is some a problem in their metadata specification that should be corrected.