4  Algorithmic Considerations

Subsections 4.1 and 4.2 are from [EDO05], subsections 4.3 and 4.4 are from [EDO06].

4.1  DEB and RPM both use unary constraints

As we have seen, the basic constraints that are used both in RPM and DEB dependencies may have only one of the following forms:

  • P meaning “any version of package P”, that can be seen as an abbreviation for P > 0
  • P op const where op is a binary comparison operation and const is a constant value, meaning “any version v of package P such that v op const is true”.

This kind of constraint is called unary constraint in the Constraint Solving community, and is known to be strictly less expressive than binary constraints (which are as general as n-ary constraints).

4.1.1  Reduction to boolean constraints

It is important to remark that the unary constraints appearing in the dependencies of DEB and RPM can be replaced by an equivalent set of boolean constraints.

  • For each available version v of a package P in the repository R, introduce the variable Pv
  • Replace each unary constraint by a disjunction of boolean constraints as follows
    • P becomes Pv1 ∨ … ∨ Pvk where v1, …,vk are the available versions of P in the repository
    • P op const becomes Pv1 ∨ … ∨ Pvk where v1, …, vk are the available versions of P in the repository that satisfy vi op const

This encoding is immediate in the DEB format, while it requires some gymnastic using the provides: tag in the RPM format. Of course, the size of the boolean encoding can be bigger than the original problem: if the problem is of size n and the maximum number of versions available for a single package is k, the boolean encoding is O(kn), which represents a less than (or equal than) quadratic blowup.

In the following, we will hence focus on package dependencies represented using boolean constraints only.

4.2  Package installation is NP-Complete

Theorem 1   Checking whether a single package P can be installed, given a repository R, is NP-complete.

It is important to remark that the unary constraints appearing in the dependencies of DEB and RPM can be replaced by an equivalent set of boolean constraints.

  • For each available version v of a package P in the repository R, introduce the variable Pv
  • Replace each unary constraint by a disjunction of boolean constraints as follows
    • P becomes Pv1 ∨ … ∨ Pvk where v1, …,vk are the available versions of P in the repository
    • P op const becomes Pv1 ∨ … ∨ Pvk where v1, …, vk are the available versions of P in the repository that satisfy vi op const

This encoding is immediate in the DEB format, while it requires some gymnastic using the provides: tag in the RPM format.

Of course, the size of the boolean encoding can be bigger than the original problem: if the problem is of size n and the maximum number of versions available for a single package is k, the boolean encoding is O(kn), which represents a less than (or equal than) quadratic blowup.

In the following, we will hence focus on package dependencies represented using boolean constraints only.

Since the language used to combine these constraints involves disjunctions (either explicit, as in the DEB format, or implicit through the provides: tags of the RPM format), conjunctions and negations (through the conflicts: tag), one might expect that the CSP problem of checking the installability of a given package P using a repository R is computationally difficult.

This is indeed the case, as we show in what follows.

For the sake of clarity, we will only discuss here the depends: and conflicts: constraints, and assume that the provides: tags have been previously inlined.

More formally, the problem is

Configuration.

We call configuration of a repository R (which is the set of available packages) an assignment α : R → {Installed,Uninstalled } mapping each package in R to the label Installed or Uninstalled.

The problem.

Deciding whether a given package P is installable, given a repository R, corresponds to finding a configuration α of R that assigns Installed to P and such that all the constraints associated to the packages in R mapped to Installed are satisfied. More formally, the installation problem can be represented as the language

  PACKINST


⟨ RP ⟩  
 
    ∃ α    α : R → {Installed,Uninstalled} ∧ 
  α(P) = Installed ∧ 
  
∀ P′    α(P′) = Installed =⇒ the constraints associated to P′ in R are satisfied in α.

where ⟨ R,P ⟩ is a suitable encoding of the repository R and the package name P.

4.2.1  Package installation is in NP

It is easy to see that PACKINST is in NP. Indeed, if R contains n packages, then a configuration α can be encoded in n bits and thus can be non-deterministically guessed in time proportional to n. Then, checking whether the constraints associated to each package are satisfied can be done in time linear in the size of the repository – for each package P in R such that α(P) = Installed, we check that for all the disjunctive dependency clauses P1 ∨ … ∨ Pk of P there exists an i such that α(Pi) = Installed, and that for all packages P′ conflicting with P we have α(P′) = Uninstalled. Since all those constraints appear explicitly in R and thus count in the size of R, the checking is done time in linear in the size of R (save the usual hidden logarithmic access factors, which may be superlinear in n).

4.2.2  Encoding a 3SAT instance as a Debian package installation

To see that PACKINST is NP-hard, we will show that any 3SAT problem can be reduced in polynomial time to an instance of of a Debian package installation problem.

Let S = C1∧…∧ Cn be an instance of 3SAT, with each Ci being the disjunctions of three literals (li,1li,2li,3) each of the li,k being either a propositional atom a or a negated propositional atom a. Let A={a1,…,ak} be the set of propositional atoms occurring in S. We build the following Debian repository RS, containing a package PS representing S itself, one package PCi for each clause Ci, and packages Va, Pa, and Pa for each propositional atom a:

  1. PS depends PC1, …, PCn,Va1, …,Vak
  2. PCi depends Pi,1 | Pi,2 | Pi,3, for each 1≤ in
  3. Va depends Pa|Pa for each atom a
  4. Pa conflicts Pa for each atom a
  5. Pa conflicts Pa for each atom a

The problem S is thus reduced to the instance ⟨ RS, PS ⟩ of PACKINST, which can be constructed in deterministic time polynomial in n. We will now show that ⟨ RS, PS ⟩ is a positive instance of PACKINST if and only if S is satisfiable. If we have a boolean valuation f satisfying S, this valuation gives us a configuration α whose constraints are all satisfied, and that maps PS to Installed, by taking α(PS) = α(PC1) = ⋯ = α(PCn) = Installed, α(Pa) = f(a) and α(Pa) = ¬ f(a) for every atom a. Therefore package PS is indeed installable in the repository RS.

Conversely, if ⟨ RS, PS ⟩ is a positive instance of PACKINST, there is a configuration α mapping PS to Installed and satisfying all the constraints 1–4. Then we have that PCi and Va must be mapped to Installed also (because of the dependency in 1) for every i and every atom a. By virtue of dependencies 3 and 4, for every propositional atom a exactly one of Pa and Pa must be installed. Furthermore, for every i, at least one of Pi,k must be mapped to Installed because of the dependencies in 2. As a consequence, the valuation f, defined for every propositional atom a by f(a) = true if α(Pa) = Installed and f(a) = false otherwise, satisfies S.

4.2.3  Encoding a 3SAT instance as an RPM package installation

The idea is the same as for the Debian encoding, but the details are slightly different because of the lack of explicit OR dependencies in the RPM format. We just give the RPM repository encoding the 3SAT instance S.

  1. PS depends PC1, …, PCn, Va1,…,Vak
  2. Pa provides Ci1, …, Cik , for each Ci1,…, Cik containing a literal equal to a
  3. Pa provides Ci1, …, Cik , for each Ci1,…, Cik containing a literal equal to a
  4. Pa provides Va, for each atom a
  5. Pa provides Va, for each atom a
  6. Pa conflicts Pa, for each atom a

4.2.4  Conclusions:

Despite the apparent differences, the constraint languages in DEB and RPM are sensibly equivalent in expressiveness, and the associated installation problems are both NP-complete.

This means that automatic package installation tools like APT, URPMI or SMART live dangerously on the edge of intractability, and must carefully apply heuristics that may be either safe (the approach advocated by SMART), and hence still not guaranteed to avoid intractability, or unsafe, thus accepting the risk of not always finding a solution when it exists.

The detailed analysis of the algorithms underlying these existing tools, and a proposal for a state-of-the-art tool for automatic package management is one of the goals we set for the next deliverables.

4.3  Encoding the Installability problem as a SAT problem

In the following sections we show how we have formalized the Installability problem using two approaches: Constraint Programming and SAT.

The formalization of installability provided above leads quite naturally to an encoding as a Boolean satisfiability problem. We define a propositional variable Iuv for every package (u,v) in the repository R, indicating that unit u is installed in version v. We denote for unit u by Ru the set of versions v such that (u,v)∈ R.

We then build a boolean formula Rs stating that package (u,v) is installable as a conjunction of the following boolean formulas:

At most one version per unit:

For every unit u in the repository:

 
v1,v2∈ Ru v1 ≠ v2
 ¬( Iuv1 ∧ Iuv2)
Constraints:

If R contains a dependency for (u,v) of the form

  Depends: (u11 op11 v11 ∨ ⋯ ∨ u1r1 op1r1   v1r1)
  ∧ ⋯ ∧ (us1 ops1 vs1∨ ⋯ ∨ usrs opsrs vsrs).

we introduce the formula

  Iuv ⇒ (
 
w op11 v11
 Iu1w ∨ ⋯ ∨
 
w op1r1 v1r1
 Iur1w
 ∧ …∧ (
 
w ops1 vs1
 Iusw ∨ ⋯ ∨
 
w opsrs vsrs
 Iursw

If R contains a conflict for (u,v) of the form

  Conflicts: (u11 op11 v11 ∨ ⋯ ∨ u1r1 op1r1   v1r1)
  ∧ ⋯ ∧ (us1 ops1 vs1∨ ⋯ ∨ usrs opsrs vsrs).

we introduce the formula

  Iuv ⇒ (
 
w op11 v11
 ¬ Iu1w ∨ ⋯ ∨
 
w op1r1 v1r1
 ¬ Iur1w
 ∧ …∧ (
 
w ops1 vs1
 ¬ Iusw ∨ ⋯ ∨
 
w opsrs vsrs
 ¬ Iursw
Proposition 2   A package (u,v) is installable in the repository R if and only if the boolean formula RsIuv is satisfiable.

Satisfiability of this formula can be checked by a SAT solver.

4.4  Encoding the Installability problem as a CP problem

We can also formulate the installability problem for a given package in a Debian repository R as a constraint programming problem over finite domains, but in this case we must start from the repository before expanding the version relationships.

To simplify the problem by getting rid of the inessential details related to version comparison algorithms in DEB or RPM formats, we first preprocess the repository and replace version strings by integer as follows: for each unit u, collect all of its mentioned version strings v, and order them accordingly to the appropriate, format specific, comparison algorithm; then replace each occurrence of u op v by u op nv, where nv is the position of v in the increasingly ordered sequence of versions of u. In other terms, we simply project over an initial segment of the integers starting at 1 the order structure of the versions of each package. This does not change the nature of the constraint problem, but reduces it to a problem over the Integer domain, for which solvers are more easily available.

We then build a constraint satisfaction problem over a finite domain by constructing a set of constraints Rc out of R as follows:

Variables and domains:
For each unit u in the repository R, we introduce a finite domain variable, with domain equal to the set of available versions of the unit present in the repository, plus one special value 0 denoting the fact that no version of the unit is installed. We add the constraint Xu ∈ {0,v1, …,vk} to Rc.
Constraints
We add constraints to Rc that encode the dependency information associated to each package π=(u,v)∈ R as follows. If R contains a dependency for π=(u,v) of the form
  
    Depends: (u11 op11 v11 ∨ ⋯ ∨ u1r1 op1r1 v1r1)
∧ ⋯ ∧ (us1 ops1 vs1∨ ⋯ ∨ usrs opsrs vsrs).
we introduce the constraint
  
    (Xu=v) ⇒ (Xu11 op11 v11 ∨ ⋯ ∨ Xu1r1 op1r1 v1r1)
∧ ⋯ ∧ (Xus1 ops1 vs1∨ ⋯ ∨ Xusrs opsrs vsrs)
If R contains a conflict for π=(u,v) of the form
  
    Conflicts: (u11 op11 v11 ∨ ⋯ ∨ u1r1 op1r1 v1r1)
∧ ⋯ ∧ (us1 ops1 vs1∨ ⋯ ∨ usrs opsrs vsrs).
we introduce the constraint
  
    (Xu=v) ⇒
    (Xu11 compl(op11v11 ∧ ⋯ ∧ Xu1r1  compl(op1r1v1r1)
    ∨ ⋯ ∧ (Xus1 compl(ops1vs1∧ ⋯ ∧ Xusrs compl(opsrsvsrs)
where compl(op) is the operation opposite to op (e.g., compl(>>) is <=, etc.)

Notice that, in the encoding above, if we encounter a package name with no version constraint (so we find just u instead of u >> 3, for example), we simply produce Xu >0 as the encoding. It is now possible to prove the following:

Proposition 3   A package π=(u,v) is installable in the repository R if and only if the constraint Xu==v is compatible with Rc.

Hence, to check installability of a package u,v in a repository R, we can pass the constraint set Rc to any CP solver and ask whether Xu=v is satisfiable. We can also simply ask whether there exist a version of a unit that is installable, by asking whether Xu>0 is satisfiable.