Subsections 4.1 and 4.2 are from [EDO05], subsections 4.3 and 4.4 are from [EDO06].
As we have seen, the basic constraints that are used both in RPM and DEB dependencies may have only one of the following forms:
This kind of constraint is called unary constraint in the Constraint Solving community, and is known to be strictly less expressive than binary constraints (which are as general as n-ary constraints).
It is important to remark that the unary constraints appearing in the dependencies of DEB and RPM can be replaced by an equivalent set of boolean constraints.
This encoding is immediate in the DEB format, while it requires some gymnastic using the provides: tag in the RPM format. Of course, the size of the boolean encoding can be bigger than the original problem: if the problem is of size n and the maximum number of versions available for a single package is k, the boolean encoding is O(kn), which represents a less than (or equal than) quadratic blowup.
In the following, we will hence focus on package dependencies represented using boolean constraints only.
It is important to remark that the unary constraints appearing in the dependencies of DEB and RPM can be replaced by an equivalent set of boolean constraints.
This encoding is immediate in the DEB format, while it requires some gymnastic using the provides: tag in the RPM format.
Of course, the size of the boolean encoding can be bigger than the original problem: if the problem is of size n and the maximum number of versions available for a single package is k, the boolean encoding is O(kn), which represents a less than (or equal than) quadratic blowup.
In the following, we will hence focus on package dependencies represented using boolean constraints only.
Since the language used to combine these constraints involves disjunctions (either explicit, as in the DEB format, or implicit through the provides: tags of the RPM format), conjunctions and negations (through the conflicts: tag), one might expect that the CSP problem of checking the installability of a given package P using a repository R is computationally difficult.
This is indeed the case, as we show in what follows.
For the sake of clarity, we will only discuss here the depends: and conflicts: constraints, and assume that the provides: tags have been previously inlined.
More formally, the problem is
We call configuration of a repository R (which is the set of available packages) an assignment α : R → {Installed,Uninstalled } mapping each package in R to the label Installed or Uninstalled.
Deciding whether a given package P is installable, given a repository R, corresponds to finding a configuration α of R that assigns Installed to P and such that all the constraints associated to the packages in R mapped to Installed are satisfied. More formally, the installation problem can be represented as the language
|
where ⟨ R,P ⟩ is a suitable encoding of the repository R and the package name P.
It is easy to see that PACKINST is in NP. Indeed, if R contains n packages, then a configuration α can be encoded in n bits and thus can be non-deterministically guessed in time proportional to n. Then, checking whether the constraints associated to each package are satisfied can be done in time linear in the size of the repository – for each package P in R such that α(P) = Installed, we check that for all the disjunctive dependency clauses P1 ∨ … ∨ Pk of P there exists an i such that α(Pi) = Installed, and that for all packages P′ conflicting with P we have α(P′) = Uninstalled. Since all those constraints appear explicitly in R and thus count in the size of R, the checking is done time in linear in the size of R (save the usual hidden logarithmic access factors, which may be superlinear in n).
To see that PACKINST is NP-hard, we will show that any 3SAT problem can be reduced in polynomial time to an instance of of a Debian package installation problem.
Let S = C1∧…∧ Cn be an instance of 3SAT, with each Ci being the disjunctions of three literals (li,1∨ li,2 ∨ li,3) each of the li,k being either a propositional atom a or a negated propositional atom a. Let A={a1,…,ak} be the set of propositional atoms occurring in S. We build the following Debian repository RS, containing a package PS representing S itself, one package PCi for each clause Ci, and packages Va, Pa, and Pa for each propositional atom a:
The problem S is thus reduced to the instance ⟨ RS, PS ⟩ of PACKINST, which can be constructed in deterministic time polynomial in n. We will now show that ⟨ RS, PS ⟩ is a positive instance of PACKINST if and only if S is satisfiable. If we have a boolean valuation f satisfying S, this valuation gives us a configuration α whose constraints are all satisfied, and that maps PS to Installed, by taking α(PS) = α(PC1) = ⋯ = α(PCn) = Installed, α(Pa) = f(a) and α(Pa) = ¬ f(a) for every atom a. Therefore package PS is indeed installable in the repository RS.
Conversely, if ⟨ RS, PS ⟩ is a positive instance of PACKINST, there is a configuration α mapping PS to Installed and satisfying all the constraints 1–4. Then we have that PCi and Va must be mapped to Installed also (because of the dependency in 1) for every i and every atom a. By virtue of dependencies 3 and 4, for every propositional atom a exactly one of Pa and Pa must be installed. Furthermore, for every i, at least one of Pℓi,k must be mapped to Installed because of the dependencies in 2. As a consequence, the valuation f, defined for every propositional atom a by f(a) = true if α(Pa) = Installed and f(a) = false otherwise, satisfies S.
The idea is the same as for the Debian encoding, but the details are slightly different because of the lack of explicit OR dependencies in the RPM format. We just give the RPM repository encoding the 3SAT instance S.
Despite the apparent differences, the constraint languages in DEB and RPM are sensibly equivalent in expressiveness, and the associated installation problems are both NP-complete.
This means that automatic package installation tools like APT, URPMI or SMART live dangerously on the edge of intractability, and must carefully apply heuristics that may be either safe (the approach advocated by SMART), and hence still not guaranteed to avoid intractability, or unsafe, thus accepting the risk of not always finding a solution when it exists.
The detailed analysis of the algorithms underlying these existing tools, and a proposal for a state-of-the-art tool for automatic package management is one of the goals we set for the next deliverables.
In the following sections we show how we have formalized the Installability problem using two approaches: Constraint Programming and SAT.
The formalization of installability provided above leads quite naturally to an encoding as a Boolean satisfiability problem. We define a propositional variable Iuv for every package (u,v) in the repository R, indicating that unit u is installed in version v. We denote for unit u by Ru the set of versions v such that (u,v)∈ R.
We then build a boolean formula Rs stating that package (u,v) is installable as a conjunction of the following boolean formulas:
For every unit u in the repository:
| ¬( Iuv1 ∧ Iuv2) |
If R contains a dependency for (u,v) of the form
|
we introduce the formula
|
If R contains a conflict for (u,v) of the form
|
we introduce the formula
|
Satisfiability of this formula can be checked by a SAT solver.
We can also formulate the installability problem for a given package in a Debian repository R as a constraint programming problem over finite domains, but in this case we must start from the repository before expanding the version relationships.
To simplify the problem by getting rid of the inessential details related to version comparison algorithms in DEB or RPM formats, we first preprocess the repository and replace version strings by integer as follows: for each unit u, collect all of its mentioned version strings v, and order them accordingly to the appropriate, format specific, comparison algorithm; then replace each occurrence of u op v by u op nv, where nv is the position of v in the increasingly ordered sequence of versions of u. In other terms, we simply project over an initial segment of the integers starting at 1 the order structure of the versions of each package. This does not change the nature of the constraint problem, but reduces it to a problem over the Integer domain, for which solvers are more easily available.
We then build a constraint satisfaction problem over a finite domain by constructing a set of constraints Rc out of R as follows:
|
|
|
|
Notice that, in the encoding above, if we encounter a package name with no version constraint (so we find just u instead of u >> 3, for example), we simply produce Xu >0 as the encoding. It is now possible to prove the following:
Hence, to check installability of a package u,v in a repository R, we can pass the constraint set Rc to any CP solver and ask whether Xu=v is satisfiable. We can also simply ask whether there exist a version of a unit that is installable, by asking whether Xu>0 is satisfiable.