After a few inspiring talks in the Drupal room at
FOSDEM, I decided to spend a few hours figuring out
the module dependency system in Drupal.
Drupal has a highly modular design. The core is composed of a set of
required modules (dependencies) and a set of optional modules
(suggests). All contrib modules declare similar dependencies on
each other. All dependencies are conjunctive, that is, in order to
install a component all its dependencies must be satisfied. There are
no conflicts between components, which implies that a module is
always installable. The only implicit conflict is that two versions of
the same module cannot be installed at the same time. This makes the
module installation algorithm trivial, as it is equivalent to a simple
visit of the dependency graph (which might have cycles).
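Because dependencies are purely conjunctive, "installing" a module reduces to collecting everything reachable from it in the dependency graph. A minimal sketch (the module names and the `depends` mapping are made up for illustration):

```python
def install_closure(module, depends, seen=None):
    """Collect a module and all its (transitive) dependencies.

    `depends` maps each module to the list of modules it requires.
    The `seen` set makes the visit terminate even on cyclic graphs.
    """
    if seen is None:
        seen = set()
    if module in seen:
        return seen
    seen.add(module)
    for dep in depends.get(module, []):
        install_closure(dep, depends, seen)
    return seen

# Hypothetical repository: tables needs filter, filter needs system.
depends = {"tables": ["filter"], "filter": ["system"], "system": []}
print(sorted(install_closure("tables", depends)))
# → ['filter', 'system', 'tables']
```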
There is a nice page on the Drupal
website explaining the format of the metadata for the next version of
Drupal. For example:
name = Tables Filter
description = Provides a filter that converts a [table ] macro into HTML encoded table.
dependencies[] = filter
package = Input filters
core = 6.x
; Information added by drupal.org packaging script on 2009-09-10
version = "6.x-1.0"
core = "6.x"
project = "tables"
datestamp = "1252563652"
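A rough sketch of how such a .info file could be parsed into a key/value structure: comment lines start with `;`, repeated `key[]` entries accumulate into a list, and quoted values are unquoted. This is only an illustration, not the actual drupal.org packaging script:

```python
def parse_info(text):
    """Parse a Drupal .info snippet into a dict."""
    info = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(";") or "=" not in line:
            continue  # skip blanks, comments, and malformed lines
        key, _, value = line.partition("=")
        key, value = key.strip(), value.strip().strip('"')
        if key.endswith("[]"):
            info.setdefault(key[:-2], []).append(value)
        else:
            info[key] = value
    return info

sample = """name = Tables Filter
dependencies[] = filter
core = 6.x
; Information added by drupal.org packaging script
version = "6.x-1.0"
"""
print(parse_info(sample)["dependencies"])  # → ['filter']
```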
Note the conversion into the intermediate aggregate format shown below.
In order to analyze all modules' dependencies, I downloaded all
available modules for release 6 of Drupal (15 Feb 2010),
extracted all the metadata, and transformed it into something that the
tools in dose3 can handle. While downloading all the project archives,
I also found that a significant number of archives cannot be
downloaded (403 / 404), plus a few mistakes in the metadata … maybe I'll blog
about this in the future.
==Numbers and intermediate aggregate modules list==
From the .info file in each module archive, I extracted all the
relevant data and transformed it into an 822 format similar to the one
used in Debian. There are about 4800 modules in the Drupal repository
for Drupal 6.x.
This is a small snippet representing a few Drupal core modules and a
meta package (which I created from the metadata) to express the core's
dependencies:
[...]
package: tables
version: 6.x-1.0
depends: filter
package: blogapi
version: 6.15
package: profile
version: 6.15
package: filter
version: 6.15
package: drupal
version: 6.15
depends: system, user, block, node, filter
provides: core = 6.15
suggests: translation, comment, menu, openid, contact, tracker, forum, ping, syslog, help, dblog, search, trigger, poll, update, locale, php, path, taxonomy, color, aggregator, upload, throttle, statistics, blog, book, blogapi, profile
[...]
Since I'm considering only modules for Drupal version 6.x, the
dependencies core >= 6.0, core < 7.0
are left implicit.
==Dependency graphs==
The result is a set of nice graphs showing, for each package, its
(deep) dependencies. From the global dependency graph, I extracted
the "connected" components, that is, all modules that are related to
each other in some way. This yields 375 sub-graphs. These are the
top 10 (WARNING: some of the biggest PDFs systematically manage to
trash my workstation… handle with care) … and circo didn't manage
to create the PDFs for views and taxonomy:
The complete list is
here
From these graphs, it seems that apart from a couple of dozen
packages, the rest of the Drupal components are loosely connected. I
don't think this is a matter of code sharing; more likely it is
because the Drupal repository has a plethora of small components with
very specific functionality that depend only on the Drupal core.
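Extracting connected components from the (undirected view of the) dependency graph can be sketched as follows, assuming the graph is given as a list of dependency edges; the module names here are invented:

```python
def connected_components(edges):
    """Group modules into connected components, treating every
    dependency edge as undirected."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    seen, components = set(), []
    for start in adj:
        if start in seen:
            continue
        stack, comp = [start], set()
        while stack:  # iterative DFS over the undirected graph
            node = stack.pop()
            if node in comp:
                continue
            comp.add(node)
            stack.extend(adj[node] - comp)
        seen |= comp
        components.append(comp)
    return components

# Hypothetical edges forming two independent clusters.
edges = [("tables", "filter"), ("filter", "system"), ("views", "cck")]
print(len(connected_components(edges)))  # → 2
```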
==Dist check==
Distcheck is a small utility that transforms package dependencies into
a propositional logic problem and then uses a SAT solver to simulate
their installation. Since there are no conflicts, it should always be
possible to install a package. The only reason for a package to be
broken is a missing dependency in the repository. Periodically
performing this analysis could prevent the distribution of broken packages.
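Since the dependencies are conjunctive and conflict-free, the SAT machinery degenerates here into checking that every (transitive) dependency actually exists in the repository. A naive sketch of that check, on an invented repository:

```python
def broken_packages(repo):
    """Return the packages with a (transitive) dependency missing
    from the repository.  `repo` maps package -> list of depends."""
    broken = set()
    for pkg in repo:
        stack, seen = [pkg], set()
        while stack:
            cur = stack.pop()
            if cur in seen:
                continue
            seen.add(cur)
            if cur not in repo:  # dependency not in the repository
                broken.add(pkg)
                break
            stack.extend(repo[cur])
    return broken

# Hypothetical repo: `blog` depends on a module that was never uploaded.
repo = {"tables": ["filter"], "filter": [], "blog": ["ghost"]}
print(broken_packages(repo))  # → {'blog'}
```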
==Conclusions==
Run distcheck on the module repository to avoid releasing a module
that is not installable due to a missing dependency, and generate
dependency graphs (like debtree, or directly using debtree).
As the system grows it might be necessary to extend the dependency
system with disjunctive dependencies and conflicts between
modules. At present this might not be necessary. Adding more
expressivity to the dependency system will, of course, significantly
increase the complexity of the installation problem (from polynomial
to NP-complete).
I think it is important to spend a few words on this last point. It
is clear that not all 4800 packages can be installed at the same time.
Just think about the filter modules that manipulate users'
submissions. At the moment, the only way a site developer can
discover a conflict is to try out the module and check whether it
breaks anything else on the site. Given the complexity of many Drupal
sites, this can be a painful and costly task.
Adding conflicts to the metadata would make module integration much
easier for site developers, moving the burden of finding potential
problems to the module developers and the module installer. As I
said before, if we include conflicts (that is, negation in logical
terms), the problem of installing a new module suddenly becomes
NP-complete. Running an NP-complete algorithm on a webserver is of
course a bad idea, but using drush offline to run complex install
operations should be perfectly acceptable, just as it is acceptable
to wait for apt-get to install the latest program on Debian.
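To make the jump in complexity concrete, here is a toy installability check once conflicts enter the picture: each package becomes a boolean variable, dependencies become implications, conflicts become mutual exclusions, and we brute-force the assignments. The packages and conflicts are invented; a real tool would hand the formula to a SAT solver instead of enumerating:

```python
from itertools import product

def installable(target, depends, conflicts):
    """Brute-force check: is there a subset of packages containing
    `target` that satisfies all dependencies and no conflict?"""
    pkgs = sorted(depends)
    for bits in product([False, True], repeat=len(pkgs)):
        chosen = {p for p, b in zip(pkgs, bits) if b}
        if target not in chosen:
            continue
        # every dependency of a chosen package must be chosen too
        if any(d not in chosen for p in chosen for d in depends[p]):
            continue
        # no conflicting pair may be chosen together
        if any(a in chosen and b in chosen for a, b in conflicts):
            continue
        return True
    return False

# Hypothetical: two filter modules that conflict; `site` needs both.
depends = {"site": ["f1", "f2"], "f1": [], "f2": []}
conflicts = [("f1", "f2")]
print(installable("site", depends, conflicts))  # → False
print(installable("f1", depends, conflicts))    # → True
```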
If conflicts are indeed needed, it would be fun to have a
mod_php_minisat and to implement a small dependency solver in PHP!