1  Package Systems

This section is based on [EDO05], which was written in September 2005. Note that some of the package systems described here have evolved since then, so some of the technical details may be outdated.

Packages are a convenient way for users to get new (or updated) software installed on their computer. A package is more than just a collection of files to be installed. It usually contains additional information required for proper installation and/or uninstallation: other packages that this package depends on, directories for files to be installed, menu items for the desktop environments, scripts to be executed before/after installing/uninstalling the package, and more. Packages are usually not installed manually by the user, but by means of a package manager. The package manager’s duty is to automate (as much as possible) the process of installing, upgrading, configuring, and removing software packages from the user’s computer.

Packages may be either binary packages or source packages. Binary packages contain the files needed for installation and proper functioning of the software, but not the source files. The files in a binary package are precompiled and are therefore usually expected to work only on a limited set of machine architectures.

Source packages contain the files needed for compiling the software on the user’s computer. A source package contains a compilation script (typically in the form of a Unix Makefile) which automates the compilation and the subsequent installation. In some contexts it is considered good practice to also provide an uninstallation procedure in the makefile. These source packages, which are originally intended for compilation and (de-)installation of the software on the target machine, are used by F/OSS distributors as the basis for their own source packages. The derived distribution-specific source package again contains a compilation script, which compiles the source files and arranges all relevant files into bundles constituting the binary packages.

Source packages are more flexible, because the user may choose to tweak the source and because the compilation will usually be optimized for the user’s architecture at compile time. However, most users cannot be expected to cope with compiling software on their machines or to fix compilation problems. Also, local compilation can slow down the machine considerably.

In this section we present a detailed survey of the current package management systems and their package formats, in order to review which metadata are currently used and how package management systems use them.

1.1  Debian Packages

DEB [DPa], the package format used by Dpkg, the package manager of the Debian distribution, was created in 1993 by Ian Jackson. It has been upgraded since then, and the format is now in version 2.0.

There are two types of packages: binary packages and source packages. Binary packages contain files that can be installed directly from the package file; source packages contain source code that can be used to create binary packages. It is possible to create multiple binary packages from one source package.

An example of the contents of a DEB file (the ocaml binary package from debian-unstable) can be seen in Figure 1.

1.1.1  DEB file format


  • ocaml_3.08.3-8_i386.deb
    • debian-binary (version)
    • control.tar.gz
      • postinst (post-install script)
      • prerm (pre-removal script)
      • postrm (post-removal script)
      • md5sums (MD5 sums for data.tar.gz)
      • control (package metadata)
    • data.tar.gz
      • /usr
      • /usr/lib
      • /usr/lib/ocaml/3.08.3
Figure 1: Composition of a DEB package

A binary DEB package is an ar archive with three members (see Figure 1). The first is a file named debian-binary that holds the format version, which nowadays is 2.0. The other two are gzip-compressed tar archives containing, respectively, the metadata (in proper DEB terminology, the control data) and the files that are to be installed as part of the package.

The control archive contains all metadata. Besides a control file in which the metadata are stored, it contains MD5 sums for the package data, as well as scripts that are to be run when installing or removing the package.
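To make this layout concrete, here is a minimal Python sketch that lists the members of a DEB file using only the standard library. It assumes the common ar format. That format starts with an 8-byte global header `!<arch>\n`. For each member, it then has a 60-byte ASCII header: a 16-byte name, a 12-byte modification time, 6-byte owner and group IDs, an 8-byte mode, a 10-byte decimal size and a 2-byte terminator. The member data follows, padded to an even length. The function names `ar_members` and `deb_format_version` are our own.

```python
AR_MAGIC = b"!<arch>\n"
AR_HEADER_SIZE = 60


def ar_members(blob: bytes) -> list[tuple[str, bytes]]:
    """Return (name, data) for every member of an ar archive held in memory."""
    if not blob.startswith(AR_MAGIC):
        raise ValueError("not an ar archive")
    members = []
    pos = len(AR_MAGIC)
    while pos < len(blob):
        header = blob[pos:pos + AR_HEADER_SIZE]
        if len(header) < AR_HEADER_SIZE or header[58:60] != b"`\n":
            raise ValueError(f"corrupt member header at offset {pos}")
        # Names are space-padded; some ar implementations also append a '/'.
        name = header[0:16].decode("ascii").rstrip().rstrip("/")
        size = int(header[48:58].decode("ascii"))  # decimal, space-padded
        start = pos + AR_HEADER_SIZE
        members.append((name, blob[start:start + size]))
        # Member data is padded with one newline byte to an even length.
        pos = start + size + (size % 2)
    return members


def deb_format_version(blob: bytes) -> str:
    """Check that the first member is debian-binary and return its content."""
    members = ar_members(blob)
    if not members or members[0][0] != "debian-binary":
        raise ValueError("first member is not debian-binary")
    return members[0][1].decode("ascii").strip()
```

For a package laid out as in Figure 1, `ar_members(open("ocaml_3.08.3-8_i386.deb", "rb").read())` should yield debian-binary, control.tar.gz and data.tar.gz, in that order.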

1.1.2  Debian Binary Package Metadata

The control file, which contains all metadata, is a text file which consists of paragraphs, each of which consists of fields. The paragraphs are separated by blank lines.

A field is usually a single line which contains the field name, followed by a colon and the field contents. It is possible to include fields that span multiple lines; in that case, the second and further lines start with a space.
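The paragraph-and-field structure just described can be read with a few lines of Python. The sketch below uses `parse_control`, a hypothetical helper that is not part of dpkg. It splits the text into paragraphs at blank lines and starts a new field at every line that does not begin with whitespace. Continuation lines are appended to the previous field after their single leading space is dropped. Finer details of the real format, such as the " ." marker used for empty lines inside descriptions, are not handled.

```python
def parse_control(text: str) -> list[dict[str, str]]:
    """Split control-file text into paragraphs of {field name: value}."""
    paragraphs: list[dict[str, str]] = []
    current: dict[str, str] = {}
    last_field = None
    for line in text.splitlines():
        if not line.strip():
            # A blank line ends the current paragraph.
            if current:
                paragraphs.append(current)
            current, last_field = {}, None
        elif line[0] in " \t":
            # Continuation line: it belongs to the most recent field.
            if last_field is None:
                raise ValueError(f"continuation line without a field: {line!r}")
            current[last_field] += "\n" + line[1:]
        else:
            name, sep, value = line.partition(":")
            if not sep:
                raise ValueError(f"line without a colon: {line!r}")
            last_field = name.strip()
            current[last_field] = value.strip()
    if current:
        paragraphs.append(current)
    return paragraphs
```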

An example of a control file, once again from the ocaml package, can be seen in Figure 2.


Package: ocaml
Version: 3.08.3-7
Section: devel
Priority: optional
Architecture: i386
Depends: ocaml-base (= 3.08.3-7), ocaml-base-3.08.3, 
         ocaml-nox-3.08.3, ocaml-base-nox (>= 3.08.3-6)
Suggests: xlibs-dev, tcl8.4-dev, tk8.4-dev
Provides: ocaml-3.08.3
Installed-Size: 7052
Maintainer: Debian OCaml Maintainers <debian-ocaml-maint@lists.debian.org>
Description: ML language implementation with a class-based object system
 Objective Caml is an implementation of the ML language, based on
 the Caml Light dialect extended with a complete class-based object
 system and a powerful module system in the style of Standard ML.
 ...
Figure 2: Example of a DEB control file

A list of all possible fields (except for dependencies) that occur in the control file of a binary package follows:

Package (mandatory)
The package name. This name must consist only of lower-case letters, digits, plus and minus signs and periods. It must be at least two characters long and start with a letter or a digit.
Source
This field identifies the name of the source package from which the package was created.
Version (mandatory)
The package version number. For a more detailed explanation of version numbers, see below.
Section
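This field can be used to classify packages. Its value is a subject area such as devel (see Figure 2). For packages outside the main section, the value is prefixed with the top-level section, contrib or non-free (e.g., contrib/devel). There are three top-level sections: main, contrib and non-free. For more on them, see below.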
Priority
Can be one of required (the system will not function without it), important (the bare minimum expected on any Unix system), standard (installed by default), optional (software one might reasonably want to install without having specialized requirements, for example the X Window System) and extra (everything else). Packages on the optional level and above should not conflict with each other.
Architecture (mandatory)
The architecture can either be a specific architecture (such as i386 or sparc), or all to specify an architecture-independent package.
Essential
If this field is set to yes, the package should not be removed under any circumstances, though it can be upgraded or replaced.
Installed-Size
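An estimate of the disk space required by the installed package, in kilobytes (units of 1024 bytes).
Maintainer (mandatory)
The name and e-mail address of the person (or team, as in Figure 2) responsible for the package.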
Description (mandatory)
The first line of this field contains a single-line description of the package; a more detailed description in a few paragraphs can be found in the following lines.
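The rules above for mandatory fields, package names and priorities translate directly into a small checker. The Python sketch below assumes that the control data has already been read into a dictionary mapping field names to values. `validate_binary_control` is a hypothetical helper, and it checks only the rules stated in this list.

```python
import re

# Lower-case letters, digits, '+', '-' and '.'; at least two characters;
# the first character must be a letter or a digit.
PACKAGE_NAME = re.compile(r"[a-z0-9][a-z0-9+.-]+")
MANDATORY = ("Package", "Version", "Architecture", "Maintainer", "Description")
PRIORITIES = {"required", "important", "standard", "optional", "extra"}


def validate_binary_control(fields: dict[str, str]) -> list[str]:
    """Return a list of problems found in a binary package's control fields."""
    problems = [f"missing mandatory field {name}" for name in MANDATORY if name not in fields]
    name = fields.get("Package")
    if name is not None and PACKAGE_NAME.fullmatch(name) is None:
        problems.append(f"invalid package name {name!r}")
    priority = fields.get("Priority")
    if priority is not None and priority not in PRIORITIES:
        problems.append(f"unknown priority {priority!r}")
    return problems
```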

Then follow the package dependencies. In the DEB format, there are several different types of dependencies (in the list, we assume that package A depends on package B, i.e. that the dependency for package B figures in the control file of package A):

Depends
Package B must be configured before package A can be configured. This expresses a run-time dependency; package A cannot run without package B.
Recommends
Like Depends, but package B is not absolutely necessary. However, package B will usually be needed in order for package A to function properly.
Suggests
Like Depends, but package A can function properly without package B.
Enhances
The opposite of Suggests; “package A enhances package B” is similar to “package B suggests package A”.
Pre-Depends
This is an install dependency: package B must be fully installed (unpacked and configured) before the installation of package A can even start.
Conflicts
This is the opposite of Depends. It is impossible for two conflicting packages to be installed on one system at the same time.
Replaces
This field provides for a way to resolve conflicts. If package A and B conflict, and if package A replaces package B, then package B will be removed and package A will be installed. Furthermore, this field can also be used to indicate that a package overwrites files from another package (something that would normally lead to an error).

A dependency can also limit the versions of the package that satisfy it. For example, in the ocaml package shown above, the version of ocaml-base installed must be exactly equal to 3.08.3-7, while the version of ocaml-base-nox installed must be greater than or equal to 3.08.3-6.

There are five different version operators:

=
Exactly equal;
<= or < (deprecated)
Earlier or equal;
>= or > (deprecated)
Later or equal;
<<
Strictly earlier;
>>
Strictly later.
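A relationship field such as the Depends line of Figure 2 is a comma-separated list of package names. Each name can be followed by an operator and a version in parentheses. The following Python sketch splits such a field into (name, operator, version) triples. `parse_relations` is a hypothetical helper. It deliberately ignores features not discussed here, such as alternatives written with `|`.

```python
import re
from typing import Optional

# A package name, then an optional "(operator version)" restriction.
_RELATION = re.compile(
    r"\s*([a-z0-9][a-z0-9+.-]+)"        # package name
    r"\s*(?:\(\s*(<<|<=|=|>=|>>|<|>)"   # operator, including the deprecated < and >
    r"\s*([^)\s]+)\s*\))?\s*"           # version, up to the closing parenthesis
)


def parse_relations(value: str) -> list[tuple[str, Optional[str], Optional[str]]]:
    """Split e.g. 'a (= 1.0), b' into [('a', '=', '1.0'), ('b', None, None)]."""
    relations = []
    for item in value.split(","):
        match = _RELATION.fullmatch(item)
        if match is None:
            raise ValueError(f"cannot parse relation: {item!r}")
        relations.append(match.groups())
    return relations
```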

Virtual packages

It is also possible to declare dependencies on virtual packages. A virtual package is a package that does not in itself exist, but must be provided by another package. For example, the ocaml package provides the virtual package ocaml-3.08.3; any dependency on ocaml-3.08.3 can be satisfied by the ocaml package.

A more complex example would be a virtual package named web-server. A package that needs a web server, but is not interested in any particular web server, could declare a dependency on web-server. Any package that provides web-server could then satisfy that dependency.
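One simple way for a package manager to handle virtual packages is to build an index. The index maps every name, real or virtual, to the real packages that can satisfy a dependency on it. The Python sketch below does this. The function name and the package names webserver-a and webserver-b are hypothetical. Version restrictions are ignored, since virtual packages carry no version.

```python
from collections import defaultdict


def provider_index(provides: dict[str, list[str]]) -> dict[str, set[str]]:
    """Map each real or virtual package name to the real packages that satisfy it.

    `provides` maps a real package name to the contents of its Provides field.
    """
    index = defaultdict(set)
    for real, virtuals in provides.items():
        index[real].add(real)            # a real package satisfies its own name
        for virtual in virtuals:
            index[virtual].add(real)     # ...and every name it provides
    return dict(index)
```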

Virtual packages do not have versions, but the possibility to add this functionality to later versions of the DEB format is specifically left open.

In the ocaml example, however, we can see that some notion of versions has already been informally added: the ocaml package depends on the virtual package ocaml-3.08.3, which is provided only by ocaml version 3.08.3; earlier versions provide ocaml-3.08.2 and so on, in effect providing some sort of version requirement.

1.1.3  Debian Source Package Metadata

As mentioned earlier, the DEB format also has source packages. Several binary packages can be built from one source package. This is reflected in the control file of a source package, which consists of one general paragraph and one paragraph for each binary package that can be created from the source package.

The control information of a source package differs from that of binary packages. A source package may build-depend on or build-conflict with other packages, thus expressing the requirements for compiling it. A source package is in general common to several architectures. It therefore contains schemata for the control information of the binary packages, which are instantiated at compilation time. For instance, the Architecture field may now take one of three forms. It can be a list of architectures. It can be the word any, which stands for all supported architectures and is replaced in the created binary package by the actual architecture string. Or it can be all, for an architecture-independent binary package. Dependencies in the schema for the control information of a binary package may be qualified by an architecture specifier. The schema may also contain variables (written ${name}, e.g. ${shlibs:Depends}), which are instantiated at compilation time.
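The instantiation step can be pictured as follows. This Python sketch is only an illustration and does not reproduce the algorithm of the Debian build tools. `binary_architecture` decides which Architecture value, if any, a binary package gets when it is built on a given host architecture. `substitute` fills in variables written as `${name}`. Both function names are hypothetical, and so are the variable values used in the example.

```python
import re
from typing import Optional


def binary_architecture(field: str, host_arch: str) -> Optional[str]:
    """Architecture string of the binary package built on host_arch, or None
    if the schema says the package is not built for that architecture."""
    archs = field.split()
    if archs == ["all"]:
        return "all"                  # architecture-independent package
    if "any" in archs or host_arch in archs:
        return host_arch              # 'any' is replaced by the actual architecture
    return None


_VARIABLE = re.compile(r"\$\{([^}]+)\}")


def substitute(template: str, values: dict[str, str]) -> str:
    """Replace every ${name} in template; unknown names become empty strings."""
    return _VARIABLE.sub(lambda m: values.get(m.group(1), ""), template)
```

For example, `substitute("${shlibs:Depends}, ocaml-base", {"shlibs:Depends": "libc6 (>= 2.3.2)"})` returns `"libc6 (>= 2.3.2), ocaml-base"`.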

1.1.4  Debian Version Numbers

A DEB version number consists of three components:

First, the epoch, a single integer number. This is the most important component; whatever the rest of the version number, a package of epoch n+1 will always be of higher version than a package of epoch n. It is intended to be used in case of a change in version numbering scheme, or if a mistake is made. The epoch is optional (if not present, epoch 0 is assumed), and separated from the upstream version by a colon. If there is no epoch, the upstream version may not contain any colon.

Then, the upstream version. This usually is the original version of the software that has been packaged. It may contain letters, digits, periods, plus and minus signs and colons. It should start with a digit.

Next comes the optional Debian revision, separated from the upstream version by a minus sign; if the Debian revision is not present, the upstream version may not contain a minus sign. The Debian revision, which is of the same form as the upstream version, indicates the version of the Debian package based on the upstream version; therefore, it changes if the Debian package is changed, but the upstream version is not. It is conventional to reset the Debian revision to 1 every time the upstream version is changed.

Version comparison is done from left to right; first the epoch, then the upstream version and finally the Debian revision are compared.

Any two strings can be compared this way: epoch to epoch, upstream version to upstream version, or Debian revision to Debian revision. First, the initial parts that contain only non-digit characters are determined and compared lexicographically (dpkg uses a modified ordering in which letters sort before all non-letters). If there is no difference, the initial parts of the remainders that contain only digits are compared numerically. This process (comparing non-digit strings lexicographically and digit strings numerically) is repeated until either a difference is found or both strings are exhausted.
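This comparison can be sketched in Python. As far as we know, the sketch follows dpkg's own version comparison routine. That routine refines the lexicographic step in two ways. Letters sort before all other characters. A tilde (~), supported by more recent dpkg versions, sorts before everything, even the end of a string, so 1.0~rc1 is earlier than 1.0. The function names are our own, and the sketch does not validate the syntax of the version strings.

```python
import string


def _order(c: str) -> int:
    """Sort weight of one character of a non-digit part ('' = end of string)."""
    if c == "" or c in string.digits:
        return 0
    if c == "~":
        return -1                     # tilde sorts before everything
    if c in string.ascii_letters:
        return ord(c)
    return ord(c) + 256               # non-letters sort after all letters


def _compare_part(a: str, b: str) -> int:
    """Compare two upstream versions or two Debian revisions."""
    i = j = 0
    while i < len(a) or j < len(b):
        # Non-digit prefixes, character by character.
        while (i < len(a) and a[i] not in string.digits) or (
            j < len(b) and b[j] not in string.digits
        ):
            ac = _order(a[i] if i < len(a) else "")
            bc = _order(b[j] if j < len(b) else "")
            if ac != bc:
                return -1 if ac < bc else 1
            i += 1
            j += 1
        # Digit runs, compared numerically (an empty run counts as 0).
        m, n = i, j
        while m < len(a) and a[m] in string.digits:
            m += 1
        while n < len(b) and b[n] in string.digits:
            n += 1
        x, y = int(a[i:m] or "0"), int(b[j:n] or "0")
        if x != y:
            return -1 if x < y else 1
        i, j = m, n
    return 0


def _split(version: str) -> tuple[int, str, str]:
    """Split [epoch:]upstream[-revision] into its three components."""
    epoch, colon, rest = version.partition(":")
    if not colon:
        epoch, rest = "0", version
    upstream, dash, revision = rest.rpartition("-")
    if not dash:
        upstream, revision = rest, ""
    return int(epoch), upstream, revision


def compare_versions(v1: str, v2: str) -> int:
    """Return -1, 0 or 1 if v1 is earlier than, equal to or later than v2."""
    e1, u1, r1 = _split(v1)
    e2, u2, r2 = _split(v2)
    if e1 != e2:
        return -1 if e1 < e2 else 1
    return _compare_part(u1, u2) or _compare_part(r1, r2)
```

For instance, `compare_versions("1:0.9", "2.0")` returns 1, because the epoch is compared first.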

1.1.5  Debian Sections

The main section

The packages in the main section all comply with the Debian Free Software Guidelines[DPb]. Furthermore, packages in the main section do not have any ‘positive’ dependencies (Depends, Recommends or Build-Depends dependencies) on packages outside the main distribution.

They also conform to a certain standard of quality (“they must not be so buggy that we refuse to support them”).

The contrib section

The contrib section contains packages that do conform to the Debian Free Software Guidelines, but that do not satisfy the requirement of having no dependencies on packages outside the main section.

The standard of quality is the same as for the main section.

The non-free section

The non-free section contains packages that do not conform to the Debian Free Software Guidelines.

Non-US sections

Each of the three sections mentioned above has a non-US subsection. Packages that are in the main section cannot depend on packages that are in the non-US subsection of main, but it is possible for packages in the non-US subsection of main to depend on packages that are in the main section. The same goes for the contrib section.

1.2  RPM Packages

In this section we investigate the format of RPM packages [Bai]. We start from their structure and focus on the attributes that describe the metadata associated with the package, in particular those that represent the relationships with other packages. RPM packages can be of two different types:

  1. Binary packages: contain compiled software, ready to be installed and run.
  2. Source packages: contain the source code needed to build the software and package it into a binary package.

In this document we will address only binary packages.

1.2.1  RPM Package Naming Convention

RPM packages follow a well-defined naming convention, in order to keep the name of the package file consistent with the information stored in the file itself. This naming convention is also followed by all the tools that support RPM package creation. All the information regarding an RPM package is self-contained in the package itself. Therefore an RPM package remains usable even if the file is renamed to a name that does not follow the naming convention.

Moreover, note that the same naming convention is also used in some metadata fields of the RPM package format (see Section 1.2.2).

The standard RPM package naming convention is the following:

name-version-release.architecture.rpm

where:

  • name is the name of the packaged software (e.g., bash, xorg, gnome, etc.). Package names are often used to describe subpackages, that is, packages that derive from the same software distribution. This is the case, for example, when a single piece of software is split into different packages, each of them providing different and additional functionality. For example, the xorg-x11-libs software package provides all the libraries needed to run X11 applications, while xorg-x11-libs-devel also provides the files needed to compile and link applications against the X11 libraries.
  • version is the packaged software version. It may contain any character except the dash (-).
  • release is usually a number which indicates how many times the software has been packaged. However, it is often also used to convey other kinds of information, such as the initials of the packaging entity (e.g., mdk). The release field has the same restrictions as the version field.
  • architecture is a string naming the hardware architecture the package is meant to run on. The string noarch is used when a given package is compatible with all architectures (e.g., packages which contain software written in a scripting language). The string src, on the other hand, is used instead of an actual architecture name to indicate that the package is a source package, i.e. one that contains the source code to build the software. Table 1 shows the list of supported architectures.

Here are some examples of package file names found in various Linux distributions: mc-4.6.1-0.pre3.2mdk.i586.rpm, gedit-0.9.7-2.i386.rpm, gaim-1.3.0-1.fc4.i386.rpm, kphone-4.1.1-1.fc4.x86_64.rpm. Notice that the release field often does more than show the actual release number. It also indicates the distribution the package belongs to (e.g., Mandriva/Mandrake (mdk), Fedora Core 4 (fc4), etc.).
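Version and release fields cannot contain a dash, so a file name that follows this convention can be split from the right. The Python sketch below relies only on that convention, and `parse_rpm_filename` is a hypothetical helper. As noted above, the authoritative information is the one stored inside the package.

```python
def parse_rpm_filename(filename: str) -> dict[str, str]:
    """Split name-version-release.architecture.rpm into its four fields."""
    if not filename.endswith(".rpm"):
        raise ValueError(f"not an .rpm file name: {filename!r}")
    stem = filename[: -len(".rpm")]
    rest, dot, arch = stem.rpartition(".")    # architecture: after the last dot
    parts = rest.rsplit("-", 2)               # version and release: after the last two dashes
    if not dot or len(parts) != 3:
        raise ValueError(f"name does not follow the convention: {filename!r}")
    name, version, release = parts
    return {"name": name, "version": version, "release": release, "architecture": arch}
```

Applied to mc-4.6.1-0.pre3.2mdk.i586.rpm, the sketch yields the name mc, version 4.6.1, release 0.pre3.2mdk and architecture i586.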

1.2.2  RPM File Format

RPM packages are stored in a binary format, which is the same for both binary and source packages. The current version of the RPM format is 3.0, and it has been used by all RPM package managers since version 2.1.

An RPM package is divided into four logical sections:

  1. Lead: Contains the package format signature and some information concerning the structure of the package itself.
  2. Signature: A collection of digital signatures that are used to sign the package by using cryptographic techniques.
  3. Header: Contains all the package metadata, such as package description, package relationships etc.
  4. Payload: The actual archive which contains all the files that are bundled with the package.

RPM is a multi-platform binary file format. It has therefore been designed so that the RPM package manager handles it correctly regardless of the platform on which it runs. In particular, the reference byte ordering is the one defined for the Internet (network byte order, i.e. big-endian).

The Lead section

The Lead section of an RPM package is basically used as a “signature” in order to identify the file as an RPM package. For example, the Unix file command uses this information in order to recognize the format. RPM package managers and other RPM oriented tools use this information as well.

Much of the information that is present in the Lead section is obsolete and is actually ignored by current RPM package managers. Moreover, that information is duplicated in the Header section. It is maintained only for backward compatibility of the file format with older tools.

The structure of the Lead section is represented by the data structure rpmlead (defined in lib/rpmlib.h) in the RPM source tree [RT] and is made of the following fields:

  • magic (4 bytes)
    The byte sequence 0xED 0xAB 0xEE 0xDB, which uniquely identifies the file as an RPM package.
  • major, minor (1 byte each)
    Two bytes representing the major and minor version of the RPM package format. At the time of writing, the current version is 3.0.
  • type (2 bytes)
    Two bytes representing the type of the RPM package. Currently only two RPM package types are defined: binary (type == 0) and source (type == 1).
  • archnum (2 bytes)
    Two bytes representing the hardware architecture the RPM package has been built for. The actual architecture is however indicated in the Header section. Table 1 shows the supported architectures and their mappings to the ID used in the archnum field.
  • name (66 bytes)
    A null-terminated, zero-padded, string representing the name and the version of the package using the standard RPM package name specification conventions (see Section 1.2.1).
  • osnum (2 bytes)
    Two bytes representing the Operating System the package was built for. Table 1 shows the supported operating systems and their mappings to the ID used in the osnum field.
  • signature_type (2 bytes)
    Two bytes representing the format of the signature section. Currently, version 3.0 of the RPM package format mandates the header-style format for this section and, therefore, the value of this field will always be set to 5.
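The fields above can be decoded with Python's struct module. This is a minimal sketch, and its layout is an assumption on our part. Following the rpmlead structure in the RPM sources as we recall it, the fields are followed by 16 reserved bytes (not listed above), which bring the Lead to 96 bytes. Every multi-byte field is read in network byte order. The function name `parse_lead` is hypothetical.

```python
import struct

LEAD_MAGIC = b"\xed\xab\xee\xdb"
# magic, major, minor, type, archnum, name, osnum, signature_type, reserved.
# '>' selects big-endian (network) byte order with no padding: 96 bytes in total.
LEAD_FORMAT = ">4sBBHH66sHH16s"
PACKAGE_TYPES = {0: "binary", 1: "source"}


def parse_lead(buf: bytes) -> dict:
    """Decode the Lead section found at the start of an RPM file."""
    (magic, major, minor, pkg_type, archnum, name,
     osnum, signature_type, _reserved) = struct.unpack_from(LEAD_FORMAT, buf)
    if magic != LEAD_MAGIC:
        raise ValueError("not an RPM package (bad lead magic)")
    return {
        "format_version": (major, minor),
        "type": PACKAGE_TYPES.get(pkg_type, f"unknown ({pkg_type})"),
        "archnum": archnum,                       # see Table 1
        "name": name.split(b"\0", 1)[0].decode("ascii", "replace"),
        "osnum": osnum,                           # see Table 1
        "signature_type": signature_type,         # 5 = header-style signature
    }
```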

Architecture  ID   Architecture  ID   Architecture   ID
i386          1    alphaev6      2    ppc64          16
i486          1    sparc         3    ppc64iseries   16
i586          1    sun4          3    ppc64pseries   16
i686          1    sun4m         3    m68k           6
Athlon        1    sun4c         3    rs6000         8
Pentium3      1    sun4d         3    ia64           9
Pentium4      1    sparcv8       3    armv3l         12
AMD           1    sparcv9       3    armv4b         12
x86_64        1    sparc64       10   armv4l         12
AMD64         1    sun4u         10   s390           14
ia32e         1    mips          4    i370           14
alpha         2    mipsel        11   s390x          15
alphaev5      2    IP            7    sh             17
alphaev56     2    ppc           5    xtensa         18
alphapca56    2    ppciseries    5
alphaev6      2    ppcpseries    5

OS        ID   OS            ID
Linux     1    IRIX64        10
IRIX      2    NEXTSTEP      11
SunOS5    3    BSD_OS        12
SunOS4    4    machten       13
AmigaOS   5    CYGWIN32_NT   14
AIX       5    CYGWIN32_95   15
HP-UX     6    UNIX_SV       16
OSF1      7    MiNT          17
osf4.0    7    OS/390        18
osf3.2    7    VM/ESA        19
FreeBSD   8    Linux/390     20
SCO_SV    9    Linux/ESA     20
Table 1: RPM Supported architectures and operating systems

The Header structure

The header structure defines the format of both the header section and the signature section of an RPM package file. The naming is a bit confusing and is maintained for historical reasons. The header structure is quite complicated: it models a small database in which arbitrary data can be stored and retrieved by means of keys, called tags. The header structure is composed of several header entries that logically provide the actual data. An entry is characterized by the following attributes:

  • tag describes the kind of data that is associated with the current entry. Currently there are more than 200 tags defined in the RPM package format (all of them are defined in lib/rpmlib.h in the RPM source tree [RT]), and they are used to specify all the metadata that describe an RPM package. Table 2 shows some of the tags available in version 3.0 of the RPM package format.

    Tag
    RPMTAG_NAME          RPMTAG_VERSION
    RPMTAG_RELEASE       RPMTAG_EPOCH
    RPMTAG_SUMMARY       RPMTAG_DESCRIPTION
    RPMTAG_BUILDTIME     RPMTAG_BUILDHOST
    RPMTAG_INSTALLTIME   RPMTAG_SIZE
    RPMTAG_PROVIDENAME   RPMTAG_REQUIREFLAGS
    RPMTAG_REQUIRENAME   RPMTAG_REQUIREVERSION
    RPMTAG_CONFLICTNAME  RPMTAG_CONFLICTVERSION
    Table 2: Header structure tags

  • type defines the type of the data associated with the tag for the current entry. Table 3 shows some of the data types available in version 3.0 of the RPM package format.

    Type              ID
    NULL              0
    CHAR              1
    INT8              2
    INT16             3
    INT32             4
    INT64             5
    STRING            6
    BIN               7
    STRING_ARRAY      8
    I18NSTRING_TYPE   9
    Table 3: Header structure data types

  • count defines the number of items of the specified type stored in the actual data associated with the current entry. Some of the data types, for example the STRING one, allow only a count equal to 1.

The physical layout of the header structure, as opposed to the logical entries described above, consists of the following fields:

  • magic (3 bytes)
    The byte sequence 0x8E 0xAD 0xE8, which identifies the beginning of the header structure.
  • version + reserved (1 + 4 bytes)
    A byte defining the version of the header structure, followed by 4 bytes reserved for future use.
  • entries count (4 bytes)
    A 32-bit integer giving the number of entries stored in the header structure.
  • data size (4 bytes)
    A 32-bit integer giving the total size in bytes of the data associated with all the entries stored in the header structure.
  • index (n * 16 bytes)
    A set of n 16-byte records, where n is the number of entries specified by the entries count attribute. Each record contains the following attributes: tag, type, offset, count. These attributes define the logical entry described above. The offset attribute gives the position of the entry's data, in bytes from the beginning of the data part of the header structure.
  • data (k bytes)
    A sequence of bytes which contains all the data associated with all the entries stored in the header structure. The size of the data part depends on the actual data stored for each entry.
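A header structure can be decoded along the same lines; the same structure is used for both the signature and the header sections. In the Python sketch below, `parse_header` is a hypothetical name and the type IDs come from Table 3. Only INT32 and string-valued entries are decoded. All other entries are returned as raw (type, offset, count) triples. Real packages use the tag numbers defined in lib/rpmlib.h, but the sketch does not need to know them.

```python
import struct

HEADER_MAGIC = b"\x8e\xad\xe8"
INT32, STRING, STRING_ARRAY, I18NSTRING = 4, 6, 8, 9   # type IDs from Table 3


def parse_header(buf: bytes, pos: int = 0) -> tuple[dict[int, object], int]:
    """Decode a header structure starting at buf[pos].

    Returns ({tag: value}, offset just past the structure's data part).
    """
    # magic (3), version (1), reserved (4), entries count (4), data size (4)
    magic, _version, _reserved, nindex, hsize = struct.unpack_from(">3sB4sII", buf, pos)
    if magic != HEADER_MAGIC:
        raise ValueError("bad header structure magic")
    index_start = pos + 16
    data_start = index_start + 16 * nindex
    data = buf[data_start:data_start + hsize]
    entries: dict[int, object] = {}
    for k in range(nindex):
        tag, typ, offset, count = struct.unpack_from(">iiii", buf, index_start + 16 * k)
        if typ == INT32:
            entries[tag] = list(struct.unpack_from(f">{count}i", data, offset))
        elif typ in (STRING, STRING_ARRAY, I18NSTRING):
            # 'count' NUL-terminated strings stored one after the other
            strings, p = [], offset
            for _ in range(count):
                end = data.index(b"\0", p)
                strings.append(data[p:end].decode("utf-8", "replace"))
                p = end + 1
            entries[tag] = strings[0] if typ == STRING else strings
        else:
            entries[tag] = (typ, offset, count)   # other types are left undecoded
    return entries, data_start + hsize
```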

The Signature section

The signature section contains one or more digital signatures for assessing the origin of the package. It is stored using the header structure format described above. Every RPM package must have at least two signatures. One specifies the size of the package (tag RPMTAG_SIGSIZE) and the other gives its MD5 hash (tag RPMTAG_SIGMD5). Cryptographic signatures, identified by the corresponding tags (e.g., RPMTAG_SIGGPG, RPMTAG_SIGPGP, etc.), may also be present, but are not required.

The Header section

The header contains all the metadata regarding the RPM package itself. It is stored using the header structure format described above and provides all the information needed to handle a given RPM package. A detailed description of the relevant metadata attributes is presented in Section 1.3.

The Payload section

The payload section contains the actual archive with all the files belonging to the RPM package. The payload is a gzip-compressed cpio archive. It is decompressed, according to the directives specified in the package metadata, when the package is actually installed. The cpio archive uses the SVR4 format with a CRC checksum.
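Assuming the payload is the gzip-compressed SVR4 ("newc") cpio archive described above, its file list can be read with Python's gzip module and a small parser for the cpio headers. To our knowledge, each header consists of the magic 070701 or 070702 (the latter for the CRC variant) followed by thirteen 8-digit hexadecimal fields. The NUL-terminated file name and the file data are each padded to a multiple of 4 bytes. The archive ends with an entry named TRAILER!!!. The function names are hypothetical, and payloads compressed with other tools are not handled.

```python
import gzip

NEWC_HEADER_SIZE = 110                 # 6-byte magic + 13 fields of 8 hex digits
CPIO_MAGICS = (b"070701", b"070702")   # 070702: variant with CRC checksums


def _align4(n: int) -> int:
    return (n + 3) & ~3


def list_cpio(archive: bytes) -> list[tuple[str, bytes]]:
    """Return (path, content) for every entry of an SVR4 (newc) cpio archive."""
    entries = []
    pos = 0
    while True:
        header = archive[pos:pos + NEWC_HEADER_SIZE]
        if header[:6] not in CPIO_MAGICS:
            raise ValueError(f"bad cpio magic at offset {pos}")
        # ino, mode, uid, gid, nlink, mtime, filesize, devmajor, devminor,
        # rdevmajor, rdevminor, namesize, check
        fields = [int(header[6 + 8 * k:14 + 8 * k], 16) for k in range(13)]
        filesize, namesize = fields[6], fields[11]
        name_start = pos + NEWC_HEADER_SIZE
        name = archive[name_start:name_start + namesize - 1].decode("utf-8", "replace")
        data_start = _align4(name_start + namesize)
        if name == "TRAILER!!!":
            return entries
        entries.append((name, archive[data_start:data_start + filesize]))
        pos = _align4(data_start + filesize)


def list_payload(payload: bytes) -> list[tuple[str, bytes]]:
    """Decompress a gzip payload and list the files it contains."""
    return list_cpio(gzip.decompress(payload))
```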

1.3  RPM Package Metadata

In this section we examine the most relevant package metadata present in the header section (Section 1.2.2) of an RPM package. In particular, we focus on the metadata describing package relationships with other RPM packages (i.e., dependency information).

All the metadata are encoded using the header structure format described in Section 1.2.2, by means of tags and their associated data. For the sake of clarity, we refer to package metadata by descriptive names instead of the actual tag IDs associated with the data.

Moreover, the descriptive names are the ones used in spec files. These are files used by automated packaging tools to create RPM packages. A spec file contains all the directives and the metadata in a readable textual format. Starting from the spec file, packaging tools are able to build a standard RPM package in the format described in Section 1.2.2.

In the following sections we use the syntax and the tags of the standard spec file format, which describes how to build RPM packages. We do not describe all the directives of the spec file format, because this is out of the scope of this report; a fairly complete description of these directives can be found in [RT].

1.3.1  Descriptive and naming metadata

Descriptive metadata allows the packager to specify informational attributes regarding the package itself. In the following we will detail the most relevant metadata attributes.

Package version format

As already hinted in Section 1.2.1, every package is characterized by a version that is used extensively in package metadata in order to specify relationships between packages.

Generally a complete version specification has the following format:

[epoch:]version[-release]

where:

  • epoch is a monotonically increasing integer which can be omitted and, in this case, it is assumed equal to 0.
  • version is an alphanumeric string that cannot contain the dash (-) character. The version number is usually set by the developer or the upstream maintainer.
  • release has a format similar to the package version and is usually a number that is increased each time a change is made to the package build files.

Package versions impose an ordering on packages, which is used when relationships with other packages must be specified. The comparison algorithm is a segmented comparison. The version is broken up into segments, each containing either only letters or only digits; any other characters act as separators. The segments are compared in order, with the rightmost segment being the least significant.

Alphabetical segments are compared using lexicographical ASCII ordering. Digit segments are compared numerically. After any leading zeros have been removed, the longer segment is the bigger one, and segments of equal length are compared lexicographically.

No additional knowledge is embedded in the algorithm, so a version number 5.6 will be older than 5.0000503 because the comparison will be made between 6 and 503 (i.e. 0000503 without leading zeroes), and 503 > 6.
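The segmented comparison can be sketched in Python as follows. This is our own approximation of the rule described above, not RPM's code, and separator characters are simply dropped. Two further rules follow RPM's behavior as we understand it. A numeric segment is considered newer than an alphabetical one. When all compared segments are equal, the version with more segments is newer. Corner cases, such as trailing separators and the tilde handling of later RPM releases, are not modeled.

```python
import re

_SEGMENT = re.compile(r"[0-9]+|[A-Za-z]+")   # anything else is a separator


def rpm_vercmp(a: str, b: str) -> int:
    """Return -1, 0 or 1 if version a is older than, equal to or newer than b."""
    segments_a, segments_b = _SEGMENT.findall(a), _SEGMENT.findall(b)
    for x, y in zip(segments_a, segments_b):
        x_is_num, y_is_num = x.isdigit(), y.isdigit()
        if x_is_num != y_is_num:
            return 1 if x_is_num else -1             # numeric beats alphabetical
        if x_is_num:
            x, y = x.lstrip("0"), y.lstrip("0")
            if len(x) != len(y):
                return 1 if len(x) > len(y) else -1  # more digits: bigger number
        if x != y:
            return 1 if x > y else -1                # ASCII order
    # All compared segments are equal: more remaining segments means newer.
    return (len(segments_a) > len(segments_b)) - (len(segments_a) < len(segments_b))
```

With this sketch, `rpm_vercmp("5.6", "5.0000503")` returns -1, in agreement with the example above.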

Name tag

The Name tag specifies the name of the software being packaged. It follows the naming conventions described in Section 1.2.1.

Version tag

The Version tag specifies the version of the software being packaged. Usually it matches the version number of the software itself, and it specifies the version part of the complete version specification described in Section 1.3.1.

Release tag

The Release tag specifies the release part of the complete version specification described in Section 1.3.1.

Epoch tag

The Epoch tag specifies the epoch part of the complete version specification described in Section 1.3.1.

Description and Summary tags

The Description tag is used to provide an in-depth description of the packaged software, while the Summary tag only gives a brief description of it.

Group tag

The Group tag provides a way to organize packages into groups. A group specification consists of a series of strings separated by the / character. This allows the specification of subgroups as if they were subdirectories, e.g. Applications/Editors.

1.3.2  Dependency Metadata

Dependency metadata are used to establish relationships between packages. Those relationships ensure that, once the packaged software is installed, the system provides everything it needs to work properly (i.e., other packaged software, libraries, etc.).

By (correctly) specifying dependency metadata, it is possible to guarantee that package management operations (see Section 1.4) will not break the consistency of the system when they are performed.

In this section we describe the metadata tags that are used in RPM packages to make relationships between packages explicit, and we detail their semantics.

Dependency specification

A dependency relation is always specified by the name of a package and, optionally, a constraint on its version expressed with an arithmetic comparison operator. This is possible because, as described in Section 1.3.1, version numbers are totally ordered.

The RPM package format allows the following comparison operators in dependency relation constraints: <, <=, =, >=, >. The semantics of those operators is the standard one, applied to version numbers.

Dependency relation specifications establish relations between the current package in its current version and the set of other packages entailed by the dependency specification:

  • If only a package name P is specified in the dependency relation, then there is a dependency between the current package in its current version and package P in all or any (depending on the semantics of the relation) of its versions.
  • If a package name P and a constraint C on its version number are specified (e.g., >= 2.3), then there is a dependency between the current package in its current version and package P in all or any (depending on the semantics of the relation) of its versions that satisfy the constraint C.

Requires tag

The Requires tag specifies which packages are needed to make the packaged software work. At least one of the packages identified by the dependency specification must be present when installing the current package. As explained in Section 1.4, the required package need not already be installed in the system: it is enough that, during the installation process (in which multiple packages may be processed), some package satisfies the dependency relation. Consequently, if only a single package is requested for installation, an already installed package must satisfy the dependency relation.

Requires: ncurses, libmpeg >= 2.3

The previous example shows a Requires dependency where the current package needs any version of the package ncurses and version 2.3 or later of the package libmpeg.

PreReq tag

The PreReq tag has exactly the same semantics as the Requires tag (see Section 1.3.2), but it mandates that at least one of the packages identified by the dependency specification must already be installed in the system before the installation of the current package is attempted.
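
Requires and PreReq thus differ only in which set of packages may satisfy a dependency. The Python sketch below models packages as hypothetical (name, version) pairs and, to stay short, supports only bare names and minimum-version (>=) constraints with numeric versions; it is an illustration of the two rules, not rpm's implementation.

```python
from typing import Iterable, List, Optional, Set, Tuple

Package = Tuple[str, str]          # (name, version), e.g. ("libmpeg", "2.4")
Dependency = Tuple[str, Optional[str]]  # (name, minimum version or None)


def _version_key(version: str) -> Tuple[int, ...]:
    # Simplification: numeric, dot-separated versions only.
    return tuple(int(part) for part in version.split("."))


def _matches(pkg: Package, name: str, min_version: Optional[str]) -> bool:
    pkg_name, pkg_version = pkg
    if pkg_name != name:
        return False
    return min_version is None or _version_key(pkg_version) >= _version_key(min_version)


def unmet(deps: Iterable[Dependency], installed: Set[Package],
          transaction: Set[Package], prereq: bool = False) -> List[Dependency]:
    """Return the dependencies that no eligible package satisfies.

    Requires (prereq=False): installed packages and packages installed
    together in the same transaction are eligible.
    PreReq (prereq=True): only packages already installed are eligible.
    """
    candidates = installed if prereq else installed | transaction
    return [
        (name, min_version)
        for name, min_version in deps
        if not any(_matches(pkg, name, min_version) for pkg in candidates)
    ]
```

With ncurses 5.4 installed and libmpeg 2.4 in the same transaction, the dependencies of the previous example are met as Requires, but libmpeg would be reported as missing if it were declared with PreReq.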

Conflicts tag

The Conflicts tag is the complement of the Requires tag: it specifies which packages must not be installed for the packaged software to work. None of the packages identified by the specification may be present when installing the current package (neither already installed in the system, nor in the list of packages being processed).

Conflicts: sendmail

The previous example shows a Conflicts dependency where the current package cannot coexist with any version of the sendmail package.
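
The asymmetry with Requires can be made concrete: one matching package is enough to satisfy a Requires entry, but any single matching package violates a Conflicts entry. A minimal Python sketch with unversioned names (the package names in the usage note are illustrative):

```python
from typing import Iterable, List, Set


def conflicting(conflicts: Iterable[str], installed: Set[str],
                transaction: Set[str]) -> List[str]:
    """Return the names from `conflicts` that are installed or being installed.

    Both sets count: a conflicting package is forbidden whether it is
    already on the system or merely part of the current transaction.
    """
    present = installed | transaction
    return [name for name in conflicts if name in present]
```

For a package declaring Conflicts: sendmail, conflicting(["sendmail"], {"bash"}, {"sendmail"}) returns ["sendmail"], so the installation would be refused.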

Provides tag (Virtual packages and capabilities)

The Provides tag is used to declare a capability which is provided by the current package. The provided capability is often referred to as a virtual package: an alias that can be used in dependency relations to refer to the real packages which provide it.

The Provides tag also offers a way to group packages together. By specifying a dependency relation on a virtual package or capability identifier, we implicitly identify all the actual packages which provide it through a Provides declaration.

The Provides tag is also often used to provide file dependencies as if they were virtual packages (e.g., Provides: /bin/sh). This is particularly useful when RPM packages are used on systems that are only partly managed by an RPM package management system. The advantage is that a package providing the virtual package /bin/sh can be safely removed without actually removing the file /bin/sh.

Provides: lda

The previous example shows a package which provides the virtual package lda (Local Delivery Agent). The identifier lda can be used, for example, by the sendmail package in its Requires dependency, to model the fact that sendmail needs a local delivery agent in order to work properly.

It is important to notice that when virtual packages are used to specify dependency relations, version constraints cannot be applied to them. The reason is that a virtual package may be provided by different packages whose version numbers are not comparable.

Finally, some packages both Provide and Require the same virtual package. For example, the package bash provides the virtual package bash and also requires it in order to be installed. This may look like a contradiction, but, as explained for the Requires tag in Section 1.3.2, the package can nevertheless be installed, because bash itself satisfies its own requirement.
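
Resolving a dependency on a virtual package amounts to looking the capability up in an index from capabilities to the real packages that provide them. In this Python sketch each package is also recorded as providing its own name, which is how the self-requirement of bash is satisfied; procmail and maildrop are used only as illustrative providers of lda.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

# Each entry: (package name, capabilities it declares with Provides).
Packages = Iterable[Tuple[str, Iterable[str]]]


def build_provider_index(packages: Packages) -> Dict[str, Set[str]]:
    """Map every capability to the set of real packages that provide it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for name, provides in packages:
        index[name].add(name)  # a package always provides its own name
        for capability in provides:
            index[capability].add(name)
    return index


def providers(index: Dict[str, Set[str]], capability: str) -> List[str]:
    """Real packages able to satisfy a dependency on `capability`."""
    return sorted(index.get(capability, set()))
```

A Requires: lda entry is then satisfied by any package returned by providers(index, "lda"), which is also why no version constraint can sensibly be attached to it.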

Obsoletes tag

The Obsoletes tag says which packages are obsoleted by this one. Older versions of the package are automatically obsoleted.

Automatic dependencies

When an RPM package is built, a set of dependency relations is declared implicitly. Starting from the list of files that make up the package, the following operations are performed for each file:

  • If the file is executable, it is examined with the ldd command (which prints the shared libraries required by each program or shared library given on the command line) to find out which shared libraries it needs. The names of these shared libraries are added to the RPM package as Requires dependencies.
  • If the file is a shared library, then its soname (the name used by a shared library to determine compatibility between different versions of the same shared library) is added to the capabilities the RPM package provides by using the Provides tag.

Even though automatically provided and required library names may look like file names, they are actually capability identifiers, unrelated to the file names actually contained in the package.

For example, a package containing the command ls will automatically require the following libraries:

linux-gate.so.1
librt.so.1
libc.so.6
libpthread.so.0
/lib/ld-linux.so.2
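
The ldd-based step can be sketched as turning ldd's output into a list of Requires entries: each line starts with the name the binary asks for, optionally followed by "=> resolved path (load address)". The transcript below is illustrative (the addresses are invented), and the function is a hypothetical helper, not rpm's own dependency generator.

```python
from typing import List


def requires_from_ldd(ldd_output: str) -> List[str]:
    """Extract the required shared-library names from `ldd` output.

    The first whitespace-separated field of each non-empty line is the
    name requested by the binary; duplicates are dropped, order is kept.
    """
    names: List[str] = []
    for line in ldd_output.splitlines():
        fields = line.split()
        if fields and fields[0] not in names:
            names.append(fields[0])
    return names


# Illustrative ldd transcript for /bin/ls (addresses are invented).
SAMPLE = """\
\tlinux-gate.so.1 =>  (0xffffe000)
\tlibrt.so.1 => /lib/tls/librt.so.1 (0xb7f9d000)
\tlibc.so.6 => /lib/tls/libc.so.6 (0xb7e66000)
\tlibpthread.so.0 => /lib/tls/libpthread.so.0 (0xb7e54000)
\t/lib/ld-linux.so.2 (0xb7fb3000)
"""

print(requires_from_ldd(SAMPLE))
# ['linux-gate.so.1', 'librt.so.1', 'libc.so.6', 'libpthread.so.0', '/lib/ld-linux.so.2']
```

The result reproduces the list above, which illustrates why these entries look like file names even though they are used as capability identifiers.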

1.3.3  Script metadata

An RPM package may contain metadata that define an operational behavior, executed at some stage of an operation on the package. These metadata are simply scripts, written for the shell or in some other interpreted language, and are executed by the RPM package management system.

Many script metadata are used to handle source RPM packages, in order to automate the building process. The following ones, however, are used when actions are performed on binary packages, in particular, during installation and removal of RPM packages.

%pre tag

The %pre script is executed just before the package is to be installed.

%post tag

The %post script is executed after the package has been installed. It usually contains setup commands, such as running ldconfig to update the system library cache, or editing system-wide configuration files (e.g., when a new shell is installed, /etc/shells is updated accordingly).

%preun tag

The %preun script is executed just before the removal of a package.

%postun tag

The %postun script is executed just after the removal of a package. It usually contains cleanup code and actions complementary to those specified in the %post script.

1.4  RPM package management system

The RPM package management system is built around a single command-line utility, rpm, which provides the user with the functionality to:

  • Build RPM packages from source code.
  • Install RPM packages.
  • Remove RPM packages.
  • Upgrade RPM packages.
  • Query uninstalled RPM packages for information.
  • Query the installed base of RPM packages.

rpm makes use of a central database where it stores information about all the packages already present in the system. Each operation provided by rpm queries this database to perform consistency checks with respect to package dependencies.

1.4.1  RPM package installation

When rpm executes a package installation it performs the following steps:

  • Dependency check: using the information provided by the package metadata (see Section 1.3.2), rpm checks that all the required capabilities and packages are present in the current system (or among the other packages being installed with the current one).
  • Conflict check: if any installed package, or any package being installed, matches a conflict constraint described in the package metadata, the installation is aborted.
  • %pre script execution.
  • Configuration files processing: if the package installs configuration files (as specified in the package metadata), these might overwrite files already present. This situation is handled carefully by making backup copies before actually overwriting those files.
  • Payload archive unpacking.
  • %post script execution.
  • Central database update.
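
The sequence of steps can be sketched as a single Python function that aborts on a failed check and otherwise records each step in order. Package, install and InstallError are hypothetical names; dependencies are unversioned, and the file system and database are simulated with plain sets.

```python
from dataclasses import dataclass, field
from typing import List, Set


class InstallError(Exception):
    """Raised when a pre-installation check fails."""


@dataclass
class Package:
    name: str
    requires: List[str] = field(default_factory=list)
    conflicts: List[str] = field(default_factory=list)
    files: List[str] = field(default_factory=list)
    config_files: List[str] = field(default_factory=list)


def install(pkg: Package, installed: Set[str], transaction: Set[str],
            files_on_disk: Set[str], log: List[str]) -> None:
    """Walk through the installation steps listed above, in order."""
    visible = installed | transaction
    missing = [dep for dep in pkg.requires if dep not in visible]
    if missing:  # dependency check
        raise InstallError(f"unmet dependencies: {missing}")
    clashes = [dep for dep in pkg.conflicts if dep in visible]
    if clashes:  # conflict check
        raise InstallError(f"conflicts with: {clashes}")
    log.append("run %pre")
    for path in pkg.config_files:  # back up configs about to be overwritten
        if path in files_on_disk:
            log.append(f"back up {path}")
    log.append("unpack payload")
    files_on_disk.update(pkg.files)
    log.append("run %post")
    installed.add(pkg.name)  # stands in for the central database update
    log.append("update database")
```

Note that nothing is written before both checks pass, which mirrors the order of the list: the checks only read the database, while every later step changes the system.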

1.4.2  RPM package removal

When rpm executes a package removal it performs the following steps:

  • Dependency check: rpm checks the central database in order to make sure that there is no other package that has a dependency relationship with the package being removed.
  • %preun script execution.
  • Config file backup: if some of the config files have been modified, rpm makes a backup copy before removing them.
  • File check and removal: for each file belonging to the package being removed, rpm checks whether that file also belongs to some other package in the system. Files that do not are removed.
  • %postun script execution.
  • Central database update.
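
The file check can be sketched as a query over an ownership table that maps each path to the packages owning it, a simplified stand-in for rpm's central database. The function name and the paths in the usage note are illustrative.

```python
from typing import Dict, Set


def files_to_remove(package: str, owners: Dict[str, Set[str]]) -> Set[str]:
    """Return the files of `package` that no other package owns.

    A file is deleted only when the package being removed is its sole
    owner; shared files stay on disk.
    """
    return {path for path, pkgs in owners.items() if pkgs == {package}}
```

With owners = {"/bin/bash": {"bash"}, "/usr/share/info/dir": {"bash", "texinfo"}}, removing bash deletes only /bin/bash.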

1.4.3  RPM package upgrade

When rpm executes a package upgrade, it basically installs the new package first and then removes the upgraded (older) ones, taking care to correctly handle the configuration files present in the packages.
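
In terms of files, installing first and removing afterwards means that only the files the new version no longer ships are deleted. A minimal Python sketch with file lists as sets; configuration-file handling and files owned by other packages are deliberately not modelled, and the paths in the test are illustrative.

```python
from typing import Set, Tuple


def upgrade_plan(old_files: Set[str], new_files: Set[str]) -> Tuple[Set[str], Set[str]]:
    """Return (files written by the new version, files removed afterwards).

    Because the new package is installed first, files shipped by both
    versions are simply overwritten and must survive the removal of
    the old version.
    """
    return set(new_files), old_files - new_files
```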

1.5  A comparison between the RPM and Debian package systems

Most important features of package management are common to RPM and Debian (dependencies, versioning, informational metadata, and the like), but some features differ considerably; the most relevant ones are listed here.

File dependencies
File dependencies are a feature present in the RPM format but not in DEB. They allow a package to require specific files instead of packages. The problem is that these dependencies are not explicitly present in the list of provides of the packages, and RPM uses information from the list of files in the packages to handle them. Tools manipulating such dependencies need to find all files required by the packages that they want to install or remove, and look for them in the list of files present in all packages known to the system at execution time. Since this package universe changes over time, file dependencies may become a source of problems for tools trying to perform sophisticated manipulations of package sets.
rpmlib dependencies
RPM has some special dependencies that require features present only in some versions of RPM itself. They are not provided by the rpm package; instead, rpmlib treats them as a special case, checking each of them individually against a list of features compiled into rpmlib. Tools other than rpm itself that manipulate RPM packages ignore all such dependencies in the requires list of packages.
ORed dependencies
This feature is present only in DEB, but its absence from RPM does not change the expressiveness of the dependency language: an OR dependency can be simulated directly by an artificial capability and multiple packages providing that same capability (similar to what DEB's Provides: field does).
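
The rewrite behind this simulation can be sketched in a few lines of Python: a fresh artificial capability is created for each OR dependency, every alternative is tagged as providing it, and the depending package simply Requires the new capability. The or-dep-N naming scheme and the rewrite_or helper are hypothetical.

```python
from itertools import count
from typing import Dict, List, Set

_fresh = count(1)  # source of unique suffixes for artificial capabilities


def rewrite_or(alternatives: List[str], provides: Dict[str, Set[str]]) -> str:
    """Replace the dependency 'a | b | ...' by a new artificial capability.

    Every alternative package is tagged as providing the capability
    (mutating `provides`); the returned name is what the depending
    package should Require instead of the OR expression.
    """
    capability = f"or-dep-{next(_fresh)}"
    for package in alternatives:
        provides.setdefault(package, set()).add(capability)
    return capability
```

The catch is visible in the mutation of provides: the packages being depended on, not the depending package, have to be modified, which requires coordination between packagers.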
Package relevance
DEB packages can carry relevance information that RPM lacks:
Package priority
is an important feature of deb which is absent in RPM. Package priorities tell how important the package is for the system and are used in situations such as when APT needs to choose which package it should install or remove to satisfy some dependency.
Essential packages
The Essential tag of deb is used by apt to determine whether a package can be removed. If the user attempts to remove an essential package such as glibc or bash, apt issues a warning and asks for confirmation.
Mixed tools like apt-rpm use a file containing a list of all RPM packages and their respective priorities, listing at least the important and essential packages for the distribution being used.
Multiple simultaneously installed versions of a package
Debian does not allow two versions of the same package to be installed concurrently, and apt does not handle that case. Instead, packages that frequently need to be duplicated, such as the kernel or ncurses, are provided under different package names, like ncurses4 and ncurses5.
Architecture variations
Some RPM packages have versions compiled with optimizations that are specific to a variation of an architecture. For example, the kernel may have packages compiled for i586 and i686, in addition to the generic i386 package.

The following table has been taken from [rpm] and shows a comparison matrix between the two package formats DEB and RPM.

Feature                                        deb       rpm

Security, authentication, and verification
  signed packages                              yes [1]   yes
  checksums                                    yes       yes
  permissions, owners, etc.                    yes       yes
Usability by standard Linux tools
  recognizable by file                         yes       yes
  data unpackable by standard tools            yes [2]   no [3]
  metadata accessible by standard tools        yes       no
  creatable by standard tools                  yes       no
Metadata
  name                                         yes       yes
  version                                      yes       yes
  description                                  yes       yes
  dependencies                                 yes       yes
  recommendations                              yes       no
  suggestions                                  yes       no
  conflicts                                    yes       yes
  virtual packages and provides                yes       yes
  versioned dependencies and conflicts         yes       yes
  boolean package relationships                yes       no [4]
  file dependencies                            no        yes
  copyright info                               no [5]    yes
  grouping                                     yes       yes
  priority                                     yes       no
Special files
  config files                                 yes       yes
  documentation files                          no        yes
  ghost files                                  no        yes
Package programs
  binary programs allowed                      yes       no
  pre-install program                          yes       yes
  post-install program                         yes       yes
  pre-remove program                           yes       yes
  post-remove program                          yes       yes
  verify program                               no        yes
  triggers                                     no        yes
Scalability
  no hard-coded limits                         yes       yes [6]
  new metadata                                 yes       yes [7]
  new section                                  yes       no
  format version data                          yes       yes

Remarks:

  1. Not yet widely used though.
  2. The admin would only have to remember that a deb is an ar archive, containing some tarballs.
  3. rpm2cpio can do it, but it’s not a standard tool, except on rpm-based systems. Some fairly short programs can do it, but none of them are something you’d want to memorize.
  4. An rpm may depend on a list of packages, but boolean OR is not supported. You can often get the same effect using virtual packages and provides. This isn’t quite the same, since it does require more coordination between packagers, and the following relationship cannot be expressed with provides: foo (<< 1.1) | foo (>> 2.0)
  5. Copyright info is included in debian packages, but not in an easily extractable format.
  6. Technically, the rpm "lead" contains hard-coded limits on the package name, but the lead is no longer really used by anything except file.
  7. To be useful, you need to get a tag number assigned to your new piece of metadata, which implies modifying the rpm program.

Remark 4, concerning boolean package relationships, is not quite accurate. Using the Provides mechanism of the RPM package format, it is in fact possible to express boolean OR package relationships: for every distinct OR relationship, we introduce a unique identifier P and tag every package participating in the relationship with Provides: P. However, this solution is so tricky and impractical that it is not a viable alternative in the RPM package management system currently available.

1.6  Ports system

The ports system, as used under various names by FreeBSD, NetBSD (pkgsrc) and Gentoo (portage), is different from the DEB and RPM formats in that it focuses mainly on source packages instead of binary packages; the standard way of installing software is not by installing a binary package, but by compiling it from the original source.

The core of the ports system is a collection of build scripts. In the FreeBSD ports collection and the NetBSD pkgsrc, these are Makefiles; in the Gentoo portage system, they are bash scripts (called “Ebuilds”). These build scripts contain all the instructions for building the software. At the minimum, the build script contains the location where the original source can be found, but there are many possibilities for customization, e.g. the use of the GNU autoconf and automake programs, patches, specific compiler options, etc.

The ports system (under NetBSD and FreeBSD) consists of a directory tree, with every package having its own directory. These directories contain the Makefiles. In order to install a package, one simply cds to the appropriate directory and types make install.

There are minor differences between the ports system as used by FreeBSD, NetBSD and Gentoo, but the basic ideas remain the same. Therefore, we will use the NetBSD pkgsrc system as an example for the rest of this section.

1.6.1  Source packages in ports systems

Let us look at the NetBSD makefile for the ocaml package given in Figure 3.


# $NetBSD: Makefile,v 1.38 2005/06/14 21:00:41 minskim Exp $

.include "Makefile.common"

CONFIGURE_ARGS+= -no-tk
CONFIGURE_ENV+= disable_x11=yes

BUILD_TARGET= world
.if (${MACHINE_ARCH} == "i386") || \
    (${MACHINE_ARCH} == "powerpc") || \
    (${MACHINE_ARCH} == "sparc")
BUILD_TARGET+= opt opt.opt
PLIST_SRC= ${PKGDIR}/PLIST.opt
. if ${OPSYS} != "Darwin"
PLIST_SRC+= ${PKGDIR}/PLIST.prof
. endif
PLIST_SRC+= ${PKGDIR}/PLIST
.endif

.if ${OPSYS} == "Darwin"
PLIST_SRC+= ${PKGDIR}/PLIST.stub
.endif

.include "../../mk/bsd.pkg.mk"

Figure 3: NetBSD makefile for the ocaml package.

We see that this Makefile includes another Makefile, called Makefile.common, given in Figure 4. The reason is that there is another package, ocaml-graphics, which provides the OCaml language with support for X11 graphics (the ocaml package does not require X11). The settings common to both packages are in Makefile.common, whereas the settings specific to ocaml are in the main Makefile shown above.

We see that configure arguments are set to disable X11, and that the compilation options change depending on the architecture (OCaml native compilation is only available on a few architectures). The Darwin operating system also requires some additional options.

Lastly, the bsd.pkg.mk file is included; this is the general file that contains all the code that takes care of automatic downloading, extracting, patching, building and installing.


DISTNAME= ocaml-3.08.4
CATEGORIES= lang
MASTER_SITES= http://caml.inria.fr/pub/distrib/ocaml-3.08/
EXTRACT_SUFX= .tar.bz2

MAINTAINER= adam@NetBSD.org
HOMEPAGE= http://caml.inria.fr/ocaml/
COMMENT= The latest implementation of the Caml dialect of ML

DISTINFO_FILE= ${.CURDIR}/../../lang/ocaml/distinfo
PATCHDIR= ${.CURDIR}/../../lang/ocaml/patches

USE_TOOLS+= gmake
HAS_CONFIGURE= yes
CONFIGURE_ARGS+= -prefix ${PREFIX}
CONFIGURE_ARGS+= -libs "${LDFLAGS}"
CONFIGURE_ARGS+= -with-pthread
CONFIGURE_ENV+= BDB_LIBS=${BDB_LIBS} \
	BDB_BUILTIN=${USE_BUILTIN.${BDB_TYPE}}
CPPFLAGS+= -DDB_DBM_HSEARCH

.include "../../mk/bsd.prefs.mk"

.if ${OPSYS} == "Darwin" || ${OPSYS} == "Linux"
INSTALL_UNSTRIPPED= yes
.endif

.include "../../mk/bdb.buildlink3.mk"

post-extract: cp-power-bsd cp-gnu-config

cp-power-bsd:
	@${CP} ${WRKSRC}/asmrun/power-elf.S ${WRKSRC}/asmrun/power-bsd.S

cp-gnu-config:
	@${CP} ${PKGSRCDIR}/mk/gnu-config/config.guess ${WRKSRC}/config/gnu/
	@${CP} ${PKGSRCDIR}/mk/gnu-config/config.sub ${WRKSRC}/config/gnu/

.include "../../mk/pthread.buildlink3.mk"

Figure 4: Common Makefile.

A lot more options are set in the common Makefile. First come a few options concerning the original source code (the ‘distfiles’): what the file is called, where it can be downloaded, and how it can be decompressed.

Next comes some information about the software: the person responsible for packaging (the maintainer), the homepage, and a short description (‘comment’).

Then the locations of the distribution info (the sizes and MD5 checksums of the distfiles, used to make sure that they have not been corrupted) and of the patches are set explicitly; since the Makefile can be used by different packages, it could be called from different directories.

Then follow some variables that influence build behaviour; ocaml uses GNU make instead of the standard BSD make, it has a configure script that needs several arguments, and there are some options that need to be set in order to properly use the Berkeley DB system.

After this, bsd.prefs.mk is included; like bsd.pkg.mk, it is a system file, containing the code that interprets all these variables.

Then, it is specified that the strip utility must not be used under Darwin or Linux.

The post-extract target is defined to specify actions that have to be taken directly after decompressing the distribution file(s). There are six phases in the build process, and for each of these, pre- and post- targets can be defined:

  1. fetch: Download the distribution files.
  2. extract: Decompress the files just downloaded.
  3. patch: Apply any patches.
  4. configure: Configure the build process.
  5. build: Build the software.
  6. install: Install the software.
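
The phases and their optional hooks expand mechanically into an ordered sequence of targets. The Python sketch below only enumerates names: it assumes that a pre- or post- hook takes part only when the package Makefile defines it (as ocaml does with post-extract), and it does not model how bsd.pkg.mk actually invokes the targets.

```python
from typing import List, Set

PHASES = ["fetch", "extract", "patch", "configure", "build", "install"]


def target_sequence(defined_hooks: Set[str]) -> List[str]:
    """List the phases in order, wrapping each in its defined pre-/post- hooks."""
    sequence: List[str] = []
    for phase in PHASES:
        if f"pre-{phase}" in defined_hooks:
            sequence.append(f"pre-{phase}")
        sequence.append(phase)
        if f"post-{phase}" in defined_hooks:
            sequence.append(f"post-{phase}")
    return sequence


print(target_sequence({"post-extract"}))
# ['fetch', 'extract', 'post-extract', 'patch', 'configure', 'build', 'install']
```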

In the case of the ocaml package, some files must be copied after decompressing the distribution files; this is specified under the cp-power-bsd and cp-gnu-config targets.

Also, the files bdb.buildlink3.mk and pthread.buildlink3.mk are included. These files take care of dependencies; apparently, the ocaml package needs the Berkeley DB and pthread libraries. The buildlink3 file for ocaml itself is given in Figure 5.


# $NetBSD: buildlink3.mk,v 1.12 2005/02/04 21:35:51 adrianp Exp $

BUILDLINK_DEPTH:= ${BUILDLINK_DEPTH}+
OCAML_BUILDLINK3_MK:= ${OCAML_BUILDLINK3_MK}+

.if !empty(BUILDLINK_DEPTH:M+)
BUILDLINK_DEPENDS+= ocaml
.endif

BUILDLINK_PACKAGES:= ${BUILDLINK_PACKAGES:Nocaml}
BUILDLINK_PACKAGES+= ocaml
BUILDLINK_DEPMETHOD.ocaml?= build

.if !empty(OCAML_BUILDLINK3_MK:M+)
BUILDLINK_DEPENDS.ocaml+= ocaml>=3.08.2
BUILDLINK_PKGSRCDIR.ocaml?= ../../lang/ocaml

. include "../../mk/bsd.prefs.mk"
. if ${OPSYS} == "Darwin"
INSTALL_UNSTRIPPED= yes
. endif

PRINT_PLIST_AWK+= /^@dirrm lib\/ocaml$$/ \
	{ print "@comment in ocaml: " $$0; next }

BUILDLINK_TARGETS+= ocaml-wrappers
OCAML_WRAPPERS= ocaml ocamlc ocamlc.opt ocamlcp ocamlmklib ocamlmktop \
	ocamlopt ocamlopt.opt

ocaml-wrappers:
	${PKGSILENT}${PKGDEBUG} \
	for w in ${OCAML_WRAPPERS}; do \
		${SED} -e 's|@SH@|${SH}|g' \
		    -e 's|@OCAML_PREFIX@|${BUILDLINK_PREFIX.ocaml}|g' \
		    -e 's|@CFLAGS@|${CFLAGS}|g' \
		    -e 's|@LDFLAGS@|${LDFLAGS}|g' \
		    <${.CURDIR}/../../lang/ocaml/files/wrapper.sh \
		    >${BUILDLINK_DIR}/bin/$$w; \
		${CHMOD} +x ${BUILDLINK_DIR}/bin/$$w; \
	done

.endif # OCAML_BUILDLINK3_MK

BUILDLINK_DEPTH:= ${BUILDLINK_DEPTH:S/+$//}

Figure 5: The ocaml buildlink file.

Here, the actual dependency variables are set: the dependency is on ocaml, version 3.08.2 or later.

There are many other variables that can influence the build process in different ways. It is, for example, possible to set compile options both per package and generally (i.e. for multiple packages; the option "ssl" for example will result in all packages being compiled with SSL support, if available).

This system is very flexible, and it is possible, using the Makefile syntax, to specify very complicated build procedures. However, for most packages, specifically those that use a GNU configure script, it is enough to simply specify the location of the source files and a few options.

Virtual packages, as in the DEB and RPM formats, are not supported (possibly because it would require searching through the entire pkgsrc tree in order to find a package that provides the virtual package). They can, however, be emulated (by creating a normal package that depends on the packages that are to provide the virtual package).

1.6.2  Binary packages in ports systems

This section is based on reverse-engineering.

A binary package, under NetBSD, is simply a tarred and gzipped file that contains the files that are part of the package. Besides that, there are some special files that contain the meta-data:

+CONTENTS
The files that are installed as part of the package, optionally with MD5 keys.
+COMMENT
A one-line description of the package.
+DESC
A longer description of the package.
+MTREE_DIRS
A description of the directory structure expected by the package.
+BUILD_VERSION
The CVS tags of all files involved in the building process (in the example, Makefile is version 1.38, and Makefile.common is version 1.11)
+BUILD_INFO
Variables involved in the building process, such as compiler flags, architecture, but also dependencies. See below for more information.
+SIZE_PKG
The size of the package.
The BUILD_INFO file

This is arguably the most important file of a binary package, since it contains the package metadata. The metadata are stored in the form of a list of build variables. An example, again from the ocaml package, is given in Figure 6.


BDB_TYPE=db1
BDBBASE=/usr
PLISTIGNORE_FILES=
DISTFILES=ocaml-3.08.3.tar.bz2
PATCHFILES=
PKGSYSCONFBASEDIR=/usr/pkg/etc
PKGSYSCONFDIR=/usr/pkg/etc
PKGPATH=lang/ocaml
OPSYS=NetBSD
OS_VERSION=2.0
MACHINE_ARCH=i386
MACHINE_GNU_ARCH=i386
CPPFLAGS= -DDB_DBM_HSEARCH  -I/usr/include
CFLAGS=-O2 -I/usr/include
FFLAGS=-O
LDFLAGS= -L/usr/lib -Wl,-R/usr/lib -Wl,-R/usr/pkg/lib
CONFIGURE_ENV=BDB_LIBS=  BDB_BUILTIN=yes PTHREAD_CFLAGS=\ -pthread\ 
 PTHREAD_LDFLAGS=\ -pthread PTHREAD_LIBS= PTHREADBASE=/usr disable_x11=yes
 CC=cc CFLAGS=-O2\ -I/usr/include CPPFLAGS=-DDB_DBM_HSEARCH\ -I/usr/include
 CXX=c++ CXXFLAGS=-O2\ -I/usr/include COMPILER_RPATH_FLAG=-Wl,-R F77=f77
 FC=f77 FFLAGS=-O LANG=C LC_COLLATE=C LC_CTYPE=C LC_MESSAGES=C
 LC_MONETARY=C LC_NUMERIC=C LC_TIME=C
 LDFLAGS=-L/usr/lib\ -Wl,-R/usr/lib\ -Wl,-R/usr/pkg/lib
 LINKER_RPATH_FLAG=-R PATH=/usr/tmp/lang/ocaml/work/.wrapper/bin:
 /usr/tmp/lang/ocaml/work/.buildlink/bin:/usr/tmp/lang/ocaml/work/.gcc/bin:
 /usr/tmp/lang/ocaml/work/.tools/bin:/usr/pkg/bin:/sbin:/usr/sbin:/bin:
 /usr/bin:/usr/pkg/sbin:/usr/pkg/bin:/usr/X11R6/bin:/usr/local/sbin:
 /usr/local/bin:/usr/pkg/bin:/usr/X11R6/bin PREFIX=/usr/pkg
 PKG_SYSCONFDIR=/usr/pkg/etc
 INSTALL_INFO=/usr/tmp/lang/ocaml/work/.tools/bin/install-info
 MAKEINFO=/usr/tmp/lang/ocaml/work/.tools/bin/makeinfo MAKE=make
 WRAPPER_DEBUG="yes" WRAPPER_UPDATE_CACHE="yes"
CONFIGURE_ARGS=-prefix /usr/pkg -libs " -L/usr/lib -Wl,-R/usr/lib
  -Wl,-R/usr/pkg/lib" -with-pthread -no-tk
OBJECT_FMT=ELF
LICENSE=
RESTRICTED=
NO_SRC_ON_FTP=
NO_SRC_ON_CDROM=
NO_BIN_ON_FTP=
NO_BIN_ON_CDROM=
CC_VERSION=gcc-3.3.3
GMAKE=GNU Make 3.80
PKGTOOLSVER=20050318
REQUIRES=/usr/lib/libc.so.12
REQUIRES=/usr/lib/libcurses.so.6
REQUIRES=/usr/lib/libm.so.0
REQUIRES=/usr/lib/libm387.so.0
REQUIRES=/usr/lib/libpthread.so.0
Figure 6: BUILD_INFO file for ocaml.

We see that all kinds of information are stored in the file, including information on dependencies, though the dependencies here are on libraries, not on packages.

Conspicuously absent are the package name and version. These are stored in the +CONTENTS file.
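
Since +BUILD_INFO is a list of VARIABLE=value lines in which some variables (such as REQUIRES) repeat, a reader can collect the values of each variable into a list. A minimal Python sketch; treating indented lines as continuations of the previous value is an assumption made to cope with the wrapped lines shown in Figure 6, and parse_build_info is a hypothetical helper, not part of the pkgsrc tools.

```python
from collections import defaultdict
from typing import Dict, List, Optional


def parse_build_info(text: str) -> Dict[str, List[str]]:
    """Parse KEY=value lines; repeated keys accumulate in a list.

    Lines starting with whitespace are appended to the previous value.
    """
    info: Dict[str, List[str]] = defaultdict(list)
    last_key: Optional[str] = None
    for line in text.splitlines():
        if not line.strip():
            continue
        if line[0].isspace() and last_key is not None:
            info[last_key][-1] += " " + line.strip()
            continue
        key, _, value = line.partition("=")
        info[key].append(value)
        last_key = key
    return dict(info)
```

Applied to Figure 6, info["REQUIRES"] would list the five shared libraries, while, as noted above, no entry carries the package name or version.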

1.7  Portage

Gentoo Linux uses a packaging system very different from those of other distributions. Inspired by the BSD ports system, and adding new advanced features, this system, called Portage, installs programs by compiling them automatically from source, with the optimizations suited to your computer and according to your choices.

More information about Portage can be found in the Gentoo Handbook [GPb], the Gentoo Developer Handbook [GPa], and the Gentoo manual pages [AGMGNJMF].

Some of the advanced features of Portage are:

  • the ability to have multiple versions and revisions of the same package in the tree,
  • conditional dependencies between packages,
  • sandboxed safe installation,
  • configuration file protection and profiles.

Gentoo’s release model is based on the following ideas. There is only one package repository, which evolves continuously. Each package lives together with other versions of the same program, and you can decide which version you want on your system. Packages are tagged with keywords indicating, for each hardware architecture, whether the package is available, not available, or available but not sufficiently tested. For example, if a package is tagged “x86 ppc ~alpha -hppa ~amd64”, it is available on x86 and ppc, not available on hppa, not sufficiently tested on alpha and amd64, and not tested at all on other architectures. Packages tagged with “-” or “~” for an architecture are masked, meaning that by default they will not be installed (but you can decide to override this).
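
Reading a keyword string for one architecture follows directly from these rules. A Python sketch; the four status labels are hypothetical names chosen for illustration, not Portage terminology.

```python
def keyword_status(keywords: str, arch: str) -> str:
    """Classify a package for `arch` from its KEYWORDS string.

    Returns 'stable' (arch), 'testing' (~arch), 'unavailable' (-arch)
    or 'untested' (arch not mentioned). Only 'stable' packages are
    installed by default; the others are masked.
    """
    for keyword in keywords.split():
        if keyword == arch:
            return "stable"
        if keyword == "~" + arch:
            return "testing"
        if keyword == "-" + arch:
            return "unavailable"
    return "untested"
```

For the example string, keyword_status("x86 ppc ~alpha -hppa ~amd64", "alpha") returns "testing".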

Package developers can allow two different versions of a package to be installed in the same system.

1.7.1  The Portage Tree and ebuilds

The portage tree is a directory tree on your system where all the information on packages is stored. There is one directory for each package, containing all the versions of the package. All the information about a version is in a file called an ebuild. Ebuilds are bash shell scripts defining variables (DESCRIPTION, HOMEPAGE, DEPEND, KEYWORDS, etc.) and bash functions (pkg_setup, src_unpack, src_compile, src_install, pkg_preinst, pkg_postinst, pkg_config, etc.), which can use a set of predefined functions. An excerpt from an ebuild is given in Figure 7.


# Copyright 1999-2005 Gentoo Foundation
# Distributed under the terms of the GNU General Public License v2
# $Header: /var/cvsroot/gentoo-x86/app-editors/emacs/emacs-21.4-r1.ebuild,v 1.15 2005/08/23 03:12:54 agriffis Exp $

DESCRIPTION="An incredibly powerful, extensible text editor"
HOMEPAGE="http://www.gnu.org/software/emacs"
SRC_URI="mirror://gnu/emacs/${P}a.tar.gz
	leim? ( mirror://gnu/emacs/leim-${PV}.tar.gz )"

LICENSE="GPL-2"
SLOT="21"
KEYWORDS="alpha amd64 arm hppa ia64 ppc ppc64 s390 ~sh sparc x86"
IUSE="X Xaw3d gnome leim lesstif motif nls nosendmail"

RDEPEND="sys-libs/ncurses
	sys-libs/gdbm
	X? ( virtual/x11
		>=media-libs/giflib-4.1.0.1b
		>=media-libs/jpeg-6b-r2
		>=media-libs/tiff-3.5.5-r3
		>=media-libs/libpng-1.2.1
		!arm? (
			Xaw3d? ( x11-libs/Xaw3d )
			motif? (
				lesstif? ( x11-libs/lesstif )
				!lesstif? ( >=x11-libs/openmotif-2.1.30 ) )
			gnome? ( gnome-base/gnome-desktop ) ) )
	nls? ( sys-devel/gettext )
	!nosendmail? ( virtual/mta )"
DEPEND="${RDEPEND}
	>=sys-devel/autoconf-2.58"

PROVIDE="virtual/emacs virtual/editor"
SANDBOX_DISABLED="1"

DFILE=emacs-${SLOT}.desktop

src_unpack() { ... }

src_compile() { ... }

...

Figure 7: Excerpt from an ebuild.

Naming conventions

pkg-ver{_suf{#}}{-r#}.ebuild, where suf is one of alpha, beta, pre, rc or p, ordered as alpha < beta < pre < rc < (no suffix) < p, and -r# is the Gentoo-specific revision number. For example, linux-2.4.0_pre10-r2.ebuild.
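
A Python sketch that splits an ebuild file name into package, version, suffix and revision, and derives a sort key honouring the suffix order. It is a simplification: the regular expression, SUFFIX_RANK table and Ebuild type are assumptions for illustration, and Portage's real version comparison handles more cases (for example letter components such as 6b).

```python
import re
from typing import NamedTuple, Optional, Tuple

# alpha < beta < pre < rc < (no suffix) < p
SUFFIX_RANK = {"alpha": 0, "beta": 1, "pre": 2, "rc": 3, None: 4, "p": 5}

EBUILD_RE = re.compile(
    r"(?P<pkg>.+?)-(?P<ver>\d+(?:\.\d+)*)"
    r"(?:_(?P<suf>alpha|beta|pre|rc|p)(?P<sufnum>\d*))?"
    r"(?:-r(?P<rev>\d+))?\.ebuild"
)


class Ebuild(NamedTuple):
    package: str
    version: Tuple[int, ...]
    suffix: Optional[str]
    suffix_number: int
    revision: int

    def sort_key(self):
        return (self.version, SUFFIX_RANK[self.suffix],
                self.suffix_number, self.revision)


def parse_ebuild_name(filename: str) -> Ebuild:
    m = EBUILD_RE.fullmatch(filename)
    if m is None:
        raise ValueError(f"not a valid ebuild name: {filename}")
    return Ebuild(
        package=m["pkg"],
        version=tuple(int(x) for x in m["ver"].split(".")),
        suffix=m["suf"],
        suffix_number=int(m["sufnum"] or 0),
        revision=int(m["rev"] or 0),
    )
```

Sorting foo-1.0_alpha, foo-1.0_rc1, foo-1.0, foo-1.0-r1 and foo-1.0_p1 by sort_key yields exactly that order.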

The USE flags

When you install a Gentoo system, you need to define “USE” flags. USE variables are used to tell portage:

  • what package you want to install
  • what features a certain package should support

E.g., if you don’t put the kde keyword in your USE flags, packages that have optional KDE support will be compiled without it, and packages that have an optional KDE dependency will be installed without installing the KDE libraries as dependencies. The default USE setting is defined in /etc/make.profile/make.defaults:

USE="oss apm arts avi berkdb bitmap-fonts crypt cups encode 
fortran f77 font-server foomaticdb gdbm gif gpm gtk gtk2 imlib 
jpeg kde gnome libg++ libwww mad mikmod motif mpeg ncurses nls 
oggvorbis opengl pam pdflib png python qt quicktime readline 
sdl spell ssl svga tcpd truetype truetype-fonts type1-fonts X 
xml2 xmms xv zlib"

You can add your own flags in /etc/make.conf, for example: USE="-kde -qt msn yahoo jabber". You can also declare USE flags for individual packages (not system-wide), or just for one installation. The available USE flags are listed in /usr/portage/profiles/use.desc, for example:

gtk     - Adds support for x11-libs/gtk+ (The GIMP Toolkit)
gtk2    - Use gtk+-2.0.0 over gtk+-1.2 in cases where a 
                                        program supports both.
gtkhtml - Adds support for gnome-extra/gtkhtml
imap    - Adds support for IMAP
...

You can also use local USE flags, which apply to a single package:

app-editors/emacs:multi-tty - Add multi-tty support
app-editors/emacs:nosendmail - If you do not want to install any MTA

Some packages do not only listen to USE-flags, but also provide USE-flags. When you install such a package, the USE-flags they provide are added to your USE setting (for example: kde provided by kde-base/kdebase).
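
The effective USE setting can be sketched as applying the USE strings in order, profile defaults first and /etc/make.conf second, where a leading "-" removes a flag. This Python sketch shortens the flag lists above and does not model per-package USE flags or flags added by installed packages.

```python
from typing import Set


def effective_use(*layers: str) -> Set[str]:
    """Apply USE strings in order: 'flag' adds a flag, '-flag' removes it."""
    flags: Set[str] = set()
    for layer in layers:
        for token in layer.split():
            if token.startswith("-"):
                flags.discard(token[1:])
            else:
                flags.add(token)
    return flags
```

With defaults "oss apm kde gnome qt X" and the make.conf example "-kde -qt msn yahoo jabber", kde and qt are dropped and the three messaging flags are added.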

1.7.2  Dependencies

Dependencies between packages are described in ebuilds in the variables DEPEND and RDEPEND. DEPEND tells Portage which packages are needed to build the package, while RDEPEND specifies which packages are needed for the package to run.

In the simplest case, a dependency is just a Gentoo package name (category/package):

RDEPEND="sys-libs/ncurses
         sys-libs/gdbm"

which means that any version of these packages will do.

You may also specify a version number, for example:

RDEPEND=">=media-libs/giflib-4.1.0.1b
         =media-libs/jpeg-6b-r2
         ~sys-apps/qux-1.0
         =sys-apps/foo-1.2*
         !sys-libs/gdbm"

which means that you need giflib version 4.1.0.1b or newer and exactly jpeg-6b-r2 (the operators <, > and <= are also available). The other three entries work as follows:

~sys-apps/qux-1.0 matches any Portage revision of qux-1.0 (such as qux-1.0 or qux-1.0-r1), and the newest one is selected.

=sys-apps/foo-1.2* selects the newest member of the 1.2 series and ignores all other series, earlier or later. That is, foo-1.2.3 and foo-1.2.0 are both valid, while foo-1.3.3, foo-1.3.0 and foo-1.1.0 are not.

!sys-libs/gdbm is a blocker: it prevents this package from being emerged while gdbm is installed.
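The operators above can be made concrete with a small matcher. This Python sketch simplifies Portage's real rules and is not a reimplementation. It models a version as numeric components plus an optional letter and an optional -rN revision. Suffixes such as _pre are not handled, and components are compared as plain integers. It treats =pkg-1.2* as a component-wise prefix match. All function and type names are made up for the example.

```python
import re
from typing import NamedTuple, Optional

# category/name-version[letter][-rREV]; _pre/_rc suffixes are not supported.
_CPV = re.compile(
    r"^(?P<name>[^/\s]+/.+?)-(?P<nums>\d+(?:\.\d+)*)(?P<letter>[a-z]?)"
    r"(?:-r(?P<rev>\d+))?$"
)


class Version(NamedTuple):
    nums: tuple[int, ...]   # 4.1.0.1 -> (4, 1, 0, 1)
    letter: str             # "b" in 6b, "" if absent
    rev: int                # -r2 -> 2, 0 if absent


class Atom(NamedTuple):
    blocker: bool               # leading "!"
    op: str                     # "", ">=", "<=", ">", "<", "=" or "~"
    name: str                   # category/package
    version: Optional[Version]  # None for a bare package name
    glob: bool                  # trailing "*", as in =foo-1.2*


def parse_cpv(cpv: str) -> tuple[str, Version]:
    m = _CPV.match(cpv)
    if m is None:
        raise ValueError(f"cannot parse {cpv!r}")
    nums = tuple(int(n) for n in m["nums"].split("."))
    return m["name"], Version(nums, m["letter"], int(m["rev"] or 0))


def parse_atom(text: str) -> Atom:
    blocker = text.startswith("!")
    if blocker:
        text = text[1:]
    ops = (">=", "<=", ">", "<", "=", "~")   # two-char operators first
    op = next((o for o in ops if text.startswith(o)), "")
    if not op:
        return Atom(blocker, "", text, None, False)
    glob = text.endswith("*")
    name, version = parse_cpv(text[len(op):].rstrip("*"))
    return Atom(blocker, op, name, version, glob)


def atom_matches(atom: Atom, cpv: str) -> bool:
    """Does the concrete package `cpv` fall within `atom`?"""
    name, v = parse_cpv(cpv)
    if name != atom.name:
        return False
    want = atom.version
    if want is None:
        return True                                  # any version fits
    if atom.op == "~":                               # any revision
        return (v.nums, v.letter) == (want.nums, want.letter)
    if atom.op == "=" and atom.glob:                 # same series
        return v.nums[: len(want.nums)] == want.nums
    return {"=": v == want, ">=": v >= want, "<=": v <= want,
            ">": v > want, "<": v < want}[atom.op]


def blocked_by(depend: str, installed: list[str]) -> list[str]:
    """Installed packages that a "!" atom in `depend` refuses to coexist with."""
    blocked = []
    for text in depend.split():
        atom = parse_atom(text)
        if atom.blocker:
            blocked += [cpv for cpv in installed if atom_matches(atom, cpv)]
    return blocked


rdepend = """>=media-libs/giflib-4.1.0.1b =media-libs/jpeg-6b-r2
             ~sys-apps/qux-1.0 =sys-apps/foo-1.2* !sys-libs/gdbm"""
print(blocked_by(rdepend, ["sys-libs/gdbm-1.8.3", "sys-libs/ncurses-5.4"]))
# ['sys-libs/gdbm-1.8.3']
```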

Portage also supports conditional dependencies based on USE flags. For example, X? ( ... ) means that the dependencies inside the parentheses apply only if X is in the USE flags, while !arm? ( ... ) means that they apply only if arm is not in the USE flags.
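The following Python function is a sketch of how such conditionals can be evaluated. It walks a dependency string and keeps only the groups whose condition holds for a given set of USE flags. It assumes that all tokens, including parentheses, are separated by whitespace, as in ebuilds. It keeps || groups as nested lists without resolving them. The dependency string in the usage example is made up from packages named in this section.

```python
def reduce_conditionals(depend: str, use: set[str]) -> list:
    """Evaluate "flag? ( ... )" and "!flag? ( ... )" groups against `use`.

    Returns a nested list: atoms as strings, "|| ( ... )" groups as
    ["||", [alternatives...]].  Tokens must be whitespace-separated.
    """
    tokens = depend.split()
    pos = 0

    def expect_open() -> None:
        nonlocal pos
        if pos >= len(tokens) or tokens[pos] != "(":
            raise ValueError(f"expected '(' at token {pos}")
        pos += 1

    def group() -> list:
        nonlocal pos
        items: list = []
        while pos < len(tokens):
            tok = tokens[pos]
            pos += 1
            if tok == ")":
                return items
            if tok.endswith("?"):                 # USE conditional
                negated = tok.startswith("!")
                flag = tok[1:-1] if negated else tok[:-1]
                expect_open()
                body = group()
                if (flag in use) != negated:      # condition holds
                    items.extend(body)
            elif tok == "||":                     # keep alternatives together
                expect_open()
                items.append(["||", group()])
            else:
                items.append(tok)
        return items

    return group()


dep = "X? ( virtual/x11 ) !arm? ( media-libs/svgalib ) sys-libs/ncurses"
print(reduce_conditionals(dep, {"X"}))
# ['virtual/x11', 'media-libs/svgalib', 'sys-libs/ncurses']
print(reduce_conditionals(dep, {"arm"}))
# ['sys-libs/ncurses']
```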

A package can also depend on any one of several alternatives, written as || ( ... ). Examples:

DEPEND="|| ( app-games/unreal-tournament 
             app-games/unreal-tournament-goty )"
DEPEND="|| ( sdl? ( media-libs/libsdl ) 
             svga? ( media-libs/svgalib ) 
             opengl? ( virtual/opengl ) 
             ggi? ( media-libs/libggi ) 
             virtual/x11 )"

In both examples, only one of the listed packages is needed, and the order of preference is determined by the order in which they appear.
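Picking from such a group can be sketched as a first-match search over the alternatives, in the order they are written. In this Python sketch, the installable set is a hypothetical stand-in for what the resolver knows to be installed or available. Portage's actual selection logic is more involved. The last usage line assumes the second example has already been reduced for USE="sdl opengl".

```python
def choose_alternative(alternatives: list[str], installable: set[str]) -> str:
    """Return the first alternative of a "|| ( ... )" group that can be used.

    The list order is the order of preference; `installable` stands for
    the packages the resolver can actually use (installed or available).
    """
    for pkg in alternatives:
        if pkg in installable:
            return pkg
    raise LookupError(f"none of {alternatives} can be installed")


ut = ["app-games/unreal-tournament", "app-games/unreal-tournament-goty"]
print(choose_alternative(ut, set(ut)))
# app-games/unreal-tournament
print(choose_alternative(ut, {"app-games/unreal-tournament-goty"}))
# app-games/unreal-tournament-goty

video = ["media-libs/libsdl", "virtual/opengl", "virtual/x11"]  # USE="sdl opengl"
print(choose_alternative(video, {"virtual/opengl", "virtual/x11"}))
# virtual/opengl
```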

Virtuals

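A package can provide a virtual package, so that other packages can depend on the virtual instead of on a specific package (virtual/opengl and virtual/x11 in the example above are such virtuals). This is useful, for example, when a package needs a system logger or a mail transport agent but does not care which one.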
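Resolving a dependency on a virtual amounts to mapping it to one concrete provider. The following is a minimal Python sketch. The provider table and its order are illustrative assumptions. The rules are deliberately simple: an installed provider satisfies the virtual, and otherwise the first listed provider is taken.

```python
# Illustrative provider table: which concrete packages provide each virtual.
PROVIDERS = {
    "virtual/mta": ["mail-mta/ssmtp", "mail-mta/postfix"],
    "virtual/syslog": ["app-admin/sysklogd", "app-admin/syslog-ng"],
}


def resolve_virtual(dep: str, installed: set[str],
                    providers: dict[str, list[str]] = PROVIDERS) -> str:
    """Map a dependency to a concrete package.

    Non-virtual dependencies are returned unchanged.  For a virtual, an
    installed provider satisfies it; otherwise the first listed provider
    is chosen.
    """
    candidates = providers.get(dep)
    if candidates is None:
        return dep
    for pkg in candidates:
        if pkg in installed:
            return pkg
    return candidates[0]


print(resolve_virtual("virtual/mta", {"mail-mta/postfix"}))  # mail-mta/postfix
print(resolve_virtual("virtual/mta", set()))                 # mail-mta/ssmtp
print(resolve_virtual("sys-libs/ncurses", set()))            # sys-libs/ncurses
```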

PDEPEND

The variable PDEPEND lists packages that have to be installed after the package itself has been installed (post-merge dependencies).
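Together, the three dependency variables determine a merge order. The following Python sketch uses graphlib from the standard library (Python 3.9+). It treats DEPEND and RDEPEND entries as packages to merge before the package, and PDEPEND entries as packages to merge after it. Treating RDEPEND this way is a simplifying assumption, and the package names app-misc/a and app-misc/b are hypothetical. The example shows one common use of PDEPEND: two packages that need each other at runtime can still be ordered.

```python
from graphlib import TopologicalSorter


def merge_order(packages: dict[str, dict[str, list[str]]]) -> list[str]:
    """Return an order in which the given packages can be merged.

    `packages` maps a package name to its "DEPEND", "RDEPEND" and
    "PDEPEND" lists.  Raises graphlib.CycleError if no order exists.
    """
    # graph[node] = set of packages that must be merged before node
    graph: dict[str, set[str]] = {name: set() for name in packages}
    for name, deps in packages.items():
        for dep in deps.get("DEPEND", []) + deps.get("RDEPEND", []):
            graph[name].add(dep)                     # dep first
        for dep in deps.get("PDEPEND", []):
            graph.setdefault(dep, set()).add(name)   # name first, then dep
    return list(TopologicalSorter(graph).static_order())


# Hypothetical packages: a needs b to build and run, and b needs a at run
# time, which b expresses through PDEPEND instead of RDEPEND.
packages = {
    "app-misc/a": {"DEPEND": ["app-misc/b"], "RDEPEND": ["app-misc/b"]},
    "app-misc/b": {"DEPEND": ["sys-libs/zlib"], "PDEPEND": ["app-misc/a"]},
}
print(merge_order(packages))
# ['sys-libs/zlib', 'app-misc/b', 'app-misc/a']
```

If b listed a in RDEPEND instead, the graph would contain a cycle and static_order() would raise CycleError.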