This section is based on [EDO05] which was written in September 2005. Note that some of the package systems described here have evolved since then, so some of the technical details may be outdated.
Packages are a convenient way for users to get new (or updated) software installed on their computer. A package is more than just a collection of files to be installed. It usually contains additional information required for the proper installation and/or uninstallation: other packages that this package depends on, directories for files to be installed, menu items for the desktop environments, scripts to be executed before/after installing/uninstalling the package, and more. Packages are usually not installed manually by the user, but using a package manager. The package manager’s duty is to automate (as much as possible) the process of installing, upgrading, configuring, and removing software packages from the user’s computer.
Packages may be either binary packages or source packages. Binary packages contain the files needed for installation and proper functioning of the software, but not the source files. The files in the binary package are precompiled and so usually are expected to work on a limited set of machine architectures.
Source packages contain the files needed for compilation of the software on the user’s computer. The source package contains a compilation script (typically in the form of a Unix Makefile) which automates the compilation and (afterwards) installation processes. In some contexts it is considered good practice to also provide an uninstallation procedure in the Makefile. These source packages, which are originally intended for compilation and (de-)installation of the software on the target machine, are used by F/OSS distributors as the basis for their own source packages. The derived distribution-specific source package again contains a compilation script, which compiles the source files and arranges all relevant files into bundles constituting the binary packages.
Source packages are more flexible, because the user may choose to tweak the source and because the compilation will usually be optimized for the user’s architecture at compile time. However, most users are not expected to be able to cope with software compilation on their machines and to fix problems with the compilation. Also, local compilation can slow down the machine quite considerably.
In this section we present a detailed survey of current package management systems and their package formats, in order to review which metadata are currently used and how package management systems use them.
DEB [DPa], the package format used by dpkg, the package manager of the Debian distribution, was created in 1993 by Ian Jackson. It has been upgraded since then, and the format is now at version 2.0.
There are two types of packages: binary packages and source packages. Binary packages contain files that can be installed directly from the package file; source packages contain source code that can be used to create binary packages—it is possible to create multiple binary packages from one source package.
An example of the contents of a DEB file (the ocaml binary package from debian-unstable) can be seen in Figure 1.
- ocaml_3.08.3-8_i386.deb
  - debian-binary (version)
  - control.tar.gz
    - postinst (post-install script)
    - prerm (pre-removal script)
    - postrm (post-removal script)
    - md5sums (MD5 sums for data.tar.gz)
    - control (package metadata)
  - data.tar.gz
    - /usr
    - /usr/lib
    - /usr/lib/ocaml/3.08.3
    - …
A DEB package (binary or source) is an ar file which has three members (see Figure 1): the version of the package format (which nowadays is 2.0) and two gzip-compressed tar archives containing, respectively, the metadata (in proper DEB terminology, the control data) and the files that are to be installed as part of the package.
The control archive contains all metadata. Besides a control file in which the metadata are stored, it contains MD5 sums for the package data, as well as scripts that are to be run when installing or removing the package.
The control file, which contains all metadata, is a text file which consists of paragraphs, each of which consists of fields. The paragraphs are separated by blank lines.
A field is usually a single line which contains the field name, followed by a colon and the field contents. It is possible to include fields that span multiple lines; in that case, the second and further lines start with a space.
An example of a control file, once again from the ocaml package, can be seen in Figure 2.
Package: ocaml
Version: 3.08.3-7
Section: devel
Priority: optional
Architecture: i386
Depends: ocaml-base (= 3.08.3-7), ocaml-base-3.08.3, ocaml-nox-3.08.3, ocaml-base-nox (>= 3.08.3-6)
Suggests: xlibs-dev, tcl8.4-dev, tk8.4-dev
Provides: ocaml-3.08.3
Installed-Size: 7052
Maintainer: Debian OCaml Maintainers <debian-ocaml-maint@lists.debian.org>
Description: ML language implementation with a class-based object system
 Objective Caml is an implementation of the ML language, based on
 the Caml Light dialect extended with a complete class-based object system
 and a powerful module system in the style of Standard ML.
 ...
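The paragraph and field rules described above can be illustrated with a small parser. This is a minimal sketch in Python, not the code dpkg uses: the function name `parse_control` and the second paragraph in the sample (`Package: other`) are made up for illustration, and the sketch ignores comments and any validation of field names.

```python
def parse_control(text):
    """Parse control-file text into a list of paragraphs.

    Each paragraph is returned as a dict mapping field name to field contents.
    """
    paragraphs, current, last = [], {}, None
    for line in text.splitlines():
        if not line.strip():
            # A blank line separates two paragraphs.
            if current:
                paragraphs.append(current)
            current, last = {}, None
        elif line[0] == " " and last is not None:
            # A line starting with a space continues the previous field.
            current[last] += "\n" + line.strip()
        else:
            # "Name: contents" starts a new field.
            name, _, value = line.partition(":")
            last = name.strip()
            current[last] = value.strip()
    if current:
        paragraphs.append(current)
    return paragraphs


# Shortened sample; the second paragraph is hypothetical.
SAMPLE = """\
Package: ocaml
Version: 3.08.3-7
Description: ML language implementation with a class-based object system
 Objective Caml is an implementation of the ML language.

Package: other
Version: 1.0
"""

if __name__ == "__main__":
    for paragraph in parse_control(SAMPLE):
        print(paragraph["Package"], paragraph["Version"])
```

A dict per paragraph keeps the sketch short; a real tool would also need to preserve field order and reject malformed lines.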
A list of all possible fields (except for dependencies) that occur in the control file of a binary package follows:
Then follow the package dependencies. In the DEB format, there are several different types of dependencies (in the list, we assume that package A depends on package B, i.e. that the dependency for package B figures in the control file of package A):
A dependency can also limit the versions of the package that satisfy it. For example, in the ocaml package shown above, the version of ocaml-base installed must be exactly equal to 3.08.3-7, while the version of ocaml-base-nox installed must be greater than or equal to 3.08.3-6.
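To make the structure of such a dependency field concrete, the following Python sketch splits a Depends value into (name, operator, version) triples. It is only an illustration: the function name `parse_depends` is hypothetical, the operator spellings (`<<`, `<=`, `=`, `>=`, `>>`) are an assumption based on Debian practice rather than on the text above, and alternatives or architecture qualifiers are not handled.

```python
import re

# One dependency: a package name, optionally followed by "(operator version)".
# The operator spellings are an assumption, not taken from the text above.
_DEPENDENCY = re.compile(
    r"^\s*(?P<name>[^\s(]+)\s*"
    r"(?:\(\s*(?P<op><<|<=|=|>=|>>)\s*(?P<version>[^)\s]+)\s*\))?\s*$"
)


def parse_depends(field):
    """Split a comma-separated dependency field into (name, op, version) triples.

    op and version are None when the dependency carries no version constraint.
    """
    dependencies = []
    for item in field.split(","):
        match = _DEPENDENCY.match(item)
        if match is None:
            raise ValueError(f"cannot parse dependency: {item!r}")
        dependencies.append((match["name"], match["op"], match["version"]))
    return dependencies


if __name__ == "__main__":
    depends = ("ocaml-base (= 3.08.3-7), ocaml-base-3.08.3, "
               "ocaml-nox-3.08.3, ocaml-base-nox (>= 3.08.3-6)")
    for dependency in parse_depends(depends):
        print(dependency)
```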
There are five different ’version operators’:
It is also possible to declare dependencies on virtual packages. A virtual package is a package that does not in itself exist, but must be provided by another package. For example, the ocaml package provides the virtual package ocaml-3.08.3; any dependency on ocaml-3.08.3 can be satisfied by the ocaml package.
A more complex example would be a virtual package named web-server. A package that needs a web server, but is not interested in any particular web server, could declare a dependency on web-server. Any package that provides web-server could then satisfy that dependency.
Virtual packages do not have versions, but the possibility to add this functionality to later versions of the DEB format is specifically left open.
In the ocaml example, however, we can see that some notion of versions has already been informally added: the ocaml package depends on the virtual package ocaml-3.08.3, which is provided only by ocaml version 3.08.3; earlier versions provide ocaml-3.08.2 and so on, in effect providing some sort of version requirement.
As referenced several times earlier, the DEB format also has source packages. From one source package, it is possible to build several binary packages, and this is reflected in the fact that the control file for a source package consists of one general paragraph and several paragraphs for the binary packages that can be created from the source package.
The control information of a source package is different from the one used in binary packages: A source package may build-depend on or build-conflict with other packages, thus expressing requirements for the source package to compile. Since the source package is in general common to several architectures it contains schemata for the control information of the binary packages which are instantiated at compilation time. For instance, the architecture may now either consist of a list of architectures (where any abbreviates the list of all supported architectures), the word any which would be replaced in the created binary package by the actual architecture string, or all for an architecture-independent binary package. Dependencies in the schema for the control information of a binary package may be qualified by an architecture specifier. The schema may also contain variables which are instantiated at compilation time.
A DEB version number consists of three components:
First, the epoch, a single integer number. This is the most important component; whatever the rest of the version number, a package of epoch n+1 will always be of higher version than a package of epoch n. It is intended to be used in case of a change in version numbering scheme, or if a mistake is made. The epoch is optional (if not present, epoch 0 is assumed), and separated from the upstream version by a colon. If there is no epoch, the upstream version may not contain any colon.
Then, the upstream version. This usually is the original version of the software that has been packaged. It may contain letters, digits, periods, plus and minus signs and colons. It should start with a digit.
Next comes the optional Debian revision, separated from the upstream version by a minus sign; if the Debian revision is not present, the upstream version may not contain a minus sign. The Debian revision, which is of the same form as the upstream version, indicates the version of the Debian package based on the upstream version; therefore, it changes if the Debian package is changed but the upstream version is not. It is conventional to reset the Debian revision to 1 every time the upstream version is changed.
Version comparison is done from left to right; first the epoch, then the upstream version and finally the Debian revision are compared.
For any two strings that must be compared (epoch to epoch, upstream version to upstream version or Debian revision to Debian revision), firstly the initial parts that contain only non-digit characters are determined and compared lexicographically. If there is no difference, the initial parts of the remainders that contain only digits are compared numerically. This process (comparing non-digit strings lexicographically and digit strings numerically) is repeated until either a difference is found or both strings are exhausted.
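The comparison procedure just described can be sketched in Python as follows. This is a simplified illustration of the text, not dpkg's implementation: the function names are hypothetical, non-digit parts are compared with plain string ordering (the special ordering rules that dpkg applies to some characters are not modelled), and an empty digit part is assumed to count as 0.

```python
import re


def split_version(version):
    """Split a version string into (epoch, upstream version, Debian revision)."""
    epoch, sep, rest = version.partition(":")
    if not sep:                       # no colon: epoch 0 is assumed
        epoch, rest = "0", version
    upstream, sep, revision = rest.rpartition("-")
    if not sep:                       # no minus sign: no Debian revision
        upstream, revision = rest, ""
    return int(epoch), upstream, revision


def compare_part(a, b):
    """Compare two strings by alternating non-digit and digit parts."""
    while a or b:
        # Leading non-digit parts are compared lexicographically.
        text_a = re.match(r"\D*", a).group()
        text_b = re.match(r"\D*", b).group()
        if text_a != text_b:
            return -1 if text_a < text_b else 1
        a, b = a[len(text_a):], b[len(text_b):]
        # Leading digit parts are compared numerically (empty counts as 0).
        digits_a = re.match(r"\d*", a).group()
        digits_b = re.match(r"\d*", b).group()
        num_a, num_b = int(digits_a or 0), int(digits_b or 0)
        if num_a != num_b:
            return -1 if num_a < num_b else 1
        a, b = a[len(digits_a):], b[len(digits_b):]
    return 0


def compare_versions(v1, v2):
    """Return -1, 0 or 1 if v1 is lower than, equal to or higher than v2."""
    epoch1, upstream1, revision1 = split_version(v1)
    epoch2, upstream2, revision2 = split_version(v2)
    if epoch1 != epoch2:
        return -1 if epoch1 < epoch2 else 1
    return compare_part(upstream1, upstream2) or compare_part(revision1, revision2)


if __name__ == "__main__":
    print(compare_versions("3.08.3-7", "3.08.3-6"))
    print(compare_versions("1:0.9", "2.0"))
```

Comparing digit parts as integers is what makes 1.10 higher than 1.9, which a purely lexicographic comparison would get wrong.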
The packages in the main section all comply with the Debian Free Software Guidelines[DPb]. Furthermore, packages in the main section do not have any ‘positive’ dependencies (Depends, Recommends or Build-Depends dependencies) on packages outside the main distribution.
They also conform to a certain standard of quality (“they must not be so buggy that we refuse to support them”).
The contrib section contains packages that do conform to the Debian Free Software Guidelines, but that do not satisfy the requirement of having no dependencies on packages outside the main section.
The standard of quality is the same as for the main section.
In the non-free section, packages can be placed that do not conform to the Debian Free Software Guidelines.
Each of the three sections mentioned above has a non-US subsection. Packages that are in the main section cannot depend on packages that are in the non-US subsection of main, but it is possible for packages in the non-US subsection of main to depend on packages that are in the main section. The same goes for the contrib section.
In this section we investigate the format of RPM packages [Bai], starting from their structure and focusing in particular on the attributes which detail the metadata associated with the package and, in particular, attributes that represent the relationships with other packages. RPM packages can be of two different types:
In this document we will address only binary packages.
RPM packages follow a well-defined naming convention in order to maintain consistency between the name of the package file and the information encoded in the RPM package format and contained in the file itself. This naming convention is also endorsed by all the tools that support RPM package creation. Since all the information regarding the RPM package is self-contained in the package itself, an RPM package continues to be usable even if its file is renamed to a name that does not follow the naming convention.
Moreover, it is important to notice that the same naming convention is also used in some metadata fields in the RPM package format (see Section 1.2.2).
The standard RPM package naming convention is the following:
where:
Here are some examples of package file names found in various Linux distributions: mc-4.6.1-0.pre3.2mdk.i586.rpm, gedit-0.9.7-2.i386.rpm, gaim-1.3.0-1.fc4.i386.rpm, kphone-4.1.1-1.fc4.x86_64.rpm. Notice how the release field, besides showing the actual release number, is often also used to indicate the distribution the package belongs to (e.g., Mandriva/Mandrake (mdk), Fedora Core 4 (fc4), etc.).
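The example file names can be taken apart mechanically. The Python sketch below assumes the layout name-version-release.architecture.rpm, which is what the examples suggest; the function name `parse_rpm_filename` is hypothetical, and since a package file may be renamed freely, a real tool should read these values from the package header rather than from the file name.

```python
def parse_rpm_filename(filename):
    """Split an RPM file name into name, version, release and architecture.

    Assumes the layout name-version-release.architecture.rpm.
    """
    suffix = ".rpm"
    if not filename.endswith(suffix):
        raise ValueError(f"not an RPM file name: {filename!r}")
    stem = filename[:-len(suffix)]
    # Split from the right: the package name itself may contain '-' or '.'.
    stem, _, arch = stem.rpartition(".")
    stem, _, release = stem.rpartition("-")
    name, _, version = stem.rpartition("-")
    if not (name and version and release and arch):
        raise ValueError(f"file name does not follow the convention: {filename!r}")
    return {"name": name, "version": version, "release": release, "arch": arch}


if __name__ == "__main__":
    for example in ("mc-4.6.1-0.pre3.2mdk.i586.rpm",
                    "kphone-4.1.1-1.fc4.x86_64.rpm"):
        print(parse_rpm_filename(example))
```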
RPM packages are bundled in a binary format. The format is the same for both binary and source packages. The current version of the RPM format is 3.0 and it is used by all the RPM package managers since version 2.1.
An RPM package is divided into four logical sections:
RPM being a (multi-platform) binary file format, it has been designed to be handled correctly by the RPM package manager regardless of the platform on which the package manager runs. In particular, the reference byte ordering is the one defined for the Internet (network byte order).
The Lead section of an RPM package is basically used as a “signature” in order to identify the file as an RPM package. For example, the Unix file command uses this information in order to recognize the format. RPM package managers and other RPM oriented tools use this information as well.
Much of the information that is present in the Lead section is obsolete and is actually ignored by current RPM package managers. Moreover, that information is duplicated in the Header section. It is maintained only for backward compatibility of the file format with older tools.
The structure of the Lead section is represented by the data structure rpmlead (defined in lib/rpmlib.h) in the RPM source tree [RT] and is made of the following fields:
Architecture | ID | Architecture | ID | Architecture | ID |
i386 | 1 | alphaev6 | 2 | ppc64 | 16 |
i486 | 1 | sparc | 3 | ppc64iseries | 16 |
i586 | 1 | sun4 | 3 | ppc64pseries | 16 |
i686 | 1 | sun4m | 3 | m68k | 6 |
Athlon | 1 | sun4c | 3 | rs6000 | 8 |
Pentium3 | 1 | sun4d | 3 | ia64 | 9 |
Pentium4 | 1 | sparcv8 | 3 | armv3l | 12 |
AMD | 1 | sparcv9 | 3 | armv4b | 12 |
x86_64 | 1 | sparc64 | 10 | armv4l | 12 |
AMD64 | 1 | sun4u | 10 | s390 | 14 |
ia32e | 1 | mips | 4 | i370 | 14 |
alpha | 2 | mipsel | 11 | s390x | 15 |
alphaev5 | 2 | IP | 7 | sh | 17 |
alphaev56 | 2 | ppc | 5 | xtensa | 18 |
alphapcap56 | 2 | ppciseries | 5 | | |
alphaev6 | 2 | ppcpseries | 5 | | |
OS | ID | OS | ID |
Linux | 1 | IRIX64 | 10 |
IRIX | 2 | NEXTSTEP | 11 |
SunOS5 | 3 | BSD_OS | 12 |
SunOS4 | 4 | machten | 13 |
AmigaOS | 5 | CYGWIN32_NT | 14 |
AIX | 5 | CYGWIN32_95 | 15 |
HP-UX | 6 | UNIX_SV | 16 |
OSF1 | 7 | MiNT | 17 |
osf4.0 | 7 | OS/390 | 18 |
osf3.2 | 7 | VM/ESA | 19 |
FreeBSD | 8 | Linux/390 | 20 |
SCO_SV | 9 | Linux/ESA | 20 |
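As an illustration of how a tool might read the Lead section, the Python sketch below unpacks its fixed-size fields in network byte order. The field layout (a 4-byte magic number, major and minor format version, package type, architecture ID, a 66-byte name, OS ID, signature type and 16 reserved bytes, 96 bytes in total) and the magic value are assumptions based on our reading of the rpmlead structure and should be checked against lib/rpmlib.h; the function and constant names are hypothetical.

```python
import struct

# Assumed layout of the lead, in network (big-endian) byte order:
# magic[4], major, minor, type, archnum, name[66], osnum, signature_type,
# reserved[16].
LEAD_FORMAT = ">4sBBhh66shh16s"
LEAD_SIZE = struct.calcsize(LEAD_FORMAT)
LEAD_MAGIC = b"\xed\xab\xee\xdb"   # assumed magic number of the lead


def parse_lead(data):
    """Decode the lead found at the start of the given bytes."""
    if len(data) < LEAD_SIZE:
        raise ValueError("data is shorter than an RPM lead")
    (magic, major, minor, pkg_type, archnum,
     name, osnum, signature_type, _reserved) = struct.unpack(
        LEAD_FORMAT, data[:LEAD_SIZE])
    if magic != LEAD_MAGIC:
        raise ValueError("magic number mismatch: not an RPM package")
    return {
        "format_version": (major, minor),
        "type": pkg_type,
        "archnum": archnum,            # see the architecture table above
        "name": name.split(b"\0", 1)[0].decode("ascii", "replace"),
        "osnum": osnum,                # see the OS table above
        "signature_type": signature_type,
    }


if __name__ == "__main__":
    # Build a synthetic lead instead of reading a real package file.
    fake = struct.pack(LEAD_FORMAT, LEAD_MAGIC, 3, 0, 0, 1,
                       b"gedit-0.9.7-2", 1, 5, b"")
    print(parse_lead(fake))
```

Because the format string starts with ">", the integers are decoded big-endian whatever the byte order of the host machine, which is the point of fixing a reference byte ordering in the file format.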
The header structure defines the format of the header and signature sections of an RPM package file. The choice of names is a bit confusing and is maintained for historical reasons. The header structure is quite complicated: it models a small database in which arbitrary data can be stored and retrieved by means of keys, called tags. The header structure is composed of several header entries that logically provide the actual data. An entry is characterized by the following attributes:
The format of the header structure, on the other hand, is made of the following fields:
The signature section contains one or more digital signatures for assessing the origin of the package. The signature section is stored using the header structure format described in Section 1.2.2. The signature section may contain multiple signatures. However, every RPM package must have at least one signature which specifies the size of the package (identified by the tag RPMTAG_SIGSIZE) and one signature which gives the MD5 hash of the package (identified by the tag RPMTAG_SIGMD5). Multiple cryptographic signatures, identified by the corresponding tags (e.g., RPMTAG_GPG, RPMTAG_PGP, etc.), may be present but are not required.
The header contains all the metadata regarding the RPM package itself. It is stored using the header structure format described in Section 1.2.2 and provides all the information needed to handle a given RPM package. A detailed description of the relevant metadata attributes is presented in Section 1.3.
The payload section contains the actual archive with all the files belonging to the RPM package. The format of the payload is a gzipped cpio archive which is uncompressed, depending on the directives specified in the package metadata, when the package is actually installed. The format of the cpio archive is SVR4 with a CRC checksum.
In this section we will examine the most relevant package metadata that are present in the header section (Section 1.2.2) of an RPM package. In particular we will focus on those metadata describing package relationships with other RPM packages (i.e., dependency information).
It is clear that all the metadata are encoded using the header structure format described in Section 1.2.2 by means of tags and their associated data. For the sake of clarity, in order to refer to package metadata we will use descriptive names instead of actual tag ID associated to the data.
Moreover, the descriptive names are the ones used in spec files. These are files used by automated packaging tools in order to create RPM packages. A spec file contains all the directives and the metadata, specified in a textual and readable format. Starting from the spec file, packaging tools are able to build a standard RPM package in the format described in Section 1.2.2.
In the following section we will use the syntax and the tags taken from the standard spec file format for describing how to build RPM packages. We will not describe all the directives of the spec file format because this would be out of scope for this report. However, a quite complete description of these directives can be found in [RT].
Descriptive metadata allows the packager to specify informational attributes regarding the package itself. In the following we will detail the most relevant metadata attributes.
As already hinted in Section 1.2.1, every package is characterized by a version that is used extensively in package metadata in order to specify relationships between packages.
Generally a complete version specification has the following format:
where:
Package versions impose an ordering on packages, which is used whenever relationships with other packages have to be specified. The comparison algorithm is basically a segmented comparison: the version is broken up into segments, each of them containing either alphabetical characters only or digits only. The segments are compared in order, with the rightmost segment being the least significant.
The alphabetical segments are compared using lexicographical ASCII ordering. The digit segments are first stripped of any leading zeroes; the longer remaining segment is then the larger one, and segments of equal length are compared lexicographically.
No additional knowledge is embedded in the algorithm, so a version number 5.6 will be older than 5.0000503 because the comparison will be made between 6 and 503 (i.e. 0000503 without leading zeroes), and 503 > 6.
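The segmented comparison can be sketched in Python as follows. This is a simplified illustration of the description above, not the code used by rpm: the function names are hypothetical, and two cases that the text leaves open are handled by assumption, namely that a digit segment is newer than an alphabetical one, and that a version with extra trailing segments is newer than one that is a prefix of it.

```python
import re


def segments(version):
    """Break a version into maximal runs of letters or of digits."""
    return re.findall(r"[A-Za-z]+|[0-9]+", version)


def compare_rpm_versions(a, b):
    """Return -1, 0 or 1 if version a is older than, equal to or newer than b."""
    segs_a, segs_b = segments(a), segments(b)
    for x, y in zip(segs_a, segs_b):
        if x.isdigit() and y.isdigit():
            # int() ignores leading zeroes, so 0000503 is compared as 503.
            num_x, num_y = int(x), int(y)
            if num_x != num_y:
                return -1 if num_x < num_y else 1
        elif x.isdigit() != y.isdigit():
            # Assumption: a digit segment is newer than an alphabetical one.
            return 1 if x.isdigit() else -1
        elif x != y:
            # Alphabetical segments: lexicographical ASCII ordering.
            return -1 if x < y else 1
    # Assumption: with all shared segments equal, more segments means newer.
    return (len(segs_a) > len(segs_b)) - (len(segs_a) < len(segs_b))


if __name__ == "__main__":
    # The example from the text: 5.6 is older than 5.0000503.
    print(compare_rpm_versions("5.6", "5.0000503"))
```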
The Name tag specifies the name of the software being packaged. It follows the naming conventions described in Section 1.2.1.
The Version tag specifies the version of the software being packaged. Usually it matches the version number of the software itself, and it specifies the version part of the complete version specification described in Section 1.3.1.
The Release tag specifies the release part of the complete version specification described in Section 1.3.1.
The Epoch tag specifies the epoch part of the complete version specification described in Section 1.3.1.
The Description tag is used to provide an in-depth description of the packaged software, while the Summary tag is used only to give a brief description of the same packaged software.
The Group tag provides a way to organize packages into groups. A group specification consists of a series of strings separated by the ’/’ character. This allows the specification of subgroups as if they were subdirectories, e.g. Application/Editors.
Dependency metadata are used to establish relationships between packages. Those relationships are used to ensure that, once the packaged software is installed, the system provides everything it needs to work properly (i.e., other packaged software, libraries, etc.).
By (correctly) specifying dependency metadata it is possible to guarantee that package management operations (see Section 1.4) will not break the consistency of the system when they are performed.
In this section we describe the metadata tags that are used in RPM packages to make relationships between packages explicit, and we detail their semantics.
A dependency relation is always specified by using the name of a package and, optionally, an additional constraint defined with a comparison operator. This is possible because, as described in Section 1.3.1, version numbers are totally ordered.
The RPM package format allows the following comparison operators when specifying dependency relation constraints: <, <=, =, >=, >. The semantics of those operators is the standard one, applied to version numbers.
Dependency relation specifications establish relations between the current package in its current version and the set of other packages entailed by the dependency specification:
The Requires tag is used to specify which packages are needed in order to make the packaged software work. At least one of the packages identified by the dependency specification must be present when installing the current package. Actually (see Section 1.4), the required packages need not be already installed in the system. What matters is that, during an installation process in which multiple packages may be processed, there is a package which satisfies the dependency relation. Obviously, if only a single package is requested for installation, then there must be an already installed package that satisfies the dependency relation.
The previous example shows a Requires dependency where the current package needs any version of the package ncurses and a version greater than or equal to 2.3 of the package libmpeg to be installed.
The PreReq tag has exactly the same semantics as the Requires tag (see Section 1.3.2), but it mandates that at least one of the packages identified by the dependency constraint must already be installed in the system before attempting to install the current package.
The Conflicts tag is the complement of the Requires tag and is used to specify which packages must not be installed in order to make the packaged software work. None of the packages identified by the specification may be present when trying to install the current package (neither already installed in the system, nor in the package list to be processed).
The previous example shows a Conflicts dependency where the current package cannot coexist with any version of the sendmail package.
The Provides tag is used to declare a capability which is provided by the current package. The provided capability is often referred to as a virtual package: it is an alias that can be used in dependency relations to refer to those real packages which provide it.
The Provides tag offers also a way to group packages together. In fact, by specifying a dependency relation using a virtual package or capability identifier, we can implicitly identify all the actual packages which provide that virtual package through a Provides declaration.
Usually, the Provides tag is also used to provide file dependencies as if they were virtual packages (e.g., Provides: /bin/sh). This is particularly useful when RPM packages are used on systems that are only partly managed by an RPM package management system. The advantage of doing so is that a package providing a virtual package /bin/sh can be safely removed without actually removing the file /bin/sh.
The previous example shows a package which provides a virtual package lda (an abbreviation for Local Delivery Agent). The identifier lda can be used, for example, by the sendmail package in its Requires dependency, in order to model the fact that sendmail needs a local delivery agent (lda) to work properly.
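The interplay between Requires and Provides can be illustrated with a small Python sketch. It is a toy model under stated assumptions, not the resolution algorithm used by rpm: version constraints are ignored, the names `Package`, `providers` and `unsatisfied` are hypothetical, and procmail is used only as an illustrative provider of lda.

```python
from dataclasses import dataclass, field


@dataclass
class Package:
    name: str
    requires: list = field(default_factory=list)
    provides: list = field(default_factory=list)


def providers(capability, packages):
    """Names of the packages that are called, or that provide, the capability."""
    return [p.name for p in packages
            if capability == p.name or capability in p.provides]


def unsatisfied(package, packages):
    """Requirements of the package that no package in the given set satisfies."""
    return [req for req in package.requires if not providers(req, packages)]


if __name__ == "__main__":
    sendmail = Package("sendmail", requires=["lda"])
    procmail = Package("procmail", provides=["lda"])   # illustrative provider

    # With a provider of lda in the set, sendmail has no missing requirement.
    print(unsatisfied(sendmail, [sendmail, procmail]))
    # Without one, the requirement on lda stays unsatisfied.
    print(unsatisfied(sendmail, [sendmail]))
```

The set passed to `unsatisfied` stands for the installed packages together with the packages being processed in the same operation, which is why a requirement can be satisfied by a package that is not installed yet.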
It is important to notice that when virtual packages are used to specify dependency relations, it is not possible to use version constraints on them. The reason is that a virtual package may be provided by different packages whose version numbers are, of course, incomparable.
Finally, there are often packages that both provide a virtual package and require the same virtual package. For example, the package bash provides the virtual package bash and also requires it in order to be installed. This could be seen as a contradiction but, given what we have said in Section 1.3.2, the package can nevertheless be installed, because bash itself provides everything that the package requires.
The Obsoletes tag says which packages are obsoleted by this one. Older versions of the package are automatically obsoleted.
When building an RPM package, a set of dependency relations is implicitly declared. In order to do so, starting from the list of the files that make up the package, the following operations are performed for each file in the list:
Even if automatically provided and required library names may look like file names, they are actually capability identifiers that are not related to actual file names contained in the package itself.
For example, a package containing the command ls will automatically require the following libraries:
linux-gate.so.1 |
librt.so.1 |
libc.so.6 |
libpthread.so.0 |
/lib/ld-linux.so.2 |
In an RPM package it is possible to find metadata which provide an operational behavior that is executed at some stage of an operation on the package. These metadata simply specify shell scripts, or scripts written in some other interpreted language, which are executed by the RPM package management system.
Many script metadata are used to handle source RPM packages, in order to automate the building process. The following ones, however, are used when actions are performed on binary packages, in particular, during installation and removal of RPM packages.
The %pre script is executed just before the package is to be installed.
The %post script is executed after the package has been installed. It usually contains setup commands, such as the execution of ldconfig to update the system library cache, or the editing of system-wide configuration files (e.g., when a new shell is installed, /etc/shells is updated accordingly).
The %preun script is executed just before the removal of a package.
The %postun script is executed just after the removal of a package. It usually contains cleanup code and actions complementary to those specified in the %post script.
The RPM package management system is built around a single command line utility, rpm, which provides the user with all the functionality to:
rpm makes use of a central database where it stores all the information about the packages that are already present in the system. Each operation provided by rpm queries this database in order to perform consistency checks with respect to package dependencies.
When rpm executes a package installation it performs the following steps:
When rpm executes a package removal it performs the following steps:
When rpm executes a package upgrade, it basically first installs the new package and then removes the older versions being upgraded, taking care to handle correctly the various config files that are present in the packages.
Most important features for package management are common to RPM and Debian (dependencies, versioning, informational metadata, and the like) but certain features are quite different, and we list here the more relevant ones.
The following table has been taken from [rpm] and shows a comparison matrix between the two package formats DEB and RPM.
Feature | deb | rpm |
Security, authentication, and verification | ||
signed packages | yes[1] | yes |
checksums | yes | yes |
permissions, owners, etc | yes | yes |
Usability by standard linux tools | ||
recognizable by file | yes | yes |
data unpackable by standard tools | yes [2] | no [3] |
metadata accessible by standard tools | yes | no |
creatable by standard tools | yes | no |
Metadata | ||
name | yes | yes |
version | yes | yes |
description | yes | yes |
dependencies | yes | yes |
recommendations | yes | no |
suggestions | yes | no |
conflicts | yes | yes |
virtual packages and provides | yes | yes |
versioned dependencies and conflicts | yes | yes |
boolean package relationships | yes | no [4] |
file dependencies | no | yes |
copyright info | no [5] | yes |
grouping | yes | yes |
priority | yes | no |
Special files | ||
config files | yes | yes |
documentation files | no | yes |
ghost files | no | yes |
Package programs | ||
binary programs allowed | yes | no |
pre-install program | yes | yes |
post-install program | yes | yes |
pre-remove program | yes | yes |
post-remove program | yes | yes |
verify program | no | yes |
triggers | no | yes |
Scalability | ||
no hard-coded limits | yes | yes [6] |
new metadata | yes | yes [7] |
new section | yes | no |
format version data | yes | yes |
Remarks:
The remark made with respect to boolean package relationships (remark number 4) is not quite exact. In fact, the provides mechanism of the RPM package format makes it possible to express boolean OR package relationships. To do so, for every distinct OR relationship to be specified, we would have to introduce a unique identifier P and tag every package participating in the relationship with a provides P. However, this solution is so tricky and impractical to implement that it is not a viable alternative in the currently available RPM package management system.
The ports system, as used under various names by FreeBSD, NetBSD (pkgsrc) and Gentoo (portage), is different from the DEB and RPM formats in that it focuses mainly on source packages instead of binary packages; the standard way of installing software is not by installing a binary package, but by compiling it from the original source.
The core of the ports system is a collection of build scripts. In the FreeBSD ports collection and the NetBSD pkgsrc, these are Makefiles; in the Gentoo portage system, they are bash scripts (called “Ebuilds”). These build scripts contain all the instructions for building the software. At the minimum, the build script contains the location where the original source can be found, but there are many possibilities for customization, e.g. the use of the GNU autoconf and automake programs, patches, specific compiler options, etc.
The ports system (under NetBSD and FreeBSD) consists of a directory tree, with every package having its own directory. These directories contain the Makefiles. In order to install a package, one simply cds to the appropriate directory and types make install.
There are minor differences between the ports system as used by FreeBSD, NetBSD and Gentoo, but the basic ideas remain the same. Therefore, we will use the NetBSD pkgsrc system as an example for the rest of this section.
Let us look at the NetBSD makefile for the ocaml package given in Figure 3.
# $NetBSD: Makefile,v 1.38 2005/06/14 21:00:41 minskim Exp $
.include "Makefile.common"
CONFIGURE_ARGS+= -no-tk
CONFIGURE_ENV+= disable_x11=yes
BUILD_TARGET= world
.if (${MACHINE_ARCH} == "i386") || \
    (${MACHINE_ARCH} == "powerpc") || \
    (${MACHINE_ARCH} == "sparc")
BUILD_TARGET+= opt opt.opt
PLIST_SRC= ${PKGDIR}/PLIST.opt
.  if ${OPSYS} != "Darwin"
PLIST_SRC+= ${PKGDIR}/PLIST.prof
.  endif
PLIST_SRC+= ${PKGDIR}/PLIST
.endif
.if ${OPSYS} == "Darwin"
PLIST_SRC+= ${PKGDIR}/PLIST.stub
.endif
.include "../../mk/bsd.pkg.mk"
We see that this Makefile invokes another Makefile, called Makefile.common, given in Figure 4. The reason for this is that there is also another package, called ocaml-graphics, which provides the OCaml language with support for X11 graphics (the ocaml package does not require X11). The common settings for both packages are in Makefile.common, whereas the settings that are only for ocaml are in the main Makefile seen above.
We see that configure arguments are set in order to disable X11, and that the compilation options are changed depending on the architecture (OCaml native compilation is only available on a few architectures). Also, the Darwin operating system requires some additional options.
Lastly, the bsd.pkg.mk file is included; this is the general file that contains all the code that takes care of automatic downloading, extracting, patching, building and installing.
DISTNAME= ocaml-3.08.4
CATEGORIES= lang
MASTER_SITES= http://caml.inria.fr/pub/distrib/ocaml-3.08/
EXTRACT_SUFX= .tar.bz2

MAINTAINER= adam@NetBSD.org
HOMEPAGE= http://caml.inria.fr/ocaml/
COMMENT= The latest implementation of the Caml dialect of ML

DISTINFO_FILE= ${.CURDIR}/../../lang/ocaml/distinfo
PATCHDIR= ${.CURDIR}/../../lang/ocaml/patches

USE_TOOLS+= gmake
HAS_CONFIGURE= yes
CONFIGURE_ARGS+= -prefix ${PREFIX}
CONFIGURE_ARGS+= -libs "${LDFLAGS}"
CONFIGURE_ARGS+= -with-pthread
CONFIGURE_ENV+= BDB_LIBS=${BDB_LIBS} \
	BDB_BUILTIN=${USE_BUILTIN.${BDB_TYPE}}
CPPFLAGS+= -DDB_DBM_HSEARCH

.include "../../mk/bsd.prefs.mk"

.if ${OPSYS} == "Darwin" || ${OPSYS} == "Linux"
INSTALL_UNSTRIPPED= yes
.endif

.include "../../mk/bdb.buildlink3.mk"

post-extract: cp-power-bsd cp-gnu-config

cp-power-bsd:
	@${CP} ${WRKSRC}/asmrun/power-elf.S ${WRKSRC}/asmrun/power-bsd.S

cp-gnu-config:
	@${CP} ${PKGSRCDIR}/mk/gnu-config/config.guess ${WRKSRC}/config/gnu/
	@${CP} ${PKGSRCDIR}/mk/gnu-config/config.sub ${WRKSRC}/config/gnu/

.include "../../mk/pthread.buildlink3.mk"
Many more options are set in the common Makefile. First come a few options concerning the original source code (the ‘distfiles’): what the file is called, where it can be downloaded from, and how it can be decompressed.
Also, some information about the software: the person responsible for packaging (the maintainer), the homepage, and a short description (‘comment’).
Then, the locations of the distribution info (the size and MD5 checksums of the distfiles, used to make sure that they have not been corrupted) and of the patches are set explicitly; the Makefile is used by different packages, so it may be included from different directories.
Then follow some variables that influence build behaviour: ocaml uses GNU make instead of the standard BSD make, it has a configure script that needs several arguments, and there are some options that need to be set in order to properly use the Berkeley DB system.
After this, bsd.prefs.mk is included; like bsd.pkg.mk, it is a system file, and it contains code to interpret all these variables.
Then, it is specified that the strip utility must not be used under Darwin or Linux.
The post-extract target is defined to specify actions that have to be taken directly after decompressing the distribution file(s). There are six phases in the build process (fetch, extract, patch, configure, build and install), and for each of these, pre- and post- targets can be defined.
In the case of the ocaml package, some files must be copied after decompressing the distribution files; this is specified under the cp-power-bsd and cp-gnu-config targets.
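The way pre- and post- targets wrap each phase can be sketched in a few lines of Python. This is only a model of the control flow, not of bsd.pkg.mk itself; the phase names used here (fetch, extract, patch, configure, build, install) are the usual pkgsrc stages and are an assumption of this sketch, as are the function and variable names.

```python
# Assumed phase names; the real logic lives in bsd.pkg.mk.
PHASES = ["fetch", "extract", "patch", "configure", "build", "install"]


def run_build(hooks, log):
    """Run every phase in order, calling optional pre-/post- hooks around it.

    `hooks` maps target names such as "post-extract" to callables that
    receive the log; a phase without its own hook is simply recorded.
    """
    for phase in PHASES:
        for target in (f"pre-{phase}", phase, f"post-{phase}"):
            action = hooks.get(target)
            if action is not None:
                action(log)
            elif target == phase:
                log.append(phase)  # default action: record that the phase ran


# Mirrors the ocaml package: two copy steps hooked in after extraction.
log = []
hooks = {
    "post-extract": lambda entries: entries.extend(["cp-power-bsd", "cp-gnu-config"]),
}
run_build(hooks, log)
```

After the call, the log lists the two copy steps between extract and patch, which is where the post-extract target places them.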
Also, the files bdb.buildlink3.mk and pthread.buildlink3.mk are included. These files take care of dependencies; apparently, the ocaml package needs the Berkeley DB and the pthread libraries. The buildlink3 file for ocaml itself is given in Figure 5.
# $NetBSD: buildlink3.mk,v 1.12 2005/02/04 21:35:51 adrianp Exp $

BUILDLINK_DEPTH:= ${BUILDLINK_DEPTH}+
OCAML_BUILDLINK3_MK:= ${OCAML_BUILDLINK3_MK}+

.if !empty(BUILDLINK_DEPTH:M+)
BUILDLINK_DEPENDS+= ocaml
.endif

BUILDLINK_PACKAGES:= ${BUILDLINK_PACKAGES:Nocaml}
BUILDLINK_PACKAGES+= ocaml
BUILDLINK_DEPMETHOD.ocaml?= build

.if !empty(OCAML_BUILDLINK3_MK:M+)
BUILDLINK_DEPENDS.ocaml+= ocaml>=3.08.2
BUILDLINK_PKGSRCDIR.ocaml?= ../../lang/ocaml

.  include "../../mk/bsd.prefs.mk"
.  if ${OPSYS} == "Darwin"
INSTALL_UNSTRIPPED= yes
.  endif

PRINT_PLIST_AWK+= /^@dirrm lib\/ocaml$$/ \
	{ print "@comment in ocaml: " $$0; next }

BUILDLINK_TARGETS+= ocaml-wrappers
OCAML_WRAPPERS= ocaml ocamlc ocamlc.opt ocamlcp ocamlmklib ocamlmktop \
	ocamlopt ocamlopt.opt

ocaml-wrappers:
	${PKGSILENT}${PKGDEBUG} \
	for w in ${OCAML_WRAPPERS}; do \
		${SED} -e 's|@SH@|${SH}|g' \
			-e 's|@OCAML_PREFIX@|${BUILDLINK_PREFIX.ocaml}|g' \
			-e 's|@CFLAGS@|${CFLAGS}|g' \
			-e 's|@LDFLAGS@|${LDFLAGS}|g' \
			<${.CURDIR}/../../lang/ocaml/files/wrapper.sh \
			>${BUILDLINK_DIR}/bin/$$w; \
		${CHMOD} +x ${BUILDLINK_DIR}/bin/$$w; \
	done

.endif # OCAML_BUILDLINK3_MK

BUILDLINK_DEPTH:= ${BUILDLINK_DEPTH:S/+$//}
Here, the actual dependency variables are set; the dependency is on the ocaml package, version 3.08.2 or later.
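What a pattern such as ocaml>=3.08.2 demands can be illustrated with a short Python sketch. It handles only the name>=version form with purely numeric, dot-separated versions; the real pkgsrc tools accept more pattern forms and version schemes, so the function names and the comparison rule below are simplifying assumptions.

```python
import re


def parse_version(text):
    """Turn '3.08.4' into (3, 8, 4) so that versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def satisfies(pattern, name, version):
    """Check an installed (name, version) pair against a 'pkg>=version' pattern."""
    match = re.fullmatch(r"([A-Za-z0-9_+-]+?)>=([0-9.]+)", pattern)
    if match is None:
        raise ValueError(f"unsupported pattern: {pattern!r}")
    wanted_name, minimum = match.groups()
    return name == wanted_name and parse_version(version) >= parse_version(minimum)
```

For instance, an installed ocaml 3.08.4 satisfies the pattern, while ocaml 3.08.1 or a differently named package does not.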
There are many other variables that can influence the build process in different ways. It is, for example, possible to set compile options both per package and generally (i.e. for multiple packages; the option "ssl" for example will result in all packages being compiled with SSL support, if available).
This system is very flexible, and it is possible, using the Makefile syntax, to specify very complicated build procedures. However, for most packages, specifically those that use a GNU configure script, it is enough to simply specify the location of the source files and a few options.
Virtual packages, as in the DEB and RPM formats, are not supported (possibly because it would require searching through the entire pkgsrc tree in order to find a package that provides the virtual package). They can, however, be emulated (by creating a normal package that depends on the packages that are to provide the virtual package).
This section is based on reverse-engineering.
A binary package, under NetBSD, is simply a tarred and gzipped file that contains the files that are part of the package. Besides that, there are some special files that contain the meta-data:
The +BUILD_INFO file is arguably the most important file of a binary package, since it contains the package metadata. The metadata are stored in the form of a list of build variables. An example, again from the ocaml package, is given in Figure 6.
BDB_TYPE=db1
BDBBASE=/usr
PLISTIGNORE_FILES=
DISTFILES=ocaml-3.08.3.tar.bz2
PATCHFILES=
PKGSYSCONFBASEDIR=/usr/pkg/etc
PKGSYSCONFDIR=/usr/pkg/etc
PKGPATH=lang/ocaml
OPSYS=NetBSD
OS_VERSION=2.0
MACHINE_ARCH=i386
MACHINE_GNU_ARCH=i386
CPPFLAGS= -DDB_DBM_HSEARCH -I/usr/include
CFLAGS=-O2 -I/usr/include
FFLAGS=-O
LDFLAGS= -L/usr/lib -Wl,-R/usr/lib -Wl,-R/usr/pkg/lib
CONFIGURE_ENV=BDB_LIBS= BDB_BUILTIN=yes PTHREAD_CFLAGS=\ -pthread\ PTHREAD_LDFLAGS=\ -pthread PTHREAD_LIBS= PTHREADBASE=/usr disable_x11=yes CC=cc CFLAGS=-O2\ -I/usr/include CPPFLAGS=-DDB_DBM_HSEARCH\ -I/usr/include CXX=c++ CXXFLAGS=-O2\ -I/usr/include COMPILER_RPATH_FLAG=-Wl,-R F77=f77 FC=f77 FFLAGS=-O LANG=C LC_COLLATE=C LC_CTYPE=C LC_MESSAGES=C LC_MONETARY=C LC_NUMERIC=C LC_TIME=C LDFLAGS=-L/usr/lib\ -Wl,-R/usr/lib\ -Wl,-R/usr/pkg/lib LINKER_RPATH_FLAG=-R PATH=/usr/tmp/lang/ocaml/work/.wrapper/bin:/usr/tmp/lang/ocaml/work/.buildlink/bin:/usr/tmp/lang/ocaml/work/.gcc/bin:/usr/tmp/lang/ocaml/work/.tools/bin:/usr/pkg/bin:/sbin:/usr/sbin:/bin:/usr/bin:/usr/pkg/sbin:/usr/pkg/bin:/usr/X11R6/bin:/usr/local/sbin:/usr/local/bin:/usr/pkg/bin:/usr/X11R6/bin PREFIX=/usr/pkg PKG_SYSCONFDIR=/usr/pkg/etc INSTALL_INFO=/usr/tmp/lang/ocaml/work/.tools/bin/install-info MAKEINFO=/usr/tmp/lang/ocaml/work/.tools/bin/makeinfo MAKE=make WRAPPER_DEBUG="yes" WRAPPER_UPDATE_CACHE="yes"
CONFIGURE_ARGS=-prefix /usr/pkg -libs " -L/usr/lib -Wl,-R/usr/lib -Wl,-R/usr/pkg/lib" -with-pthread -no-tk
OBJECT_FMT=ELF
LICENSE=
RESTRICTED=
NO_SRC_ON_FTP=
NO_SRC_ON_CDROM=
NO_BIN_ON_FTP=
NO_BIN_ON_CDROM=
CC_VERSION=gcc-3.3.3
GMAKE=GNU Make 3.80
PKGTOOLSVER=20050318
REQUIRES=/usr/lib/libc.so.12
REQUIRES=/usr/lib/libcurses.so.6
REQUIRES=/usr/lib/libm.so.0
REQUIRES=/usr/lib/libm387.so.0
REQUIRES=/usr/lib/libpthread.so.0
We see that all kinds of information are stored in the file, including information on dependencies, though the dependencies here are on libraries, not on packages.
Conspicuously absent are the package name and version. These are stored in the +CONTENTS file.
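Since the build-variable listing is a plain sequence of NAME=value lines in which a name such as REQUIRES may occur several times, it is easy to read mechanically. The Python sketch below assumes exactly that one-assignment-per-line layout; the function name and the shortened sample are illustrative and not part of the NetBSD tools.

```python
from collections import defaultdict


def parse_build_info(text):
    """Parse NAME=value lines into a dict of lists.

    A name may repeat (e.g. REQUIRES), so every value is kept, in order.
    Lines without '=' are ignored.
    """
    info = defaultdict(list)
    for line in text.splitlines():
        line = line.strip()
        if not line or "=" not in line:
            continue
        name, value = line.split("=", 1)  # split on the first '=' only
        info[name].append(value)
    return dict(info)


# Shortened sample taken from the listing discussed above.
sample = """\
PKGPATH=lang/ocaml
OPSYS=NetBSD
MACHINE_ARCH=i386
REQUIRES=/usr/lib/libc.so.12
REQUIRES=/usr/lib/libm.so.0
"""
info = parse_build_info(sample)
```

Collecting the REQUIRES entries in this way yields the library-level dependencies mentioned above; mapping them back to packages would need additional information.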
Gentoo Linux uses a packaging system that is very different from those of other distributions. It is inspired by the BSD ports system, with new advanced features. This system, called Portage, installs programs by compiling them automatically from source, with optimizations for your computer chosen according to your preferences.
More information about Portage can be found in the Gentoo Handbook [GPb], the Gentoo Developer Handbook [GPa], and the Gentoo manual pages [AGMGNJMF].
Some of the advanced features of Portage are:
Gentoo’s release model is based on the following ideas: There is only one package repository which is evolving continuously. Each package lives together with other versions of the same program and you can decide which version you want on your system. Packages are tagged by keywords, indicating for each hardware architecture whether it is available, not available, or available but not tested sufficiently. For example, if a package is tagged “x86 ppc ~alpha -hppa ~amd64”, it means that it is available on x86 and ppc, not available on hppa, not tested sufficiently on alpha and amd64, not tested on other architectures. Packages with a tag “-” or “~” are masked, that means that by default they won’t be installed (but you can decide to override the flag).
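The keyword convention can be read off mechanically. The following Python sketch classifies an architecture against a keyword string using only the rules stated above; the status labels and function names are choices made for this illustration, and architectures that are not mentioned at all are reported separately rather than treated as masked.

```python
def keyword_status(keywords, arch):
    """Classify `arch` as 'stable', 'testing', 'unavailable' or 'untested'."""
    for token in keywords.split():
        if token == arch:
            return "stable"
        if token == "~" + arch:
            return "testing"        # available but not tested sufficiently
        if token == "-" + arch:
            return "unavailable"
    return "untested"               # architecture not mentioned at all


def is_masked(keywords, arch):
    """Packages tagged '~arch' or '-arch' are masked by default."""
    return keyword_status(keywords, arch) in ("testing", "unavailable")


KEYWORDS = "x86 ppc ~alpha -hppa ~amd64"
```

With this keyword string, x86 is stable and unmasked, alpha is in testing and masked, and hppa is unavailable and masked.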
Package developers can allow two different versions of a package to be installed in the same system.
The portage tree is a directory tree on your system where all the information on packages is stored. There is one directory for each package, containing all the versions of the package. All the information about a version is in a file called an ebuild. Ebuilds are bash shell scripts defining variables (DESCRIPTION, HOMEPAGE, DEPEND, KEYWORDS, etc.) and bash functions (pkg_setup, src_unpack, src_compile, src_install, pkg_preinst, pkg_postinst, pkg_config, etc.), which can use a set of predefined functions. An excerpt from an ebuild is given in Figure 7.
# Copyright 1999-2005 Gentoo Foundation
# Distributed under the terms of the GNU General Public License v2
# $Header: /var/cvsroot/gentoo-x86/app-editors/emacs/emacs-21.4-r1.ebuild,v 1.15 2005/08/23 03:12:54 agriffis Exp $

DESCRIPTION="An incredibly powerful, extensible text editor"
HOMEPAGE="http://www.gnu.org/software/emacs"
SRC_URI="mirror://gnu/emacs/${P}a.tar.gz
	leim? ( mirror://gnu/emacs/leim-${PV}.tar.gz )"

LICENSE="GPL-2"
SLOT="21"
KEYWORDS="alpha amd64 arm hppa ia64 ppc ppc64 s390 ~sh sparc x86"
IUSE="X Xaw3d gnome leim lesstif motif nls nosendmail"

RDEPEND="sys-libs/ncurses
	sys-libs/gdbm
	X? ( virtual/x11
		>=media-libs/giflib-4.1.0.1b
		>=media-libs/jpeg-6b-r2
		>=media-libs/tiff-3.5.5-r3
		>=media-libs/libpng-1.2.1
		!arm? (
			Xaw3d? ( x11-libs/Xaw3d )
			motif? (
				lesstif? ( x11-libs/lesstif )
				!lesstif? ( >=x11-libs/openmotif-2.1.30 ) )
			gnome? ( gnome-base/gnome-desktop )
		)
	)
	nls? ( sys-devel/gettext )
	!nosendmail? ( virtual/mta )"
DEPEND="${RDEPEND}
	>=sys-devel/autoconf-2.58"

PROVIDE="virtual/emacs virtual/editor"
SANDBOX_DISABLED="1"

DFILE=emacs-${SLOT}.desktop

src_unpack() { ... }

src_compile() { ... }

...
Ebuild files are named according to the pattern pkg-ver{suf{#}}{-r#}.ebuild, where suf is one of alpha < beta < pre < rc < (no suffix) < p, and -r# is the Gentoo-specific revision number. For example: linux-2.4.0pre10-r2.ebuild.
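The naming pattern pkg-ver{suf{#}}{-r#}.ebuild and the suffix ordering alpha < beta < pre < rc < (no suffix) < p lend themselves to a small parser. The Python sketch below follows the pattern exactly as written here and assumes purely numeric, dot-separated version numbers; the regular expression, the ranking table and the function names are illustrative and are not taken from Portage.

```python
import re

# Rank of each suffix; the empty string stands for "no suffix".
SUFFIX_RANK = {"alpha": 0, "beta": 1, "pre": 2, "rc": 3, "": 4, "p": 5}

EBUILD_RE = re.compile(
    r"(?P<pkg>.+)-(?P<ver>\d+(?:\.\d+)*)"            # package name and version
    r"(?P<suf>alpha|beta|pre|rc|p)?(?P<sufnum>\d*)"  # optional suffix and number
    r"(?:-r(?P<rev>\d+))?\.ebuild"                   # optional revision
)


def parse_ebuild_name(filename):
    """Split an ebuild file name into its components."""
    m = EBUILD_RE.fullmatch(filename)
    if m is None:
        raise ValueError(f"not an ebuild name: {filename!r}")
    return {
        "pkg": m["pkg"],
        "ver": tuple(int(x) for x in m["ver"].split(".")),
        "suf": m["suf"] or "",
        "sufnum": int(m["sufnum"] or 0),
        "rev": int(m["rev"] or 0),
    }


def sort_key(filename):
    """Key that orders ebuilds of one package from oldest to newest."""
    p = parse_ebuild_name(filename)
    return (p["ver"], SUFFIX_RANK[p["suf"]], p["sufnum"], p["rev"])
```

Sorting a list of file names with key=sort_key places, for example, a 1.0rc1 ebuild before the plain 1.0 ebuild and a 1.0p1 ebuild after it.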
When you install a Gentoo system, you need to define “USE” flags. USE variables are used to tell portage:
E.g. if you don’t put the kde keyword in your USE flags, packages that have optional KDE support will be compiled without it, and packages that have an optional KDE dependency will be installed without installing the KDE libraries (as dependencies). The default USE setting is defined in /etc/make.profile/make.defaults:
USE="oss apm arts avi berkdb bitmap-fonts crypt cups encode fortran f77 font-server foomaticdb gdbm gif gpm gtk gtk2 imlib jpeg kde gnome libg++ libwww mad mikmod motif mpeg ncurses nls oggvorbis opengl pam pdflib png python qt quicktime readline sdl spell ssl svga tcpd truetype truetype-fonts type1-fonts X xml2 xmms xv zlib"
You can add your own flags in /etc/make.conf, for example: USE="-kde -qt msn yahoo jabber". You can also declare USE-flags for individual packages (rather than system-wide), or for a single installation only. The available USE-flags are listed in /usr/portage/profiles/use.desc:
gtk - Adds support for x11-libs/gtk+ (The GIMP Toolkit)
gtk2 - Use gtk+-2.0.0 over gtk+-1.2 in cases where a program supports both.
gtkhtml - Adds support for gnome-extra/gtkhtml
imap - Adds support for IMAP
...
You can also use local USE-flags:
app-editors/emacs:multi-tty - Add multi-tty support
app-editors/emacs:nosendmail - If you do not want to install any MTA
Some packages do not only listen to USE-flags, but also provide USE-flags. When you install such a package, the USE-flags they provide are added to your USE setting (for example: kde provided by kde-base/kdebase).
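How a user setting such as USE="-kde -qt msn yahoo jabber" combines with the profile defaults can be sketched as follows. The Python function below applies USE strings from left to right, with a leading minus sign removing a flag; this left-to-right rule and the shortened default string are assumptions of the sketch, not a description of Portage's actual configuration stacking.

```python
def merge_use(*settings):
    """Apply USE strings in order; 'flag' enables a flag, '-flag' disables it."""
    enabled = set()
    for setting in settings:
        for token in setting.split():
            if token.startswith("-"):
                enabled.discard(token[1:])
            else:
                enabled.add(token)
    return enabled


defaults = "X gtk kde qt ssl"             # shortened stand-in for make.defaults
make_conf = "-kde -qt msn yahoo jabber"   # the make.conf example from the text
use = merge_use(defaults, make_conf)
```

The resulting set no longer contains kde or qt, but keeps the remaining defaults and adds the three new flags.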
Dependencies between packages are described in ebuilds in the variables DEPEND and RDEPEND. DEPEND tells Portage about which packages are needed to build the package. The RDEPEND variable specifies which packages are needed for the package to run.
In dependencies, you write Gentoo package names:
RDEPEND="sys-libs/ncurses sys-libs/gdbm"
meaning that any version of these packages will fit.
You may also specify a version number, for example:
RDEPEND=">=media-libs/giflib-4.1.0.1b =media-libs/jpeg-6b-r2 ~sys-apps/qux-1.0 =sys-apps/foo-1.2* !sys-libs/gdbm"
which means that you need a version of giflib newer than or equal to 4.1.0.1b and exactly jpeg-6b-r2 (the operators <, > and <= are also available). Furthermore:
~sys-apps/qux-1.0 will select the newest Portage revision of qux-1.0.
=sys-apps/foo-1.2* will select the newest member of the 1.2 series, but will ignore 1.3 and later/earlier series. That is, foo-1.2.3 and foo-1.2.0 are both valid, while foo-1.3.3, foo-1.3.0, and foo-1.1.0 are not.
!sys-libs/gdbm will prevent this package from being emerged while gdbm is already emerged.
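The version operators can be summarised as a predicate that decides whether an installed package is acceptable for a given dependency. The Python sketch below covers a bare name, the comparison operators, ~ and the =...* form; it assumes purely numeric, dot-separated versions with an optional -rN revision (so a version such as 4.1.0.1b is outside its scope), it does not handle the ! blocker, and it only tests acceptability rather than choosing the newest candidate. All function names are illustrative.

```python
import re


def split_cpv(cpv):
    """Split 'category/name-1.2.3-r4' into (name, version tuple, revision)."""
    m = re.fullmatch(r"(?P<name>.+?)-(?P<ver>\d+(?:\.\d+)*)(?:-r(?P<rev>\d+))?", cpv)
    if m is None:
        raise ValueError(f"unsupported package version: {cpv!r}")
    ver = tuple(int(x) for x in m["ver"].split("."))
    return m["name"], ver, int(m["rev"] or 0)


def matches(atom, installed):
    """True if the installed 'category/name-version' satisfies `atom`."""
    name, ver, rev = split_cpv(installed)
    op_match = re.match(r">=|<=|<|>|=|~", atom)
    if op_match is None:
        return name == atom                      # bare name: any version fits
    op = op_match.group()
    target = atom[len(op):]
    is_glob = target.endswith("*")
    want_name, want_ver, want_rev = split_cpv(target.rstrip("*"))
    if name != want_name:
        return False
    if is_glob:                                  # =name-1.2* : the 1.2 series
        return ver[:len(want_ver)] == want_ver
    if op == "~":                                # any revision of this version
        return ver == want_ver
    have, want = (ver, rev), (want_ver, want_rev)
    return {"=": have == want, ">=": have >= want, "<=": have <= want,
            ">": have > want, "<": have < want}[op]
```

For example, an installed sys-apps/foo-1.2.3 matches =sys-apps/foo-1.2*, whereas sys-apps/foo-1.3.0 does not.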
As you can see in the example, Portage supports conditional dependencies. For example, X? means that the parenthesized group that follows is part of the dependencies only if X is in the USE flags; !arm? means that the group that follows is part of the dependencies only if arm is not in the USE flags.
A package can also depend on any one of several alternative packages. Examples:
DEPEND="|| ( app-games/unreal-tournament app-games/unreal-tournament-goty )"
DEPEND="|| ( sdl? ( media-libs/libsdl )
	svga? ( media-libs/svgalib )
	opengl? ( virtual/opengl )
	ggi? ( media-libs/libggi )
	virtual/x11 )"
In the last example, one of the packages will be chosen, and the order of preference is determined by the order in which they appear.
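Conditional groups and || groups can be evaluated together with a short recursive routine. The Python sketch below flattens a dependency string into the list of packages required for a given set of USE flags, and for a || group it simply takes the first alternative that yields anything. That first-match rule is a simplification based on the order-of-preference remark above; the function names are illustrative and this is not Portage's resolver.

```python
def read_group(tokens, start):
    """Return the tokens inside the group opening at `start`, and the index after it."""
    if tokens[start] != "(":
        raise ValueError("expected '('")
    depth = 0
    for end in range(start, len(tokens)):
        if tokens[end] == "(":
            depth += 1
        elif tokens[end] == ")":
            depth -= 1
            if depth == 0:
                return tokens[start + 1:end], end + 1
    raise ValueError("unbalanced parentheses")


def resolve(tokens, use):
    """Flatten dependency tokens into the packages required under `use`."""
    required, i = [], 0
    while i < len(tokens):
        tok = tokens[i]
        if tok.endswith("?"):                    # flag? ( ... ) or !flag? ( ... )
            group, i = read_group(tokens, i + 1)
            flag = tok[:-1]
            active = flag[1:] not in use if flag.startswith("!") else flag in use
            if active:
                required += resolve(group, use)
        elif tok == "||":                        # || ( a b c ): first usable one
            group, i = read_group(tokens, i + 1)
            j = 0
            while j < len(group):
                k = j + 1                        # one alternative spans group[j:k]
                if group[j].endswith("?") or group[j] == "||":
                    _, k = read_group(group, j + 1)
                choice = resolve(group[j:k], use)
                if choice:
                    required += choice
                    break
                j = k
        else:
            required.append(tok)                 # plain package atom
            i += 1
    return required


any_of = ("|| ( sdl? ( media-libs/libsdl ) svga? ( media-libs/svgalib ) "
          "opengl? ( virtual/opengl ) ggi? ( media-libs/libggi ) virtual/x11 )")
```

With svga and opengl both enabled, the routine picks media-libs/svgalib because it appears first; with no relevant flag enabled it falls back to virtual/x11.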
A package can provide a virtual package, so that other packages can depend on it. This is useful for example when a package depends on a system logger or a mail transport agent, but not on a particular one.
The variable PDEPEND contains a list of all packages that will have to be installed after the program has been compiled.