the magic of localepurge

Localepurge is a nice little package that helps you save a bit of disk space by removing unused locales. And surely I’m not the first one to write about it…

abate@zed.fr:~$sudo aptitude install localepurge
The following NEW packages will be installed:
  localepurge 
0 packages upgraded, 1 newly installed, 0 to remove and 103 not upgraded.
Need to get 44.9 kB of archives. After unpacking 172 kB will be used.
Get:1 http://ftp.debian.org/debian/ unstable/main localepurge all 0.6.2+nmu2 [44.9 kB]
Fetched 44.9 kB in 0s (94.6 kB/s) 
Reading package fields... Done
Reading package status... Done
Retrieving bug reports... Done
Parsing Found/Fixed information... Done
Preconfiguring packages ...
Selecting previously deselected package localepurge.
(Reading database ... 187978 files and directories currently installed.)
Unpacking localepurge (from .../localepurge_0.6.2+nmu2_all.deb) ...
Processing triggers for man-db ...
Setting up localepurge (0.6.2+nmu2) ...

Creating config file /etc/locale.nopurge with new version

abate@zed.fr:~$sudo localepurge 
localepurge: Disk space freed in /usr/share/locale: 428676 KiB
localepurge: Disk space freed in /usr/share/man: 5332 KiB
localepurge: Disk space freed in /usr/share/gnome/help: 100468 KiB
localepurge: Disk space freed in /usr/share/omf: 3016 KiB

Total disk space freed by localepurge: 537492 KiB


Predicting Upgrade Failures Using Dependency Analysis

On April 16, at the HotSWUp workshop in Hannover, we will present joint work with Roberto Di Cosmo, prepared in the context of the Mancoosi project. Since we used Debian for our experiments, we are also very interested in feedback from the community on our method. However, keep in mind that this is still work in progress, and it should be considered research more than a proposal for a concrete application. This is part of our ongoing effort to explore different areas with the goal of providing tools and ideas to enhance the quality of FOSS distributions.

Abstract

Upgrades in component-based systems can disrupt other components. Being able to predict the possible consequences of an upgrade just by analyzing inter-component dependencies can avoid errors and downtime. In this paper we precisely identify in a repository the components p whose upgrades force a large set of other components to be upgraded. We are also able to discriminate whether all the future versions of p have the same impact, or whether there are different classes of future versions that have different impacts. We perform our analysis on Debian, one of the largest FOSS distributions.


You can find more info about this paper here.


performance tweaking - dose3

Lately I’ve been concerned about the performance of dose3. Soon we will have a package in the official Debian archive (containing the new distcheck), and we also plan to use dose3 as the foundation of an upcoming apt-get feature (external solvers!). This week I tackled a couple of problems.

First, I wanted to understand the poor performance of my parser for the Debian Packages format. The parser itself (written by J. Vouillon for dose2) is home-brewed: it uses a Str-based tokenizer and it is pretty efficient. On top of it I built the rest of the parsing infrastructure. Because of laziness (well, I followed the [http://pplab.snu.ac.kr/courses/adv_pl05/papers/p261-knuth.pdf “avoid premature optimization”] mantra) I used a lot of regular expressions from the standard library module Str to parse various chunks of the file. Since Str has the reputation of not being the fastest regexp library in the world (I know I should use Pcre), I started my journey by removing all calls to this library and substituting them with calls to the String module.

====Lesson n. 1==== If you do not need a regular expression to parse a string, you are better off using String.index, String.sub and friends instead. Your function may be a bit longer, but it will certainly be faster. Scanf.sscanf is also your friend.
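
As a minimal sketch of this lesson (not the actual dose3 code), the functions below split a "Field: value" line and read a dotted pair of integers without touching Str. The names split_field and parse_major_minor are hypothetical, and String.index_opt assumes OCaml 4.05 or later.

```ocaml
(* Split a "Key: value" line at the first ':' using only the String module.
   Returns None when the line contains no colon. *)
let split_field (line : string) : (string * string) option =
  match String.index_opt line ':' with
  | None -> None
  | Some i ->
    let key = String.sub line 0 i in
    let value = String.sub line (i + 1) (String.length line - i - 1) in
    Some (key, String.trim value)

(* Read two integers separated by a dot, e.g. "0.6", with Scanf. *)
let parse_major_minor (s : string) : int * int =
  Scanf.sscanf s "%d.%d" (fun major minor -> (major, minor))
```

Only the first colon is used as the separator, so a value that itself contains colons (an epoch in a version, for instance) is left intact.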

This was only the tip of the iceberg. Second, I noticed I used String.lowercase (I use ExtLib.String) a bit everywhere… I realized I could simply remove all these calls and have a bit more faith in the user input. If users do not respect the standard, it’s their problem, not mine.

====Lesson n. 2==== Calling a String function a zillion times slows you down considerably!

I knew there was something more to do. Following the advice of my colleagues, we decided to take a look at what was really happening under the hood. Using ocamlbuild and gprof, this is easily done.

First you need to rebuild your binary with the debug and profile tags. This can be done as a one-off from the command line:

ocamlbuild -tag debug -tag profile apt-backend.native

Then you have to run the binary as you normally do, to collect profiling information, and in the end you fire up gprof to see what’s going on.

$gprof apt-backend.native | less
Flat profile:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total           
 time   seconds   seconds    calls  ms/call  ms/call  name    
  8.52      0.79     0.79 63008857     0.00     0.00  caml_gc_set
  5.61      1.31     0.52   934076     0.00     0.00  _fini
  5.07      1.78     0.47 55818126     0.00     0.00  caml_MD5Transform
  4.53      2.20     0.42   296139     0.00     0.00  caml_gc_stat
  4.21      2.59     0.39 10910905     0.00     0.00  caml_parse_engine
  2.91      2.86     0.27                             compare_val
  2.48      3.09     0.23 11707884     0.00     0.00  caml_final_register

compare_val! This is a bad sign. It means I’m using the generic (polymorphic) comparison function instead of a monomorphic one. After a bit of head scratching I realized that in one of my data structures I was using the generic List.assoc. This function uses compare_val! Bingo.

Rewriting the assoc function lowered the number of calls to compare_val tenfold, giving me a considerable speed-up:

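(* The type annotation on n lets the compiler specialize (=) to string
   equality, so the polymorphic compare_val is not called here. *)
let rec assoc (n : string) = function
  | (k, v) :: _ when k = n -> v
  | _ :: t -> assoc n t
  | [] -> raise Not_found
;;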

====Lesson n.3==== Before using a generic function think twice !

In the same vein, I also replaced a couple of generic hash tables (keyed by integers and strings) with their monomorphic counterparts.
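
One way to get such monomorphic tables is the Hashtbl.Make functor from the standard library. The sketch below is an illustration under that assumption, not the dose3 code; the module names StringHash and IntHash are hypothetical, and String.equal assumes OCaml 4.03 or later.

```ocaml
(* Hash table specialized to string keys: equality is String.equal,
   so lookups never go through the polymorphic comparison. *)
module StringHash = Hashtbl.Make (struct
  type t = string
  let equal = String.equal
  let hash (s : string) = Hashtbl.hash s
end)

(* Hash table specialized to int keys: the annotations make (=) an
   integer comparison, and the key itself serves as the hash. *)
module IntHash = Hashtbl.Make (struct
  type t = int
  let equal (a : int) (b : int) = a = b
  let hash (i : int) = i land max_int
end)
```

The resulting modules expose the usual create, add, find and mem functions, so switching an existing table over is mostly a matter of changing the module prefix.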

During my tests I also noticed that I was spending a lot of time resizing my hash tables. In my case this was easily avoided by passing a more sensible initial size when creating the hash table. This is not always possible, because sometimes the right size depends on a value that is not known in advance.
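
The effect of the initial size can be observed with Hashtbl.stats. The following sketch uses hypothetical helper names (buckets, fill, grew_after_fill) and an arbitrary element count; it only reports whether the bucket array had to grow while the table was being filled.

```ocaml
(* Number of buckets currently allocated for the table. *)
let buckets (h : (int, unit) Hashtbl.t) : int =
  (Hashtbl.stats h).Hashtbl.num_buckets

(* Insert the keys 1..n. *)
let fill (h : (int, unit) Hashtbl.t) (n : int) : unit =
  for i = 1 to n do
    Hashtbl.replace h i ()
  done

(* True if the table had to be resized while inserting n elements
   into a table created with the given initial size. *)
let grew_after_fill (initial_size : int) (n : int) : bool =
  let h = Hashtbl.create initial_size in
  let before = buckets h in
  fill h n;
  buckets h > before
```

With a small initial size the table is resized repeatedly as it fills up, whereas a table created with roughly the expected number of elements keeps its original bucket array.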

The only thing left in the parsing function is to get rid of the last call to Str, which I use to tokenize my stream. I think writing a few lines of ocamllex would give me an additional speedup, but I’ll leave this for next week…


Since I was in the mood for hacking, I decided to understand what was wrong in a different part of dose3, that is, the translation from the Debian Packages format to propositional logic (which is then used by a SAT solver to perform various installability analyses).

What I immediately noticed looking at my code is that I had a couple of List.unique calls in a very important function. Ah! My first naive solution to this problem was to use the ExtLib List.unique function, which forces you to pass a comparison function to it. With this change I noticed a small speed-up (compare_val strikes back), but it was clearly not enough. The obvious solution was to rewrite the routine using a set (of integers in this case) and drop List.unique completely.

====Lesson n.4 ==== List.unique is slow, ExtLib.List.unique is better; if you can, use Sets.
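
A minimal sketch of the set-based approach, assuming integer elements as in the case described above. The names IntSet and unique_ints are hypothetical and this is not the dose3 routine itself.

```ocaml
(* Set of integers with a comparison specialized to int. *)
module IntSet = Set.Make (struct
  type t = int
  let compare (a : int) (b : int) = compare a b
end)

(* Remove duplicates by inserting every element into a set.
   Each insertion costs O(log n), and the result comes back
   sorted in increasing order. *)
let unique_ints (l : int list) : int list =
  IntSet.elements (List.fold_left (fun s x -> IntSet.add x s) IntSet.empty l)
```

Note that, unlike a list-based unique, this does not preserve the original order of the elements. If the rest of the code only needs membership tests or unions, it is usually simpler to keep the set and never convert back to a list.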

The last improvement is related to the SAT solver we use. It’s a very specialized and optimized SAT solver (inherited from dose2), written in OCaml. Using gprof again, I noticed that the Gc overhead was substantial enough to warrant a bit of Gc tweaking.

    Gc.set { (Gc.get()) with
      Gc.minor_heap_size = 4 * 1024 * 1024; (*4M*)
      Gc.major_heap_increment = 32 * 1024 * 1024; (*32M*)
      Gc.max_overhead = 150;
    } ;

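This corresponds to CAMLRUNPARAM=s=4M,i=32M,O=150. Note that both sizes are expressed in words, not bytes, and that it is the uppercase O that sets max_overhead (a lowercase o would set space_overhead instead).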

====Lesson n.5 ==== Gc tweaking can make the difference sometimes !
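
To check whether a given Gc setting actually reduces collector activity, one option is to compare the counters returned by Gc.quick_stat around the code of interest. This is a rough sketch with a hypothetical helper name (count_collections), not something taken from dose3.

```ocaml
(* Run f and return its result together with the number of minor and
   major collections that happened while it was running. *)
let count_collections (f : unit -> 'a) : 'a * int * int =
  let before = Gc.quick_stat () in
  let result = f () in
  let after = Gc.quick_stat () in
  ( result,
    after.Gc.minor_collections - before.Gc.minor_collections,
    after.Gc.major_collections - before.Gc.major_collections )
```

Running the same workload before and after the Gc.set call, and comparing the two pairs of counters, gives a quick indication of how much work the collector was spared.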

After all this work I was quite pleased with the result:

Before (r2454) :
abate@zed.fr:~/Projects/git-svn-repos/dose3/applications$time ./distcheck.native deb://tests/lenny.packages
background-packages: 22311
foreground-packages: 22311
broken-packages: 0

real    0m11.535s
user    0m11.409s
sys     0m0.112s
abate@zed.fr:~/Projects/git-svn-repos/dose3/applications$time ./distcheck.native deb://tests/sid.packages
background-packages: 29589
foreground-packages: 29589
broken-packages: 143

real    0m19.799s
user    0m19.621s
sys     0m0.152s

After (r2467) :
abate@zed.fr:~/Projects/git-svn-repos/dose3/applications$time ./distcheck.native deb://tests/lenny.packages
background-packages: 22311
foreground-packages: 22311
broken-packages: 0

real    0m8.738s
user    0m8.589s
sys     0m0.132s
abate@zed.fr:~/Projects/git-svn-repos/dose3/applications$time ./distcheck.native deb://tests/sid.packages
background-packages: 29589
foreground-packages: 29589
broken-packages: 143

real    0m14.026s
user    0m13.817s
sys     0m0.172s

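I shaved roughly 3 seconds off the lenny run (11.5s to 8.7s) and almost 6 seconds off the sid run (19.8s to 14.0s), that is, between a quarter and a third of my processing time. Considering that these applications are going to be called many times per day on the entire Debian archive, or thousands of times during our experiments, a few seconds here and there can save quite a bit of time.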


Xen brouter setup

A while ago I received a new desktop machine (8 cores, 8 GB of memory…) at work. Since for the moment I’m quite happy to work on my laptop using an external screen, I decided to put the hardware to good use and to explore some of the more exotic (at least for me) Xen features.

In particular, I spent half a day playing with different Xen network settings. The bridge model is the easiest one: it is the “default” network configuration and should work out of the box in most situations. To set it up, you basically need to specify a couple of options in the xend-config file and you’re done. With this method, since all the VMs’ interfaces are bridged together (surprise!) with the public interface, your network card is left in promiscuous mode (not a big problem if you ask me…). Once your VMs are up, you can then decide to use your default dhcp server, autoconfigure your VMs with ipv6 only, or do nothing, as you please.

Another popular method, albeit a bit more complex, is to set up a natted network using the network-nat script (an evolution of the third method, ‘network-route’). I played with it, but since I wanted to have all my DomUs on the same subnet, this setup wasn’t satisfying for me: by default, ‘network-nat’ assigns a different subnet to each DomU. With the natted setup you can also configure a local dhcp server to give private IPs to your VMs, all done transparently by the Xen network scripts. I’ve noticed that there is a bug in the Xen script that makes it not very squeeze friendly. Since the default dhcp server in squeeze is isc-dhcp and a few configuration files got shuffled in the process (notably /etc/dhcp3/dhcpd.conf is now /etc/dhcp/dhcpd.conf), the script needs a little fix to work properly. I’ll report this bug sometime soon…

Googling around I found a different setup called brouter, which is a hybrid between a bridged configuration and a routed configuration. This is the original (??) article, well hidden in an old SUSE wiki.

I’ve made a few modifications to add natting. Basically, all the virtual interfaces, each connected to one DomU, are linked together by a bridge (xenbr0). The bridge, with address 10.0.0.1, is also the router of the subnet. All DomUs are configured to use dhcp, which assigns them a new IP and specifies the router of the subnet. The dhcp server is configured to answer requests only on the xenbr0 interface, avoiding problems on the public network…

Routing and NAT are configured using iptables:

    iptables -t nat -A POSTROUTING -o ${netdev} -j MASQUERADE
    iptables -A FORWARD -i ${bridge} -j ACCEPT
    echo 1 >/proc/sys/net/ipv4/ip_forward
    /etc/init.d/isc-dhcp-server restart

Note that since the dhcp server is configured to give out addresses only on the virtual network, we need to restart it after creating the bridge interface, otherwise isc-dhcp-server will refuse to run. Mum says that I should configure the bridge in /etc/network/interfaces to make the dhcp server happy at startup, but I felt a bit lazy, so I left the task to Xen…

In the next episode, I’ll add ipv6 connectivity to the virtual subnet and then start playing with puppet… ipv6 is almost done, puppet… I started with the doc…

The complete script from the SUSE wiki, with my modifications, is below (only lightly tested):

#!/bin/sh
#============================================================================
# Default Xen network start/stop script.
# Xend calls a network script when it starts.
# The script name to use is defined in /etc/xen/xend-config.sxp
# in the network-script field.
#
# This script creates a bridge (default xenbr${vifnum}), gives it an IP address
# and the appropriate route. Then it starts the SuSEfirewall2 which should have
# the bridge device in the zone you want it.
#
# If all goes well, this should ensure that networking stays up.
# However, some configurations are upset by this, especially
# NFS roots. If the bridged setup does not meet your needs,
# configure a different script, for example using routing instead.
#
# Usage:
#
# vnet-brouter (start|stop|status) {VAR=VAL}*
#
# Vars:
#
# bridgeip   Holds the ip address the bridge should have in the
#            form ip/mask (default 10.6.7.1/24).
# brnet      Holds the network of the bridge (default 10.6.7.0/24).
# 
# vifnum     Virtual device number to use (default 0). Numbers >=8
#            require the netback driver to have nloopbacks set to a
#            higher value than its default of 8.
# bridge     The bridge to use (default xenbr${vifnum}).
#
# start:
# Creates the bridge
# Gives it the IP address and netmask
# Adds the routes to the routing table.
#
# stop:
# Removes all routes from the bridge
# Removes any devices on the bridge from it.
# Deletes bridge
#
# status:
# Print addresses, interfaces, routes
#
#============================================================================

#set -x

dir=$(dirname "$0")
. "$dir/xen-script-common.sh"
. "$dir/xen-network-common.sh"

findCommand "$@"
evalVariables "$@"

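vifnum=${vifnum:-0}
bridgeip=${bridgeip:-10.6.7.1/24}
brnet=${brnet:-10.6.7.0/24}
netmask=${netmask:-255.255.255.0}
bridge=${bridge:-xenbr${vifnum}}
# Assumed defaults (not in the original script): the public interface used
# for NAT, and antispoofing disabled, so that both variables are always set.
netdev=${netdev:-eth0}
antispoof=${antispoof:-no}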

##
# link_exists interface
#
# Returns 0 if the interface named exists (whether up or down), 1 otherwise.
#
link_exists()
{
    if ip link show "$1" >/dev/null 2>/dev/null
    then
        return 0
    else
        return 1
    fi
}


# Usage: create_bridge bridge
create_bridge () {
    local bridge=$1

    # Don't create the bridge if it already exists.
    if [ ! -d "/sys/class/net/${bridge}/bridge" ]; then
        brctl addbr ${bridge}
        brctl stp ${bridge} off
        brctl setfd ${bridge} 0
    fi
    ip link set ${bridge} up
}

# Usage: add_to_bridge bridge dev
add_to_bridge () {
    local bridge=$1
    local dev=$2
    # Don't add $dev to $bridge if it's already on a bridge.
    if ! brctl show | grep -wq ${dev} ; then
        brctl addif ${bridge} ${dev}
    fi
}

# Usage: show_status dev bridge
# Print interface configuration and routes.
show_status () {
    local dev=$1
    local bridge=$2

    echo '============================================================'
    ip addr show ${dev}
    ip addr show ${bridge}
    echo ' '
    brctl show ${bridge}
    echo ' '
    ip route list
    echo ' '
    route -n
    echo '============================================================'
    echo ' '
    iptables -L
    echo ' '
    iptables -L -t nat
    echo '============================================================'

}

op_start () {
    if [ "${bridge}" = "null" ] ; then
        return
    fi

    create_bridge ${bridge}

    if link_exists "$bridge"; then
        ip address add dev $bridge $bridgeip
        ip link set ${bridge} up arp on
        ip route add to $brnet dev $bridge
    fi

    if [ "${antispoof}" = 'yes' ] ; then
        antispoofing
    fi
    iptables -t nat -A POSTROUTING -o ${netdev} -j MASQUERADE
    iptables -A FORWARD -i ${bridge} -j ACCEPT
    echo 1 >/proc/sys/net/ipv4/ip_forward
    /etc/init.d/isc-dhcp-server restart
}

op_stop () {
    if [ "${bridge}" = "null" ]; then
        return
    fi
    if ! link_exists "$bridge"; then
        return
    fi

    ip route del to $brnet dev $bridge
    ip link set ${bridge} down arp off
    ip address del dev $bridge $bridgeip
    ##FIXME: disconnect the interfaces from the bridge 1st
    brctl delbr ${bridge}
    /etc/init.d/isc-dhcp-server restart
}

case "$command" in
    start)
        op_start
        ;;

    stop)
        op_stop
        ;;

    status)
        show_status ${netdev} ${bridge}
        ;;

    *)
        echo "Unknown command: $command" >&2
        echo 'Valid commands are: start, stop, status' >&2
        exit 1
esac