Lately I’ve been concerned about the performances of dose3. Soon we will have a package in the official debian archive (containing the new distcheck) and we also plan to use dose3 as foundation of an upcoming apt-get future (external solvers !). This week I tackled a couple of problems.

First I wanted to understand the poor performances of my parser for the debian Packages format. The parser itself (written by J. Voullion for dose2) is a home brewed parser, it uses a Str based tokenizer and it is pretty efficient. On the top of it I built the rest of the parsing infrastructure. Because of laziness (well, I followed the [http://pplab.snu.ac.kr/courses/adv_pl05/papers/p261-knuth.pdf “avoid premature optimization”] mantra) I used a lot of regular expressions using the standard library (Str) module to parse various chunks of the file. Since Str has the reputation of not being the fastest reg exp library in the world (I know I should use Pcre), I started my journey by removing all calls to this library and substituting the with calls to the module String.

====Lesson n. 1==== If you do not need a regular expression to parse a string, you are better off using String.index, String.sub and friends instead. Maybe your function will be a bit longer, but certainly faster. Sscanf is also your friend.

This was only the tip of the iceberg. Second I noticed I used String.lowercase (I use ExtLib.String) a bit every where… I realized I could simply remove all these calls and have a bit more faith in the user input. If the user does not respect the standard it’s his problem, not mine.

====Lesson n. 2==== Calling a String function a zillion times slow you down considerably !!!!

I knew there was something more to do. Following the advices of my colleges, we decided to take a look at what really was happening under the wood. Using ocamlbuild and gproof, this is easily done.

first you need to rebuild your binary using debug and the profiling tags. This can be done once off from the command line :

ocamlbuild -tag debug -tag profile apt-backend.native

Then you have to run the binary as you normally do, to collect profiling information, and in the end you fire up gprof to see what’s going on.

$gprof apt-backend.native | less
Flat profile:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total           
 time   seconds   seconds    calls  ms/call  ms/call  name    
  8.52      0.79     0.79 63008857     0.00     0.00  caml_gc_set
  5.61      1.31     0.52   934076     0.00     0.00  _fini
  5.07      1.78     0.47 55818126     0.00     0.00  caml_MD5Transform
  4.53      2.20     0.42   296139     0.00     0.00  caml_gc_stat
  4.21      2.59     0.39 10910905     0.00     0.00  caml_parse_engine
  2.91      2.86     0.27                             compare_val
  2.48      3.09     0.23 11707884     0.00     0.00  caml_final_register

compare_val !!! This is a bad sign. It means I’m using the generic comparing function instead of a monomorphic comparison function. After a bit of head scratching I realized that in one of my data structures I was using a generic List.assoc . This function uses compare_val ! Bingo.

rewriting the assoc function lowered the number of calls to compare_val ten-folds giving me a considerable speed-up

let rec assoc (n : string) = function
  |(k,v)::_ when k = n -> v
  |_::t -> assoc n t
  |[] -> raise Not_found
;;

====Lesson n.3==== Before using a generic function think twice !

On the same vein, I specialized also a couple of hash tables (for integers and strings) with their monomorphic counterparts.

During my tests I’ve also noticed that I was spending a lot of time resizing my hash tables. In my case this was easily avoidable using a more sensitive default when creating the hash table. This is not always the case because sometimes the default is tied to a value that in not known in advance.

The only think left in the parsing function is to get rid of the last call to Str that I use to tokenize my stream. I think writing few lines of ocamllex would give me an additional speedup, but I’ll leave this for next week…


Since was in the mood for hacking I decided to understand what was wrong in a different part of dose3, that is, the translation from debian Packages format to propositional logic (that will be then used by a SAT solver to perform various installability analysis).

What I immediately noticed looking at my code, is that I had a couple of List.unique functions called by a very important function. Ah ! My first naive solution to this problems was to use the ExtLib List.unique function that forces you to pass a comparison function with it. With this change I noticed a small speed-up (compare_val strikes back), but it was clearly not enough. The obvious solution was to rewrite the routine using a set (of integers in this case) and drop completely the List.unique.

====Lesson n.4 ==== List.unique is slow, ExtLib.List.unique is better, If you can, use Sets.

Last improvement is related to the SAT solver we use. It’s a very specialized and optimized SAT solver (inherited from dose2) and it is written in ocaml. Using again grpof I noticed that the Gc overhead was substantial enough to warrant a bit of Gc tweaking.

    Gc.set { (Gc.get()) with
      Gc.minor_heap_size = 4 * 1024 * 1024; (*4M*)
      Gc.major_heap_increment = 32 * 1024 * 1024; (*32M*)
      Gc.max_overhead = 150;
    } ;

This corresponds to CAMLRUNPARAM=s=4M,i=32M,o=150

====Lesson n.5 ==== Gc tweaking can make the difference sometimes !

After all this work I was quite pleased of the result:

Before (r2454) :
abate@zed.fr:~/Projects/git-svn-repos/dose3/applications$time
./distcheck.native deb://tests/lenny.packages
background-packages: 22311
foreground-packages: 22311
broken-packages: 0

real    0m11.535s
user    0m11.409s
sys     0m0.112s
abate@zed.fr:~/Projects/git-svn-repos/dose3/applications$time
./distcheck.native deb://tests/sid.packages
background-packages: 29589
foreground-packages: 29589
broken-packages: 143

real    0m19.799s
user    0m19.621s
sys     0m0.152s

After (r2467) :
abate@zed.fr:~/Projects/git-svn-repos/dose3/applications$time
./distcheck.native deb://tests/lenny.packages
background-packages: 22311
foreground-packages: 22311
broken-packages: 0

real    0m8.738s
user    0m8.589s
sys     0m0.132s
abate@zed.fr:~/Projects/git-svn-repos/dose3/applications$time
./distcheck.native deb://tests/sid.packages
background-packages: 29589
foreground-packages: 29589
broken-packages: 143

real    0m14.026s
user    0m13.817s
sys     0m0.172s

I shaved about 4 seconds from my processing time. Considering that these applications are going to be called many times per day on the entire debian archive or thousand or times during our experiments, 4 seconds here and there can save quite a bit of time.