Lately I’ve been concerned about the performances of
dose3. Soon we will have a
package in the official debian archive (containing the new distcheck)
and we also plan to use dose3 as foundation of an upcoming apt-get
future (external solvers !). This week I tackled a couple of problems.
First I wanted to understand the poor performances of my parser for
the debian Packages format. The parser itself (written by J. Voullion
for dose2) is a home brewed parser, it uses a Str based tokenizer
and it is pretty efficient. On the top of it I built the rest of the
parsing infrastructure. Because of laziness (well, I followed the
[http://pplab.snu.ac.kr/courses/adv_pl05/papers/p261-knuth.pdf “avoid
premature optimization”] mantra) I used a lot of regular expressions
using the standard library (Str) module to parse various chunks of
the file. Since Str has the reputation of not being the fastest
reg exp library in the world (I know I should use Pcre), I started my
journey by removing all calls to this library and substituting the
with calls to the module String.
====Lesson n. 1==== If you do not need a regular expression to parse a
string, you are better off using String.index, String.sub and friends
instead. Maybe your function will be a bit longer, but certainly
faster. Sscanf is also your friend.
This was only the tip of the iceberg. Second I noticed I used
String.lowercase (I use ExtLib.String) a bit every where… I realized
I could simply remove all these calls and have a bit more faith in the
user input. If the user does not respect the standard it’s his
problem, not mine.
====Lesson n. 2==== Calling a String function a zillion times slow you
down considerably !!!!
I knew there was something more to do. Following the advices of my
colleges, we decided to take a look at what really was happening under
the wood. Using ocamlbuild and gproof, this is easily done.
first you need to rebuild your binary using debug and the
profiling tags. This can be done once off from the command line :
ocamlbuild -tag debug -tag profile apt-backend.native
Then you have to run the binary as you normally do, to collect
profiling information, and in the end you fire up gprof to see
what’s going on.
$gprof apt-backend.native | less
Flat profile:
Each sample counts as 0.01 seconds.
% cumulative self self total
time seconds seconds calls ms/call ms/call name
8.52 0.79 0.79 63008857 0.00 0.00 caml_gc_set
5.61 1.31 0.52 934076 0.00 0.00 _fini
5.07 1.78 0.47 55818126 0.00 0.00 caml_MD5Transform
4.53 2.20 0.42 296139 0.00 0.00 caml_gc_stat
4.21 2.59 0.39 10910905 0.00 0.00 caml_parse_engine
2.91 2.86 0.27 compare_val
2.48 3.09 0.23 11707884 0.00 0.00 caml_final_register
compare_val !!! This is a bad sign. It means I’m using the generic
comparing function instead of a monomorphic comparison function. After
a bit of head scratching I realized that in one of my data structures
I was using a generic List.assoc . This function uses compare_val
! Bingo.
rewriting the assoc function lowered the number of calls to
compare_val ten-folds giving me a considerable speed-up
let rec assoc (n : string) = function
|(k,v)::_ when k = n -> v
|_::t -> assoc n t
|[] -> raise Not_found
;;
====Lesson n.3==== Before using a generic function think twice !
On the same vein, I specialized also a couple of hash tables (for
integers and strings) with their monomorphic counterparts.
During my tests I’ve also noticed that I was spending a lot of time
resizing my hash tables. In my case this was easily avoidable using a
more sensitive default when creating the hash table. This is not
always the case because sometimes the default is tied to a value that
in not known in advance.
The only think left in the parsing function is to get rid of the last
call to Str that I use to tokenize my stream. I think writing few
lines of ocamllex would give me an additional speedup, but I’ll leave
this for next week…
Since was in the mood for hacking I decided to understand what was
wrong in a different part of dose3, that is, the translation from
debian Packages format to propositional logic (that will be then used
by a SAT solver to perform various installability analysis).
What I immediately noticed looking at my code, is that I had a couple
of List.unique functions called by a very important function. Ah !
My first naive solution to this problems was to use the ExtLib
List.unique function that forces you to pass a comparison function
with it. With this change I noticed a small speed-up (compare_val
strikes back), but it was clearly not enough. The obvious solution was
to rewrite the routine using a set (of integers in this case) and drop
completely the List.unique.
====Lesson n.4 ==== List.unique is slow, ExtLib.List.unique is better,
If you can, use Sets.
Last improvement is related to the SAT solver we use. It’s a very
specialized and optimized SAT solver (inherited from dose2) and it is
written in ocaml. Using again grpof I noticed that the Gc overhead was
substantial enough to warrant a bit of Gc tweaking.
Gc.set { (Gc.get()) with
Gc.minor_heap_size = 4 * 1024 * 1024; (*4M*)
Gc.major_heap_increment = 32 * 1024 * 1024; (*32M*)
Gc.max_overhead = 150;
} ;
This corresponds to CAMLRUNPARAM=s=4M,i=32M,o=150
====Lesson n.5 ==== Gc tweaking can make the difference sometimes !
After all this work I was quite pleased of the result:
Before (r2454) :
abate@zed.fr:~/Projects/git-svn-repos/dose3/applications$time
./distcheck.native deb://tests/lenny.packages
background-packages: 22311
foreground-packages: 22311
broken-packages: 0
real 0m11.535s
user 0m11.409s
sys 0m0.112s
abate@zed.fr:~/Projects/git-svn-repos/dose3/applications$time
./distcheck.native deb://tests/sid.packages
background-packages: 29589
foreground-packages: 29589
broken-packages: 143
real 0m19.799s
user 0m19.621s
sys 0m0.152s
After (r2467) :
abate@zed.fr:~/Projects/git-svn-repos/dose3/applications$time
./distcheck.native deb://tests/lenny.packages
background-packages: 22311
foreground-packages: 22311
broken-packages: 0
real 0m8.738s
user 0m8.589s
sys 0m0.132s
abate@zed.fr:~/Projects/git-svn-repos/dose3/applications$time
./distcheck.native deb://tests/sid.packages
background-packages: 29589
foreground-packages: 29589
broken-packages: 143
real 0m14.026s
user 0m13.817s
sys 0m0.172s
I shaved about 4 seconds from my processing time. Considering that
these applications are going to be called many times per day on the
entire debian archive or thousand or times during our
experiments,
4 seconds here and there can save quite a bit of time.