Often, when I change something in my project, I wonder whether my modifications had any impact on other modules. On the one hand there is correctness: if I change something, I want to be confident that I didn’t break anything in a dependent part of the code. To alleviate this problem, I run a battery of unit tests before committing my changes. This does not prove that I didn’t break anything, but it gives me at least some confidence that if I did break something, it is not something I had already thought of… As an OCaml library I use the excellent oUnit library.
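For illustration, a minimal oUnit test battery looks roughly like this (this is only a sketch of the classic OUnit v1 API; `my_add` is a hypothetical function under test, not part of any module described here):

```ocaml
(* A minimal oUnit suite: each test is a labelled thunk of assertions. *)
open OUnit

(* hypothetical function under test *)
let my_add x y = x + y

let suite =
  "arithmetic" >::: [
    "add_positive" >:: (fun () -> assert_equal 4 (my_add 2 2));
    "add_negative" >:: (fun () -> assert_equal (-1) (my_add 1 (-2)));
  ]

(* run the whole suite and print the results *)
let () = ignore (run_test_tt suite)
```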

Another nagging question is about performance. This is more difficult to test. To be sure I didn’t degrade the performance of my code, I need access to performance-related information from past versions of my code. If you use an SCM such as git to manage your code, there are facilities to run this kind of test (which of course requires some heavy scripting if you want to check all your functions). Otherwise you are a bit left to your own devices…

Starting from the Benchmark module, I cooked up a companion module ExtBenchmark to take care of time regression testing for you.

This is the .mli file, which I hope is somewhat readable…

(* a time-stamped collection of samples *)
type benchmark

(* create a time-stamped collection of samples *)
val make_benchmark : Benchmark.samples -> benchmark

(** parse a string representing a value of type Benchmark.t
    Benchmark.to_string ~fdigits:6 (Benchmark.make (Int64.of_int 4))
    46.194004 WALL (45.626852 usr + 0.144009 sys = 45.770861 CPU) @ 0.087392/s (n=4)
 *)
val parse_test : string -> Benchmark.t

(** parse a string representing a sample of the form :
    func_name : 46.194004 WALL (45.626852 usr + 0.144009 sys = 45.770861 CPU) @ 0.087392/s (n=4)
  *)
val parse_sample : string -> string * Benchmark.t list

(** parse a benchmark file of the form :
Ex. :
date 1283531842
fname1 : 43.240758 WALL (43.222701 usr + 0.012001 sys = 43.234702 CPU) @ 0.092518/s (n=4)
fname2 : 46.194004 WALL (45.626852 usr + 0.144009 sys = 45.770861 CPU) @ 0.087392/s (n=4)
fname3 : 43.600401 WALL (43.358710 usr + 0.028002 sys = 43.386712 CPU) @ 0.092194/s (n=4)
 *)
val parse_benchmark : string -> benchmark

(** save a benchmark *)
val save_benchmark : ?dirname:string -> benchmark -> unit

(** parse all benchmarks in the benchmarks directory (.benchmarks by default) *)
val parse_benchmarks : ?days:int -> ?dirname:string -> unit -> benchmark list

(** pretty print a [benchmark] *)
val pp_benchmark : Format.formatter -> benchmark -> unit

(** pretty print a table *)
val pp_benchmarks : Format.formatter -> benchmark list -> unit
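To give a flavour of the implementation, here is a hedged sketch of how a `parse_test`-like function could be written with `Scanf`. The record below only mirrors the shape of `Benchmark.t`; the field names are my own assumptions, not the real module's:

```ocaml
(* Sketch: parse one "WALL (usr + sys = CPU) @ rate/s (n=..)" line.
   [t] mimics the shape of [Benchmark.t]; the field names are assumptions. *)
type t = { wall : float; usr : float; sys : float; rate : float; iters : int }

let parse_test (s : string) : t =
  (* "@@" matches a literal '@' in a Scanf format string *)
  Scanf.sscanf s " %f WALL (%f usr + %f sys = %f CPU) @@ %f/s (n=%d)"
    (fun wall usr sys _cpu rate iters -> { wall; usr; sys; rate; iters })
```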

The idea behind the module is pretty simple. Every time I change my code, I run my battery of tests and save the results in a file on disk with the timestamp, the id of the machine and the results. The next time, these saved results are compared with the new ones to check whether our modifications had an impact on some part of the code. I give a small example below, borrowing a bit of code from the examples of the benchmark module.
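The save step can be sketched as follows. This is only an illustration under my own assumptions: the file naming and the exact line layout may differ from what ExtBenchmark actually does, and a real implementation would take the timestamp from Unix.time rather than as a parameter:

```ocaml
(* Sketch: write one results file per run, named after the timestamp,
   with a "date" header followed by one line per benchmarked function,
   mimicking the file format shown in the .mli above.
   Note: Sys.mkdir is available since OCaml 4.12. *)
let save_benchmark_sketch ?(dirname = ".benchmarks") ~stamp results =
  if not (Sys.file_exists dirname) then Sys.mkdir dirname 0o755;
  let oc = open_out (Filename.concat dirname (string_of_int stamp)) in
  Printf.fprintf oc "date %d\n" stamp;
  List.iter
    (fun (fname, line) -> Printf.fprintf oc "%s : %s\n" fname line)
    results;
  close_out oc
```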

First we declare three functions that we are going to test. Then we run the tests in the function run (), and in the main function we actually use the ExtBenchmark module: we execute all benchmarks to obtain a test sample, then save the sample on disk in a time-stamped file. In the second part, we load all the samples and print a comparison table. The printing function shows whether the running time of a function increased with respect to the lowest recorded running time of the same function; it can also print samples that contain different functions, making it easy to add tests along the way.

let rec_loop (a : float array) =
  let rec loop i =
    if i < Array.length a then begin
      a.(i) <- a.(i) +. 1.;
      loop (i + 1)
    end in
  loop 0

let rec_loop2 (a : float array) =
  let len = Array.length a in
  let rec loop i =
    if i < len then begin
      a.(i) <- a.(i) +. 1.;
      loop (i + 1)
    end in
  loop 0

let for_loop (a : float array) =
  for i = 0 to Array.length a - 1 do
    a.(i) <- a.(i) +. 1.
  done

let run () =
  let a = Array.make 10000000 1. in
  Benchmark.latencyN (Int64.of_int 100) [
    ("rec_loop",rec_loop,a);
    ("rec_loop2",rec_loop2,a);
    ("for_loop",for_loop,a)
  ]
;;

let main () =
  (* execute all benchmarks *)
  let b = ExtBenchmark.make_benchmark (run ()) in
  (* save a timestamped representation of the current run on disk *)
  ExtBenchmark.save_benchmark b;
  (* read all benchmarks file from disk *)
  let l = ExtBenchmark.parse_benchmarks () in
  (* display a table with the current result *)
  Format.printf "%a@." ExtBenchmark.pp_benchmarks l
;;

main ();;

Notice that I ran the program twice in order to generate two traces. By default the results are saved in the .benchmarks directory in a simple textual format. To make it a bit more reliable I will also add a machine id in the future, so as to avoid mixing benchmarks that were run on different hosts. Moreover, time regressions of more than 0.001 seconds are marked with an asterisk in the table, so as to pinpoint possible problems. I’m aware that these results must be taken with a grain of salt: a function must be run with many repetitions to avoid false positives. Anyway, I think this is a good starting point for enhancing the benchmark module.

$./loops.native 
Latencies for 100 iterations of "rec_loop", "rec_loop2", "for_loop":
 rec_loop:  6.62 WALL ( 6.51 usr +  0.02 sys =  6.53 CPU) @ 15.31/s (n=100)
rec_loop2:  5.99 WALL ( 5.93 usr +  0.01 sys =  5.94 CPU) @ 16.83/s (n=100)
 for_loop:  5.44 WALL ( 5.42 usr +  0.00 sys =  5.42 CPU) @ 18.44/s (n=100)

Date            for_loop rec_loop rec_loop2 
06/9/2010-10:27 0.054    0.065    0.059

$./loops.native 
Latencies for 100 iterations of "rec_loop", "rec_loop2", "for_loop":
 rec_loop:  6.29 WALL ( 6.21 usr +  0.00 sys =  6.22 CPU) @ 16.09/s (n=100)
rec_loop2:  5.90 WALL ( 5.78 usr +  0.01 sys =  5.79 CPU) @ 17.28/s (n=100)
 for_loop:  5.40 WALL ( 5.33 usr +  0.00 sys =  5.33 CPU) @ 18.77/s (n=100)

Date            for_loop rec_loop rec_loop2 
06/9/2010-10:28 0.053    0.062    0.058
06/9/2010-10:27 0.054    0.065    0.059
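The asterisk marking can be sketched as a small helper that compares the current timing of a function against the best recorded timing and flags regressions above the threshold. The function name and formatting below are mine; only the 0.001-second threshold comes from the description above:

```ocaml
(* Flag a timing with '*' when it exceeds the best recorded timing by
   more than [threshold] seconds (0.001 by default, as described above). *)
let mark_regression ?(threshold = 0.001) ~best current =
  if current -. best > threshold
  then Printf.sprintf "%.3f*" current
  else Printf.sprintf "%.3f" current
```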

The code is available as part of the dose3 framework I’m writing for the mancoosi project. You can download it here. If there is interest, I might ask to merge it with the benchmark project. At the moment the module has no dependencies on the rest of the code. Enjoy!