important things in life

Linux for the masses… so what is important for the average user? Well, I’ve realized I managed to use only Linux since ’94 because I’m the antithesis of the average user. So I asked a friend what is important to her. She told me:

  • I want to listen to my music
  • I want to download and organize my photos
  • I want to sync my data with my iphone (sic!)
  • I want to browse the net
  • I want to write a document from time to time and be able to print it at work.

Easy? Not really :( and I’m a bit disgruntled about it. Let’s go in order. Here I’m using Ubuntu 10.04.

Music

I started with Rhythmbox, the default music player in Ubuntu, which I’ve been using for years. While configuring it on a new collection of music, the very first bug I encountered was [bug 537272](https://bugs.launchpad.net/rhythmbox/+bug/537272). There is no way to get out of this loop. And if you use Rhythmbox on a collection that comes straight from Windows Media Player, you can imagine the frustration, since WMA is the default encoding there. To solve the problem I converted all the WMA files to Ogg overnight with ffmpeg. This fixed the import problem, but it should be easier…

So far, so good, but now she wants to sync these songs with her iPhone. Well, Rhythmbox does not allow you to do that, or maybe it does, but certainly not out of the box, and not in a way that I can explain to Joe User. The iPhone shows up in the menu, but there is no way to copy music onto it (it’s an iPhone with version 4.0 of the firmware, fresh from an Apple store). I’ve looked everywhere. It seems to be a very common problem, and syncing indeed works with older versions of the firmware, but no luck with the latest “updates”.

On top of that, the interface has various shortcomings:

  • searching: Rhythmbox does not seem able to search song titles, filenames and metadata at the same time and return one list of matching songs
  • jump to the current song: I had to install a plugin even for that
  • lyrics: right click, Properties, Lyrics… could it be any less intuitive? Other players do this better
  • looks: it’s just not pretty (I have a friend who chose a laptop model only because it had a white keyboard… so you can imagine that for some people these things are important)

So I tried Exaile. It’s a more usable player, with a more “human” interface and many features that make Joe User feel at home. But it crashes far too often, and it does not even see the iPhone.

Then I tried the Listen music player, which is pretty nice: written in Python and pretty stable. Alas, no iPhone support either. But it’s a much better player than Rhythmbox, and I’ve settled on it for the moment.

I’ve also tried Gtkpod, but it too seems to work only with older versions of the firmware.

Last, I tried Amarok (on a GNOME system, but I hope the dependency system can install it in any case). I hoped this one would work, since everybody on the net says it is the most mature player out there… For me it failed to import the music library, and out of frustration I gave up before even trying the iPhone support…

Photos

I’ve been using F-Spot for years. It gets better and better, but its default behavior of copying and reorganizing the photo collection by date is completely against the mindset of most people. Why is this the default? Let people live their lives as they want. The more you try to bend them toward a different schema, the more you risk losing them in the process. F-Spot has a number of shortcomings and sometimes it crashes, but all in all it’s pretty usable. And it is also able to read and index photos from the iPhone. Yeeee :) Something that baffled me is the time it takes to export photos to a folder: copying 1000 photos to a USB pen took 2 hours!!! while copying the photos directly from Nautilus takes only 30 minutes… Not cool.

F-Spot is also terribly slow with its slide show. It does not allow you to rotate photos during the slide show, and sometimes it stalls for up to 5 seconds to redraw a new photo…

I’ve also tried Shotwell, the upcoming default Ubuntu photo manager… It failed to import a large part of my photos from F-Spot on the first run. I kind of gave up, but I have high hopes…

Other

I have to say that we often tend to focus on negative aspects. On Ubuntu the Chrome browser runs very well, delivering a more than acceptable user experience. OpenOffice also works nicely, both with old Word documents and for editing new ones. Thumbs up here, nothing to say.

Conclusion

When I started writing this article I had a very negative picture. On second thought, things are not that bad after all. The main problem is, not surprisingly, proprietary formats: from WMA files to the f***ing iPhone, they are all playing against FOSS applications, and they have the upper hand all the time. We can try to catch up, but it’s difficult, and trying to explain this to the average user is difficult too. They say that simple things in life should be simple. While this is true in general, what they fail to understand is that things that are simple for a Windows or Apple user are extremely difficult in the FOSS world, because of the lack of open standards and the commercial practices that force this state of affairs. I use an Openmoko as my daily phone and I never had any of these problems :)


command line parsing in python

Date Tags python

If a good API is meant to let you write code more concisely, I would say argparse achieves this goal perfectly. For example, with getopt you usually write something like:

import getopt
import sys

try:
    opts, cmdline = getopt.getopt(sys.argv[1:], "ho:v", ["help", "output="])
except getopt.GetoptError as err:
    print(err)
    usage()
    sys.exit(1)
output = None
verbose = False
for o, a in opts:
    if o == "-v":
        verbose = True
    elif o in ("-h", "--help"):
        usage()
        sys.exit()
    elif o in ("-o", "--output"):
        output = a
    else:
        assert False, "unhandled option"
if not cmdline:
    usage()
    sys.exit(1)

And this is just to handle three options (help, verbose, output), without even taking care of positional arguments… Now, the magic of argparse:

import argparse

parser = argparse.ArgumentParser(description='description of your program')
parser.add_argument('-v', '--verbose')
parser.add_argument('timestamp', type=int, nargs=1, help="a unix timestamp")
parser.add_argument('inputfile', type=str, nargs=1, help="input file")
args = parser.parse_args()
parser.print_help()

tadaaaaa:

usage: import.py [-h] [-v VERBOSE] timestamp inputfile

description of your program

positional arguments:
  timestamp             a unix timestamp
  inputfile             input file

optional arguments:
  -h, --help            show this help message and exit
  -v VERBOSE, --verbose VERBOSE

And it has all the bells and whistles you want, like checking that you pass both positional arguments, that the file exists, that the timestamp really is an integer, etc. Very nice indeed!
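As a quick illustration (the flag semantics and file names below are my own choices, not from the original example), the built-in type machinery already covers those checks: type=int rejects a non-integer timestamp, and argparse.FileType refuses a file that cannot be opened:

```python
import argparse
import tempfile

# create a throwaway input file so the example is self-contained
tmp = tempfile.NamedTemporaryFile(mode='w', suffix='.txt', delete=False)
tmp.write("hello")
tmp.close()

parser = argparse.ArgumentParser(description='description of your program')
parser.add_argument('-v', '--verbose', action='store_true')
parser.add_argument('timestamp', type=int,
                    help="a unix timestamp; rejected if not an integer")
parser.add_argument('inputfile', type=argparse.FileType('r'),
                    help="input file; rejected if it cannot be opened")

# parse_args() exits with a clear error message on a bad timestamp,
# an unreadable file, or a missing positional argument
args = parser.parse_args(['-v', '1283531842', tmp.name])
print(args.verbose, args.timestamp)  # → True 1283531842
```

Note that with action='store_true' the -v flag no longer eats a value, which is probably what you want for a verbose switch.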


welcome to the brave new world of git-svn

Date Tags git / svn

I’m definitely fed up with the lack of features of svn. I always end up making backup copies of files, committing the wrong patch set, being unable to cherry-pick what to commit, not to mention branching, merging and other merry activities. So today I overcame my laziness and set up a git-svn repository for dose3 (since it’s all there, go ahead, and be happy!).

The concept of git svn is pretty easy: you work with git, and from time to time you commit to svn. Best of both worlds, and people won’t even notice that you are not using their pet SCM, apart from the happy look on your face :)

First, let’s clone the svn repo:

git svn clone https://gforge.info.ucl.ac.be/svn/mancoosi/trunk/dose3

This will probably hang if you have big files in the svn repo. I don’t think it’s a problem with svn itself, but more likely a problem with the forge at ucl.ac.be. Anyway, if this happens, just go into the dose3 directory and finish up with a git svn fetch.

Now it’s time to sync all your changes and tests from your old svn tree. I chose to rsync everything over:

rsync -avz --exclude=.svn --exclude=_build ../mancoosi-public/trunk/dose3/ .

Of course we want to exclude all the compilation leftovers and the .svn directories…

Ahhhhhhh freedom. Now I can edit, stash away changes, commit, uncommit, rebase, cherry-pick… I feel home.

At the end of the day, I check whether somebody has updated the svn repo with git svn rebase, maybe resolve a couple of conflicts, and then commit all my work with git svn dcommit.

I’ve been working with git svn for only a couple of days now, but it feels pretty stable and trustworthy.


list unique

Date Tags ocaml

Well… it seems that Planet OCaml is faster than light at indexing new pages… I often start editing a story and publish it a few days later, to avoid stupid mistakes. This time I published the page by accident, and I think it stayed up for less than two minutes… This is the final version…

Today I did a small audit of my code to check which functions are used often and can slow it down. One in particular caught my attention: ExtLib.List.unique. This function (below) takes time quadratic in the length of the input list; on big lists it is a killer. The algorithm is very simple. Since it accepts an optional cmp argument, it can be made much faster by passing a monomorphic comparison function.

let rec unique ?(cmp = ( = )) l =
        let rec loop dst = function
                | [] -> ()
                | h :: t ->
                        match exists (cmp h) t with
                        | true -> loop dst t
                        | false ->
                                let r = { hd =  h; tl = [] }  in
                                dst.tl <- inj r;
                                loop r t
        in
        let dummy = dummy_node() in
        loop dummy l;
        dummy.tl

This is a small benchmark (benchmarking is addictive indeed!) using the polymorphic and monomorphic variants of List.unique, plus the hash-based hash_unique defined further below:

open ExtLib ;;

Random.self_init ();;

let list_unique l = List.unique l ;;

let list_unique_mono l =
  let cmp (x : string) (y : string) = x = y in
  List.unique ~cmp l
;;

let run () =
  let rec gen_strings acc = function
    |0 -> acc
    |n -> gen_strings ((string_of_int(Random.int 100))::acc) (n-1)
  in
  let a = gen_strings [] 100000 in
  Benchmark.latencyN (Int64.of_int 10) [
    ("list_unique",list_unique,a);
    ("list_unique_mono",list_unique_mono,a);
    ("hash_unique",hash_unique,a);
  ]
;;
run ();;

However if you want to go even faster, you can use a stupid implementation based on hash tables.

let hash_unique l =
  let h = Hashtbl.create (List.length l) in
  let add n =
    if not(Hashtbl.mem h n) then
      Hashtbl.add h n ()
  in
  List.iter add l;
  Hashtbl.fold (fun k _ acc -> k::acc) h []

The results are quite clear…

Latencies for 10 iterations of "list_unique", "list_unique_mono", "hash_unique":
     list_unique:  5.34 WALL ( 5.24 usr +  0.06 sys =  5.30 CPU) @  1.89/s (n=10)
     list_unique_mono:  1.41 WALL ( 1.31 usr +  0.08 sys =  1.39 CPU) @  7.18/s (n=10)
     hash_unique:  0.21 WALL ( 0.18 usr +  0.03 sys =  0.21 CPU) @ 48.07/s (n=10)

In this test, with a long list and many repetitions, the difference is remarkable. The function hash_unique is not stable, but if you don’t care about ordering, it does the job pretty well. If you want an even faster list-based implementation that is also not stable, you can write a small function that removes duplicates from a sorted list.
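To make the complexity argument concrete for readers who don’t speak OCaml, here is the same idea sketched in Python (my own toy code, not a translation of the ExtLib internals). The quadratic version searches the rest of the list for every element and keeps the last occurrence, like List.unique does, while the hash-based version keeps the first occurrence in expected linear time:

```python
def unique_quadratic(lst):
    # keep the last occurrence of each element, like ExtLib's List.unique:
    # O(n^2), because each element is searched for in the rest of the list
    out = []
    for i, x in enumerate(lst):
        if x not in lst[i + 1:]:
            out.append(x)
    return out

def unique_hashed(lst):
    # O(n) expected time with a hash set; keeps the first occurrence,
    # so the element order can differ from the quadratic version
    seen = set()
    out = []
    for x in lst:
        if x not in seen:
            seen.add(x)
            out.append(x)
    return out

print(unique_quadratic([1, 2, 1, 3, 2]))  # → [1, 3, 2]
print(unique_hashed([1, 2, 1, 3, 2]))     # → [1, 2, 3]
```

As with hash_unique above, the two variants return the same elements but possibly in a different order, so they agree only up to reordering.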


Time regression testing

Often, when I change something in my project, I wonder whether my modifications had any impact on other modules. On the one hand there is correctness: if I change something, I want to be confident I didn’t break anything in a dependent part of the code. To alleviate this problem I use a battery of unit tests that I run before committing my changes. This does not prove that I didn’t break anything, but it gives me some confidence that if I did break something, it’s not something I had thought of before… As an OCaml library I use the excellent OUnit.

Another haunting question is about performance. This is more difficult to test: to be sure I didn’t degrade the performance of my code, I need access to performance-related information about my code at some point in the past. If you use an SCM like git to manage your code, there are facilities to run this kind of test (which of course requires some heavy scripting if you want to check all your functions). Otherwise you are a bit left to your own destiny…

Starting from the Benchmark module, I cooked up a companion module ExtBenchmark to take care of time regression testing for you.

This is the .mli file, which I hope is somewhat readable…

(* a time-stamped collection of samples *)
type benchmark

(* create a time-stamped collection of samples *)
val make_benchmark : Benchmark.samples -> benchmark

(** parse a string representing a value of type Benchmark.t
    Benchmark.to_string ~fdigits:6 (Benchmark.make (Int64.of_int 4))
    46.194004 WALL (45.626852 usr + 0.144009 sys = 45.770861 CPU) @ 0.087392/s (n=4)
 *)
val parse_test : string -> Benchmark.t

(** parse a string representing a sample of the form :
    func_name : 46.194004 WALL (45.626852 usr + 0.144009 sys = 45.770861 CPU) @ 0.087392/s (n=4)
  *)
val parse_sample : string -> string * Benchmark.t list

(** parse a benchmark file of the form :
Ex. :
date 1283531842
fname1 : 43.240758 WALL (43.222701 usr + 0.012001 sys = 43.234702 CPU) @ 0.092518/s (n=4)
fname2 : 46.194004 WALL (45.626852 usr + 0.144009 sys = 45.770861 CPU) @ 0.087392/s (n=4)
fname3 : 43.600401 WALL (43.358710 usr + 0.028002 sys = 43.386712 CPU) @ 0.092194/s (n=4)
 *)
val parse_benchmark : string -> benchmark

(** save a benchmark *)
val save_benchmark : ?dirname:string -> benchmark -> unit

(** parse all benchmarks in the benchmark's directory (.benchmark by default) *) 
val parse_benchmarks : ?days:int -> ?dirname:string -> unit -> benchmark list

(** pretty print a [Benchmark.sample] *)
val pp_benchmark : Format.formatter -> benchmark -> unit

(** pretty print a table *)
val pp_benchmarks : Format.formatter -> benchmark list -> unit

The idea of the module is pretty simple. Every time I change my code, I run my battery of tests and save the results in a file on disk together with a timestamp. The next time, these saved runs are compared against the new ones, to check whether our modifications had an impact on some part of the code. I give a small example below, borrowing a bit of code from the examples of the Benchmark module.

First we declare the three functions we are going to test. Then we run the tests in run (), and in the main function we actually use the ExtBenchmark module: we execute all benchmarks and obtain a sample, then save the sample on disk in a timestamped file. In the second part we load all samples and print a comparison table. The printing function shows whether the running time of a function increased w.r.t. the lowest recorded running time of the same function, and it can also print samples that contain different functions, making it easy to add tests along the way.

let rec_loop (a : float array) =
  let rec loop i =
    if i < Array.length a then begin
      a.(i) <- a.(i) +. 1.;
      loop (i + 1)
    end in
  loop 0

let rec_loop2 (a : float array) =
  let len = Array.length a in
  let rec loop i =
    if i < len then begin
      a.(i) <- a.(i) +. 1.;
      loop (i + 1)
    end in
  loop 0

let for_loop (a : float array) =
  for i = 0 to Array.length a - 1 do
    a.(i) <- a.(i) +. 1.
  done

let run () =
  let a = Array.make 10000000 1. in
  Benchmark.latencyN (Int64.of_int 100) [
    ("rec_loop",rec_loop,a);
    ("rec_loop2",rec_loop2,a);
    ("for_loop",for_loop,a)
  ]
;;

let main () =
  (* execute all benchmarks *)
  let b = ExtBenchmark.make_benchmark (run ()) in
  (* save a timestamped representation of the current run on disk *)
  ExtBenchmark.save_benchmark b;
  (* read all benchmarks file from disk *)
  let l = ExtBenchmark.parse_benchmarks () in
  (* display a table with the current result *)
  Format.printf "%a@." ExtBenchmark.pp_benchmarks l
;;

main ();;

Notice that I ran the program twice in order to generate two traces. By default the results are saved in the .benchmarks directory, in a simple textual format. To make it a bit more reliable I’ll also add a machine id in the future, so as to avoid mixing benchmarks that were run on different hosts. Moreover, time regressions of more than 0.001 seconds are marked with an asterisk in the table, to pinpoint possible problems. I’m aware that these results must be taken with a grain of salt: a function must be run with many repetitions to avoid false positives. I think this is anyway a good starting point for enhancing the Benchmark module.

$./loops.native 
Latencies for 100 iterations of "rec_loop", "rec_loop2", "for_loop":
 rec_loop:  6.62 WALL ( 6.51 usr +  0.02 sys =  6.53 CPU) @ 15.31/s (n=100)
rec_loop2:  5.99 WALL ( 5.93 usr +  0.01 sys =  5.94 CPU) @ 16.83/s (n=100)
 for_loop:  5.44 WALL ( 5.42 usr +  0.00 sys =  5.42 CPU) @ 18.44/s (n=100)

Date            for_loop rec_loop rec_loop2 
06/9/2010-10:27 0.054    0.065    0.059

$./loops.native 
Latencies for 100 iterations of "rec_loop", "rec_loop2", "for_loop":
 rec_loop:  6.29 WALL ( 6.21 usr +  0.00 sys =  6.22 CPU) @ 16.09/s (n=100)
rec_loop2:  5.90 WALL ( 5.78 usr +  0.01 sys =  5.79 CPU) @ 17.28/s (n=100)
 for_loop:  5.40 WALL ( 5.33 usr +  0.00 sys =  5.33 CPU) @ 18.77/s (n=100)

Date            for_loop rec_loop rec_loop2 
06/9/2010-10:28 0.053    0.062    0.058
06/9/2010-10:27 0.054    0.065    0.059

The code is available as part of the dose3 framework I’m writing for the mancoosi project. You can download it here. If there is interest, I might ask to merge it into the benchmark project. At the moment the module has no dependencies on the rest of the code. Enjoy!
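For the curious, the core idea of ExtBenchmark — timestamped snapshots on disk plus a comparison of the latest run against the best previous time of each function — can be sketched in a few lines of Python (all names and the on-disk format here are invented for the illustration; the real module uses the textual format shown above):

```python
import json
import os
import time
import timeit

def run_benchmarks(funcs, number=100):
    # time each (name, callable) pair and return {name: seconds}
    return {name: timeit.timeit(f, number=number) for name, f in funcs}

def save_benchmark(results, dirname=".benchmarks"):
    # save a timestamped snapshot on disk, one JSON file per run
    os.makedirs(dirname, exist_ok=True)
    path = os.path.join(dirname, "%d.json" % int(time.time() * 1000))
    with open(path, "w") as fp:
        json.dump(results, fp)
    return path

def load_benchmarks(dirname=".benchmarks"):
    # load all previous snapshots, oldest first (filenames sort by time)
    snaps = []
    for fname in sorted(os.listdir(dirname)):
        with open(os.path.join(dirname, fname)) as fp:
            snaps.append(json.load(fp))
    return snaps

def regressions(snapshots, threshold=0.001):
    # flag functions whose latest time exceeds their best previous
    # time by more than the threshold
    latest, history = snapshots[-1], snapshots[:-1]
    flagged = {}
    for name, t in latest.items():
        best = min((s[name] for s in history if name in s), default=t)
        if t - best > threshold:
            flagged[name] = (best, t)
    return flagged
```

A real version would also record a machine id, as discussed above, and average over many repetitions before flagging anything, to avoid false positives.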