Pietro Abate

matplotlib and multiple y-axis scales

This week I had to create a plot using two different scales in the same graph to show the evolution of two related, but not directly comparable, variables. This operation is described in this FAQ on the matplotlib website. Nonetheless, I’d like to give a small step-by-step example…

Consider my input data, with lines of the form date release total broken outdated:

20110110T034549Z unstable 29989 133 3
20110210T034103Z wheezy 28900 8 0
20110210T034103Z unstable 30125 209 11
20110310T060132Z wheezy 29179 8 0
20110310T060132Z unstable 30230 945 28
20110410T040442Z wheezy 29487 8 0
20110410T040442Z unstable 31142 991 12
20110510T034745Z wheezy 30247 8 0
20110510T034745Z unstable 31867 610 31
20110610T041209Z wheezy 30328 9 0
20110610T041209Z unstable 32395 328 15
20110710T030855Z wheezy 31403 9 0

I want to create one graph containing three subgraphs, each one containing data for unstable and wheezy. For the subgraph plotting the total number of packages, since the data is fairly uniform, the plot is pretty and self-explanatory. The problem arises when we compare the non-installable packages in unstable and wheezy: the data from unstable will squash the plot for wheezy, making it useless.

Below I’ve added the commented python code and the resulting graph. You can get the full source of this example here.

import matplotlib.pyplot as plt

# plot two distributions with different scales
def plotmultiscale(dists,dist1,dist2,output) :

    fig = plt.figure()
# add the main title for the figure
    fig.suptitle("Evolution during wheezy release cycle")
# set date formatting. This is important to have dates pretty printed 
    fig.autofmt_xdate()

# we create the first sub graph, plot the two data sets and set the legend
    ax1 = fig.add_subplot(311,title='Total Packages vs Time')
    ax1.plot(dists[dist1]['date'],dists[dist1]['total'],'o-',label=dist1.capitalize())
    ax1.plot(dists[dist2]['date'],dists[dist2]['total'],'s-',label=dist2.capitalize())
    ax1.legend(loc='upper left')

# we need to explicitly hide the x axis and its labels
    ax1.xaxis.set_visible(False)

# we add the second sub graph and plot the first data set
    ax2 = fig.add_subplot(312,title='Non-Installable Packages vs Time')
    ax2.plot(dists[dist1]['date'],dists[dist1]['broken'],'o-',label=dist1.capitalize())
    ax2.xaxis.set_visible(False)

# now the fun part. The function twinx() gives us access to a second plot that 
# overlays the graph ax2 and shares the same X axis, but not the Y axis 
    ax22 = ax2.twinx()
# we plot the second data set
    ax22.plot(dists[dist2]['date'],dists[dist2]['broken'],'gs-',label=dist2.capitalize())
# and we set a nice limit for our data to make it prettier
    ax22.set_ylim(0, 20)

# we do the same for the third sub graph
    ax3 = fig.add_subplot(313,title='Outdated Packages vs Time')
    ax3.plot(dists[dist1]['date'],dists[dist1]['outdated'],'o-',label=dist1.capitalize())

    ax33 = ax3.twinx()
    ax33.plot(dists[dist2]['date'],dists[dist2]['outdated'],'gs-',label=dist2.capitalize())
    ax33.set_ylim(0, 10)

# this last function is necessary to reset the date formatting with 30 deg rotation
# that somehow we lost while using twinx() ...
    plt.setp(ax3.xaxis.get_majorticklabels(), rotation=30)

# And we save the result
    plt.savefig(output)
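
The function above indexes dists by release name and then by column ('date', 'total', 'broken', 'outdated'), but the post does not show how that structure is built. As a minimal sketch, assuming the layout implied by those lookups and the five-column input shown earlier, a loader could look like this (load_dists is a hypothetical name, not taken from the original source):

```python
from collections import defaultdict
from datetime import datetime

def load_dists(lines):
    """Build {release: {column: [values...]}} from 'date release total broken outdated' lines."""
    dists = defaultdict(lambda: defaultdict(list))
    for line in lines:
        fields = line.split()
        if len(fields) != 5:
            continue  # skip blank or malformed lines
        date, release, total, broken, outdated = fields
        cols = dists[release]
        # timestamps look like 20110110T034549Z
        cols["date"].append(datetime.strptime(date, "%Y%m%dT%H%M%SZ"))
        cols["total"].append(int(total))
        cols["broken"].append(int(broken))
        cols["outdated"].append(int(outdated))
    return dists

sample = [
    "20110110T034549Z unstable 29989 133 3",
    "20110210T034103Z wheezy 28900 8 0",
    "20110210T034103Z unstable 30125 209 11",
]
dists = load_dists(sample)
```

With real data, the result could then be passed along as plotmultiscale(dists, 'unstable', 'wheezy', 'out.png'); the output file name here is only an example.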

Parse French dates on an en_US machine

Imagine you work in France, but you are really fond of your good old en_US locale. I’m sure that one day you will face the task of using Python to play with some French text. I just found out that this couldn’t be easier. You just need to create and set the correct locale for your Python script and voilà!

In this case I need to parse a French date to build an ical file. First, if you haven’t already done it for other reasons, you should rebuild your locales and select a French locale, for example fr_FR.UTF-8.

On Debian, this is just one command away: sudo dpkg-reconfigure locales

Now you are ready to play :

import locale, datetime
#locale.setlocale(locale.LC_TIME, 'fr_FR.ISO-8859-1')
locale.setlocale(locale.LC_TIME, 'fr_FR.UTF-8')

date_from = "Dimanche 3 Juin 2012"
DATETIME_FORMAT = "%A %d %B %Y"
d = datetime.datetime.strptime(date_from, DATETIME_FORMAT)
print(d)
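
Since setlocale changes the setting for the whole process, one option for keeping en_US everywhere else is to switch LC_TIME only while parsing. The following is a sketch rather than part of the original script: temporary_locale and parse_french_date are hypothetical helper names, and the French call assumes fr_FR.UTF-8 has been generated as described above.

```python
import contextlib
import datetime
import locale

@contextlib.contextmanager
def temporary_locale(category, name):
    """Switch `category` to locale `name`, restoring the previous value on exit."""
    saved = locale.setlocale(category)  # query only: returns the current setting
    try:
        # raises locale.Error if `name` is not available on this machine
        locale.setlocale(category, name)
        yield
    finally:
        locale.setlocale(category, saved)

def parse_french_date(text, fmt="%A %d %B %Y"):
    # Assumption: the fr_FR.UTF-8 locale is installed.
    with temporary_locale(locale.LC_TIME, "fr_FR.UTF-8"):
        return datetime.datetime.strptime(text, fmt)
```

Calling parse_french_date("Dimanche 3 Juin 2012") should then give the same datetime as the snippet above while leaving the rest of the program in its original locale. Keep in mind that setlocale is not thread-safe, so this only suits single-threaded scripts.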

Update

If you want to set the date for a particular time zone, this is equally easy once you discover how to do it with the dateutil library (which is not part of the standard library). At the end of the previous snippet add:

from dateutil.tz import gettz
d = d.replace(tzinfo=gettz('Europe/Paris'))

This is the script I was working on. It uses the vobject library to generate ical files and itertools.groupby to parse the input file.

import vobject
from itertools import groupby
import re
import string
from dateutil.tz import gettz

import locale, datetime
locale.setlocale(locale.LC_TIME, 'fr_FR.UTF-8')

def test(line) :
    # True for the header line that opens each event
    return re.match("^Dimanche.*\n$",line) is not None

l = []
with open("example") as f :
    for key, group in groupby(f, test):
        if key :
            a = list(group)
        else :
            l.append(a+list(group))

DATETIME_FORMAT = "%A %d %B %Y "

cal = vobject.iCalendar()

for ev in l :
    date_from = ev[0]
    d = datetime.datetime.strptime(date_from, DATETIME_FORMAT)
    d = d.replace(tzinfo=gettz('Europe/Paris'))

    vevent = cal.add('vevent')
    vevent.add('categories').value = ["test category"]
    vevent.add('dtstart').value = d.replace(hour=15)
    vevent.add('dtend').value = d.replace(hour=18)
    vevent.add('summary').value = "Test event"
    # join the body lines with a space, as string.join did by default
    vevent.add('description').value = " ".join(ev[1:])

icalstream = cal.serialize()
print(icalstream)
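
The grouping step is the least obvious part of the script, so here is a sketch of it in isolation, run on an in-memory list instead of the example file. groupby yields alternating runs of header lines and body lines; each header run is remembered and glued to the body run that follows it. The names is_header and split_events are only illustrative.

```python
from itertools import groupby

def is_header(line):
    # same idea as test() in the script: a line opening a new event
    return line.startswith("Dimanche")

def split_events(lines):
    events = []
    header = []
    for key, group in groupby(lines, is_header):
        if key:
            header = list(group)                  # remember the date line
        else:
            events.append(header + list(group))   # date line + its body
    return events

sample = [
    "Dimanche 6 Mai 2012 \n",
    "\n",
    "- text text\n",
    "- more text\n",
    "\n",
    "Dimanche 13 Mai 2012 \n",
    "\n",
    "- text text\n",
]
events = split_events(sample)
```

Each element of events starts with the date line, which is why the script can read ev[0] as the date and ev[1:] as the description.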

Input :

Dimanche 6 Mai 2012 

- text text
- more text

Dimanche 13 Mai 2012 

- text text
- more text

Dimanche 3 Juin 2012 

- text text
- more text

Dimanche 10 Juin 2012 

- text text
- more text

Output

BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//PYVOBJECT//NONSGML Version 1//EN
BEGIN:VTIMEZONE
TZID:CET
BEGIN:STANDARD
DTSTART:20001029T030000
RRULE:FREQ=YEARLY;BYDAY=-1SU;BYMONTH=10
TZNAME:CET
TZOFFSETFROM:+0200
TZOFFSETTO:+0100
END:STANDARD
BEGIN:DAYLIGHT
DTSTART:20000326T020000
RRULE:FREQ=YEARLY;BYDAY=-1SU;BYMONTH=3
TZNAME:CEST
TZOFFSETFROM:+0100
TZOFFSETTO:+0200
END:DAYLIGHT
END:VTIMEZONE
BEGIN:VEVENT
UID:20111123T133829Z-53948@zed
DTSTART;TZID=CET:20120506T150000
DTEND;TZID=CET:20120506T180000
CATEGORIES:test category
DESCRIPTION:        \n - text text\n - more text\n  \n
SUMMARY:Test event
END:VEVENT
BEGIN:VEVENT
UID:20111123T133829Z-19906@zed
DTSTART;TZID=CET:20120513T150000
DTEND;TZID=CET:20120513T180000
CATEGORIES:test category
DESCRIPTION:        \n - text text\n - more text\n  \n
SUMMARY:Test event
END:VEVENT
BEGIN:VEVENT
UID:20111123T133829Z-70980@zed
DTSTART;TZID=CET:20120603T150000
DTEND;TZID=CET:20120603T180000
CATEGORIES:test category
DESCRIPTION:        \n - text text\n - more text\n  \n
SUMMARY:Test event
END:VEVENT
BEGIN:VEVENT
UID:20111123T133829Z-44400@zed
DTSTART;TZID=CET:20120610T150000
DTEND;TZID=CET:20120610T180000
CATEGORIES:test category
DESCRIPTION:        \n - text text\n - more text\n \n
SUMMARY:Test event
END:VEVENT
END:VCALENDAR

easy cudf parsing in python

With the fourth run of Misc live, you might wonder how you can quickly write a parser for a cudf document. If you are writing your solver in C / C++, I advise you to either grab the legacy ocaml parser and use the C bindings or reuse a parser written by other competitors (all frontends have a FOSS-compatible licence).

If you want to write a quick and dirty frontend in Python, maybe the following few lines of Python might help you:

from itertools import groupby

cnf_fields = ['conflict','depends','provides','recommends']

def cnf(k,s) :
    if k in cnf_fields :
        # comma-separated clauses, each a '|'-separated list of alternatives
        return [c.split('|') for c in s.split(',')]
    else :
        return s

records = []
for empty, record in groupby(open("universe.cudf"), key=str.isspace):
  if not empty:
    # build a real list so that it can be indexed (map is lazy in Python 3)
    l = [s.split(': ') for s in record]
    # we ignore the preamble here ...
    if 'preamble' not in l[0] :
        pairs = ([k, cnf(k,v.strip())] for k,v in l)
        records.append(dict(pairs))

for i in records :
    print(i)

We use the function groupby from itertools to create a list of stanzas and then we just transform each of them into a dictionary that should be pretty easy to manipulate. We ignore the preamble, but adding support for it should be straightforward… I got the idea from this forum post.

the result :

#python cudf.py
{'recommends': [['perl-modules '], [' libio-socket-inet6-perl']], 'package': '2ping', 'replaces': '', 'number': '1.0-1', 'sourceversion': '1.0-1', 'source': '2ping', 'depends': [['perl']], 'version': '4806', 'architecture': 'all', 'conflicts': '2ping'}
...
{'replaces': '', 'package': 'abook', 'number': '0.5.6-7+b1', 'sourceversion': '0.5.6-7', 'source': 'abook', 'depends': [['libc6 >= 9022 '], [' libncursesw5 >= 12348 '], [' libreadline5 >= 12239 '], [' debconf >= 1510 ', ' debconf-2.0--virtual ', ' debconf-2.0']], 'version': '1712', 'architecture': 'amd64', 'conflicts': 'abook', 'recommends': [['true!']]}
...

update

Maybe a small example of the input file would help :)

package: m4
version: 3
depends: libc6 >= 8

package: openssl
version: 11
depends: libc6 >= 18, libssl0.9.8 >= 8, zlib1g >= 1
conflicts: ssleay < 1
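
To try the idea without a universe file, here is a self-contained sketch that parses the two stanzas above from a string. It differs from the snippet in a few ways that are my own choices: it strips whitespace around each alternative, it splits each line on the first colon only, and it treats conflicts (as spelled in the sample) as a list field. parse_cudf and parse_value are hypothetical names.

```python
from itertools import groupby

LIST_FIELDS = {"conflicts", "depends", "provides", "recommends"}

def parse_value(key, value):
    if key in LIST_FIELDS:
        # "a, b | c" -> [["a"], ["b", "c"]]
        return [[alt.strip() for alt in clause.split("|")]
                for clause in value.split(",")]
    return value

def parse_cudf(lines):
    records = []
    # runs of blank lines separate the stanzas
    for blank, stanza in groupby(lines, key=lambda line: not line.strip()):
        if blank:
            continue
        pairs = []
        for line in stanza:
            key, _, value = line.partition(":")
            pairs.append((key.strip(), value.strip()))
        if pairs[0][0] == "preamble":
            continue  # the preamble is ignored, as in the original snippet
        records.append({k: parse_value(k, v) for k, v in pairs})
    return records

sample = """
package: m4
version: 3
depends: libc6 >= 8

package: openssl
version: 11
depends: libc6 >= 18, libssl0.9.8 >= 8, zlib1g >= 1
conflicts: ssleay < 1
"""
records = parse_cudf(sample.splitlines())
```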

connect django and rfoo

This evening I spent 30 minutes trying out rconsole from the rfoo package. It’s a simple environment to inspect and modify the namespace of a running script.

If you are on debian, you need to install two packages :

sudo aptitude install cython python-dev

Then download the source code. If you want to try it out without installing, you have to compile it with the --inplace option:

python setup.py build_ext --inplace

Now you’re ready to go. Add the following code to your views.py file:

from rfoo.utils import rconsole
rconsole.spawn_server()

In a console type python scripts/rconsole. Keep in mind that you have to adjust your import search path in order to use the rconsole script without installing the library.

You can now directly call all the functions in your views from the console. For example, if you have a search view, you can call it with:

>>> from django.http import HttpRequest
>>> request = HttpRequest()
>>> search(request,"debian")
<django.http.HttpResponse object at 0x2bc7490>
>>> search(request,"debian").content
'<?xml version="1.0" encoding="UTF-8"?>\n<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"\n "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">\n<html xmlns="http://www.w3.org/1999/xhtml">\n  <head>\n    
....
>>>

I have to say that using rconsole for debugging is not very useful. pdb or winpdb are much more powerful and versatile. It was worth a try anyway…

Update

After getting in touch with the author of rconsole, I think it is important to put this post in context. I tried rconsole with django in mind. On one hand, I was looking for a debugger that I could use in the early development stage of a project. In this context, I think a blocking debugger can do a much better job than rconsole at helping the programmer inspect variables and insert breakpoints. rconsole is a non-blocking debugger and it is not the right tool for that.

On the other hand, rconsole can be of great help when debugging a live application, when you don’t have the luxury of stopping your server. In this regard rconsole is very lightweight and unobtrusive.

I had the impression I’d been a bit unfair in my judgment…

python itertools and groupby

Whoever said that ignorance is bliss didn’t try Python :) This is the assignment: you have a list of dictionaries with a field date and you want to group all these dictionaries in a map date -> list of dictionaries with this date.

The first solution that came to my mind was something ugly like :

def group_by_date(qs):
    by_date = {}
    for r in qs :
        l = by_date.get(r['date'],[])
        l.append(r)
        by_date[r['date']] = l
    return by_date

for example :

In [36]: group_by_date([{'date' : 1},{'date' : 2}]) 
Out[36]: {1: [{'date': 1}], 2: [{'date': 2}]}

6 lines of Python!!! Unacceptable. It hurts my eyes and it is not easy to read. The good people on the #python irc channel advised me to check out collections.defaultdict, and this is actually pretty neat. Now I can write something like:

from collections import defaultdict
def group_by_date(qs):
    by_date = defaultdict(list)
    for r in qs :
        by_date[r['date']].append(r)
    return by_date

In [47]: group_by_date([{'date' : 1},{'date' : 2}]) 
Out[47]: defaultdict(<type 'list'>, {1: [{'date': 1}], 2: [{'date': 2}]})

Nice, but still… we can do better! itertools.groupby to the rescue:

from itertools import groupby
qs = [{'date' : 1},{'date' : 2}]
# groupby is imported by name, so it is called without the itertools prefix
[(name, list(group)) for name, group in groupby(qs, lambda p:p['date'])]

Out[77]: [(1, [{'date': 1}]), (2, [{'date': 2}])]

Ah! Nirvana :) I have to admit the most readable solution is the one using defaultdict, but the solution using groupby is a wonderful power tool. If you understand list comprehensions, this is a very natural solution to the problem.
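
One caveat: groupby only merges items that are adjacent, so the one-liner above gives the expected grouping only when the list is already ordered by date. A small sketch of a variant that sorts first and returns a plain dictionary (the id field is made up for the example):

```python
from itertools import groupby
from operator import itemgetter

def group_by_date(qs):
    by_date = itemgetter("date")
    # sort first: groupby starts a new group every time the key changes
    ordered = sorted(qs, key=by_date)
    return {date: list(group) for date, group in groupby(ordered, key=by_date)}

qs = [{"date": 2, "id": "a"}, {"date": 1, "id": "b"}, {"date": 2, "id": "c"}]
grouped = group_by_date(qs)

# without sorting, date 2 shows up as two separate groups
unsorted_keys = [date for date, _ in groupby(qs, key=itemgetter("date"))]
```

The defaultdict version does not have this constraint, which is another point in its favour when the input order is unknown.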