
This work is licensed under a Creative Commons Attribution-Share Alike 2.0 France License.
And don't miss Ralf's talk in the crossdistro devroom [1]...
Sat 05/02 18:00 - 19:00: Mancoosi tools for the analysis and quality assurance of FOSS distributions (Ralf Treinen)
[1] http://fosdem.org/2011/preview-saturday#crossdistro_devroom
There is an easier way to do all this using gbp-clone, as described here. Ah!
Then, to build the package, you just need to tell git-buildpackage where to find the pristine-tar data:
Or you could simply describe the layout in debian/gbp.conf, as suggested.
Easy!!!
I've found a lot of different recipes and howtos about git debian packaging, but I failed to find one simple recipe to create a debian package from scratch when upstream is using git. Of course, what follows is a patchwork from many different sources.
First we need to do a bit of administrative work to set up the repository:
Then, since I'm interested in tracking the upstream development branch, I'm going to add a remote to my repo:
At this point I need to fetch upstream and create a branch for it.
Now in my repo I have a master branch and an upstream branch. So far, so good. Let's add the debian branch based on master:
It's in the debian branch that I'm going to keep the debian-related files. I'm finally ready for hacking: git add / git commit / git rm ...
When I'm done, I can switch to master, merge the debian branch into it and use git-buildpackage to build the package.
Suppose I want to put everything on Gitorious, for example. I'll create an account, set up my ssh public key and then I have to add an origin remote to my .git/config. Something like:
The only thing left to do is to push everything to Gitorious. The --all flag is important.
People who want to pull your work from Gitorious have to follow this script:
Maybe there is an easier way to pull all remote branches at once, but I'm not aware of it. Any better way?
And this is all true!!! Well, for the 5 people reading this blog, I assure you I'm not selling your data or tracking you in any way :)
At COMPANY _______ we value your privacy a great deal. Almost as much as we value the ability to take the data you give us and slice, dice, julienne, mash, puree and serve it to our business partners, which may include third-party advertising networks, data brokers, networks of affiliate sites, parent companies, subsidiaries, and other entities, none of which we’ll bother to list here because they can change from week to week and, besides, we know you’re not really paying attention.
We’ll also share all of this information with the government. We’re just suckers for guys with crew cuts carrying subpoenas.
Remember, when you visit our Web site, our Web site is also visiting you. And we’ve brought a dozen or more friends with us, depending on how many ad networks and third-party data services we use. We’re not going to tell which ones, though you could probably figure this out by carefully watching the different URLs that flash across the bottom of your browser as each page loads or when you mouse over various bits. It’s not like you’ve got better things to do.
Each of these sites may leave behind a little gift known as a cookie -- a text file filled with inscrutable gibberish that allows various computers around the globe to identify you, including your preferences, browser settings, which parts of the site you visited, which ads you clicked on, and whether you actually purchased something.
Those same cookies may let our advertising and data broker partners track you across every other site you visit, then dump all of your information into a huge database attached to a unique ID number, which they may sell ad infinitum without ever notifying you or asking for permission.
Also: We collect your IP address, which might change every time you log on but probably doesn’t. At the very least, your IP address tells us the name of your ISP and the city where you live; with a legal court order, it can also give us your name and billing address (see guys with crew cuts and subpoenas, above).
Besides your IP, we record some specifics about your operating system and browser. Amazingly, this information (known as your user agent string) can be enough to narrow you down to one of a few hundred people on the Webbernets, all by its lonesome. Isn’t technology wonderful?
The data we collect is strictly anonymous, unless you’ve been kind enough to give us your name, email address, or other identifying information. And even if you have been that kind, we promise we won’t sell that information to anyone else, unless of course our impossibly obtuse privacy policy says otherwise and/or we change our minds tomorrow.
We store this information an indefinite amount of time for reasons even we don’t fully understand. And when we do eventually get around to deleting it, you can bet it’s still kicking around on some network backup drives in somebody’s closet. So once we have it, there’s really no getting it back. Hell, we can’t even find our keys half the time -- how do you expect us to keep track of this stuff?
Not to worry, though, because we use the very bestest security measures to protect your data against hackers and identity thieves, though no one has actually ever bothered to verify this. You’ll pretty much just have to take our word for it.
So just to recap: Your information is extremely valuable to us. Our business model would totally collapse without it. No IPO, no stock options; all those 80-hour weeks and bupkis to show for it. So we’ll do our very best to use it in as many potentially profitable ways as we can conjure, over and over, while attempting to convince you there’s nothing to worry about.
(Hey, Did somebody hold a gun to your head and force you to visit this site? No, they did not. Did you run into a pay wall on the home page demanding your Visa number? No, you did not. You think we just give all this stuff away because we’re nice guys? Bet you also think every roomful of manure has a pony buried inside.)
This privacy policy may change at any time. In fact, it’s changed three times since we first started typing this. Good luck figuring out how, because we’re sure as hell not going to tell you. But then, you probably stopped reading after paragraph three.
(Source: http://www.itworld.com/print/129778)
Sometimes I think I'm a bit too lazy to change my habits.
I've been looking at tiling window managers for a while, but I always failed to adopt one because of the big shift in habits it would have implied. This time, thanks to zack's applet, I decided to jump on the awesome bandwagon, and I have to say that I'm really happy about it. Until now I had never noticed how annoying it was to move windows around. It is true that over the last year I've used guake as my main console. Since it is a drop-down, tabbed console, it effectively saved me the stress of placing a new window every time I needed a new console. However, there were still a lot of applications popping up windows everywhere, and I had no choice but to find a place for them.
Awesome solves all these problems. Windows are positioned automatically. You can easily switch from one layout to another with a key combination, and it is very flexible (its configuration file is a program written in lua!!). Now, thanks to zack's applet, it is perfectly integrated with my gnome desktop. It replaces metacity (the default gnome window manager) out of the box, doing just what awesome was meant to do: managing windows. You can disable all the extra features, like the awesome panel and its menus, and keep using gnome for everything else.
The other great tool I've just discovered, thanks to an article on arstechnica, is gtg. This is also a very handy tool. I've been using sticky notes for quite a while (both electronic and real), but I'm far from satisfied with them. I refuse to use tomboy, as it uses mono and I prefer to avoid it (on religious grounds). GTG follows the Getting Things Done (GTD) methodology. In the end it is just a friendly note-taking tool with a lot of plug-ins, designed to be used with the keyboard. I've just started to use it and, although it has already crashed on me a few times, I like it a lot.
The third and last tool (for this blog post) is a mind mapping tool. I've been mind mapping for a while: I'm not a compulsive mind mapper, but I enjoy putting things in place when I have time. In the past I used freemind, which is nice, but it crashed on me too many times. Alas, it is written in java, and given my allergy to this language, I think this is just bad karma flowing in both directions... A couple of weeks ago I stumbled upon another mind mapping tool, vym, which is much more stable and a real pleasure to use. It hasn't crashed once yet, it is very usable from the keyboard (essential when you are taking notes!) and it has a nice look & feel and a rich feature set. I'm happy with it.
I tried out chromium over the last 3 weeks. It is clearly faster than iceweasel/firefox. On one hand this is because of the architecture of chromium itself: people at google worked very hard to develop a competitive and lean browser. On the other hand, I think it is also because on the firefox side I use a lot of extensions that can cripple performance and make the comparison a bit unfair. I should try to start from scratch with iceweasel using a clean profile and see how it goes.
Since everybody is trumpeting about this new browser I felt obliged to give it a try. Well... it doesn't cut it for me.
In conclusion, I think chromium is not there for me yet. There are too many downsides to justify the move. I'm sure I will reconsider my position in 6 months, and I really hope that the google developers will show a bit more support for the community.
This is the second post about distcheck. I want to give a quick overview of the differences between edos-distcheck and the new version. First, despite using the same SAT solver and the same encoding of the problem, distcheck has been rewritten from scratch. Dose2 has several architectural problems and is not very well documented. Adding new features had become too difficult and error-prone, so this was a natural choice (at least for me). Hopefully Dose3 will survive the Mancoosi project and provide a base for dependency reasoning. The framework is well documented and the architecture is pretty modular. It is written in ocaml, so sadly I don't expect many people to join the development team, but we'll be very open to it.
These are the main differences from edos-debcheck.
distcheck is about two times faster than edos-debcheck (from dose2), but it is a "bit" slower than debcheck (the original debcheck), the tool written by Jerome Vouillon that was later superseded in debian by edos-debcheck. The original debcheck was an all-in-one tool that did the parsing, encoding and solving without converting the problem to any intermediate format. distcheck trades a bit of speed for generality: since it is based on Cudf, it can handle different formats and can easily be adapted to a range of situations just by changing the encoding of the original problem to cudf.
Below are a couple of tests I performed on my machine (debian unstable). The numbers speak for themselves.
The second big difference concerns input formats. At the moment we have two different tools in debian, edos-debcheck and edos-rpmcheck. Despite using the same underlying library, these two tools have different code bases. distcheck is basically a multiplexer that converts different inputs to a common format and then uses it (agnostically) to solve the installation problem. It can be called in different ways (via symlinks) to behave like its predecessors.
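To make the symlink trick concrete, here is a minimal Python sketch of dispatching on the name the program was invoked under. This is not distcheck's code (which is written in ocaml): the FORMAT_BY_PROGRAM table, the format labels and the cudf fallback are all hypothetical.

```python
import os
import sys

# Hypothetical table: program name (as installed via symlinks) -> input format.
FORMAT_BY_PROGRAM = {
    "edos-debcheck": "deb",
    "debcheck": "deb",
    "edos-rpmcheck": "rpm",
    "rpmcheck": "rpm",
}


def input_format(argv0, default="cudf"):
    """Choose the input format from the name the program was called with."""
    name = os.path.basename(argv0)
    # Unknown names fall back to the common (cudf) format.
    return FORMAT_BY_PROGRAM.get(name, default)


if __name__ == "__main__":
    print(input_format(sys.argv[0]))
```

One binary and a few symlinks then give each old command name its old behaviour, while the solving code stays shared.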
At the moment we are able to handle 5 different formats:
distcheck handles gz and bz2 compressed files transparently. However, if you care about performance, you should decompress your input file first and then parse it with distcheck: decompressing the file on the fly often takes more time than running the installability test itself. There is also an experimental database backend that is not compiled by default at the moment.
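A minimal Python sketch of what "transparent" decompression usually amounts to, assuming the compression is detected from the file extension (open_input is a hypothetical helper, not part of dose3):

```python
import bz2
import gzip


def open_input(path):
    """Open a package list as text, decompressing .gz/.bz2 files on the fly."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt", encoding="utf-8")
    if path.endswith(".bz2"):
        return bz2.open(path, "rt", encoding="utf-8")
    # Anything else is assumed to be an uncompressed text file.
    return open(path, "r", encoding="utf-8")
```

Every read of a compressed file goes through the decompressor, which is exactly the cost that decompressing once, up front, avoids.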
Regarding the output, I've already explained the main differences in an old post. As a quick reminder, the old edos-debcheck had two output options. The first was a human-readable, unstructured output that was a handy source of information when running the tool interactively. The second was an XML-based format (without a DTD or a schema, I believe) that was used for batch processing.
distcheck has only one output type, in YAML format, which aims to be both human- and machine-readable. Hopefully this will cater for both needs. Moreover, I've recently added to the output of distcheck a summary of who is breaking what. The output of edos-debcheck was basically a map from packages to the reasons for their breakage. In addition to this information, distcheck also gives a map from each reason (a missing dependency or a conflict) to the list of packages that are broken by that problem. This additional info is off by default, but I think it is nice to know which missing dependency is responsible for the majority of problems in a distribution...
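The inversion behind such a summary is simple. A minimal Python sketch, assuming the per-package reasons are plain strings (distcheck's real reasons are structured YAML, and summarize is a hypothetical name):

```python
from collections import defaultdict


def summarize(broken):
    """Invert {package: [reasons]} into [(reason, sorted packages)].

    Reasons that break the most packages come first; ties are broken
    alphabetically so the output is stable.
    """
    by_reason = defaultdict(set)
    for package, reasons in broken.items():
        for reason in reasons:
            by_reason[reason].add(package)
    return sorted(
        ((reason, sorted(pkgs)) for reason, pkgs in by_reason.items()),
        key=lambda item: (-len(item[1]), item[0]),
    )
```

Sorting by the number of affected packages puts the single dependency responsible for most breakage at the top of the report.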
For example, calling distcheck with --summary:
Below I give a small example of the edos-debcheck output compared to the new yaml-based output.
And an extract from the distcheck output (the order is different; I cut and pasted parts of the output here...)
bash (<= 2.7) to check all versions of bash in the universe with version greater than 2.7.
Here at mancoosi we have been working for quite a while to promote and advance solver technology for FOSS distributions. We are almost at the end of the project and it is important to make the mancoosi technology relevant for the community. One goal of the project is to provide a prototype, using part of the results of mancoosi, that can be used to install/remove/upgrade packages on a user machine. We certainly don't want to create yet another meta-installer: this would be very time consuming and would certainly go beyond the scope of the project. The idea is to create a prototype that can work as a drop-in replacement for apt-get and that will allow everybody to play with different solvers and installation criteria.
A very first integration step is a small shell script, apt-mancoosi, that tries to put together different tools that we have implemented during the project. Roberto wrote extensively about his experience with apt-mancoosi a while ago, showing that the mancoosi tools are already usable, as a proof of concept, to experiment with all solvers participating in the Misc competition.
One notable obstacle we encountered with apt-mancoosi is how to pipe the result of an external solver to apt-get in order to actually install the packages proposed as a solution. Apt-mancoosi fails to be a drop-in replacement for apt-get exactly for this reason.
The "problem" is quite simple: the idea at the beginning was to pass to apt-get, on the command line, a request that effectively represents a complete solution. We expected that, since this was already a locked-down solution, apt-get would just install all packages without any further modification to the proposed installation set. Of course, since apt-get is designed to satisfy a user request, and not just to install packages, we quickly realized that our evil plan was doomed to failure.
The only option left was to use libapt directly, but the idea of programming in C++ quickly made me desist. After a bit of research (not that much after all), I finally found a viable solution to our problems in python-apt, a low-level wrapper around libapt. This definitely made my day.
Now the juicy details. The problem was to convince apt to completely bypass the solver and just call the installer. First, a small intro. python-apt has extensive documentation with a couple of tutorials. Using python-apt is actually pretty easy (a snippet from the python-apt doco):
Here we open the cache (apt.Cache is a wrapper around the low-level binding in the apt_pkg module), then we update the package list. This is equivalent to apt-get update. Installing a package is equally easy:
Now, the mark_install method of the Package class (module apt.package) will run the solver to resolve and mark all the dependencies of the package python-apt. This is the default behavior when apt-get is used on the command line. This method, however, has three optional arguments that are just what I was looking for: autoFix, autoInst and fromUser. The explanation in the python-apt doco is quite clear.
What we want is to set autoFix and autoInst to false to completely bypass the solver. So imagine that an external solver gives us a string of the form bash+ dash=1.4 baobab- that asks to install bash at the newest version, install dash at version 1.4 and remove baobab. Suppose also that this is a complete solution, that is, all dependencies are satisfied and there are no conflicts.
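A minimal Python sketch of turning such a string into (name, action, version) triples, assuming exactly the three token forms described above (parse_solution is a hypothetical helper, not part of mpm):

```python
def parse_solution(text):
    """Turn e.g. "bash+ dash=1.4 baobab-" into (name, action, version) triples."""
    actions = []
    for token in text.split():
        if "=" in token:
            # name=version: install that exact version.
            name, version = token.split("=", 1)
            action = "install"
        elif token.endswith("+"):
            # name+: install the newest version.
            name, version, action = token[:-1], None, "install"
        elif token.endswith("-"):
            # name-: remove the package.
            name, version, action = token[:-1], None, "remove"
        else:
            raise ValueError(f"unrecognised token: {token!r}")
        if not name or version == "":
            raise ValueError(f"malformed token: {token!r}")
        actions.append((name, action, version))
    return actions
```

Package names that themselves contain a + (such as g++) still parse, since only the last character is read as the operator.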
The work flow of mpm (mancoosi package manager) is as follows :
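One way such a work flow could drive python-apt, as a minimal sketch: apply_solution is hypothetical and expects an apt.Cache-like object, and it assumes the keyword spelling auto_fix/auto_inst of current python-apt releases (the camelCase names quoted above come from older releases). It also assumes that assigning a version to Package.candidate before mark_install is how a specific version gets selected.

```python
def apply_solution(cache, actions):
    """Mark a complete, externally computed solution in an apt.Cache-like object.

    `actions` holds (name, "install" | "remove", version-or-None) triples.
    Nothing is installed here; the caller decides when to commit.
    """
    for name, action, version in actions:
        pkg = cache[name]
        if action == "remove":
            # auto_fix=False: do not let apt "repair" the removal on its own.
            pkg.mark_delete(auto_fix=False)
        elif action == "install":
            if version is not None:
                # Assumption: setting the candidate pins e.g. dash=1.4.
                pkg.candidate = pkg.versions[version]
            # auto_fix/auto_inst=False: mark only this package, skip apt's solver.
            pkg.mark_install(auto_fix=False, auto_inst=False)
        else:
            raise ValueError(f"unknown action {action!r} for {name}")
    return cache
```

With a real apt.Cache, calling cache.commit() afterwards would download and install the marked packages (as root). Since apt's resolver is bypassed, the result is only consistent if the external solution really is complete.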
We already have a first prototype on the mancoosi svn. It's not released yet as we are waiting to do more testing, add more options and make it stable enough for testing. Maybe one day, this will be uploaded to debian.
This is the trace of a successful installation of a package in a lenny chroot. The solver used here is the p2 solver
I think we'll keep working on this python prototype for a while, but this is certainly not what we want to propose to the community. The mancoosi package manager is probably going to be written in ocaml and integrated with dose3 and libcudf. This will allow us to gain speed and to have a solid language to develop with (nothing against python, but we don't feel that a scripting language is suitable for an essential component such as a package manager). Time will tell. For the moment this is just vaporware...
The other day I decided to add a small alphabetic filter to search among the broken packages in debian weather. Searching the net for a nice solution I found a few snippets, but none of them struck me as flexible enough for my needs. I also found a django module, but it seemed overly complicated for such a simple thing.
I had a look at the code and generalized the _get_available_letters function which, given a table and a field, gives you back the list of letters used in the table for that specific field. I've generalized the code to integrate better with the django ORM: instead of acting directly on a table (using raw sql), I plug the raw sql expression UPPER(SUBSTR(%s, 1, 1)) into the django query using the extra function. The result is pretty neat, as you don't need to know the underlying model and you can use this method with an arbitrary queryset. This is of course possible thanks to django's laziness in performing sql queries...
This function also gets a request object in order to select the active letter. This is related to the template to display the result of this view.
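Outside django, the same idea fits in a few lines. A minimal Python sketch, assuming a plain list of strings instead of a queryset (available_letters is a hypothetical name); in the real view the first letters come from the UPPER(SUBSTR(...)) expression added with extra():

```python
import string


def available_letters(values, active=None):
    """Return (letter, has_content, is_active) for every letter A-Z.

    Plain-Python stand-in for collecting UPPER(SUBSTR(field, 1, 1)) over a
    queryset; `active` plays the role of the letter selected in the request.
    """
    used = {value[0].upper() for value in values if value}
    active = active.upper() if active else None
    return [(letter, letter in used, letter == active)
            for letter in string.ascii_uppercase]
```

Returning all 26 letters with flags, rather than only the used ones, lets the template render a fixed alphabet and link only the letters that have content.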
In my specific case I wanted to show all letters by default, with pagination, but then be able to switch off pagination and select a specific letter. I use two variables to control all this. The first variable, page, comes with the generic view list_detail. It is usually a number from 0 to the last page and it is used to control pagination. I've added the value all to switch off pagination altogether by setting element_by_page to None. The second variable is sw, which I use to select a specific letter to display.
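A minimal sketch of how the two request parameters could be interpreted, assuming a plain dict of GET parameters, a default of 25 items per page and page "1" when none is given (listing_options and these defaults are hypothetical):

```python
def listing_options(params, per_page=25):
    """Interpret the `page` and `sw` request parameters.

    Returns (paginate_by, page_number, letter). page="all" switches
    pagination off; sw restricts the list to one starting letter.
    """
    page = params.get("page", "1")
    letter = params.get("sw")
    letter = letter.upper() if letter else None
    if page == "all":
        # No pagination: every matching element on a single page.
        return None, None, letter
    return per_page, int(page), letter
```

Keeping this decision in one small function means the view only has to pass the resulting paginate_by (or None) on to the generic view.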
If at the end of your view you return a generic view as above, the only thing you need is to add a choices field in your template to display the alphabetic filter, which will look something like this:
This is pretty standard, as it iterates over the list of letters, linking the ones with content. You need to associate a small css to display the list horizontally. Put this in a file and embed it where you want with an include statement: {% include "forecast/alphabet.html" %}.
The code for my application is here if you want to check out more details. You can have a look at the result on debian weather here.
A while ago I wrote about the new distcheck tool upcoming in dose3. I've recently updated the proposal on the debian wiki to reflect recent changes in the yaml data structure. The idea was to remove redundant information, to make it easier to read and at the same time include enough details to make it easy to use from a script. I'll write down a small example to explain the format. A package can be broken because of a missing package or because of a conflict. For a missing package we'll have a stanza like this :
The first part gives details about the package libgnuradio-dev, specifying its status, source and architecture. The second part is the reason for the problem. In this case it is a missing package that is needed to install libgnuradio-dev. missindep is the dependency of the package libgruel0 that cannot be satisfied, in this case: libboost-thread1.40.0 (>= 1.40.0-1).
The paths component gives all possible dependency chains from the root package libgnuradio-dev to libgruel0. Notice that we do not include the last node in the dependency chain, to avoid a useless repetition. Of course there might be more than one path to reach libgruel0; distcheck will unroll all of them. Because of the structure of debian dependencies, there are usually not that many paths.
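A minimal Python sketch of that unrolling on a toy dependency graph, assuming plain package-to-dependencies edges (real debian dependencies also have alternatives and version constraints, which this ignores); dependency_paths is a hypothetical name:

```python
def dependency_paths(deps, root, target):
    """All simple dependency chains from root to target, target excluded.

    `deps` maps a package to the packages it depends on.
    """
    paths = []

    def walk(node, path):
        chain = path + [node]
        for dep in deps.get(node, ()):
            if dep == target:
                # Stop here: the last node is left out of the chain.
                paths.append(chain)
            elif dep not in chain:
                # Skip nodes already on this chain to avoid cycles.
                walk(dep, chain)

    walk(root, [])
    return paths
```

Every distinct route is reported, so a package reachable through two intermediate packages yields two chains.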
The other possible cause of a problem is a conflict. Consider the following :
This is the general case of a deep conflict. I use an artificial example here instead of a concrete one since this case is not very common and I was not able to find one. To put everything in context, this is the example I've used (it's in cudf format, but I think you get the gist of it):
Another important upcoming change in distcheck (to be implemented soon) is the ability to check whether a package is in conflict with an Essential package. In the past, edos-debcheck always checked the installability of a package in the empty universe. This assumption does not actually hold for debian, as all essential packages should always be installed. For this reason, distcheck will now check the installability problem not in an empty universe, but in a universe with all essential packages installed.
This check is not going to be foolproof, though. Because of the semantics of essential packages, although an essential package cannot be removed tout court, it can be replaced by a non-essential package via the Replaces mechanism. For example, poking at this feature, I noticed that the package upstart in sid replaces sysvinit and is in conflict with it. This is perfectly fine, as it gives a mechanism to upgrade and replace essential components of the system. At the same time, this does not fit the edos-debcheck philosophy of checking packages for installation problems in the empty universe (or in a universe with all essential packages installed). We are still thinking about how to address this problem (the long-term solution will be to add the Replaces semantics to distcheck), but for the moment we will just provide an option to check packages w.r.t. essential packages, being aware that this can lead to false positives.
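A toy Python sketch of the check and of where the false positives come from, assuming conflicts and Replaces are given as plain sets of package names (the real check is a full installability problem handled by distcheck's SAT solver); essential_conflicts is hypothetical:

```python
def essential_conflicts(package, conflicts, essential, replaces=None):
    """List (essential_package, also_replaced) pairs that `package` conflicts with.

    Conflicts are treated as symmetric. also_replaced is True when `package`
    also Replaces that essential package: a likely false positive.
    """
    replaces = replaces or {}
    hits = []
    for other in sorted(essential):
        if other == package:
            continue
        clash = (other in conflicts.get(package, ())
                 or package in conflicts.get(other, ()))
        if clash:
            hits.append((other, other in replaces.get(package, ())))
    return hits
```

With upstart conflicting with and replacing sysvinit, the conflict is reported but flagged, which is the situation where a strict check would produce a false positive.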
This work is of course done in collaboration with the mancoosi team in Paris.
Dose3 is still not ready for prime time. We are preparing debian packages and we plan to upload them to experimental in the near future.
While writing scientific papers, we often feel the need to add evidence and data to our claims. This can be done in different ways: tables, graphs, or nice pictures (or something else if you feel creative). The point is that, to produce this data, I often end up writing ad-hoc scripts to analyze my data, involving a million awk, sed, sort, uniq, etc. etc. ...
What I want is a more productive work flow to streamline the boring pipeline Producer | Analyzer | latex. First I need a suitable output format to collect data from my experiments. In the past I often collected raw data in an unstructured format, then used some kind of parser to extract the important information for a particular figure. Printing unstructured data is a plain bad idea and a pain, as it has to be parsed again before it can be used. Moreover, reusing an old parser is often difficult, as the nature of the experiment can be completely different, and so can the format of the output.
The solution to this problem is to adopt a structured format to print your results. This removes the need to write a new parser every time, and also helps me be more consistent across all my experiments. The format itself is not very important. It can be xml, for example, or, if you are not a masochist, something following the json or yaml standards. I chose yaml, a data serialization language designed to be both human- and machine-readable. Yaml is fairly easy to produce and very well supported in many programming languages. In particular, yaml is a superset of json, so for simple data structures you can also reuse a json printer if you don't have a yaml printer.
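Following the json-as-yaml trick, a minimal Python sketch of a printer for flat result records, assuming one dict per experiment (to_yaml_item is a hypothetical helper). Keys and values go through json.dumps, which produces valid yaml scalars, and each record becomes one item of a yaml block sequence, so results can be appended to a file as they are produced:

```python
import json


def to_yaml_item(record):
    """Render one flat result record as an item of a yaml block sequence.

    json.dumps is used for keys and values: json scalars are valid yaml,
    so no yaml library is needed for simple records.
    """
    if not record:
        return "- {}\n"
    lines = []
    for i, (key, value) in enumerate(sorted(record.items())):
        prefix = "- " if i == 0 else "  "
        lines.append(f"{prefix}{json.dumps(key)}: {json.dumps(value)}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Each experiment appends one item; the file as a whole stays a yaml list.
    print(to_yaml_item({"solver": "p2", "time": 1.5, "ok": True}), end="")
```

Appending item by item means a crashed experiment still leaves a readable file containing every result printed so far.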
The second step is to parse and analyze the experimental data. I often accomplish this step in python. The choice here is quite simple: mangling text with python is very easy, there are a lot of libraries (both native ones and bindings), and a very nice parser for yaml. If this is not enough, python-numeric, python-matplotlib and python-stats should convince you to adopt it for this task. Surely perl is another choice, but my sense of aesthetics doesn't allow me to go that way.
The third and final step is to convert everything to latex. Yes, it is true that I could generate latex-compatible output directly with python, but this would make the pipeline a bit less flexible, as I might want to use the same data in a web page, for example, without having to write a second printer for html. The solution is to have a generic csv printer and then perform the final conversion with an off-the-shelf tool.
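A minimal sketch of such a generic printer, using Python's csv module with the '|' separator and no header row (to_pipe_csv is a hypothetical helper; the column list decides both which fields are printed and in which order):

```python
import csv
import io


def to_pipe_csv(rows, columns):
    """Write rows (dicts) as '|'-separated lines without a header row."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="|", lineterminator="\n")
    for row in rows:
        writer.writerow([row[col] for col in columns])
    return buf.getvalue()
```

The same function can feed latex today and an html table tomorrow, since it knows nothing about the final presentation.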
For latex, and actually the entire post is about this, I've discovered the package datatool. This package is a pretty neat solution to embed csv tables (and I think it supports other formats as well) directly into your latex document, taking care of the formatting directly in the document.
For example, consider this sample data in csv format :
It is a simple '|'-separated file with 5 columns and no header (I don't like commas). The datatool latex package is part of texlive-latex-extra in debian, and to use it you just need to add \usepackage{datatool} to your preamble. Now, to produce a nice-looking latex table, you first need to load the file with the \DTLloaddb command (and you have to specify the proper separator). Without further ado, you can now just use the \DTLdisplaydb{table1} command to produce the table. Awesome!
But this is not very nice as there are fields that you don't want to display. The datatool package is actually pretty flexible and this is how you print a table with only three columns :
There are a lot of nice shortcuts to print your table. Judging from the documentation, it is a very powerful tool to have. This made my day.