how to create VMs with ganeti / xen and dnsmasq

I’ll start here a small series of posts about ganeti, xen and puppet. For my work I run a few servers sitting on xen, and it has always been a bit of a pain to create a new instance and keep it up to date. Up to now I’ve used the excellent xen-create-image tool to create my VMs, but I wanted to try something new and more sexy… Last week I finally found some time (and a spare box to run my experiments on) to learn how to use ganeti. Ganeti is the only tool I tried out, but it seems to fit the bill for my use, and it looks like a polished and mature project to me… Moreover, I’ve seen a presentation about it at every FLOSS conference I’ve attended in the last few years, and I thought it was time to give it a try.

Installing and configuring ganeti is fairly easy and there is a lot of documentation available, so this post is not going to be about installing it, but rather about how to create a new bare instance with ganeti-debootstrap-instance. There is also a way to create a new instance from an image, but I didn’t go that way yet.

This first post is about the first problem I’ve encountered, that is, how to automatically assign a network address and a name to each new instance created by gnt-instance add. Since all my instances should be able to communicate with each other on the same subnet, I’ve decided to configure xen to create a NATted private network and add each new instance to this network.

The first step is to create a bridge interface in /etc/network/interfaces:

auto xen-br0
iface xen-br0 inet static
    address 10.0.0.1
    netmask 255.255.255.0
    bridge_stp off
    bridge_fd 0
    bridge_ports none

This is the standard debian way, but since xen uses a different naming convention (here I’m using the ganeti convention xen-br0 instead of xen’s default xenbr0), I need to tell xen what I intend to do by adding these lines to /etc/xen/xend-config.sxp:

(network-script 'network-virtual bridgeip="10.0.0.1/24" brnet="10.0.0.0/24" bridge="xen-br0"')
(vif-script     vif-bridge)

Next I have to connect my real network interface to the private network using a few iptables rules in /etc/rc.local (probably there is a better place to do this…):

echo 1 > /proc/sys/net/ipv4/ip_forward
/sbin/iptables -t nat -A POSTROUTING -o eth0 -j MASQUERADE
/sbin/iptables -A FORWARD -i eth0 -o xen-br0 -m state --state RELATED,ESTABLISHED -j ACCEPT
/sbin/iptables -A FORWARD -i xen-br0 -o eth0 -j ACCEPT

The xen setup is now complete, and every new image should have a vif connected to the subnet 10.0.0.0/24. The xen setup corresponds to the physical wiring of the network. The next step is to configure each instance so as to allow them to communicate on this subnet. Since I build my VMs using ganeti-debootstrap-instance, and by default debootstrap does not configure the network, we need to add a new hook in the directory /etc/ganeti/instance-debootstrap/hooks.

#!/bin/bash
# instance-debootstrap hook: set up dhcp networking inside the new
# instance and register its MAC/name pair with dnsmasq on the node.

if [ -z "$TARGET" -o ! -d "$TARGET" ]; then
  echo "Missing target directory"
  exit 1
fi

if [ ! -d "$TARGET/etc/network" ]; then
  echo "Missing target network directory"
  exit 1
fi

if [ -z "$NIC_COUNT" ]; then
  echo "Missing NIC COUNT"
  exit 1
fi

if [ "$NIC_COUNT" -gt 0 ]; then

  cat > "$TARGET/etc/network/interfaces" <<EOF
# This file describes the network interfaces available on your system
# and how to activate them. For more information, see interfaces(5).

auto lo
iface lo inet loopback

auto eth0
iface eth0 inet dhcp

EOF

fi

instance=$INSTANCE_NAME
[ -n "$instance" ] || exit 1

# Map the MAC of the last NIC (ganeti exports it as NIC_<index>_MAC)
# to the instance name in a dnsmasq configuration snippet.
nic_index=$((NIC_COUNT - 1))
mac_var="NIC_${nic_index}_MAC"
mac=${!mac_var}
echo "dhcp-host=$mac,$instance" > "/etc/dnsmasq.d/$instance.conf"

This hook does two things: first, it configures the interfaces of the new instance to be set up via dhcp; second, it adds an entry to the dnsmasq configuration to make this instance known to the world. This basically boils down to adding a file in /etc/dnsmasq.d/ with the mac address of the new instance and its designated name. Dnsmasq will then provide an ip address for this instance and register its name in the dns.

dhcp-host=aa:00:00:24:6c:8a,node1
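
One caveat: dnsmasq reads the files in /etc/dnsmasq.d/ only at startup, so after the hook drops the new snippet, dnsmasq has to be restarted on the node before the new instance boots:

/etc/init.d/dnsmasq restart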

Configuring dnsmasq is pretty easy as well. First, I want it to answer dhcp queries only on the internal network; second, I want to configure my clients with 10.0.0.1 as nameserver and gateway. You can just add the following lines in /etc/dnsmasq.d/general to get it going.

interface=xen-br0
interface=lo
dhcp-range=10.0.0.128,10.0.0.250
domain=localnet.org,10.0.0.128,10.0.0.250
dhcp-option=3,10.0.0.1
bogus-priv
#expand-hosts
local=/localnet.org/
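
To check that the dns side works, you can query dnsmasq directly from the dom0 (node1 being the hypothetical name of an instance created with the hook above):

dig @10.0.0.1 node1.localnet.org +short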

To create your new instance you can just run the following command:

gnt-instance add -t plain -s 5g -B memory=1024 -o debootstrap+unstable --no-ip-check --no-name-check node1

If you are running your dom0 on debian squeeze, before running this command you should configure ganeti to pass the right xen parameters to the newly created instances:

gnt-cluster modify --hypervisor-parameter xen-pvm:root_path='/dev/xvda1'
gnt-cluster modify --hypervisor-parameter xen-pvm:initrd_path='/boot/initrd-2.6-xenU'

I use --no-ip-check and --no-name-check to skip the ip and dns checks performed by ganeti and to avoid a sort of chicken-and-egg problem: the name and address of this new instance are still unknown to dnsmasq, since node1 is precisely the name that the hook will use to add an entry to the dnsmasq configuration. debootstrap+unstable is a variant of the default configuration and you need to add it to the list of variants used by ganeti-debootstrap-instance.
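
If you haven’t defined that variant yet, a minimal sketch looks like this (using ganeti-instance-debootstrap’s standard layout; overriding only SUITE is an assumption, adjust the conf to your needs):

echo unstable >> /etc/ganeti/instance-debootstrap/variants.list
echo 'SUITE="unstable"' > /etc/ganeti/instance-debootstrap/variants/unstable.conf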

That should be it. The new instance should come up with a dynamically assigned ip address, able to talk to the outside world and automatically known to all the other machines on the subnet via dns.

The next post will be about how to add a swap hook for ganeti-debootstrap-instance.


Xen brouter setup

A while ago I received a new desktop machine (8 cores, 8GB of memory…) at work. Since for the moment I’m kinda happy working on my laptop with an external screen, I decided to put the hardware to good use and explore some more exotic (at least for me) xen features.

In particular I spent half a day playing with different xen network settings. The bridge model, which works off the shelf, is the easiest one. To set it up, you basically need to specify a couple of options in the xend-config file and you’re done. This is the “default” network configuration and it should work out of the box in most situations. Using this method, since all the VMs’ interfaces are bridged together (surprise!) with the public interface, your network card is left in promiscuous mode (not a big problem if you ask me…). Once your VMs are up, you can then decide to use your default dhcp server, autoconf your VMs with ipv6 only, or do nothing, as you please.
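
For reference, the bridged setup boils down to the stock scripts in /etc/xen/xend-config.sxp:

(network-script network-bridge)
(vif-script vif-bridge)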

Another popular method, albeit a bit more complex, is to set up a natted network using the script network-nat (an evolution of the third method, ‘network-routed’). I played with it, but since I wanted to have all my DomUs on the same subnet, this setup wasn’t satisfying for me: by default, ‘network-nat’ assigns a different subnet to each DomU. With the natted setup you can also configure a local dhcp server to give private IPs to your VMs, all done transparently by the xen network scripts. I’ve noticed, though, that there is a bug in the xen script that makes it not very squeeze-friendly. Since the default dhcp server in squeeze is isc-dhcp and a few configuration files got shuffled in the process (notably /etc/dhcp3/dhcpd.conf is now /etc/dhcp/dhcpd.conf), the script needs a little fix to work properly. I’ll report this bug sometime soon…
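
For comparison, the natted variant is switched on the same way in xend-config.sxp, something like:

(network-script 'network-nat dhcp=yes')
(vif-script vif-nat)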

Googling around I found a different setup called brouter, which is a hybrid between a bridge configuration and a routed configuration. The original (??) article is well hidden in an old suse wiki.

I’ve made a few modifications here to add natting. So basically, the virtual interfaces, each connected to one DomU, are linked together by a bridge (xenbr0). The bridge, with address 10.0.0.1, is also the router of the subnet. All DomUs are configured to use dhcp, which assigns a new ip and specifies the router of the subnet. The dhcp server is configured to answer requests only on the xenbr0 interface, avoiding problems on the public network…
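
To wire this into xend, the relevant lines in /etc/xen/xend-config.sxp look like the following (assuming the script below is installed as /etc/xen/scripts/vnet-brouter):

(network-script 'vnet-brouter bridgeip="10.0.0.1/24" brnet="10.0.0.0/24"')
(vif-script vif-bridge)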

Routing is configured using iptables:

    iptables -t nat -A POSTROUTING -o ${netdev} -j MASQUERADE
    iptables -A FORWARD -i ${bridge} -j ACCEPT
    echo 1 >/proc/sys/net/ipv4/ip_forward
    /etc/init.d/isc-dhcp-server restart

Note that since the dhcp server is configured to give out addresses only on the virtual network, we need to restart it after creating the bridge interface, otherwise isc-dhcp-server will refuse to run. Mum says that I should configure the bridge in /etc/network/interfaces to make the dhcp server happy at startup, but I felt a bit lazy, so I left the task to xen…
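
For completeness, the dhcp side of this setup is just a matter of pinning the server to the bridge and declaring the subnet; a sketch along these lines (addresses matching the prose above rather than the script’s 10.6.7.x defaults):

# /etc/default/isc-dhcp-server
INTERFACES="xenbr0"

# /etc/dhcp/dhcpd.conf
subnet 10.0.0.0 netmask 255.255.255.0 {
  range 10.0.0.128 10.0.0.250;
  option routers 10.0.0.1;
}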

In the next episode, I’ll add ipv6 connectivity to the virtual subnet and then start playing with puppet… ipv6 is almost done; puppet… I’ve started with the doc…

The complete script, from the suse wiki with my modifications, is below (only lightly tested):

#!/bin/sh
#============================================================================
# Default Xen network start/stop script.
# Xend calls a network script when it starts.
# The script name to use is defined in /etc/xen/xend-config.sxp
# in the network-script field.
#
# This script creates a bridge (default xenbr${vifnum}), gives it an IP address
# and the appropriate route, then sets up masquerading and restarts the
# dhcp server (my modifications to the original SuSE script).
#
# If all goes well, this should ensure that networking stays up.
# However, some configurations are upset by this, especially
# NFS roots. If the bridged setup does not meet your needs,
# configure a different script, for example using routing instead.
#
# Usage:
#
# vnet-brouter (start|stop|status) {VAR=VAL}*
#
# Vars:
#
# bridgeip   Holds the ip address the bridge should have, in
#            the form ip/mask (10.0.0.1/24).
# brnet      Holds the network of the bridge (10.0.0.0/24).
# 
# vifnum     Virtual device number to use (default 0). Numbers >=8
#            require the netback driver to have nloopbacks set to a
#            higher value than its default of 8.
# bridge     The bridge to use (default xenbr${vifnum}).
#
# start:
# Creates the bridge
# Gives it the IP address and netmask
# Adds the routes to the routing table.
#
# stop:
# Removes all routes from the bridge
# Removes any devices on the bridge from it.
# Deletes bridge
#
# status:
# Print addresses, interfaces, routes
#
#============================================================================

#set -x

dir=$(dirname "$0")
. "$dir/xen-script-common.sh"
. "$dir/xen-network-common.sh"

findCommand "$@"
evalVariables "$@"

vifnum=${vifnum:-0}
bridgeip=${bridgeip:-10.6.7.1/24}
brnet=${brnet:-10.6.7.0/24}
netmask=${netmask:-255.255.255.0}
bridge=${bridge:-xenbr${vifnum}}
# physical interface to masquerade behind (used by op_start and show_status)
netdev=${netdev:-eth0}

##
# link_exists interface
#
# Returns 0 if the interface named exists (whether up or down), 1 otherwise.
#
link_exists()
{
    if ip link show "$1" >/dev/null 2>/dev/null
    then
        return 0
    else
        return 1
    fi
}


# Usage: create_bridge bridge
create_bridge () {
    local bridge=$1

    # Don't create the bridge if it already exists.
    if [ ! -d "/sys/class/net/${bridge}/bridge" ]; then
        brctl addbr ${bridge}
        brctl stp ${bridge} off
        brctl setfd ${bridge} 0
    fi
    ip link set ${bridge} up
}

# Usage: add_to_bridge bridge dev
add_to_bridge () {
    local bridge=$1
    local dev=$2
    # Don't add $dev to $bridge if it's already on a bridge.
    if ! brctl show | grep -wq ${dev} ; then
        brctl addif ${bridge} ${dev}
    fi
}

# Usage: show_status dev bridge
# Print interface configuration and routes.
show_status () {
    local dev=$1
    local bridge=$2

    echo '============================================================'
    ip addr show ${dev}
    ip addr show ${bridge}
    echo ' '
    brctl show ${bridge}
    echo ' '
    ip route list
    echo ' '
    route -n
    echo '============================================================'
    echo ' '
    iptables -L
    echo ' '
    iptables -L -t nat
    echo '============================================================'

}

op_start () {
    if [ "${bridge}" = "null" ] ; then
        return
    fi

    create_bridge ${bridge}

    if link_exists "$bridge"; then
        ip address add dev $bridge $bridgeip
        ip link set ${bridge} up arp on
        ip route add to $brnet dev $bridge
    fi

    # antispoof defaults to off; quote it so the test does not break when unset
    if [ "${antispoof:-no}" = 'yes' ] ; then
        antispoofing
    fi
    iptables -t nat -A POSTROUTING -o ${netdev} -j MASQUERADE
    iptables -A FORWARD -i ${bridge} -j ACCEPT
    echo 1 >/proc/sys/net/ipv4/ip_forward
    /etc/init.d/isc-dhcp-server restart
}

op_stop () {
    if [ "${bridge}" = "null" ]; then
        return
    fi
    if ! link_exists "$bridge"; then
        return
    fi

    ip route del to $brnet dev $bridge
    ip link set ${bridge} down arp off
    ip address del dev $bridge $bridgeip
    ##FIXME: disconnect the interfaces from the bridge 1st
    brctl delbr ${bridge}
    /etc/init.d/isc-dhcp-server restart
}

case "$command" in
    start)
        op_start
        ;;

    stop)
        op_stop
        ;;

    status)
        show_status ${netdev} ${bridge}
        ;;

    *)
        echo "Unknown command: $command" >&2
        echo 'Valid commands are: start, stop, status' >&2
        exit 1
esac

More on Xen 4.0 setup on squeeze


After the upgrade of last week, I haven’t had any major problems: xen 4 seems pretty stable and does its job well. One problem I encountered the other day was about dom0 ballooning. By default, xen sets dom0_min_mem to 196MB with ballooning set to true. This is all well and good until you try to use too much memory for your VMs, squeezing dom0 down to its minimum amount of memory and causing all sorts of problems. On the xen wiki, the recommended best practice is to reserve a minimum of 512MB for dom0’s operations. This is done by setting dom0_mem=512M on the grub command line and then adjusting enable-dom0-ballooning to no and dom0-min-mem according to the amount of memory you chose.

On debian, you can set the grub command line once and for all just by adding the following variable in /etc/default/grub:

GRUB_CMDLINE_XEN="dom0_mem=512M"
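
The xend side of the same change would look something like this in /etc/xen/xend-config.sxp (values matching the 512MB reserved above):

(enable-dom0-ballooning no)
(dom0-min-mem 512)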

Another small problem is related to the reboot sequence. Since I’m using lvm on aoe, the default shutdown sequence (network down first, lvm later) is not going to work for me. As I have a few lvm volumes on aoe and others on the physical disk, the proper solution to this problem is to write a custom shutdown script for the aoe lvm volumes and make it run before deconfiguring the network interfaces. In the meantime, to avoid the kernel hanging there forever, I’ve added these lines in /etc/sysctl.d/panic.conf:

# Reboot 5 seconds after panic
kernel.panic = 5

# Panic if a hung task was found
kernel.hung_task_panic = 1

# Set the timeout for hung tasks to 120 seconds
kernel.hung_task_timeout_secs = 120

This will instruct the kernel to panic, and then reboot, if a task does not respond for more than 120 seconds.


xen 4 on debian squeeze


It’s time to upgrade my xen servers to squeeze. I’ve already put this off for too long, and now I have the task of going from etch to squeeze in one long step. In order to avoid problems I first did an upgrade etch -> lenny and then went to squeeze. However, since so much has changed in the meantime, and so much tweaking of essential components (such as Xen!) is needed anyway, I guess I could have gone directly from etch to squeeze in one go and fixed everything in the process… Anyway, too late for this kind of consideration.

The xen debian wiki is full of invaluable information. Kudos to the xen team for their hard work. To get started on squeeze you need to install the xen hypervisor. Everything is provided by a couple of packages:

aptitude install xen-linux-system-2.6-xen-amd64 xen-hypervisor-4.0-amd64

This will pull in the latest linux kernel and xen hypervisor to run as dom0.

By default the xen hypervisor is probably not going to be the default boot entry. If you want to change this, you should edit the grub defaults:

vi /etc/default/grub

to make sure that the default kernel on dom0 is the xen hypervisor. This is tricky, because grub lets you define the default only as a position in the list of available kernels, so if you install a new kernel, you have to change the default according to the list of kernels in /boot/grub/grub.cfg. It would be nice if I could define the default kernel with a label instead of a number… (ref #505517)
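
For example, if the xen entry happens to be the third one listed in /boot/grub/grub.cfg (entries are counted from zero; the position here is just an assumption), you would set

GRUB_DEFAULT=2

in /etc/default/grub and then run update-grub.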

Alternatively, as suggested in the wiki, you can just move the regular linux entries out of the way, so that the xen entries come first…

mv -i /etc/grub.d/10_linux /etc/grub.d/50_linux

When installing the xen related tools, aptitude will also probably install rinse and xenwatch by default. The first one is used to bootstrap redhat machines, and maybe you don’t need it. The second one is a GUI and will pull in a lot of X-related dependencies. If your needs are similar to mine, you can just remove what is not needed…

aptitude purge rinse rpm rpm-common
aptitude purge xenwatch

Something that is new is the naming schema for virtual devices. Now all VMs will see /dev/xvda1 instead of /dev/sda1 as before. This needs to be changed in the domU (fstab) as well as in the xen config files (/etc/xen/vm.cfg).
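
In the config file this means updating the disk stanza; a sketch with hypothetical volume and instance names:

disk = [ 'phy:/dev/vg0/node1-disk,xvda1,w' ]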

One piece of fantastic news is that xen 4 now supports pyGrub. It is not mandatory (so if you want, you can stick with the old configuration file), but if you use pygrub, you can install whatever kernel you want on the domU. Finally, your users will have complete freedom to pick and choose their kernels!
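
Enabling it is a one-line change in the domU config file, replacing the kernel/ramdisk lines (the exact pygrub path is an assumption, it may differ on your system):

bootloader = '/usr/lib/xen-4.0/bin/pygrub'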

There was a small detail I didn’t notice on the debian wiki: if you try to use grub2 in squeeze, it will fail when probing the device (#601974). The workaround described in the wiki is to use xvd{a,b,c,…} as device names (and not xvd{1,2,3,…}) to make grub happy. Once you have changed the naming schema, grub will be able to see the disks and install the bootloader. Another solution is to install os-prober from unstable / experimental. It seems a patch is in the works.

On newly created images, you can also pass the --scsi parameter to xen-create-image to avoid this problem altogether… I’m not sure if this has other implications…

The console device also changes, from tty to hvc0. To get the console back you should add this line to the inittab of all your VMs:

vc:2345:respawn:/sbin/getty 38400 hvc0

A last note is about the upstream merge of the xen dom0 patches (http://blog.xen.org/index.php/2011/01/14/linux-2-6-37-first-upstream-linux-kernel-to-work-as-dom0/) !! \o/ yeiiii !!


backup xen images with dar

I’ve modified a script to back up live xen images with dar. This script uses lvm to snapshot a running VM’s disk, then mounts it read-only and uses dar to create an incremental backup. The script is a derivative of one I found on the net [1]. There is still a small problem with journaled file systems: even if the fs is frozen before taking the snapshot, and even if I mount it read-only, for some reason the kernel module tries to go through the journal to recover the fs. I’m worried that this might lead to data corruption… There is an old thread [2] shedding a bit of light on the problem.

The script is pretty simple. To create a full backup of a xen domain, the command line is:

./xenBackup.sh -d domainname

to create an incremental backup:

./xenBackup.sh -d domainname -i 1

where -i is the sequence number of the incremental backup. Of course you need the previous incremental backup for the operation to be successful (if i = 1, you need a full backup). You can use this script from cron, running a full backup on sunday and an incremental backup on every other day of the week, as sketched below.
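
A crontab sketch (the install path is hypothetical; %u gives the day of the week, 1 for monday through 6 for saturday, and % must be escaped in a crontab):

# full backup on sunday at 2am, incrementals monday to saturday
0 2 * * 0   /usr/local/bin/xenBackup.sh -d node1
0 2 * * 1-6 /usr/local/bin/xenBackup.sh -d node1 -i $(date +\%u)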

Script attached.

[1] http://www.johnandcailin.com/blog/john/backing-your-xen-domains
[2] http://www.nabble.com/Xen-backups-using-LVM-Snapshots-td19988096.html