KVM, DRBD, failover and backups

I’ve created a number of virtual machines as I’ve reinstalled my servers.  My old setup had lots of services all running on one main server, and had become a little fragile.  Upgrading any one aspect without breaking others was tricky, and I had poor documentation of what I’d done.  So in the new install I created a set of virtuals, both to ease the migration, and to give me a more stable setup.

The basic process here was that I:

  1. Upgraded my second server (test server) to a 64-bit kernel, which involves a full reinstall.  Got that running adequately
  2. Installed a couple of virtuals – these are development environments (a services node with mysql, git and a mediawiki; an apps node with ruby on rails) for something else I’m doing – I’ve been writing a few things in ruby over the last couple of years, most notably a watering system that I built and want to upgrade
  3. Made sure that was all running reasonably stably
  4. Moved the core services off my main server – the media backend as one server, a mail server, a web server.  This leaves basically NFS and Samba on the main server
  5. Once that’s stable, then reinstall the main server, which now has much less stuff running on it to go wrong
  6. Then get the virtuals mobile across the two servers

This post deals with that last step – getting the virtuals mobile so I can move them from one server to the other, and also a method to back them up without taking them offline.

The technology I chose to use was DRBD, which is a distributed block device.  Basically this is a mirror across two servers, which runs active/passive – so one node is active, and the other receives a replica of every write (with a lag while it is synchronising or catching up).  There are a bunch of different instructions on the web on how to do this, so I guess this is another one, again particular to my setup.

What I built was:

mdraid -> LVM device -> drbd device -> KVM virtual

So I have a standard set of raid devices (which I’ve been talking about in other posts).  These are all loaded into a volume group using LVM, and so far, so standard.

I then carve out a logical volume on each of the two servers for each virtual I’m creating – so for example, for the web server, I created a volume called “external-web-server” on each server, about 20G in size.
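For reference, carving out that volume is a one-line lvcreate on each server. The sketch below wraps it in a small helper; make_backing_lv and the RUN variable are names made up for this example, not part of LVM, and the volume group names are the ones that appear in the resource file further down. RUN defaults to echo, so it only prints the command until you set RUN to empty.

```shell
# Hypothetical helper: create the backing logical volume for one virtual.
# RUN defaults to "echo" (dry run); set RUN="" to really run lvcreate.
RUN=${RUN-echo}

make_backing_lv() {
    name=$1            # logical volume name, e.g. external-web-server
    vg=$2              # volume group on this server, e.g. vgRaid
    size=${3:-20G}     # defaults to the 20G used for the web server
    $RUN lvcreate -L "$size" -n "$name" "$vg"
}
```

On the first server this would be called as make_backing_lv external-web-server vgRaid, and on the second with vgMain instead.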

I then create a drbd resource for this.  The resource file looks something like this:

resource external-web-server {
  protocol C;
  startup {
    # wfc-timeout  15;
    # degr-wfc-timeout 60;
  }
  syncer {
    verify-alg md5;
    rate 10M;
  }
  net {
    # shared-secret is only used for peer authentication once
    # cram-hmac-alg is enabled
    # cram-hmac-alg sha1;
    shared-secret "SecretPassword";
  }
  # the name after "on" has to match the hostname (uname -n) of each server
  on server {
    device /dev/drbd_external-web-server minor 1;
    disk /dev/vgRaid/external-web-server;
    address 192.168.1.4:7790;
    meta-disk internal;
  }
  on testserver {
    device /dev/drbd_external-web-server minor 1;
    disk /dev/vgMain/external-web-server;
    address 192.168.1.7:7790;
    meta-disk internal;
  }
}

I then issue some commands to bring up the drbd system across the servers.  These go on both servers:

  drbdadm create-md external-web-server
  drbdadm up external-web-server

This should bring both devices up, but they’re not synchronised yet.  To get them to synchronise, I’ve consistently had to issue an invalidate on the secondary server, then set the other server to primary.

server2:  drbdadm invalidate external-web-server
server1:  drbdadm primary external-web-server

You can issue either

  drbd-overview

or

  cat /proc/drbd

to see the status of this.  It takes a while to synch.
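If you’d rather have a script tell you when the synch is done, a small check along these lines should work. It is only a sketch: is_synced is a hypothetical helper, and it assumes the 8.3-style /proc/drbd layout, where each device has a line starting with its minor number and a ds: field showing the local and peer disk states.

```shell
# Hypothetical helper: read DRBD status text on stdin and succeed only if
# the line for the given minor shows both disks as UpToDate.
is_synced() {
    minor=$1
    # Status lines look like " 1: cs:Connected ro:... ds:UpToDate/UpToDate ..."
    grep -E "^ *${minor}: " | grep -q 'ds:UpToDate/UpToDate'
}
```

Used as is_synced 1 < /proc/drbd, where 1 is the minor from the resource file; wrapping it in an until loop with a sleep gives a crude "wait for synch".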

Once this is all up and synchronised (well, actually, you can do it straight away, but I generally found it easier to leave that synching and come back to it later) you can then install your VM on top of your drbd device – in this case

  /dev/drbd_external-web-server

At this point, you have a VM running on the primary machine, with a replica of the data on the secondary server.  So, there are two actions I wanted to be able to take.

Firstly, move that virtual from one machine to the other.  You can’t live migrate this configuration (not that I have found yet, anyway), but so long as you have the virtual machine config on both servers (in my case, in /etc/libvirt/qemu/external-web-server.xml), then you can do the following:

server1: shut down the VM on server1 (I use virtual machine manager for this; from the command line, virsh shutdown does the same job)

  server1:  drbdadm secondary external-web-server
  server2:  drbdadm primary external-web-server

server2: start the VM on server2 (again, VMM for this)

Simple, machine moved from one server to the other.  Better yet, if you try to start the VM on a server that’s currently secondary, it refuses, so pretty safe.
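The same sequence can be scripted. This is a rough sketch rather than something from my own setup: move_vm and RUN are hypothetical names, it assumes the libvirt domain has the same name as the drbd resource, and it assumes you can ssh to both servers with enough rights to run virsh and drbdadm. RUN defaults to echo, so by default it only prints the four steps.

```shell
# Hypothetical helper: move a DRBD-backed VM from one server to the other.
# RUN defaults to "echo" (dry run); set RUN="" to execute the commands.
RUN=${RUN-echo}

move_vm() {
    res=$1     # drbd resource, assumed to also be the libvirt domain name
    from=$2    # server currently running the VM
    to=$3      # server that should take over

    $RUN ssh "$from" virsh shutdown "$res" || return 1
    # virsh shutdown returns before the guest has actually stopped, so in
    # a real run poll until libvirt reports the domain as "shut off".
    if [ -z "$RUN" ]; then
        until ssh "$from" virsh domstate "$res" | grep -q 'shut off'; do
            sleep 5
        done
    fi
    $RUN ssh "$from" drbdadm secondary "$res" || return 1
    $RUN ssh "$to" drbdadm primary "$res" || return 1
    $RUN ssh "$to" virsh start "$res" || return 1
}
```

For example, move_vm external-web-server server1 server2 prints the commands in order; run it with RUN="" once you’re happy with what it would do.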

Next, I wanted to back up.  On the web most people seem to be recommending LVM snapshots, but given I put drbd on top of LVM, it isn’t so easy.  I went instead with breaking the replication, backing up from the secondary, then restarting the replication.  When the nodes reconnect after both have been primary, drbd treats it as a “split brain”, and there are recovery settings that deal with this; so long as the secondary has no updates (i.e. don’t mount it) then it causes no issues.

Firstly, configure the split brain recovery.  This is a little bit different between versions of drbd.  I’m on 8.3, so I put this into the net section of global options:

  after-sb-0pri discard-zero-changes;
  after-sb-1pri call-pri-lost-after-sb;
  after-sb-2pri disconnect;

Then, when you want to do the backup, on the secondary node you issue:

  drbdadm disconnect external-web-server
  drbdadm primary external-web-server
  dd if=/dev/drbd_<resource> | gzip -9 > /home/backups/virtuals/drbd_<resource>.gz &

Once the backup completes:

  drbdadm secondary external-web-server
  drbdadm connect external-web-server

Check /var/log/syslog to make sure it reconnects properly, and no issues arise.
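Putting the backup steps together, a wrapper might look like the sketch below. backup_resource and the DRBDADM variable are hypothetical names for this example; DRBDADM exists only so you can substitute something harmless (such as "echo drbdadm") to see what would be run. The function is meant to run on the secondary node, and it tries to put the node back to secondary and reconnect even if the copy itself fails.

```shell
# Hypothetical wrapper around the backup steps; run it on the secondary node.
# DRBDADM can be overridden, e.g. DRBDADM="echo drbdadm" for a dry run.
DRBDADM=${DRBDADM-drbdadm}

backup_resource() {
    res=$1    # drbd resource name
    dev=$2    # block device to read, e.g. /dev/drbd_external-web-server
    out=$3    # where the compressed image goes

    $DRBDADM disconnect "$res" || return 1
    if ! $DRBDADM primary "$res"; then
        # Could not promote, so just put replication back and give up.
        $DRBDADM connect "$res"
        return 1
    fi

    dd if="$dev" | gzip -9 > "$out"
    status=$?    # exit status of gzip, the last command in the pipeline

    # Always demote and reconnect, whether or not the copy worked.
    $DRBDADM secondary "$res" && $DRBDADM connect "$res" || return 1
    return $status
}
```

Something like backup_resource external-web-server /dev/drbd_external-web-server /home/backups/virtuals/drbd_external-web-server.gz would then do the whole cycle in one go. It is still worth checking syslog afterwards.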

Overall, very happy so far.
