How to compact a QCOW2 or a VMDK file

When you create a Virtual Machine (VM), you usually have the option of use a format that reserves the whole size of the disk (e.g. RAW), or to use a format that grows according to the used space in the disk (e.g. QCOW or VMDK).

The problem is that the space actually used in the disk grows as the disk files are written, but it is not decreased as they are deleted. But if you writed a lot of files and you deleted after they were needed, you’d probably have a lot of space reserved for the VMDK file, while that space is not actually used. I wanted to reclaim that space, to move the VMs using less space, and so this time…

I learned how to compact a VMDK file (the same method applies to QCOW2)

The method is, in fact, very easy… you simply have to re-encode the file using the same output format. If you have your original-disk.vmdk file, you simply have to issue a command like this one:

$ qemu-img convert -O vmdk original-disk.vmdk compressed-disk.vmdk

And that will make the magic (easy, isn’t it?).

But if you want to compact it more, you can claim more space from the disk before re-enconding the disk. First, I’d go to the solution and then I’ll explain it:

If the VM is a linux-based, you can boot it and create a zero-file, and once the file has exhausted the disk, delete it:

$ dd if=/dev/zero of=/tmp/zerofile.raw...$ rm /tmp/zerofile.raw

If the VM is a Windows-based, you can get the command sdelete from Microsoft website decompress it and execute the following commandline:

c:\> sdelete -z c:

Now you can power off the VM and issue the qemu-img command. You’ll get a file that correspond to only the used space in the disk:

$ qemu-img convert -O vmdk original-disk.vmdk compressed-disk.vmdk

Explanation

(Disclaimer: Please take into account that this is a simple and conceptual explanation)

If you knew about how the disks are managed, you’d probably know that when a file is deleted, it is not actually deleted from the disk. Instead, the space that it was using is marked as “ready to be used in case that it is needed”. So if a new file is created in the disk, it is possible that it uses that physical space (or not).

That is the trick from which the file recovery applications work: trying to find those “ready to be used” sectors. And that is why the “low-level format” exists: in order to “zero” the disk and to avoid that files are recovered.

When you created the /tmp/zerofile.raw file, you started to write zeros in the disk. When the physical empty space was exhausted, the disk controller started to use the “ready to be used” sectors, and the zerofile wrote zeros on them, and the zeros were written in the VMDK file.

The good thing here is that when a VMDK file is created (from any format… in our case, it is VMDK), the qemu-img application does not write those zeros in the file that contains the disk, and that is how the storage space is reclaimed.

Advertisements

How to set the hostname from DHCP in Ubuntu 14.04

I have a lot of virtual servers, and I like preparing a “golden image” and instantiate it many times. One of the steps is to set the hostname for the host, from my private DHCP server. It usually works, but sometimes it fails and I didn’t know why. So I got tired of such indetermination and this time…

I learned how to set the hostname from DHCP in Ubuntu 14.04

Well, so I have my dnsmasq server that acts both as a dns server and as a dhcp server. I investigated how a host gets its hostname from DHCP and it seems that it occurs when the DHCP server sends it by default (option hostname).

I debugged the DHCP messages using this command from other server

# tcpdump -i eth1 -vvv port 67 or port 68

And I saw that my dhcp server was properly sending the hostname (dnsmasq sends it by default), and I realized that the problem was in the dhclient script. I googled a lot and I found some clever scripts that got the name from a nameserver and were started as hooks from the dhcpclient script. But if the dhcp protocol sets the hostname, why do I have to create other script to set the hostname???.

Finally I got this blog entry and I realized that it was a bug in the dhclient script: if there exists an old dhcp entry in /var/lib/dhcp/dhclient.eth0.leases (or eth1, etc.), it does not set the hostname.

At this point you have two options:

  • The easy: in /etc/rc.local include a line that removes that file
  • The clever (suggested in the blog entry): include a hook script that unsets the hostname
echo unset old_host_name >/etc/dhcp/dhclient-enter-hooks.d/unset_old_hostname