How to compact a QCOW2 or a VMDK file

When you create a Virtual Machine (VM), you usually have the option of use a format that reserves the whole size of the disk (e.g. RAW), or to use a format that grows according to the used space in the disk (e.g. QCOW or VMDK).

The problem is that the space actually used in the disk grows as the disk files are written, but it is not decreased as they are deleted. But if you writed a lot of files and you deleted after they were needed, you’d probably have a lot of space reserved for the VMDK file, while that space is not actually used. I wanted to reclaim that space, to move the VMs using less space, and so this time…

I learned how to compact a VMDK file (the same method applies to QCOW2)

The method is, in fact, very easy… you simply have to re-encode the file using the same output format. If you have your original-disk.vmdk file, you simply have to issue a command like this one:

$ qemu-img convert -O vmdk original-disk.vmdk compressed-disk.vmdk

And that will make the magic (easy, isn’t it?).

But if you want to compact it more, you can claim more space from the disk before re-enconding the disk. First, I’d go to the solution and then I’ll explain it:

If the VM is a linux-based, you can boot it and create a zero-file, and once the file has exhausted the disk, delete it:

$ dd if=/dev/zero of=/tmp/zerofile.raw...$ rm /tmp/zerofile.raw

If the VM is a Windows-based, you can get the command sdelete from Microsoft website decompress it and execute the following commandline:

c:\> sdelete -z c:

Now you can power off the VM and issue the qemu-img command. You’ll get a file that correspond to only the used space in the disk:

$ qemu-img convert -O vmdk original-disk.vmdk compressed-disk.vmdk


(Disclaimer: Please take into account that this is a simple and conceptual explanation)

If you knew about how the disks are managed, you’d probably know that when a file is deleted, it is not actually deleted from the disk. Instead, the space that it was using is marked as “ready to be used in case that it is needed”. So if a new file is created in the disk, it is possible that it uses that physical space (or not).

That is the trick from which the file recovery applications work: trying to find those “ready to be used” sectors. And that is why the “low-level format” exists: in order to “zero” the disk and to avoid that files are recovered.

When you created the /tmp/zerofile.raw file, you started to write zeros in the disk. When the physical empty space was exhausted, the disk controller started to use the “ready to be used” sectors, and the zerofile wrote zeros on them, and the zeros were written in the VMDK file.

The good thing here is that when a VMDK file is created (from any format… in our case, it is VMDK), the qemu-img application does not write those zeros in the file that contains the disk, and that is how the storage space is reclaimed.