How to use a Dell PS Equallogic as a backend for OpenStack Cinder

I have written several posts on installing OpenStack Rocky from scratch. They all have the tag #openstack. In the previous posts we…

  1. Installed OpenStack Rocky (part 1, part 2, and part 3).
  2. Installed the Horizon Dashboard and upgraded noVNC (install horizon).
  3. Installed Cinder and integrated it with Glance (in this post).

And now that I have a Dell PS Equallogic, I learned…

How to use a Dell PS Equallogic as a backend for OpenStack Cinder

In the post in which we installed Cinder, we used a single disk and LVM as the backend to store and serve the Cinder volumes. But in my lab we own a Dell PS Equallogic, which is a far better SAN than a plain Linux server, so I’d prefer to use it as the backend for Cinder.

In the last post we did “the hard work” of installing Cinder, so setting up the new backend is now easier. We’ll follow the official documentation of the plugin for the Dell EMC PS Series in this link.

Prior to replacing the storage backend, it is advisable to remove any volume that was stored in the other backend, and also the images that were volume-backed. Otherwise, Cinder will think that the old volumes are stored in the new backend, and your installation will end up in a weird state. If you plan to keep both storage backends (e.g. because you still have running volumes), you can add the new storage backend and set it as the default.
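
If you decide to start from a clean state, a minimal sketch of that cleanup could be the following. It assumes that you really want to delete every volume and every image in the deployment, so review the lists before running it:

# Delete every existing volume (all projects)
for v in $(openstack volume list --all-projects -f value -c ID); do
  openstack volume delete "$v"
done
# Delete the images as well
for i in $(openstack image list -f value -c ID); do
  openstack image delete "$i"
done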

In this guide, I will assume that every volume and image has been removed from the OpenStack deployment. Moreover, I assume that the PS Equallogic is up and running.

The SAN is in a separate data network. The IP address of the SAN is 192.168.1.11, and every node and the controller have IP addresses in that range. In my particular case, the controller has the IP address 192.168.1.49.

Create a user in the SAN

We need a user for OpenStack to access the SAN. So I have created user “osuser” with password “SAN_PASS”.

osuser is restricted to use only some features

It is important to check the connectivity via ssh. We can check it from the front-end:

# ssh osuser@192.168.1.11
osuser@192.168.1.11's password:
Last login: Wed May 6 15:45:56 2020 from 192.168.1.49 on tty??


Welcome to Group Manager

Copyright 2001-2016 Dell Inc.

group1>

Add the new backend to Cinder

Now that we have the user, we can just add the new backend to Cinder. So we’ll add the following lines to /etc/cinder/cinder.conf

[dell]
volume_driver = cinder.volume.drivers.dell_emc.ps.PSSeriesISCSIDriver
san_ip = 192.168.1.11
san_login = osuser
san_password = SAN_PASS
eqlx_group_name = group1
eqlx_pool = default

# Optional settings
san_thin_provision = true
use_chap_auth = false
eqlx_cli_max_retries = 3
san_ssh_port = 22
ssh_conn_timeout = 15
ssh_min_pool_conn = 1
ssh_max_pool_conn = 5

# Enable the volume-backed image cache
image_volume_cache_enabled = True

These lines just configure the access to your SAN. Please adjust the parameters to your settings and preferences.

Pay attention to the variable “image_volume_cache_enabled”, which I have also set to True for this backend. It enables the creation of volumes from images by means of volume cloning, instead of downloading and converting the image each time (you can read more about this in the part about integrating Cinder and Glance, in the previous post). In the end, this mechanism speeds up the boot process of the VMs.
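
If you are worried about how much space this cache may take on the SAN, Cinder also accepts per-backend limits for it. The values below are only an example, not taken from my deployment:

# Optional limits for the image-volume cache
image_volume_cache_max_size_gb = 200
image_volume_cache_max_count = 50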

Now we have to update the value of enabled_backends in the [DEFAULT] section of the file /etc/cinder/cinder.conf. In my case, I have disabled the other backends (i.e. LVM):

enabled_backends = dell
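
If you prefer to keep the LVM backend alongside the new one (as discussed above), a possible sketch is to enable both backends, give each of them a volume_backend_name in its section of /etc/cinder/cinder.conf, and bind them to volume types. The type names below are arbitrary examples:

# In /etc/cinder/cinder.conf:
#   [DEFAULT]  ->  enabled_backends = dell,lvm   and   default_volume_type = dell
#   [dell]     ->  volume_backend_name = dell
#   [lvm]      ->  volume_backend_name = lvm

# Create one volume type per backend, bound by volume_backend_name
openstack volume type create dell
openstack volume type set --property volume_backend_name=dell dell
openstack volume type create lvm
openstack volume type set --property volume_backend_name=lvm lvm

# Now a volume can be created on a specific backend with --type
openstack volume create --size 2 --type lvm testvol-lvm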

Finally, you have to restart the cinder services:

# service cinder-volume restart
# service cinder-scheduler restart

Testing the integration

If the integration went fine, you will find messages like the following in /var/log/cinder/cinder-volume.log:

2020-05-06 14:17:40.688 17855 INFO cinder.volume.manager [req-74aabdaa-5f88-4576-801e-cf923265d23e - - - - -] Starting volume driver PSSeriesISCSIDriver (1.4.6)
2020-05-06 14:17:40.931 17855 INFO paramiko.transport [-] Connected (version 1.99, client OpenSSH_5.0)
2020-05-06 14:17:42.096 17855 INFO paramiko.transport [-] Authentication (password) successful!
2020-05-06 14:17:42.099 17855 INFO cinder.volume.drivers.dell_emc.ps [req-74aabdaa-5f88-4576-801e-cf923265d23e - - - - -] PS-driver: executing "cli-settings confirmation off".
...

As I prepared a separate pool for osuser, when I log in to the SAN via SSH, I see no volumes yet:

osuser@192.168.1.11's password:
Last login: Wed May 6 16:02:01 2020 from 192.168.1.49 on tty??


Welcome to Group Manager

Copyright 2001-2016 Dell Inc.

group1> volume show
Name Size Snapshots Status Permission Connections T
--------------- ---------- --------- ------- ---------- ----------- -
group1>

Now I am creating a new volume:

# openstack volume create --size 2 testvol
+---------------------+--------------------------------------+
| Field | Value |
+---------------------+--------------------------------------+
(...)
| id | 7844286d-869d-49dc-9c91-d7af6c2123ab |
(...)

And now, if I get to the SAN CLI, I will see a new volume whose name coincides with the ID of the just created volume:

group1> volume show
Name Size Snapshots Status Permission Connections T
--------------- ---------- --------- ------- ---------- ----------- -
volume-7844286d 2GB 0 online read-write 0 Y
-869d-49dc-9c
91-d7af6c2123
ab

Verifying the usage as an image cache for volume-backed instances

We are creating a new image, and it should be stored in the SAN, because we set Cinder as the default storage backend for Glance, in the previous post.

# openstack image create --public --container-format bare --disk-format qcow2 --file ./bionic-server-cloudimg-amd64.img "Ubuntu 18.04"
...
# openstack volume list --all-projects
+--------------------------------------+--------------------------------------------+-----------+------+-------------+
| ID | Name | Status | Size | Attached to |
+--------------------------------------+--------------------------------------------+-----------+------+-------------+
| 41caf4ad-2bbd-4311-9003-00d39d009a9f | image-c8f3d5c5-5d4a-47de-bac0-bfd3f673c713 | available | 1 | |
+--------------------------------------+--------------------------------------------+-----------+------+-------------+

And the SAN’s CLI shows the newly created volume:

group1> volume show
Name Size Snapshots Status Permission Connections T
--------------- ---------- --------- ------- ---------- ----------- -
volume-41caf4ad 1GB 0 online read-write 0 Y
-2bbd-4311-90
03-00d39d009a
9f

At this point, we will boot a new VM which is volume-backed and makes use of that image.

First, we will see that a new volume is being created, and that the volume that corresponds to the image is attached to “None”.

# openstack volume list --all-projects
+--------------------------------------+--------------------------------------------+----------+------+-----------------------------------+
| ID | Name | Status | Size | Attached to |
+--------------------------------------+--------------------------------------------+----------+------+-----------------------------------+
| dd4071bf-48e9-484c-80cd-89f52a4fa442 | | creating | 4 | |
| 41caf4ad-2bbd-4311-9003-00d39d009a9f | image-c8f3d5c5-5d4a-47de-bac0-bfd3f673c713 | in-use | 1 | Attached to None on glance_store |
+--------------------------------------+--------------------------------------------+----------+------+-----------------------------------+

This is because Cinder is fetching the image so that it can prepare it and upload it to the storage backend as a special volume to clone from. We can check that the folder /var/lib/cinder/conversion/ contains a temporary file, which corresponds to the image.

root@menoscloud:~# ls -l /var/lib/cinder/conversion/
total 337728
-rw------- 1 cinder cinder 0 may 6 14:32 tmp3wcPzD
-rw------- 1 cinder cinder 345833472 may 6 14:31 tmpDsKUzxmenoscloud@dell

Once the image has been obtained, Cinder converts it to raw format (using qemu-img) into the volume that it has just created.

root@menoscloud:~# openstack volume list --all-projects
+--------------------------------------+--------------------------------------------+-------------+------+-------------+
| ID | Name | Status | Size | Attached to |
+--------------------------------------+--------------------------------------------+-------------+------+-------------+
| dd4071bf-48e9-484c-80cd-89f52a4fa442 | | downloading | 3 | |
| 41caf4ad-2bbd-4311-9003-00d39d009a9f | image-c8f3d5c5-5d4a-47de-bac0-bfd3f673c713 | available | 1 | |
+--------------------------------------+--------------------------------------------+-------------+------+-------------+
root@menoscloud:~# ps -ef | grep qemu
root 18571 17855 0 14:32 ? 00:00:00 sudo cinder-rootwrap /etc/cinder/rootwrap.conf qemu-img convert -O raw -t none -f qcow2 /var/lib/cinder/conversion/tmpDsKUzxmenoscloud@dell /dev/sda
root 18573 18571 2 14:32 ? 00:00:00 /usr/bin/python2.7 /usr/bin/cinder-rootwrap /etc/cinder/rootwrap.conf qemu-img convert -O raw -t none -f qcow2 /var/lib/cinder/conversion/tmpDsKUzxmenoscloud@dell /dev/sda
root 18575 18573 17 14:32 ? 00:00:02 /usr/bin/qemu-img convert -O raw -t none -f qcow2 /var/lib/cinder/conversion/tmpDsKUzxmenoscloud@dell /dev/sda

And once the conversion procedure has finished, we’ll see that a new volume appears that stores the image (in raw format) and is ready to be cloned for the next volume-backed instances:

# openstack volume list --all-projects
+--------------------------------------+--------------------------------------------+-----------+------+-------------------------------+
| ID | Name | Status | Size | Attached to |
+--------------------------------------+--------------------------------------------+-----------+------+-------------------------------+
| dd4071bf-48e9-484c-80cd-89f52a4fa442 | | in-use | 4 | Attached to eql1 on /dev/vda |
| 61903a21-02ad-4dc4-8709-a039e7a65815 | image-c8f3d5c5-5d4a-47de-bac0-bfd3f673c713 | available | 3 | |
| 41caf4ad-2bbd-4311-9003-00d39d009a9f | image-c8f3d5c5-5d4a-47de-bac0-bfd3f673c713 | available | 1 | |
+--------------------------------------+--------------------------------------------+-----------+------+-------------------------------+

If we start a new volume-backed instance, we can check that it boots much faster than the first one. This is because Cinder now skips the “qemu-img convert” phase and just clones the volume. You can check the commands in the file /var/log/cinder/cinder-volume.log:

(...)
2020-05-06 14:39:39.181 17855 INFO cinder.volume.drivers.dell_emc.ps [req-0a0a625f-eb5d-4826-82b2-f1bce93535cd 22a4facfd9794df1b8db1b4b074ae6db 50ab438534cd4c04b9ad341b803a1587 - - -] PS-driver: executing "volume select volume-61903a21-02ad-4dc4-8709-a039e7a65815 clone volume-f3226260-7790-427c-b3f0-da6ab1b2291b".
2020-05-06 14:39:40.333 17855 INFO cinder.volume.drivers.dell_emc.ps [req-0a0a625f-eb5d-4826-82b2-f1bce93535cd 22a4facfd9794df1b8db1b4b074ae6db 50ab438534cd4c04b9ad341b803a1587 - - -] PS-driver: executing "volume select volume-f3226260-7790-427c-b3f0-da6ab1b2291b size 4G no-snap".
(...)

Obviously, you can check that the volumes have been created in the backend, by using the SAN’s CLI:

group1> volume show
Name Size Snapshots Status Permission Connections T
--------------- ---------- --------- ------- ---------- ----------- -
volume-41caf4ad 1GB 0 online read-write 0 Y
-2bbd-4311-90
03-00d39d009a
9f
volume-dd4071bf 4GB 0 online read-write 1 Y
-48e9-484c-80
cd-89f52a4fa4
42
volume-61903a21 3GB 0 online read-write 0 Y
-02ad-4dc4-87
09-a039e7a658
15
volume-f3226260 4GB 0 online read-write 1 Y
-7790-427c-b3
f0-da6ab1b229
1b

How to install Cinder in OpenStack Rocky and make it work with Glance

I have written several posts on installing OpenStack Rocky from scratch. They all have the tag #openstack. In the previous posts we…

  1. Installed OpenStack Rocky (part 1, part 2, and part 3).
  2. Installed the Horizon Dashboard and upgraded noVNC (install horizon).

So we have a working installation of the basic services (keystone, glance, neutron, compute, etc.). And now it is time to learn

How to install Cinder in OpenStack Rocky and make it work with Glance

Cinder is very straightforward to install using the basic mechanism: having a standard Linux server that will serve block devices as a SAN, by providing iSCSI endpoints. This server will use tgtadm and iscsiadm as the basic tools, and a backend for the block devices.

A different problem is to integrate the cinder server with an external SAN device, such as a Dell Equallogic SAN. Cinder has plugins for many of them, and each has its own quirks.

In this post, we are following the standard cinder installation guide for Ubuntu (in this link), and what we’ll get is the standard SAN server with an LVM back-end for the block devices. Then we will integrate it with Glance (to be able to use Cinder as a storage for the OpenStack images) and we’ll learn a bit about how they work.

Installing Cinder

In the first place we are creating a database for Cinder:

mysql -u root -p <<< "CREATE DATABASE cinder;\
GRANT ALL PRIVILEGES ON cinder.* TO 'cinder'@'localhost' IDENTIFIED BY 'CINDER_DBPASS';\
GRANT ALL PRIVILEGES ON cinder.* TO 'cinder'@'%' IDENTIFIED BY 'CINDER_DBPASS';"

Now we create a user for Cinder and the services in OpenStack (we create both v2 and v3):

$ openstack user create --domain default --password "CINDER_PASS" cinder
$ openstack role add --project service --user cinder admin
$ openstack service create --name cinderv2 --description "OpenStack Block Storage" volumev2
$ openstack service create --name cinderv3 --description "OpenStack Block Storage" volumev3

Once we have the user and the service, we create the proper endpoints for both v2 and v3:

$ openstack endpoint create --region RegionOne volumev2 public http://controller:8776/v2/%\(project_id\)s
$ openstack endpoint create --region RegionOne volumev2 internal http://controller:8776/v2/%\(project_id\)s
$ openstack endpoint create --region RegionOne volumev2 admin http://controller:8776/v2/%\(project_id\)s
$ openstack endpoint create --region RegionOne volumev3 public http://controller:8776/v3/%\(project_id\)s
$ openstack endpoint create --region RegionOne volumev3 internal http://controller:8776/v3/%\(project_id\)s
$ openstack endpoint create --region RegionOne volumev3 admin http://controller:8776/v3/%\(project_id\)s

And now we are ready to install the cinder packages

$ apt install -y cinder-api cinder-scheduler

Once the packages are installed, we need to update the configuration file /etc/cinder/cinder.conf. The content will be something like the following:

[DEFAULT]
rootwrap_config = /etc/cinder/rootwrap.conf
api_paste_config = /etc/cinder/api-paste.ini
iscsi_helper = tgtadm
volume_name_template = volume-%s
volume_group = cinder-volumes
verbose = True
auth_strategy = keystone
state_path = /var/lib/cinder
lock_path = /var/lock/cinder
volumes_dir = /var/lib/cinder/volumes
enabled_backends = lvm
transport_url = rabbit://openstack:RABBIT_PASS@controller
my_ip = 192.168.1.241
glance_api_servers = http://controller:9292
[database]
connection = mysql+pymysql://cinder:CINDER_DBPASS@controller/cinder
[keystone_authtoken]
www_authenticate_uri = http://controller:5000
auth_url = http://controller:5000
memcached_servers = controller:11211
auth_type = password
project_domain_id = default
user_domain_id = default
project_name = service
username = cinder
password = CINDER_PASS
[oslo_concurrency]
lock_path = /var/lib/cinder/tmp
[lvm]
volume_driver = cinder.volume.drivers.lvm.LVMVolumeDriver
volume_group = cinder-volumes
iscsi_protocol = iscsi
iscsi_helper = tgtadm
image_volume_cache_enabled = True

You must adapt this file to your configuration. In particular, the passwords of rabbit, the cinder database and the cinder service, and the IP address of the cinder server (stored in the my_ip variable). In my case, I am using the same server as in the previous posts.

With this configuration, cinder will use tgtadm to create the iSCSI endpoints, and LVM as the backend for the block devices.
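
Later, once a volume is attached to a running instance, you can peek at the iSCSI side to see what cinder-volume and nova-compute have set up. This is just a sanity check, not a step of the installation:

# List the iSCSI targets that tgtd exposes on the storage node (one per attached volume)
tgtadm --lld iscsi --mode target --op show
# List the sessions opened by the iSCSI initiator on the compute node
iscsiadm -m session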

Now we just have to add the following lines to file /etc/nova/nova.conf to enable cinder in OpenStack via nova-api:

[cinder]
os_region_name=RegionOne

Then, sync the cinder database by executing the next command:

$ su -s /bin/sh -c "cinder-manage db sync" cinder

And restart the related services:

$ service nova-api restart
$ service cinder-scheduler restart
$ service apache2 restart

Preparing the LVM backend

Now that we have configured cinder, we need a backend for the block devices. In our case, it is LVM. If you want to know a bit more about the concepts that we are using at this point and what we are doing, you can check my previous post in this link.

Now we are installing the LVM tools:

$ apt install lvm2 thin-provisioning-tools

LVM needs a partition or a whole disk to work. You can use any partition or disk (or even a file that can be used for testing purposes, as described in the section “testlab” in this link). In our case, we are using the whole disk /dev/vdb.
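
If you have no spare disk and only want to test, a file attached to a loop device can play the role of /dev/vdb. This is only a testing sketch; the size, the file path and the loop device are arbitrary:

# Create a 20 GB sparse file and attach it to a loop device
truncate -s 20G /var/lib/cinder-backing-file
losetup /dev/loop0 /var/lib/cinder-backing-file
# Then use /dev/loop0 wherever /dev/vdb appears below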

According to our settings, OpenStack expects to find an existing LVM volume group with the name “cinder-volumes”, so we need to create it:

$ pvcreate /dev/vdb
$ vgcreate cinder-volumes /dev/vdb

Once we have our volume group ready, we can install the cinder-volume service.

$ apt install cinder-volume

And that’s all about the installation of cinder. The last part will work because we included section [lvm] in /etc/cinder/cinder.conf and “enabled_backends = lvm”.

Verifying that cinder works

To verify that cinder works, we’ll just create one volume:

# openstack volume create --size 2 checkvol
+---------------------+--------------------------------------+
| Field               | Value                                |
+---------------------+--------------------------------------+
| attachments         | []                                   |
| availability_zone   | nova                                 |
| bootable            | false                                |
| consistencygroup_id | None                                 |
| created_at          | 2020-05-05T09:52:47.000000           |
| description         | None                                 |
| encrypted           | False                                |
| id                  | aadc24eb-ec1c-4b84-b2b2-8ea894b50417 |
| migration_status    | None                                 |
| multiattach         | False                                |
| name                | checkvol                             |
| properties          |                                      |
| replication_status  | None                                 |
| size                | 2                                    |
| snapshot_id         | None                                 |
| source_volid        | None                                 |
| status              | creating                             |
| type                | None                                 |
| updated_at          | None                                 |
| user_id             | 8c67fb57d70646d9b57beb83cc04a892     |
+---------------------+--------------------------------------+

After a while, we can check that the volume has been properly created and it is available.

# openstack volume list
+--------------------------------------+----------+-----------+------+-----------------------------+
| ID                                   | Name     | Status    | Size | Attached to                 |
+--------------------------------------+----------+-----------+------+-----------------------------+
| aadc24eb-ec1c-4b84-b2b2-8ea894b50417 | checkvol | available |    2 |                             |
+--------------------------------------+----------+-----------+------+-----------------------------+

If we are curious, we can check what happened in the backend:

# lvs -o name,lv_size,data_percent,thin_count
  LV                                          LSize  Data%  #Thins
  cinder-volumes-pool                         19,00g 0,00        1
  volume-aadc24eb-ec1c-4b84-b2b2-8ea894b50417  2,00g 0,00

We can see that we have a volume with the name volume-aadc24eb-ec1c-4b84-b2b2-8ea894b50417, with the ID that coincides with the ID of the volume that we have just created. Moreover, we can see that it has occupied 0% of space because it is thin-provisioned (i.e. it will only use the effective stored data like in qcow2 or vmdk virtual disk formats).

Integrating Cinder with Glance

The integration of Cinder with Glance can be made in two different parts:

  1. Using Cinder as a storage backend for the Images.
  2. Using Cinder as a cache for the Images of the VMs that are volume-based.

It may seem that it is the same, but it is not. To be able to identify what feature we want, we need to know how OpenStack works, and also acknowledge that Cinder and Glance are independent services.

Using Cinder as a backend for Glance

In OpenStack, when a VM is image-based (i.e. it does not create a volume), nova-compute transfers the image to the host in which it has to be used. This happens regardless of whether the image comes from a filesystem backend (i.e. stored in /var/lib/glance/images/), from swift (transferred using HTTP), or from cinder (transferred using iSCSI). So using Cinder as a storage backend for the images will avoid the need for extra storage in the controller, but it will not be useful for anything else.

Booting an image-based VM, which does not create a volume.

If you boot a volume-based VM from an image, OpenStack will create a volume for the new VM (using cinder). In this case, cinder is very inefficient, because it connects to the existing volume, downloads it, converts it to raw format and dumps it into the new volume (i.e. using qemu-img convert -O raw -f qcow2 …). So the creation of the volume is extremely slow.

There is one way to boost this procedure by using more efficient tools: if the image is stored in raw format, the owner is the same user that tries to use it (check image_upload_use_internal_tenant), and the allowed_direct_url_schemes option is properly set, the new volume will be created by cloning the volume that contains the image and then resizing it using the backend tools (i.e. lvm cloning and resizing capabilities). That means that the new volume will be created almost instantly, so we’ll try to use this mechanism if possible.

To enable cinder as a backend for Glance, you need to add the following lines to file /etc/glance/glance-api.conf

[glance_store]
stores = file,http,cinder
default_store = cinder
filesystem_store_datadir = /var/lib/glance/images/
cinder_store_auth_address = http://controller:5000/v3
cinder_store_user_name = cinder
cinder_store_password = CINDER_PASS
cinder_store_project_name = service

We are just adding “Cinder” as one of the mechanisms for Glance (apart from the others, like file or HTTP). In our example, we are setting Cinder as the default storage backend, because the horizon dashboard does not offer any way to select where to store the images.

It is possible to set any other storage backend as the default storage backend, but then you’ll need to create the volumes by hand and execute Glance low-level commands such as “glance location-add <image-uuid> --url cinder://<volume-uuid>”. The mechanism can be seen in the official guide.

The variables cinder_store_user_name, cinder_store_password, and cinder_store_project_name set the owner of the images that are uploaded to Cinder via Glance, and they are only used if image_upload_use_internal_tenant is set to True in the Cinder configuration.

And now we need to add the next lines to section [DEFAULT] in /etc/cinder/cinder.conf:

allowed_direct_url_schemes = cinder
image_upload_use_internal_tenant = True

Finally, you need to restart the services:

# service cinder-volume restart
# service cinder-scheduler restart
# service glance-api restart

It may seem a bit messy, but this is the way Cinder and Glance are configured. I feel that if you use the configuration that I propose in this post, you’ll get the integration working as expected.

Verifying the integration

We are storing a new image, but we’ll store it as a volume this time. Moreover, we will store it in raw format, to be able to use the “direct_url” method to clone the volumes instead of downloading them:

# wget https://cloud-images.ubuntu.com/bionic/current/bionic-server-cloudimg-amd64.img
(...)
# qemu-img convert -O raw bionic-server-cloudimg-amd64.img bionic-server-cloudimg-amd64.raw
# openstack image create --public --container-format bare --disk-format raw --file ./bionic-server-cloudimg-amd64.raw "Ubuntu 18.04"
+------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Field | Value |
+------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
(...)
| id | 7fd1c4b4-783e-41cb-800d-4e259c22d1ab |
| name | Ubuntu 18 |
(...)
+------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

And now we can check what happened under the hood:

# openstack volume list --all-projects
+--------------------------------------+--------------------------------------------+-----------+------+-------------+
| ID                                   | Name                                       | Status    | Size | Attached to |
+--------------------------------------+--------------------------------------------+-----------+------+-------------+
| 13721f57-c706-47c9-9114-f4b011f32ea2 | image-7fd1c4b4-783e-41cb-800d-4e259c22d1ab | available |    3 |             |
+--------------------------------------+--------------------------------------------+-----------+------+-------------+
# lvs
  LV                                          VG             Attr       LSize  Pool                Origin Data%  Meta%  Move Log Cpy%Sync Convert
  cinder-volumes-pool                         cinder-volumes twi-aotz-- 19,00g                            11,57  16,11
  volume-13721f57-c706-47c9-9114-f4b011f32ea2 cinder-volumes Vwi-a-tz--  3,00g cinder-volumes-pool        73,31

We can see that a new volume has been created with the name “image-7fd1c4b4…”, which corresponds to the just created image ID. The volume has an ID 13721f57…, and LVM has a new logical volume with the name volume-13721f57 that corresponds to that new volume.

Now if we create a new VM that uses that image, we will notice that the creation of the VM is very quick (and this is because we used the “allowed_direct_url_schemes” method).

# openstack volume list --all-projects
+--------------------------------------+--------------------------------------------+-----------+------+-----------------------------+
| ID | Name | Status | Size | Attached to |
+--------------------------------------+--------------------------------------------+-----------+------+-----------------------------+
| 219d9f92-ce17-4da6-96fa-86a04e460eb2 | | in-use | 4 | Attached to u1 on /dev/vda |
| 13721f57-c706-47c9-9114-f4b011f32ea2 | image-7fd1c4b4-783e-41cb-800d-4e259c22d1ab | available | 3 | |
+--------------------------------------+--------------------------------------------+-----------+------+-----------------------------+
# lvs
LV VG Attr LSize Pool Origin Data% Meta% Move Log Cpy%Sync Convert
cinder-volumes-pool cinder-volumes twi-aotz-- 19,00g 11,57 16,11
volume-13721f57-c706-47c9-9114-f4b011f32ea2 cinder-volumes Vwi-a-tz-- 3,00g cinder-volumes-pool 73,31
volume-219d9f92-ce17-4da6-96fa-86a04e460eb2 cinder-volumes Vwi-aotz-- 4,00g cinder-volumes-pool volume-13721f57-c706-47c9-9114-f4b011f32ea2 54,98

Under the hood, we can see that the volume created for the instance (id 219d9f92…) has a reference in LVM to the original volume (id 13721f57…) that corresponds to the volume of the image.

Cinder as an image-volume storage cache

If you do not need Cinder as a storage backend (either because you are happy with the filesystem backend or you are storing the images in swift, etc.), it is also possible to use it to boost the boot process of VMs that create a volume as the main disk.

Cinder provides a mechanism to be used as an image-volume storage cache. It means that when an image is used for a volume-based VM, it will be stored in a special volume, regardless of whether the image itself was stored in cinder or not. Then the volume that contains the image will be cloned and resized (using the backend tools, i.e. lvm cloning and resizing capabilities) for subsequent VMs that use that image.

During the first use of the image, it will be downloaded (either from the filesystem, cinder, swift, or wherever the image is stored), converted to raw format, and stored as a volume. The next uses of the image will work as using the “direct_url” method (i.e. cloning the volume).

To enable this mechanism, you need to get the id of the project “service” and the id of the user “cinder”:

# openstack project list
+----------------------------------+---------+
| ID | Name |
+----------------------------------+---------+
| 50ab438534cd4c04b9ad341b803a1587 | service |
(...)
+----------------------------------+---------+
# openstack user list
+----------------------------------+-----------+
| ID | Name |
+----------------------------------+-----------+
| 22a4facfd9794df1b8db1b4b074ae6db | cinder |
(...)
+----------------------------------+-----------+
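
If you prefer not to copy the ids by hand from these tables, the same values can be obtained directly (the -f value -c id options just strip the table formatting):

# openstack project show service -f value -c id
# openstack user show cinder -f value -c id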

Then you need to add the following lines to the [DEFAULT] section in file /etc/cinder/cinder.conf (configuring your ids):

cinder_internal_tenant_project_id = 50ab438534cd4c04b9ad341b803a1587
cinder_internal_tenant_user_id = 22a4facfd9794df1b8db1b4b074ae6db

And add the following line to the section of the backend that will act as a cache (in our case [lvm])

[lvm]
...
image_volume_cache_enabled = True

Then you just need to restart the cinder services:

root@menoscloud:~# service cinder-volume restart
root@menoscloud:~# service cinder-scheduler restart

Testing the cache

In this case, I am creating a new image, which is in qcow2 format:

# wget https://cloud-images.ubuntu.com/bionic/current/bionic-server-cloudimg-amd64.img
(...)
# openstack image create --public --container-format bare --disk-format qcow2 --file ./bionic-server-cloudimg-amd64.img "Ubuntu 18.04 - qcow2"

Under the hood, OpenStack created a volume (id c36de566…) for the corresponding image (id 50b84eb0…), that can be seen in LVM:

# openstack volume list --all-projects
+--------------------------------------+--------------------------------------------+-----------+------+-------------+
| ID | Name | Status | Size | Attached to |
+--------------------------------------+--------------------------------------------+-----------+------+-------------+
| c36de566-d538-4b43-b2b3-d000f9b4162f | image-50b84eb0-9de5-45ba-8004-f1f1c7a0c00c | available | 1 | |
+--------------------------------------+--------------------------------------------+-----------+------+-------------+
# lvs
  LV                                          VG             Attr       LSize  Pool                Origin Data%  Meta%  Move Log Cpy%Sync Convert
  cinder-volumes-pool                         cinder-volumes twi-aotz-- 19,00g                            1,70   11,31
  volume-c36de566-d538-4b43-b2b3-d000f9b4162f cinder-volumes Vwi-a-tz--  1,00g cinder-volumes-pool        32,21

Now we create a VM (which is volume-based). And during the “block device mapping” phase, we can inspect what is happening under the hood:

# openstack volume list --all-projects
+--------------------------------------+--------------------------------------------+-------------+------+-------------+
| ID                                   | Name                                       | Status      | Size | Attached to |
+--------------------------------------+--------------------------------------------+-------------+------+-------------+
| 60c19e3c-3960-4fe7-9895-0426070b3e88 |                                            | downloading |    3 |             |
| c36de566-d538-4b43-b2b3-d000f9b4162f | image-50b84eb0-9de5-45ba-8004-f1f1c7a0c00c | available   |    1 |             |
+--------------------------------------+--------------------------------------------+-------------+------+-------------+
# lvs
  LV                                          VG             Attr       LSize  Pool                Origin Data%  Meta%  Move Log Cpy%Sync Convert
  cinder-volumes-pool                         cinder-volumes twi-aotz-- 19,00g                            3,83   12,46
  volume-60c19e3c-3960-4fe7-9895-0426070b3e88 cinder-volumes Vwi-aotz--  3,00g cinder-volumes-pool        13,54
  volume-c36de566-d538-4b43-b2b3-d000f9b4162f cinder-volumes Vwi-a-tz--  1,00g cinder-volumes-pool        32,21
# ps -ef | grep qemu
root      9681  9169  0 09:42 ?        00:00:00 sudo cinder-rootwrap /etc/cinder/rootwrap.conf qemu-img convert -O raw -t none -f qcow2 /var/lib/cinder/conversion/tmpWqlJD5menoscloud@lvm /dev/mapper/cinder--volumes-volume--60c19e3c--3960--4fe7--9895--0426070b3e88
root      9682  9681  0 09:42 ?        00:00:00 /usr/bin/python2.7 /usr/bin/cinder-rootwrap /etc/cinder/rootwrap.conf qemu-img convert -O raw -t none -f qcow2 /var/lib/cinder/conversion/tmpWqlJD5menoscloud@lvm /dev/mapper/cinder--volumes-volume--60c19e3c--3960--4fe7--9895--0426070b3e88
root      9684  9682 29 09:42 ?        00:00:13 /usr/bin/qemu-img convert -O raw -t none -f qcow2 /var/lib/cinder/conversion/tmpWqlJD5menoscloud@lvm /dev/mapper/cinder--volumes-volume--60c19e3c--3960--4fe7--9895--0426070b3e88

Cinder created a new volume (id 60c19e3c…) whose size does not yet correspond to the size of the flavor I used (4 GB), and it is converting the image into that new volume. The image was previously downloaded from Cinder (from volume c36de566…) to the folder /var/lib/cinder/conversion, by mapping the iSCSI device and dumping its contents. If the image had not been backed by Cinder, it would have been downloaded using the appropriate mechanism (e.g. HTTP, or a file copy from /var/lib/glance/images).

After a while (depending on the conversion process), the VM will start and we can inspect the backend…

# openstack volume list --all-projects
+--------------------------------------+--------------------------------------------+-----------+------+-----------------------------+
| ID | Name | Status | Size | Attached to |
+--------------------------------------+--------------------------------------------+-----------+------+-----------------------------+
| 91d51bc2-e33b-4b97-b91d-3a8655f88d0f | image-50b84eb0-9de5-45ba-8004-f1f1c7a0c00c | available | 3 | |
| 60c19e3c-3960-4fe7-9895-0426070b3e88 | | in-use | 4 | Attached to q1 on /dev/vda |
| c36de566-d538-4b43-b2b3-d000f9b4162f | image-50b84eb0-9de5-45ba-8004-f1f1c7a0c00c | available | 1 | |
+--------------------------------------+--------------------------------------------+-----------+------+-----------------------------+
# lvs
LV VG Attr LSize Pool Origin Data% Meta% Move Log Cpy%Sync Convert
cinder-volumes-pool cinder-volumes twi-aotz-- 19,00g 13,69 17,60
volume-60c19e3c-3960-4fe7-9895-0426070b3e88 cinder-volumes Vwi-aotz-- 4,00g cinder-volumes-pool 56,48
volume-91d51bc2-e33b-4b97-b91d-3a8655f88d0f cinder-volumes Vwi-a-tz-- 3,00g cinder-volumes-pool volume-60c19e3c-3960-4fe7-9895-0426070b3e88 73,31
volume-c36de566-d538-4b43-b2b3-d000f9b4162f cinder-volumes Vwi-a-tz-- 1,00g cinder-volumes-pool 32,21

Now we can see that there is a new volume (id 91d51bc2…) which has been associated to the image (id 50b84eb0…). And that volume will be cloned using the LVM mechanisms in the next uses of the image for volume-backend instances. Now if you start new instances, they will boot much faster.

How to dynamically create on-demand services to respond to incoming TCP connections

Some time ago I had the problem of dynamically starting virtual machines when an incoming connection was received on a port. The exact problem was to have a VM that was powered off, start it whenever an incoming ssh connection was received, and then forward the network traffic to that VM to serve the ssh request. In this way, I could have a server in a cloud provider (e.g. Amazon) and not spend money while I was not using it.

This problem has been named “the sleeping beauty”, because of the tale: it is like having a sleeping virtual infrastructure (i.e. the sleeping beauty) that is awakened when an incoming connection (i.e. the kiss) is received from the user (i.e. the prince).

Now I have figured out how to solve that problem, and that is why this time I learned

How to dynamically create on-demand services to respond to incoming TCP connections

The way to solve it is very straightforward, as it is fully based on the socat application.

socat is “a relay for bidirectional data transfer between two independent data channels”, and it can be used to forward the traffic received on a port to another IP:PORT pair.

A simple example is:

$ socat tcp-listen:10000 tcp:localhost:22 &

And now we can SSH to localhost in the following way:

$ ssh localhost -p 10000

The interesting thing is that socat is able to execute a command upon receiving a connection (using the address types EXEC or SYSTEM as the destination of the relay). But the most important thing is that socat will establish the communication using stdin and stdout.

So it is possible to make this funny thing:

$ socat tcp-listen:10000 SYSTEM:'echo "hello world"' &
[1] 11136
$ wget -q -O- http://localhost:10000
hello world
$
[1]+ Done socat tcp-listen:10000 SYSTEM:'echo "hello world"'

Now that we know that the communication is established using stdin and stdout, we can somewhat abuse socat and try this even funnier thing:

$ socat tcp-listen:10000 SYSTEM:'echo "$(date)" >> /tmp/sshtrack.log; socat - "TCP:localhost:22"' &
[1] 27421
$ ssh localhost -p 10000 cat /tmp/sshtrack.log
mié feb 27 14:36:45 CET 2019
$
[1]+ Done socat tcp-listen:10000 SYSTEM:'echo "$(date)" >> /tmp/sshtrack.log; socat - "
TCP:localhost:22"'

The effect is that we can execute commands and redirect the connection to an arbitrary IP:PORT.

Now it is easy to figure out how to dynamically spawn servers to serve the incoming TCP requests. An example that spawns a one-shot web server on port 8080 to serve requests received on port 10000 is the following:

$ socat tcp-listen:10000 SYSTEM:'(echo "hello world" | nc -l -p 8080 -q 1 > /dev/null &) ; socat - "TCP:localhost:8080"' &
[1] 31586
$ wget -q -O- http://localhost:10000
hello world
$
[1]+ Done socat tcp-listen:10000 SYSTEM:'(echo "hello world" | nc -l -p 8080 -q 1 > /dev/null &) ; socat - "TCP:localhost:8080"'

And now you can customize your scripts to create the effective servers on demand.
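
As a sketch closer to the “sleeping beauty” use case, the handler can first make sure that the backend is alive (starting it if needed) and only then relay the connection. Everything here is hypothetical: the VM name, its IP address and the use of virsh are placeholders for whatever your infrastructure uses:

$ socat tcp-listen:2222,fork,reuseaddr SYSTEM:'
    # if nothing is listening on the SSH port of the VM, wake it up
    if ! nc -z 192.168.1.50 22; then
        virsh start myvm > /dev/null 2>&1
        # wait until sshd answers before relaying
        until nc -z 192.168.1.50 22; do sleep 1; done
    fi
    # relay the incoming connection to the (now running) VM
    socat - "TCP:192.168.1.50:22"' &

With this, an ssh to port 2222 of the gateway host transparently wakes the VM up and lands on its sshd.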

The sleeping beauty application

I have used these proofs of concept to create the sleeping-beauty application. It is open source, and you can get it on GitHub.

The sleeping beauty is a system that helps to implement serverless infrastructures: you have the servers asleep (or not even created), and they are awakened (or created) as they are needed. Later, they go back to sleep (or they are disposed of).

In the sleeping-beauty you can configure services that listen on a port, and the commands that socat should use to start, check the status of, or stop the effective services. Moreover, it implements an idle-detection mechanism that checks whether the effective service is idle and, if it has been idle for a period of time, stops it to save resources.

Example: in the use case described above, the command used to start the service will contact Amazon AWS and start a VM; the command to stop the service will contact Amazon AWS to stop the VM; and the command to check whether the service is idle or not will ssh into the VM and execute the command ‘who’.

How to install OpenStack Rocky – part 1

This is the first post of a series in which I am describing the installation of an OpenStack site using the latest distribution at the time of writing: Rocky.

My project is very ambitious, because I have 2 virtualization nodes (each with a different GPU), 10 GbE, a lot of memory and disk, and I want to offer the GPUs to the VMs. The front-end is a 24-core server with 32 GB of RAM and 6 TB of disk, with 4 network ports (2x10GbE + 2x1GbE), which will also act as the block device server.

We’ll be using Ubuntu 18.04 LTS for all the nodes, and I’ll try to follow the official documentation. But I will try to be very straightforward in the configuration… I want to make it work, and I will try to explain how things work instead of tuning the configuration.

How to install OpenStack Rocky – part 1

My setup for the OpenStack installation is the following:

(Figure: the servers horsemen, fh01 and fh02, with their network interfaces and IP addresses.)

In the figure I have annotated the most relevant data to identify the servers: the IP addresses for each interface, which is the volume server and the virtualization nodes that will share their GPUs.

In the end, the server horsemen will host the following services: keystone, glance, cinder, neutron and horizon. On the other side, fh01 and fh02 will host the compute and neutron-related services.

In each of the servers we need a network interface (eno1, enp1s0f1 and enp1f0f1) intended for administration purposes (i.e. the network 192.168.1.0/24). That network has a gateway (192.168.1.220) that enables access to the internet via NAT. From now on, we’ll call these interfaces the “private interfaces”.

We need an additional interface that is connected to the provider network (i.e. to the internet). That network will hold the publicly routable IP addresses. In my case, I have the network 158.42.1.0/24, which is publicly routable. It is a flat network with its own network services (e.g. gateway, nameservers, dhcp servers, etc.). From now on, we’ll call these interfaces the “public interfaces”.

One note on the “eno4” interface in horsemen: I am using this interface for accessing horizon. In case you do not have a spare interface, you can use interface aliasing or provide the IP address in the ifupdown “up” script.

An extra note on “eno2” interface in horsemen: It is an extra interface in the node. It will be left unused during this installation, but it will be configured to work in bond mode with “eno1”.

IMPORTANT: In the whole series of posts, I am using the passwords as they appear: RABBIT_PASS, NOVA_PASS, NOVADB_PASS, etc. You should change them according to a secure password policy, but they are set as-is to make the installation easier to understand. Anyway, most of them will be fine if you have an isolated network and the services listen only on the management network (e.g. mysql will only be configured to listen on the management interface).

Some words on the Openstack network (concepts)

The basic installation of Openstack considers two networks: the provider network and the management network. The provider network means “the network that is attached to the provider”, i.e. the network where the VMs can have publicly routable IP addresses. On the other hand, the management network is a private network that is (probably) isolated from the provider one. The computers in that network have private IP addresses (e.g. 192.168.0.0/16, 10.0.0.0/8, 172.16.0.0/12).

The basic deployment of Openstack considers that the controller node does not need to have a routable IP address; instead, it can be accessed by the admin through the management network. That is why the “eno3” interface has no IP address.

In the Openstack architecture, horizon is a separate piece, and horizon is the one that will need a routable IP address. As I want to install horizon also in the controller, I need a routable IP address on it, and that is why I put a publicly routable IP address on “eno4” (158.42.1.1).

In my case, I had a spare network interface (eno4) but if you do not have one of them, you can create a bridge and add your “interface connected to the provider network” (i.e. “eno3”) to that bridge, and then add a publicly routable IP address to the bridge.

IMPORTANT: this is not part of my installation. Although it may be part of your installation.

brctl addbr br-public
brctl addif br-public eno3
ip link set dev br-public up
ip addr add 158.42.1.1/16 dev br-public

Configuring the network

One of the first things that we need to set up is to configure the network for the different servers.

Ubuntu 18.04 has moved to netplan but, at the time of writing this text, I have not found any mechanism to bring an interface up without providing an IP address for it using netplan. Moreover, when trying to use ifupdown, netplan is not totally disabled and interferes with options such as dns-nameservers for the static addresses. In the end I needed to install ifupdown and make a mix of configuration using both netplan and ifupdown.

It is very important to disable IPv6 for any of the servers, because if not, you will probably face a problem when using the public IP addresses. You can read more in this link.

To disable IPv6, we need to execute the following lines in all the servers (as root):

# sysctl -w net.ipv6.conf.all.disable_ipv6=1
# sysctl -w net.ipv6.conf.default.disable_ipv6=1
# cat >> /etc/default/grub << EOT
GRUB_CMDLINE_LINUX_DEFAULT="ipv6.disable=1"
GRUB_CMDLINE_LINUX="ipv6.disable=1"
EOT
# update-grub

We disable IPv6 for the current session and make it persistent by disabling it at boot time. If you have customized your grub configuration, you should check the options that we are setting.
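
You can verify that IPv6 is effectively disabled in the running session and, after a reboot, that the kernel was booted with the flag:

# sysctl -n net.ipv6.conf.all.disable_ipv6
1
# grep -o "ipv6.disable=1" /proc/cmdline
ipv6.disable=1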

Configuring the network in “horsemen”

You need to install ifupdown to bring up an interface connected to the internet without assigning it an IP address, which will later be used by the neutron-related services:

# apt update && apt install -y ifupdown

Edit the file /etc/network/interfaces and adjust it with a content like the next one:

auto eno3
iface eno3 inet manual
up ip link set dev $IFACE up
down ip link set dev $IFACE down

Now edit the file /etc/netplan/50-cloud-init.yaml to set the private IP address:

network:
  ethernets:
    eno4:
      dhcp4: true
    eno1:
      addresses:
        - 192.168.1.240/24
      gateway4: 192.168.1.221
      nameservers:
        addresses: [ 192.168.1.220, 8.8.8.8 ]
  version: 2

When you save these settings, you can issue the next commands:

# netplan generate
# netplan apply

Now we’ll edit the file /etc/hosts, and will add the addresses of each server. My file is the next one:

127.0.0.1       localhost.localdomain   localhost
192.168.1.240   horsemen controller
192.168.1.241   fh01
192.168.1.242   fh02

I have removed the entry 127.0.1.1 because I read that it may interfere. And I also removed all the crap about IPv6 because I disabled it.

Configuring the network in “fh01” and “fh02”

Here is the short version of the configuration of fh01:

# apt install -y ifupdown
# cat >> /etc/network/interfaces << 'EOT'
auto enp1s0f0
iface enp1s0f0 inet manual
up ip link set dev $IFACE up
down ip link set dev $IFACE down
EOT

Here is my file /etc/netplan/50-cloud-init.yaml for fh01:

network:
  ethernets:
    enp1s0f1:
      addresses:
        - 192.168.1.241/24
      gateway4: 192.168.1.221
      nameservers:
        addresses: [ 192.168.1.220, 8.8.8.8 ]
  version: 2

Here is the file /etc/hosts for fh01:

127.0.0.1 localhost.localdomain localhost
192.168.1.240 horsemen controller
192.168.1.241 fh01
192.168.1.242 fh02

You can export this configuration to fh02 by adjusting the IP address in the /etc/netplan/50-cloud-init.yaml file.

Reboot and test

Now it is a good moment to reboot your systems and test that the network is properly configured. If it is not, please make sure that it works before continuing; otherwise the next steps will make no sense.

From each of the hosts you should be able to ping the outside world and ping the other hosts. These are the tests from horsemen, but you should be able to repeat them from each of the servers.

root@horsemen# ping -c 2 www.google.es
PING www.google.es (172.217.17.3) 56(84) bytes of data.
64 bytes from mad07s09-in-f3.1e100.net (172.217.17.3): icmp_seq=1 ttl=54 time=7.26 ms
64 bytes from mad07s09-in-f3.1e100.net (172.217.17.3): icmp_seq=2 ttl=54 time=7.26 ms

--- www.google.es ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1001ms
rtt min/avg/max/mdev = 7.262/7.264/7.266/0.002 ms
root@horsemen# ping -c 2 fh01
PING fh01 (192.168.1.241) 56(84) bytes of data.
64 bytes from fh01 (192.168.1.241): icmp_seq=1 ttl=64 time=0.180 ms
64 bytes from fh01 (192.168.1.241): icmp_seq=2 ttl=64 time=0.113 ms

--- fh01 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1008ms
rtt min/avg/max/mdev = 0.113/0.146/0.180/0.035 ms
root@horsemen# ping -c 2 fh02
PING fh02 (192.168.1.242) 56(84) bytes of data.
64 bytes from fh02 (192.168.1.242): icmp_seq=1 ttl=64 time=0.223 ms
64 bytes from fh02 (192.168.1.242): icmp_seq=2 ttl=64 time=0.188 ms

--- fh02 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1027ms
rtt min/avg/max/mdev = 0.188/0.205/0.223/0.022 ms

Prerequisites for Openstack in the server (horsemen)

Remember: for simplicity, I will use obvious passwords like SERVICE_PASS or SERVICEDB_PASS (e.g. RABBIT_PASS). You should change these passwords, although most of them will be fine if you have an isolated network and the services listen only on the management network.

First of all, we are installing the prerequisites. We will start with the NTP server, which will keep the clocks synchronized between the controller (horsemen) and the virtualization servers (fh01 and fh02). We’ll install chrony (recommended in the Openstack documentation) and allow any computer in our private network to connect to this new NTP server:

# apt install -y chrony
# cat >> /etc/chrony/chrony.conf << EOT
allow 192.168.1.0/24
EOT
# service chrony restart
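
Later, when fh01 and fh02 are installed, they should use the controller as their NTP source by adding a “server 192.168.1.240 iburst” line to their /etc/chrony/chrony.conf. On any node, chronyc can confirm that the clock is being synchronized:

# chronyc sources -v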

Now we are installing and configuring the database server (we’ll use mariadb as it is used in the basic installation):

# apt install mariadb-server python-pymysql
# cat > /etc/mysql/mariadb.conf.d/99-openstack.cnf << EOT
[mysqld]
bind-address = 192.168.1.240

default-storage-engine = innodb
innodb_file_per_table = on
max_connections = 4096
collation-server = utf8_general_ci
character-set-server = utf8
EOT
# service mysql restart

Now we are installing rabbitmq, that will be used to orchestrate message interchange between services (please change RABBIT_PASS).

# apt install rabbitmq-server
# rabbitmqctl add_user openstack "RABBIT_PASS"
# rabbitmqctl set_permissions openstack ".*" ".*" ".*"
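
A quick (and optional) check that the user and its permissions were created as expected:

# rabbitmqctl list_users
# rabbitmqctl list_permissions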

At this moment, we have to install memcached and configure it to listen in the management interface:

# apt install memcached

# echo "-l 192.168.1.240" >> /etc/memcached.conf

# service memcached restart

Finally, we need to install etcd and configure it to be accessible by openstack:

# apt install etcd
# cat >> /etc/default/etcd << EOT
ETCD_NAME="controller"
ETCD_DATA_DIR="/var/lib/etcd"
ETCD_INITIAL_CLUSTER_STATE="new"
ETCD_INITIAL_CLUSTER_TOKEN="etcd-cluster-01"
ETCD_INITIAL_CLUSTER="controller=http://192.168.1.240:2380"
ETCD_INITIAL_ADVERTISE_PEER_URLS="http://192.168.1.240:2380"
ETCD_ADVERTISE_CLIENT_URLS="http://192.168.1.240:2379"
ETCD_LISTEN_PEER_URLS="http://0.0.0.0:2380"
ETCD_LISTEN_CLIENT_URLS="http://192.168.1.240:2379"
EOT
# systemctl enable etcd
# systemctl start etcd
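
To make sure that etcd came up listening on the management address, you can query its health endpoint (it should report the member as healthy) or simply check the service status:

# curl http://192.168.1.240:2379/health
# systemctl status etcd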

Now we are ready to continue with the installation of the OpenStack Rocky packages… (continue to part 2)

How to compact a QCOW2 or a VMDK file

When you create a Virtual Machine (VM), you usually have the option of using a format that reserves the whole size of the disk (e.g. RAW), or a format that grows according to the space actually used in the disk (e.g. QCOW or VMDK).

The problem is that the space actually used by the disk file grows as files are written, but it does not shrink as they are deleted. So if you wrote a lot of files and deleted them once they were no longer needed, you would probably have a lot of space reserved in the VMDK file that is not actually used. I wanted to reclaim that space, to move the VMs around using less space, and so this time…

I learned how to compact a VMDK file (the same method applies to QCOW2)

The method is, in fact, very easy… you simply have to re-encode the file using the same output format. If you have your original-disk.vmdk file, you simply have to issue a command like this one:

$ qemu-img convert -O vmdk original-disk.vmdk compressed-disk.vmdk

And that will make the magic (easy, isn’t it?).

But if you want to compact it more, you can reclaim more space from the disk before re-encoding it. First I’ll give the solution and then I’ll explain it:

If the VM is Linux-based, you can boot it and create a file full of zeros and, once the file has exhausted the free space in the disk, delete it:

$ dd if=/dev/zero of=/tmp/zerofile.raw
(...)
$ rm /tmp/zerofile.raw

If the VM is Windows-based, you can get the sdelete command from the Microsoft website, decompress it, and execute the following command line:

c:\> sdelete -z c:

Now you can power off the VM and issue the qemu-img command. You’ll get a file that corresponds only to the space actually used in the disk:

$ qemu-img convert -O vmdk original-disk.vmdk compressed-disk.vmdk
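
If the destination format is qcow2, qemu-img can additionally compress the written clusters with the -c flag, which usually shrinks the file a bit more at the cost of a slower conversion:

$ qemu-img convert -O qcow2 -c original-disk.qcow2 compressed-disk.qcow2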

Explanation

(Disclaimer: Please take into account that this is a simple and conceptual explanation)

If you know how disks are managed, you probably know that when a file is deleted it is not actually removed from the disk. Instead, the space that it was using is marked as “ready to be used in case it is needed”. So if a new file is created in the disk, it may use that physical space (or not).

That is the trick on which file recovery applications rely: trying to find those “ready to be used” sectors. And that is why the “low-level format” exists: to “zero” the disk and prevent files from being recovered.

When you created the /tmp/zerofile.raw file, you started to write zeros to the disk. When the physically empty space was exhausted, the disk controller started to use the “ready to be used” sectors, the zerofile wrote zeros on them, and those zeros were written into the VMDK file.

The good thing here is that, when the new VMDK file is created (from any format; in our case it is also VMDK), the qemu-img application does not write those zeros into the file that contains the disk, and that is how the storage space is reclaimed.