How to install Cinder in OpenStack Rocky and make it work with Glance

I have written several posts on installing OpenStack Rocky from scratch. They all have the tag #openstack. In the previous posts we…

  1. Installed OpenStack Rocky (part 1, part 2, and part 3).
  2. Installed the Horizon Dashboard and upgraded noVNC (install horizon).

So we have a working installation of the basic services (keystone, glance, neutron, compute, etc.). And now it is time to learn

How to install Cinder in OpenStack Rocky and make it work with Glance

Cinder is very straightforward to install using the basic mechanism: having a standard Linux server that will serve block devices as a SAN, by providing iSCSI endpoints. This server will use tgtadm and iscsiadm as the basic tools, and a backend for the block devices.

The other problem is to integrate the cinder server with an external SAN device, such as a Dell Equallogic SAN. Cinder has some plugins and each of them has its own problems.

In this post, we are following the standard cinder installation guide for Ubuntu (in this link), and what we’ll get is the standard SAN server with an LVM back-end for the block devices. Then we will integrate it with Glance (to be able to use Cinder as a storage for the OpenStack images) and we’ll learn a bit about how they work.

Installing Cinder

In the first place we are creating a database for Cinder:

mysql -u root -p <<< "CREATE DATABASE cinder;\
GRANT ALL PRIVILEGES ON cinder.* TO 'cinder'@'localhost' IDENTIFIED BY 'CINDER_DBPASS';\
GRANT ALL PRIVILEGES ON cinder.* TO 'cinder'@'%' IDENTIFIED BY 'CINDER_DBPASS';"

Now we are creating a user for Cinder and to create the service in OpenStack (we create both v2 and v3):

$ openstack user create --domain default --password "CINDER_PASS" cinder
$ openstack role add --project service --user cinder admin
$ openstack service create --name cinderv2 --description "OpenStack Block Storage" volumev2
$ openstack service create --name cinderv3 --description "OpenStack Block Storage" volumev3

Once we have the user and the service, we create the proper endpoints for both v2 and v3:

$ openstack endpoint create --region RegionOne volumev2 public http://controller:8776/v2/%\(project_id\)s
$ openstack endpoint create --region RegionOne volumev2 internal http://controller:8776/v2/%\(project_id\)s
$ openstack endpoint create --region RegionOne volumev2 admin http://controller:8776/v2/%\(project_id\)s
$ openstack endpoint create --region RegionOne volumev3 public http://controller:8776/v3/%\(project_id\)s
$ openstack endpoint create --region RegionOne volumev3 internal http://controller:8776/v3/%\(project_id\)s
$ openstack endpoint create --region RegionOne volumev3 admin http://controller:8776/v3/%\(project_id\)s

And now we are ready to install the cinder packages

$ apt install -y cinder-api cinder-scheduler

Once the packages are installed, we need to update the configuration file /etc/cinder/cinder.conf. The content will be something like the next:

[DEFAULT]
rootwrap_config = /etc/cinder/rootwrap.conf
api_paste_confg = /etc/cinder/api-paste.ini
iscsi_helper = tgtadm
volume_name_template = volume-%s
volume_group = cinder-volumes
verbose = True
auth_strategy = keystone
state_path = /var/lib/cinder
lock_path = /var/lock/cinder
volumes_dir = /var/lib/cinder/volumes
enabled_backends = lvm
transport_url = rabbit://openstack:RABBIT_PASS@controller
my_ip = 192.168.1.241
glance_api_servers = http://controller:9292
[database]
connection = mysql+pymysql://cinder:CINDER_DBPASS@controller/cinder
[keystone_authtoken]
www_authenticate_uri = http://controller:5000
auth_url = http://controller:5000
memcached_servers = controller:11211
auth_type = password
project_domain_id = default
user_domain_id = default
project_name = service
username = cinder
password = CINDER_PASS
[oslo_concurrency]
lock_path = /var/lib/cinder/tmp
[lvm]
volume_driver = cinder.volume.drivers.lvm.LVMVolumeDriver
volume_group = cinder-volumes
iscsi_protocol = iscsi
iscsi_helper = tgtadm
image_volume_cache_enabled = True

You must adapt this file to your configuration. In special, the passwords of rabbit, cinder database and cinder service, and the IP address of the cinder server (which is stored in my_ip variable). In my case, I am using the same server as in the previous posts.

Using this configuration we suppose that cinder will use tgtadm to create the iSCSI endpoints, and the backend will be LVM.

Now we just have to add the following lines to file /etc/nova/nova.conf to enable cinder in OpenStack via nova-api:

[cinder]
os_region_name=RegionOne

Then, sync the cinder database by executing the next command:

$ su -s /bin/sh -c "cinder-manage db sync" cinder

And restart the related services:

$ service nova-api restart
$ service cinder-scheduler restart
$ service apache2 restart

Preparing the LVM backend

Now that we have configured cinder, we need a backend for the block devices. In our case, it is LVM. If you want to know a bit more about the concepts that we are using at this point and what we are doing, you can check my previous post in this link.

Now we are installing the LVM tools:

$ apt install lvm2 thin-provisioning-tools

LVM needs a partition or a whole disk to work. You can use any partition or disk (or even a file that can be used for testing purposes, as described in the section “testlab” in this link). In our case, we are using the whole disk /dev/vdb.

According to our settings, OpenStack expects to be able to use an existing LVM volume group with the name “cinder-volumes”. So we need to create it

$ pvcreate /dev/vdb
$ vgcreate cinder-volumes /dev/vdb

Once we have our volume group ready, we can install the cinder-volume service.

$ apt install cinder-volume

And that’s all about the installation of cinder. The last part will work because we included section [lvm] in /etc/cinder/cinder.conf and “enabled_backends = lvm”.

Verifying that cinder works

To verify that cinder works, we’ll just create one volume:

# openstack volume create --size 2 checkvol
+---------------------+--------------------------------------+
| Field               | Value                                |
+---------------------+--------------------------------------+
| attachments         | []                                   |
| availability_zone   | nova                                 |
| bootable            | false                                |
| consistencygroup_id | None                                 |
| created_at          | 2020-05-05T09:52:47.000000           |
| description         | None                                 |
| encrypted           | False                                |
| id                  | aadc24eb-ec1c-4b84-b2b2-8ea894b50417 |
| migration_status    | None                                 |
| multiattach         | False                                |
| name                | checkvol                             |
| properties          |                                      |
| replication_status  | None                                 |
| size                | 2                                    |
| snapshot_id         | None                                 |
| source_volid        | None                                 |
| status              | creating                             |
| type                | None                                 |
| updated_at          | None                                 |
| user_id             | 8c67fb57d70646d9b57beb83cc04a892     |
+---------------------+--------------------------------------+

After a while, we can check that the volume has been properly created and it is available.

# openstack volume list
+--------------------------------------+----------+-----------+------+-----------------------------+
| ID                                   | Name     | Status    | Size | Attached to                 |
+--------------------------------------+----------+-----------+------+-----------------------------+
| aadc24eb-ec1c-4b84-b2b2-8ea894b50417 | checkvol | available |    2 |                             |
+--------------------------------------+----------+-----------+------+-----------------------------+

If we are curious, we can check what happened in the backend:

# lvs -o name,lv_size,data_percent,thin_count
  LV                                          LSize  Data%  #Thins
  cinder-volumes-pool                         19,00g 0,00        1
  volume-aadc24eb-ec1c-4b84-b2b2-8ea894b50417  2,00g 0,00

We can see that we have a volume with the name volume-aadc24eb-ec1c-4b84-b2b2-8ea894b50417, with the ID that coincides with the ID of the volume that we have just created. Moreover, we can see that it has occupied 0% of space because it is thin-provisioned (i.e. it will only use the effective stored data like in qcow2 or vmdk virtual disk formats).

Integrating Cinder with Glance

The integration of Cinder with Glance can be made in two different parts:

  1. Using Cinder as a storage backend for the Images.
  2. Using Cinder as a cache for the Images of the VMs that are volume-based.

It may seem that it is the same, but it is not. To be able to identify what feature we want, we need to know how OpenStack works, and also acknowledge that Cinder and Glance are independent services.

Using Cinder as a backend for Glance

In OpenStack, when a VM is image-based (i.e. it does not create a volume), nova-compute will transfer the image to the host in which it has to be used. It happens regarding it is from a filesystem backend (i.e. stored in /var/lib/glance/images/), it is stored in swift (it is transferred using HTTP), or it is stored in cinder (it is transferred using iSCSI). So using Cinder as a storage backend for the Images will prevent the need of having extra storage in the controller. But it will not be useful for anything more.

Image based VM
Booting an image-based VM, which does not create a volume.

If you start a volume-based image, OpenStack will create a volume for your new VM (using cinder). In this case, cinder is very inefficient, because it connects to the existing volume, downloads it, and converts it to raw format and dumps it to the new volume (i.e. using qemu-img convert -O raw -f qcow2 …). So the creation of the image is extremely slow.

There is one way to boost this procedure by using efficient tools: if the image is stored in raw format and the owner is the same user that tries to use it (check image_upload_use_internal_tenant), and the allowed_direct_url_schemes option is properly set, the new volume will be created by cloning the volume that contains the image and resizing it using the backend tools (i.e. lvm cloning and resizing capabilities). That means that the new volume will be almost instantly created, and we’ll try to use this mechanism, if possible.

To enable cinder as a backend for Glance, you need to add the following lines to file /etc/glance/glance-api.conf

[glance_store]
stores = file,http,cinder
default_store = cinder
filesystem_store_datadir = /var/lib/glance/images/
cinder_store_auth_address = http://controller:5000/v3
cinder_store_user_name = cinder
cinder_store_password = CINDER_PASS
cinder_store_project_name = service

We are just adding “Cinder” as one of the mechanisms for Glance (apart from the others, like file or HTTP). In our example, we are setting Cinder as the default storage backend, because the horizon dashboard has not any way to select where to store the images.

It is possible to set any other storage backend as default storage backend, but then you’ll need to create the volumes by hand and execute Glance low-level commands such as “glance location-add <image-uuid> –url cinder://<volume-uuid>”. The mechanism can be seen in the official guide.

Variables cinder_store_user_name, cinder_store_password, and cinder_store_project_name are used to set the owner of the images that are uploaded to Cinder via Glance. And they are used only if setting image_upload_use_internal_tenant is set to True in the Cinder configuration.

And now we need to add the next lines to section [DEFAULT] in /etc/cinder/cinder.conf:

allowed_direct_url_schemes = cinder
image_upload_use_internal_tenant = True

Finally, you need to restart the services:

# service cinder-volume restart
# service cinder-scheduler restart
# service glance-api restart

It may seem a bit messy, but Cinder and Glance are configured in this way. I feel that if you use the configuration that I propose in this post, you’ll get the integration as expected.

Verifying the integration

We are storing a new image, but we’ll store it as a volume this time. Moreover, we will store it in raw format, to be able to use the “direct_url” method to clone the volumes instead of downloading them:

# wget https://cloud-images.ubuntu.com/bionic/current/bionic-server-cloudimg-amd64.img
(...)
# qemu-img convert -O raw bionic-server-cloudimg-amd64.img bionic-server-cloudimg-amd64.raw
# openstack image create --public --container-format bare --disk-format raw --file ./bionic-server-cloudimg-amd64.raw "Ubuntu 18.04"+------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Field | Value |
+------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
(...)
| id | 7fd1c4b4-783e-41cb-800d-4e259c22d1ab |
| name | Ubuntu 18 |
(...)
+------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

And now we can check what happened under the hood:

# openstack volume list --all-projects
+--------------------------------------+--------------------------------------------+-----------+------+-------------+
| ID                                   | Name                                       | Status    | Size | Attached to |
+--------------------------------------+--------------------------------------------+-----------+------+-------------+
| 13721f57-c706-47c9-9114-f4b011f32ea2 | image-7fd1c4b4-783e-41cb-800d-4e259c22d1ab | available |    3 |             |
+--------------------------------------+--------------------------------------------+-----------+------+-------------+
# lvs
  LV                                          VG             Attr       LSize  Pool                Origin Data%  Meta%  Move Log Cpy%Sync Convert
  cinder-volumes-pool                         cinder-volumes twi-aotz-- 19,00g                            11,57  16,11
  volume-13721f57-c706-47c9-9114-f4b011f32ea2 cinder-volumes Vwi-a-tz--  3,00g cinder-volumes-pool        73,31

We can see that a new volume has been created with the name “image-7fd1c4b4…”, which corresponds to the just created image ID. The volume has an ID 13721f57…, and LVM has a new logical volume with the name volume-13721f57 that corresponds to that new volume.

Now if we create a new VM that uses that image, we will notice that the creation of the VM is very quick (and this is because we used the “allowed_direct_url_schemes” method).

# openstack volume list --all-projects
+--------------------------------------+--------------------------------------------+-----------+------+-----------------------------+
| ID | Name | Status | Size | Attached to |
+--------------------------------------+--------------------------------------------+-----------+------+-----------------------------+
| 219d9f92-ce17-4da6-96fa-86a04e460eb2 | | in-use | 4 | Attached to u1 on /dev/vda |
| 13721f57-c706-47c9-9114-f4b011f32ea2 | image-7fd1c4b4-783e-41cb-800d-4e259c22d1ab | available | 3 | |
+--------------------------------------+--------------------------------------------+-----------+------+-----------------------------+
# lvs
LV VG Attr LSize Pool Origin Data% Meta% Move Log Cpy%Sync Convert
cinder-volumes-pool cinder-volumes twi-aotz-- 19,00g 11,57 16,11
volume-13721f57-c706-47c9-9114-f4b011f32ea2 cinder-volumes Vwi-a-tz-- 3,00g cinder-volumes-pool 73,31
volume-219d9f92-ce17-4da6-96fa-86a04e460eb2 cinder-volumes Vwi-aotz-- 4,00g cinder-volumes-pool volume-13721f57-c706-47c9-9114-f4b011f32ea2 54,98

Under the hood, we can see that the creation of the volume for the instance (id 219d9f92…) has a reference is LVM to the original volume (id 13721f57…) that corresponds to the volume of the image.

Cinder as an image-volume storage cache

If you do not need Cinder as a storage backend (either because you are happy with the filesystem backend or you are storing the images in swift, etc.), it is also possible to use it to boost the boot process of VMs that create a volume as the main disk.

Cinder enables a mechanism to be used as an image-volume storage cache. It means that when an image is being used for a volume-based VM, it will be stored in a special volume regarding it was stored in cinder or not. Then the volume that contains the image will be cloned and resized (using the backend tools; i.e. lvm cloning and resizing capabilities) for subsequent VMs that use that image.

During the first use of the image, it will be downloaded (either from the filesystem, cinder, swift, or wherever the image is stored), converted to raw format, and stored as a volume. The next uses of the image will work as using the “direct_url” method (i.e. cloning the volume).

To enable this mechanism, you need to get the id of project “server” and the id of user “cinder”:

# openstack project list
+----------------------------------+---------+
| ID | Name |
+----------------------------------+---------+
| 50ab438534cd4c04b9ad341b803a1587 | service |
(...)
+----------------------------------+---------+
# openstack user list
+----------------------------------+-----------+
| ID | Name |
+----------------------------------+-----------+
| 22a4facfd9794df1b8db1b4b074ae6db | cinder |
(...)
+----------------------------------+-----------+

Then you need to add the following lines to the [DEFAULT] section in file /etc/cinder/cinder.conf (configuring your ids):

cinder_internal_tenant_project_id = 50ab438534cd4c04b9ad341b803a1587
cinder_internal_tenant_user_id = 22a4facfd9794df1b8db1b4b074ae6db

And add the following line to the section of the backend that will act as a cache (in our case [lvm])

[lvm]
...
image_volume_cache_enabled = True

Then you just need to restart the cinder services:

root@menoscloud:~# service cinder-volume restart
root@menoscloud:~# service cinder-scheduler restart

Testing the cache

In this case, I am creating a new image, which is in qcow2 format:

# wget https://cloud-images.ubuntu.com/bionic/current/bionic-server-cloudimg-amd64.img
(...)
# openstack image create --public --container-format bare --disk-format qcow2 --file ./bionic-server-cloudimg-amd64.img "Ubuntu 18.04 - qcow2"

Under the hood, OpenStack created a volume (id c36de566…) for the corresponding image (id 50b84eb0…), that can be seen in LVM:

# openstack volume list --all-projects
+--------------------------------------+--------------------------------------------+-----------+------+-------------+
| ID | Name | Status | Size | Attached to |
+--------------------------------------+--------------------------------------------+-----------+------+-------------+
| c36de566-d538-4b43-b2b3-d000f9b4162f | image-50b84eb0-9de5-45ba-8004-f1f1c7a0c00c | available | 1 | |
+--------------------------------------+--------------------------------------------+-----------+------+-------------+
# lvs
  LV                                          VG             Attr       LSize  Pool                Origin Data%  Meta%  Move Log Cpy%Sync Convert
  cinder-volumes-pool                         cinder-volumes twi-aotz-- 19,00g                            1,70   11,31
  volume-c36de566-d538-4b43-b2b3-d000f9b4162f cinder-volumes Vwi-a-tz--  1,00g cinder-volumes-pool        32,21

Now we create a VM (which is volume-based). And during the “block device mapping” phase, we can inspect what is happening under the hood:

# openstack volume list --all-projects
+--------------------------------------+--------------------------------------------+-------------+------+-------------+
| ID                                   | Name                                       | Status      | Size | Attached to |
+--------------------------------------+--------------------------------------------+-------------+------+-------------+
| 60c19e3c-3960-4fe7-9895-0426070b3e88 |                                            | downloading |    3 |             |
| c36de566-d538-4b43-b2b3-d000f9b4162f | image-50b84eb0-9de5-45ba-8004-f1f1c7a0c00c | available   |    1 |             |
+--------------------------------------+--------------------------------------------+-------------+------+-------------+
# lvs
  LV                                          VG             Attr       LSize  Pool                Origin Data%  Meta%  Move Log Cpy%Sync Convert
  cinder-volumes-pool                         cinder-volumes twi-aotz-- 19,00g                            3,83   12,46
  volume-60c19e3c-3960-4fe7-9895-0426070b3e88 cinder-volumes Vwi-aotz--  3,00g cinder-volumes-pool        13,54
  volume-c36de566-d538-4b43-b2b3-d000f9b4162f cinder-volumes Vwi-a-tz--  1,00g cinder-volumes-pool        32,21
# ps -ef | grep qemu
root      9681  9169  0 09:42 ?        00:00:00 sudo cinder-rootwrap /etc/cinder/rootwrap.conf qemu-img convert -O raw -t none -f qcow2 /var/lib/cinder/conversion/tmpWqlJD5menoscloud@lvm /dev/mapper/cinder--volumes-volume--60c19e3c--3960--4fe7--9895--0426070b3e88
root      9682  9681  0 09:42 ?        00:00:00 /usr/bin/python2.7 /usr/bin/cinder-rootwrap /etc/cinder/rootwrap.conf qemu-img convert -O raw -t none -f qcow2 /var/lib/cinder/conversion/tmpWqlJD5menoscloud@lvm /dev/mapper/cinder--volumes-volume--60c19e3c--3960--4fe7--9895--0426070b3e88
root      9684  9682 29 09:42 ?        00:00:13 /usr/bin/qemu-img convert -O raw -t none -f qcow2 /var/lib/cinder/conversion/tmpWqlJD5menoscloud@lvm /dev/mapper/cinder--volumes-volume--60c19e3c--3960--4fe7--9895--0426070b3e88inder has created a new volume (id 26aee3ee...) and it is converting the content of the image (id 50b84eb0...) to raw format into that new volume.

Cinder created a new volume (id 60c19e3c…) that does not correspond with the size of the flavor I used (4Gb). And cinder is converting one image to that new volume. That image was previously downloaded from Cinder, from volume c36de566… to folder /var/lib/cinder/conversion mapping the iSCSI device and dumping its contents. In case that the image was not Cinder backed, it would have been downloaded using the appropriate mechanisms (e.g. HTTP or file copy from /var/lib/glance/images).

After a while (depending on the conversion process), the VM will start and we can inspect the backend…

# openstack volume list --all-projects
+--------------------------------------+--------------------------------------------+-----------+------+-----------------------------+
| ID | Name | Status | Size | Attached to |
+--------------------------------------+--------------------------------------------+-----------+------+-----------------------------+
| 91d51bc2-e33b-4b97-b91d-3a8655f88d0f | image-50b84eb0-9de5-45ba-8004-f1f1c7a0c00c | available | 3 | |
| 60c19e3c-3960-4fe7-9895-0426070b3e88 | | in-use | 4 | Attached to q1 on /dev/vda |
| c36de566-d538-4b43-b2b3-d000f9b4162f | image-50b84eb0-9de5-45ba-8004-f1f1c7a0c00c | available | 1 | |
+--------------------------------------+--------------------------------------------+-----------+------+-----------------------------+
# lvs
LV VG Attr LSize Pool Origin Data% Meta% Move Log Cpy%Sync Convert
cinder-volumes-pool cinder-volumes twi-aotz-- 19,00g 13,69 17,60
volume-60c19e3c-3960-4fe7-9895-0426070b3e88 cinder-volumes Vwi-aotz-- 4,00g cinder-volumes-pool 56,48
volume-91d51bc2-e33b-4b97-b91d-3a8655f88d0f cinder-volumes Vwi-a-tz-- 3,00g cinder-volumes-pool volume-60c19e3c-3960-4fe7-9895-0426070b3e88 73,31
volume-c36de566-d538-4b43-b2b3-d000f9b4162f cinder-volumes Vwi-a-tz-- 1,00g cinder-volumes-pool 32,21

Now we can see that there is a new volume (id 91d51bc2…) which has been associated to the image (id 50b84eb0…). And that volume will be cloned using the LVM mechanisms in the next uses of the image for volume-backend instances. Now if you start new instances, they will boot much faster.

How to move from a linear disk to an LVM disk and join the two disks into an LVM-like RAID-0

I had the recent need for adding a disk to an existing installation of Ubuntu, to make the / folder bigger. In such a case, I have two possibilities: to move my whole system to a new bigger disk (and e.g. dispose of the original disk) or to convert my disk to an LVM volume and add a second disk to enable the volume to grow. The first case was the subject of a previous post, but this time I learned…

How to move from a linear disk to an LVM disk and join the two disks into an LVM-like RAID-0

The starting point is simple:

  • I have one 14 Gb. disk (/dev/vda) with a single partition that is mounted in / (The disk has a GPT table and UEFI format and so it has extra partitions that we’ll keep as they are).
  • I have an 80 Gb. brand new disk (/dev/vdb)
  • I want to have one 94 Gb. volume built from the two disks
root@somove:~# lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
vda 252:0 0 14G 0 disk
├─vda1 252:1 0 13.9G 0 part /
├─vda14 252:14 0 4M 0 part
└─vda15 252:15 0 106M 0 part /boot/efi
vdb 252:16 0 80G 0 disk /mnt
vdc 252:32 0 4G 0 disk [SWAP]

The steps are the next:

  1. Creating a boot partition in /dev/vdb (this is needed because Grub cannot boot from LVM and needs an ext or VFAT partition)
  2. Format the boot partition and put the content of the current /boot folder
  3. Create an LVM volume using the extra space in /dev/vdb and initialize it using an ext4 filesystem
  4. Put the contents of the current / folder into the new partition
  5. Update grub to boot from the new disk
  6. Update the mount point for our system
  7. Reboot (and check)
  8. Add the previous disk to the LVM volume.

Let’s start…

Separate the /boot partition

When installing an LVM system, it is needed to have a /boot partition in a common format (e.g. ext2 or ext4), because GRUB cannot read from LVM. Then GRUB reads the contents of that partition and starts the proper modules to read the LVM volumes.

So we need to create the /boot partition. In our case, we are using ext2 format, because has no digest (we do not need it for the content of /boot) and it is faster. We are using 1 Gb. for the /boot partition, but 512 Mb. will probably be enough:

root@somove:~# fdisk /dev/vdb

Welcome to fdisk (util-linux 2.31.1).
Changes will remain in memory only, until you decide to write them.
Be careful before using the write command.


Command (m for help): n
Partition type
p primary (0 primary, 0 extended, 4 free)
e extended (container for logical partitions)
Select (default p):

Using default response p.
Partition number (1-4, default 1):
First sector (2048-167772159, default 2048):
Last sector, +sectors or +size{K,M,G,T,P} (2048-167772159, default 167772159): +1G

Created a new partition 1 of type 'Linux' and of size 1 GiB.

Command (m for help): w
The partition table has been altered.
Calling ioctl() to re-read partition table.
Syncing disks.

root@somove:~# mkfs.ext2 /dev/vdb1
mke2fs 1.44.1 (24-Mar-2018)
Creating filesystem with 262144 4k blocks and 65536 inodes
Filesystem UUID: 24618637-d2d4-45fe-bf83-d69d37f769d0
Superblock backups stored on blocks:
32768, 98304, 163840, 229376

Allocating group tables: done
Writing inode tables: done
Writing superblocks and filesystem accounting information: done

Now we’ll make a mount point for this partition, mount the partition and copy the contents of the current /boot folder to that partition:

root@somove:~# mkdir /mnt/boot
root@somove:~# mount /dev/vdb1 /mnt/boot/
root@somove:~# cp -ax /boot/* /mnt/boot/

Create an LVM volume in the extra space of /dev/vdb

First, we will create a new partition for our LVM system, and we’ll get the whole free space:

root@somove:~# fdisk /dev/vdb

Welcome to fdisk (util-linux 2.31.1).
Changes will remain in memory only, until you decide to write them.
Be careful before using the write command.


Command (m for help): n
Partition type
p primary (1 primary, 0 extended, 3 free)
e extended (container for logical partitions)
Select (default p):

Using default response p.
Partition number (2-4, default 2):
First sector (2099200-167772159, default 2099200):
Last sector, +sectors or +size{K,M,G,T,P} (2099200-167772159, default 167772159):

Created a new partition 2 of type 'Linux' and of size 79 GiB.

Command (m for help): w
The partition table has been altered.
Syncing disks.

Now we will create a Physical Volume, a Volume Group and the Logical Volume for our root filesystem, using the new partition:

root@somove:~# pvcreate /dev/vdb2
Physical volume "/dev/vdb2" successfully created.
root@somove:~# vgcreate rootvg /dev/vdb2
Volume group "rootvg" successfully created
root@somove:~# lvcreate -l +100%free -n rootfs rootvg
Logical volume "rootfs" created.

If you want to learn about LVM to better understand what we are doing, you can read my previous post.

Now we are initializing the filesystem of the new /dev/rootvg/rootfs volume using ext4, and then we’ll copy the existing filesystem except from the special folders and the /boot folder (which we have separated in the other partition):

root@somove:~# mkfs.ext4 /dev/rootvg/rootfs
mke2fs 1.44.1 (24-Mar-2018)
Creating filesystem with 20708352 4k blocks and 5177344 inodes
Filesystem UUID: 47b4b698-4b63-4933-98d9-f8904ad36b2e
Superblock backups stored on blocks:
32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208,
4096000, 7962624, 11239424, 20480000

Allocating group tables: done
Writing inode tables: done
Creating journal (131072 blocks): done
Writing superblocks and filesystem accounting information: done

root@somove:~# mkdir /mnt/rootfs
root@somove:~# mount /dev/rootvg/rootfs /mnt/rootfs/
root@somove:~# rsync -aHAXx --delete --exclude={/dev/*,/proc/*,/sys/*,/tmp/*,/run/*,/mnt/*,/media/*,/boot/*,/lost+found} / /mnt/rootfs/

Update the system to boot from the new /boot partition and the LVM volume

At this point we have our /boot partition (/dev/vdb1) and the / filesystem (/dev/rootvg/rootfs). Now we need to prepare GRUB to boot using these new resources. And here comes the magic…

root@somove:~# mount --bind /dev /mnt/rootfs/dev/
root@somove:~# mount --bind /sys /mnt/rootfs/sys/
root@somove:~# mount -t proc /proc /mnt/rootfs/proc/
root@somove:~# chroot /mnt/rootfs/

We are binding the special mount points /dev and /sys to the same folders in the new filesystem which is mounted in /mnt/rootfs. We are also creating the /proc mount point which holds the information about the processes. You can find some more information about why this is needed in my previous post on chroot and containers.

Intuitively, we are somehow “in the new filesystem” and now we can update things as if we had already booted into it.

At this point, we need to update the mount point in /etc/fstab to mount the proper disks once the system boots. So we are getting the UUIDs for our partitions:

root@somove:/# blkid
/dev/vda1: LABEL="cloudimg-rootfs" UUID="135ecb53-0b91-4a6d-8068-899705b8e046" TYPE="ext4" PARTUUID="b27490c5-04b3-4475-a92b-53807f0e1431"
/dev/vda14: PARTUUID="14ad2c62-0a5e-4026-a37f-0e958da56fd1"
/dev/vda15: LABEL="UEFI" UUID="BF99-DB4C" TYPE="vfat" PARTUUID="9c37d9c9-69de-4613-9966-609073fba1d3"
/dev/vdb1: UUID="24618637-d2d4-45fe-bf83-d69d37f769d0" TYPE="ext2"
/dev/vdb2: UUID="Uzt1px-ANds-tXYj-Xwyp-gLYj-SDU3-pRz3ed" TYPE="LVM2_member"
/dev/mapper/rootvg-rootfs: UUID="47b4b698-4b63-4933-98d9-f8904ad36b2e" TYPE="ext4"
/dev/vdc: UUID="3377ec47-a0c9-4544-b01b-7267ea48577d" TYPE="swap"

And we are updating /etc/fstab to mount /dev/mapper/rootvg-rootfs as the / folder. But we need to mount partition /dev/vdb1 in /boot. Using our example, the /etc/fstab file will be this one:

UUID="47b4b698-4b63-4933-98d9-f8904ad36b2e" / ext4 defaults 0 0
UUID="24618637-d2d4-45fe-bf83-d69d37f769d0" /boot ext2 defaults 0 0
LABEL=UEFI /boot/efi vfat defaults 0 0
UUID="3377ec47-a0c9-4544-b01b-7267ea48577d" none swap sw,comment=cloudconfig 0 0

We are using the UUID to mount / and /boot folders because the devices may change their names or location and that may lead to breaking our system.

And now we are ready to mount our /boot partition, update grub, and to install it in the /dev/vda disk (because we are keeping both disks).

root@somove:/# mount /boot
root@somove:/# update-grub
Generating grub configuration file ...
WARNING: Failed to connect to lvmetad. Falling back to device scanning.
WARNING: Failed to connect to lvmetad. Falling back to device scanning.
Found linux image: /boot/vmlinuz-4.15.0-43-generic
Found initrd image: /boot/initrd.img-4.15.0-43-generic
WARNING: Failed to connect to lvmetad. Falling back to device scanning.
WARNING: Failed to connect to lvmetad. Falling back to device scanning.
WARNING: Failed to connect to lvmetad. Falling back to device scanning.
Found Ubuntu 18.04.1 LTS (18.04) on /dev/vda1
done
root@somove:/# grub-install /dev/vda
Installing for i386-pc platform.
Installation finished. No error reported.

Reboot and check

We are almost done, and now we are exiting the chroot and rebooting

root@somove:/# exit
root@somove:~# reboot

And the result should be the next one:

root@somove:~# lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
vda 252:0 0 14G 0 disk
├─vda1 252:1 0 13.9G 0 part
├─vda14 252:14 0 4M 0 part
└─vda15 252:15 0 106M 0 part /boot/efi
vdb 252:16 0 80G 0 disk
├─vdb1 252:17 0 1G 0 part /boot
└─vdb2 252:18 0 79G 0 part
└─rootvg-rootfs 253:0 0 79G 0 lvm /
vdc 252:32 0 4G 0 disk [SWAP]

root@somove:~# df -h
Filesystem Size Used Avail Use% Mounted on
udev 2.0G 0 2.0G 0% /dev
tmpfs 395M 676K 394M 1% /run
/dev/mapper/rootvg-rootfs 78G 993M 73G 2% /
tmpfs 2.0G 0 2.0G 0% /dev/shm
tmpfs 5.0M 0 5.0M 0% /run/lock
tmpfs 2.0G 0 2.0G 0% /sys/fs/cgroup
/dev/vdb1 1008M 43M 915M 5% /boot
/dev/vda15 105M 3.6M 101M 4% /boot/efi
tmpfs 395M 0 395M 0% /run/user/1000

We have our / system mounted from the new LVM Logical Volume /dev/rootvg/rootfs, the /boot partition from /dev/vdb1, and the /boot/efi from the existing partition (just in case that we need it).

Add the previous disk to the LVM volume

Here we are facing the easier part, which is to integrate the original /dev/vda1 volume in the LVM volume.

Once we have double-checked that we had copied every file from the original / folder in /dev/vda1, we can initialize it for using it in LVM:

WARNING: This step wipes the content of /dev/vda1.

root@somove:~# pvcreate /dev/vda1
WARNING: ext4 signature detected on /dev/vda1 at offset 1080. Wipe it? [y/n]: y
Wiping ext4 signature on /dev/vda1.
Physical volume "/dev/vda1" successfully created.

Finally, we can integrate the new partition in our volume group and extend the logical volume to use the free space:

root@somove:~# vgextend rootvg /dev/vda1
Volume group "rootvg" successfully extended
root@somove:~# lvextend -l +100%free /dev/rootvg/rootfs
Size of logical volume rootvg/rootfs changed from <79.00 GiB (20223 extents) to 92.88 GiB (23778 extents).
Logical volume rootvg/rootfs successfully resized.
root@somove:~# resize2fs /dev/rootvg/rootfs
resize2fs 1.44.1 (24-Mar-2018)
Filesystem at /dev/rootvg/rootfs is mounted on /; on-line resizing required
old_desc_blocks = 10, new_desc_blocks = 12
The filesystem on /dev/rootvg/rootfs is now 24348672 (4k) blocks long.

And now we have the new 94 Gb. / folder which is made from /dev/vda1 and /dev/vdb2:

root@somove:~# lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
vda 252:0 0 14G 0 disk
├─vda1 252:1 0 13.9G 0 part
│ └─rootvg-rootfs 253:0 0 92.9G 0 lvm /
├─vda14 252:14 0 4M 0 part
└─vda15 252:15 0 106M 0 part /boot/efi
vdb 252:16 0 80G 0 disk
├─vdb1 252:17 0 1G 0 part /boot
└─vdb2 252:18 0 79G 0 part
└─rootvg-rootfs 253:0 0 92.9G 0 lvm /
vdc 252:32 0 4G 0 disk [SWAP]
root@somove:~# df -h
Filesystem Size Used Avail Use% Mounted on
udev 2.0G 0 2.0G 0% /dev
tmpfs 395M 676K 394M 1% /run
/dev/mapper/rootvg-rootfs 91G 997M 86G 2% /
tmpfs 2.0G 0 2.0G 0% /dev/shm
tmpfs 5.0M 0 5.0M 0% /run/lock
tmpfs 2.0G 0 2.0G 0% /sys/fs/cgroup
/dev/vdb1 1008M 43M 915M 5% /boot
/dev/vda15 105M 3.6M 101M 4% /boot/efi
tmpfs 395M 0 395M 0% /run/user/1000

(optional) Having the /boot partition to /dev/vda

In case we wanted to have the /boot partition in /dev/vda, the procedure will be a bit different:

  1. Instead of creating the LVM volume in /dev/vdb1, I would prefer to create a single partition /dev/vdb1 (ext4) which does not imply the separation of /boot and /.
  2. Once created /dev/vdb1, copy the filesystem in /dev/vda1 to /dev/vdb1 and prepare to boot from /dev/vdb1 (chroot, adjust mount points, update-grub, grub-install…).
  3. Boot from the new partition and wipe the original /dev/vda1 partition.
  4. Create a partition /dev/vda1 for the new /boot and initialize it using ext2, copy the contents of /boot according to the instructions in this post.
  5. Create a partition /dev/vda2, create the LVM volume, initialize it and copy the contents of /dev/vdb1 except from /boot
  6. Prepare to boot from /dev/vda (chroot, adjust mount points, mount /boot, update-grub, grub-install…)
  7. Boot from the new /root+LVM / and decide whether you want to add /dev/vdb to the LVM volume or not.

Using this procedure, you will get from linear to LVM with a single disk. And then you can decide whether to make the LVM to grow or not. Moreover you may decide whether to create a LVM-Raid(1,5,…) with the new or other disks.

How to move an existing installation of Ubuntu to another disk

Under some circumstances, we may have the need of moving a working installation of Ubuntu to another disk. The most common case is when your current disk runs out of space and you want to move it to a bigger one. But you could also want to move to an SSD disk or to create an LVM raid…

So this time I learned…

How to move an existing installation of Ubuntu to another disk

I have a 14Gb disk that contains my / partition (vda), and I want to move to a new 80Gb disk (vdb).

root@somove:~# lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
vda 252:0 0 14G 0 disk
└─vda1 252:1 0 13.9G 0 part /
vdb 252:16 0 80G 0 disk
vdc 252:32 0 4G 0 disk [SWAP]

First of all, I will create a partition for my / system in /dev/vdb.

root@somove:~# fdisk /dev/vdb

Welcome to fdisk (util-linux 2.31.1).
Changes will remain in memory only, until you decide to write them.
Be careful before using the write command.

Command (m for help): n
Partition type
p primary (0 primary, 0 extended, 4 free)
e extended (container for logical partitions)
Select (default p): p
Partition number (1-4, default 1):
First sector (2048-167772159, default 2048):
Last sector, +sectors or +size{K,M,G,T,P} (2048-167772159, default 167772159):

Created a new partition 1 of type 'Linux' and of size 80 GiB.

Command (m for help): w
The partition table has been altered.
Calling ioctl() to re-read partition table.
Syncing disks.

NOTE: The inputs from the user are: n for the new partition and the defaults (i.e. return) for any setting to get the whole disk. Then w to write the partition table.

Now that I have the new partition, we’ll create the filesystem (ext4):

root@somove:~# mkfs.ext4 /dev/vdb1
mke2fs 1.44.1 (24-Mar-2018)
Creating filesystem with 20971264 4k blocks and 5242880 inodes
Filesystem UUID: ea7ee2f5-749e-4e74-bcc3-2785297291a4
Superblock backups stored on blocks:
32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208,
4096000, 7962624, 11239424, 20480000

Allocating group tables: done
Writing inode tables: done
Creating journal (131072 blocks): done
Writing superblocks and filesystem accounting information: done

We have to transfer the content of the running filesystem to the new disk. But first, we’ll make sure that any other mount point except for / is unmounted (to avoid copying files in other disks).:

root@somove:~# umount -a
umount: /run/user/1000: target is busy.
umount: /sys/fs/cgroup/unified: target is busy.
umount: /sys/fs/cgroup: target is busy.
umount: /: target is busy.
umount: /run: target is busy.
umount: /dev: target is busy.
root@somove:~# mount
sysfs on /sys type sysfs (rw,nosuid,nodev,noexec,relatime)
proc on /proc type proc (rw,nosuid,nodev,noexec,relatime)
udev on /dev type devtmpfs (rw,nosuid,relatime,size=2006900k,nr_inodes=501725,mode=755)
devpts on /dev/pts type devpts (rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=000)
tmpfs on /run type tmpfs (rw,nosuid,noexec,relatime,size=403912k,mode=755)
/dev/vda1 on / type ext4 (rw,relatime,data=ordered)
tmpfs on /sys/fs/cgroup type tmpfs (ro,nosuid,nodev,noexec,mode=755)
cgroup on /sys/fs/cgroup/unified type cgroup2 (rw,nosuid,nodev,noexec,relatime)
tmpfs on /run/user/1000 type tmpfs (rw,nosuid,nodev,relatime,size=403908k,mode=700,uid=1000,gid=1000)
root@somove:~# lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
vda 252:0 0 14G 0 disk
└─vda1 252:1 0 13.9G 0 part /
vdb 252:16 0 80G 0 disk
└─vdb1 252:17 0 80G 0 part
vdc 252:32 0 4G 0 disk [SWAP]

Now we will create a mount point for the new filesystem and we’ll copy everything from / to it, except for the special folders (i.e. /tmp, /sys, /dev, etc.). Once completed, we’ll create the Linux special folders:

root@somove:~# mkdir /mnt/vdb1
root@somove:~# mount /dev/vdb1 /mnt/vdb1/
root@somove:~# rsync -aHAXx --delete --exclude={/dev/*,/proc/*,/sys/*,/tmp/*,/run/*,/mnt/*,/media/*,/lost+found} / /mnt/vdb1/

Instead of using rsync, we could use cp -ax /bin /etc /home /lib /lib64 …, but you need make sure that all folders and files are copied. You also need to make sure that the special folders are created by running mkdir /mnt/vdb1/{boot,mnt,proc,run,tmp,dev,sys}. The rsync version is easier to control and to understand.

Now that we have the same directory tree, we just need to make the magic of chroot to prepare the new disk:

root@somove:~# mount --bind /dev /mnt/vdb1/dev
root@somove:~# mount --bind /sys /mnt/vdb1/sys
root@somove:~# mount -t proc /proc /mnt/vdb1/proc
root@somove:~# chroot /mnt/vdb1/

We need to make sure that the new system will try to mount in / the new partition (i.e. /dev/vdb1), but we cannot use /dev/vdb1 id because if we remove the other disk, it will modify its device name to /dev/vda1. So we are using the UUID of the disk. To get it, we can use blkid:

root@somove:/# blkid
/dev/vda1: UUID="135ecb53-0b91-4a6d-8068-899705b8e046" TYPE="ext4"
/dev/vdb1: UUID="eb8d215e-d186-46b8-bd37-4b244cbb8768" TYPE="ext4"

And now we have to update file /etc/fstab to mount the proper UUID in the / folder. The new file /etc/fstab for our example is the next:

UUID="eb8d215e-d186-46b8-bd37-4b244cbb8768" / ext4 defaults 0 0

At this point, we need to update grub to match our disks (it will get the UUID or labels), and install it in the new disk:

root@somove:/# update-grub
Generating grub configuration file ...
Found linux image: /boot/vmlinuz-4.15.0-43-generic
Found initrd image: /boot/initrd.img-4.15.0-43-generic
WARNING: Failed to connect to lvmetad. Falling back to device scanning.
Found Ubuntu 18.04.1 LTS (18.04) on /dev/vda1
done
root@somove:/# grub-install /dev/vdb
Installing for i386-pc platform.
Installation finished. No error reported.

WARNING: In case we get error “error: will not proceed with blocklists.”, please go to the end part of this post.

WARNING: If you plan to keep the original disk in its place (e.g. a Virtual Machine in Amazon or OpenStack), you must install grub in /dev/vda. Otherwise, it will boot the previous system.

Finally, you can exit from chroot, power off, remove the old disk, and boot using the new one. The result will be next:

root@somove:~# lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
vda 252:16 0 80G 0 disk
└─vda1 252:17 0 80G 0 part /
vdc 252:32 0 4G 0 disk [SWAP]

What if we get error “error: will not proceed with blocklists.”

If we get this error (only if we get this error), we’ll need to wipe the gap between the init of the disk and the partition and then we’ll be able to install grub in the disk.

WARNING: make sure that you know what you are doing, or the disk is new, because this can potentialle erase the data of /dev/vdb.

$ grub-install /dev/vdb
Installing for i386-pc platform.
grub-install: warning: Attempting to install GRUB to a disk with multiple partition labels. This is not supported yet..
grub-install: warning: Embedding is not possible. GRUB can only be installed in this setup by using blocklists. However, blocklists are UNRELIABLE and their use is discouraged..
grub-install: error: will not proceed with blocklists.

In this case, we need to check the partition table of /dev/vdb

root@somove:/# fdisk -l /dev/vdb
Disk /dev/vdb: 80 GiB, 85899345920 bytes, 167772160 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: dos
Disk identifier: 0x00000000

Device Boot Start End Sectors Size Id Type
/dev/vdb1 2048 167772159 167770112 80G 83 Linux

And now we will put zeros in /dev/vdb (skipping the first sector where the partition table is stored), up to the sector before to where our partition starts (in our case, partition /dev/vdb1 starts in sector 2048 and so we will zero 2047 sectors):

root@somove:/# dd if=/dev/zero of=/dev/vdb seek=1 count=2047
2047+0 records in
2047+0 records out
1048064 bytes (1.0 MB, 1.0 MiB) copied, 0.0245413 s, 42.7 MB/s

If this was the problem, now you should be able to install grub:

root@somove:/# grub-install /dev/vdb
Installing for i386-pc platform.
Installation finished. No error reported.

How to use LVM

LVM stands for Logical Volume Manager, and it provides logical volume management for Linux kernels. It enables us to manage multiple physical disks from a single manager and to create logical volumes that take profit from having multiple disks (e.g. RAID, thin provisioning, volumes that span across disks, etc.).

I needed using LVM multiple times, and in special it is of interest when dealing with LVM backed cinder in OpenStack.

So this time I learned…

How to use LVM

LVM is installed in multiple Linux distros and they are usually LVM-aware, to be able to boot from LVM volumes.

For the purpose of this post, we’ll consider LVM as a mechanism to manage multiple physical disks and to create logical volumes on them. Then LVM will show the operating system the logical volumes as if they were disks.

Testlab

LVM is intended for physical disks (e.g. /dev/sda, /dev/sdb, etc.). But we are creating a test lab to avoid the need of buying physical disks.

We are creating 4 fake disks of size 256Mb each. To create each of them we simply create a file of the proper size (that will store the data), and then we attach that file to a loop device:

root@s1:/tmp# dd if=/dev/zero of=/tmp/fake-disk-256.0 bs=1M count=256
...
root@s1:/tmp# dd if=/dev/zero of=/tmp/fake-disk-256.1 bs=1M count=256
...
root@s1:/tmp# dd if=/dev/zero of=/tmp/fake-disk-256.2 bs=1M count=256
...
root@s1:/tmp# dd if=/dev/zero of=/tmp/fake-disk-256.3 bs=1M count=256
...
root@s1:/tmp# losetup /dev/loop0 ./fake-disk-256.0
root@s1:/tmp# losetup /dev/loop1 ./fake-disk-256.1
root@s1:/tmp# losetup /dev/loop2 ./fake-disk-256.2
root@s1:/tmp# losetup /dev/loop3 ./fake-disk-256.3

And now you have 4 working disks for our tests:

root@s1:/tmp# fdisk -l
Disk /dev/loop0: 256 MiB, 268435456 bytes, 524288 sectors
...
Disk /dev/loop1: 256 MiB, 268435456 bytes, 524288 sectors
...
Disk /dev/loop2: 256 MiB, 268435456 bytes, 524288 sectors
...
Disk /dev/loop3: 256 MiB, 268435456 bytes, 524288 sectors
...
Disk /dev/sda: (...)

For the system, these devices can be used as regular disks (e.g. format them, mount, etc.):

root@s1:/tmp# mkfs.ext4 /dev/loop0
mke2fs 1.44.1 (24-Mar-2018)
...
Writing superblocks and filesystem accounting information: done

root@s1:/tmp# mkdir -p /tmp/mnt/disk0
root@s1:/tmp# mount /dev/loop0 /tmp/mnt/disk0/
root@s1:/tmp# cd /tmp/mnt/disk0/
root@s1:/tmp/mnt/disk0# touch this-is-a-file
root@s1:/tmp/mnt/disk0# ls -l
total 16
drwx------ 2 root root 16384 Apr 16 12:35 lost+found
-rw-r--r-- 1 root root 0 Apr 16 12:38 this-is-a-file
root@s1:/tmp/mnt/disk0# cd /tmp/
root@s1:/tmp# umount /tmp/mnt/disk0

Concepts of LVM

LVM has simple actors:

  • Physical volume: which is a physical disk.
  • Volume group: which is a set of physical disks managed together.
  • Logical volume: which is a block device.

Logical Volumes (LV) are stored in Volume Groups (VG), which are backed by Physical Volumes (PV).

PVs are managed using pv* commands (e.g. pvscan, pvs, pvcreate, etc.). VGs are managed using vg* commands (e.g. vgs, vgdisplay, vgextend, etc.). LGs are managed using lg* commands (e.g. lvdisplay, lvs, lvextend, etc.).

Simple workflow with LVM

To have an LVM system, we have to first initialize a physical volume. That is somehow “initializing a disk in LVM format”, and that wipes the content of the disk:

root@s1:/tmp# pvcreate /dev/loop0
WARNING: ext4 signature detected on /dev/loop0 at offset 1080. Wipe it? [y/n]: y
Wiping ext4 signature on /dev/loop0.
Physical volume "/dev/loop0" successfully created.

Now we have to create a volume group (we’ll call it test-vg):

root@s1:/tmp# vgcreate test-vg /dev/loop0
Volume group "test-vg" successfully created

And finally, we can create a logical volume

root@s1:/tmp# lvcreate -l 100%vg --name test-vol test-vg
Logical volume "test-vol" created.

And now we have a simple LVM system that is built from one single physical disk (/dev/loop0) that contains one single volume group (test-vg) that holds a single logical volume (test-vol).

Examining things in LVM

  • The commands to examine PVs: pvs and pvdisplay. Each of them offers different information. pvscan also exists, but it is not needed in current versions of LVM.
  • The commands to examine VGs: vgs and vgdisplay. Each of them offers different information. vgscan also exists, but it is not needed in current versions of LVM.
  • The commands to examine PVs: lvs and lvdisplay. Each of them offers different information. lvscan also exists, but it is not needed in current versions of LVM.

Each command has several options, which we are not exploring here. We are just using the commands and we’ll present some options in the next examples.

At this time we should have a PV, a VG, and LV, and we can see them by using pvs, vgs and lvs:

root@s1:/tmp# pvs
PV VG Fmt Attr PSize PFree
/dev/loop0 test-vg lvm2 a-- 252.00m 0
root@s1:/tmp# vgs
VG #PV #LV #SN Attr VSize VFree
test-vg 1 1 0 wz--n- 252.00m 0
root@s1:/tmp# lvs
LV VG Attr LSize Pool Origin Data% Meta% Move Log Cpy%Sync Convert
test-vol test-vg -wi-a----- 252.00m

Now we can use the test-vol as if it was a partition:

root@s1:/tmp# mkfs.ext4 /dev/mapper/test--vg-test--vol
mke2fs 1.44.1 (24-Mar-2018)
...
Writing superblocks and filesystem accounting information: done

root@s1:/tmp# mount /dev/mapper/test--vg-test--vol /tmp/mnt/disk0/
root@s1:/tmp# cd /tmp/mnt/disk0/
root@s1:/tmp/mnt/disk0# touch this-is-my-file
root@s1:/tmp/mnt/disk0# df -h
Filesystem Size Used Avail Use% Mounted on
(...)
/dev/mapper/test--vg-test--vol 241M 2.1M 222M 1% /tmp/mnt/disk0

Adding another disk to grow the filesystem

Imagine that we have filled our 241Mb volume in our 256Mb disk (/dev/loop0), and we need some more storage space. We could buy an extra disk (i.e. /dev/loop1) and add it to the LVM (using command vgextend).

 

root@s1:/tmp# pvcreate /dev/loop1
Physical volume "/dev/loop1" successfully created.
root@s1:/tmp# vgextend test-vg /dev/loop1
Volume group "test-vg" successfully extended

And now we have two physical volumes added to a single volume group. The VG is of size 504Mb and there are 252Mb free.

root@s1:/tmp# pvs
PV VG Fmt Attr PSize PFree
/dev/loop0 test-vg lvm2 a-- 252.00m 0
/dev/loop1 test-vg lvm2 a-- 252.00m 252.00m
root@s1:/tmp# vgs
VG #PV #LV #SN Attr VSize VFree
test-vg 2 1 0 wz--n- 504.00m 252.00m

We could think as if the VG was somehow a disk and the LV are the partitions. So we can make grow the LV in the VG and then grow the filesystem:

root@s1:/tmp/mnt/disk0# lvscan
ACTIVE '/dev/test-vg/test-vol' [252.00 MiB] inherit

root@s1:/tmp/mnt/disk0# lvextend -l +100%free /dev/test-vg/test-vol
Size of logical volume test-vg/test-vol changed from 252.00 MiB (63 extents) to 504.00 MiB (126 extents).
Logical volume test-vg/test-vol successfully resized.

root@s1:/tmp/mnt/disk0# resize2fs /dev/test-vg/test-vol
resize2fs 1.44.1 (24-Mar-2018)
Filesystem at /dev/test-vg/test-vol is mounted on /tmp/mnt/disk0; on-line resizing required
old_desc_blocks = 2, new_desc_blocks = 4
The filesystem on /dev/test-vg/test-vol is now 516096 (1k) blocks long.

root@s1:/tmp/mnt/disk0# df -h
Filesystem Size Used Avail Use% Mounted on
(...)
/dev/mapper/test--vg-test--vol 485M 2.3M 456M 1% /tmp/mnt/disk0

root@s1:/tmp/mnt/disk0# ls -l
total 12
drwx------ 2 root root 12288 Apr 22 16:46 lost+found
-rw-r--r-- 1 root root 0 Apr 22 16:46 this-is-my-file

Now we have the new LV with double size.

Downsize the LV

Now we have obtained some free space and we want to keep only 1 disk (e.g. /dev/loop0). So we can downsize the filesystem (e.g. to 200Mb), and then downsize the LV.

This method needs unmounting the filesystem. So if you want to resize the root partition, you would need to use a live system or pivoting root to an unused filesystem as described [here](https://unix.stackexchange.com/a/227318).

First, unmount the filesystem and check it:

root@s1:/tmp# umount /tmp/mnt/disk0
root@s1:/tmp# e2fsck -ff /dev/mapper/test--vg-test--vol
e2fsck 1.44.1 (24-Mar-2018)
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
/dev/mapper/test--vg-test--vol: 12/127008 files (0.0% non-contiguous), 22444/516096 blocks

Then change the size of the filesystem to the desired size:

root@s1:/tmp# resize2fs /dev/test-vg/test-vol 200M
resize2fs 1.44.1 (24-Mar-2018)
Resizing the filesystem on /dev/test-vg/test-vol to 204800 (1k) blocks.
The filesystem on /dev/test-vg/test-vol is now 204800 (1k) blocks long.

And now, we’ll reduce the logical volume to the new size and re-check the filesystem:

root@s1:/tmp# lvreduce -L 200M /dev/test-vg/test-vol
WARNING: Reducing active logical volume to 200.00 MiB.
THIS MAY DESTROY YOUR DATA (filesystem etc.)
Do you really want to reduce test-vg/test-vol? [y/n]: y
Size of logical volume test-vg/test-vol changed from 504.00 MiB (126 extents) to 200.00 MiB (50 extents).
Logical volume test-vg/test-vol successfully resized.
root@s1:/tmp# lvdisplay
--- Logical volume ---
LV Path /dev/test-vg/test-vol
LV Name test-vol
VG Name test-vg
LV UUID xGh4cd-R93l-UpAL-LGTV-qnxq-vvx2-obSubY
LV Write Access read/write
LV Creation host, time s1, 2020-04-22 16:26:48 +0200
LV Status available
# open 0
LV Size 200.00 MiB
Current LE 50
Segments 1
Allocation inherit
Read ahead sectors auto
- currently set to 256
Block device 253:0

root@s1:/tmp# resize2fs /dev/test-vg/test-vol
resize2fs 1.44.1 (24-Mar-2018)
The filesystem is already 204800 (1k) blocks long. Nothing to do!

And now we are ready to use the disk with the new size:

root@s1:/tmp# mount /dev/test-vg/test-vol /tmp/mnt/disk0/
root@s1:/tmp# df -h
Filesystem Size Used Avail Use% Mounted on
(...)
/dev/mapper/test--vg-test--vol 190M 1.6M 176M 1% /tmp/mnt/disk0root@s1:/tmp# cd /tmp/mnt/disk0/
root@s1:/tmp/mnt/disk0# ls -l
total 12
drwx------ 2 root root 12288 Apr 22 16:46 lost+found
-rw-r--r-- 1 root root 0 Apr 22 16:46 this-is-my-file

Removing a PV

Now we want to remove /dev/loop0 (which was our original disks) and keep the replacement (/dev/loop1).

We just need to free /dev/loop0 from VGs and remove it from the VG, for being able to safely remove it from the system. First, we check the PVs:

root@s1:/tmp/mnt/disk0# pvs -o+pv_used
PV VG Fmt Attr PSize PFree Used
/dev/loop0 test-vg lvm2 a-- 252.00m 52.00m 200.00m
/dev/loop1 test-vg lvm2 a-- 252.00m 252.00m 0

We can see that /dev/loop0 is used, so we need to move its data to another PV:

root@s1:/tmp/mnt/disk0# pvmove /dev/loop0
/dev/loop0: Moved: 100.00%
root@s1:/tmp/mnt/disk0# pvs -o+pv_used
PV VG Fmt Attr PSize PFree Used
/dev/loop0 test-vg lvm2 a-- 252.00m 252.00m 0
/dev/loop1 test-vg lvm2 a-- 252.00m 52.00m 200.00m

Now /dev/loop0 is 100% free and we can remove it from the VG:

root@s1:/tmp/mnt/disk0# vgreduce test-vg /dev/loop0
Removed "/dev/loop0" from volume group "test-vg"
root@s1:/tmp/mnt/disk0# pvremove /dev/loop0
Labels on physical volume "/dev/loop0" successfully wiped.

Thin provisioning with LVM

Thin provisioning consists of providing the user with the illusion of having an amount of storage space, but only storing the amount of space that is actually used. It is similar to qcow2 or vmdk disk formats for virtual machines. The idea is that, if you have 1 Gb of storage space used, the backend will only store these data, even if the volume is 10Gb. The total amount of storage space will be used as it is requested.

First, you need to reserve an effective storage space in the form of a “thin pool”. The next example reserves 200M (-L 200M) as a thin storage (-T) as the thin pool thinpool in VG test-vg:

root@s1:/tmp/mnt# lvcreate -L 200M -T test-vg/thinpool
Using default stripesize 64.00 KiB.
Thin pool volume with chunk size 64.00 KiB can address at most 15.81 TiB of data.
Logical volume "thinpool" created.

The result is that we’ll have a volume of size 200M, which is like a volume but is marked as a thin pool.

Now we can create two thin-provisioned volumes of the same size:

root@s1:/tmp/mnt# lvcreate -V 200M -T test-vg/thinpool -n thin-vol-1
Using default stripesize 64.00 KiB.
Logical volume "thin-vol-1" created.
root@s1:/tmp/mnt# lvcreate -V 200M -T test-vg/thinpool -n thin-vol-2
Using default stripesize 64.00 KiB.
WARNING: Sum of all thin volume sizes (400.00 MiB) exceeds the size of thin pool test-vg/thinpool and the amount of free space in volume group (296.00 MiB).
WARNING: You have not turned on protection against thin pools running out of space.
WARNING: Set activation/thin_pool_autoextend_threshold below 100 to trigger automatic extension of thin pools before they get full.
Logical volume "thin-vol-2" created.
root@s1:/tmp/mnt# lvs -o name,lv_size,data_percent,thin_count
LV LSize Data% #Thins
thin-vol-1 200.00m 0.00
thin-vol-2 200.00m 0.00
thinpool 200.00m 0.00 2

The result is that we have 2 volumes of 200Mb each, while actually having only 200Mb. But each of the volumes is empty. Now as we use the volumes, the space will be occupied:

root@s1:/tmp/mnt# mkfs.ext4 /dev/test-vg/thin-vol-1
(...)
Writing superblocks and filesystem accounting information: done

root@s1:/tmp/mnt# mount /dev/test-vg/thin-vol-1 /tmp/mnt/disk0/
root@s1:/tmp/mnt# dd if=/dev/random of=/tmp/mnt/disk0/randfile bs=1K count=1024
dd: warning: partial read (94 bytes); suggest iflag=fullblock
0+1024 records in
0+1024 records out
48623 bytes (49 kB, 47 KiB) copied, 0.0701102 s, 694 kB/s
root@s1:/tmp/mnt# lvs -o name,lv_size,data_percent,thin_count
LV LSize Data% #Thins
thin-vol-1 200.00m 5.56
thin-vol-2 200.00m 0.00
thinpool 200.00m 5.56 2
root@s1:/tmp/mnt# df -h
Filesystem Size Used Avail Use% Mounted on
(...)
/dev/mapper/test--vg-thin--vol--1 190M 1.6M 175M 1% /tmp/mnt/disk0

In this example, we have used 49kb, which means 5.56%. If we repeat the process for the other volume:

root@s1:/tmp/mnt# mkfs.ext4 /dev/test-vg/thin-vol-2
(...)
Writing superblocks and filesystem accounting information: done

root@s1:/tmp/mnt# mkdir -p /tmp/mnt/disk1
root@s1:/tmp/mnt# mount /dev/test-vg/thin-vol-2 /tmp/mnt/disk1/
root@s1:/tmp/mnt# dd if=/dev/random of=/tmp/mnt/disk1/randfile bs=1K count=1024
dd: warning: partial read (86 bytes); suggest iflag=fullblock
0+1024 records in
0+1024 records out
38821 bytes (39 kB, 38 KiB) copied, 0.0473561 s, 820 kB/s
root@s1:/tmp/mnt# lvs -o name,lv_size,data_percent,thin_count
LV LSize Data% #Thins
thin-vol-1 200.00m 5.56
thin-vol-2 200.00m 5.56
thinpool 200.00m 11.12 2
root@s1:/tmp/mnt# df -h
Filesystem Size Used Avail Use% Mounted on
(...)
/dev/mapper/test--vg-thin--vol--1 190M 1.6M 175M 1% /tmp/mnt/disk0
/dev/mapper/test--vg-thin--vol--2 190M 1.6M 175M 1% /tmp/mnt/disk1

We can see that the space is being consumed, but we still see the 200Mb for each of the volumes.

Using RAID with LVM

Apart from using LVM as RAID-0 (i.e. stripping an LV across multiple physical devices), it is possible to create other types of raids using LVM.

Some of the most popular types of raids are RAID-1, RAID-10, and RAID-5. In the context of LVM, they intuitively mean:

  • RAID-1: mirror an LV in multiple PV
  • RAID-10: mirror an LV in multiple PV, but also stripping parts of the volume in different PVs.
  • RAID-5: distributing the LV in multiple PV, and using some parity data to be able to continue working if PV fails.

You can check more specific information on RAIDs in this link.

You can use other RAID utilities (such as on-board RAIDs or software like mdadm), but using LVM you will also be able to take profit from LVM features.

Mirroring an LV with LVM

The first use case is to get a volume that is mirrored in multiple PV. We’ll get back to the 2 PV set-up:

root@s1:/tmp/mnt# pvs
PV VG Fmt Attr PSize PFree
/dev/loop0 test-vg lvm2 a-- 252.00m 252.00m
/dev/loop1 test-vg lvm2 a-- 252.00m 252.00m
root@s1:/tmp/mnt# vgs
VG #PV #LV #SN Attr VSize VFree
test-vg 2 0 0 wz--n- 504.00m 504.00m

And now we can create an LV that is in RAID1. But we can also set the number of copies that we want for the volume (using flag -m). In this case, we’ll create LV lv-mirror (-n lv-mirror) that is mirrored once (-m 1) with a size of 100Mb (-L 100M).

root@s1:/tmp/mnt# lvcreate --type raid1 -m 1 -L 100M -n lv-mirror test-vg
Logical volume "lv-mirror" created.
root@s1:/tmp/mnt# lvs -a -o name,copy_percent,devices
LV Cpy%Sync Devices
lv-mirror 100.00 lv-mirror_rimage_0(0),lv-mirror_rimage_1(0)
[lv-mirror_rimage_0] /dev/loop1(1)
[lv-mirror_rimage_1] /dev/loop0(1)
[lv-mirror_rmeta_0] /dev/loop1(0)
[lv-mirror_rmeta_1] /dev/loop0(0)

As you can see, LV lv-mirror is built from lv-mirror_rimage_0, lv-mirror_rmeta_0, lv-mirror_rimage_1 and lv-mirror_rmeta_1 “devices”, and in which PV is located each part.

For the case of RAID1, you can convert it to and from linear volumes. This feature is not (yet) implemented for other types of raid such as RAID10 or RAID5.

You can change the number of mirror copies for an LV (even getting the volume to linear), by using command lvconvert:

root@s1:/tmp/mnt# lvconvert -m 0 /dev/test-vg/lv-mirror
Are you sure you want to convert raid1 LV test-vg/lv-mirror to type linear losing all resilience? [y/n]: y
Logical volume test-vg/lv-mirror successfully converted.
root@s1:/tmp/mnt# lvs -a -o name,copy_percent,devices
LV Cpy%Sync Devices
lv-mirror /dev/loop1(1)
root@s1:/tmp/mnt# pvs
PV VG Fmt Attr PSize PFree
/dev/loop0 test-vg lvm2 a-- 252.00m 252.00m
/dev/loop1 test-vg lvm2 a-- 252.00m 152.00m

In this case, we have converted the volume to linear (i.e. zero copies).

But you can also get mirror capabilities for a linear volume:

root@s1:/tmp/mnt# lvconvert -m 1 /dev/test-vg/lv-mirror
Are you sure you want to convert linear LV test-vg/lv-mirror to raid1 with 2 images enhancing resilience? [y/n]: y
Logical volume test-vg/lv-mirror successfully converted.
root@s1:/tmp/mnt# lvs -a -o name,copy_percent,devices
LV Cpy%Sync Devices
lv-mirror 100.00 lv-mirror_rimage_0(0),lv-mirror_rimage_1(0)
[lv-mirror_rimage_0] /dev/loop1(1)
[lv-mirror_rimage_1] /dev/loop0(1)
[lv-mirror_rmeta_0] /dev/loop1(0)
[lv-mirror_rmeta_1] /dev/loop0(0)

More on RAIDs

Using command lvcreate it is also possible to create other types of RAID (e.g. RAID10, RAID5, RAID6, etc.). The only requirement is to have enough PVs for the type of RAID. But you must also have in mind that it is not possible to convert from these other types of raid to or from LV.

You can find much more information on LVM and RAIDs in this link.

More on LVM

There are a lot of features and tweaks to adjust in LVM, but this post shows the basics (and a bit more) to deal with LVM. You are advised to check the file /etc/lvm.conf and the man page.

How to Recover Partitions from LVM Volume

Yesterday I had a problem with a disk… while trying to increase the size of a LVM volume, I lost the disk. What I did was: add the new disk to the LVM volume, mirror the volume and removing the original disk. But the problem is that I added the disk back again and the things started to go wrong. The volume did not boot, etc.

The original system was a Scientific Linux 6.7 and it had different partitions: one /boot ext-4 partition and a LVM volume in which we had 2 partitions: lv-root and lv-swap.

At the end of the LVM problem I had the original volume with LVM bad metadata that did not allow me to use the original information. Luckily I had not written any other information in the disks… so the information had to be there.

Once the disaster was there…

I learned how to recover the partitions from a LVM volume.

I had to recover the partitions, to create a new disk with the partitions.

Recovering partitions

After some failed tests, I got to the situation in which I had /dev/sda with a single GPT partition in there. I remembered about TestDisk, which is a tool that helps in forensics, and I started to play with it.

1

The first that I did was to try to figure out what could I do with my partition. So I started my system with a ubuntu 14.04 desktop LiveCD and /dev/sda, downloaded TestDisk and tried

$ ./testdisk_static /dev/sda

Then I used the GPT option and analysed the disk. In the disk I found two partitions: my boot partition and a LVM partition. I wrote the partition table and got to the initial page where I entered in the advanced options to dump the partition (Image Creation).

2

Then I had the boot partition dumped and it could be used as a raw partition dumped with dd.

Then I exited from the app and started it back again, because the LVM partition was now accesible in /dev/sda2. Now I tried

$ ./testdisk_static /dev/sda2

Now I selected the disk and selected the Intel partition option. TestDisk found the two partitions: the Linux and the Linux Swap.

3.png

And now I dumped the Linux partition.

Disclaimer

This is my specific organization for the disk, but the key is that TestDisk helped me to figure out where the partitions were and to get their raw images.

Creating the new disk

Now that I have the partition images: image.dd (for the boot partition) and sda1.dd, and now I have to create a new disk. So I booted the ubuntu desktop again, with a new disk (/dev/sdb).

The first thing is to get the size of the partitions and we will use the fdisk utility on the dumped files:

$ fdisk -l image.dd
$ fdisk -l sda1.dd

Using these commands I will get the size in sectors of the disk. And using that size I can make the partitions in /dev/sdb. According to my case, I will create a partition for the boot and other for the filesystem. My options were the next (please pay attention to the images to check that the values of the sectors match between the size of the partitions and the size of the NEW partitions and so on).

$ fdisk /dev/sdb
n
<enter>
<enter>
<enter>
+1024000
n
<enter>
<enter>
<enter>
+15810560
w

3.png

The key is to pay attention to the size (in sectors) of the partitions obtained with the fdisk -l command issued before. If you have any doubt, please check the images.

And now you are ready to dump the images in the disk:

$ dd if=image.dd of=/dev/sdb1
...
$ dd if=sda1.dd of=/dev/sdb2
...

Check this cool tip!

The process of dd costs a lot. If you want to see the progress, you can open other commandline and issue the command

$ killall -USR1 dd

The dd command will output the size that it has dumped in its console. If you want to see the progress, you can issue a command like this one:

$ watch -n 10 killall -USR1 dd

This will make that dd outputs the size dumped every 10 seconds.

More on this

Once I had the partition dumped, I used gparted to resize the second partition (as it was almost full). My disk was far bigger than the original, but if you only want to get the data or you have free space, this won’t probably be useful for you (so I am skipping it).

 

How to compact a QCOW2 or a VMDK file

When you create a Virtual Machine (VM), you usually have the option of use a format that reserves the whole size of the disk (e.g. RAW), or to use a format that grows according to the used space in the disk (e.g. QCOW or VMDK).

The problem is that the space actually used in the disk grows as the disk files are written, but it is not decreased as they are deleted. But if you writed a lot of files and you deleted after they were needed, you’d probably have a lot of space reserved for the VMDK file, while that space is not actually used. I wanted to reclaim that space, to move the VMs using less space, and so this time…

I learned how to compact a VMDK file (the same method applies to QCOW2)

The method is, in fact, very easy… you simply have to re-encode the file using the same output format. If you have your original-disk.vmdk file, you simply have to issue a command like this one:

$ qemu-img convert -O vmdk original-disk.vmdk compressed-disk.vmdk

And that will make the magic (easy, isn’t it?).

But if you want to compact it more, you can claim more space from the disk before re-enconding the disk. First, I’d go to the solution and then I’ll explain it:

If the VM is a linux-based, you can boot it and create a zero-file, and once the file has exhausted the disk, delete it:

$ dd if=/dev/zero of=/tmp/zerofile.raw...$ rm /tmp/zerofile.raw

If the VM is a Windows-based, you can get the command sdelete from Microsoft website decompress it and execute the following commandline:

c:\> sdelete -z c:

Now you can power off the VM and issue the qemu-img command. You’ll get a file that correspond to only the used space in the disk:

$ qemu-img convert -O vmdk original-disk.vmdk compressed-disk.vmdk

Explanation

(Disclaimer: Please take into account that this is a simple and conceptual explanation)

If you knew about how the disks are managed, you’d probably know that when a file is deleted, it is not actually deleted from the disk. Instead, the space that it was using is marked as “ready to be used in case that it is needed”. So if a new file is created in the disk, it is possible that it uses that physical space (or not).

That is the trick from which the file recovery applications work: trying to find those “ready to be used” sectors. And that is why the “low-level format” exists: in order to “zero” the disk and to avoid that files are recovered.

When you created the /tmp/zerofile.raw file, you started to write zeros in the disk. When the physical empty space was exhausted, the disk controller started to use the “ready to be used” sectors, and the zerofile wrote zeros on them, and the zeros were written in the VMDK file.

The good thing here is that when a VMDK file is created (from any format… in our case, it is VMDK), the qemu-img application does not write those zeros in the file that contains the disk, and that is how the storage space is reclaimed.