How to use a Dell PS Equallogic as a backend for OpenStack Cinder

I have written several posts on installing OpenStack Rocky from scratch. They all have the tag #openstack. In the previous posts we…

  1. Installed OpenStack Rocky (part 1, part 2, and part 3).
  2. Installed the Horizon Dashboard and upgraded noVNC (install horizon).
  3. Installed Cinder and integrated it with Glance (in this post).

And now that I have a Dell PS Equallogic, I learned…

How to use a Dell PS Equallogic as a backend for OpenStack Cinder

In the post in which we installed Cinder, we used a single disk and LVM as the backend to store and serve the Cinder volumes. But in my lab we own a Dell PS Equallogic, which is a far better storage system than a Linux server acting as a SAN. So I’d prefer to use it as the backend for Cinder.

In the last post we did “the hard work” of installing Cinder, so setting up the new backend is easier now. We’ll follow the official documentation of the plugin for the Dell EMC PS Series in this link.

Prior to replacing the storage backend, it is advisable to remove any volume that was stored in the other backend, and also the images that were volume-backed. Otherwise, Cinder will assume that the old volumes are stored in the new backend, and your installation will end up in an inconsistent state. If you plan to keep both storage backends (e.g. because you still have running volumes), you can add the new storage backend and set it as default.

In this guide, I will assume that every volume and image has been removed from the OpenStack deployment. Moreover, I assume that the PS Equallogic is up and running.

The SAN is in a separate data network. The IP address of the SAN is 192.168.1.11, and every node and the controller have IP addresses in that range. In my particular case, the controller has the IP address 192.168.1.49.

Create a user in the SAN

We need a user for OpenStack to access the SAN. So I have created user “osuser” with password “SAN_PASS”.

[Figure: the SAN user osuser is restricted to use only some features.]

It is important to check the connectivity via ssh. We can check it from the front-end:

# ssh osuser@192.168.1.11
osuser@192.168.1.11's password:
Last login: Wed May 6 15:45:56 2020 from 192.168.1.49 on tty??


Welcome to Group Manager

Copyright 2001-2016 Dell Inc.

group1>

Add the new backend to Cinder

Now that we have the user, we can just add the new backend to Cinder. So we’ll add the following lines to /etc/cinder/cinder.conf

[dell]
volume_driver = cinder.volume.drivers.dell_emc.ps.PSSeriesISCSIDriver
san_ip = 192.168.1.11
san_login = osuser
san_password = SAN_PASS
eqlx_group_name = group1
eqlx_pool = default

# Optional settings
san_thin_provision = true
use_chap_auth = false
eqlx_cli_max_retries = 3
san_ssh_port = 22
ssh_conn_timeout = 15
ssh_min_pool_conn = 1
ssh_max_pool_conn = 5

# Enable the volume-backed image cache
image_volume_cache_enabled = True

These lines just configure the access to your SAN. Please adjust the parameters to your settings and preferences.

Pay attention to the variable “image_volume_cache_enabled”, which I have also set to True for this backend. With it, the volumes for volume-backed VMs are created by cloning a volume that holds the image, instead of uploading the image to the volume each time (you can read more about this in the part of integrating Cinder and Glance, in the previous post). In the end, this mechanism speeds up the boot process of the VMs.

Now, we have to update the value of enabled_backends in the [DEFAULT] section of file /etc/cinder/cinder.conf. In my case, I have disabled the other backends (i.e. LVM):

enabled_backends = dell
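A typo in this setting is easy to miss, because every name listed in enabled_backends needs a section with the same name in the file. The following is a minimal sketch of a check for that, using plain awk; the function name missing_backend_sections is made up for this example, and it only understands the simple "key = value" layout used in this post.

```shell
# Print every backend named in enabled_backends that has no [section] of
# its own in the given cinder.conf-style file (no output means all good).
missing_backend_sections() {
    awk '
        /^\[.*\]$/ {
            section = substr($0, 2, length($0) - 2)
            seen[section] = 1
            next
        }
        section == "DEFAULT" && /^enabled_backends[ \t]*=/ {
            value = $0
            sub(/^[^=]*=[ \t]*/, "", value)
            sub(/[ \t]+$/, "", value)
            count = split(value, names, /[ \t]*,[ \t]*/)
        }
        END {
            for (i = 1; i <= count; i++)
                if (names[i] != "" && !(names[i] in seen)) print names[i]
        }
    ' "$1"
}

# Intended use (path as in this post):
#   missing_backend_sections /etc/cinder/cinder.conf
```

If you keep both backends (e.g. "enabled_backends = lvm,dell"), the function should print nothing as long as both [lvm] and [dell] exist.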

Finally, you have to restart the cinder services:

# service cinder-volume restart
# service cinder-scheduler restart

Testing the integration

If the integration was fine, you can find messages like the next ones in /var/log/cinder/cinder-volume.log:

2020-05-06 14:17:40.688 17855 INFO cinder.volume.manager [req-74aabdaa-5f88-4576-801e-cf923265d23e - - - - -] Starting volume driver PSSeriesISCSIDriver (1.4.6)
2020-05-06 14:17:40.931 17855 INFO paramiko.transport [-] Connected (version 1.99, client OpenSSH_5.0)
2020-05-06 14:17:42.096 17855 INFO paramiko.transport [-] Authentication (password) successful!
2020-05-06 14:17:42.099 17855 INFO cinder.volume.drivers.dell_emc.ps [req-74aabdaa-5f88-4576-801e-cf923265d23e - - - - -] PS-driver: executing "cli-settings confirmation off".
...

As I prepared a separate block for osuser, when I log in to the SAN via SSH I see no volumes:

# ssh osuser@192.168.1.11
osuser@192.168.1.11's password:
Last login: Wed May 6 16:02:01 2020 from 192.168.1.49 on tty??


Welcome to Group Manager

Copyright 2001-2016 Dell Inc.

group1> volume show
Name            Size       Snapshots Status  Permission Connections T
--------------- ---------- --------- ------- ---------- ----------- -
group1>

Now I am creating a new volume:

# openstack volume create --size 2 testvol
+---------------------+--------------------------------------+
| Field               | Value                                |
+---------------------+--------------------------------------+
(...)
| id                  | 7844286d-869d-49dc-9c91-d7af6c2123ab |
(...)

And now, if I get to the SAN CLI, I will see a new volume whose name coincides with the ID of the just created volume:

group1> volume show
Name            Size       Snapshots Status  Permission Connections T
--------------- ---------- --------- ------- ---------- ----------- -
volume-7844286d 2GB        0         online  read-write 0           Y
  -869d-49dc-9c
  91-d7af6c2123
  ab
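The SAN CLI wraps the volume name over several lines, which makes it awkward to compare with the IDs that OpenStack prints. As a small sketch (the function name eql_volume_names is invented here, and I am assuming that you have pasted or saved the CLI session as plain text), awk can glue the pieces back together:

```shell
# Rejoin the volume names that "volume show" wraps over several lines.
# Reads a saved CLI session on stdin and prints one full name per line.
eql_volume_names() {
    awk '
        # Skip the header, the ruler of dashes and the "group1>" prompt lines
        $1 == "Name" || $1 ~ /^-+$/ || $1 ~ />$/ { next }
        # A full row (7 columns) starts a new volume
        NF >= 7 { if (name != "") print name; name = $1; next }
        # A line with a single word continues the current name
        NF == 1 && name != "" { name = name $1 }
        END { if (name != "") print name }
    '
}

eql_volume_names <<'EOF'
group1> volume show
Name            Size       Snapshots Status  Permission Connections T
--------------- ---------- --------- ------- ---------- ----------- -
volume-7844286d 2GB        0         online  read-write 0           Y
  -869d-49dc-9c
  91-d7af6c2123
  ab
EOF
```

The result is the name "volume-" followed by the ID of the volume, so it can be matched directly against the output of "openstack volume list".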

Verifying the usage as image cache for volume backed instances

We are creating a new image, and it should be stored in the SAN, because we set Cinder as the default storage backend for Glance, in the previous post.

# openstack image create --public --container-format bare --disk-format qcow2 --file ./bionic-server-cloudimg-amd64.img "Ubuntu 18.04"
...
# openstack volume list --all-projects
+--------------------------------------+--------------------------------------------+-----------+------+-------------+
| ID                                   | Name                                       | Status    | Size | Attached to |
+--------------------------------------+--------------------------------------------+-----------+------+-------------+
| 41caf4ad-2bbd-4311-9003-00d39d009a9f | image-c8f3d5c5-5d4a-47de-bac0-bfd3f673c713 | available |    1 |             |
+--------------------------------------+--------------------------------------------+-----------+------+-------------+

And the SAN’s CLI shows the newly created volume:

group1> volume show
Name            Size       Snapshots Status  Permission Connections T
--------------- ---------- --------- ------- ---------- ----------- -
volume-41caf4ad 1GB        0         online  read-write 0           Y
  -2bbd-4311-90
  03-00d39d009a
  9f

At this point, we will boot a new VM which is volume backed and makes use of that image.

First, we will see that a new volume is being created, and the volume that corresponds to the image is connected to “None”.

# openstack volume list --all-projects
+--------------------------------------+--------------------------------------------+----------+------+-----------------------------------+
| ID                                   | Name                                       | Status   | Size | Attached to                       |
+--------------------------------------+--------------------------------------------+----------+------+-----------------------------------+
| dd4071bf-48e9-484c-80cd-89f52a4fa442 |                                            | creating |    4 |                                   |
| 41caf4ad-2bbd-4311-9003-00d39d009a9f | image-c8f3d5c5-5d4a-47de-bac0-bfd3f673c713 | in-use   |    1 | Attached to None on glance_store  |
+--------------------------------------+--------------------------------------------+----------+------+-----------------------------------+

This is because Cinder is grabbing the image to be able to prepare it and upload it to the storage backend as a special image to clone from. We can check that folder /var/lib/cinder/conversion/ contains a temporary file, which corresponds to the image.

root@menoscloud:~# ls -l /var/lib/cinder/conversion/
total 337728
-rw------- 1 cinder cinder 0 may 6 14:32 tmp3wcPzD
-rw------- 1 cinder cinder 345833472 may 6 14:31 tmpDsKUzxmenoscloud@dell
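As a rough sanity check on the sizes: Cinder volumes are sized in whole GB, so a file of this size fits in a 1 GB volume, which matches the size of the volume that stores the image in the listing above. The helper below is just shell arithmetic (the name gib_ceil is mine), rounding a number of bytes up to whole GiB:

```shell
# Smallest whole number of GiB that can hold a given number of bytes.
gib_ceil() {
    echo $(( ($1 + 1073741823) / 1073741824 ))
}

gib_ceil 345833472   # size of the temporary qcow2 file listed above; prints 1
```

Once the image is converted to raw format it takes more space than the qcow2 file, which is why the volumes created from it are bigger than 1 GB.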

Once it has obtained the image, Cinder converts it to raw format (using qemu-img) and writes it into the volume that it has just created.

root@menoscloud:~# openstack volume list --all-projects
+--------------------------------------+--------------------------------------------+-------------+------+-------------+
| ID                                   | Name                                       | Status      | Size | Attached to |
+--------------------------------------+--------------------------------------------+-------------+------+-------------+
| dd4071bf-48e9-484c-80cd-89f52a4fa442 |                                            | downloading |    3 |             |
| 41caf4ad-2bbd-4311-9003-00d39d009a9f | image-c8f3d5c5-5d4a-47de-bac0-bfd3f673c713 | available   |    1 |             |
+--------------------------------------+--------------------------------------------+-------------+------+-------------+
root@menoscloud:~# ps -ef | grep qemu
root 18571 17855 0 14:32 ? 00:00:00 sudo cinder-rootwrap /etc/cinder/rootwrap.conf qemu-img convert -O raw -t none -f qcow2 /var/lib/cinder/conversion/tmpDsKUzxmenoscloud@dell /dev/sda
root 18573 18571 2 14:32 ? 00:00:00 /usr/bin/python2.7 /usr/bin/cinder-rootwrap /etc/cinder/rootwrap.conf qemu-img convert -O raw -t none -f qcow2 /var/lib/cinder/conversion/tmpDsKUzxmenoscloud@dell /dev/sda
root 18575 18573 17 14:32 ? 00:00:02 /usr/bin/qemu-img convert -O raw -t none -f qcow2 /var/lib/cinder/conversion/tmpDsKUzxmenoscloud@dell /dev/sda

And once the conversion procedure has finished, we’ll see that a new volume appears that stores the image (in raw format), and it is ready to be cloned for the next volume-backed instances:

# openstack volume list --all-projects
+--------------------------------------+--------------------------------------------+-----------+------+-------------------------------+
| ID                                   | Name                                       | Status    | Size | Attached to                   |
+--------------------------------------+--------------------------------------------+-----------+------+-------------------------------+
| dd4071bf-48e9-484c-80cd-89f52a4fa442 |                                            | in-use    |    4 | Attached to eql1 on /dev/vda  |
| 61903a21-02ad-4dc4-8709-a039e7a65815 | image-c8f3d5c5-5d4a-47de-bac0-bfd3f673c713 | available |    3 |                               |
| 41caf4ad-2bbd-4311-9003-00d39d009a9f | image-c8f3d5c5-5d4a-47de-bac0-bfd3f673c713 | available |    1 |                               |
+--------------------------------------+--------------------------------------------+-----------+------+-------------------------------+

If we start a new volume-backed instance, we can check that it boots much faster than the first one, and it happens because in this case, Cinder skips the “qemu-img convert” phase and just clones the volume. It is possible to check the commands in file /var/log/cinder/cinder-volume.log:

(...)
2020-05-06 14:39:39.181 17855 INFO cinder.volume.drivers.dell_emc.ps [req-0a0a625f-eb5d-4826-82b2-f1bce93535cd 22a4facfd9794df1b8db1b4b074ae6db 50ab438534cd4c04b9ad341b803a1587 - - -] PS-driver: executing "volume select volume-61903a21-02ad-4dc4-8709-a039e7a65815 clone volume-f3226260-7790-427c-b3f0-da6ab1b2291b".
2020-05-06 14:39:40.333 17855 INFO cinder.volume.drivers.dell_emc.ps [req-0a0a625f-eb5d-4826-82b2-f1bce93535cd 22a4facfd9794df1b8db1b4b074ae6db 50ab438534cd4c04b9ad341b803a1587 - - -] PS-driver: executing "volume select volume-f3226260-7790-427c-b3f0-da6ab1b2291b size 4G no-snap".
(...)
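The log lines are long, and the interesting part is the command that the driver sends to the SAN. A minimal sketch to keep only that part, assuming the log format shown above (the function name ps_driver_commands is made up for this example):

```shell
# Print only the SAN CLI commands that the PS driver logged.
# Reads the files given as arguments, or stdin if there are none.
ps_driver_commands() {
    sed -n 's/.*PS-driver: executing "\(.*\)"\.$/\1/p' "$@"
}

# Demo with one of the log lines from this post
printf '%s\n' '2020-05-06 14:17:42.099 17855 INFO cinder.volume.drivers.dell_emc.ps [req-74aabdaa-5f88-4576-801e-cf923265d23e - - - - -] PS-driver: executing "cli-settings confirmation off".' |
    ps_driver_commands
```

On the controller you would run it as "ps_driver_commands /var/log/cinder/cinder-volume.log", and for a cached image you should see the "clone" command but no trace of qemu-img.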

Obviously, you can check that the volumes have been created in the backend, by using the SAN’s CLI:

group1> volume show
Name            Size       Snapshots Status  Permission Connections T
--------------- ---------- --------- ------- ---------- ----------- -
volume-41caf4ad 1GB        0         online  read-write 0           Y
  -2bbd-4311-90
  03-00d39d009a
  9f
volume-dd4071bf 4GB        0         online  read-write 1           Y
  -48e9-484c-80
  cd-89f52a4fa4
  42
volume-61903a21 3GB        0         online  read-write 0           Y
  -02ad-4dc4-87
  09-a039e7a658
  15
volume-f3226260 4GB        0         online  read-write 1           Y
  -7790-427c-b3
  f0-da6ab1b229
  1b

How to install Cinder in OpenStack Rocky and make it work with Glance

I have written several posts on installing OpenStack Rocky from scratch. They all have the tag #openstack. In the previous posts we…

  1. Installed OpenStack Rocky (part 1, part 2, and part 3).
  2. Installed the Horizon Dashboard and upgraded noVNC (install horizon).

So we have a working installation of the basic services (keystone, glance, neutron, compute, etc.). And now it is time to learn

How to install Cinder in OpenStack Rocky and make it work with Glance

Cinder is very straightforward to install using the basic mechanism: having a standard Linux server that will serve block devices as a SAN, by providing iSCSI endpoints. This server will use tgtadm and iscsiadm as the basic tools, and a backend for the block devices.

A different problem is integrating the cinder server with an external SAN device, such as a Dell Equallogic SAN. Cinder has plugins for some of them, and each plugin has its own problems.

In this post, we are following the standard cinder installation guide for Ubuntu (in this link), and what we’ll get is the standard SAN server with an LVM back-end for the block devices. Then we will integrate it with Glance (to be able to use Cinder as a storage for the OpenStack images) and we’ll learn a bit about how they work.

Installing Cinder

In the first place we are creating a database for Cinder:

mysql -u root -p <<< "CREATE DATABASE cinder;\
GRANT ALL PRIVILEGES ON cinder.* TO 'cinder'@'localhost' IDENTIFIED BY 'CINDER_DBPASS';\
GRANT ALL PRIVILEGES ON cinder.* TO 'cinder'@'%' IDENTIFIED BY 'CINDER_DBPASS';"

Now we create a user for Cinder and the services in OpenStack (we create both v2 and v3):

$ openstack user create --domain default --password "CINDER_PASS" cinder
$ openstack role add --project service --user cinder admin
$ openstack service create --name cinderv2 --description "OpenStack Block Storage" volumev2
$ openstack service create --name cinderv3 --description "OpenStack Block Storage" volumev3

Once we have the user and the service, we create the proper endpoints for both v2 and v3:

$ openstack endpoint create --region RegionOne volumev2 public http://controller:8776/v2/%\(project_id\)s
$ openstack endpoint create --region RegionOne volumev2 internal http://controller:8776/v2/%\(project_id\)s
$ openstack endpoint create --region RegionOne volumev2 admin http://controller:8776/v2/%\(project_id\)s
$ openstack endpoint create --region RegionOne volumev3 public http://controller:8776/v3/%\(project_id\)s
$ openstack endpoint create --region RegionOne volumev3 internal http://controller:8776/v3/%\(project_id\)s
$ openstack endpoint create --region RegionOne volumev3 admin http://controller:8776/v3/%\(project_id\)s
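The six commands differ only in the API version and the interface, so they can also be generated by a loop. This is just a sketch: it only prints the commands so that you can review them first, the function name cinder_endpoint_commands is made up, and the region, host name and port are the ones used in this post.

```shell
# Print the endpoint-creation commands for Cinder API v2 and v3.
# Nothing is executed here; pipe the output to "sh" once it looks right.
cinder_endpoint_commands() {
    for version in v2 v3; do
        for interface in public internal admin; do
            # Single quotes keep the shell from interpreting %(project_id)s
            printf "openstack endpoint create --region RegionOne volume%s %s 'http://controller:8776/%s/%%(project_id)s'\n" \
                "$version" "$interface" "$version"
        done
    done
}

cinder_endpoint_commands
```

Printing instead of executing is a deliberate choice here: a wrong endpoint is annoying to clean up, so it is worth looking at the six lines before running them.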

And now we are ready to install the cinder packages

$ apt install -y cinder-api cinder-scheduler

Once the packages are installed, we need to update the configuration file /etc/cinder/cinder.conf. The content will be something like the next:

[DEFAULT]
rootwrap_config = /etc/cinder/rootwrap.conf
api_paste_confg = /etc/cinder/api-paste.ini
iscsi_helper = tgtadm
volume_name_template = volume-%s
volume_group = cinder-volumes
verbose = True
auth_strategy = keystone
state_path = /var/lib/cinder
lock_path = /var/lock/cinder
volumes_dir = /var/lib/cinder/volumes
enabled_backends = lvm
transport_url = rabbit://openstack:RABBIT_PASS@controller
my_ip = 192.168.1.241
glance_api_servers = http://controller:9292
[database]
connection = mysql+pymysql://cinder:CINDER_DBPASS@controller/cinder
[keystone_authtoken]
www_authenticate_uri = http://controller:5000
auth_url = http://controller:5000
memcached_servers = controller:11211
auth_type = password
project_domain_id = default
user_domain_id = default
project_name = service
username = cinder
password = CINDER_PASS
[oslo_concurrency]
lock_path = /var/lib/cinder/tmp
[lvm]
volume_driver = cinder.volume.drivers.lvm.LVMVolumeDriver
volume_group = cinder-volumes
iscsi_protocol = iscsi
iscsi_helper = tgtadm
image_volume_cache_enabled = True

You must adapt this file to your configuration. In particular, the passwords of rabbit, the cinder database and the cinder service, and the IP address of the cinder server (which is stored in the my_ip variable). In my case, I am using the same server as in the previous posts.

With this configuration, cinder will use tgtadm to create the iSCSI endpoints, and the backend will be LVM.

Now we just have to add the following lines to file /etc/nova/nova.conf to enable cinder in OpenStack via nova-api:

[cinder]
os_region_name=RegionOne

Then, sync the cinder database by executing the next command:

$ su -s /bin/sh -c "cinder-manage db sync" cinder

And restart the related services:

$ service nova-api restart
$ service cinder-scheduler restart
$ service apache2 restart

Preparing the LVM backend

Now that we have configured cinder, we need a backend for the block devices. In our case, it is LVM. If you want to know a bit more about the concepts that we are using at this point and what we are doing, you can check my previous post in this link.

Now we are installing the LVM tools:

$ apt install lvm2 thin-provisioning-tools

LVM needs a partition or a whole disk to work. You can use any partition or disk (or even a file that can be used for testing purposes, as described in the section “testlab” in this link). In our case, we are using the whole disk /dev/vdb.

According to our settings, OpenStack expects to be able to use an existing LVM volume group with the name “cinder-volumes”. So we need to create it

$ pvcreate /dev/vdb
$ vgcreate cinder-volumes /dev/vdb

Once we have our volume group ready, we can install the cinder-volume service.

$ apt install cinder-volume

And that’s all about the installation of cinder. The last part will work because we included section [lvm] in /etc/cinder/cinder.conf and “enabled_backends = lvm”.

Verifying that cinder works

To verify that cinder works, we’ll just create one volume:

# openstack volume create --size 2 checkvol
+---------------------+--------------------------------------+
| Field               | Value                                |
+---------------------+--------------------------------------+
| attachments         | []                                   |
| availability_zone   | nova                                 |
| bootable            | false                                |
| consistencygroup_id | None                                 |
| created_at          | 2020-05-05T09:52:47.000000           |
| description         | None                                 |
| encrypted           | False                                |
| id                  | aadc24eb-ec1c-4b84-b2b2-8ea894b50417 |
| migration_status    | None                                 |
| multiattach         | False                                |
| name                | checkvol                             |
| properties          |                                      |
| replication_status  | None                                 |
| size                | 2                                    |
| snapshot_id         | None                                 |
| source_volid        | None                                 |
| status              | creating                             |
| type                | None                                 |
| updated_at          | None                                 |
| user_id             | 8c67fb57d70646d9b57beb83cc04a892     |
+---------------------+--------------------------------------+

After a while, we can check that the volume has been properly created and it is available.

# openstack volume list
+--------------------------------------+----------+-----------+------+-----------------------------+
| ID                                   | Name     | Status    | Size | Attached to                 |
+--------------------------------------+----------+-----------+------+-----------------------------+
| aadc24eb-ec1c-4b84-b2b2-8ea894b50417 | checkvol | available |    2 |                             |
+--------------------------------------+----------+-----------+------+-----------------------------+

If we are curious, we can check what happened in the backend:

# lvs -o name,lv_size,data_percent,thin_count
  LV                                          LSize  Data%  #Thins
  cinder-volumes-pool                         19,00g 0,00        1
  volume-aadc24eb-ec1c-4b84-b2b2-8ea894b50417  2,00g 0,00

We can see that we have a volume with the name volume-aadc24eb-ec1c-4b84-b2b2-8ea894b50417, with the ID that coincides with the ID of the volume that we have just created. Moreover, we can see that it has occupied 0% of space because it is thin-provisioned (i.e. it will only use the effective stored data like in qcow2 or vmdk virtual disk formats).

Integrating Cinder with Glance

The integration of Cinder with Glance can be done in two different ways:

  1. Using Cinder as a storage backend for the Images.
  2. Using Cinder as a cache for the Images of the VMs that are volume-based.

It may seem that it is the same, but it is not. To be able to identify what feature we want, we need to know how OpenStack works, and also acknowledge that Cinder and Glance are independent services.

Using Cinder as a backend for Glance

In OpenStack, when a VM is image-based (i.e. it does not create a volume), nova-compute will transfer the image to the host in which it has to be used. This happens regardless of whether the image comes from a filesystem backend (i.e. stored in /var/lib/glance/images/), is stored in swift (it is transferred using HTTP), or is stored in cinder (it is transferred using iSCSI). So using Cinder as a storage backend for the images removes the need for extra storage in the controller, but it is not useful for anything more.

[Figure: booting an image-based VM, which does not create a volume.]

If you start a volume-based VM, OpenStack will create a volume for your new VM (using cinder). In this case, cinder is very inefficient, because it connects to the existing volume, downloads it, converts it to raw format and dumps it to the new volume (i.e. using qemu-img convert -O raw -f qcow2 …). So the creation of the volume is extremely slow.

There is one way to boost this procedure by using efficient tools: if the image is stored in raw format and the owner is the same user that tries to use it (check image_upload_use_internal_tenant), and the allowed_direct_url_schemes option is properly set, the new volume will be created by cloning the volume that contains the image and resizing it using the backend tools (i.e. lvm cloning and resizing capabilities). That means that the new volume will be almost instantly created, and we’ll try to use this mechanism, if possible.

To enable cinder as a backend for Glance, you need to add the following lines to file /etc/glance/glance-api.conf

[glance_store]
stores = file,http,cinder
default_store = cinder
filesystem_store_datadir = /var/lib/glance/images/
cinder_store_auth_address = http://controller:5000/v3
cinder_store_user_name = cinder
cinder_store_password = CINDER_PASS
cinder_store_project_name = service

We are just adding “Cinder” as one of the storage mechanisms for Glance (apart from the others, like file or HTTP). In our example, we are setting Cinder as the default storage backend, because the horizon dashboard has no way to select where to store the images.

It is possible to set any other storage backend as the default storage backend, but then you’ll need to create the volumes by hand and execute Glance low-level commands such as “glance location-add <image-uuid> --url cinder://<volume-uuid>”. The mechanism can be seen in the official guide.

Variables cinder_store_user_name, cinder_store_password, and cinder_store_project_name are used to set the owner of the images that are uploaded to Cinder via Glance. They are used only if image_upload_use_internal_tenant is set to True in the Cinder configuration.

And now we need to add the next lines to section [DEFAULT] in /etc/cinder/cinder.conf:

allowed_direct_url_schemes = cinder
image_upload_use_internal_tenant = True

Finally, you need to restart the services:

# service cinder-volume restart
# service cinder-scheduler restart
# service glance-api restart

It may seem a bit messy, but Cinder and Glance are configured in this way. I feel that if you use the configuration that I propose in this post, you’ll get the integration as expected.

Verifying the integration

We are storing a new image, but we’ll store it as a volume this time. Moreover, we will store it in raw format, to be able to use the “direct_url” method to clone the volumes instead of downloading them:

# wget https://cloud-images.ubuntu.com/bionic/current/bionic-server-cloudimg-amd64.img
(...)
# qemu-img convert -O raw bionic-server-cloudimg-amd64.img bionic-server-cloudimg-amd64.raw
# openstack image create --public --container-format bare --disk-format raw --file ./bionic-server-cloudimg-amd64.raw "Ubuntu 18.04"+------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Field | Value |
+------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
(...)
| id | 7fd1c4b4-783e-41cb-800d-4e259c22d1ab |
| name | Ubuntu 18 |
(...)
+------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

And now we can check what happened under the hood:

# openstack volume list --all-projects
+--------------------------------------+--------------------------------------------+-----------+------+-------------+
| ID                                   | Name                                       | Status    | Size | Attached to |
+--------------------------------------+--------------------------------------------+-----------+------+-------------+
| 13721f57-c706-47c9-9114-f4b011f32ea2 | image-7fd1c4b4-783e-41cb-800d-4e259c22d1ab | available |    3 |             |
+--------------------------------------+--------------------------------------------+-----------+------+-------------+
# lvs
  LV                                          VG             Attr       LSize  Pool                Origin Data%  Meta%  Move Log Cpy%Sync Convert
  cinder-volumes-pool                         cinder-volumes twi-aotz-- 19,00g                            11,57  16,11
  volume-13721f57-c706-47c9-9114-f4b011f32ea2 cinder-volumes Vwi-a-tz--  3,00g cinder-volumes-pool        73,31

We can see that a new volume has been created with the name “image-7fd1c4b4…”, which corresponds to the just created image ID. The volume has an ID 13721f57…, and LVM has a new logical volume with the name volume-13721f57 that corresponds to that new volume.
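The lvs output also shows thin provisioning at work: the logical volume is 3 GB in size, but only the percentage in the Data% column is really allocated in the pool. As a small worked example (the function name thin_usage is mine, and I am assuming rows in the form "name size data%", as printed by something like "lvs --noheadings -o lv_name,lv_size,data_percent cinder-volumes"):

```shell
# Read "name size data%" rows and print the space actually allocated.
# Commas are turned into dots because lvs printed localized decimals above.
thin_usage() {
    awk '{
        gsub(/,/, ".", $2); gsub(/,/, ".", $3)
        size = $2 + 0          # "3.00g" -> 3 (the unit suffix is ignored)
        printf "%s %.2f GiB of %.2f GiB\n", $1, size * $3 / 100, size
    }'
}

# Values taken from the lvs output above
printf '%s\n' 'volume-13721f57-c706-47c9-9114-f4b011f32ea2 3,00g 73,31' | thin_usage
```

For the volume of the image this gives about 2.2 GiB of real data inside a 3 GiB volume.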

Now if we create a new VM that uses that image, we will notice that the creation of the VM is very quick (and this is because we used the “allowed_direct_url_schemes” method).

# openstack volume list --all-projects
+--------------------------------------+--------------------------------------------+-----------+------+-----------------------------+
| ID                                   | Name                                       | Status    | Size | Attached to                 |
+--------------------------------------+--------------------------------------------+-----------+------+-----------------------------+
| 219d9f92-ce17-4da6-96fa-86a04e460eb2 |                                            | in-use    |    4 | Attached to u1 on /dev/vda  |
| 13721f57-c706-47c9-9114-f4b011f32ea2 | image-7fd1c4b4-783e-41cb-800d-4e259c22d1ab | available |    3 |                             |
+--------------------------------------+--------------------------------------------+-----------+------+-----------------------------+
# lvs
  LV                                          VG             Attr       LSize  Pool                Origin                                      Data%  Meta%  Move Log Cpy%Sync Convert
  cinder-volumes-pool                         cinder-volumes twi-aotz-- 19,00g                                                                 11,57  16,11
  volume-13721f57-c706-47c9-9114-f4b011f32ea2 cinder-volumes Vwi-a-tz--  3,00g cinder-volumes-pool                                             73,31
  volume-219d9f92-ce17-4da6-96fa-86a04e460eb2 cinder-volumes Vwi-aotz--  4,00g cinder-volumes-pool volume-13721f57-c706-47c9-9114-f4b011f32ea2 54,98

Under the hood, we can see that the volume created for the instance (id 219d9f92…) has a reference in LVM to the original volume (id 13721f57…), which is the volume of the image.

Cinder as an image-volume storage cache

If you do not need Cinder as a storage backend (either because you are happy with the filesystem backend or you are storing the images in swift, etc.), it is also possible to use it to boost the boot process of VMs that create a volume as the main disk.

Cinder provides a mechanism to act as an image-volume storage cache. It means that when an image is used for a volume-based VM, it will be stored in a special volume regardless of whether it was stored in cinder or not. Then the volume that contains the image will be cloned and resized (using the backend tools; i.e. lvm cloning and resizing capabilities) for subsequent VMs that use that image.

During the first use of the image, it will be downloaded (either from the filesystem, cinder, swift, or wherever the image is stored), converted to raw format, and stored as a volume. The next uses of the image will work as using the “direct_url” method (i.e. cloning the volume).
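To keep the different cases apart, the behaviour described in this post can be summarised as a small decision function. Please take it as a simplified model of what I have explained, not as the actual logic of Cinder; the function name volume_source and its yes/no arguments are invented for this example.

```shell
# Simplified model of how the volume for a new volume-based VM gets its data.
# Arguments are "yes" or "no":
#   $1  the image is stored in Cinder in raw format and the direct URL is usable
#   $2  image_volume_cache_enabled is True for the backend
#   $3  the cache already holds a volume for this image
volume_source() {
    if [ "$1" = yes ]; then
        echo "clone the volume that stores the image"
    elif [ "$2" = yes ] && [ "$3" = yes ]; then
        echo "clone the cached image-volume"
    elif [ "$2" = yes ]; then
        echo "download, convert to raw, then add to the cache"
    else
        echo "download and convert to raw"
    fi
}

volume_source no yes no    # first boot from a qcow2 image with the cache enabled
volume_source no yes yes   # next boots from the same image
```

Only the branches that download the image involve qemu-img; the other two are handled by the cloning capabilities of the backend.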

To enable this mechanism, you need to get the id of project “service” and the id of user “cinder”:

# openstack project list
+----------------------------------+---------+
| ID                               | Name    |
+----------------------------------+---------+
| 50ab438534cd4c04b9ad341b803a1587 | service |
(...)
+----------------------------------+---------+
# openstack user list
+----------------------------------+-----------+
| ID                               | Name      |
+----------------------------------+-----------+
| 22a4facfd9794df1b8db1b4b074ae6db | cinder    |
(...)
+----------------------------------+-----------+
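Copying the ids by hand is error-prone, so here is a small sketch that picks the id out of these tables by name. The function name id_for_name is made up, and it assumes the two-column "ID | Name" layout shown above.

```shell
# Print the ID of the row whose Name column matches the first argument.
# Reads the table printed by "openstack project list" or "openstack user list".
id_for_name() {
    awk -F'|' -v want="$1" '
        NF >= 4 {
            id = $2; name = $3
            gsub(/^[ \t]+|[ \t]+$/, "", id)
            gsub(/^[ \t]+|[ \t]+$/, "", name)
            if (name == want) print id
        }
    '
}

id_for_name service <<'EOF'
+----------------------------------+---------+
| ID                               | Name    |
+----------------------------------+---------+
| 50ab438534cd4c04b9ad341b803a1587 | service |
+----------------------------------+---------+
EOF
```

In a real deployment you would pipe the command into it, e.g. "openstack project list | id_for_name service" and "openstack user list | id_for_name cinder".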

Then you need to add the following lines to the [DEFAULT] section in file /etc/cinder/cinder.conf (configuring your ids):

cinder_internal_tenant_project_id = 50ab438534cd4c04b9ad341b803a1587
cinder_internal_tenant_user_id = 22a4facfd9794df1b8db1b4b074ae6db

And add the following line to the section of the backend that will act as a cache (in our case [lvm])

[lvm]
...
image_volume_cache_enabled = True

Then you just need to restart the cinder services:

root@menoscloud:~# service cinder-volume restart
root@menoscloud:~# service cinder-scheduler restart

Testing the cache

In this case, I am creating a new image, which is in qcow2 format:

# wget https://cloud-images.ubuntu.com/bionic/current/bionic-server-cloudimg-amd64.img
(...)
# openstack image create --public --container-format bare --disk-format qcow2 --file ./bionic-server-cloudimg-amd64.img "Ubuntu 18.04 - qcow2"

Under the hood, OpenStack created a volume (id c36de566…) for the corresponding image (id 50b84eb0…), that can be seen in LVM:

# openstack volume list --all-projects
+--------------------------------------+--------------------------------------------+-----------+------+-------------+
| ID                                   | Name                                       | Status    | Size | Attached to |
+--------------------------------------+--------------------------------------------+-----------+------+-------------+
| c36de566-d538-4b43-b2b3-d000f9b4162f | image-50b84eb0-9de5-45ba-8004-f1f1c7a0c00c | available |    1 |             |
+--------------------------------------+--------------------------------------------+-----------+------+-------------+
# lvs
  LV                                          VG             Attr       LSize  Pool                Origin Data%  Meta%  Move Log Cpy%Sync Convert
  cinder-volumes-pool                         cinder-volumes twi-aotz-- 19,00g                            1,70   11,31
  volume-c36de566-d538-4b43-b2b3-d000f9b4162f cinder-volumes Vwi-a-tz--  1,00g cinder-volumes-pool        32,21

Now we create a VM (which is volume-based). And during the “block device mapping” phase, we can inspect what is happening under the hood:

# openstack volume list --all-projects
+--------------------------------------+--------------------------------------------+-------------+------+-------------+
| ID                                   | Name                                       | Status      | Size | Attached to |
+--------------------------------------+--------------------------------------------+-------------+------+-------------+
| 60c19e3c-3960-4fe7-9895-0426070b3e88 |                                            | downloading |    3 |             |
| c36de566-d538-4b43-b2b3-d000f9b4162f | image-50b84eb0-9de5-45ba-8004-f1f1c7a0c00c | available   |    1 |             |
+--------------------------------------+--------------------------------------------+-------------+------+-------------+
# lvs
  LV                                          VG             Attr       LSize  Pool                Origin Data%  Meta%  Move Log Cpy%Sync Convert
  cinder-volumes-pool                         cinder-volumes twi-aotz-- 19,00g                            3,83   12,46
  volume-60c19e3c-3960-4fe7-9895-0426070b3e88 cinder-volumes Vwi-aotz--  3,00g cinder-volumes-pool        13,54
  volume-c36de566-d538-4b43-b2b3-d000f9b4162f cinder-volumes Vwi-a-tz--  1,00g cinder-volumes-pool        32,21
# ps -ef | grep qemu
root      9681  9169  0 09:42 ?        00:00:00 sudo cinder-rootwrap /etc/cinder/rootwrap.conf qemu-img convert -O raw -t none -f qcow2 /var/lib/cinder/conversion/tmpWqlJD5menoscloud@lvm /dev/mapper/cinder--volumes-volume--60c19e3c--3960--4fe7--9895--0426070b3e88
root      9682  9681  0 09:42 ?        00:00:00 /usr/bin/python2.7 /usr/bin/cinder-rootwrap /etc/cinder/rootwrap.conf qemu-img convert -O raw -t none -f qcow2 /var/lib/cinder/conversion/tmpWqlJD5menoscloud@lvm /dev/mapper/cinder--volumes-volume--60c19e3c--3960--4fe7--9895--0426070b3e88
root      9684  9682 29 09:42 ?        00:00:13 /usr/bin/qemu-img convert -O raw -t none -f qcow2 /var/lib/cinder/conversion/tmpWqlJD5menoscloud@lvm /dev/mapper/cinder--volumes-volume--60c19e3c--3960--4fe7--9895--0426070b3e88

Cinder has created a new volume (id 60c19e3c…) whose size (3 GB) does not yet match the flavor I used (4 GB), and it is converting the image into that volume in raw format. The image was first retrieved from Cinder itself: volume c36de566… was mapped as an iSCSI device and its contents were dumped into the folder /var/lib/cinder/conversion. If the image had not been Cinder-backed, it would have been downloaded using the appropriate mechanism (e.g. HTTP or a file copy from /var/lib/glance/images).

After a while (depending on the conversion process), the VM will start and we can inspect the backend…

# openstack volume list --all-projects
+--------------------------------------+--------------------------------------------+-----------+------+-----------------------------+
| ID                                   | Name                                       | Status    | Size | Attached to                 |
+--------------------------------------+--------------------------------------------+-----------+------+-----------------------------+
| 91d51bc2-e33b-4b97-b91d-3a8655f88d0f | image-50b84eb0-9de5-45ba-8004-f1f1c7a0c00c | available |    3 |                             |
| 60c19e3c-3960-4fe7-9895-0426070b3e88 |                                            | in-use    |    4 | Attached to q1 on /dev/vda  |
| c36de566-d538-4b43-b2b3-d000f9b4162f | image-50b84eb0-9de5-45ba-8004-f1f1c7a0c00c | available |    1 |                             |
+--------------------------------------+--------------------------------------------+-----------+------+-----------------------------+
# lvs
  LV                                          VG             Attr       LSize  Pool                Origin                                      Data%  Meta%  Move Log Cpy%Sync Convert
  cinder-volumes-pool                         cinder-volumes twi-aotz-- 19,00g                                                                 13,69  17,60
  volume-60c19e3c-3960-4fe7-9895-0426070b3e88 cinder-volumes Vwi-aotz--  4,00g cinder-volumes-pool                                             56,48
  volume-91d51bc2-e33b-4b97-b91d-3a8655f88d0f cinder-volumes Vwi-a-tz--  3,00g cinder-volumes-pool volume-60c19e3c-3960-4fe7-9895-0426070b3e88 73,31
  volume-c36de566-d538-4b43-b2b3-d000f9b4162f cinder-volumes Vwi-a-tz--  1,00g cinder-volumes-pool                                             32,21

Now we can see a new volume (id 91d51bc2…) that has been associated with the image (id 50b84eb0…). The next time the image is used for a volume-backed instance, that volume will be cloned using the LVM mechanisms, so new instances will boot much faster.
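One way to spot the cache entry is to look for logical volumes that have an origin. The sketch below is only an illustration: list_clones is a hypothetical helper, and it assumes its input has the two-column shape that lvs prints when called with --noheadings -o lv_name,origin (the second column is empty for volumes that are not clones).

```shell
# Hypothetical helper: reads "LV ORIGIN" pairs on stdin (one LV per line,
# ORIGIN empty for volumes that are not clones) and prints only the clones.
list_clones() {
    awk 'NF == 2 { print $1 " <- " $2 }'
}

# Sample input taken from the lvs listing above; on a live system the input
# would come from: lvs --noheadings -o lv_name,origin cinder-volumes
list_clones <<'EOF'
  cinder-volumes-pool
  volume-60c19e3c-3960-4fe7-9895-0426070b3e88
  volume-91d51bc2-e33b-4b97-b91d-3a8655f88d0f volume-60c19e3c-3960-4fe7-9895-0426070b3e88
  volume-c36de566-d538-4b43-b2b3-d000f9b4162f
EOF
```

With the sample data, only the cache volume 91d51bc2… is listed, together with the volume it was cloned from.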

How to move from a linear disk to an LVM disk and join the two disks into an LVM-like RAID-0

I recently needed to add a disk to an existing Ubuntu installation to make the / filesystem bigger. In such a case, there are two options: move the whole system to a new, bigger disk (and e.g. dispose of the original disk), or convert the disk to an LVM volume and add a second disk so that the volume can grow. The first option was the subject of a previous post, but this time I learned…

How to move from a linear disk to an LVM disk and join the two disks into an LVM-like RAID-0

The starting point is simple:

  • I have one 14 GB disk (/dev/vda) with a single partition that is mounted in / (the disk has a GPT table and UEFI format, so it has extra partitions that we’ll keep as they are).
  • I have a brand new 80 GB disk (/dev/vdb).
  • I want to have one 94 GB volume built from the two disks.

root@somove:~# lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
vda 252:0 0 14G 0 disk
├─vda1 252:1 0 13.9G 0 part /
├─vda14 252:14 0 4M 0 part
└─vda15 252:15 0 106M 0 part /boot/efi
vdb 252:16 0 80G 0 disk /mnt
vdc 252:32 0 4G 0 disk [SWAP]

The steps are the following:

  1. Create a boot partition in /dev/vdb (so that GRUB can read /boot from a plain ext or VFAT partition rather than from LVM)
  2. Format the boot partition and put the content of the current /boot folder
  3. Create an LVM volume using the extra space in /dev/vdb and initialize it using an ext4 filesystem
  4. Put the contents of the current / folder into the new partition
  5. Update grub to boot from the new disk
  6. Update the mount point for our system
  7. Reboot (and check)
  8. Add the previous disk to the LVM volume.

Let’s start…

Separate the /boot partition

When installing a system on LVM, it is advisable to keep /boot on a separate partition in a common format (e.g. ext2 or ext4). GRUB Legacy cannot read LVM at all, and a plain /boot partition is the most conservative choice for GRUB 2 as well: GRUB reads the kernel and the initrd from that partition, and the LVM volumes are activated afterwards.

So we need to create the /boot partition. In our case, we are using the ext2 format, because it has no journal (we do not need one for the content of /boot) and it is faster. We are using 1 GB for the /boot partition, but 512 MB will probably be enough:

root@somove:~# fdisk /dev/vdb

Welcome to fdisk (util-linux 2.31.1).
Changes will remain in memory only, until you decide to write them.
Be careful before using the write command.


Command (m for help): n
Partition type
p primary (0 primary, 0 extended, 4 free)
e extended (container for logical partitions)
Select (default p):

Using default response p.
Partition number (1-4, default 1):
First sector (2048-167772159, default 2048):
Last sector, +sectors or +size{K,M,G,T,P} (2048-167772159, default 167772159): +1G

Created a new partition 1 of type 'Linux' and of size 1 GiB.

Command (m for help): w
The partition table has been altered.
Calling ioctl() to re-read partition table.
Syncing disks.

root@somove:~# mkfs.ext2 /dev/vdb1
mke2fs 1.44.1 (24-Mar-2018)
Creating filesystem with 262144 4k blocks and 65536 inodes
Filesystem UUID: 24618637-d2d4-45fe-bf83-d69d37f769d0
Superblock backups stored on blocks:
32768, 98304, 163840, 229376

Allocating group tables: done
Writing inode tables: done
Writing superblocks and filesystem accounting information: done

Now we’ll make a mount point for this partition, mount the partition and copy the contents of the current /boot folder to that partition:

root@somove:~# mkdir /mnt/boot
root@somove:~# mount /dev/vdb1 /mnt/boot/
root@somove:~# cp -ax /boot/* /mnt/boot/

Create an LVM volume in the extra space of /dev/vdb

First, we will create a new partition for our LVM system, using all the remaining free space:

root@somove:~# fdisk /dev/vdb

Welcome to fdisk (util-linux 2.31.1).
Changes will remain in memory only, until you decide to write them.
Be careful before using the write command.


Command (m for help): n
Partition type
p primary (1 primary, 0 extended, 3 free)
e extended (container for logical partitions)
Select (default p):

Using default response p.
Partition number (2-4, default 2):
First sector (2099200-167772159, default 2099200):
Last sector, +sectors or +size{K,M,G,T,P} (2099200-167772159, default 167772159):

Created a new partition 2 of type 'Linux' and of size 79 GiB.

Command (m for help): w
The partition table has been altered.
Syncing disks.

Now we will create a Physical Volume, a Volume Group and the Logical Volume for our root filesystem, using the new partition:

root@somove:~# pvcreate /dev/vdb2
Physical volume "/dev/vdb2" successfully created.
root@somove:~# vgcreate rootvg /dev/vdb2
Volume group "rootvg" successfully created
root@somove:~# lvcreate -l +100%free -n rootfs rootvg
Logical volume "rootfs" created.

If you want to learn about LVM to better understand what we are doing, you can read my previous post.

Now we are initializing the filesystem of the new /dev/rootvg/rootfs volume using ext4, and then we’ll copy the existing filesystem except for the special folders and the /boot folder (which we have separated in the other partition):

root@somove:~# mkfs.ext4 /dev/rootvg/rootfs
mke2fs 1.44.1 (24-Mar-2018)
Creating filesystem with 20708352 4k blocks and 5177344 inodes
Filesystem UUID: 47b4b698-4b63-4933-98d9-f8904ad36b2e
Superblock backups stored on blocks:
32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208,
4096000, 7962624, 11239424, 20480000

Allocating group tables: done
Writing inode tables: done
Creating journal (131072 blocks): done
Writing superblocks and filesystem accounting information: done

root@somove:~# mkdir /mnt/rootfs
root@somove:~# mount /dev/rootvg/rootfs /mnt/rootfs/
root@somove:~# rsync -aHAXx --delete --exclude={/dev/*,/proc/*,/sys/*,/tmp/*,/run/*,/mnt/*,/media/*,/boot/*,/lost+found} / /mnt/rootfs/
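The --exclude={...} argument in the rsync command above relies on bash brace expansion: before rsync runs, the shell turns it into one --exclude=PATTERN argument per entry. As a minimal sketch (rsync_excludes is a hypothetical helper, not an rsync feature), the same list can be built explicitly in any POSIX shell:

```shell
# Hypothetical helper: prints one --exclude=PATTERN argument per line,
# which is what bash brace expansion produces from --exclude={a,b,c}.
rsync_excludes() {
    for pattern in "$@"; do
        printf '%s%s\n' '--exclude=' "$pattern"
    done
}

# The patterns used above. A leading / anchors each pattern at the root of
# the transfer, and the trailing /* skips the contents of a folder while
# keeping the (empty) folder itself in the copy.
rsync_excludes '/dev/*' '/proc/*' '/sys/*' '/tmp/*' '/run/*' \
               '/mnt/*' '/media/*' '/boot/*' '/lost+found'
```

Keeping the empty folders matters here, because /dev, /sys and /proc are used as mount points in the next step.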

Update the system to boot from the new /boot partition and the LVM volume

At this point we have our /boot partition (/dev/vdb1) and the / filesystem (/dev/rootvg/rootfs). Now we need to prepare GRUB to boot using these new resources. And here comes the magic…

root@somove:~# mount --bind /dev /mnt/rootfs/dev/
root@somove:~# mount --bind /sys /mnt/rootfs/sys/
root@somove:~# mount -t proc /proc /mnt/rootfs/proc/
root@somove:~# chroot /mnt/rootfs/

We are binding the special mount points /dev and /sys to the same folders in the new filesystem which is mounted in /mnt/rootfs. We are also creating the /proc mount point which holds the information about the processes. You can find some more information about why this is needed in my previous post on chroot and containers.

Intuitively, we are somehow “in the new filesystem” and now we can update things as if we had already booted into it.

At this point, we need to update the mount point in /etc/fstab to mount the proper disks once the system boots. So we are getting the UUIDs for our partitions:

root@somove:/# blkid
/dev/vda1: LABEL="cloudimg-rootfs" UUID="135ecb53-0b91-4a6d-8068-899705b8e046" TYPE="ext4" PARTUUID="b27490c5-04b3-4475-a92b-53807f0e1431"
/dev/vda14: PARTUUID="14ad2c62-0a5e-4026-a37f-0e958da56fd1"
/dev/vda15: LABEL="UEFI" UUID="BF99-DB4C" TYPE="vfat" PARTUUID="9c37d9c9-69de-4613-9966-609073fba1d3"
/dev/vdb1: UUID="24618637-d2d4-45fe-bf83-d69d37f769d0" TYPE="ext2"
/dev/vdb2: UUID="Uzt1px-ANds-tXYj-Xwyp-gLYj-SDU3-pRz3ed" TYPE="LVM2_member"
/dev/mapper/rootvg-rootfs: UUID="47b4b698-4b63-4933-98d9-f8904ad36b2e" TYPE="ext4"
/dev/vdc: UUID="3377ec47-a0c9-4544-b01b-7267ea48577d" TYPE="swap"

And we are updating /etc/fstab to mount /dev/mapper/rootvg-rootfs as the / folder. We also need to mount partition /dev/vdb1 in /boot. Using our example, the /etc/fstab file will be this one:

UUID="47b4b698-4b63-4933-98d9-f8904ad36b2e" / ext4 defaults 0 0
UUID="24618637-d2d4-45fe-bf83-d69d37f769d0" /boot ext2 defaults 0 0
LABEL=UEFI /boot/efi vfat defaults 0 0
UUID="3377ec47-a0c9-4544-b01b-7267ea48577d" none swap sw,comment=cloudconfig 0 0

We are using the UUID to mount / and /boot folders because the devices may change their names or location and that may lead to breaking our system.
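As an illustration of how the entries above are derived, the sketch below extracts the filesystem UUID from a line of blkid output and formats an /etc/fstab entry in the same style. Both uuid_of and fstab_line are hypothetical helpers written for this post; the sample line is the one shown above for /dev/vdb1.

```shell
# Hypothetical helper: extracts the filesystem UUID from one line of
# blkid output read on stdin (PARTUUID is deliberately not matched).
uuid_of() {
    sed -n 's/.*[: ]UUID="\([^"]*\)".*/\1/p'
}

# fstab_line UUID MOUNTPOINT FSTYPE: prints an entry in the style used above.
fstab_line() {
    printf 'UUID="%s" %s %s defaults 0 0\n' "$1" "$2" "$3"
}

uuid=$(printf '%s\n' '/dev/vdb1: UUID="24618637-d2d4-45fe-bf83-d69d37f769d0" TYPE="ext2"' | uuid_of)
fstab_line "$uuid" /boot ext2
```

On a live system, blkid -s UUID -o value /dev/vdb1 prints just the UUID, so no parsing is needed.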

And now we are ready to mount our /boot partition, update grub, and to install it in the /dev/vda disk (because we are keeping both disks).

root@somove:/# mount /boot
root@somove:/# update-grub
Generating grub configuration file ...
WARNING: Failed to connect to lvmetad. Falling back to device scanning.
WARNING: Failed to connect to lvmetad. Falling back to device scanning.
Found linux image: /boot/vmlinuz-4.15.0-43-generic
Found initrd image: /boot/initrd.img-4.15.0-43-generic
WARNING: Failed to connect to lvmetad. Falling back to device scanning.
WARNING: Failed to connect to lvmetad. Falling back to device scanning.
WARNING: Failed to connect to lvmetad. Falling back to device scanning.
Found Ubuntu 18.04.1 LTS (18.04) on /dev/vda1
done
root@somove:/# grub-install /dev/vda
Installing for i386-pc platform.
Installation finished. No error reported.

Reboot and check

We are almost done. Now we exit the chroot and reboot:

root@somove:/# exit
root@somove:~# reboot

And the result should be the following:

root@somove:~# lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
vda 252:0 0 14G 0 disk
├─vda1 252:1 0 13.9G 0 part
├─vda14 252:14 0 4M 0 part
└─vda15 252:15 0 106M 0 part /boot/efi
vdb 252:16 0 80G 0 disk
├─vdb1 252:17 0 1G 0 part /boot
└─vdb2 252:18 0 79G 0 part
  └─rootvg-rootfs 253:0 0 79G 0 lvm /
vdc 252:32 0 4G 0 disk [SWAP]

root@somove:~# df -h
Filesystem Size Used Avail Use% Mounted on
udev 2.0G 0 2.0G 0% /dev
tmpfs 395M 676K 394M 1% /run
/dev/mapper/rootvg-rootfs 78G 993M 73G 2% /
tmpfs 2.0G 0 2.0G 0% /dev/shm
tmpfs 5.0M 0 5.0M 0% /run/lock
tmpfs 2.0G 0 2.0G 0% /sys/fs/cgroup
/dev/vdb1 1008M 43M 915M 5% /boot
/dev/vda15 105M 3.6M 101M 4% /boot/efi
tmpfs 395M 0 395M 0% /run/user/1000

We have our / system mounted from the new LVM Logical Volume /dev/rootvg/rootfs, the /boot partition from /dev/vdb1, and the /boot/efi from the existing partition (just in case that we need it).

Add the previous disk to the LVM volume

Now comes the easier part, which is to integrate the original /dev/vda1 partition into the LVM volume.

Once we have double-checked that every file from the original / folder in /dev/vda1 has been copied, we can initialize the partition for use in LVM:

WARNING: This step wipes the content of /dev/vda1.

root@somove:~# pvcreate /dev/vda1
WARNING: ext4 signature detected on /dev/vda1 at offset 1080. Wipe it? [y/n]: y
Wiping ext4 signature on /dev/vda1.
Physical volume "/dev/vda1" successfully created.

Finally, we can integrate the new partition in our volume group and extend the logical volume to use the free space:

root@somove:~# vgextend rootvg /dev/vda1
Volume group "rootvg" successfully extended
root@somove:~# lvextend -l +100%free /dev/rootvg/rootfs
Size of logical volume rootvg/rootfs changed from <79.00 GiB (20223 extents) to 92.88 GiB (23778 extents).
Logical volume rootvg/rootfs successfully resized.
root@somove:~# resize2fs /dev/rootvg/rootfs
resize2fs 1.44.1 (24-Mar-2018)
Filesystem at /dev/rootvg/rootfs is mounted on /; on-line resizing required
old_desc_blocks = 10, new_desc_blocks = 12
The filesystem on /dev/rootvg/rootfs is now 24348672 (4k) blocks long.
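The numbers in this output can be cross-checked with a little arithmetic. The sketch below (blocks_to_gib is a hypothetical helper) converts the block count reported by resize2fs into GiB. The result agrees with the size reported by lvextend: 23778 extents of 4 MiB, the default LVM extent size, are 95112 MiB, i.e. 92.88 GiB.

```shell
# Hypothetical helper: converts a count of filesystem blocks into GiB.
# Usage: blocks_to_gib BLOCKS [BLOCK_SIZE_BYTES]  (block size defaults to 4096)
blocks_to_gib() {
    # LC_ALL=C keeps the decimal point independent of the system locale.
    LC_ALL=C awk -v blocks="$1" -v bs="${2:-4096}" \
        'BEGIN { printf "%.2f\n", blocks * bs / (1024 * 1024 * 1024) }'
}

blocks_to_gib 24348672    # block count reported by resize2fs above; prints 92.88
```

The total is a bit below the nominal 94 GB because only the 13.9G partition of /dev/vda and the 79G partition of /dev/vdb belong to the volume group.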

And now we have the new / filesystem, built from /dev/vda1 and /dev/vdb2 (about 93 GB, slightly less than the nominal 94 GB because of the extra partitions on both disks):

root@somove:~# lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
vda 252:0 0 14G 0 disk
├─vda1 252:1 0 13.9G 0 part
│ └─rootvg-rootfs 253:0 0 92.9G 0 lvm /
├─vda14 252:14 0 4M 0 part
└─vda15 252:15 0 106M 0 part /boot/efi
vdb 252:16 0 80G 0 disk
├─vdb1 252:17 0 1G 0 part /boot
└─vdb2 252:18 0 79G 0 part
  └─rootvg-rootfs 253:0 0 92.9G 0 lvm /
vdc 252:32 0 4G 0 disk [SWAP]
root@somove:~# df -h
Filesystem Size Used Avail Use% Mounted on
udev 2.0G 0 2.0G 0% /dev
tmpfs 395M 676K 394M 1% /run
/dev/mapper/rootvg-rootfs 91G 997M 86G 2% /
tmpfs 2.0G 0 2.0G 0% /dev/shm
tmpfs 5.0M 0 5.0M 0% /run/lock
tmpfs 2.0G 0 2.0G 0% /sys/fs/cgroup
/dev/vdb1 1008M 43M 915M 5% /boot
/dev/vda15 105M 3.6M 101M 4% /boot/efi
tmpfs 395M 0 395M 0% /run/user/1000

(optional) Having the /boot partition in /dev/vda

If we wanted to have the /boot partition in /dev/vda, the procedure would be a bit different:

  1. Instead of creating the LVM volume in /dev/vdb, create a single ext4 partition /dev/vdb1, with no separation between /boot and /.
  2. Once /dev/vdb1 is created, copy the filesystem in /dev/vda1 to /dev/vdb1 and prepare to boot from /dev/vdb1 (chroot, adjust mount points, update-grub, grub-install…).
  3. Boot from the new partition and wipe the original /dev/vda1 partition.
  4. Create a partition /dev/vda1 for the new /boot, initialize it using ext2, and copy the contents of /boot according to the instructions in this post.
  5. Create a partition /dev/vda2, create the LVM volume, initialize it, and copy the contents of /dev/vdb1 except for /boot.
  6. Prepare to boot from /dev/vda (chroot, adjust mount points, mount /boot, update-grub, grub-install…).
  7. Boot from the new /boot + LVM / layout and decide whether you want to add /dev/vdb to the LVM volume or not.

Using this procedure, you will get from linear to LVM with a single disk, and then you can decide whether to let the LVM volume grow or not. You may also decide to create an LVM RAID (1, 5, …) with the new disk or with other disks.

How to move an existing installation of Ubuntu to another disk

Under some circumstances, we may need to move a working installation of Ubuntu to another disk. The most common case is when your current disk runs out of space and you want to move to a bigger one. But you could also want to move to an SSD or to create an LVM RAID…

So this time I learned…

How to move an existing installation of Ubuntu to another disk

I have a 14Gb disk that contains my / partition (vda), and I want to move to a new 80Gb disk (vdb).

root@somove:~# lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
vda 252:0 0 14G 0 disk
└─vda1 252:1 0 13.9G 0 part /
vdb 252:16 0 80G 0 disk
vdc 252:32 0 4G 0 disk [SWAP]

First of all, I will create a partition for my / system in /dev/vdb.

root@somove:~# fdisk /dev/vdb

Welcome to fdisk (util-linux 2.31.1).
Changes will remain in memory only, until you decide to write them.
Be careful before using the write command.

Command (m for help): n
Partition type
p primary (0 primary, 0 extended, 4 free)
e extended (container for logical partitions)
Select (default p): p
Partition number (1-4, default 1):
First sector (2048-167772159, default 2048):
Last sector, +sectors or +size{K,M,G,T,P} (2048-167772159, default 167772159):

Created a new partition 1 of type 'Linux' and of size 80 GiB.

Command (m for help): w
The partition table has been altered.
Calling ioctl() to re-read partition table.
Syncing disks.

NOTE: The inputs from the user are: n for the new partition and the defaults (i.e. return) for any setting to get the whole disk. Then w to write the partition table.

Now that I have the new partition, we’ll create the filesystem (ext4):

root@somove:~# mkfs.ext4 /dev/vdb1
mke2fs 1.44.1 (24-Mar-2018)
Creating filesystem with 20971264 4k blocks and 5242880 inodes
Filesystem UUID: ea7ee2f5-749e-4e74-bcc3-2785297291a4
Superblock backups stored on blocks:
32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208,
4096000, 7962624, 11239424, 20480000

Allocating group tables: done
Writing inode tables: done
Creating journal (131072 blocks): done
Writing superblocks and filesystem accounting information: done

We have to transfer the content of the running filesystem to the new disk. But first, we’ll make sure that every mount point other than / is unmounted (to avoid copying files that live on other disks):

root@somove:~# umount -a
umount: /run/user/1000: target is busy.
umount: /sys/fs/cgroup/unified: target is busy.
umount: /sys/fs/cgroup: target is busy.
umount: /: target is busy.
umount: /run: target is busy.
umount: /dev: target is busy.
root@somove:~# mount
sysfs on /sys type sysfs (rw,nosuid,nodev,noexec,relatime)
proc on /proc type proc (rw,nosuid,nodev,noexec,relatime)
udev on /dev type devtmpfs (rw,nosuid,relatime,size=2006900k,nr_inodes=501725,mode=755)
devpts on /dev/pts type devpts (rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=000)
tmpfs on /run type tmpfs (rw,nosuid,noexec,relatime,size=403912k,mode=755)
/dev/vda1 on / type ext4 (rw,relatime,data=ordered)
tmpfs on /sys/fs/cgroup type tmpfs (ro,nosuid,nodev,noexec,mode=755)
cgroup on /sys/fs/cgroup/unified type cgroup2 (rw,nosuid,nodev,noexec,relatime)
tmpfs on /run/user/1000 type tmpfs (rw,nosuid,nodev,relatime,size=403908k,mode=700,uid=1000,gid=1000)
root@somove:~# lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
vda 252:0 0 14G 0 disk
└─vda1 252:1 0 13.9G 0 part /
vdb 252:16 0 80G 0 disk
└─vdb1 252:17 0 80G 0 part
vdc 252:32 0 4G 0 disk [SWAP]

Now we will create a mount point for the new filesystem and we’ll copy everything from / to it, except for the special folders (i.e. /tmp, /sys, /dev, etc.). Once completed, we’ll create the Linux special folders:

root@somove:~# mkdir /mnt/vdb1
root@somove:~# mount /dev/vdb1 /mnt/vdb1/
root@somove:~# rsync -aHAXx --delete --exclude={/dev/*,/proc/*,/sys/*,/tmp/*,/run/*,/mnt/*,/media/*,/lost+found} / /mnt/vdb1/

Instead of using rsync, we could use cp -ax /bin /etc /home /lib /lib64 …, but you need to make sure that all folders and files are copied. You also need to make sure that the special folders are created, by running mkdir /mnt/vdb1/{boot,mnt,proc,run,tmp,dev,sys}. The rsync version is easier to control and to understand.

Now that we have the same directory tree, we just need to make the magic of chroot to prepare the new disk:

root@somove:~# mount --bind /dev /mnt/vdb1/dev
root@somove:~# mount --bind /sys /mnt/vdb1/sys
root@somove:~# mount -t proc /proc /mnt/vdb1/proc
root@somove:~# chroot /mnt/vdb1/

We need to make sure that the new system mounts the new partition (i.e. /dev/vdb1) in /, but we cannot refer to it as /dev/vdb1: if we remove the other disk, its device name will change to /dev/vda1. So we are using the UUID of the partition. To get it, we can use blkid:

root@somove:/# blkid
/dev/vda1: UUID="135ecb53-0b91-4a6d-8068-899705b8e046" TYPE="ext4"
/dev/vdb1: UUID="eb8d215e-d186-46b8-bd37-4b244cbb8768" TYPE="ext4"

And now we have to update the file /etc/fstab to mount the proper UUID in the / folder. The new /etc/fstab file for our example is the following:

UUID="eb8d215e-d186-46b8-bd37-4b244cbb8768" / ext4 defaults 0 0

At this point, we need to update grub to match our disks (it will get the UUID or labels), and install it in the new disk:

root@somove:/# update-grub
Generating grub configuration file ...
Found linux image: /boot/vmlinuz-4.15.0-43-generic
Found initrd image: /boot/initrd.img-4.15.0-43-generic
WARNING: Failed to connect to lvmetad. Falling back to device scanning.
Found Ubuntu 18.04.1 LTS (18.04) on /dev/vda1
done
root@somove:/# grub-install /dev/vdb
Installing for i386-pc platform.
Installation finished. No error reported.

WARNING: In case we get error “error: will not proceed with blocklists.”, please go to the end part of this post.

WARNING: If you plan to keep the original disk in its place (e.g. a Virtual Machine in Amazon or OpenStack), you must install grub in /dev/vda. Otherwise, it will boot the previous system.

Finally, you can exit from chroot, power off, remove the old disk, and boot using the new one. The result will be the following:

root@somove:~# lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
vda 252:16 0 80G 0 disk
└─vda1 252:17 0 80G 0 part /
vdc 252:32 0 4G 0 disk [SWAP]

What if we get error “error: will not proceed with blocklists.”

If we get this error (and only if we get this error), we’ll need to wipe the gap between the start of the disk and the first partition, and then we’ll be able to install grub in the disk.

WARNING: make sure that you know what you are doing, or that the disk is new, because this can potentially erase the data of /dev/vdb.

$ grub-install /dev/vdb
Installing for i386-pc platform.
grub-install: warning: Attempting to install GRUB to a disk with multiple partition labels. This is not supported yet..
grub-install: warning: Embedding is not possible. GRUB can only be installed in this setup by using blocklists. However, blocklists are UNRELIABLE and their use is discouraged..
grub-install: error: will not proceed with blocklists.

In this case, we need to check the partition table of /dev/vdb

root@somove:/# fdisk -l /dev/vdb
Disk /dev/vdb: 80 GiB, 85899345920 bytes, 167772160 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: dos
Disk identifier: 0x00000000

Device Boot Start End Sectors Size Id Type
/dev/vdb1 2048 167772159 167770112 80G 83 Linux

And now we will write zeros to /dev/vdb, skipping the first sector (where the partition table is stored), up to the sector just before the one where our partition starts (in our case, partition /dev/vdb1 starts at sector 2048, so we will zero 2047 sectors):

root@somove:/# dd if=/dev/zero of=/dev/vdb seek=1 count=2047
2047+0 records in
2047+0 records out
1048064 bytes (1.0 MB, 1.0 MiB) copied, 0.0245413 s, 42.7 MB/s
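The count for dd follows directly from the partition table. As a minimal sketch (gap_dd_args is a hypothetical helper, and 512-byte sectors are assumed, as reported by fdisk above), the arguments can be derived from the first sector of the partition:

```shell
# Hypothetical helper: given the first sector of the first partition (as
# shown by fdisk -l), prints the dd arguments that zero the gap after
# sector 0 (partition table) and before the partition.
# Assumes 512-byte sectors.
gap_dd_args() {
    start=$1
    if [ "$start" -le 1 ]; then
        echo "no gap to wipe" >&2
        return 1
    fi
    echo "bs=512 seek=1 count=$((start - 1))"
}

gap_dd_args 2048    # prints: bs=512 seek=1 count=2047
```

For a start sector of 2048 this gives 2047 sectors, i.e. 2047 × 512 = 1048064 bytes, which matches the amount that dd reports above.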

If this was the problem, now you should be able to install grub:

root@somove:/# grub-install /dev/vdb
Installing for i386-pc platform.
Installation finished. No error reported.

How to install Horizon Dashboard in OpenStack Rocky and upgrade noVNC

Some time ago I wrote a series of posts on installing OpenStack Rocky: Part 1, Part 2 and Part 3. That installation was usable by the command line, but now I learned…

How to install Horizon Dashboard in OpenStack Rocky and upgrade noVNC

In this post, I start from the working installation of OpenStack Rocky in Ubuntu created in the previous posts in this series.

If you have used the configuration settings that I suggested, installing the Horizon Dashboard is very simple (I’m following the official documentation). You just need to install the dashboard packages and dependencies by issuing the following command:

$ apt install openstack-dashboard

And now you need to configure the dashboard settings in the file /etc/openstack-dashboard/local_settings.py. The basic configuration can be made with the following lines:

$ sed -i 's/OPENSTACK_HOST = "[^"]*"/OPENSTACK_HOST = "controller"/g' /etc/openstack-dashboard/local_settings.py
$ sed -i 's/^\(CACHES = {\)/SESSION_ENGINE = "django.contrib.sessions.backends.cache"\n\1/' /etc/openstack-dashboard/local_settings.py
$ sed -i "s/'LOCATION': '127\.0\.0\.1:11211',/'LOCATION': 'controller:11211',/" /etc/openstack-dashboard/local_settings.py
$ sed -i 's/^\(#OPENSTACK_API_VERSIONS = {\)/OPENSTACK_API_VERSIONS = {\n"identity": 3,\n"image": 2,\n"volume": 2,\n}\n\1/' /etc/openstack-dashboard/local_settings.py
$ sed -i 's/^OPENSTACK_KEYSTONE_DEFAULT_ROLE = "[^"]*"/OPENSTACK_KEYSTONE_DEFAULT_ROLE = "user"/' /etc/openstack-dashboard/local_settings.py
$ sed -i "/^#OPENSTACK_KEYSTONE_DEFAULT_DOMAIN =.*$/aOPENSTACK_KEYSTONE_DEFAULT_DOMAIN='Default'" /etc/openstack-dashboard/local_settings.py
$ sed -i 's/^TIME_ZONE = "UTC"/TIME_ZONE = "Europe\/Madrid"/' /etc/openstack-dashboard/local_settings.py

In order, these lines:

  1. Set the host of the controller (we set its address in the /etc/hosts file).
  2. Set the session engine, so that sessions are stored in the cache.
  3. Set the memcached server (we installed it in the controller).
  4. Set the versions of the APIs that we installed.
  5. Set the default role for the users that log into the portal to “user” instead of the default one (which is “member”).
  6. Set the default domain to “Default”, so that users are not asked for the domain.
  7. Set the timezone of the site (you can check the code for your timezone here).
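Since sed -i edits the file in place, it can be worth previewing a substitution first. The following is a minimal sketch of that idea: set_openstack_host is a hypothetical helper that applies the first substitution above to text read on stdin and writes the result to stdout, so the real file is left untouched.

```shell
# Hypothetical helper: applies the OPENSTACK_HOST substitution from above to
# text read on stdin and writes the result to stdout.
# Assumes the host name contains no "/" or "&" (they are special to sed).
set_openstack_host() {
    sed 's/OPENSTACK_HOST = "[^"]*"/OPENSTACK_HOST = "'"$1"'"/g'
}

printf '%s\n' 'OPENSTACK_HOST = "127.0.0.1"' | set_openstack_host controller
```

To preview the change on the real settings file, feed it on stdin and compare the output with the original using diff.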

In our installation, we used the self-service option for the networks. If you changed it, please make sure that the variable OPENSTACK_NEUTRON_NETWORK matches your platform.

And that’s all on the basic configuration of Horizon. Now we just need to check that the file /etc/apache2/conf-available/openstack-dashboard.conf contains the following line:

WSGIApplicationGroup %{GLOBAL}

And finally, you need to restart apache:

$ service apache2 restart

Now you should be able to log in to the horizon portal by using the address https://controller.my.server/horizon in your web browser.

Please take into account that controller.my.server corresponds to the routable IP address of your server (if you followed the previous posts, it is 158.42.1.1).

Configuring VNC

One of the most common complaints about the Horizon dashboard is that VNC does not work. The symptom is that you can reach the “console” tab of an instance, but you cannot see the console. And if you try to open the console in a new tab, you will probably get a Not Found page.

The most common cause is that the noVNC settings are not configured in the /etc/nova/nova.conf file on the compute nodes. So please check the [vnc] section in that file. It should look like this:

[vnc]
enabled = true
server_listen = 0.0.0.0
server_proxyclient_address = $my_ip
novncproxy_base_url = http://controller.my.server:6080/vnc_auto.html

There are two key points in this configuration:

  • Make sure that controller.my.server corresponds to the routable IP address of your server (the same that you used in the web browser).
  • Make sure that file vnc_auto.html exists in folder /usr/share/novnc in the host where horizon is installed.
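The second point can be checked from the shell. The sketch below is only an illustration: novnc_page and novnc_page_exists are hypothetical helpers that take the value of novncproxy_base_url, extract the page name at the end of it, and test whether that page exists in the noVNC folder.

```shell
# Hypothetical helper: prints the page name at the end of a
# novncproxy_base_url value, e.g. vnc_auto.html.
novnc_page() {
    url=${1%%\?*}          # drop any query string
    echo "${url##*/}"      # keep what follows the last slash
}

# novnc_page_exists URL [DIR]: succeeds if that page exists in DIR
# (DIR defaults to /usr/share/novnc, the folder named above).
novnc_page_exists() {
    [ -f "${2:-/usr/share/novnc}/$(novnc_page "$1")" ]
}

novnc_page 'http://controller.my.server:6080/vnc_auto.html'
```

On the host where horizon is installed, novnc_page_exists followed by the configured URL returns success only if the page named in the URL is actually present.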

Upgrading noVNC

OpenStack Rocky comes with a very old version of noVNC (0.4), while at the moment of writing this post, noVNC has already released version 1.1.0 (see here).

Updating noVNC is as easy as getting the release file and putting its contents in /usr/share/novnc on the front-end:

$ cd /tmp
$ wget https://github.com/novnc/noVNC/archive/v1.1.0.tar.gz -O novnc-v1.1.0.tar.gz
$ tar xfz novnc-v1.1.0.tar.gz
$ mv /tmp/noVNC-1.1.0 /usr/share
$ cd /usr/share
$ mv novnc novnc-0.4
$ ln -s noVNC-1.1.0 novnc

Now we need to configure the new settings in the compute nodes. So we need to update file /etc/nova/nova.conf in each compute node. We need to modify the [vnc] section to match the new version of noVNC. The section will look like the next one:

[vnc]
enabled = true
server_listen = 0.0.0.0
server_proxyclient_address = $my_ip
novncproxy_base_url = http://controller.my.server:6080/vnc_lite.html

Finally, you will need to restart nova-compute on each compute node:

$ service nova-compute restart

and to restart apache2 in the server in which horizon is installed:

$ service apache2 restart

*WARNING* The changes are not guaranteed to apply to running instances, but they will apply to new ones.

 

How to install OpenStack Rocky – part 2

This is the second post on the installation of OpenStack Rocky in an Ubuntu based deployment.

In this post I am explaining how to install the essential OpenStack services in the controller. Please check the previous post, How to install OpenStack Rocky – part 1, to learn how to prepare the servers to host our OpenStack Rocky platform.

Recap

In the last post, we prepared the network for both the node controller and the compute elements. The description is in the next figure:

(figure: horsemen)

We also installed the prerequisites for the controller.

Installation of the controller

In this section we are installing keystone, glance, nova, neutron and the dashboard, in the controller.

Repositories

First, we need to install the OpenStack repositories:

# apt install software-properties-common
# add-apt-repository cloud-archive:rocky
# apt update && apt -y dist-upgrade
# apt install -y python-openstackclient

Keystone

To install keystone, first we need to create the database:

# mysql -u root -p <<< "CREATE DATABASE keystone;
GRANT ALL PRIVILEGES ON keystone.* TO 'keystone'@'localhost' IDENTIFIED BY 'KEYSTONE_DBPASS';
GRANT ALL PRIVILEGES ON keystone.* TO 'keystone'@'%' IDENTIFIED BY 'KEYSTONE_DBPASS';"

And now, we’ll install keystone

# apt install -y keystone apache2 libapache2-mod-wsgi

We are creating the minimal keystone.conf configuration, according to the basic deployment:

# cat > /etc/keystone/keystone.conf  <<EOT
[DEFAULT]
log_dir = /var/log/keystone
[database]
connection = mysql+pymysql://keystone:KEYSTONE_DBPASS@controller/keystone
[extra_headers]
Distribution = Ubuntu
[token]
provider = fernet
EOT
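
As the file is written from a here-document, anything that is pasted by mistake between the two EOT markers ends up in keystone.conf. A rough sanity check is to flag any line that is not blank, a comment, a [section] header or a key = value pair. This is only a sketch and not a parser of the oslo.config format (e.g. it does not know about continuation lines), and the name check_ini is made up for this example:

```shell
# check_ini FILE: succeed only if every line of FILE is blank, a comment,
# a [section] header or a "key = value" pair. Offending lines are printed.
check_ini() {
    awk '
        /^[ \t]*$/                { next }   # blank line
        /^[ \t]*[#;]/             { next }   # comment
        /^\[[^]]+\][ \t]*$/       { next }   # [section] header
        /^[A-Za-z0-9_.-]+[ \t]*=/ { next }   # key = value
        { printf "%s:%d: unexpected line: %s\n", FILENAME, FNR, $0; bad = 1 }
        END { exit bad + 0 }
    ' "$1"
}

# Usage (as root):
#   check_ini /etc/keystone/keystone.conf && echo "keystone.conf looks fine"
```

The same check can be applied to the other configuration files that are created in this post.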

Now we need to execute some commands to prepare the keystone service

# su keystone -s /bin/sh -c 'keystone-manage db_sync'
# keystone-manage fernet_setup --keystone-user keystone --keystone-group keystone
# keystone-manage credential_setup --keystone-user keystone --keystone-group keystone
# keystone-manage bootstrap --bootstrap-password "ADMIN_PASS" --bootstrap-admin-url http://controller:5000/v3/ --bootstrap-internal-url http://controller:5000/v3/ --bootstrap-public-url http://controller:5000/v3/ --bootstrap-region-id RegionOne

At this moment, we have to configure apache2, because it is used as the http backend.

# echo "ServerName controller" >> /etc/apache2/apache2.conf
# service apache2 restart

Finally we’ll prepare a file that contains a set of variables that will be used to access OpenStack. This file will be called admin-openrc and its content is as follows:

# cat > admin-openrc <<EOT
export OS_PROJECT_DOMAIN_NAME=Default
export OS_USER_DOMAIN_NAME=Default
export OS_PROJECT_NAME=admin
export OS_USERNAME=admin
export OS_PASSWORD=ADMIN_PASS
export OS_AUTH_URL=http://controller:5000/v3
export OS_IDENTITY_API_VERSION=3
export OS_IMAGE_API_VERSION=2
EOT

And now we are almost ready to operate keystone. Now we need to source that file:

# source admin-openrc
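
Many of the commands in the rest of the post fail in confusing ways if the file has not been sourced in the current shell. As a sketch, a small guard can report which of the variables defined in admin-openrc are missing (the name require_openrc is made up for this example):

```shell
# require_openrc: print the OS_* variables from admin-openrc that are unset
# or empty, and return 1 if any of them is missing.
require_openrc() {
    missing=0
    for var in OS_PROJECT_DOMAIN_NAME OS_USER_DOMAIN_NAME OS_PROJECT_NAME \
               OS_USERNAME OS_PASSWORD OS_AUTH_URL OS_IDENTITY_API_VERSION; do
        eval "value=\${$var:-}"
        if [ -z "$value" ]; then
            echo "missing: $var" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Usage:
#   require_openrc || . ./admin-openrc
```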

And now we are ready to issue commands in OpenStack. We are testing it by creating a project that will host the OpenStack services:

# openstack project create --domain default --description "Service Project" service

Demo Project

In the OpenStack installation guide, a demo project is created. We are including the creation of this demo project, although it is not needed:

# openstack project create --domain default --description "Demo Project" myproject
# openstack user create --domain default --password "MYUSER_PASS" myuser
# openstack role create myrole
# openstack role add --project myproject --user myuser myrole

We are also creating the set of variables needed in the system to execute commands in OpenStack, using this demo user

# cat > demo-openrc << EOT
export OS_PROJECT_DOMAIN_NAME=Default
export OS_USER_DOMAIN_NAME=Default
export OS_PROJECT_NAME=myproject
export OS_USERNAME=myuser
export OS_PASSWORD=MYUSER_PASS
export OS_AUTH_URL=http://controller:5000/v3
export OS_IDENTITY_API_VERSION=3
export OS_IMAGE_API_VERSION=2
EOT

In case you want to use this demo user and project, you can either log in to the horizon portal (once it is installed in further steps) using the myuser/MYUSER_PASS credentials, or source the file demo-openrc to use the command line.

Glance

Glance is the OpenStack service dedicated to managing the VM images. Using these steps, we will make a basic installation in which the images are stored in the filesystem of the controller.

First we need to create a database and user in mysql:

# mysql -u root -p <<< "CREATE DATABASE glance;
GRANT ALL PRIVILEGES ON glance.* TO 'glance'@'localhost' IDENTIFIED BY 'GLANCE_DBPASS';
GRANT ALL PRIVILEGES ON glance.* TO 'glance'@'%' IDENTIFIED BY 'GLANCE_DBPASS';"

Now we need to create the user dedicated to run the service and the endpoints in keystone, but first we’ll make sure that we have the proper env variables by sourcing the admin credentials:

# source admin-openrc
# openstack user create --domain default --password "GLANCE_PASS" glance
# openstack role add --project service --user glance admin
# openstack service create --name glance --description "OpenStack Image" image
# openstack endpoint create --region RegionOne image public http://controller:9292
# openstack endpoint create --region RegionOne image internal http://controller:9292
# openstack endpoint create --region RegionOne image admin http://controller:9292

Now we are ready to install the components:

# apt install -y glance

At the time of writing this post, there is an error in the glance package in the OpenStack repositories that breaks (e.g.) the integration with cinder. The problem is that the file /etc/glance/rootwrap.conf and the folder /etc/glance/rootwrap.d are placed inside the folder /etc/glance/glance. So the patch simply consists of executing

$ mv /etc/glance/glance/rootwrap.* /etc/glance/

 

And now we are creating the basic configuration files, needed to run glance as in the basic installation:

# cat > /etc/glance/glance-api.conf  << EOT
[database]
connection = mysql+pymysql://glance:GLANCE_DBPASS@controller/glance
backend = sqlalchemy
[image_format]
disk_formats = ami,ari,aki,vhd,vhdx,vmdk,raw,qcow2,vdi,iso,ploop,root-tar
[keystone_authtoken]
www_authenticate_uri = http://controller:5000
auth_url = http://controller:5000
memcached_servers = controller:11211
auth_type = password
project_domain_name = Default
user_domain_name = Default
project_name = service
username = glance
password = GLANCE_PASS
[paste_deploy]
flavor = keystone
[glance_store]
stores = file,http
default_store = file
filesystem_store_datadir = /var/lib/glance/images/
EOT

And the configuration file for the registry:

# cat > /etc/glance/glance-registry.conf  << EOT
[database]
connection = mysql+pymysql://glance:GLANCE_DBPASS@controller/glance
backend = sqlalchemy
[keystone_authtoken]
www_authenticate_uri = http://controller:5000
auth_url = http://controller:5000
memcached_servers = controller:11211
auth_type = password
project_domain_name = Default
user_domain_name = Default
project_name = service
username = glance
password = GLANCE_PASS
[paste_deploy]
flavor = keystone
EOT

The backend to store the files is the folder /var/lib/glance/images/ in the controller node. If you want to change this folder, please update the variable filesystem_store_datadir in the file glance-api.conf

These files are the ones that result from following the official documentation, and now we are ready to start glance. First we’ll prepare the database

# su -s /bin/sh -c "glance-manage db_sync" glance

And finally we will restart the services

# service glance-registry restart
# service glance-api restart

At this point, we are creating our first image (the common cirros image):

# wget -q http://download.cirros-cloud.net/0.4.0/cirros-0.4.0-x86_64-disk.img -O /tmp/cirros-0.4.0-x86_64-disk.img
# openstack image create "cirros" --file /tmp/cirros-0.4.0-x86_64-disk.img --disk-format qcow2 --container-format bare --public

Nova (i.e. compute)

Nova is the set of services dedicated to the compute service. As we are installing the controller, this server will not run any VM. Instead it will coordinate the creation of the VMs in the working nodes.

First we need to create the databases and users in mysql:

# mysql -u root -p <<< "CREATE DATABASE nova_api;
CREATE DATABASE nova;
CREATE DATABASE nova_cell0;
CREATE DATABASE placement;
GRANT ALL PRIVILEGES ON nova_api.* TO 'nova'@'localhost' IDENTIFIED BY 'NOVA_DBPASS';
GRANT ALL PRIVILEGES ON nova_api.* TO 'nova'@'%' IDENTIFIED BY 'NOVA_DBPASS';
GRANT ALL PRIVILEGES ON nova.* TO 'nova'@'localhost' IDENTIFIED BY 'NOVA_DBPASS';
GRANT ALL PRIVILEGES ON nova.* TO 'nova'@'%' IDENTIFIED BY 'NOVA_DBPASS';
GRANT ALL PRIVILEGES ON nova_cell0.* TO 'nova'@'localhost' IDENTIFIED BY 'NOVA_DBPASS';
GRANT ALL PRIVILEGES ON nova_cell0.* TO 'nova'@'%' IDENTIFIED BY 'NOVA_DBPASS';
GRANT ALL PRIVILEGES ON placement.* TO 'placement'@'localhost' IDENTIFIED BY 'PLACEMENT_DBPASS';
GRANT ALL PRIVILEGES ON placement.* TO 'placement'@'%' IDENTIFIED BY 'PLACEMENT_DBPASS';"
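
The statements above repeat the same pattern for each database. As a sketch, a small helper can print them so that they can be reviewed before sending them to mysql. The name db_setup is made up for this example, and the passwords are the placeholders used in this post:

```shell
# db_setup DB USER PASSWORD: print the CREATE DATABASE statement and the two
# GRANT statements (for 'localhost' and for '%') used in this post.
db_setup() {
    printf 'CREATE DATABASE %s;\n' "$1"
    for host in localhost %; do
        printf "GRANT ALL PRIVILEGES ON %s.* TO '%s'@'%s' IDENTIFIED BY '%s';\n" \
            "$1" "$2" "$host" "$3"
    done
}

# Print the SQL for the placement database. To apply it, pipe the output
# to "mysql -u root -p" instead of just printing it.
db_setup placement placement PLACEMENT_DBPASS
```

The three nova databases share the user nova, so they would need three calls with the same user and password.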

And now we will create the users that will manage the services and the endpoints in keystone. But first we’ll make sure that we have the proper env variables by sourcing the admin credentials:

# source admin-openrc
# openstack user create --domain default --password "NOVA_PASS" nova
# openstack role add --project service --user nova admin
# openstack service create --name nova --description "OpenStack Compute" compute
# openstack endpoint create --region RegionOne compute public http://controller:8774/v2.1
# openstack endpoint create --region RegionOne compute internal http://controller:8774/v2.1
# openstack endpoint create --region RegionOne compute admin http://controller:8774/v2.1
# openstack user create --domain default --password "PLACEMENT_PASS" placement
# openstack role add --project service --user placement admin
# openstack service create --name placement --description "Placement API" placement
# openstack endpoint create --region RegionOne placement public http://controller:8778
# openstack endpoint create --region RegionOne placement internal http://controller:8778
# openstack endpoint create --region RegionOne placement admin http://controller:8778

Now we’ll install the services

# apt -y install nova-api nova-conductor nova-consoleauth nova-novncproxy nova-scheduler nova-placement-api

Once the services have been installed, we are creating the basic configuration file

# cat > /etc/nova/nova.conf  <<\EOT
[DEFAULT]
lock_path = /var/lock/nova
state_path = /var/lib/nova
transport_url = rabbit://openstack:RABBIT_PASS@controller
my_ip = 192.168.1.240
use_neutron = true
firewall_driver = nova.virt.firewall.NoopFirewallDriver
[api]
auth_strategy = keystone
[api_database]
connection = mysql+pymysql://nova:NOVA_DBPASS@controller/nova_api
[cells]
enable = False
[database]
connection = mysql+pymysql://nova:NOVA_DBPASS@controller/nova
[glance]
api_servers = http://controller:9292
[keystone_authtoken]
auth_url = http://controller:5000/v3
memcached_servers = controller:11211
auth_type = password
project_domain_name = default
user_domain_name = default
project_name = service
username = nova
password = NOVA_PASS
[neutron]
url = http://controller:9696
auth_url = http://controller:5000
auth_type = password
project_domain_name = default
user_domain_name = default
region_name = RegionOne
project_name = service
username = neutron
password = NEUTRON_PASS
service_metadata_proxy = true
metadata_proxy_shared_secret = METADATA_SECRET
[oslo_concurrency]
lock_path = /var/lib/nova/tmp
[placement]
os_region_name = openstack
region_name = RegionOne
project_domain_name = Default
project_name = service
auth_type = password
user_domain_name = Default
auth_url = http://controller:5000/v3
username = placement
password = PLACEMENT_PASS
[placement_database]
connection = mysql+pymysql://placement:PLACEMENT_DBPASS@controller/placement
[scheduler]
discover_hosts_in_cells_interval = 300
[vnc]
enabled = true
server_listen = $my_ip
server_proxyclient_address = $my_ip
EOT

In this file, the most important value to tweak is “my_ip”, which corresponds to the internal IP address of the controller.

Also remember that we are using simple passwords to make them easy to track. In case you need to make the deployment more secure, please set secure passwords.
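
If you prefer random passwords, one possible way to generate them is to read a few bytes from /dev/urandom and print them in hexadecimal. This is just a sketch: gen_pass and NOVA_PASS_VALUE are names made up for this example, and the same value has to be reused wherever the placeholder appears (the keystone user, the configuration files in the controller and in the compute nodes, etc.):

```shell
# gen_pass: print 32 hexadecimal characters obtained from 16 random bytes.
gen_pass() {
    head -c 16 /dev/urandom | od -An -tx1 | tr -d ' \n'
}

# Generate the value once and keep it, because it is needed in several places.
NOVA_PASS_VALUE=$(gen_pass)
echo "generated a password of ${#NOVA_PASS_VALUE} characters"

# Examples of where to reuse it (as root, with admin-openrc sourced):
#   sed -i "s/NOVA_PASS/$NOVA_PASS_VALUE/g" /etc/nova/nova.conf
#   openstack user set --password "$NOVA_PASS_VALUE" nova
```

Using only hexadecimal characters avoids having to escape the password in sed, in the SQL statements or in the connection URLs.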

At this point we need to synchronize the databases and create the openstack cells

# su -s /bin/sh -c "nova-manage api_db sync" nova
# su -s /bin/sh -c "nova-manage cell_v2 map_cell0" nova
# su -s /bin/sh -c "nova-manage cell_v2 create_cell --name=cell1 --verbose" nova
# su -s /bin/sh -c "nova-manage db sync" nova

Finally we need to restart the compute services.

# service nova-api restart
# service nova-consoleauth restart
# service nova-scheduler restart
# service nova-conductor restart
# service nova-novncproxy restart

We have to take into account that this is the controller node, and will not host any virtual machine.

Neutron

Neutron is the networking service in OpenStack. In this post we are installing the “self-service networks” option, so that the users will be able to create their isolated networks.

First, we are creating the database for the neutron service.

# mysql -u root -p <<< "CREATE DATABASE neutron;
GRANT ALL PRIVILEGES ON neutron.* TO 'neutron'@'localhost' IDENTIFIED BY 'NEUTRON_DBPASS';
GRANT ALL PRIVILEGES ON neutron.* TO 'neutron'@'%' IDENTIFIED BY 'NEUTRON_DBPASS'
"

Now we will create the openstack user and endpoints, but first we need to ensure that we set the env variables:

# source admin-openrc 
# openstack user create --domain default --password "NEUTRON_PASS" neutron
# openstack role add --project service --user neutron admin
# openstack service create --name neutron --description "OpenStack Networking" network
# openstack endpoint create --region RegionOne network public http://controller:9696
# openstack endpoint create --region RegionOne network internal http://controller:9696
# openstack endpoint create --region RegionOne network admin http://controller:9696
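
Each service in this post registers the same URL for the public, internal and admin interfaces. As a sketch, a loop can print the three commands for a given service (endpoint_cmds is a name made up for this example, and the region is assumed to be RegionOne, as in the rest of the post):

```shell
# endpoint_cmds SERVICE URL: print one "openstack endpoint create" command
# for each of the three interfaces (public, internal and admin).
endpoint_cmds() {
    for iface in public internal admin; do
        echo "openstack endpoint create --region RegionOne $1 $iface $2"
    done
}

# Dry run: only print the commands for the network service.
endpoint_cmds network http://controller:9696
```

Printing the commands first makes it possible to review them; piping the output to sh would execute them.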

Now we are ready to install the packages related to neutron:

# apt install -y neutron-server neutron-plugin-ml2 neutron-linuxbridge-agent neutron-l3-agent neutron-dhcp-agent neutron-metadata-agent

And now we need to create the configuration files for neutron. First, the general file /etc/neutron/neutron.conf

# cat > /etc/neutron/neutron.conf <<\EOT
[DEFAULT]
core_plugin = ml2
service_plugins = router
allow_overlapping_ips = true
transport_url = rabbit://openstack:RABBIT_PASS@controller
auth_strategy = keystone
notify_nova_on_port_status_changes = true
notify_nova_on_port_data_changes = true
[agent]
root_helper = "sudo /usr/bin/neutron-rootwrap /etc/neutron/rootwrap.conf"
[database]
connection = mysql+pymysql://neutron:NEUTRON_DBPASS@controller/neutron
[keystone_authtoken]
www_authenticate_uri = http://controller:5000
auth_url = http://controller:5000
memcached_servers = controller:11211
auth_type = password
project_domain_name = default
user_domain_name = default
project_name = service
username = neutron
password = NEUTRON_PASS
[nova]
auth_url = http://controller:5000
auth_type = password
project_domain_name = default
user_domain_name = default
region_name = RegionOne
project_name = service
username = nova
password = NOVA_PASS
[oslo_concurrency]
lock_path = /var/lock/neutron
EOT

Now the file /etc/neutron/plugins/ml2/ml2_conf.ini, that will be used to instruct neutron how to create the LANs:

# cat > /etc/neutron/plugins/ml2/ml2_conf.ini <<\EOT
[ml2]
type_drivers = flat,vlan,vxlan
tenant_network_types = vxlan
mechanism_drivers = linuxbridge,l2population
extension_drivers = port_security
[ml2_type_flat]
flat_networks = provider
[ml2_type_vxlan]
vni_ranges = 1:1000
[securitygroup]
enable_ipset = true
EOT

Now the file /etc/neutron/plugins/ml2/linuxbridge_agent.ini, because we are using linux bridges in this setup:

# cat > /etc/neutron/plugins/ml2/linuxbridge_agent.ini <<\EOT
[linux_bridge]
physical_interface_mappings = provider:eno3
[securitygroup]
firewall_driver = neutron.agent.linux.iptables_firewall.IptablesFirewallDriver
enable_security_group = true
[vxlan]
enable_vxlan = true
local_ip = 192.168.1.240
l2_population = true
EOT

In this file, it is important to replace “eno3” in physical_interface_mappings with the physical interface that has access to the provider’s network (i.e. the public network). It is also essential to set the proper value for “local_ip”, which is the IP address of the internal interface that is used to communicate with the compute hosts.
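
To avoid typing the address by hand, the value for local_ip can be obtained from the output of the ip command. The next sketch parses the one-line format of "ip -o -4 addr show"; first_ipv4 is a name made up for this example:

```shell
# first_ipv4: read the output of "ip -o -4 addr show" from stdin and print
# the first IPv4 address, without the prefix length.
first_ipv4() {
    awk '$3 == "inet" { split($4, a, "/"); print a[1]; exit }'
}

# Usage in the controller (eno1 is the private interface in this setup):
#   ip -o -4 addr show dev eno1 | first_ipv4
```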

Now we have to create the files corresponding to the l3_agent and the dhcp agent:

# cat > /etc/neutron/l3_agent.ini <<EOT
[DEFAULT]
interface_driver = linuxbridge
EOT
# cat > /etc/neutron/dhcp_agent.ini <<EOT
[DEFAULT]
interface_driver = linuxbridge
dhcp_driver = neutron.agent.linux.dhcp.Dnsmasq
enable_isolated_metadata = true
dnsmasq_dns_servers = 8.8.8.8
EOT

Finally we need to create the file /etc/neutron/metadata_agent.ini

# cat > /etc/neutron/metadata_agent.ini <<EOT
[DEFAULT]
nova_metadata_host = controller
metadata_proxy_shared_secret = METADATA_SECRET
EOT

Once the configuration files have been created, we are synchronizing the database and restarting the services related to neutron.

# su -s /bin/sh -c "neutron-db-manage --config-file /etc/neutron/neutron.conf --config-file /etc/neutron/plugins/ml2/ml2_conf.ini upgrade head" neutron
# service nova-api restart
# service neutron-server restart
# service neutron-linuxbridge-agent restart
# service neutron-dhcp-agent restart
# service neutron-metadata-agent restart
# service neutron-l3-agent restart

And that’s all.

At this point we have the controller node installed according to the OpenStack documentation. It is possible to issue any command, but it will not be possible to start any VM, because we have not installed any compute node, yet.

 

 

How to install OpenStack Rocky – part 1

This is the first post of a series in which I am describing the installation of an OpenStack site using the latest release at the time of writing: Rocky.

My project is very ambitious, because I have 2 virtualization nodes (each of them with a different GPU), 10GbE, a lot of memory and disk, and I want to offer the GPUs to the VMs. The front-end is a 24-core server with 32 GB of RAM and 6 TB of disk, with 4 network ports (2x10GbE+2x1GbE), that will also act as the block device server.

We’ll be using Ubuntu 18.04 LTS for all the nodes, and I’ll try to follow the official documentation. But I will try to be very straightforward in the configuration… I want to make it work and I will try to explain how things work, instead of tuning the configuration.

How to install OpenStack Rocky – part 1

My setup for the OpenStack installation is the next one:

horsemen

In the figure I have annotated the most relevant data to identify the servers: the IP addresses for each interface, which is the volume server and the virtualization nodes that will share their GPUs.

In the end, the server horsemen will host the following services: keystone, glance, cinder, neutron and horizon. On the other side, fh01 and fh02 will host the compute and neutron-related services.

In each of the servers we need a network interface (eno1, enp1s0f1 and enp1f0f1) which is intended for administration purposes (i.e. the network 192.168.1.1/24). That interface has a gateway (192.168.1.220) that enables the access to the internet via NAT. From now on, we’ll call these interfaces the “private interfaces”.

We need an additional interface that is connected to the provider network (i.e. to the internet). That network will hold the publicly routable IP addresses. In my case, I have the network 158.42.1.1/24, that is publicly routable. It is a flat network with its own network services (e.g. gateway, nameservers, dhcp servers, etc.). From now on, we’ll call these interfaces the “public interfaces“.

One note on the “eno4” interface in horsemen: I am using this interface for accessing horizon. In case you do not have a spare interface, you can use interface aliasing or provide the IP address in the ifupdown “up” script.

An extra note on “eno2” interface in horsemen: It is an extra interface in the node. It will be left unused during this installation, but it will be configured to work in bond mode with “eno1”.

IMPORTANT: In the whole series of posts, I am using the passwords as they appear: RABBIT_PASS, NOVA_PASS, NOVA_DBPASS, etc. You should change them according to a secure password policy, but they are set as-is to make the installation easier to follow. Anyway, most of them will be fine if you have an isolated network and the services listen only in the management network (e.g. mysql will be configured to listen only in the management interface).

Some words on the Openstack network (concepts)

The basic installation of OpenStack considers two networks: the provider network and the management network. The provider network means “the network that is attached to the provider”, i.e. the network where the VMs can have publicly routable IP addresses. On the other side, the management network is a private network that is (probably) isolated from the provider one. The computers in that network have private IP addresses (e.g. 192.168.0.0/16, 10.0.0.0/8, 172.16.0.0/12).

The basic deployment of OpenStack considers that the controller node does not need to have a routable IP address. Instead, the admin can access it through the management network. That is why the “eno3” interface does not have an IP address.

In the OpenStack architecture, horizon is a separate piece, so horizon is the one that will need a routable IP address. As I want to install horizon also in the controller, I need a routable IP address and that is why I put a publicly routable IP address in “eno4” (158.42.1.1).

In my case, I had a spare network interface (eno4) but if you do not have one of them, you can create a bridge and add your “interface connected to the provider network” (i.e. “eno3”) to that bridge, and then add a publicly routable IP address to the bridge.

IMPORTANT: this is not part of my installation. Although it may be part of your installation.

# brctl addbr br-public
# brctl addif br-public eno3
# ip link set dev br-public up
# ip addr add 158.42.1.1/16 dev br-public

Configuring the network

One of the first things that we need to do is to configure the network for the different servers.

Ubuntu 18.04 has moved to netplan, but at the time of writing this text, I have not found any mechanism in netplan to bring an interface up without providing an IP address for it. Moreover, when trying to use ifupdown, netplan is not totally disabled and interferes with options such as dns-nameservers for the static addresses. In the end I needed to install ifupdown and mix the configuration using both netplan and ifupdown.

It is very important to disable IPv6 for any of the servers, because if not, you will probably face a problem when using the public IP addresses. You can read more in this link.

To disable IPv6, we need to execute the following lines in all the servers (as root):

# sysctl -w net.ipv6.conf.all.disable_ipv6=1
# sysctl -w net.ipv6.conf.default.disable_ipv6=1
# cat >> /etc/default/grub << EOT
GRUB_CMDLINE_LINUX_DEFAULT="ipv6.disable=1"
GRUB_CMDLINE_LINUX="ipv6.disable=1"
EOT
# update-grub

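These commands disable IPv6 for the current session, and persist the setting by disabling it at boot time. If you have customized your grub, you should check the options that we are setting, because the appended lines override any previous value of GRUB_CMDLINE_LINUX_DEFAULT and GRUB_CMDLINE_LINUX in that file.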

Configuring the network in “horsemen”

You need to install ifupdown to have an interface without an IP address connected to the internet, which will be used by the neutron-related services

# apt update && apt install -y ifupdown

Edit the file /etc/network/interfaces and adjust it with a content like the next one:

auto eno3
iface eno3 inet manual
up ip link set dev $IFACE up
down ip link set dev $IFACE down

Now edit the file /etc/netplan/50-cloud-init.yaml to set the private IP address:

network:
  ethernets:
    eno4:
      dhcp4: true
    eno1:
      addresses:
        - 192.168.1.240/24
      gateway4: 192.168.1.221
      nameservers:
        addresses: [ 192.168.1.220, 8.8.8.8 ]
  version: 2

When you save these settings, you can issue the next commands:

# netplan generate
# netplan apply

Now we’ll edit the file /etc/hosts, and will add the addresses of each server. My file is the next one:

127.0.0.1       localhost.localdomain   localhost
192.168.1.240   horsemen controller
192.168.1.241   fh01
192.168.1.242   fh02

I have removed the entry 127.0.1.1 because I read that it may interfere. And I also removed all the crap about IPv6 because I disabled it.
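
If you script this step, it is convenient to add each entry only when it is not already in the file, so that the script can be run more than once. A minimal sketch could be the next one (add_host is a name made up for this example; it only checks whether the IP address already starts a line):

```shell
# add_host IP "NAMES" [FILE]: append the line "IP<tab>NAMES" to FILE
# (/etc/hosts by default) unless a line for that IP already exists.
add_host() {
    file=${3:-/etc/hosts}
    if ! awk -v ip="$1" '$1 == ip { found = 1 } END { exit !found }' "$file"; then
        printf '%s\t%s\n' "$1" "$2" >> "$file"
    fi
}

# Usage (as root):
#   add_host 192.168.1.240 "horsemen controller"
#   add_host 192.168.1.241 fh01
#   add_host 192.168.1.242 fh02
```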

Configuring the network in “fh01” and “fh02”

Here is the short version of the configuration of fh01:

# apt install -y ifupdown
# cat >> /etc/network/interfaces << EOT
auto enp1s0f0
iface enp1s0f0 inet manual
up ip link set dev $IFACE up
down ip link set dev $IFACE down
EOT

Here is my file /etc/netplan/50-cloud-init.yaml for fh01:

network:
  ethernets:
    enp1s0f1:
      addresses:
        - 192.168.1.241/24
      gateway4: 192.168.1.221
      nameservers:
        addresses: [ 192.168.1.220, 8.8.8.8 ]
  version: 2

Here is the file /etc/hosts for fh01:

127.0.0.1 localhost.localdomain localhost
192.168.1.240 horsemen controller
192.168.1.241 fh01
192.168.1.242 fh02

You can export this configuration to fh02 by adjusting the IP address in the /etc/netplan/50-cloud-init.yaml file.
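
As a sketch, the adjustment can be done with sed, taking care of treating the dots of the IP address literally. The name retarget_ip and the file names fh01.yaml and fh02.yaml are made up for this example; 192.168.1.242 is the address of fh02 in /etc/hosts:

```shell
# retarget_ip OLD NEW: copy stdin to stdout replacing the address OLD with
# NEW. The dots in OLD are escaped so that they only match a literal dot.
retarget_ip() {
    old=$(printf '%s' "$1" | sed 's/\./\\./g')
    sed "s/$old/$2/g"
}

# Usage:
#   retarget_ip 192.168.1.241 192.168.1.242 < fh01.yaml > fh02.yaml
```

Remember that the interface names in fh02 may be different, so they may also need to be adjusted by hand.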

Reboot and test

Now it is a good moment to reboot your systems and test that the network is properly configured. If it is not, please fix it before going on, because otherwise the next steps will make no sense.

From each of the hosts you should be able to ping the outside world and to ping the other hosts. These are the tests from horsemen, but you need to be able to repeat them from each of the servers.

root@horsemen# ping -c 2 www.google.es
PING www.google.es (172.217.17.3) 56(84) bytes of data.
64 bytes from mad07s09-in-f3.1e100.net (172.217.17.3): icmp_seq=1 ttl=54 time=7.26 ms
64 bytes from mad07s09-in-f3.1e100.net (172.217.17.3): icmp_seq=2 ttl=54 time=7.26 ms

--- www.google.es ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1001ms
rtt min/avg/max/mdev = 7.262/7.264/7.266/0.002 ms
root@horsemen# ping -c 2 fh01
PING fh01 (192.168.1.241) 56(84) bytes of data.
64 bytes from fh01 (192.168.1.241): icmp_seq=1 ttl=64 time=0.180 ms
64 bytes from fh01 (192.168.1.241): icmp_seq=2 ttl=64 time=0.113 ms

--- fh01 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1008ms
rtt min/avg/max/mdev = 0.113/0.146/0.180/0.035 ms
root@horsemen# ping -c 2 fh02
PING fh02 (192.168.1.242) 56(84) bytes of data.
64 bytes from fh02 (192.168.1.242): icmp_seq=1 ttl=64 time=0.223 ms
64 bytes from fh02 (192.168.1.242): icmp_seq=2 ttl=64 time=0.188 ms

--- fh02 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1027ms
rtt min/avg/max/mdev = 0.188/0.205/0.223/0.022 ms

Prerequisites for OpenStack in the server (horsemen)

Remember: for simplicity, I will use obvious passwords like SERVICE_PASS or SERVICEDB_PASS (e.g. RABBIT_PASS). You should change these passwords, although most of them will be fine if you have an isolated network and the services listen only in the management network.

First of all, we are installing the prerequisites. We will start with the NTP server, which will keep the time synchronized between the controller (horsemen) and the virtualization servers (fh01 and fh02). We’ll install chrony (recommended in the OpenStack documentation) and allow any computer in our private network to connect to this new NTP server:

# apt install -y chrony
# cat >> /etc/chrony/chrony.conf << EOT
allow 192.168.1.0/24
EOT
# service chrony restart

Now we are installing and configuring the database server (we’ll use mariadb as it is used in the basic installation):

# apt install mariadb-server python-pymysql
# cat > /etc/mysql/mariadb.conf.d/99-openstack.cnf << EOT
[mysqld]
bind-address = 192.168.1.240

default-storage-engine = innodb
innodb_file_per_table = on
max_connections = 4096
collation-server = utf8_general_ci
character-set-server = utf8
EOT
# service mysql restart

Now we are installing rabbitmq, that will be used to orchestrate message interchange between services (please change RABBIT_PASS).

# apt install rabbitmq-server
# rabbitmqctl add_user openstack "RABBIT_PASS"
# rabbitmqctl set_permissions openstack ".*" ".*" ".*"

At this moment, we have to install memcached and configure it to listen in the management interface:

# apt install memcached

# echo "-l 192.168.1.240" >> /etc/memcached.conf

# service memcached restart
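
Appending the option assumes that the file does not already set a listen address. The official installation guide edits the existing "-l 127.0.0.1" line instead, so as a sketch, the next helper rewrites the "-l" line if there is one and appends it otherwise. The name set_memcached_listen is made up for this example, and it relies on GNU sed:

```shell
# set_memcached_listen IP [FILE]: make FILE (/etc/memcached.conf by default)
# listen on IP, rewriting an existing "-l" line or appending a new one.
set_memcached_listen() {
    file=${2:-/etc/memcached.conf}
    if grep -q -e '^-l ' "$file"; then
        sed -i "s/^-l .*/-l $1/" "$file"
    else
        echo "-l $1" >> "$file"
    fi
}

# Usage (as root), before restarting memcached:
#   set_memcached_listen 192.168.1.240
```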

Finally we need to install etcd and configure it to be accessible by openstack

# apt install etcd
# cat >> /etc/default/etcd << EOT
ETCD_NAME="controller"
ETCD_DATA_DIR="/var/lib/etcd"
ETCD_INITIAL_CLUSTER_STATE="new"
ETCD_INITIAL_CLUSTER_TOKEN="etcd-cluster-01"
ETCD_INITIAL_CLUSTER="controller=http://192.168.1.240:2380"
ETCD_INITIAL_ADVERTISE_PEER_URLS="http://192.168.1.240:2380"
ETCD_ADVERTISE_CLIENT_URLS="http://192.168.1.240:2379"
ETCD_LISTEN_PEER_URLS="http://0.0.0.0:2380"
ETCD_LISTEN_CLIENT_URLS="http://192.168.1.240:2379"
EOT
# systemctl enable etcd
# systemctl start etcd

Now we are ready to go on with the installation of the OpenStack Rocky packages… (continue to part 2)

How to (securely) contain users using Docker containers

Docker containers have proved to be very useful to deliver applications. They make it possible to pack all the libraries and dependencies needed by an application, and to run it in any system. One of the main drawbacks argued by Docker competitors is that the Docker daemon runs as root and it may introduce security threats.

I have searched for the security problems of Docker (e.g. sysdig, blackhat conference, CVEs, etc.) and I could only find privilege escalation by running privileged containers (--privileged), files that are written using root permissions, using the communication socket, using block devices, poisoned images, etc. But all of these problems are related to letting the users start their own containers.

So I think that Docker can be used by sysadmins to provide a different or contained environment to the users. E.g. having a CentOS 7 front-end, but letting some users run an Ubuntu 16.04 environment. This is why this time I learned…

How to (securely) contain users using Docker containers

TL;DR

You can find the results of these tests in this repo: https://github.com/grycap/dosh

The repository contains DoSH (which stands for Docker SHell), which is a development to use Docker containers to run the shell of the users in your Linux system. It is an in-progress project that aims at providing a configurable and secure mechanism so that, when a user logs in to a Linux system, a customized (or standard) container is created for him. This makes it possible to limit the resources that the user is able to use, the applications, etc., but also to provide a custom Linux flavour for each user or group of users (i.e. users that have CentOS 7 will coexist with users that have Ubuntu 16.04 in the same server).

The Docker SHell

In a multi-user system it would be nice to offer a feature like providing different flavours of Linux, depending on the user. Or even including a “jailed” system for some specific users.

This could be achieved in a very easy way. You just need to create a script like the next one

root@onefront00:~# cat > /bin/dosh <<\EOF
#!/bin/bash
docker run --rm -it alpine ash
EOF
root@onefront00:~# chmod +x /bin/dosh 
root@onefront00:~# echo "/bin/dosh" >> /etc/shells

And now you can change the shell of one user in /etc/passwd

myuser:x:9870:9870::/home/myuser:/bin/dosh

And you simply have to allow myuser to run docker containers (e.g. in Ubuntu, by adding the user to the “docker” group).

Now, when “myuser” logs in to the system, he will be inside a container with the Alpine flavour:

alpine-dockershell-1

This is a simple solution that enables the user to have a specific linux distribution… but also your specific linux environment with special applications, libraries, etc.

But the user does not have access to his home folder nor to other files that would give him the appearance of being in the real system. So we could just map his home folder (and other folders that we want to have inside the container; e.g. /tmp). A modified version of /bin/dosh would be the next one:

#!/bin/bash
username="$(whoami)"
docker run --rm -v /home/$username:/home/$username -v /tmp:/tmp -it alpine ash

But if we log in as myuser, the result is that the user inside the container is… root. And everything that he does is done as root.

alpine-dockershell-2

We need to run the container as the user and not as root. An updated version of the script is the next:

#!/bin/bash
username="$(whoami)"
uid="$(id -u $username)"
gid="$(id -g $username)"
docker run --rm -u $uid:$gid -v /home/$username:/home/$username -v /tmp:/tmp -w /home/$username -it alpine ash

If myuser now logs in, the container has the permissions of this user

alpine-dockershell-3

We can double-check it by checking the running processes of the container

The problem now is that the name of the user (and the groups) are not properly resolved inside the container.

alpine-dockershell-6

This is because the container includes its own /etc/passwd and /etc/group files, which do not know about the users or groups of the host system. As we want the container to resemble the system, we can share a read-only copy of /etc/passwd and /etc/group by modifying the /bin/dosh script:

#!/bin/bash
username="$(whoami)"
uid="$(id -u $username)"
gid="$(id -g $username)"
docker run --rm -u $uid:$gid -v /etc/passwd:/etc/passwd:ro -v /etc/group:/etc/group:ro -v /home/$username:/home/$username -v /tmp:/tmp -w /home/$username -it alpine ash

And now the container has the permissions of the user and the username is resolved. So the user can access the resources in the filesystem under the same conditions as if he were accessing the hosting system.

alpine-dockershell-7

Now we should add the mappings for the folders that the user must be able to access (e.g. scratch, /opt, etc.).
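
One way to keep these extra mappings manageable is to build the -v flags from a list of folders. The following is only a minimal sketch and not part of DoSH: the helper build_volume_args and the array VOLUME_ARGS are hypothetical names, and the folder list is an assumption for illustration.

```shell
#!/bin/bash
# Sketch: fill the array VOLUME_ARGS with one "-v <dir>:<dir>" pair per folder.
# build_volume_args and VOLUME_ARGS are hypothetical names used for illustration.
build_volume_args() {
    VOLUME_ARGS=()
    local dir
    for dir in "$@"; do
        # Only map the folders that actually exist in the host
        if [ -d "$dir" ]; then
            VOLUME_ARGS+=(-v "$dir:$dir")
        fi
    done
}

# Example: folders that do not exist (e.g. /scratch in many systems) are skipped
build_volume_args /tmp /opt /scratch
echo "docker run --rm ${VOLUME_ARGS[*]} -it alpine ash"
```

In a real /bin/dosh the array would be expanded as "${VOLUME_ARGS[@]}" inside the docker run command line, so that folder names are passed as separate, correctly quoted arguments.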

Using this script as-is, the user will have a different environment for each session that he starts. That means that the processes will not be shared between different sessions.

But we can create a more elaborate script to start containers using different Docker images depending on the user or on the group to which the user belongs. Or even to create pseudo-persistent containers that start when the user logs in and stop when the user leaves (to allow multiple ttys for the same environment).

An example of this kind of script will be the next one:

#!/bin/bash

username="$(whoami)"
uid="$(id -u "$username")"
gid="$(id -g "$username")"

CONTAINERNAME="container-${username}"
CONTAINERIMAGE="alpine"
CMD="ash"

# Per-user overrides of the image and the shell
case "$username" in
  myuser)
    CONTAINERIMAGE="ubuntu:16.04"
    CMD="/bin/bash";;
esac

RUNNING="$(docker inspect -f "{{.State.Running}}" "$CONTAINERNAME" 2> /dev/null)"
if [ $? -ne 0 ]; then
  # The container does not exist: create it detached
  docker run -h "$(hostname)" -u "$uid:$gid" -v /etc/passwd:/etc/passwd:ro -v /etc/group:/etc/group:ro -v "/home/$username:/home/$username" -v /tmp:/tmp -w "/home/$username" -id --name "$CONTAINERNAME" "$CONTAINERIMAGE" "$CMD" > /dev/null
  if [ $? -ne 0 ]; then
    exit 1
  fi
else
  if [ "$RUNNING" == "false" ]; then
    # The container exists but it is stopped: start it again
    docker start "$CONTAINERNAME" > /dev/null
    if [ $? -ne 0 ]; then
      exit 1
    fi
  fi
fi
# Open a new session inside the running container
docker exec -it "$CONTAINERNAME" "$CMD"

Using this script we start the user containers on demand and their processes are kept between log-ins. Moreover, the log-in will fail if the container fails to start.

In the event that the system is powered off, the container will be powered off, although its contents are kept for future log-ins (the container will be restarted from the stopped state).
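
The script selects the image by user name. Selecting it by group can be sketched in a similar way. This is only an illustration: the function image_for_groups and the group names centos-users and ubuntu-users are hypothetical, and "alpine" is kept as the default image.

```shell
#!/bin/bash
# Sketch: print the Docker image for the first group that has a mapping.
# The group names below are assumptions made for this example.
image_for_groups() {
    local group
    for group in "$@"; do
        case "$group" in
            centos-users) echo "centos:7"; return 0;;
            ubuntu-users) echo "ubuntu:16.04"; return 0;;
        esac
    done
    # No group matched: fall back to the default image
    echo "alpine"
}

# Usage: pass the names of the groups of the current user
# (the expansion is left unquoted on purpose, to get one argument per group)
image_for_groups $(id -Gn)
```

Take into account that the shell has to match the image (e.g. ash for Alpine and /bin/bash for Ubuntu), so a complete version would also set CMD.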

The development of Docker SHell continues in this repository: https://github.com/grycap/dosh

Security concerns

The main problem of Docker related to security is that the daemon is running as root. So if I am able to run containers, I am able to run something like this:

$ docker run --privileged alpine ash -c 'echo 1 > /proc/sys/kernel/sysrq; echo o > /proc/sysrq-trigger'

And the host will be powered off by a regular user. Or simply…

$ docker run --rm -v /etc:/etc -it alpine ash
/ # adduser mynewroot -G root
...
/ # exit

And once you exit the container, you will have a new root user in the physical host.

This happens because the user inside the container is “root”, which has UID=0, and it is also root in the host because the Docker daemon runs as root with UID=0.

We could change this behaviour by shifting the user namespace with the flag --userns-remap and the subuids, so that root inside the containers is mapped to an unprivileged UID in the host, but this will also limit the features of Docker for the sysadmin. The first consequence is that the sysadmin will not be able to run Docker containers as the real root (nor privileged containers). If this is acceptable for your system, this will probably be the best solution for you, as it limits the possible security threats.

If you are not experienced with the configuration of Docker or you simply do not want (or do not know how) to use --userns-remap, you can still use DoSH.

On linux capabilities

If you add the flag --cap-drop=all (or a selective --cap-drop) to the command line that runs the Docker container, you can get an even more secure container that will never get some Linux capabilities (e.g. to mount a device). You can learn more about capabilities in the Linux manpage, but we can easily verify the capabilities…

We will run a process inside the container, using the flag --cap-drop=all:

$ docker run --rm --cap-drop=all -u 1001:1001 -v /etc/passwd:/etc/passwd:ro -v /etc/group:/etc/group:ro -v /home/myuser:/home/myuser -v /tmp:/tmp -w /home/myuser -it alpine sleep 10000

Now we can check the capabilities of such process

capabilities-zero.png

We should check the effective capabilities (CapEff) but also the upper bound of the capabilities (CapBnd), which determines which capabilities the process could acquire (e.g. using sudo or executing a suid application). We can see that both values are zero, and that means that the process cannot get any capability.

Take into account that using --cap-drop=all will make commands such as ping stop working, because ping is an application that needs specific capabilities (it needs cap_net_raw, and this is why it has suid permissions).

capabilities-capdrop

Dropping capabilities when spawning the container makes the sleep command inside a container even more secure than the regular one. You can check it by simply repeating the same procedure without using containers. In that case, if you inspect the capabilities, you will find the following:

capabilities-all.png

The effective capabilities are zero, but the CapBnd field shows that the user could escalate to get any of the capabilities through a buggy application.
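
The CapEff and CapBnd values are hexadecimal bit masks in which each bit corresponds to one capability number. As a minimal sketch (cap_is_set is a hypothetical helper, not a standard tool), a single capability can be checked with shell arithmetic. It assumes a Linux system where /proc/<pid>/status is readable.

```shell
#!/bin/bash
# Sketch: test single bits of the hexadecimal masks shown in /proc/<pid>/status.
# cap_is_set is a hypothetical helper written for this example.

# Succeeds if capability number $2 is set in the hexadecimal mask $1
cap_is_set() {
    local mask="$1" bit="$2"
    (( (0x$mask >> bit) & 1 ))
}

# cap_net_raw is capability number 13 (see the capabilities manpage)
CAP_NET_RAW=13

# Check the bounding set (CapBnd) of the current shell, if /proc is available
if [ -r "/proc/$$/status" ]; then
    capbnd="$(awk '/^CapBnd:/ {print $2}' "/proc/$$/status")"
    if [ -n "$capbnd" ] && cap_is_set "$capbnd" "$CAP_NET_RAW"; then
        echo "cap_net_raw is in the bounding set"
    else
        echo "cap_net_raw is not in the bounding set"
    fi
fi
```

Inside a container started with --cap-drop=all the mask is all zeros, so the check fails for every capability number. If the libcap tools are installed, capsh --decode=<mask> should print the capability names instead of the bits.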

Executing the docker commands by non-root users

The actual problem is that the user needs to be allowed to use Docker to spawn the DoSH container, and you do not want to allow the user to run arbitrary docker commands.

We can consider that the usage of Docker is secure if the containers are run under the credentials of regular users, and the devices and other critical resources that are attached to the container are used under these credentials. So users can be allowed to run Docker containers if they are forced to include the flag -u <uid>:<gid> and the rest of the command line is controlled.

The solution is as easy as installing sudo (which is shipped in the default distribution of Ubuntu and is a standard package in almost any distribution) and allowing users to run through sudo only a specific script that executes the docker commands, without allowing these users to modify that script.

Once sudo is installed, we can create the file /etc/sudoers.d/dosh

root@onefront00:~# cat > /etc/sudoers.d/dosh <<\EOF
> ALL ALL=NOPASSWD: /bin/shell2docker
> EOF
root@onefront00:~# chmod 440 /etc/sudoers.d/dosh

Now we must move the previous /bin/dosh script to /bin/shell2docker and then we can create the script /bin/dosh with the following content:

root@onefront00:~# mv /bin/dosh /bin/shell2docker
root@onefront00:~# cat > /bin/dosh <<\EOF
#!/bin/bash
sudo /bin/shell2docker
EOF
root@onefront00:~# chmod +x /bin/dosh

And finally, we will remove the ability of the user to run docker containers (e.g. in Ubuntu, by removing him from the “docker” group).

If you try to log in as the user, you will notice that now the user that runs the script is “root”, and so the container will be run as “root”. But we can modify the script to detect whether it has been run through sudo or by a regular user, and then pick the appropriate username. The updated script is the next:

#!/bin/bash
# When the script is run through sudo, SUDO_USER holds the name of the invoking user
if [ -n "$SUDO_USER" ]; then username="$SUDO_USER"; else username="$(whoami)"; fi
uid="$(id -u "$username")"
gid="$(id -g "$username")"

CONTAINERNAME="container-${username}"
CONTAINERIMAGE="alpine"
CMD="ash"

# Per-user overrides of the image and the shell
case "$username" in
  myuser)
    CONTAINERIMAGE="ubuntu:16.04"
    CMD="/bin/bash";;
esac

RUNNING="$(docker inspect -f "{{.State.Running}}" "$CONTAINERNAME" 2> /dev/null)"
if [ $? -ne 0 ]; then
  # The container does not exist: create it detached
  docker run -h "$(hostname)" -u "$uid:$gid" -v /etc/passwd:/etc/passwd:ro -v /etc/group:/etc/group:ro -v "/home/$username:/home/$username" -v /tmp:/tmp -w "/home/$username" -id --name "$CONTAINERNAME" "$CONTAINERIMAGE" "$CMD" > /dev/null
  if [ $? -ne 0 ]; then
    exit 1
  fi
else
  if [ "$RUNNING" == "false" ]; then
    # The container exists but it is stopped: start it again
    docker start "$CONTAINERNAME" > /dev/null
    if [ $? -ne 0 ]; then
      exit 1
    fi
  fi
fi
# Open a new session inside the running container
docker exec -it "$CONTAINERNAME" "$CMD"

Now any user can execute the command that creates the Docker container as root (using sudo), but the user cannot run arbitrary Docker commands. So all the security is again on the side of the sysadmin, who must create “secure” containers.

This is a work in progress that will continue in this repository: https://github.com/grycap/dosh

How to install a cluster with NIS and NFS in Ubuntu 16.04

I am used to creating computing clusters. A cluster consists of a set of computers that work together to solve one task. In a cluster you usually have an interface to access the cluster, a network that interconnects the nodes and a set of tools to manage the cluster. The interface to access the cluster is usually a node named front-end, to which the users can SSH. The other nodes are usually named the working nodes (WN). Another common component is a shared filesystem to ease simple communication between the WN.

A very common set-up is to install a NIS server in the front-end so that the users can access the WN (i.e. using SSH) with the same credentials as in the front-end. NIS is still useful because it is very simple and it integrates very well with NFS, which is commonly used to share a file system.

It is easy to install all of this, but it is also a bit tricky (especially NIS), and so this time I had to re-learn…

How to install a cluster with NIS and NFS in Ubuntu 16.04

We start from 3 nodes that have a fresh installation of Ubuntu 16.04. These nodes are in the network 10.0.0.0/24. Their names are hpcmd00 (10.0.0.35), hpcmd01 (10.0.0.36) and hpcmd02 (10.0.0.37). In this example, hpcmd00 will be the front-end node and the others will act as the working nodes.

Preparation

First of all, we update Ubuntu in all the nodes:

root@hpcmd00:~# apt-get update && apt-get -y dist-upgrade

Installing and configuring NIS

Install NIS in the Server

Now that the system is up to date, we are installing the NIS server in hpcmd00. It is very simple:

root@hpcmd00:~# apt-get install -y rpcbind nis

During the installation, we will be asked for the name of the domain (as in the next picture):

nis

We have selected the name hpcmd.nis for our domain. It will be kept in the file /etc/defaultdomain. Anyway, we can change the name of the domain at any time by executing the next command:

root@hpcmd00:~# dpkg-reconfigure nis

And we will be prompted again for the name of the domain.

Now we need to adjust some parameters of the NIS server, which consists of editing the files /etc/default/nis and /etc/ypserv.securenets. In the first file we have to set the variable NISSERVER to the value “master”. In the second file (ypserv.securenets) we set which IP addresses are allowed to access the NIS service. In our case, we are allowing all the nodes in the subnet 10.0.0.0/24.

root@hpcmd00:~# sed -i 's/NISSERVER=.*$/NISSERVER=master/' /etc/default/nis
root@hpcmd00:~# sed 's/^\(0.0.0.0[\t ].*\)$/#\1/' -i /etc/ypserv.securenets
root@hpcmd00:~# echo "255.255.255.0 10.0.0.0" >> /etc/ypserv.securenets
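
The ypserv.securenets file expects the netmask followed by the network address, rather than the CIDR notation used elsewhere in this post. The next sketch converts one into the other; prefix_to_netmask and securenets_line are hypothetical helper names, and it only covers IPv4 networks.

```shell
#!/bin/bash
# Sketch: build a "netmask network" line from a network in CIDR notation.
# prefix_to_netmask and securenets_line are hypothetical helpers.

# Convert a prefix length (0-32) into a dotted netmask
prefix_to_netmask() {
    local prefix="$1"
    local mask=$(( (0xffffffff << (32 - prefix)) & 0xffffffff ))
    echo "$(( (mask >> 24) & 255 )).$(( (mask >> 16) & 255 )).$(( (mask >> 8) & 255 )).$(( mask & 255 ))"
}

# Split "network/prefix" and print the line in the order netmask, network
securenets_line() {
    local network="${1%/*}" prefix="${1#*/}"
    echo "$(prefix_to_netmask "$prefix") $network"
}

# For the subnet of this example, this prints the line appended above
securenets_line 10.0.0.0/24
```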

Now we are including the name of the server in the /etc/hosts file, so that the server is able to resolve its IP address, and then we initialize the NIS service. As we have only one master server, we include its name and let the initialization proceed.

root@hpcmd00:~# echo "10.0.0.35 hpcmd00" >> /etc/hosts
root@hpcmd00:~# /usr/lib/yp/ypinit -m
At this point, we have to construct a list of the hosts which will run NIS
servers. hpcmd00 is in the list of NIS server hosts. Please continue to add
the names for the other hosts, one per line. When you are done with the
list, type a <control D>.
 next host to add: hpcmd00
 next host to add: 
The current list of NIS servers looks like this:
hpcmd00
Is this correct? [y/n: y] y
We need a few minutes to build the databases...
Building /var/yp/hpcmd.nis/ypservers...
Running /var/yp/Makefile...
make[1]: Entering directory '/var/yp/hpcmd.nis'
Updating passwd.byname...
...
Updating shadow.byname...
make[1]: Leaving directory '/var/yp/hpcmd.nis'

hpcmd00 has been set up as a NIS master server.

Now you can run ypinit -s hpcmd00 on all slave server.

Finally we are exporting the users of our system by issuing the next command:

root@hpcmd00:~# make -C /var/yp/

Take into account that every time you create a new user in the front-end, you need to export the users by issuing the make -C /var/yp command. So it is advisable to create a cron task that runs that command, to make sure that the users are exported.

root@hpcmd00:~# cat > /etc/cron.hourly/ypexport <<\EOT
#!/bin/sh
make -C /var/yp
EOT
root@hpcmd00:~# chmod +x /etc/cron.hourly/ypexport

The users in NIS

When issuing the command make…, you are exporting the users that have an identifier of 1000 and above. If you want to change it, you can adjust the parameters in the file /var/yp/Makefile.

In particular, you can change the variables MINUID and MINGID to match your needs.

In the default configuration, the users with id 1000 and above are exported because the user 1000 is the first user that is created in the system.
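
To get a rough idea of which users would be exported with a given MINUID, the passwd file can be filtered by UID. This is only an approximation of what the Makefile does (it may apply further filters), and users_at_or_above is a hypothetical helper name.

```shell
#!/bin/bash
# Sketch: list the user names of a passwd-format file whose UID is at least
# the given minimum. It only approximates the selection made by /var/yp/Makefile.
users_at_or_above() {
    local minuid="$1" passwd_file="${2:-/etc/passwd}"
    awk -F: -v min="$minuid" '$3 >= min {print $1}' "$passwd_file"
}

# Users of the local /etc/passwd with UID 1000 and above
users_at_or_above 1000
```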

Install the NIS clients

Now that we have installed the NIS server, we can proceed to install the NIS clients. In this example we are installing hpcmd01, but it will be the same procedure for all the nodes.

First install NIS using the next command:

root@hpcmd01:~# apt-get install -y rpcbind nis

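As happened in the server, you will be prompted for the name of the domain. In our case, it is hpcmd.nis because we set that name in the server. Then we point the client to the NIS server, add nis to the lookup sources in /etc/nsswitch.conf and restart the service: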

root@hpcmd01:~# echo "domain hpcmd.nis server hpcmd00" >> /etc/yp.conf 
root@hpcmd01:~# sed -i 's/compat$/compat nis/g;s/dns$/dns nis/g' /etc/nsswitch.conf 
root@hpcmd01:~# systemctl restart nis
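
Before editing /etc/nsswitch.conf in place, the effect of the sed expressions can be previewed on sample lines. In this sketch, enable_nis is a hypothetical wrapper around the same expressions, and the two sample lines are an assumption about what a default file looks like.

```shell
#!/bin/bash
# The same sed expressions used above, reading from stdin instead of editing in place
enable_nis() {
    sed 's/compat$/compat nis/g;s/dns$/dns nis/g'
}

# Sample lines in the style of a default /etc/nsswitch.conf (assumed content)
printf 'passwd:         compat\nhosts:          files dns\n' | enable_nis
```

As the expressions only match lines that end in "compat" or "dns", applying them a second time leaves the lines untouched.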

Fix the rpcbind bug in Ubuntu 16.04

At this point the NIS services (both in server and clients) are ready to be used, but… WARNING: the rpcbind package needed by NIS has a bug in Ubuntu, and when you reboot any of your systems, rpcbind is dead and so NIS will not work. You can check it by issuing the next command:

root@hpcmd00:~# systemctl status rpcbind
● rpcbind.service - RPC bind portmap service
 Loaded: loaded (/lib/systemd/system/rpcbind.service; indirect; vendor preset: enabled)
 Drop-In: /run/systemd/generator/rpcbind.service.d
 └─50-rpcbind-$portmap.conf
 Active: inactive (dead)

Here you can see that it is inactive. And if you start it by hand, it will be properly running:

root@hpcmd00:~# systemctl start rpcbind
root@hpcmd00:~# systemctl status rpcbind
● rpcbind.service - RPC bind portmap service
 Loaded: loaded (/lib/systemd/system/rpcbind.service; indirect; vendor preset: enabled)
 Drop-In: /run/systemd/generator/rpcbind.service.d
 └─50-rpcbind-$portmap.conf
 Active: active (running) since Fri 2017-05-12 12:57:00 CEST; 1s ago
 Main PID: 1212 (rpcbind)
 Tasks: 1
 Memory: 684.0K
 CPU: 8ms
 CGroup: /system.slice/rpcbind.service
 └─1212 /sbin/rpcbind -f -w
May 12 12:57:00 hpcmd00 systemd[1]: Starting RPC bind portmap service...
May 12 12:57:00 hpcmd00 rpcbind[1212]: rpcbind: xdr_/run/rpcbind/rpcbind.xdr: failed
May 12 12:57:00 hpcmd00 rpcbind[1212]: rpcbind: xdr_/run/rpcbind/portmap.xdr: failed
May 12 12:57:00 hpcmd00 systemd[1]: Started RPC bind portmap service.

There are some patches, and it seems that it will be solved in the new versions. But for now, we are including a very simple workaround that consists in adding the next lines to the file /etc/rc.local, just before the “exit 0” line:

systemctl restart rpcbind
systemctl restart nis

Now if you reboot your system, it will be properly running the rpcbind service.

WARNING: this needs to be done in all the nodes.
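
As the edit has to be repeated in every node, it can be scripted. The next sketch inserts a line just before the “exit 0” line of a file and skips it if the line is already there. add_before_exit0 is a hypothetical helper, and the example works on a temporary copy rather than on the real /etc/rc.local.

```shell
#!/bin/bash
# Sketch: insert a line before the first "exit 0" line of a file, only once.
# If the file has no "exit 0" line, nothing is inserted.
add_before_exit0() {
    local file="$1" line="$2" tmp
    # Skip the insertion if the exact line is already present
    if grep -qxF -- "$line" "$file"; then
        return 0
    fi
    tmp="$(mktemp)"
    awk -v line="$line" '/^exit 0$/ && !inserted {print line; inserted=1} {print}' "$file" > "$tmp"
    # Write the content back with cat to keep the permissions of the original file
    cat "$tmp" > "$file"
    rm -f "$tmp"
}

# Example on a temporary file that mimics a minimal rc.local
demo="$(mktemp)"
printf '#!/bin/sh -e\nexit 0\n' > "$demo"
add_before_exit0 "$demo" "systemctl restart rpcbind"
add_before_exit0 "$demo" "systemctl restart nis"
cat "$demo"
rm -f "$demo"
```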

Installing and configuring NFS

We are configuring NFS in a very straightforward way. If you need more security or other features, you should dig into the NFS configuration options to adapt them to your deployment.

In particular, we are sharing the /home folder in hpcmd00 to be available for the WN. Then, the users will have their files available at each node. I followed the instructions at this blog post.

Sharing /home at front-end

In order to install NFS in the server, you just need to issue the next command

root@hpcmd00:~# apt-get install -y nfs-kernel-server

And to share the /home folder, you just need to add a line to the /etc/exports file

root@hpcmd00:~# cat >> /etc/exports << \EOF
/home hpcmd*(rw,sync,no_root_squash,no_subtree_check)
EOF

There are a lot of options to share a folder using NFS, but we are just using some that are common for a /home folder. Take into account that you can restrict the hosts with which you share the folder using their names (that is our case: hpcmdXXXX) or using IP addresses. Note that you can use wildcards such as “*”.

Finally you need to restart the NFS daemon, and you will be able to verify that the exports are ready.

root@hpcmd00:~# service nfs-kernel-server restart
root@hpcmd00:~# showmount -e localhost
Export list for localhost:
/home hpcmd*

Mount the /home folder in the WN

In order to be able to use NFS endpoints, you just need to run the next command on each node:

root@hpcmd01:~# apt-get install -y nfs-common

Now you will be able to list the folders shared at the server

root@hpcmd01:~# showmount -e hpcmd00
Export list for hpcmd00:
/home hpcmd*

At this moment it is possible to mount the /home folder just issuing a command like

root@hpcmd01:~# mount -t nfs hpcmd00:/home /home

But we’d prefer to add a line to the /etc/fstab file. Using this approach, the mount will be available at boot time. To do so, we’ll add the proper line:

root@hpcmd01:~# cat >> /etc/fstab << \EOT
hpcmd00:/home /home nfs auto,nofail,noatime,nolock,intr,tcp,actimeo=1800 0 0
EOT

Now you can also issue the following command to start using your share without the need of rebooting:

root@hpcmd01:~# mount /home/
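
Several steps in this guide append lines with >> (to /etc/hosts, /etc/exports or /etc/fstab), so running them twice leaves duplicated entries. A small guard avoids that. This is a sketch in which append_once is a hypothetical helper, and the example uses a temporary file instead of the real /etc/fstab.

```shell
#!/bin/bash
# Sketch: append a line to a file only if the exact line is not already there
append_once() {
    local line="$1" file="$2"
    grep -qxF -- "$line" "$file" 2> /dev/null || echo "$line" >> "$file"
}

# Example on a temporary file: the second call does not add a duplicate
demo="$(mktemp)"
entry="hpcmd00:/home /home nfs auto,nofail,noatime,nolock,intr,tcp,actimeo=1800 0 0"
append_once "$entry" "$demo"
append_once "$entry" "$demo"
cat "$demo"
rm -f "$demo"
```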

Verification

At the hpcmd00 node you can create a user, and verify that the home folder has been created:

root@hpcmd00:~# adduser testuser
Adding user `testuser' ...
Adding new group `testuser' (1002) ...
Adding new user `testuser' (1002) with group `testuser' ...
...
Is the information correct? [Y/n] Y
root@hpcmd00:~# ls -l /home/
total 4
drwxr-xr-x 2 testuser testuser 4096 May 15 10:06 testuser

If you ssh to the internal nodes, it will fail (the user will not be available), because the user has not been exported:

root@hpcmd00:~# ssh testuser@hpcmd01
testuser@hpcmd01's password: 
Permission denied, please try again.

But the home folder for that user is already available in these nodes (because the folder is shared using NFS).

Once we export the users at hpcmd00 the user will be available in the domain and we will be able to ssh to the WN using that user:

root@hpcmd00:~# make -C /var/yp/
make: Entering directory '/var/yp'
make[1]: Entering directory '/var/yp/hpcmd.nis'
Updating passwd.byname...
Updating passwd.byuid...
Updating group.byname...
Updating group.bygid...
Updating netid.byname...
make[1]: Leaving directory '/var/yp/hpcmd.nis'
make: Leaving directory '/var/yp'
root@hpcmd00:~# ssh testuser@hpcmd01
testuser@hpcmd01's password: 
Welcome to Ubuntu 16.04.2 LTS (GNU/Linux 4.4.0-77-generic x86_64)

testuser@hpcmd01:~$ pwd
/home/testuser