How to (securely) contain users using Docker containers

Docker container have proved to be very useful to deliver applications. They enable to pack all the libraries and dependencies needed by an application, and to run it in any system. One of the most drawbacks argued by Docker competitors is that the Docker daemon runs as root and it may introduce security threats.

I have searched for the security problems of Docker (e.g. sysdigblackhat conferenceCVEs, etc.) and I could only find privilege escalation by running privileged containers (–privileged), files that are written using root permissions, using the communication socket, using block devices, poisoned images, etc. But all of these problems are related to letting the users start their own containers.

So I think that Docker can be used by sysadmins to provide a different or contained environment to the users. E.g. having a CentOS 7 front-end, but letting some users to run an Ubuntu 16.04 environment. This is why this time I learned…

How to (securely) contain users using Docker containers

TL;DR

You can find the results of this tests in this repo: https://github.com/grycap/dosh

The repository contains DoSH (which stands for Docker SHell), which is a development to use Docker containers to run the shell of the users in your Linux system. It is an in-progress project that aims at provide a configurable and secure mechanism to make that when a user logs-in a Linux system, a customized (or standard) container will be created for him. This will enable to limit the resources that the user is able to use, the applications, etc. but also provide custom linux flavour for each user or group of users (i.e. it will coexist users that have CentOS 7 with Ubuntu 16.04 in the same server).

The Docker SHell

In a multi-user system it would be nice to offer a feature like providing different flavours of Linux, depending on the user. Or even including a “jailed” system for some specific users.

This could be achieved in a very easy way. You just need to create a script like the next one

root@onefront00:~# cat > /bin/dosh <<\EOF
docker run --rm -it alpine ash
EOF
root@onefront00:~# chmod +x /bin/dosh 
root@onefront00:~# echo "/bin/dosh" >> /etc/shells

And now you can change the sell of one user in /etc/passwd

myuser:x:9870:9870::/home/myuser:/bin/dosh

And you simply have to allow myuser to run docker containers (e.g. in Ubuntu, by adding the user to the “docker” group).

Now we have that when “myuser” logs in the system, he will be inside a container with the Alpine flavour:

alpine-dockershell-1

This is a simple solution that enables the user to have a specific linux distribution… but also your specific linux environment with special applications, libraries, etc.

But the user has not access to its home nor other files that will be interesting to give him the appearance of being in the real system. So we could just map his home folder (and other folders that we wanted to have inside the container; e.g. /tmp). A modified version of /bin/dosh will be the next one:

#!/bin/bash
username="$(whoami)"
docker run --rm -v /home/$username:/home/$username -v /tmp:/tmp -it alpine ash

But if we log in as myuser the result is that the user that logs in is… root. And the things that he does is as root.

alpine-dockershell-2

We to run the container as the user and not as root. An updated version of the script is the next:

#!/bin/bash
username="$(whoami)"
uid="$(id -u $username)"
gid="$(id -g $username)"
docker run --rm -u $uid:$gid -v /home/$username:/home/$username -v /tmp:/tmp -w /home/$username -it alpine ash

If myuser now logs in, the container has the permissions of this user

alpine-dockershell-3

We can double-check it by checking the running processes of the container

The problem now is that the name of the user (and the groups) are not properly resolved inside the container.

alpine-dockershell-6

This is because the /etc/passwd and the /etc/group files are included in the container, and they do not know about the users or groups in the system. As we want to resemble the system in the container, we can share a readonly copy of /etc/passwd and /etc/group by modifying the /bin/dosh script:

#!/bin/bash
username="$(whoami)"
uid="$(id -u $username)"
gid="$(id -g $username)"
docker run --rm -u $uid:$gid -v /etc/passwd:/etc/passwd:ro -v /etc/group:/etc/group:ro -v /home/$username:/home/$username -v /tmp:/tmp -w /home/$username -it alpine ash

And now the container has the permissions of the user and the username is resolved. So the user can access the resources in the filesystem in the same conditions that if he was accessing the hosting system.

alpine-dockershell-7

Now we should add the mappings for the folders to which the user has to have permissions to access (e.g. scratch, /opt, etc.).

Using this script as-is, the user will have different environment for each of the different sessions that he starts. That means that the processes will not be shared between different sessions.

But we can create a more ellaborated script to start containers using different Docker images depending on the user or on the group to which the user belongs. Or even to create pseudo-persistent containers that start when the user logs-in and stops when the user leaves (to allow multiple ttys for the same environment).

An example of this kind of script will be the next one:

#!/bin/bash

username="$(whoami)"
uid="$(id -u $username)"
gid="$(id -g $username)"

CONTAINERNAME="container-${username}"
CONTAINERIMAGE="alpine"
CMD="ash"

case "$username" in
 myuser)
 CONTAINERIMAGE="ubuntu:16.04"
 CMD="/bin/bash";;
esac

RUNNING="$(docker inspect -f "{{.State.Running}}" "$CONTAINERNAME" 2> /dev/null)"
if [ $? -ne 0 ]; then
 docker run -h "$(hostname)" -u $uid:$gid -v /etc/passwd:/etc/passwd:ro -v /etc/group:/etc/group:ro -v /home/$username:/home/$username -v /tmp:/tmp -w /home/$username -id --name "$CONTAINERNAME" "$CONTAINERIMAGE" "$CMD" > /dev/null
 if [ $? -ne 0 ]; then
 exit 1
 fi
else
 if [ "$RUNNING" == "false" ]; then
 docker start "$CONTAINERNAME" > /dev/null
 if [ $? -ne 0 ]; then
 exit 1
 fi
 fi
fi
docker exec -it "$CONTAINERNAME" "$CMD"

Using this script we start the user containers on demand and their processes are kept between log-ins. Moreover, the log-in will fail in case that the container fails to start.

In the event that the system is powered off, the container will be powered off although its contents are kept for future log-ins (the container will be restarted from the stop state).

The development of Docker SHell continues in this repository: https://github.com/grycap/dosh

Security concerns

The main problem of Docker related to security is that the daemon is running as root. So if I am able to run containers, I am able to run something like this:

$ docker run --privileged alpine ash -c 'echo 1 > /proc/sys/kernel/sysrq; echo o > /proc/sysrq-trigger'

And the host will be powered off as a regular user. Or simply…

$ docker run --rm -v /etc:/etc -it alpine ash
/ # adduser mynewroot -G root
...
/ # exit

And once you exit the container, you will have a new root user in the physical host.

This happens because the user inside the container is “root” that has UID=0, and it is root because the Docker daemon is root with UID=0.

We could change this behaviour by shifting the user namespace with the flag –userns-remap and the subuids to make that the Docker daemon does not run as UID=0, but this will also limit the features of Docker for the sysadmin. The first consequence is that that the sysadmin will not be able to run Docker containers as root (nor privileged containers). If this is acceptable for your system, this will probably be the best solution for you as it limits the possible security threats.

If you are not experienced with the configuration of Docker or you simply do not want (or do not know how) to use the –userns-remap, you can still use DoSH.

Executing the docker commands by non-root users

The actual problem is that the user needs to be allowed to use Docker to spawn the DoSH container, and you do not want to allow the user to run arbitraty docker commands.

We can consider that the usage of Docker is secure if the containers are ran under the credentials of regular users, and the devices and other critical resources that are attached to the container are used under these credentials. So users can be allowed to run Docker containers if they are forced to include the flat -u <uid>:<gid> and the rest of the commandline is controlled.

The solution is as easy as installing sudo (which is shipped in the default distribution of Ubuntu but also is an standard package almost in any distribution) and allow users to run as sudo only a specific command that execute the docker commands, but do not allow these users to modify these commands.

Once installed sudo, we can create the file /etc/sudoers.d/dosh

root@onefront00:~# cat > /etc/sudoers.d/dosh <<\EOF
> ALL ALL=NOPASSWD: /bin/shell2docker
> EOF
root@onefront00:~# chmod 440 /etc/sudoers.d/dosh

Now we must move the previous /bin/dosh script to /bin/shell2docker and then we can create the script /bin/dosh with the following content:

root@onefront00:~# mv /bin/dosh /bin/shell2docker
root@onefront00:~# cat > /bin/dosh <<\EOF
#!/bin/bash
sudo /bin/shell2docker
EOF
root@onefront00:~# chmod +x /bin/dosh

And finally, we will remove the ability to run docker containers to the user (e.g. in Ubuntu, by removing him from the “docker” group).

If you try to log-in as the user, you will notice that now we have the problem that the user that runs the script is “root” and then the container will be run as “root”. But we can modify the script to detect whether the script has be ran as sudo or as a regular user and then catch the appropriate username. The updated script will be the next:

#!/bin/bash
if [ $SUDO_USER ]; then username=$SUDO_USER; else username="$(whoami)"; fi
uid="$(id -u $username)"
gid="$(id -g $username)"

CONTAINERNAME="container-${username}"
CONTAINERIMAGE="alpine"
CMD="ash"

case "$username" in
 myuser)
 CONTAINERIMAGE="ubuntu:16.04"
 CMD="/bin/bash";;
esac

RUNNING="$(docker inspect -f "{{.State.Running}}" "$CONTAINERNAME" 2> /dev/null)"
if [ $? -ne 0 ]; then
 docker run -h "$(hostname)" -u $uid:$gid -v /etc/passwd:/etc/passwd:ro -v /etc/group:/etc/group:ro -v /home/$username:/home/$username -v /tmp:/tmp -w /home/$username -id --name "$CONTAINERNAME" "$CONTAINERIMAGE" "$CMD" > /dev/null
 if [ $? -ne 0 ]; then
 exit 1
 fi
else
 if [ "$RUNNING" == "false" ]; then
 docker start "$CONTAINERNAME" > /dev/null
 if [ $? -ne 0 ]; then
 exit 1
 fi
 fi
fi
docker exec -it "$CONTAINERNAME" "$CMD"

Now any user can execute the command that create the Docker container as root (using sudo), but the user cannot run arbitraty Docker commands. So all the security is now again in the side of the sysadmin that must create “secure” containers.

This is an in-progress work that will continue in this repository: https://github.com/grycap/dosh

 

 

 

 

Advertisements

How to deal with operations and parameters in bash script like a master

This is a simple post, based in a previous post in which I learned how to work with parameters in bash, to have option (e.g. -h, –help, etc.), combined options (e.g. -cf that means the same than –cool –function), etc. That post consisted in pre-processing the commandline to get the proper flags and dealing with them.

Now I want to implement bash scripts that work in the form of

$ ./my_script operation --flag-op

And this is why I extended the previous post to learn

How to deal with operations and parameters in bash script like a master.

In this case I want to have scripts that accept the following syntax:

$ ./my_script operation -p <parameter> -ob --file conf.file

And even sintax like the next one:

$ ./my_script --global-option operation -p <parameter> -ob --file conf.file

Using the preprocessing introduced in the previous post, this is very straightforward to implement, as we only need to intercept the name of the operations and continue with the options.

As a plus, in my solution I will delegate each operation to deal with its specific options. My solution is the next one:

function get() {
 while (( $# > 0 )); do
  case "$1" in
   --all|-a) ALL=1;;
   *) usage && exit 1;;
  esac
  shift
 done
 # implement_the_operation
}

n=0
while (( $# > 0 )); do
 if [ "${1:0:1}" == "-" -a "${1:1:1}" != "-" ]; then
  for f in $(echo "${1:1}" | sed 's/\(.\)/-\1 /g' ); do
   ARR[$n]="$f"
   n=$(($n+1))
  done
 else
  ARR[$n]="$1"
  n=$(($n+1))
 fi
 shift
done

n=0
COMMAND=
while [ $n -lt ${#ARR[@]} -a "$COMMAND" == "" ]; do
 PARAM="${ARR[$n]}"
 case "$PARAM" in
  get) COMMAND="$PARAM";;
  --help | -h) usage && exit 0;;
  *) usage && exit 1;;
 esac
 n=$(($n+1))
done

if [ "$COMMAND" != "" ]; then
 $COMMAND "${ARR[@]:$n}"
else
 echo "no command issued" && usage && exit 1
fi

In this script I only accept the operation (COMMAND) “get”. And it is self-serviced using a function with the same name. In order to implement more operations, it is as easy as including it in the detection and creating a function with the same name.

How to compact a QCOW2 or a VMDK file

When you create a Virtual Machine (VM), you usually have the option of use a format that reserves the whole size of the disk (e.g. RAW), or to use a format that grows according to the used space in the disk (e.g. QCOW or VMDK).

The problem is that the space actually used in the disk grows as the disk files are written, but it is not decreased as they are deleted. But if you writed a lot of files and you deleted after they were needed, you’d probably have a lot of space reserved for the VMDK file, while that space is not actually used. I wanted to reclaim that space, to move the VMs using less space, and so this time…

I learned how to compact a VMDK file (the same method applies to QCOW2)

The method is, in fact, very easy… you simply have to re-encode the file using the same output format. If you have your original-disk.vmdk file, you simply have to issue a command like this one:

$ qemu-img convert -O vmdk original-disk.vmdk compressed-disk.vmdk

And that will make the magic (easy, isn’t it?).

But if you want to compact it more, you can claim more space from the disk before re-enconding the disk. First, I’d go to the solution and then I’ll explain it:

If the VM is a linux-based, you can boot it and create a zero-file, and once the file has exhausted the disk, delete it:

$ dd if=/dev/zero of=/tmp/zerofile.raw...$ rm /tmp/zerofile.raw

If the VM is a Windows-based, you can get the command sdelete from Microsoft website decompress it and execute the following commandline:

c:\> sdelete -z c:

Now you can power off the VM and issue the qemu-img command. You’ll get a file that correspond to only the used space in the disk:

$ qemu-img convert -O vmdk original-disk.vmdk compressed-disk.vmdk

Explanation

(Disclaimer: Please take into account that this is a simple and conceptual explanation)

If you knew about how the disks are managed, you’d probably know that when a file is deleted, it is not actually deleted from the disk. Instead, the space that it was using is marked as “ready to be used in case that it is needed”. So if a new file is created in the disk, it is possible that it uses that physical space (or not).

That is the trick from which the file recovery applications work: trying to find those “ready to be used” sectors. And that is why the “low-level format” exists: in order to “zero” the disk and to avoid that files are recovered.

When you created the /tmp/zerofile.raw file, you started to write zeros in the disk. When the physical empty space was exhausted, the disk controller started to use the “ready to be used” sectors, and the zerofile wrote zeros on them, and the zeros were written in the VMDK file.

The good thing here is that when a VMDK file is created (from any format… in our case, it is VMDK), the qemu-img application does not write those zeros in the file that contains the disk, and that is how the storage space is reclaimed.

How to deal with parameters in bash scripts like a pro

I use to develop bash scripts, and I usually have a problem with flags and parameters. I like to allow parameters like a pro: using the long flags (e.g. –flag), the reduced flags (e.g. -f), but I want to allow combinations of several flags (e.g. -fc). And so this time…

I learned how to deal with parameters in bash scripts like a pro

This time I have started to use bash arrays, that are like C arrays or python arrays, but in bash 😉 I could explain little by little my script, but I’m including here the final script (this is an extract from one of my developments: ec4docker):

CREATE=
TERMINATE=
CONFIG_FILE=
n=0
while [ $# -gt 0 ]; do
    if [ "${1:0:1}" == "-" -a "${1:1:1}" != "-" ]; then
        for f in $(echo "${1:1}" | sed 's/\(.\)/-\1 /g' ); do
            ARR[$n]="$f"
            n=$(($n+1))
        done
    else
        ARR[$n]="$1"
        n=$(($n+1))
    fi
    shift
done
n=0
while [ $n -lt ${#ARR[@]} ]; do
    case "${ARR[$n]}" in
        --create | -c)          CREATE=True;;
        --terminate | -t)       TERMINATE=True;;
        --yes | -y)             ASSUME_YES=True;;
        --config-file | -f)     n=$(($n+1))
                                [ $n -ge ${#ARR[@]} ] && usage && exit 1
                                CONFIG_FILE="${ARR[$n]}";;
        --help | -h)            usage && exit 0;;
        *)                      usage && exit 1;;
    esac
    n=$(($n+1))
done

In this way, you allow to issue commands like

$ ./myapp -ctyf config.conf

But also mix parameter styles

$ ./myapp --create -ty --config-file myapp.conf

Technical details

I like the solution, but I also like the technical details (because I am a code-freak). So I share some technical issues here:

  • The first “while” simply parses the commandline to expand the combined parameters. In fact, if searches for expressions like ‘-fct’ and splits them into a set of expressions ‘-f’, ‘-c’, ‘-t’. So if you do not want to split parameters in this way, you can substitute the first “while” by
ARR=( "$@" )
  • The second “while” is needed because we want to allow parameters that need more than one flag (e.g. -f <config file>). Any time that it is expected to have a parameter for a flag, we need to check if we have enough parameters and if not, raise an error. If you do not need any parameter with extra values, you could substitute the second while by:
for ARRVAL in "${ARR[@]}"; do
  case "$ARRVAL" in