Think twice before running a Docker container in privileged mode

Mismanagement of Containers

Have you ever tried running systemctl in a Docker container? If you have, you have probably come across this error: “System has not been booted with systemd as init system”. The first workaround that turns up in a Google search is to run the container with the --privileged flag, and it does work like a charm. But let’s run through the negative repercussions for the container’s security when you use this flag.

When a Docker container is spawned from a typical base image without any extra options, the default command it runs is often /bin/bash, which also becomes the container’s PID 1 program. For systemctl to work, however, the PID 1 program must be /sbin/init, because the init executable is what starts the systemd initialization system. Systemd is the service manager that starts and stops processes and gets the system up and running.
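The condition behind that error message can be approximated in a few lines. This is only a sketch: it assumes, as libsystemd's sd_booted() check does, that a system booted with systemd has a /run/systemd/system directory, and it reads the name of PID 1 from /proc/1/comm. The function names and the root parameter are illustrative and not part of any Docker or systemd API.

```python
import os


def booted_with_systemd(root: str = "/") -> bool:
    """Return True if <root>/run/systemd/system exists as a directory."""
    return os.path.isdir(os.path.join(root, "run", "systemd", "system"))


def pid1_name(root: str = "/") -> str:
    """Return the command name of PID 1, or "" if it cannot be read."""
    try:
        with open(os.path.join(root, "proc", "1", "comm")) as fh:
            return fh.read().strip()
    except OSError:
        return ""


if __name__ == "__main__":
    # Inside a container started with /bin/bash, PID 1 is expected to be bash
    # and the systemd directory is expected to be missing.
    print("PID 1:", pid1_name() or "unknown")
    print("booted with systemd:", booted_with_systemd())
```

Running this inside a container gives a quick indication of whether systemctl has any chance of working there.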

Along with letting systemd run, the privileged flag exposes the hardware of the host system to the container: the host’s /dev folder is made available inside it. This is a serious threat to the host machine’s security, because /dev contains the device nodes for, among other things, the partitions on the host’s drives. To make matters worse, Docker containers run as the root user by default. So if the container is spawned from a public image and runs commands such as fdisk or sfdisk, which can alter or even delete partitions on the host’s disks, it could stop the host system from booting.

Moreover, the privileged flag gives the container some undesirable capabilities. Linux capabilities divide the privileges traditionally held by the root user into distinct units that can be granted to or withheld from a process, which makes it possible to limit or provide access to resources on the system. Privileged containers hold capabilities such as CAP_SYS_MODULE and CAP_SYS_ADMIN, which are normally reserved for the superuser and give the container nearly unrestricted access to the components of the host system. (Use capsh --print to list the capabilities of a container.)
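A process's capabilities can also be read from the CapEff line of /proc/<pid>/status, which holds them as a hexadecimal bitmask. The sketch below decodes such a mask into names. The table follows the bit order in <linux/capability.h>; the function names are illustrative, and the example mask is built by hand for the demonstration, not captured from a real container.

```python
from typing import List

# Capability names indexed by bit number, following <linux/capability.h>
# (CAP_CHOWN = 0 ... CAP_CHECKPOINT_RESTORE = 40).
CAP_NAMES = [
    "CAP_CHOWN", "CAP_DAC_OVERRIDE", "CAP_DAC_READ_SEARCH", "CAP_FOWNER",
    "CAP_FSETID", "CAP_KILL", "CAP_SETGID", "CAP_SETUID", "CAP_SETPCAP",
    "CAP_LINUX_IMMUTABLE", "CAP_NET_BIND_SERVICE", "CAP_NET_BROADCAST",
    "CAP_NET_ADMIN", "CAP_NET_RAW", "CAP_IPC_LOCK", "CAP_IPC_OWNER",
    "CAP_SYS_MODULE", "CAP_SYS_RAWIO", "CAP_SYS_CHROOT", "CAP_SYS_PTRACE",
    "CAP_SYS_PACCT", "CAP_SYS_ADMIN", "CAP_SYS_BOOT", "CAP_SYS_NICE",
    "CAP_SYS_RESOURCE", "CAP_SYS_TIME", "CAP_SYS_TTY_CONFIG", "CAP_MKNOD",
    "CAP_LEASE", "CAP_AUDIT_WRITE", "CAP_AUDIT_CONTROL", "CAP_SETFCAP",
    "CAP_MAC_OVERRIDE", "CAP_MAC_ADMIN", "CAP_SYSLOG", "CAP_WAKE_ALARM",
    "CAP_BLOCK_SUSPEND", "CAP_AUDIT_READ", "CAP_PERFMON", "CAP_BPF",
    "CAP_CHECKPOINT_RESTORE",
]


def decode_caps(mask_hex: str) -> List[str]:
    """Turn a hex capability mask (as shown in CapEff) into a list of names."""
    mask = int(mask_hex, 16)
    return [name for bit, name in enumerate(CAP_NAMES) if mask & (1 << bit)]


def effective_caps(status_path: str = "/proc/self/status") -> List[str]:
    """Decode the CapEff line of a /proc status file (Linux only)."""
    with open(status_path) as fh:
        for line in fh:
            if line.startswith("CapEff:"):
                return decode_caps(line.split()[1])
    return []


if __name__ == "__main__":
    # Hand-built mask with only bits 16 and 21 set, for illustration.
    print(decode_caps("0000000000210000"))
```

If CAP_SYS_MODULE or CAP_SYS_ADMIN shows up in the decoded list for a container process, the container holds the capabilities discussed here.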

CAP_SYS_MODULE gives the container permission to load and unload kernel modules. If an attacker gets access to a privileged container, they can open a reverse shell and obtain an interactive root shell session on the host system. The attacker can achieve this simply by inserting a kernel module that calls the “call_usermodehelper” function to execute an interactive bash process, “bash -i”, which runs as root on the host. This is how attackers abuse the SYS_MODULE capability.

The CAP_SYS_ADMIN capability allows a container to perform mount operations, among other things. A common attack uses this to mount the host’s file system and place the attacker’s public keys on it. The attacker then gains access to the host machine directly from their own machine, without going through the container.

Securing Containers

One way to mitigate the risks of running containers as the superuser is to use user namespaces. User namespaces provide an additional layer of isolation for the container. Their key benefit is that they distinguish the root user inside the container from the root user of the host. This is achieved by giving the root user inside the container only limited access to the privileged resources on the host system.

Use the --userns-remap option of the Docker daemon to turn on user namespaces for containers. We can prevent privilege-escalation attacks from within a container by configuring the container’s application to run as an unprivileged user. In addition, user namespaces let us remap the root user of the container to a less privileged user on the host system. The mapped user is assigned a range of IDs within the range of normal UIDs, but has no privileges on the host machine.
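The remapping can be illustrated with a small sketch. It assumes the setting is placed in the daemon's daemon.json file under the "userns-remap" key, and that the ID range comes from an /etc/subuid entry of the form name:first_host_uid:count. The function names are hypothetical, and the entry used in the demonstration is an example value, not read from a real host.

```python
import json
from typing import Dict, Optional


def enable_userns_remap(config: Dict, user: str = "default") -> Dict:
    """Return a copy of a daemon.json config with userns-remap set.

    With "default", Docker creates and uses a user named dockremap.
    """
    updated = dict(config)
    updated["userns-remap"] = user
    return updated


def host_uid(container_uid: int, subuid_entry: str) -> Optional[int]:
    """Translate a container UID to a host UID using an /etc/subuid entry.

    Returns None if container_uid falls outside the mapped range.
    """
    _name, start, count = subuid_entry.strip().split(":")
    if 0 <= container_uid < int(count):
        return int(start) + container_uid
    return None


if __name__ == "__main__":
    print(json.dumps(enable_userns_remap({}), indent=2))
    entry = "dockremap:165536:65536"  # example values only
    # Root inside the container (UID 0) lands on an unprivileged host UID.
    print("container root -> host UID", host_uid(0, entry))
```

The point of the second function is that UID 0 in the container corresponds to the first ID of the subordinate range on the host, which is an ordinary unprivileged UID.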

Another security measure is to drop all the capabilities of the container (--cap-drop=all) and enable only those that are required (--cap-add).
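As a rough sketch of this approach, the helper below assembles a docker run command line that drops every capability and adds back only the ones listed. The function name is hypothetical, and the image name and the capability in the demonstration are placeholders chosen for illustration.

```python
import shlex
from typing import Iterable, List


def docker_run_args(image: str, needed_caps: Iterable[str] = ()) -> List[str]:
    """Build a docker run argument list that starts from zero capabilities."""
    args = ["docker", "run", "--cap-drop=ALL"]
    for cap in needed_caps:
        name = cap.upper()
        # Write the name without the CAP_ prefix, e.g. NET_BIND_SERVICE.
        if name.startswith("CAP_"):
            name = name[4:]
        args.append("--cap-add=" + name)
    args.append(image)
    return args


if __name__ == "__main__":
    # "nginx" and NET_BIND_SERVICE are placeholder choices for the demo.
    argv = docker_run_args("nginx", ["CAP_NET_BIND_SERVICE"])
    print(" ".join(shlex.quote(a) for a in argv))
```

Starting from an empty set and adding capabilities one at a time makes each privilege the container holds an explicit decision.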

Summary

Although there are risks in running containers with the privileged flag, there are some cases where this flag is needed, such as running Docker inside a Docker container. By making sure that containers are given as few privileges as possible and by running security checks with tools like Docker Bench for Security, we can still use the power of privileged Docker containers.
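A simple check in the same spirit can be sketched against the JSON that docker inspect prints for a container. This is not how Docker Bench itself is implemented; it only looks at the HostConfig.Privileged, HostConfig.CapAdd, HostConfig.CapDrop and Config.User fields, and the function name, the list of risky capabilities, and the sample data are assumptions made for illustration.

```python
from typing import Dict, List

# Capabilities singled out in this article; extend as needed.
RISKY_CAPS = {"SYS_ADMIN", "SYS_MODULE"}


def _normalize(cap: str) -> str:
    """Upper-case a capability name and strip an optional CAP_ prefix."""
    cap = cap.upper()
    return cap[4:] if cap.startswith("CAP_") else cap


def audit_container(info: Dict) -> List[str]:
    """Return findings for one element of `docker inspect` output."""
    host = info.get("HostConfig") or {}
    findings = []
    if host.get("Privileged"):
        findings.append("runs in privileged mode")
    added = {_normalize(c) for c in (host.get("CapAdd") or [])}
    for cap in sorted(added & RISKY_CAPS):
        findings.append("adds capability " + cap)
    dropped = {_normalize(c) for c in (host.get("CapDrop") or [])}
    if "ALL" not in dropped:
        findings.append("does not drop all capabilities")
    if not (info.get("Config") or {}).get("User"):
        findings.append("runs as root (no user set)")
    return findings


if __name__ == "__main__":
    # Hand-written sample, shaped like one element of docker inspect output.
    sample = {
        "HostConfig": {"Privileged": True, "CapAdd": None, "CapDrop": None},
        "Config": {"User": ""},
    }
    for finding in audit_container(sample):
        print("WARN:", finding)
```

An empty result means none of these particular checks fired; it does not mean the container is secure.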