Running Docker containers as non-root

A Docker container running as root 1 may not adhere to the principle of least privilege and may expose a larger attack surface than necessary. Many containers do not require root privileges and should be run as a non-root user instead. In this post, I explain some of Docker Engine’s default behaviors pertaining to users and capabilities, the implications of running as non-root, some guidelines for defining images to run as non-root, and an example containerized application configured to run as non-root.

Containers run as root by default

Arguably the most insecure default behavior of Docker containers is running as root. Even so, the majority of container images from public repositories are configured to use root as the default user. Some authors of those images are unaware of the security implications; others are aware and set root as the default user for practical reasons, such as facilitating package installation in derived images.

One way to determine whether an image has root as the default user is to run id --user inside a container derived from the image to print the containerized process’s effective UID and then check whether the UID is 0. 2

sh

docker run --rm debian:buster-slim id --user

output

0

As the printed effective UID 0 shows, the image debian:buster-slim, a Docker Official Image hosted on Docker Hub, runs as root by default.
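A complementary check that does not start a container is to read the image's configured user, for example with docker inspect --format '{{.Config.User}}' debian:buster-slim, and interpret the value. The sketch below does only the interpretation step. The function name is hypothetical, and it assumes that a named user other than root is not mapped to UID 0, which it cannot verify from the string alone.

```python
def runs_as_root_by_default(config_user: str) -> bool:
    """Return True if an image's configured user resolves to root.

    `config_user` is the image's Config.User value, which mirrors the
    Dockerfile USER instruction: "", "name", "uid", "name:group", or
    "uid:gid".
    """
    # Only the user part matters here; drop an optional ":group" suffix.
    user = config_user.strip().split(":", 1)[0]
    # An empty value means no USER was set, so the container runs as root.
    # Assumption: any other name is treated as non-root (not verified).
    return user in ("", "0", "root")


for value in ("", "root", "0:0", "navi", "1000:1000"):
    print(repr(value), runs_as_root_by_default(value))
```

Running id --user inside a container, as shown above, remains the more reliable check because it reports the UID the process actually has.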

By default, the Docker Engine does not utilize user namespaces for container isolation. 3 As a result, the root user inside the container is the same as that on the host. A more in-depth explanation about the lack of user namespaces is provided later in the article.

To verify that root users of the container and host are the same, run sleep for a reasonable amount of time inside a debian:buster-slim container and then check the effective UID of the containerized process from a shell on the host.

sh (inside container)

docker run --rm debian:buster-slim sleep 100

sh (host)

ps -fC sleep

output (host)

UID          PID    PPID  C STIME TTY          TIME CMD
root        3153    3132  0 13:47 pts/0    00:00:00 sleep 100

The UID column has a value of root, which means the containerized process’s effective UID from the host’s perspective is 0.
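The same information is available on the host in /proc/<pid>/status, whose Uid line lists the real, effective, saved, and filesystem UIDs of a process. As a minimal sketch, the helper below parses that line. The function name and the abridged sample text are my own for illustration; the sample reuses the PID from the ps output above and is not captured from a real system.

```python
from typing import Dict


def parse_uids(status_text: str) -> Dict[str, int]:
    """Extract the four UIDs from the text of /proc/<pid>/status."""
    for line in status_text.splitlines():
        if line.startswith("Uid:"):
            # Field order: real, effective, saved set, filesystem.
            real, effective, saved, fs = (int(f) for f in line.split()[1:5])
            return {"real": real, "effective": effective,
                    "saved": saved, "fs": fs}
    raise ValueError("no Uid line found")


# Illustrative, abridged sample (an assumption, not real captured output).
SAMPLE_STATUS = "Name:\tsleep\nPid:\t3153\nUid:\t0\t0\t0\t0\nGid:\t0\t0\t0\t0\n"

print(parse_uids(SAMPLE_STATUS)["effective"])
```

On a real host, you would pass in the contents of /proc/3153/status (with the PID replaced by the one ps reports) instead of the sample string.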

Capabilities of root containers

By default, containerized processes running as root do not have the full set of capabilities of traditional privileged processes. 4 To see for yourself, create an image whose derived container prints the capabilities of the containerized process.

Create a Dockerfile named Dockerfile with the contents below.

Dockerfile

FROM debian:buster-slim

RUN apt-get update \
    && apt-get install -y libcap2-bin \
    # Cleanup
    && apt-get -y clean \
    && apt-get -y autoclean \
    && rm -rf /var/lib/apt/lists/*

CMD ["getpcaps", "1"]

Recall that the default user of the base image debian:buster-slim is root. Therefore, the image built from our current Dockerfile has root as the default user.

The package libcap2-bin is installed because it provides the utility getpcaps, which is used as the image’s default command to display the capabilities of the containerized process. 5

Build the image from Dockerfile. 6 Name the image myimage with the tag latest.

sh

docker build -t myimage:latest - < Dockerfile

Run a container derived from myimage:latest.

sh

docker run --rm myimage:latest

output

Capabilities for `1': = cap_chown,cap_dac_override,cap_fowner,cap_fsetid,cap_kill,cap_setgid,cap_setuid,cap_setpcap,cap_net_bind_service,cap_net_raw,cap_sys_chroot,cap_mknod,cap_audit_write,cap_setfcap+ep

The listed capabilities are the default ones of a containerized process running as root. 7

If you compare Docker Engine’s default set of capabilities 8 with the total set of capabilities offered by the Linux kernel, you’ll notice that Docker Engine’s default set is smaller than the total set. However, Docker Engine’s default capability set is still excessive for many cases and efforts should be made to further restrict a container’s capability set. If an attacker escapes a root container, then the default capability set allows the attacker to perform operations such as changing the ownership of files and binding sockets to privileged ports, which correspond to the CHOWN and NET_BIND_SERVICE capabilities, respectively.
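To make that check concrete, here is a small sketch that parses a getpcaps line into a set of capability names and tests for the two capabilities mentioned above. The function name is hypothetical, and the parser assumes the single-clause output format shown in this post; other libcap versions may print a different layout.

```python
from typing import Set


def parse_getpcaps(line: str) -> Set[str]:
    """Parse one getpcaps output line into a set of capability names."""
    # The capability text follows the first ": ", e.g. "= cap_chown,...+ep"
    # for a root container or just "=" for an empty set.
    _, _, caps_text = line.partition(": ")
    caps_text = caps_text.strip().lstrip("=").strip()
    if not caps_text:
        return set()
    # Drop the trailing flags such as "+ep" (effective and permitted).
    names, _, _flags = caps_text.partition("+")
    return {name for name in names.split(",") if name}


ROOT_OUTPUT = (
    "Capabilities for `1': = cap_chown,cap_dac_override,cap_fowner,"
    "cap_fsetid,cap_kill,cap_setgid,cap_setuid,cap_setpcap,"
    "cap_net_bind_service,cap_net_raw,cap_sys_chroot,cap_mknod,"
    "cap_audit_write,cap_setfcap+ep"
)

caps = parse_getpcaps(ROOT_OUTPUT)
print({"cap_chown", "cap_net_bind_service"} <= caps)
print("cap_sys_admin" in caps)
```

A set makes it easy to diff the result against whatever allowlist you decide your application actually needs.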

Capabilities of non-root containers

Running a container as a non-root user will drop all capabilities. 9 To verify, run a container derived from myimage:latest and set the user to a non-root user on the host. I assume the non-root user is the current user.

sh

docker run --rm --user "$(id -u):$(id -g)" myimage:latest

output

Capabilities for `1': =

As you can see, the container has an empty capability set.
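If you would rather not install libcap2-bin, the kernel also exposes a process's effective capabilities as a hexadecimal bitmask on the CapEff line of /proc/<pid>/status. The sketch below decodes such a mask. The function name is hypothetical, the name table is transcribed from the kernel's capability numbering up to CAP_CHECKPOINT_RESTORE (newer kernels may define more), and the non-zero mask in the example is the value I computed from the 14 default capabilities listed earlier, not output captured from a container.

```python
from typing import Set

# Capability names indexed by bit number (0 = cap_chown, ...).
CAP_NAMES = [
    "cap_chown", "cap_dac_override", "cap_dac_read_search", "cap_fowner",
    "cap_fsetid", "cap_kill", "cap_setgid", "cap_setuid", "cap_setpcap",
    "cap_linux_immutable", "cap_net_bind_service", "cap_net_broadcast",
    "cap_net_admin", "cap_net_raw", "cap_ipc_lock", "cap_ipc_owner",
    "cap_sys_module", "cap_sys_rawio", "cap_sys_chroot", "cap_sys_ptrace",
    "cap_sys_pacct", "cap_sys_admin", "cap_sys_boot", "cap_sys_nice",
    "cap_sys_resource", "cap_sys_time", "cap_sys_tty_config", "cap_mknod",
    "cap_lease", "cap_audit_write", "cap_audit_control", "cap_setfcap",
    "cap_mac_override", "cap_mac_admin", "cap_syslog", "cap_wake_alarm",
    "cap_block_suspend", "cap_audit_read", "cap_perfmon", "cap_bpf",
    "cap_checkpoint_restore",
]


def decode_capability_mask(mask_hex: str) -> Set[str]:
    """Decode a CapEff-style hexadecimal mask into capability names."""
    mask = int(mask_hex, 16)
    # A capability is present when its bit is set in the mask.
    return {name for bit, name in enumerate(CAP_NAMES) if mask & (1 << bit)}


print(sorted(decode_capability_mask("00000000a80425fb")))  # default root set
print(decode_capability_mask("0000000000000000"))          # empty set
```

An all-zero mask corresponds to the empty capability set printed by getpcaps for the non-root container.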

Understanding lack of user namespaces

Remember that the Docker Engine does not enable user namespaces by default. As a result, if the UID of a user inside a container coincides with that of a host user, then the users are one and the same from a privilege checking perspective.

Let’s say the image mynonrootimage:latest is built from the Dockerfile named Dockerfile with the contents below:

Dockerfile

FROM debian:buster-slim

RUN groupadd --gid 1000 navi \
    && useradd --shell /bin/bash --uid 1000 --gid 1000 --create-home navi

USER navi

sh

docker build -t mynonrootimage:latest - < Dockerfile

Dockerfile instructs the creation of a user with username navi, UID 1000, GID 1000, bash as the login shell, and a home directory. The last Dockerfile instruction sets the image’s default user as the newly created user navi.

A container derived from mynonrootimage:latest is then run to print the contents of the container’s file /etc/passwd, which contains info about all users. Info about the user navi should be present in the container’s /etc/passwd.

sh

docker run --rm mynonrootimage:latest cat /etc/passwd

output

root:x:0:0:root:/root:/bin/bash
daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin
bin:x:2:2:bin:/bin:/usr/sbin/nologin
sys:x:3:3:sys:/dev:/usr/sbin/nologin
sync:x:4:65534:sync:/bin:/bin/sync
games:x:5:60:games:/usr/games:/usr/sbin/nologin
man:x:6:12:man:/var/cache/man:/usr/sbin/nologin
lp:x:7:7:lp:/var/spool/lpd:/usr/sbin/nologin
mail:x:8:8:mail:/var/mail:/usr/sbin/nologin
news:x:9:9:news:/var/spool/news:/usr/sbin/nologin
uucp:x:10:10:uucp:/var/spool/uucp:/usr/sbin/nologin
proxy:x:13:13:proxy:/bin:/usr/sbin/nologin
www-data:x:33:33:www-data:/var/www:/usr/sbin/nologin
backup:x:34:34:backup:/var/backups:/usr/sbin/nologin
list:x:38:38:Mailing List Manager:/var/list:/usr/sbin/nologin
irc:x:39:39:ircd:/var/run/ircd:/usr/sbin/nologin
gnats:x:41:41:Gnats Bug-Reporting System (admin):/var/lib/gnats:/usr/sbin/nologin
nobody:x:65534:65534:nobody:/nonexistent:/usr/sbin/nologin
_apt:x:100:65534::/nonexistent:/usr/sbin/nologin
navi:x:1000:1000::/home/navi:/bin/bash

Indeed, it is present, and it is at the bottom.

The container’s root filesystem is different from the host’s. Therefore, the container’s /etc/passwd is different from the host’s.

Assume there was no user named navi on the host prior to running the container. Once the container is run, there is still no navi on the host. 10

sh

grep navi /etc/passwd | wc -l

output

0

Despite navi not existing on the host, the container’s navi may have UID and GID that coincide with those of a differently named user on the host.

navi has UID 1000 and GID 1000. When I search for a user with those UID and GID on my host, I do find a match: my personal user alfonso.

sh

awk -F: '$3 == 1000 && $4 == 1000 { print }' /etc/passwd

output

alfonso:x:1000:1000:Alfonso Castellanos,,,:/home/alfonso:/bin/bash

As a result, navi is the same user as alfonso. Usernames and group names are aliases to UIDs and GIDs, respectively. What matters to the system when privilege checking is UIDs and GIDs, not usernames or group names.
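The comparison can be sketched in a few lines: parse both passwd files and pair up entries whose UID and GID coincide, regardless of name. The helper names are hypothetical, and the two sample strings are abridged from the listings in this post; the host's root entry is an assumption, since only the alfonso line of my host's /etc/passwd is shown above.

```python
from typing import List, NamedTuple, Tuple


class PasswdEntry(NamedTuple):
    name: str
    uid: int
    gid: int


def parse_passwd(text: str) -> List[PasswdEntry]:
    """Parse passwd-formatted text into (name, uid, gid) entries."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # Fields: name:password:UID:GID:GECOS:home:shell
        fields = line.split(":")
        entries.append(PasswdEntry(fields[0], int(fields[2]), int(fields[3])))
    return entries


def shared_ids(container_passwd: str, host_passwd: str) -> List[Tuple[str, str]]:
    """Return (container_name, host_name) pairs with equal UID and GID."""
    host_entries = parse_passwd(host_passwd)
    return [
        (c.name, h.name)
        for c in parse_passwd(container_passwd)
        for h in host_entries
        if (c.uid, c.gid) == (h.uid, h.gid)
    ]


# Abridged samples; the host root line is assumed for illustration.
CONTAINER_PASSWD = (
    "root:x:0:0:root:/root:/bin/bash\n"
    "navi:x:1000:1000::/home/navi:/bin/bash\n"
)
HOST_PASSWD = (
    "root:x:0:0:root:/root:/bin/bash\n"
    "alfonso:x:1000:1000:Alfonso Castellanos,,,:/home/alfonso:/bin/bash\n"
)

print(shared_ids(CONTAINER_PASSWD, HOST_PASSWD))
```

Every pair it reports is a container user that the host kernel treats as the paired host user when checking privileges.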

Guidelines for defining images to run as non-root

When defining a Dockerfile for an image that’ll run as non-root, I recommend following these guidelines as closely as possible:

Example non-root containerized app

This section will cover a simple yet practical example of an image configured to have its derived containers run as non-root. The containerized application is a command-line Python application that calculates the nth term of the Fibonacci sequence.

Here’s the directory structure of the application fibonacci:

fibonacci
├── Dockerfile
├── main.py
└── requirements.txt

Here’s the contents of the Python source file main.py:

main.py

import click


@click.command()
@click.argument("n", type=click.INT)
def fibonacci(n: int):
    if n < 0:
        raise click.BadParameter("n is negative")

    if n == 0 or n == 1:
        print(n)
        return

    prev = 0
    curr = 1

    for _ in range(n - 1):
        newcurr = curr + prev
        prev = curr
        curr = newcurr

    print(curr)


if __name__ == "__main__":
    fibonacci()

The only external dependency is click, a Python package to facilitate the development of command line tools. As a result, the following is the contents of requirements.txt:

requirements.txt

click==8.1.1

Below is the contents of Dockerfile, which consists of three build stages: base, installer, and runner. 11

Dockerfile

FROM python:3.9.12-slim-buster AS base

ARG USERNAME=navi
ARG USER_GROUP_NAME=$USERNAME
ARG USER_UID=1000
ARG USER_GID=$USER_UID

RUN groupadd --gid $USER_GID $USER_GROUP_NAME \
    && useradd --shell /bin/bash --uid $USER_UID --gid $USER_GID --create-home $USERNAME

USER $USER_UID:$USER_GID

WORKDIR /home/$USERNAME




FROM base AS installer

COPY --chown=$USER_UID:$USER_GID requirements.txt .

RUN pip install --user --no-cache-dir -r requirements.txt




FROM base AS runner

COPY --from=installer --chown=$USER_UID:$USER_GID \
    /home/$USERNAME/.local/lib/python3.9/site-packages .local/lib/python3.9/site-packages

COPY --chown=$USER_UID:$USER_GID main.py .

ENTRYPOINT [ "python", "main.py" ]

Base stage

The build stage base, comprising lines 1 through 13, creates the base image for the next two build stages. base first creates a new user named navi along with its associated group navi and home directory. It then sets the default user to navi and the working directory to navi’s home directory.

Build arguments are defined so that the username, group name, UID, and GID can be configured at build-time if desired. Although the username and group name do not matter, the UID and GID do because they are used to set the default image user and the ownership attributes of application files. The UID and GID are set to 1000 by default because UID 1000 and GID 1000 are typically used for the first non-root user created on a Linux system.

Installer stage

The build stage installer, comprising lines 18 through 22, installs the application’s external dependencies.

Using base as its base image, installer copies requirements.txt from the host’s file system to that of the image. Because base set the working directory to navi’s home directory, the destination path . provided to the COPY instruction is relative to navi’s home directory /home/navi and thus has a converted absolute path of /home/navi/. Using the --chown flag, the image’s copy of requirements.txt located at /home/navi has its UID and GID set to those of navi. 12

After requirements.txt has been copied, Python’s package installer pip reads requirements.txt and installs the specified version of click to the user’s site-packages directory, which is located at ~/.local/lib/pythonX.Y/site-packages by default. X and Y are the major and minor versions, respectively, of the Python version being used. As a result, the image’s site-packages path is /home/navi/.local/lib/python3.9/site-packages.
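As a quick sanity check of that path, the sketch below builds the default user site-packages location from a home directory and a Python version, following the ~/.local/lib/pythonX.Y/site-packages pattern described above. The function name is hypothetical, and the sketch assumes the default Linux layout with no PYTHONUSERBASE override.

```python
from pathlib import PurePosixPath


def user_site_packages(home: str, major: int, minor: int) -> str:
    """Build the default per-user site-packages path on Linux."""
    # Assumes the default user base of ~/.local (PYTHONUSERBASE unset).
    return str(
        PurePosixPath(home) / ".local" / "lib"
        / f"python{major}.{minor}" / "site-packages"
    )


print(user_site_packages("/home/navi", 3, 9))
```

Inside the image itself, running python -m site --user-site prints the path that the interpreter actually uses, which is the safer source of truth if the base image changes.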

Runner stage

The build stage runner, comprising lines 27 through 34, copies the installed external dependencies from installer, copies main.py from the host, and sets python main.py as the entrypoint.

Why is the build multi-stage?

Had Dockerfile described a single-stage build like the one shown below instead, requirements.txt would have been present in the final image.

Dockerfile

FROM python:3.9.12-slim-buster

ARG USERNAME=navi
ARG USER_GROUP_NAME=$USERNAME
ARG USER_UID=1000
ARG USER_GID=$USER_UID

RUN groupadd --gid $USER_GID $USER_GROUP_NAME \
    && useradd --shell /bin/bash --uid $USER_UID --gid $USER_GID --create-home $USERNAME

USER $USER_UID

WORKDIR /home/$USERNAME

COPY --chown=$USER_UID:$USER_GID requirements.txt .

RUN pip install --user --no-cache-dir -r requirements.txt

COPY --chown=$USER_UID:$USER_GID main.py .

ENTRYPOINT [ "python", "main.py" ]

Multi-stage builds are an optimization technique for reducing the size of the final image by leaving build-only files out of it. Although requirements.txt is a small file and thus a single-stage build would have been practical enough, implementing multi-stage builds from the beginning is a good practice since it doesn’t require much effort.

Footnotes


  1. root is the conventional name of the user with UID 0 who is known as the superuser. By default, the superuser bypasses all privilege checks in a Linux system. Whenever I refer to a user named root, assume the user has UID 0. ↩︎

  2. The --rm flag causes the container to be automatically removed when it exits. See the Docker docs for more info. ↩︎

  3. To learn more about enabling user namespaces and the security benefits they provide, see the Docker docs. ↩︎

  4. In the traditional privilege scheme, there are two categories of processes: privileged and non-privileged. Privileged processes have effective UID 0 and bypass all privilege checks. Non-privileged processes, on the other hand, undergo privilege checking and their user and group IDs determine whether they pass a privilege check.

    In the capability scheme, the privileges of root are divided into distinct units called capabilities. A process can perform a privileged operation if it has the corresponding capability. For a list of capabilities, see this Linux man page. ↩︎

  5. The initial, and usually the only, containerized process has PID 1 because PID namespaces are used in container isolation and thus PID enumeration begins at 1. ↩︎

  6. The command docker build - < Dockerfile reads from stdin without context. See the Docker docs for more info. ↩︎

  7. The string +ep at the end of the output indicates that the listed capabilities are members of the process’s effective and permitted capability sets. ↩︎

  8. In the Moby source code, the default capabilities are codified here. ↩︎

  9. In the Moby source code, the logic for dropping the default capabilities when the user is not root is found here. ↩︎

  10. wc -l prints the number of lines in its input, which here is the output of grep piped to it. ↩︎

  11. For more details about multi-stage builds, see the Docker docs. ↩︎

  12. By default (without the use of the --chown flag), copied files have UID 0 and GID 0. See the Docker docs for more details. ↩︎