Chapter 16 Container Images
This chapter dives deeper into container images. Images enable quick sharing of Shiny applications by bundling all the configuration and dependencies required to run your Shiny application.
Images are shared through container registries, and only need to be built once, saving time from installing dependencies and configuring the Shiny application each time it is run on a new computer.
16.1 Working with Existing Images
Let’s learn how to work with an existing image. Such an image is stored in a container registry where we can pull it from if we know its name.
16.1.2 Pulling an Image
You can use the docker pull <image-name> command to pull an image from
a public registry. For example docker pull ubuntu:24.04 will pull the
24.04 version of the “official” Ubuntu image from the Docker Hub.
docker pull rocker/r-base:4.4.1 will pull the image with R version 4.4.1.
Pull the R Shiny version of the Old Faithful app as:
docker pull ghcr.io/h10y/faithful/r-shiny
# Using default tag: latest
# latest: Pulling from h10y/faithful/r-shiny
# Digest: sha256:12e[...]4ea

You can see from the messages that the latest tag was applied because
we did not specify the tag. We can also see the SHA256 digest, that is
a unique and immutable identifier. The name can change, or multiple names can
refer to the same image (i.e. a set of layers and their manifest). But the
image digest will be the same. To “pin” the exact version, you can use the
<image-name>@sha256:12e[...]4ea pattern (use the actual digest copied from
your screen without the [...]):
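For the image pulled above, pinning would look like the following (the digest is abbreviated here; paste the full value from your own output):

```shell
docker pull ghcr.io/h10y/faithful/r-shiny@sha256:12e[...]4ea
```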
To pull all images from a repository, you can use the --all-tags flag:
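For the Old Faithful image, that is:

```shell
docker pull --all-tags ghcr.io/h10y/faithful/r-shiny
```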
This will pull not only the latest tag, but also the image tagged as main,
named after the Git branch. Use the docker images command to list the
images.
16.1.3 Docker Login
You don’t need to authenticate for public images, but if you are trying to pull a private image from a private repository, you need to log into the container registry. Such private repositories are common and are available on Docker Hub and on the GitHub and GitLab container registries. More on the different container registries later.
To log in to the GitHub container registry, use:
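Using the GHCR host name:

```shell
docker login ghcr.io
```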
This command will ask for your credentials interactively. If you want, you can provide your username and password. However, it is usually recommended to use an access token instead of your password, because a token can have restricted scopes, e.g. read-only access to the container registry, which is a lot more secure. You can also set expiry dates and revoke these tokens at any time without having to change login passwords elsewhere.
Let’s say that you saved your GitHub token value in a file ~/my_token.txt in
the root of your home folder (~). You can pass the PAT value to the
docker login command via the standard input as:
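The command might look like this (using the token file path assumed above):

```shell
cat ~/my_token.txt | docker login ghcr.io -u <username> --password-stdin
```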
where <username> is your GitHub username.
16.1.4 Running a Container
The next command is docker run which runs a command in a new container.
It pulls the image if needed before starting the container.
Try the following command. It will pull the latest image for the Python build of the Old Faithful example app, then it will start a new container:
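The image name here follows the same naming scheme as the R Shiny image we pulled earlier; the py-shiny path is an assumption based on that pattern:

```shell
docker run -p 8080:3838 ghcr.io/h10y/faithful/py-shiny
```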
The -p is a shorthand for --publish that instructs Docker to
publish a container’s port to the host port. In our example, 3838 is the
container’s port which is mapped to port 8080 of the host machine. As a
result, you can visit http://127.0.0.1:8080 in your browser to see the
Python Shiny app. Hit CTRL+C in the terminal to stop the container. We will
learn about container ports in a bit, but in essence it is just a channel that
is used to send information back and forth.
16.2 Building a New Image
So far you saw how to use the basic Docker commands to pull and run images. Now you’ll build a Docker image by recreating the Old Faithful Shiny app that we worked with before.
In our examples, we will use the following setup: a file named Dockerfile
sits next to a folder named app, and the Shiny app files like
app.R or app.py are in this folder. This setup is convenient
because we can copy all the files from the app folder without
having to worry about copying files that should not be there.
You can follow along the examples by downloading or cloning the GitHub repository
with git clone https://github.com/h10y/faithful.git. All the different builds
of the Old Faithful app from Chapter 5
will have a Dockerfile and instructions in the README.md files within each
folder.
16.2.1 R for Shiny
For our R Shiny example within the r-shiny folder,
this is what is inside the Dockerfile:
FROM rocker/r2u:24.04
RUN R -q -e "install.packages('shiny')"
RUN groupadd app && useradd -g app app
WORKDIR /home/app
COPY app .
RUN chown app:app -R /home/app
USER app
EXPOSE 3838
CMD ["R", "-e", "shiny::runApp(host='0.0.0.0', port=3838)"]

We will explain the Dockerfile instructions in the next section.
For now, you can use the docker build command to build the image from the
Dockerfile. You will have to be in the same directory as the Dockerfile;
this place is what we call the build context.
This is what the . at the end of the command stands for:
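A minimal build command, using the image name and tag discussed below:

```shell
docker build -t r-shiny-test:v1 .
```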
The context here specifies the current directory (.), but it can be
any relative or absolute filepath. Files and directories inside the
context directory are available to the builder, so it can load them when needed.
You can use a .dockerignore file to list files and directories that should
be ignored within the build context. It is similar to the .gitignore file.
The instructions are taken from the Dockerfile at the root of the build
context. If you want to specify a different file, do so by providing the
path to the file using the -f (or --file) option as docker build -f Dockerfile2 ..
The -t argument (same as --tag) is followed by the image name
(r-shiny-test) and the tag (v1). If you do not specify the image
name/tag at image build (i.e. docker build .), Docker will not tag the image
but it will have an image ID that you can use later to tag the image with
docker tag <image-id> r-shiny-test:v1.
You can apply multiple tags as:
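For example, repeating the -t option once per tag:

```shell
docker build -t r-shiny-test:v1 -t r-shiny-test:latest .
```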
16.2.2 Buildx and BuildKit
While the builder is running, you’ll see lots of messages printed
as Docker goes through the instructions from the Dockerfile.
As of Docker Engine 23.0 and Docker Desktop 4.19, Buildx is the default build
client and user interface. Buildx brings extended build capabilities with
BuildKit. BuildKit is the server that handles the build execution, e.g.
it communicates with registries, instructs the Docker Engine and accesses
the local file system. You can enable the use of BuildKit on older Docker
systems by setting the environment variable DOCKER_BUILDKIT=1.
The Buildx output is nicer and it provides you with timings for every step
of your Dockerfile:
[+] Building 32.4s (12/12) FINISHED
=> [internal] load build definition from Dockerfile 0.0s
=> => transferring dockerfile: 282B 0.0s
=> [internal] load metadata for docker.io/rocker/r2u:24.04 1.2s
=> [auth] rocker/r2u:pull token for registry-1.docker.io 0.0s
=> [internal] load .dockerignore 0.0s
=> => transferring context: 2B 0.0s
=> [1/6] FROM docker.io/rocker/r2u:24.04@sha256:f327[...]dd73 9.2s
=> => resolve docker.io/rocker/r2u:24.04@sha256:f327[...]dd73 0.0s
[...]
=> [internal] load build context 0.0s
=> => transferring context: 845B 0.0s
=> [2/6] RUN groupadd app && useradd -g app app 0.7s
=> [3/6] RUN R -q -e "install.packages('shiny')" 20.9s
=> [4/6] WORKDIR /home/app 0.0s
=> [5/6] COPY app . 0.0s
=> [6/6] RUN chown app:app -R /home/app 0.1s
=> exporting to image 0.3s
=> => exporting layers 0.3s
=> => writing image sha256:4d10[...]bab7 0.0s
=> => naming to docker.io/library/r-shiny:v1 0.0s
Sometimes you want to inspect the full output rather than the collapsed
view. Add the --progress=plain option to the build command to see all of the
output. This comes in handy when troubleshooting the build.
BuildKit also offers other nice features, for example setting the target
platform(s) for the build via the --platform option.
The default value is the platform of the BuildKit daemon where the build runs,
i.e. your laptop or a server. This can be important for Mac OS X users
on Apple Silicon (M1 and above), because the default ARM64 build will
have poor performance or might fail when run on AMD64 machines.
Use --platform=linux/amd64 to build the image for the AMD64 architecture.
You can also build for multiple architectures at once with
docker build --platform linux/amd64,linux/arm64 ..
See 3.1.4 for enabling virtualization on Mac OS X
so that you can build for multiple platforms.
16.2.3 Inspecting the Image
The output of the build is an image that has a SHA256 hash that can be used
as a unique identifier. The image is made up of image layers. These layers are
created by the instructions from the Dockerfile. If you run the build command
again you will notice that instead of 32 seconds, it will take almost no time
to build the image. This is because the layers are cached by default and Docker
smartly evaluates which instructions and files have changed since the last
build. Sometimes the cache gets tangled, or you just want to make sure that
the error is not a caching issue. In this case use the --no-cache flag
with docker build.
You can use the docker history r-shiny:v1 command to see how the image was
built and you can see the sizes for every layer. Intermediate layers
have a size of 0B and these do not contribute to the overall image size.
The layers created 2 hours ago are the layers we created, the layers
created 2 weeks ago are the layers from the parent image rocker/r2u:24.04,
whereas the layers created 2 months ago are the official ubuntu:24.04 image
layers that form the parent image of the rocker/r2u:24.04 one:
IMAGE CREATED CREATED BY SIZE
4d[...]52 2 hours ago CMD ["R" "-e" "shiny::runApp(host='0.0.0.0', 0B
<missing> 2 hours ago EXPOSE map[3838/tcp:{}] 0B
<missing> 2 hours ago USER app 0B
<missing> 2 hours ago RUN /bin/sh -c chown app:app -R /home/app # 780B
<missing> 2 hours ago COPY app . # buildkit 780B
<missing> 2 hours ago WORKDIR /home/app 0B
<missing> 2 hours ago RUN /bin/sh -c R -q -e "install.packages('sh 109MB
<missing> 2 hours ago RUN /bin/sh -c groupadd app && useradd -g ap 5.14kB
<missing> 2 weeks ago RUN /bin/sh -c apt-get update && apt 642MB
<missing> 2 weeks ago ENV TZ=UTC 0B
<missing> 2 weeks ago ENV DEBIAN_FRONTEND=noninteractive 0B
<missing> 2 weeks ago ENV LANG=en_US.UTF-8 0B
<missing> 2 weeks ago ENV LC_ALL=en_US.UTF-8 0B
<missing> 2 weeks ago RUN /bin/sh -c useradd -s /bin/bash -m docke 81.6MB
<missing> 2 weeks ago LABEL org.label-schema.license=GPL-2.0 org.l 0B
<missing> 2 months ago /bin/sh -c #(nop) CMD ["/bin/bash"] 0B
<missing> 2 months ago /bin/sh -c #(nop) ADD file:ac9d5a9d5b9b1217 76.2MB
<missing> 2 months ago /bin/sh -c #(nop) LABEL org.opencontainers. 0B
<missing> 2 months ago /bin/sh -c #(nop) LABEL org.opencontainers. 0B
<missing> 2 months ago /bin/sh -c #(nop) ARG LAUNCHPAD_BUILD_ARCH 0B
<missing> 2 months ago /bin/sh -c #(nop) ARG RELEASE 0B
The docker inspect r-shiny:v1 command returns a long JSON output with the
metadata of the image, including its SHA256 hash.
Here is a greatly simplified output:
[
{
"Id": "sha256:4d10[...]bab7",
"RepoTags": ["r-shiny:v1"],
"Created": "2024-07-05T04:59:01.123398172Z",
"Config": {
"User": "app",
"ExposedPorts": {"3838/tcp": {}},
"Cmd": ["R","-e",
"shiny::runApp(host='0.0.0.0', port=3838)"],
"Volumes": null,
"WorkingDir": "/home/app",
"Entrypoint": null
},
"Architecture": "amd64",
"Os": "linux",
"Size": 909132976,
"Metadata": {
"LastTagTime": "2024-07-05T06:20:22.2764725Z"
}
}
]
Once the docker image is built, you can run the container to make sure the app is working as expected:
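Using the r-shiny:v1 name seen in the build output above, and mapping the container port to the host as before:

```shell
docker run -p 8080:3838 r-shiny:v1
```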
16.2.4 Python for Shiny
You can find the Python for Shiny example in the py-shiny folder of the
Old Faithful example repository. The Dockerfile for the Python version looks
like this:
FROM python:3.9
COPY app/requirements.txt .
RUN pip install --no-cache-dir --upgrade -r requirements.txt
RUN groupadd app && useradd -g app app
WORKDIR /home/app
COPY app .
RUN chown app:app -R /home/app
USER app
EXPOSE 3838
RUN mkdir .config
ENV MPLCONFIGDIR=/home/app/.config
ENV HOME=/home/app
CMD ["uvicorn", "app:app", "--host", "0.0.0.0", "--port", "3838"]

We’ll explain each line shortly. To build and check the Docker image, use the following commands:
export DOCKER_DEFAULT_PLATFORM=linux/amd64
docker build -t py-shiny:v1 .
docker run -p 8080:3838 py-shiny:v1

The DOCKER_DEFAULT_PLATFORM environment variable is not strictly necessary,
but it can save you some headaches on Mac OS X when the platform of the parent
image does not match the local ARM64 architecture of your Apple Silicon machine.
16.3 Managing Images
There are a few commands that you need to know to manage your Docker images in the absence of the Docker Desktop graphical user interface. This will pay off later when you have no such luxuries on a server.
To list the Docker images, use the docker images command. It will give you
a quick summary of the images:
REPOSITORY TAG IMAGE ID CREATED SIZE
py-shiny v1 ed11a2980c07 5 seconds ago 1.24GB
r-shiny      v1    4d10f42d6a52   About an hour ago   909MB

Size is the space taken up by the image and all its parent images.
You can filter the output, for example docker images --filter=reference="py-*"
will give you images whose name starts with py-, whereas
docker images --filter=reference="*:v1" will list images that are tagged with
v1.
Use the docker rmi <image-name-or-id> to remove an image based on its name or
the image ID.
The docker system df command will give you a concise summary of disk usage
by the Docker daemon including images, containers, and volumes:
TYPE TOTAL ACTIVE SIZE RECLAIMABLE
Images 2 1 1.457GB 1.355GB (98%)
Containers 1 1 0B 0B
Local Volumes 0 0 0B 0B
Build Cache     117       0         3.005GB   3.005GB

If you build images during development while keeping the image name and tag the
same you will end up with “dangling” images that are untagged and are not used
any longer. Dangling images can accumulate over time and can fill up the
available space that Docker Desktop is allocating for images.
Use docker system prune to clean up these dangling images.
The command docker system prune --all will remove all unused images and containers.
The Docker Desktop uses a finite amount of disk space that can fill up. Do the cleanup or go to the Docker Desktop settings and under Resources you should be able to change the virtual disk limit. You can check the RAM, CPU, and disk usage by looking at the bottom of the Docker Desktop window.
16.4 Sharing Images
As we saw, Docker images are just compressed files linked by metadata.
You should be able to copy these files and move them around.
The docker save command lets you save an image to a compressed tar file:
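For example, saving the r-shiny image built earlier (the output file name is an assumption; gzip provides the compression):

```shell
docker save r-shiny:v1 | gzip > r-shiny.tar.gz
```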
Next, you take this tar file, copy it to another server and load it with:
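Assuming the file name used above:

```shell
docker load -i r-shiny.tar.gz
```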
This restores both the image and the tags.
Now imagine that you are managing more than two machines, or you want to share the Docker image with others so that they can use it or to serve as a parent image. The save/copy/load workflow becomes cumbersome quickly. In this case, using a registry might be a much better idea. There are many options to choose from, and you can even host your own registry.
16.4.1 Pushing Images
Let’s tag the r-shiny image so that it has a host defined:
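For example, tagging it for the GitHub Container Registry repository used earlier:

```shell
docker tag r-shiny:v1 ghcr.io/h10y/faithful/r-shiny:v1
```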
Now we can push the locally built Docker image to a container registry:
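Using the tag applied above:

```shell
docker push ghcr.io/h10y/faithful/r-shiny:v1
```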
Note that this command will not work on your machine, because you do not
have write access to the ghcr.io/h10y/faithful repository.
As an exercise, pick an image name that lets you push to your own
personal Docker Hub account.
The image name should start with the registry host unless you are pushing
to Docker Hub. When the image tag is not specified, Docker will treat
the new image as :latest automatically.
16.4.2 Docker Registries
A Docker registry stores Docker images. This is where we push images to and pull
images from. Docker Hub
is a public registry and Docker is configured to look for images on Docker Hub by
default. Docker Hub is a service provided by Docker for finding and sharing
container images. The canonical host name for Docker Hub is docker.io. This
is the default registry when you don’t specify a registry host as part of the
image name.
There are many other registries out there besides Docker Hub. Here is a non-exhaustive list of options.
The GitHub Container Registry (GHCR) is available as part of GitHub Packages
for free and paid plans, even for private repositories under the free
plan. This registry requires no authentication for public images, otherwise
you have to authenticate using your GitHub token. The visibility of the
images inherits the repository visibility but can be changed by the owner.
The host name for GHCR is ghcr.io.
An alternative to GitHub is GitLab (host name registry.gitlab.com),
that has provided registry support for its free (public and private) repositories
long before GitHub. The registry is tightly integrated with GitLab’s CI/CD
pipelines. This registry also needs login with a token for private images.
Heroku is a platform provider that also comes with a Docker
registry (host name registry.heroku.com) to which Docker-based deployments
push their images.
Every major cloud provider offers a Docker container registry that is integrated with their other offerings. Latency should be minimal due to network proximity to the servers:
- Amazon Elastic Container Registry
- Azure Container Registry
- Google Container Registry
- DigitalOcean Container Registry
Other common alternatives for container registries include the JFrog Container Registry, Harbor, and Scaleway.
Although these services are called “container registries”, strictly speaking they store container images.
16.4.3 Log In to a Registry
When you work with private registries or private images, you need to log
in with the docker login command. For Docker Hub, just type
docker login. For all other registries, type in the registry URL as
well, e.g. docker login ghcr.io.
The Docker CLI then will prompt you for your username and password (or access token).
You can log in programmatically by providing your username and the password through standard input from a file:
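For example, for the GitHub Container Registry (the file path is explained below):

```shell
cat ~/my_password.txt | docker login ghcr.io -u <username> --password-stdin
```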
The my_password.txt in this example is a simple text file with the token
inside, and it can be found in the root of your home folder (~). Change the
file path and file name as needed.
You can also use an environment variable to store your token value that you can pass to the login command as:
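A sketch of this approach; CR_PAT is an assumed variable name, and note the leading spaces before the export statement:

```shell
  export CR_PAT=<your-github-token>
echo $CR_PAT | docker login ghcr.io -u <username> --password-stdin
```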
Notice the white space before the export statement: in many shells, commands
that start with a space are not saved to the shell history (e.g. when
HISTCONTROL includes ignorespace in Bash). The history allows you to recall
previous commands by pressing the up arrow key. The shell history is really
just a text file, so copy-pasting secrets into the terminal will leave a
trace. Use this trick for sensitive information.
With one of these approaches you can log into any public or private repository
for which you have credentials. The credentials will be stored
locally in $HOME/.docker/config.json on Linux and Mac or in
%USERPROFILE%/.docker/config.json on Windows. After login, there is no
need to re-authenticate until you log out with docker logout.
It is always a good idea to use a token instead of your password. Tokens can have limited scope (i.e. only for pulling images), and can be revoked at any time without it impacting other areas of your life.
Note that on Linux docker login requires sudo or root privileges unless your user is a member of the docker group.
16.4.4 Local Registry
You might not want the Docker images to leave your computer because you need an air gapped environment, or you are setting up a registry within your virtual private network (VPN). In these situations, you can host your own container registry.
If you want a registry hosted on your machine, just pull the registry image. The next command will pull the registry image, and run the similarly named container in the background on port 5000:
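A typical invocation, using the official registry image with the version 2 tag:

```shell
docker run -d -p 5000:5000 --restart=always --name registry registry:2
```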
Giving a container a name makes it easier to remove the container later; this
way you don’t have to find the container ID. The always restart policy
restarts the container if it stops, but not when it is manually stopped.
The -d flag will start the container in a background process, so you get back
the shell prompt, and you will not see the container log messages.
Tag an image with the host name of your local registry, localhost:5000, and
push the image:
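Using the r-shiny image built earlier as an example:

```shell
docker tag r-shiny:v1 localhost:5000/r-shiny
docker push localhost:5000/r-shiny
```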
To test if it worked, remove the images from your local Docker system.
If you use the -f flag and specify the image ID then the docker rmi
command untags and removes all images that match that ID (get the image ID
from docker images):
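Substitute the image ID from your own docker images output:

```shell
docker rmi -f <image-id>
```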
Now you can pull the image from your local registry:
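Using the name pushed above:

```shell
docker pull localhost:5000/r-shiny
```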
The next command stops and removes the registry container. It is a daemonized
(background) process, so CTRL+C won’t work. The -v option removes the
anonymous volumes associated with the container; the registry container uses
such a volume to store the pushed images:
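Using the container name given above:

```shell
docker container stop registry && docker container rm -v registry
```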
If you want your registry to be accessed over a public network, then you need to think about security and access control. You’ll have to set up transport layer security (TLS) for HTTPS as well as user authentication; these are advanced topics, so we recommend using one of the commercial container registries listed above and controlling access to your images with private repositories.