Most people who start with containers believe it is just lightweight virtual machines with fast startup time, but it is a oversimplification that can be misleading. In this post we will try to understand the real natures of containers.
It is easy to follow tutorials from the Internet on how to put a Python or a Node.js application into a container. But Docker is a behemoth doing a wide variety of things, and the apparent simplicity of docker run nginx can be deceptive.
Containers Are Not Virtual Machines
Virtualization engines are creating a paravirtualized hardware that from the perspective of the VirtualMachine is looks like a real hardware. So the virtual machine has its own kernel hat communicate with the virtual hardware. Then the virtualization engine communicate with tha real hardware truth the host’s kernel. This solution requires more hardware resource for an app then a direct communication with the kernel.
A container is just an isolated (namespaces) and restricted (cgroups, capabilities, seccomp) process on the host. It dose not have two kernel layer so requires less resources. To start a containerized process, you need to create namespaces, than run a process in it. A low-level Container Runtime knows how to prepare such namespaces and then how to start a containerized process in it.
What is a Low-Level Container Runtime?
Containers are implemented using Linux namespaces and cgroups. Namespaces let you virtualize system resources, like the file system or networking, for each container. Cgroups provide a way to limit the amount of resources like CPU and memory that each container can use. At the lowest level, container runtimes are responsible for setting up these namespaces and cgroups for containers, and then running commands inside those namespaces and cgroups. Low-level runtimes support using these operating system features.
runc is a CLI tool for spawning and running containers according to the OCI specification. Docker donated this library to OCI as a reference implementation of the OCI runtime specification.
crun is a lightweight fully featured OCI runtime and C library for running containers.
“While most of the tools used in the Linux containers ecosystem are written in Go, I believe C is a better fit for a lower level tool like a container runtime. runc, the most used implementation of the OCI runtime specs written in Go, re-execs itself and use a module written in C for setting up the environment before the container process starts.” (Source: crun GitHub page )
crun is faster than runc and has a much lower memory footprint.
Images Aren’t Needed To Run Containers
For folks familiar with how runc starts containers, it’s clear that images aren’t really a part of the equation. Instead, to run a container, a runtime needs a so-called bundle that consists of:
- a config.json file holding container parameters (path to an executable, env vars, etc.)
- a folder with the said executable and supporting files (if any).
Oftentimes, a bundle folder contains a file structure resembling a typical Linux distribution (/var, /usr, /lib, /etc, …). When runc launches a container with such a bundle, the process inside gets a root filesystem that looks pretty much like your favorite Linux flavor, be it Debian, CentOS, or Alpine.
But such a file structure is not mandatory! So-called scratch or distroless containers are getting more and more popular nowadays
Some Containers are Virtual Machines
runv is a hypervisor-based runtime for OCI.
runV supports the following hypervisors:
- KVM (QEMU 2.1 or later)
- KVM (Kvmtool)
- Xen (4.5 or later)
- QEMU without KVM (NOT RECOMMENDED. QEMU 2.1 or later)
What is a shim?
A container runtime shim is a piece of software that resides in between a container manager (containerd, cri-o, podman) and a low-level container runtime (runc, crun) solving the integration problem of these counterparts.
Docker, Google, CoreOS, and other vendors created the Open Container Initiative (OCI). The OCI currently contains two specifications: the Runtime Specification (runtime-spec) as a standerd of CRI (Container runtime Interface) and the Image Specification (image-spec). This led to other standards like CNI (Container Network Interface), a Cloud Native Computing Foundation project, or Container Storage Interface (CSI).
What is a container runtime?
Container runtime is the engine that runs and manages the components required to run containers. Communicating with the kernel to start containerized processes, setting up cgroups, configure mount points and do many things to make your container work.
Docker was released in 2013 and solved many of the problems that developers had running containers like LXC or OpenVZ. Before version 1.11, the implementation of Docker was a monolithic daemon. The monolith did everything as one package such as downloading container images, launching container processes, exposing a remote API, and acting as a log collection daemon, all in a centralized process running as root. (Source: Coreos )
At this time docker was the only runtime that Kubernetes supported, but wit the release of Coreos’s rkt Kubernetes needed a standard interface to ease the integration of other container runtimes. This led to the splitting of Docker into different parts.
CRI-O is an implementation of the Kubernetes CRI (Container Runtime Interface) to enable using OCI (Open Container Initiative) compatible runtimes. It is a lightweight alternative to using Docker as the runtime for kubernetes. It allows Kubernetes to use any OCI-compliant runtime as the container runtime for running pods. Today it supports runc and Kata Containers as the container runtimes but any OCI-conformant runtime can be plugged in principle.
CRI-O supports OCI container images and can pull from any container registry. It is a lightweight alternative to using Docker, Moby or rkt as the runtime for Kubernetes. (Source: CRI-O Website )
crictl ps - list containers
crictl pods - list pods
PouchContainer is an open-source project created by Alibaba Group. It provides applications with a lightweight runtime environment with strong isolation and minimal overhead. PouchContainer isolates applications from varying runtime environment, and minimizes operational workload. t includes lots of security features, like hypervisor-based container technology, lxcfs, directory disk quota, patched Linux kernel. PouchContainer utilizes Dragonfly, a P2P-base distribution system, to achieve lightning-fast container image distribution at enterprise’s large scale.
Kata Containers is an open source community working to build a secure container runtime with lightweight virtual machines that feel and perform like containers, but provide stronger workload isolation using hardware virtualization technology as a second layer of defense. (Source: Kata Containers Website )
Podman is a daemonless, open source, Linux-native tool designed to develop, manage, and run Open Container Initiative (OCI) containers and pods. It has a similar directory structure to Buildah, Skopeo, and CRI-O. Podman doesn’t admin privileges for its commands to work.
containerd is a daemon that controls runC. From containerd website, “containerd manages the complete container lifecycle of its host system, from image transfer and storage to container execution and supervision to low-level storage to network attachments and beyond”.
What is gvisor
gVisor is an application kernel, written in Go, that implements a substantial portion of the Linux system call interface. It provides an additional layer of isolation between running applications and the host operating system.
gVisor includes an Open Container Initiative (OCI) compatible Low-Level Container Runtime called
runsc that makes it easy to work with existing container tooling. The
runsc runtime integrates with Docker, containerd and Kubernetes, making it simple to run sandboxed containers.