Debug with Ephemeral Containers

In this post I will show you how to debug an application running in a pod using Ephemeral Containers, a feature that became generally available in Kubernetes 1.25.

From a security perspective, it is a best practice to build a container with only the tools necessary to run the app. This means that when your application is not working correctly, you don't have the tools to debug it. You can try to copy tools into the container on demand with kubectl cp, but that isn't always possible.

So what other options do we have? With the new Ephemeral Containers feature we can add a container to an existing pod and use that container for debugging. The feature became beta in Kubernetes 1.23 and generally available in 1.25. In practice, Kubernetes extended the Pod specification with a new ephemeralContainers attribute, which holds a list of EphemeralContainer objects that closely mirror the Container v1 core spec.
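Under the hood, kubectl debug simply patches the pod's ephemeralcontainers subresource with such a list. A minimal sketch of what that patch body looks like (the container name, file path, and image here are illustrative, and the --subresource flag needs kubectl 1.24 or newer):

```shell
# Build the patch body: a list of EphemeralContainer objects under
# spec.ephemeralContainers (names are illustrative).
cat > /tmp/ephemeral-patch.json <<'EOF'
{
  "spec": {
    "ephemeralContainers": [
      {
        "name": "debugger",
        "image": "nicolaka/netshoot",
        "stdin": true,
        "tty": true
      }
    ]
  }
}
EOF

# Against a live cluster you would apply it with something like:
# kubectl patch pod ${POD_NAME} --subresource=ephemeralcontainers \
#   --patch-file /tmp/ephemeral-patch.json

# Sanity-check the patch body locally with jq:
jq -r '.spec.ephemeralContainers[0].name' /tmp/ephemeral-patch.json
```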

Demo time

For demo purposes I will use a distroless Python container. "Distroless" images contain only your application and its runtime dependencies. They do not contain package managers, shells or any other programs you would expect to find in a standard Linux distribution, so it is perfect for the demo.

kubectl apply -f - <<EOF
apiVersion: apps/v1
kind: Deployment
metadata:
  name: slim
spec:
  selector:
    matchLabels:
      app: slim
  template:
    metadata:
      labels:
        app: slim
    spec:
      containers:
      - name: app
        image: gcr.io/distroless/python3-debian11
        command:
        - python
        - -m
        - http.server
        - '8080'
EOF

Now I will try to log in to the container:

POD_NAME=$(kubectl get pods -l app=slim -o jsonpath='{.items[0].metadata.name}')

# no bash in the container
$ kubectl exec -it -c app ${POD_NAME} -- bash
error: Internal error occurred: error executing command in container: failed to exec in container: failed to start exec "43d1e91f41310fb1ede9fbab741921091edfe116311f18a3881f90f68d06dc13": OCI runtime exec failed: exec failed: unable to start container process: exec: "bash": executable file not found in $PATH: unknown

# there is sh in the container, but only limited tools
$ kubectl exec -it -c app ${POD_NAME} -- sh
$# ps
sh: 3: ps: not found
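Even with only sh and no ps, you can still enumerate processes by reading /proc directly. A minimal sketch (Linux only, and it assumes tr is available, which is not guaranteed in every minimal image):

```shell
# List PIDs and command lines straight from /proc; cmdline is
# NUL-separated, so translate the NULs to spaces for display.
for d in /proc/[0-9]*; do
  echo "${d##*/} $(tr '\0' ' ' < "$d/cmdline" 2>/dev/null)"
done
```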

So, let’s try inspecting Pods using an ephemeral container:

$ kubectl debug -it --attach=false -c debugger --image=nicolaka/netshoot ${POD_NAME}

$ kubectl get pod ${POD_NAME} -o jsonpath='{.spec.ephemeralContainers}' | jq

[
  {
    "image": "nicolaka/netshoot",
    "imagePullPolicy": "Always",
    "name": "debugger",
    "resources": {},
    "stdin": true,
    "terminationMessagePath": "/dev/termination-log",
    "terminationMessagePolicy": "File",
    "tty": true
  }
]

$ kubectl get pod ${POD_NAME} -o jsonpath='{.status.ephemeralContainerStatuses}' | jq
[
  {
    "containerID": "containerd://c3a58d41f5b007aa1d7c2f6758c0d397428bf1d3575380a0661f34efaab4bb34",
    "image": "docker.io/nicolaka/netshoot:latest",
    "imageID": "docker.io/nicolaka/netshoot@sha256:aeafd567d7f7f1edb5127ec311599bb2b8a9c0fb31d7a53e9cff26af6d29fd4e",
    "lastState": {},
    "name": "debugger",
    "ready": false,
    "restartCount": 0,
    "state": {
      "running": {
        "startedAt": "2022-09-07T08:30:44Z"
      }
    }
  }
]
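The status JSON is easy to reduce with jq to just the fields you usually care about. For example, running the same filter against a trimmed copy of the sample output captured above:

```shell
# Reduce an ephemeralContainerStatuses entry to "name is <state>".
# The JSON below is a trimmed copy of the sample status from above.
cat > /tmp/eph-status.json <<'EOF'
[
  {
    "name": "debugger",
    "ready": false,
    "state": { "running": { "startedAt": "2022-09-07T08:30:44Z" } }
  }
]
EOF
jq -r '.[0] | "\(.name) is \(.state | keys[0])"' /tmp/eph-status.json
```

Against a live pod you would pipe the jsonpath output from the previous command into the same filter.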

If the ephemeral container is running, we can try attaching to it:

$ kubectl attach -it -c debugger ${POD_NAME}

$# netstat -tulpn
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
tcp        0      0 0.0.0.0:8080            0.0.0.0:*               LISTEN      -

$# wget -O - 127.0.0.1:8080
Connecting to localhost:8080 (127.0.0.1:8080)
writing to stdout
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
<html>
...
</html>

Shared namespace

When you check the processes from the debugger container, you only see the debugger container's own processes. That is because kubectl debug puts the debug container into the same net and ipc Linux namespaces as the pod, but not the same pid namespace. If you want to know more about pods and how they work, check my other post about this topic.

$# ps auxf
PID   USER     TIME  COMMAND
    1 root      0:00 zsh
   14 root      0:00 ps auxf
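You can verify which namespaces two processes share by comparing the namespace links under /proc. A minimal sketch (Linux only; inside the pod you would compare the app's PID against the debugger shell's):

```shell
# Each /proc/<pid>/ns entry is a symlink like "net:[4026531992]";
# two processes share a namespace iff the bracketed inodes match.
readlink /proc/$$/ns/net /proc/$$/ns/pid

# Inside the pod, e.g. compare PID 1 with the current debugger shell:
# readlink /proc/1/ns/net /proc/$$/ns/net
```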


In the official documentation there is a workaround to enable a shared process namespace for the debugger container:

$ kubectl patch deployment slim --patch '
spec:
  template:
    spec:
      shareProcessNamespace: true'

# or create a debuggable copy of the pod with a shared process namespace in the first place
$ kubectl debug -it --attach=false -c debugger --image=nicolaka/netshoot --copy-to ${POD_NAME}-debug --share-processes ${POD_NAME}

Wait for the pod to restart, then test:

$ kubectl get pods
NAME                    READY   STATUS        RESTARTS   AGE
slim-5f5ffd5958-b9sgt   1/1     Terminating   0          72m
slim-66475779f5-5c27b   1/1     Running       0          20s

$ kubectl get pods
NAME                    READY   STATUS    RESTARTS   AGE
slim-66475779f5-5c27b   1/1     Running   0          48s

$ POD_NAME=$(kubectl get pods -l app=slim -o jsonpath='{.items[0].metadata.name}')

$ kubectl debug -it --attach=false -c debugger --image=nicolaka/netshoot ${POD_NAME}

$ kubectl attach -it -c debugger ${POD_NAME}

$# ps aux
PID   USER     TIME  COMMAND
    1 65535     0:00 /pause
    7 root      0:00 python -m http.server 8080
   14 root      0:01 zsh
   72 root      0:00 ps aux

Now, if I want to access the filesystem of the misbehaving container, the shared pid Linux namespace gives me a trick: every process exposes its own root filesystem under /proc/<pid>/root.

# From inside the ephemeral container:
$# ls /proc/$(pgrep python)/root/usr/bin
c_rehash   getconf    iconv      locale     openssl    python     python3.9  zdump
catchsegv  getent     ldd        localedef  pldd       python3    tzselect
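The same trick works on any Linux machine, since /proc/<pid>/root is a per-process view of that process's root filesystem. A quick local sketch (assumes a Linux /proc; the background sleep is just a throwaway stand-in for the target process):

```shell
# Start a throwaway process, then reach its filesystem via /proc/<pid>/root.
sleep 30 &
BG_PID=$!

# For an ordinary process this resolves to the host root, so /etc is
# visible; for a containerized process it would be the container's
# filesystem instead.
ls "/proc/${BG_PID}/root/etc" >/dev/null && echo reachable

kill "${BG_PID}"
```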

Troubleshooting network activity

# From inside the ephemeral container:
$# tcpdump -i lo -n port 8080

Now try to send a request to the pod. For this I will start a port forward and curl it:

kubectl port-forward pod/${POD_NAME} 8080:8080 &
curl http://127.0.0.1:8080

You will see the following:

# From inside the ephemeral container:
$# tcpdump -i lo -n port 8080
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on lo, link-type EN10MB (Ethernet), snapshot length 262144 bytes
09:45:01.727624 IP 127.0.0.1.41968 > 127.0.0.1.8080: Flags [S], seq 3878283086, win 65495, options [mss 65495,sackOK,TS val 2031418309 ecr 0,nop,wscale 7], length 0
09:45:01.727635 IP 127.0.0.1.8080 > 127.0.0.1.41968: Flags [S.], seq 2447903554, ack 3878283087, win 65483, options [mss 65495,sackOK,TS val 2031418309 ecr 2031418309,nop,wscale 7], length 0
09:45:01.727643 IP 127.0.0.1.41968 > 127.0.0.1.8080: Flags [.], ack 1, win 512, options [nop,nop,TS val 2031418309 ecr 2031418309], length 0
09:45:01.734499 IP 127.0.0.1.41968 > 127.0.0.1.8080: Flags [P.], seq 1:79, ack 1, win 512, options [nop,nop,TS val 2031418316 ecr 2031418309], length 78: HTTP: GET / HTTP/1.1
09:45:01.734521 IP 127.0.0.1.8080 > 127.0.0.1.41968: Flags [.], ack 79, win 511, options [nop,nop,TS val 2031418316 ecr 2031418316], length 0
09:45:01.735712 IP 127.0.0.1.8080 > 127.0.0.1.41968: Flags [P.], seq 1:155, ack 79, win 512, options [nop,nop,TS val 2031418317 ecr 2031418316], length 154: HTTP: HTTP/1.0 200 OK
09:45:01.735721 IP 127.0.0.1.41968 > 127.0.0.1.8080: Flags [.], ack 155, win 511, options [nop,nop,TS val 2031418317 ecr 2031418317], length 0
09:45:01.735753 IP 127.0.0.1.8080 > 127.0.0.1.41968: Flags [P.], seq 155:961, ack 79, win 512, options [nop,nop,TS val 2031418317 ecr 2031418317], length 806: HTTP
09:45:01.735757 IP 127.0.0.1.41968 > 127.0.0.1.8080: Flags [.], ack 961, win 505, options [nop,nop,TS val 2031418317 ecr 2031418317], length 0
09:45:01.735800 IP 127.0.0.1.8080 > 127.0.0.1.41968: Flags [F.], seq 961, ack 79, win 512, options [nop,nop,TS val 2031418317 ecr 2031418317], length 0
09:45:01.742812 IP 127.0.0.1.41968 > 127.0.0.1.8080: Flags [F.], seq 79, ack 962, win 512, options [nop,nop,TS val 2031418324 ecr 2031418317], length 0
09:45:01.742820 IP 127.0.0.1.8080 > 127.0.0.1.41968: Flags [.], ack 80, win 512, options [nop,nop,TS val 2031418324 ecr 2031418324], length 0

Tracing/profiling processes using ephemeral containers

# From inside the ephemeral container:
$# ps aux
PID   USER     TIME  COMMAND
    1 65535     0:00 /pause
    7 root      0:00 python -m http.server 8080
   14 root      0:01 zsh
   72 root      0:00 ps aux

$# strace -p 7
strace: Process 7 attached
restart_syscall(<... resuming interrupted read ...>) = 0
poll([{fd=3, events=POLLIN}], 1, 500)   = 0 (Timeout)
poll([{fd=3, events=POLLIN}], 1, 500)   = 0 (Timeout)
...
poll([{fd=3, events=POLLIN}], 1, 500)   = 1 ([{fd=3, revents=POLLIN}])
accept4(3, {sa_family=AF_INET, sin_port=htons(40652), sin_addr=inet_addr("127.0.0.1")}, [16], SOCK_CLOEXEC) = 4
getsockname(4, {sa_family=AF_INET, sin_port=htons(8080), sin_addr=inet_addr("127.0.0.1")}, [128 => 16]) = 0
clone(child_stack=0x7f57a50ccfb0, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, parent_tid=[299], tls=0x7f57a50cd700, child_tidptr=0x7f57a50cd9d0) = 299
futex(0x93a56c, FUTEX_WAKE_PRIVATE, 1)  = 1
futex(0x93a570, FUTEX_WAKE_PRIVATE, 1)  = 1
poll([{fd=3, events=POLLIN}], 1, 500)   = 0 (Timeout)
poll([{fd=3, events=POLLIN}], 1, 500)   = 0 (Timeout)
poll([{fd=3, events=POLLIN}], 1, 500)   = 0 (Timeout)
poll([{fd=3, events=POLLIN}], 1, 500^Cstrace: Process 8 detached
 <detached ...>

By default strace attaches only to the given thread; the Python server handles each request in a worker thread, so add -f to follow children as well:

$# strace -f -p 7