Kubernetes Pod Security

With the release of Kubernetes v1.23, Pod Security admission has now entered beta. In this article, we cover the key concepts of Pod Security along with how to use it.

Part of the K8S Security series

What is a Pod Security Policy?

A Pod Security Policy (PSP) is a cluster-level resource that controls security-sensitive aspects of the pod specification. RBAC controls which Kubernetes objects a user can work with, but not the conditions on a specific object, such as whether a container is allowed to run as root. PSP objects define a set of conditions that a pod must meet in order to be accepted into the system, as well as defaults for the related fields. The PodSecurityPolicy admission controller is optional, while the PodSecurityPolicy API is enabled by default, so policies can be deployed without the PSP admission plugin enabled.

What is Pod Security?

Pod Security is the successor to PodSecurityPolicy, which was deprecated in the v1.21 release and will be removed in Kubernetes v1.25. Pod Security overcomes key shortcomings of the existing PodSecurityPolicy (PSP) mechanism: PSP was challenging to deploy with controllers, and its lack of dry-run and audit capabilities made it hard to enable safely.

Configuring Pod Security

Pod Security uses profiles based on the Pod Security Standards. These standards define three different policy levels:

  • privileged - The Privileged policy is purposely-open, and entirely unrestricted.
  • baseline - The Baseline policy is aimed at ease of adoption for common containerized workloads while preventing known privilege escalations.
    • Disallows HostProcess containers on Windows
    • Disallows host namespaces on Linux: hostNetwork, hostPID and hostIPC
    • Disallows privileged containers
    • Disallows adding capabilities beyond the allowed default set
    • Disallows mounting of HostPath volumes
    • Disallows usage of host ports
    • On supported hosts, the runtime/default AppArmor profile is applied by default.
    • Setting the SELinux type is restricted, and setting a custom SELinux user or role option is forbidden.
    • Seccomp profile must not be explicitly set to Unconfined.
    • Disallows sysctls outside the safe subset
  • restricted - The Restricted policy is aimed at enforcing current Pod hardening best practices.
    • Disallow Privilege Escalation
    • Disallows running containers as the root user (runAsNonRoot must be true, and runAsUser must not be 0)
    • Seccomp profile must be explicitly set to one of the allowed values.
    • Containers must drop ALL capabilities, and are only permitted to add back the NET_BIND_SERVICE capability.

Policies are applied in a specific mode. The modes are:

  • enforce - Pods that violate the policy are rejected.
  • audit - Violations are recorded as an annotation in the audit logs, but do not affect whether the pod is allowed.
  • warn - Violations send a warning message back to the user, but do not affect whether the pod is allowed.

Demo:

Start a cluster with kind for the demo:

kind create cluster --image kindest/node:v1.23.0
kubectl cluster-info --context kind-kind
kubectx kind-kind

Find the enabled admission plugins:

$ kubectl -n kube-system exec kube-apiserver-kind-control-plane -it -- kube-apiserver -h | grep "default enabled ones"
...
      --enable-admission-plugins strings
admission plugins that should be enabled in addition
to default enabled ones (NamespaceLifecycle, LimitRanger,
ServiceAccount, TaintNodesByCondition, PodSecurity, Priority,
DefaultTolerationSeconds, DefaultStorageClass,
StorageObjectInUseProtection, PersistentVolumeClaimResize,
RuntimeClass, CertificateApproval, CertificateSigning,
CertificateSubjectRestriction, DefaultIngressClass,
MutatingAdmissionWebhook, ValidatingAdmissionWebhook,
ResourceQuota).
...

Policies are applied to a namespace via labels. These labels are as follows:

  • pod-security.kubernetes.io/<MODE>: <LEVEL> (required to enable pod security)
  • pod-security.kubernetes.io/<MODE>-version: <VERSION> (optional, defaults to latest)
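As a minimal sketch of how these labels look on a real object, the following writes a Namespace manifest to a local file. The namespace name demo-pod-security and the file name are assumptions made for illustration, and pinning the enforce version to v1.23 is only an example of the optional version label.

```shell
# Sketch: a Namespace manifest carrying Pod Security labels.
# "demo-pod-security" and the file name are hypothetical.
cat <<'EOF' > namespace-pod-security.yaml
apiVersion: v1
kind: Namespace
metadata:
  name: demo-pod-security
  labels:
    # mode "enforce" at level "baseline"
    pod-security.kubernetes.io/enforce: baseline
    # evaluate against the v1.23 rules instead of "latest"
    pod-security.kubernetes.io/enforce-version: v1.23
    # mode "warn" at level "restricted"
    pod-security.kubernetes.io/warn: restricted
    pod-security.kubernetes.io/warn-version: latest
EOF

# To create the namespace on a cluster (not run here):
# kubectl apply -f namespace-pod-security.yaml
```

Keeping the labels in a manifest rather than applying them with kubectl label makes it easier to review them in version control.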

Deploy demo workload:

kubectl create ns verify-pod-security
kubens verify-pod-security

# enforces a "restricted" security policy and audits on restricted
kubectl label --overwrite ns verify-pod-security \
  pod-security.kubernetes.io/enforce=restricted \
  pod-security.kubernetes.io/audit=restricted

Next, try to deploy a pod that allows privilege escalation:

cat <<EOF | kubectl -n verify-pod-security apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: busybox-privileged
spec:
  containers:
  - name: busybox
    image: busybox
    args:
    - sleep
    - "1000000"
    securityContext:
      allowPrivilegeEscalation: true
EOF

The output is similar to this:

Error from server (Forbidden): error when creating "STDIN": pods "busybox-privileged" is forbidden: violates PodSecurity "restricted:latest": allowPrivilegeEscalation != false (container "busybox" must set securityContext.allowPrivilegeEscalation=false), unrestricted capabilities (container "busybox" must set securityContext.capabilities.drop=["ALL"]), runAsNonRoot != true (pod or container "busybox" must set securityContext.runAsNonRoot=true), seccompProfile (pod or container "busybox" must set securityContext.seccompProfile.type to "RuntimeDefault" or "Localhost")
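For comparison, a pod that addresses each of the four findings in that message might look like the following sketch. The pod name busybox-restricted and the UID 1000 are assumptions for illustration; a non-zero runAsUser is set because the busybox image would otherwise start as root and conflict with runAsNonRoot.

```shell
# Sketch: a pod manifest intended to satisfy the "restricted" level.
# The pod name and UID are hypothetical choices.
cat <<'EOF' > busybox-restricted.yaml
apiVersion: v1
kind: Pod
metadata:
  name: busybox-restricted
spec:
  securityContext:
    runAsNonRoot: true          # addresses "runAsNonRoot != true"
    runAsUser: 1000             # busybox defaults to root, so pick a non-zero UID
    seccompProfile:
      type: RuntimeDefault      # addresses the seccompProfile finding
  containers:
  - name: busybox
    image: busybox
    args:
    - sleep
    - "1000000"
    securityContext:
      allowPrivilegeEscalation: false   # addresses "allowPrivilegeEscalation != false"
      capabilities:
        drop: ["ALL"]                   # addresses "unrestricted capabilities"
EOF

# To deploy it into the demo namespace (not run here):
# kubectl -n verify-pod-security apply -f busybox-restricted.yaml
```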

Now let’s apply the privileged Pod Security level and try to deploy again.

# enforces a "privileged" security policy and warns / audits on baseline
kubectl label --overwrite ns verify-pod-security \
  pod-security.kubernetes.io/enforce=privileged \
  pod-security.kubernetes.io/warn=baseline \
  pod-security.kubernetes.io/audit=baseline

Now the pod is created:

pod/busybox-privileged created

Let’s apply the baseline Pod Security level and try again.

# enforces a "baseline" security policy and warns / audits on restricted
kubectl label --overwrite ns verify-pod-security \
  pod-security.kubernetes.io/enforce=baseline \
  pod-security.kubernetes.io/warn=restricted \
  pod-security.kubernetes.io/audit=restricted
cat <<EOF | kubectl -n verify-pod-security apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: busybox-baseline
spec:
  containers:
  - name: busybox
    image: busybox
    args:
    - sleep
    - "1000000"
    securityContext:
      allowPrivilegeEscalation: false
      capabilities:
        add:
          - NET_BIND_SERVICE
          - CHOWN
EOF

The output is similar to the following. Note that the warnings match the error message from the restricted policy, but the pod is still successfully created.

Warning: would violate PodSecurity "restricted:latest": unrestricted capabilities (container "busybox" must set securityContext.capabilities.drop=["ALL"]; container "busybox" must not include "CHOWN" in securityContext.capabilities.add), runAsNonRoot != true (pod or container "busybox" must set securityContext.runAsNonRoot=true), seccompProfile (pod or container "busybox" must set securityContext.seccompProfile.type to "RuntimeDefault" or "Localhost")
pod/busybox-baseline created
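Before raising the enforce level on a namespace that already runs workloads, a server-side dry run of the label change can report which existing pods would violate the new level, without changing anything. A small sketch, where the helper name preview_enforce is hypothetical:

```shell
# Sketch: preview the effect of a stricter enforce level on a namespace.
# preview_enforce is a hypothetical helper; it relies on kubectl's
# server-side dry run, so no label is actually written.
preview_enforce() {
  ns="$1"      # namespace to check
  level="$2"   # privileged, baseline or restricted
  kubectl label --dry-run=server --overwrite ns "$ns" \
    "pod-security.kubernetes.io/enforce=$level"
}

# Example (needs a running cluster):
# preview_enforce verify-pod-security restricted
```

Any warnings printed by the dry run list the pods that would be rejected if the label were applied for real.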

You can use Kyverno to automate the creation of these labels on namespaces. Using Kyverno is out of the scope of this post, but if you want to know more about this topic, check my other post.

Applying a cluster-wide policy

In addition to applying labels to namespaces to configure policy you can also configure cluster-wide policies and exemptions using the AdmissionConfiguration resource.

First, delete the existing kind cluster so that a new one can be created with the configuration below:

kind delete cluster

Create a Pod Security configuration that enforces and audits the baseline policy while using the restricted profile to warn the end user.

cat <<EOF > pod-security.yaml
apiVersion: apiserver.config.k8s.io/v1
kind: AdmissionConfiguration
plugins:
- name: PodSecurity
  configuration:
    apiVersion: pod-security.admission.config.k8s.io/v1beta1
    kind: PodSecurityConfiguration
    defaults:
      enforce: "baseline"
      enforce-version: "latest"
      audit: "baseline"
      audit-version: "latest"
      warn: "restricted"
      warn-version: "latest"
    exemptions:
      # Array of authenticated usernames to exempt.
      usernames: []
      # Array of runtime class names to exempt.
      runtimeClasses: []
      # Array of namespaces to exempt.
      namespaces: [kube-system]
EOF
cat <<EOF > kind-config.yaml
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
- role: control-plane
  kubeadmConfigPatches:
  - |
    kind: ClusterConfiguration
    apiServer:
        # enable admission-control-config flag on the API server
        extraArgs:
          admission-control-config-file: /etc/kubernetes/policies/pod-security.yaml
        # mount new file / directories on the control plane
        extraVolumes:
          - name: policies
            hostPath: /etc/kubernetes/policies
            mountPath: /etc/kubernetes/policies
            readOnly: true
            pathType: "DirectoryOrCreate"    
  # mount the local file on the control plane
  extraMounts:
  - hostPath: ./pod-security.yaml
    containerPath: /etc/kubernetes/policies/pod-security.yaml
    readOnly: true
EOF
kind create cluster --image kindest/node:v1.23.0 --config kind-config.yaml
kubectl cluster-info --context kind-kind
kubectx kind-kind

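Let’s create a new namespace and look at its labels. The cluster-wide defaults are applied by the admission plugin itself, so no pod-security labels appear on the namespace.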

$ kubectl create namespace test-defaults
namespace/test-defaults created

$ kubectl describe namespace test-defaults
Name:         test-defaults
Labels:       kubernetes.io/metadata.name=test-defaults
Annotations:  <none>
Status:       Active

No resource quota.

No LimitRange resource.

Can a privileged workload be deployed?

cat <<EOF | kubectl -n test-defaults apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: busybox-privileged
spec:
  containers:
  - name: busybox
    image: busybox
    args:
    - sleep
    - "1000000"
    securityContext:
      allowPrivilegeEscalation: true
EOF

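The pod is created because allowPrivilegeEscalation: true is permitted by the enforced baseline level, and the default warn level (restricted) is working: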

Warning: would violate PodSecurity "restricted:latest": allowPrivilegeEscalation != false (container "busybox" must set securityContext.allowPrivilegeEscalation=false), unrestricted capabilities (container "busybox" must set securityContext.capabilities.drop=["ALL"]), runAsNonRoot != true (pod or container "busybox" must set securityContext.runAsNonRoot=true), seccompProfile (pod or container "busybox" must set securityContext.seccompProfile.type to "RuntimeDefault" or "Localhost")
pod/busybox-privileged created

Check the API server metrics endpoint:

kubectl get --raw /metrics | grep pod_security_evaluations_total

# HELP pod_security_evaluations_total [ALPHA] Number of policy evaluations that occurred, not counting ignored or exempt requests.
# TYPE pod_security_evaluations_total counter
pod_security_evaluations_total{decision="allow",mode="enforce",policy_level="baseline",policy_version="latest",request_operation="create",resource="pod",subresource=""} 2
pod_security_evaluations_total{decision="allow",mode="enforce",policy_level="privileged",policy_version="latest",request_operation="create",resource="pod",subresource=""} 0
pod_security_evaluations_total{decision="allow",mode="enforce",policy_level="privileged",policy_version="latest",request_operation="update",resource="pod",subresource=""} 0
pod_security_evaluations_total{decision="deny",mode="audit",policy_level="baseline",policy_version="latest",request_operation="create",resource="pod",subresource=""} 1
pod_security_evaluations_total{decision="deny",mode="enforce",policy_level="baseline",policy_version="latest",request_operation="create",resource="pod",subresource=""} 1
pod_security_evaluations_total{decision="deny",mode="warn",policy_level="restricted",policy_version="latest",request_operation="create",resource="controller",subresource=""} 2
pod_security_evaluations_total{decision="deny",mode="warn",policy_level="restricted",policy_version="latest",request_operation="create",resource="pod",subresource=""} 2
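To focus on violations only, the metrics output can be filtered down to the deny counters that are greater than zero. A minimal sketch, where the helper name denied_evaluations is hypothetical and the filter assumes the metric value is the last whitespace-separated field, as in the output above:

```shell
# Sketch: keep only pod_security_evaluations_total samples with
# decision="deny" and a non-zero counter. Reads metrics text on stdin.
# denied_evaluations is a hypothetical helper name.
denied_evaluations() {
  awk '/^pod_security_evaluations_total/ && /decision="deny"/ && $NF > 0'
}

# Example (needs a running cluster):
# kubectl get --raw /metrics | denied_evaluations
```

The HELP and TYPE comment lines start with "#", so the anchored pattern skips them.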

Auditing

Example audit-policy.yaml configuration tuned for Pod Security events:

apiVersion: audit.k8s.io/v1
kind: Policy
rules:
- level: RequestResponse
  resources:
    - group: "" # core API group
      resources: ["pods", "pods/ephemeralcontainers", "podtemplates", "replicationcontrollers"]
    - group: "apps"
      resources: ["daemonsets", "deployments", "replicasets", "statefulsets"]
    - group: "batch"
      resources: ["cronjobs", "jobs"]
  verbs: ["create", "update"]
  omitStages:
    - "RequestReceived"
    - "ResponseStarted"
    - "Panic"

Enabling the audit policy is out of the scope of this post, but if you want to know more about this topic, check my other post.
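Once audit logging is enabled, the Pod Security findings can be pulled out of the audit log by searching for the audit annotations. The sketch below assumes the violations are recorded under an annotation key beginning with pod-security.kubernetes.io/audit (the Kubernetes documentation lists pod-security.kubernetes.io/audit-violations); the helper name and the log file location are hypothetical and depend on how the API server is configured.

```shell
# Sketch: print audit events that carry a Pod Security audit annotation.
# pod_security_audit_events is a hypothetical helper; pass it the path of
# the audit log file configured on the API server.
pod_security_audit_events() {
  # -F treats the pattern as a fixed string, so the dots are literal
  grep -F 'pod-security.kubernetes.io/audit' "$1"
}

# Example (the path is an assumption):
# pod_security_audit_events /var/log/kubernetes/audit.log
```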

PSP migrations

If you’re already using PSP, SIG Auth has created a guide and published the steps to migrate off of PSP.