Kubernetes Pod Security Admission


With the release of Kubernetes v1.25, Pod Security Admission has graduated to stable and PodSecurityPolicy has been removed. In this article, we cover the key concepts of Pod Security Admission and how to use it.

Parts of the K8S Security Lab series

Container Runtime Security
Advanced Kernel Security
Network Security
Secure Kubernetes Install
User Security
Image Security
  • Part1: Image security Admission Controller
  • Part2: Image security Admission Controller V2
  • Part3: Image security Admission Controller V3
  • Part4: Continuous Image security
  • Part5: trivy-operator 1.0
  • Part6: trivy-operator 2.1: Trivy-operator is now an Admission controller too!!!
  • Part7: trivy-operator 2.2: Patch release for Admission controller
  • Part8: trivy-operator 2.3: Patch release for Admission controller
  • Part8: trivy-operator 2.4: Patch release for Admission controller
  • Part8: trivy-operator 2.5: Patch release for Admission controller
  • Part9: Image Signature Verification with Connaisseur
  • Part10: Image Signature Verification with Connaisseur 2.0
  • Part11: Image Signature Verification with Kyverno
  • Part12: How to use imagePullSecrets cluster-wide??
  • Part13: Automatically change registry in pod definition
  • Part14: ArgoCD auto image updater
Pod Security
Secret Security
Monitoring and Observability
Backup

    What is a Pod Security Policy?

    A Pod Security Policy is a cluster-level resource that controls security-sensitive aspects of the pod specification. RBAC controls which Kubernetes objects a user may use, but not the conditions inside a specific object, such as whether a container is allowed to run as root. PSP objects define a set of conditions that a pod must run with in order to be accepted into the system, as well as defaults for their related fields. PodSecurityPolicy is an optional admission controller. The PSP API itself is enabled by default, so policies can be deployed even when the PSP admission plugin is not enabled.

    What is Pod Security Admission?

    Pod Security Admission is the successor to PodSecurityPolicy, which was deprecated in the v1.21 release and removed in Kubernetes v1.25. It addresses key shortcomings of the PodSecurityPolicy (PSP) mechanism: PSP was challenging to deploy with controllers, and its lack of dry-run and audit capabilities made it hard to enable safely.

    Configuring Pod Security Admission

    Pod Security Admission uses profiles based on the Pod Security Standards. These standards define three different policy levels:

    • privileged - Unrestricted policy that allows anything, including known privilege escalations. Apply this policy with caution.
    • baseline - Minimally restrictive policy that prevents known privilege escalations. Allows all default values for fields in Pod specifications.
      • Disables HostProcess containers on Windows
      • Disables host namespaces on Linux: hostNetwork, hostPID and hostIPC
      • Disables privileged containers
      • Disallows adding capabilities beyond the default set
      • Disallows mounting of hostPath volumes
      • Disallows usage of host ports
      • On supported hosts, the runtime/default AppArmor profile is applied by default.
      • Setting the SELinux type is restricted, and setting a custom SELinux user or role option is forbidden.
      • Seccomp profile must not be explicitly set to Unconfined.
      • Disallows sysctls outside the safe subset
    • restricted - Most restrictive policy. Complies with Pod hardening best practices. Includes every baseline restriction, plus:
      • Disallow Privilege Escalation
      • Disallow running containers as root: runAsNonRoot must be true and runAsUser must not be 0
      • Seccomp profile must be explicitly set to one of the allowed values.
      • Containers must drop ALL capabilities, and are only permitted to add back the NET_BIND_SERVICE capability.

    Policies are applied in a specific mode. The modes are:

    • enforce — Any Pods that violate the policy will be rejected
    • audit — Violations will be recorded as an annotation in the audit logs, but don’t affect whether the pod is allowed.
    • warn — Violations will send a warning message back to the user, but don’t affect whether the pod is allowed.
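
Before switching a namespace to enforce, it can help to preview which existing pods would violate a level. The sketch below wraps the server-side dry-run form of kubectl label in a small helper. The function name psa_preview is hypothetical, and the command assumes kubectl is installed and pointed at a reachable cluster.

```shell
# Hypothetical helper: preview which existing pods in every namespace would
# violate a level if it were enforced. Nothing is persisted.
psa_preview() {
  level="$1"
  case "$level" in
    privileged|baseline|restricted) ;;
    *)
      echo "usage: psa_preview privileged|baseline|restricted" >&2
      return 2
      ;;
  esac
  # --dry-run=server sends the request to the API server without saving the
  # label; violations come back as warnings.
  kubectl label --dry-run=server --overwrite ns --all \
    "pod-security.kubernetes.io/enforce=${level}"
}
```

For example, psa_preview baseline lists warnings for namespaces whose running pods would not pass the baseline level.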

    Demo:

    Start a cluster with kind for the demo:

    kind create cluster --image kindest/node:v1.23.0
    kubectl cluster-info --context kind-kind
    kubectx kind-kind
    

    Find the enabled admission plugins:

    $ kubectl -n kube-system exec kube-apiserver-kind-control-plane -it -- kube-apiserver -h | grep "default enabled ones"
    ...
          --enable-admission-plugins strings
    admission plugins that should be enabled in addition
    to default enabled ones (NamespaceLifecycle, LimitRanger,
    ServiceAccount, TaintNodesByCondition, PodSecurity, Priority,
    DefaultTolerationSeconds, DefaultStorageClass,
    StorageObjectInUseProtection, PersistentVolumeClaimResize,
    RuntimeClass, CertificateApproval, CertificateSigning,
    CertificateSubjectRestriction, DefaultIngressClass,
    MutatingAdmissionWebhook, ValidatingAdmissionWebhook,
    ResourceQuota).
    ...
    

    Policies are applied to a namespace via labels. These labels are as follows:

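    • pod-security.kubernetes.io/<MODE>: <LEVEL> (required to enable Pod Security Admission)
    • pod-security.kubernetes.io/<MODE>-version: <VERSION> (optional, defaults to latest)

    Here <MODE> is enforce, audit or warn, <LEVEL> is privileged, baseline or restricted, and <VERSION> is latest or a Kubernetes minor version such as v1.23.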
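
As a sketch, the same labels can also be set declaratively in a Namespace manifest, with enforce, audit or warn as the mode part of each key. The namespace name demo-psa, the file name and the v1.23 version pin are illustrative assumptions.

```shell
# Write a Namespace manifest that carries the Pod Security labels.
# "demo-psa" and the v1.23 pin are illustrative assumptions.
cat <<'EOF' > namespace-psa.yaml
apiVersion: v1
kind: Namespace
metadata:
  name: demo-psa
  labels:
    # reject pods that violate baseline, as defined in Kubernetes v1.23
    pod-security.kubernetes.io/enforce: baseline
    pod-security.kubernetes.io/enforce-version: v1.23
    # warn the user about anything that would violate restricted
    pod-security.kubernetes.io/warn: restricted
    pod-security.kubernetes.io/warn-version: latest
EOF

# Apply it against a running cluster:
# kubectl apply -f namespace-psa.yaml
```

Pinning enforce-version keeps the enforced checks stable across cluster upgrades, while leaving warn on latest surfaces newly added checks early.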

    Deploy demo workload:

    kubectl create ns verify-pod-security
    kubens verify-pod-security
    
    # enforces a "restricted" security policy and audits on restricted
    kubectl label --overwrite ns verify-pod-security \
      pod-security.kubernetes.io/enforce=restricted \
      pod-security.kubernetes.io/audit=restricted
    

    Next, try to deploy a pod that allows privilege escalation:

    cat <<EOF | kubectl -n verify-pod-security apply -f -
    apiVersion: v1
    kind: Pod
    metadata:
      name: busybox-privileged
    spec:
      containers:
      - name: busybox
        image: busybox
        args:
        - sleep
        - "1000000"
        securityContext:
          allowPrivilegeEscalation: true
    EOF
    

    The output is similar to this:

    Error from server (Forbidden): error when creating "STDIN": pods "busybox-privileged" is forbidden: violates PodSecurity "restricted:latest": allowPrivilegeEscalation != false (container "busybox" must set securityContext.allowPrivilegeEscalation=false), unrestricted capabilities (container "busybox" must set securityContext.capabilities.drop=["ALL"]), runAsNonRoot != true (pod or container "busybox" must set securityContext.runAsNonRoot=true), seccompProfile (pod or container "busybox" must set securityContext.seccompProfile.type to "RuntimeDefault" or "Localhost")
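
For comparison, a pod that addresses each of the four violations listed in the error might look like the sketch below. The pod name busybox-restricted, the file name and the UID 1000 are assumptions. The busybox image runs as root by default, so a non-zero runAsUser is set alongside runAsNonRoot.

```shell
# Write a pod manifest intended to pass the "restricted" level.
cat <<'EOF' > busybox-restricted.yaml
apiVersion: v1
kind: Pod
metadata:
  name: busybox-restricted
spec:
  securityContext:
    runAsNonRoot: true
    runAsUser: 1000            # assumption: any non-zero UID works here
    seccompProfile:
      type: RuntimeDefault
  containers:
  - name: busybox
    image: busybox
    args:
    - sleep
    - "1000000"
    securityContext:
      allowPrivilegeEscalation: false
      capabilities:
        drop:
        - ALL
EOF

# Apply it to the demo namespace:
# kubectl -n verify-pod-security apply -f busybox-restricted.yaml
```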
    

    Now let’s apply the privileged Pod Security Admission level and try to deploy again.

    # enforces a "privileged" security policy and warns / audits on baseline
    kubectl label --overwrite ns verify-pod-security \
      pod-security.kubernetes.io/enforce=privileged \
      pod-security.kubernetes.io/warn=baseline \
      pod-security.kubernetes.io/audit=baseline
    

    Now the pod is created:

    pod/busybox-privileged created
    

    Let’s apply the baseline Pod Security Admission level and try again.

    # enforces a "baseline" security policy and warns / audits on restricted
    kubectl label --overwrite ns verify-pod-security \
      pod-security.kubernetes.io/enforce=baseline \
      pod-security.kubernetes.io/warn=restricted \
      pod-security.kubernetes.io/audit=restricted
    
    cat <<EOF | kubectl -n verify-pod-security apply -f -
    apiVersion: v1
    kind: Pod
    metadata:
      name: busybox-baseline
    spec:
      containers:
      - name: busybox
        image: busybox
        args:
        - sleep
        - "1000000"
        securityContext:
          allowPrivilegeEscalation: false
          capabilities:
            add:
              - NET_BIND_SERVICE
              - CHOWN
    EOF
    

    The output is similar to the following. Note that the warnings match the error message from the restricted policy, but the pod is still successfully created.

    Warning: would violate PodSecurity "restricted:latest": unrestricted capabilities (container "busybox" must set securityContext.capabilities.drop=["ALL"]; container "busybox" must not include "CHOWN" in securityContext.capabilities.add), runAsNonRoot != true (pod or container "busybox" must set securityContext.runAsNonRoot=true), seccompProfile (pod or container "busybox" must set securityContext.seccompProfile.type to "RuntimeDefault" or "Localhost")
    pod/busybox-baseline created
    

    You can use Kyverno to automate the creation of these labels on namespaces. Using Kyverno is out of the scope of this post, but if you want to know more about this topic, check my other post.

    Applying a cluster-wide policy

    In addition to applying labels to namespaces to configure policy you can also configure cluster-wide policies and exemptions using the AdmissionConfiguration resource.

    First, delete the existing kind cluster so that a new one can be created with the cluster-wide configuration:

    kind delete cluster
    

    Create a Pod Security Admission configuration that enforces and audits the baseline policy while using the restricted profile to warn the end user.

    cat <<EOF > pod-security.yaml
    apiVersion: apiserver.config.k8s.io/v1
    kind: AdmissionConfiguration
    plugins:
    - name: PodSecurity
      configuration:
        apiVersion: pod-security.admission.config.k8s.io/v1beta1
        kind: PodSecurityConfiguration
        defaults:
          enforce: "baseline"
          enforce-version: "latest"
          audit: "baseline"
          audit-version: "latest"
          warn: "restricted"
          warn-version: "latest"
        exemptions:
          # Array of authenticated usernames to exempt.
          usernames: []
          # Array of runtime class names to exempt.
          runtimeClasses: []
          # Array of namespaces to exempt.
          namespaces: [kube-system]
    EOF
    
    cat <<EOF > kind-config.yaml
    kind: Cluster
    apiVersion: kind.x-k8s.io/v1alpha4
    nodes:
    - role: control-plane
      kubeadmConfigPatches:
      - |
        kind: ClusterConfiguration
        apiServer:
            # enable admission-control-config flag on the API server
            extraArgs:
              admission-control-config-file: /etc/kubernetes/policies/pod-security.yaml
            # mount new file / directories on the control plane
            extraVolumes:
              - name: policies
                hostPath: /etc/kubernetes/policies
                mountPath: /etc/kubernetes/policies
                readOnly: true
                pathType: "DirectoryOrCreate"    
      # mount the local file on the control plane
      extraMounts:
      - hostPath: ./pod-security.yaml
        containerPath: /etc/kubernetes/policies/pod-security.yaml
        readOnly: true
    EOF
    
    kind create cluster --image kindest/node:v1.23.0 --config kind-config.yaml
    kubectl cluster-info --context kind-kind
    kubectx kind-kind
    

    Let’s create a new namespace and check its labels. No pod-security labels are set on it, because the cluster-wide defaults apply without them.

    $ kubectl create namespace test-defaults
    namespace/test-defaults created
    
    $ kubectl describe namespace test-defaults
    Name:         test-defaults
    Labels:       kubernetes.io/metadata.name=test-defaults
    Annotations:  <none>
    Status:       Active
    
    No resource quota.
    
    No LimitRange resource.
    

    Can a privileged workload be deployed?

    cat <<EOF | kubectl -n test-defaults apply -f -
    apiVersion: v1
    kind: Pod
    metadata:
      name: busybox-privileged
    spec:
      containers:
      - name: busybox
        image: busybox
        args:
        - sleep
        - "1000000"
        securityContext:
          allowPrivilegeEscalation: true
    EOF
    

    The pod is admitted because it satisfies the enforced baseline level, and the default warn level (restricted) produces a warning:

    Warning: would violate PodSecurity "restricted:latest": allowPrivilegeEscalation != false (container "busybox" must set securityContext.allowPrivilegeEscalation=false), unrestricted capabilities (container "busybox" must set securityContext.capabilities.drop=["ALL"]), runAsNonRoot != true (pod or container "busybox" must set securityContext.runAsNonRoot=true), seccompProfile (pod or container "busybox" must set securityContext.seccompProfile.type to "RuntimeDefault" or "Localhost")
    pod/busybox-privileged created
    

    Check the API server metrics endpoint:

    kubectl get --raw /metrics | grep pod_security_evaluations_total
    
    # HELP pod_security_evaluations_total [ALPHA] Number of policy evaluations that occurred, not counting ignored or exempt requests.
    # TYPE pod_security_evaluations_total counter
    pod_security_evaluations_total{decision="allow",mode="enforce",policy_level="baseline",policy_version="latest",request_operation="create",resource="pod",subresource=""} 2
    pod_security_evaluations_total{decision="allow",mode="enforce",policy_level="privileged",policy_version="latest",request_operation="create",resource="pod",subresource=""} 0
    pod_security_evaluations_total{decision="allow",mode="enforce",policy_level="privileged",policy_version="latest",request_operation="update",resource="pod",subresource=""} 0
    pod_security_evaluations_total{decision="deny",mode="audit",policy_level="baseline",policy_version="latest",request_operation="create",resource="pod",subresource=""} 1
    pod_security_evaluations_total{decision="deny",mode="enforce",policy_level="baseline",policy_version="latest",request_operation="create",resource="pod",subresource=""} 1
    pod_security_evaluations_total{decision="deny",mode="warn",policy_level="restricted",policy_version="latest",request_operation="create",resource="controller",subresource=""} 2
    pod_security_evaluations_total{decision="deny",mode="warn",policy_level="restricted",policy_version="latest",request_operation="create",resource="pod",subresource=""} 2
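
To get a quick overview from this output, the deny counters can be summed per mode. The following is a minimal sketch using awk. The function name psa_denials_by_mode is hypothetical, and it assumes the metric lines have the label layout shown above.

```shell
# Hypothetical helper: read Prometheus-format metrics on stdin and print the
# total number of "deny" decisions per mode (audit, enforce, warn).
psa_denials_by_mode() {
  awk '
    /^pod_security_evaluations_total/ && /decision="deny"/ {
      mode = $0
      sub(/.*mode="/, "", mode)   # drop everything up to the mode value
      sub(/".*/, "", mode)        # drop everything after the mode value
      totals[mode] += $NF         # the counter value is the last field
    }
    END { for (m in totals) print m, totals[m] }
  ' | sort
}
```

It can be fed directly from the API server, for example: kubectl get --raw /metrics | psa_denials_by_mode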
    

    Auditing

    Example audit-policy.yaml configuration tuned for Pod Security Admission events:

    apiVersion: audit.k8s.io/v1
    kind: Policy
    rules:
    - level: RequestResponse
      resources:
        - group: "" # core API group
          resources: ["pods", "pods/ephemeralcontainers", "podtemplates", "replicationcontrollers"]
        - group: "apps"
          resources: ["daemonsets", "deployments", "replicasets", "statefulsets"]
        - group: "batch"
          resources: ["cronjobs", "jobs"]
      verbs: ["create", "update"]
      omitStages:
        - "RequestReceived"
        - "ResponseStarted"
        - "Panic"
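
Once audit logging is enabled with a policy like this, events that violate the audit level carry a pod-security.kubernetes.io/audit-violations annotation. A minimal sketch for filtering them out of a JSON-lines audit log follows. The function name psa_audit_violations is hypothetical, and the log location depends on how the API server is configured.

```shell
# Hypothetical helper: keep only audit events that carry a Pod Security
# audit-violations annotation. Reads JSON-lines audit events on stdin.
psa_audit_violations() {
  grep -F '"pod-security.kubernetes.io/audit-violations"'
}
```

For example, with the audit log at an assumed path: psa_audit_violations < /var/log/kubernetes/audit.log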
    

    Enabling the audit policy is out of the scope of this post, but if you want to know more about this topic, check my other post.

    PSP migrations

    If you’re already using PSP, I have published a guide with the steps to migrate off it.