How to Manage Kubernetes Resource Limits

In this post I will show you how Kubernetes requests and limits work and how to set them.

Kubernetes schedules and orchestrates containers on an underlying shared set of physical resources. By default, a pod in Kubernetes runs with no limits on CPU and memory in a namespace. This can create several problems: there is no control over how many resources each pod can use, so a resource-hungry container can starve the other containers on the same node.
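If you want defaults enforced at the namespace level, a LimitRange object can inject requests and limits into containers that do not set their own. A minimal sketch (the name and values here are arbitrary examples):

apiVersion: v1
kind: LimitRange
metadata:
  name: default-limits
spec:
  limits:
    - type: Container
      defaultRequest:
        cpu: 100m
        memory: 64Mi
      default:
        cpu: 500m
        memory: 256Mi

Every container created in that namespace without explicit values then starts with these defaults.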

Resource types

CPU and memory are each a resource type, and every resource type has a base unit. CPU represents compute processing and is specified in units of CPU cores: one CPU equals one physical core, or one virtual core, depending on whether the node is a physical host or a virtual machine running inside a physical machine. You can request a fraction of a CPU, for example 0.5. For CPU resource units, the quantity expression 0.1 is equivalent to the expression 100m, which can be read as “one hundred millicpu”.

The other common resource is memory. Memory is specified in units of bytes. You can express memory as a plain integer or as a fixed-point number using one of these quantity suffixes: E, P, T, G, M, k or Ei, Pi, Ti, Gi, Mi, Ki. For example, the following represent roughly the same value:

128974848, 129e6, 129M,  128974848000m, 123Mi
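In a pod manifest these quantities go into the resources stanza of a container. A short sketch with arbitrary values, showing equivalent spellings:

resources:
  requests:
    cpu: "0.5"      # half a core, the same as writing 500m
    memory: 129M    # about 129 megabytes; 123Mi denotes roughly the same amount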

Understanding Requests

A request is the amount of a resource that the system guarantees to the container, and Kubernetes uses this value to decide on which node to place the pod. If a request is not set for a container but a limit is, the request defaults to the limit. The scheduler sets aside the requested amount on the node, so whether or not the container actually uses it, other pods cannot be scheduled against that reserved capacity.
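For example (the values here are arbitrary), a container spec that sets only a limit implicitly gets a request of the same size:

resources:
  limits:
    memory: "256Mi"   # no request given, so the request defaults to 256Mi as well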

Understanding Limits

A limit is the maximum amount of a resource that Kubernetes will allow the container to use. For example, when a process in the container tries to consume more than the allowed amount of memory, the system kernel terminates the process that attempted the allocation with an out-of-memory (OOM) error. If a limit is not set, the container is unbounded and can consume all available resources on the node.
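You can observe this behaviour with a small test pod. This sketch assumes the publicly available polinux/stress image, which allocates memory on demand; the container tries to allocate 250M against a 100Mi limit, so the kernel OOM-kills it:

apiVersion: v1
kind: Pod
metadata:
  name: oom-demo
spec:
  containers:
    - name: stress
      image: polinux/stress
      resources:
        limits:
          memory: "100Mi"
      command: ["stress"]
      args: ["--vm", "1", "--vm-bytes", "250M", "--vm-hang", "1"]

Describing the pod afterwards should show a last state of OOMKilled for the container.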

Request vs Limit

In short, the request is what the scheduler guarantees to a container, while the limit is the hard ceiling it may never exceed. Setting the request below the limit lets a container burst into spare capacity on the node when it is available.

Examples

Below is an example pod configuration file with requests and limits set for the CPU and memory of two containers in a pod.

apiVersion: v1
kind: Pod
metadata:
  name: demo
spec:
  containers:
  - name: demo1
    image: nginx
    resources:
      requests:
        memory: "16Mi"
        cpu: "100m"
      limits:
        memory: "32Mi"
        cpu: "200m"
  - name: demo2
    image: mysql
    resources:
      requests:
        memory: "64Mi"
        cpu: "200m"
      limits:
        memory: "128Mi"
        cpu: "400m"

Run the following command to inspect the resources used by the pod:

$ kubectl top pods
NAME          CPU(cores)   MEMORY(bytes)
demo          201m         27Mi

You can do a similar check for nodes:

$ kubectl top node

NAME                         CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%
k8s-m101                     1338m        66%    3586Mi          45%
k8s-w102                     2488m        31%    18918Mi         58%

You can use a handy tool called kube-capacity that provides an overview of the resource requests, limits, and utilization in a Kubernetes cluster:

$ kubectl krew install resource-capacity
$ kubectl resource-capacity

NODE        CPU REQUESTS    CPU LIMITS    MEMORY REQUESTS    MEMORY LIMITS
*           560m (28%)      130m (7%)     572Mi (9%)         770Mi (13%)
k8s-m101    220m (22%)      10m (1%)      192Mi (6%)         360Mi (12%)
k8s-w102    340m (34%)      120m (12%)    380Mi (13%)        410Mi (14%)

Finding the right requests and limits

For this demo I will use a Python app built with the Flask framework: a simple cache service with two endpoints, one to store data and another to retrieve it.

Before you start, make sure that your cluster has the metrics server installed. In my case I use minikube and I can enable it as a plugin:

$ minikube addons enable metrics-server

You can deploy the application with the following YAML file:

nano deployment.yaml
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: flask-cache
spec:
  replicas: 1
  selector:
    matchLabels:
      name: flask-cache
  template:
    metadata:
      labels:
        name: flask-cache
    spec:
      containers:
        - name: cache-service
          image: xasag94215/flask-cache
          ports:
            - containerPort: 5000
              name: rest
---
apiVersion: v1
kind: Service
metadata:
  name: flask-cache
spec:
  selector:
    name: flask-cache
  ports:
    - port: 80
      targetPort: 5000
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: flask-cache
spec:
  rules:
  - host: limit-demo.mydomain.intra
    http:
      paths:
      - backend:
          service:
            name: flask-cache
            port:
              number: 80
        path: /
        pathType: Prefix
Apply the manifest and check the pod's resource usage:

$ kubectl apply -f deployment.yaml
$ kubectl top pods

NAME                           CPU(cores)   MEMORY(bytes)
flask-cache-8484fd74dd-sxpk7   188m         151Mi

Requests and limits depend on how much memory and CPU the application actually uses. An application that serves static pages, for example, has mostly static memory and CPU usage. It is common to run a metrics server and a Prometheus database to store your metrics; then you can take the max and min of the CPU and memory usage and extrapolate requests and limits from them. But there's a quicker way.

Vertical Pod Autoscaler

The SIG Autoscaling group developed a tool that can do this automatically: the Vertical Pod Autoscaler (VPA). The Vertical Pod Autoscaler is a Kubernetes operator that estimates the correct requests and limits for a pod.

Limitations:

  • Vertical Pod autoscaling supports a maximum of 500 VerticalPodAutoscaler objects per cluster.
  • Vertical Pod autoscaling is not ready for use with JVM-based workloads due to limited visibility into actual memory usage of the workload.

You can install it with the Fairwinds Helm chart:

helm repo add fairwinds-stable https://charts.fairwinds.com/stable
helm upgrade --install vpa fairwinds-stable/vpa \
  --namespace vpa-system --create-namespace \
  --set admissionController.enabled=true \
  --set updater.extraArgs.min-replicas=1

If you want the Vertical Pod Autoscaler (VPA) to estimate limits and requests for your Flask app, create the following YAML file:

nano vpa.yaml
---
apiVersion: "autoscaling.k8s.io/v1"
kind: VerticalPodAutoscaler
metadata:
  name: flask-cache
spec:
  targetRef:
    apiVersion: "apps/v1"
    kind: Deployment
    name: flask-cache
  updatePolicy:
    updateMode: "Off"

You can query the VPA object with:

$ kubectl describe vpa flask-cache
...
Status:
  Conditions:
    Last Transition Time:  2022-11-14T13:58:08Z
    Status:                True
    Type:                  RecommendationProvided
  Recommendation:
    Container Recommendations:
      Container Name:  cache-service
      Lower Bound:
        Cpu:     25m
        Memory:  60194k
      Target:
        Cpu:     410m
        Memory:  262144k
      Uncapped Target:
        Cpu:     410m
        Memory:  262144k
      Upper Bound:
        Cpu:     1
        Memory:  500Mi
Events:          <none>

In the lower part of the output, the autoscaler reports its estimations:

  • The lower bound is the minimum estimation for the container.
  • The upper bound is the maximum recommended resource estimation for the container.
  • The target estimation is the one to use when setting resource requests.
  • All of these estimations are capped by the minAllowed and maxAllowed container policies.
  • The uncapped target is the estimation that would be produced if there were no minAllowed and maxAllowed restrictions.
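Those caps come from an optional resourcePolicy section on the VPA object. A sketch with made-up bounds for the Flask app:

apiVersion: "autoscaling.k8s.io/v1"
kind: VerticalPodAutoscaler
metadata:
  name: flask-cache
spec:
  targetRef:
    apiVersion: "apps/v1"
    kind: Deployment
    name: flask-cache
  updatePolicy:
    updateMode: "Off"
  resourcePolicy:
    containerPolicies:
      - containerName: cache-service
        minAllowed:
          cpu: 50m
          memory: 64Mi
        maxAllowed:
          cpu: "1"
          memory: 512Mi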

You can start a load tester and keep inspecting the Vertical Pod Autoscaler (VPA) recommendation. Once the recommendations are stable, you can apply them back to your deployment:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: flask-cache
spec:
  replicas: 1
  selector:
    matchLabels:
      name: flask-cache
  template:
    metadata:
      labels:
        name: flask-cache
    spec:
      containers:
        - name: cache-service
          image: xasag94215/flask-cache
          ports:
            - containerPort: 5000
              name: rest
          resources:
            requests:
              cpu: 25m
              memory: 64Mi
            limits:
              cpu: 410m
              memory: 512Mi

If updateMode: "Auto" is set, the VerticalPodAutoscaler automatically patches the pod's resources the next time the pod is created.

nano vpa.yaml
---
apiVersion: "autoscaling.k8s.io/v1"
kind: VerticalPodAutoscaler
metadata:
  name: flask-cache
spec:
  targetRef:
    apiVersion: "apps/v1"
    kind: Deployment
    name: flask-cache
  updatePolicy:
    updateMode: "Auto"
Apply the updated object, then delete the pod so that it is recreated with the patched resources:

$ kubectl apply -f vpa.yaml
$ kubectl delete po flask-cache-8484fd74dd-hpmhh --force

You can confirm the change on the recreated pod:

$ kubectl get po -o yaml flask-cache-8484fd74dd-cdxkn | grep vpa
    vpaObservedContainers: cache-service
    vpaUpdates: 'Pod resources updated by flask-cache: container 0: cpu request, memory

$ kubectl get po -o yaml flask-cache-8484fd74dd-cdxkn
...
spec:
  containers:
  - image: xasag94215/flask-cache
    imagePullPolicy: Always
    name: cache-service
    ports:
    - containerPort: 5000
      name: rest
      protocol: TCP
    resources:
      requests:
        cpu: 350m
        memory: "248153480"

Visualising limits and requests recommendations

If you prefer a graphical interface for inspecting VPA recommendations, you can use the Goldilocks dashboard:

nano goldilocks-values.yaml
dashboard:
  ingress:
    enabled: true
    hosts:
      - host: goldilocks.mydomain.intra
        paths:
          - path: "/"
            type: "ImplementationSpecific"
helm upgrade --install goldilocks --namespace vpa-system fairwinds-stable/goldilocks -f goldilocks-values.yaml


If you want Goldilocks to display Vertical Pod Autoscaler (VPA) recommendations for a namespace, label that namespace accordingly:

kubectl label namespace default goldilocks.fairwinds.com/enabled=true