K8S Logging And Monitoring

In this tutorial I will show you how to install the Prometheus operator (via the kube-prometheus-stack Helm chart) to monitor Kubernetes and Loki to gather logs.

Parts of the K8S Security Lab series

Container Runtime Security
Advanced Kernel Security
Network Security
Secure Kubernetes Install
User Security
Image Security
  • Part1: Image security Admission Controller
  • Part2: Image security Admission Controller V2
  • Part3: Image security Admission Controller V3
  • Part4: Continuous Image security
  • Part5: trivy-operator 1.0
  • Part6: trivy-operator 2.1: Trivy-operator is now an Admission controller too!!!
  • Part7: trivy-operator 2.2: Patch release for Admission controller
  • Part8: trivy-operator 2.3: Patch release for Admission controller
  • Part8: trivy-operator 2.4: Patch release for Admission controller
  • Part8: trivy-operator 2.5: Patch release for Admission controller
  • Part9: Image Signature Verification with Connaisseur
  • Part10: Image Signature Verification with Connaisseur 2.0
  • Part11: Image Signature Verification with Kyverno
  • Part12: How to use imagePullSecrets cluster-wide?
  • Part13: Automatically change registry in pod definition
  • Part14: ArgoCD auto image updater
Pod Security
Secret Security
Monitoring and Observability
Backup

    Prometheus is an open-source monitoring system with a built-in time-series database. It offers a multi-dimensional data model, a flexible query language, and diverse visualization possibilities. Prometheus collects metrics from HTTP endpoints. Most services don't expose such an endpoint out of the box, so you need additional programs, called exporters, that generate and expose the metrics for them.
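
    As a quick sanity check you can fetch such an endpoint yourself. A minimal sketch, assuming the node-exporter (installed later as part of the kube-prometheus-stack) is listening on its default port 9100 on the node:

    # fetch the plain-text metrics an exporter exposes for Prometheus to scrape
    curl -s http://localhost:9100/metrics | head -n 5
    # typical output looks like:
    # node_cpu_seconds_total{cpu="0",mode="idle"} 1.234567e+06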

    Monitoring

    nano values.yaml
    ---
    global:
      rbac:
        create: true
        pspEnabled: true
    
    alertmanager:
      alertmanagerSpec:
        storage:
          volumeClaimTemplate:
            spec:
              resources:
                requests:
                  storage: 10Gi
      ingress:
        enabled: true
        annotations:
          cert-manager.io/cluster-issuer: ca-issuer
        hosts:
          - alertmanager.k8s.intra
        paths:
        - /
        pathType: ImplementationSpecific
        tls:
        - secretName: tls-alertmanager-cert
          hosts:
          - alertmanager.k8s.intra
    
    grafana:
      rbac:
        create: true
        pspEnabled: true
        pspUseAppArmor: false
      initChownData:
        enabled: false
      enabled: true
      adminPassword: Password1
      plugins:
      - grafana-piechart-panel
      persistence:
        enabled: true
        size: 10Gi
      ingress:
        enabled: true
        annotations:
          cert-manager.io/cluster-issuer: ca-issuer
        hosts:
          - grafana.k8s.intra
        paths:
        - /
        pathType: ImplementationSpecific
        tls:
        - secretName: tls-grafana-cert
          hosts:
          - grafana.k8s.intra
    
    
    prometheus:
      enabled: true
      prometheusSpec:
        podMonitorSelectorNilUsesHelmValues: false
        serviceMonitorSelectorNilUsesHelmValues: false
        secrets: ['etcd-client-cert']
        storageSpec:
          volumeClaimTemplate:
            spec:
              resources:
                requests:
                  storage: 10Gi
      ingress:
        enabled: true
        annotations:
          cert-manager.io/cluster-issuer: ca-issuer
        hosts:
          - prometheus.k8s.intra
        paths:
        - /
        pathType: ImplementationSpecific
        tls:
        - secretName: tls-prometheus-cert
          hosts:
          - prometheus.k8s.intra
    

    There is a bug in the Grafana Helm chart, so it doesn't create the PSP correctly for the init container: https://github.com/grafana/helm-charts/issues/427

    # solution
    kubectl edit psp prometheus-grafana
    ...
      runAsUser:
        rule: RunAsAny
    ...
    
    kubectl get rs
    NAME                                             DESIRED   CURRENT   READY   AGE
    prometheus-grafana-74b5d957bc                    1         0         0       12m
    ...
    
    kubectl delete rs prometheus-grafana-74b5d957bc
    
    #### Grafana dashboards
    ## RKE2
    # 14243
    ## NGINX Ingress Controller
    # 9614
    ## cert-manager
    # 11001
    ## Longhorn
    # 13032
    ## Kyverno
    # https://raw.githubusercontent.com/kyverno/grafana-dashboard/master/grafana/dashboard.json
    ## Calico
    # 12175
    # 3244
    ## Cilium
    # 6658
    # 14500
    # 14502
    # 14501
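
    These IDs come from grafana.com. You can import them by hand in the Grafana UI (Dashboards > Import), or pre-load them from the grafana: section of the values.yaml above using the Grafana sub-chart's dashboardProviders/dashboards values. A sketch; the dashboard key names and revision numbers below are assumptions, check each dashboard page for the current revision:

    grafana:
      dashboardProviders:
        dashboardproviders.yaml:
          apiVersion: 1
          providers:
          - name: default          # provider that loads dashboards from files
            orgId: 1
            folder: ""
            type: file
            disableDeletion: false
            editable: true
            options:
              path: /var/lib/grafana/dashboards/default
      dashboards:
        default:
          rke2-cluster:            # arbitrary key, any name works
            gnetId: 14243          # RKE2 dashboard ID from the list above
            revision: 1            # assumed revision
            datasource: Prometheus
          nginx-ingress:
            gnetId: 9614           # NGINX Ingress Controller dashboard
            revision: 1
            datasource: Prometheus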
    
    helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
    
    helm install prometheus prometheus-community/kube-prometheus-stack -n monitoring --create-namespace -f values.yaml
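
    After the install finishes, check that everything in the monitoring namespace comes up (assuming the release name prometheus used above):

    kubectl -n monitoring get pods
    kubectl -n monitoring get prometheus,alertmanager,servicemonitors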
    

    For the kube-proxy down status:

    kubectl edit cm/kube-proxy -n kube-system
    ...
    kind: KubeProxyConfiguration
    metricsBindAddress: 0.0.0.0:10249
    ...
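
    The running kube-proxy pods only pick up the new metricsBindAddress after a restart; on a kubeadm cluster, where kube-proxy runs as a DaemonSet named kube-proxy, something like this should do it:

    kubectl -n kube-system rollout restart daemonset kube-proxy
    kubectl -n kube-system rollout status daemonset kube-proxy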
    

    If you use RKE2, you can configure this through the Helm chart before the first start:

    cat << EOF > /var/lib/rancher/rke2/server/manifests/rke2-kube-proxy-config.yaml
    apiVersion: helm.cattle.io/v1
    kind: HelmChartConfig
    metadata:
      name: rke2-kube-proxy
      namespace: kube-system
    spec:
      valuesContent: |-
        metricsBindAddress: 0.0.0.0:10249
    EOF
    

    For the controller-manager down status:

    nano /etc/kubernetes/manifests/kube-controller-manager.yaml
    # OR
    nano /var/lib/rancher/rke2/agent/pod-manifests/kube-controller-manager.yaml
    apiVersion: v1
    kind: Pod
    metadata:
      ...
    spec:
      containers:
      - command:
        - kube-controller-manager
        ...
        - --address=0.0.0.0
        ...
        - --bind-address=<your control-plane IP or 0.0.0.0>
        ...
        livenessProbe:
          failureThreshold: 8
          httpGet:
            host: 0.0.0.0
        ...
    

    For the kube-scheduler down status:

    nano /etc/kubernetes/manifests/kube-scheduler.yaml
    # OR
    nano /var/lib/rancher/rke2/agent/pod-manifests/kube-scheduler.yaml
    apiVersion: v1
    kind: Pod
    metadata:
      ...
    spec:
      containers:
      - command:
        - kube-scheduler
        - --address=0.0.0.0
        - --bind-address=0.0.0.0
        ...
        livenessProbe:
          failureThreshold: 8
          httpGet:
            host: 0.0.0.0
        ...
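
    Once the kubelet has restarted the static pods, you can confirm on the control-plane node that both components now listen on all interfaces (10257 is the kube-controller-manager secure port, 10259 the kube-scheduler one):

    ss -lntp | grep -E '10257|10259'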
    

    For the etcd down status, first we need to create a secret to authenticate to etcd:

    # kubeadm
    kubectl -n monitoring create secret generic etcd-client-cert \
    --from-file=/etc/kubernetes/pki/etcd/ca.crt \
    --from-file=/etc/kubernetes/pki/etcd/healthcheck-client.crt \
    --from-file=/etc/kubernetes/pki/etcd/healthcheck-client.key
    
    # rancher
    kubectl -n monitoring create secret generic etcd-client-cert \
    --from-file=/var/lib/rancher/rke2/server/tls/etcd/server-ca.crt \
    --from-file=/var/lib/rancher/rke2/server/tls/etcd/server-client.crt \
    --from-file=/var/lib/rancher/rke2/server/tls/etcd/server-client.key
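
    You can verify that the three files ended up in the secret; the key names have to match the caFile/certFile/keyFile paths configured below:

    kubectl -n monitoring describe secret etcd-client-cert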
    

    Then we configure Prometheus to use it:

    nano values.yaml
    ---
    ...
    prometheus:
      enabled: true
      prometheusSpec:
        secrets: ['etcd-client-cert']
    ...
    kubeEtcd:
      enabled: true
      service:
        port: 2379
        targetPort: 2379
        selector:
          component: etcd
      serviceMonitor:
        interval: ""
        scheme: https
        insecureSkipVerify: true
        serverName: ""
        metricRelabelings: []
        relabelings: []
        caFile: /etc/prometheus/secrets/etcd-client-cert/server-ca.crt
        certFile: /etc/prometheus/secrets/etcd-client-cert/server-client.crt
        keyFile: /etc/prometheus/secrets/etcd-client-cert/server-client.key
    
    # for kubeadm
    #    caFile: /etc/prometheus/secrets/etcd-client-cert/ca.crt
    #    certFile: /etc/prometheus/secrets/etcd-client-cert/healthcheck-client.crt
    #    keyFile: /etc/prometheus/secrets/etcd-client-cert/healthcheck-client.key
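
    The changed values only take effect after upgrading the release:

    helm upgrade prometheus prometheus-community/kube-prometheus-stack -n monitoring -f values.yaml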
    
    

    Monitoring Nginx

    cat << EOF > /var/lib/rancher/rke2/server/manifests/rke2-ingress-nginx-config.yaml
    apiVersion: helm.cattle.io/v1
    kind: HelmChartConfig
    metadata:
      name: rke2-ingress-nginx
      namespace: kube-system
    spec:
      valuesContent: |-
        controller:
          metrics:
            enabled: true
            service:
              annotations:
                prometheus.io/scrape: "true"
                prometheus.io/port: "10254"
            serviceMonitor:
              enabled: true
              namespace: "monitoring"
    EOF
    
    kubectl apply -f /var/lib/rancher/rke2/server/manifests/rke2-ingress-nginx-config.yaml
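
    If the HelmChartConfig was picked up, a ServiceMonitor for the ingress controller should show up in the monitoring namespace (the exact name depends on the chart, so grep loosely):

    kubectl -n monitoring get servicemonitors | grep -i nginx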
    

    Monitoring CoreDNS

    cat << EOF > default-network-dns-policy.yaml
    apiVersion: networking.k8s.io/v1
    kind: NetworkPolicy
    metadata:
      name: default-network-dns-policy
      namespace: kube-system
    spec:
      ingress:
      - ports:
        - port: 53
          protocol: TCP
        - port: 53
          protocol: UDP
        - port: 9153
          protocol: TCP
      podSelector:
        matchLabels:
          k8s-app: kube-dns
      policyTypes:
      - Ingress
    EOF
    
    kubectl apply -f default-network-dns-policy.yaml
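
    With the policy in place, the CoreDNS metrics endpoint should be reachable. A quick check via port-forward, reusing the k8s-app=kube-dns label from the policy above:

    POD=$(kubectl -n kube-system get pod -l k8s-app=kube-dns -o name | head -n 1)
    kubectl -n kube-system port-forward "$POD" 9153:9153 &
    curl -s http://127.0.0.1:9153/metrics | head -n 5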
    

    Monitor cert-manager

    nano 01-cert-manager.yaml
    ---
    apiVersion: v1
    kind: Namespace
    metadata:
      name: ingress-system
    ---
    apiVersion: helm.cattle.io/v1
    kind: HelmChart
    metadata:
      name: cert-manager
      namespace: ingress-system
    spec:
      repo: "https://charts.jetstack.io"
      chart: cert-manager
      targetNamespace: ingress-system
      valuesContent: |-
        installCRDs: true
        clusterResourceNamespace: "ingress-system"
        prometheus:
          enabled: true
          servicemonitor:
            enabled: true
            namespace: "monitoring"    
    
    kubectl apply -f 01-cert-manager.yaml
    

    Longhorn

    apiVersion: monitoring.coreos.com/v1
    kind: ServiceMonitor
    metadata:
      name: longhorn-prometheus-servicemonitor
      namespace: monitoring
      labels:
        name: longhorn-prometheus-servicemonitor
    spec:
      selector:
        matchLabels:
          app: longhorn-manager
      namespaceSelector:
        matchNames:
        - longhorn-system
      endpoints:
      - port: manager
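
    Save the manifest to a file (the name below is arbitrary) and apply it:

    kubectl apply -f longhorn-servicemonitor.yaml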
    

    Logging

    helm repo add grafana https://grafana.github.io/helm-charts
     
    helm repo update
     
    helm search repo loki
     
    helm upgrade --install loki-stack grafana/loki-stack \
    --create-namespace \
    --namespace loki-stack \
    --set promtail.enabled=true,loki.persistence.enabled=true,loki.persistence.size=10Gi
    

    Promtail does not work when SELinux is enabled, because the Promtail deployment stores some files on the host filesystem and SELinux does not allow writing to them, so you need to use Fluent Bit instead:

    helm upgrade --install loki-stack grafana/loki-stack \
    --create-namespace \
    --namespace loki-stack \
    --set fluent-bit.enabled=true,promtail.enabled=false \
    --set loki.persistence.enabled=true,loki.persistence.size=10Gi
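
    Whichever variant you install, check that Loki and the log collector are running before wiring it into Grafana:

    kubectl -n loki-stack get pods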
    
    

    Add the Loki data source to Grafana:

    type: loki
    name: Loki
    url: http://loki-stack.loki-stack:3100
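
    Instead of adding it by hand in the UI, you can also provision the data source from the kube-prometheus-stack values.yaml. A sketch, assuming the service URL above and the chart's grafana.additionalDataSources key:

    grafana:
      additionalDataSources:
      - name: Loki               # shows up as a data source named Loki in Grafana
        type: loki
        access: proxy
        url: http://loki-stack.loki-stack:3100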