K8S Logging And Monitoring
In this tutorial I will show you how to install the Prometheus operator to monitor Kubernetes and Loki to gather the logs.
Parts of the K8S Security Lab series
Container Runtime Security
- Part1: How to deploy CRI-O with Firecracker?
- Part2: How to deploy CRI-O with gVisor?
- Part3: How to deploy containerd with Firecracker?
- Part4: How to deploy containerd with gVisor?
- Part5: How to deploy containerd with kata containers?
Advanced Kernel Security
- Part1: Hardening Kubernetes with seccomp
- Part2: Linux user namespace management with CRI-O in Kubernetes
- Part3: Hardening Kubernetes with seccomp
Network Security
- Part1: RKE2 Install With Calico
- Part2: RKE2 Install With Cilium
- Part3: CNI-Genie: network separation with multiple CNI
- Part3: Configure network with nmstate operator
- Part3: Kubernetes Network Policy
- Part4: Kubernetes with external Ingress Controller with vxlan
- Part4: Kubernetes with external Ingress Controller with bgp
- Part4: Central authentication with oauth2-proxy
- Part5: Secure your applications with Pomerium Ingress Controller
- Part6: CrowdSec Intrusion Detection System (IDS) for Kubernetes
- Part7: Kubernetes audit logs and Falco
Secure Kubernetes Install
- Part1: Best Practices to keeping Kubernetes Clusters Secure
- Part2: Kubernetes Secure Install
- Part3: Kubernetes Hardening Guide with CIS 1.6 Benchmark
- Part4: Kubernetes Certificate Rotation
User Security
- Part1: How to create kubeconfig?
- Part2: How to create Users in Kubernetes the right way?
- Part3: Kubernetes Single Sign-on with Pinniped OpenID Connect
- Part4: Kubectl authentication with Kuberos Deprecated !!
- Part5: Kubernetes authentication with Keycloak and gangway Deprecated !!
- Part6: kube-openid-connect 1.0 Deprecated !!
Image Security
Pod Security
- Part1: Using Admission Controllers
- Part2: RKE2 Pod Security Policy
- Part3: Kubernetes Pod Security Admission
- Part4: Kubernetes: How to migrate Pod Security Policy to Pod Security Admission?
- Part5: Pod Security Standards using Kyverno
- Part6: Kubernetes Cluster Policy with Kyverno
Secret Security
- Part1: Kubernetes and Vault integration
- Part2: Kubernetes External Vault integration
- Part3: ArgoCD and kubeseal to encrypt secrets
- Part4: Flux2 and kubeseal to encrypt secrets
- Part5: Flux2 and Mozilla SOPS to encrypt secrets
Monitoring and Observability
- Part6: K8S Logging And Monitoring
- Part7: Install Grafana Loki with Helm3
Backup
Prometheus is an open-source monitoring system with a built-in time-series database. It offers a multi-dimensional data model, a flexible query language, and diverse visualization possibilities. Prometheus collects metrics from HTTP endpoints. Most services don't expose such an endpoint out of the box, so you need optional programs, called exporters, that expose the metrics for them.
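To see what such an endpoint looks like, you can query an exporter's metrics path by hand. A minimal sketch, assuming node-exporter is already running on a node with its default port 9100 (the node IP is a placeholder):

# scrape the node-exporter metrics endpoint manually
curl -s http://<node-ip>:9100/metrics | head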
Monitoring
nano values.yaml
---
global:
  rbac:
    create: true
    pspEnabled: true
alertmanager:
  alertmanagerSpec:
    storage:
      volumeClaimTemplate:
        spec:
          resources:
            requests:
              storage: 10Gi
  ingress:
    enabled: true
    annotations:
      cert-manager.io/cluster-issuer: ca-issuer
    hosts:
      - alertmanager.k8s.intra
    paths:
      - /
    pathType: ImplementationSpecific
    tls:
      - secretName: tls-alertmanager-cert
        hosts:
          - alertmanager.k8s.intra
grafana:
  rbac:
    enable: true
    pspEnabled: true
    pspUseAppArmor: false
  initChownData:
    enabled: false
  enabled: true
  adminPassword: Password1
  plugins:
    - grafana-piechart-panel
  persistence:
    enabled: true
    size: 10Gi
  ingress:
    enabled: true
    annotations:
      cert-manager.io/cluster-issuer: ca-issuer
    hosts:
      - grafana.k8s.intra
    paths:
      - /
    pathType: ImplementationSpecific
    tls:
      - secretName: tls-grafana-cert
        hosts:
          - grafana.k8s.intra
prometheus:
  enabled: true
  prometheusSpec:
    podMonitorSelectorNilUsesHelmValues: false
    serviceMonitorSelectorNilUsesHelmValues: false
    secrets: ['etcd-client-cert']
    storageSpec:
      volumeClaimTemplate:
        spec:
          resources:
            requests:
              storage: 10Gi
  ingress:
    enabled: true
    annotations:
      cert-manager.io/cluster-issuer: ca-issuer
    hosts:
      - prometheus.k8s.intra
    paths:
      - /
    pathType: ImplementationSpecific
    tls:
      - secretName: tls-prometheus-cert
        hosts:
          - prometheus.k8s.intra
There is a bug in the Grafana Helm chart, so it doesn't create the PSP correctly for the init container: https://github.com/grafana/helm-charts/issues/427
# solution
kubectl edit psp prometheus-grafana
...
runAsUser:
  rule: RunAsAny
...
kubectl get rs -n monitoring
NAME                            DESIRED   CURRENT   READY   AGE
prometheus-grafana-74b5d957bc   1         0         0       12m
...
kubectl delete rs -n monitoring prometheus-grafana-74b5d957bc
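After deleting the stuck ReplicaSet it is worth checking that the new Grafana pod comes up; a quick check, assuming the default kube-prometheus-stack labels:

kubectl -n monitoring get pods -l app.kubernetes.io/name=grafana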
#### grafana dashboards
## RKE2
# 14243
## NGINX Ingress controller
# 9614
## cert-manager
# 11001
## longhorn
# 13032
### kyverno
# https://raw.githubusercontent.com/kyverno/grafana-dashboard/master/grafana/dashboard.json
### calico
# 12175
# 3244
### cilium
# 6658
# 14500
# 14502
# 14501
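These IDs can be imported from the Grafana UI (Dashboards -> Import). Alternatively, dashboards can be provisioned as ConfigMaps picked up by the Grafana sidecar of kube-prometheus-stack; a sketch assuming the sidecar is enabled with its default grafana_dashboard label and the kyverno dashboard JSON was downloaded to dashboard.json:

# provision the kyverno dashboard as a ConfigMap for the grafana sidecar
kubectl -n monitoring create configmap kyverno-dashboard --from-file=dashboard.json
kubectl -n monitoring label configmap kyverno-dashboard grafana_dashboard="1"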
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm install prometheus prometheus-community/kube-prometheus-stack -n monitoring --create-namespace -f values.yaml
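Once the pods are running you can verify that the scrape targets are healthy; a quick check, assuming the operator created the default prometheus-operated service:

kubectl -n monitoring port-forward svc/prometheus-operated 9090:9090
# then open http://localhost:9090/targets in a browser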
For the kube-proxy down status:
kubectl edit cm/kube-proxy -n kube-system
...
kind: KubeProxyConfiguration
metricsBindAddress: 0.0.0.0:10249
...
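The kube-proxy pods only pick up the new ConfigMap after a restart; on a kubeadm cluster, where kube-proxy runs as a DaemonSet, something like this should do it:

kubectl -n kube-system rollout restart daemonset kube-proxy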
If you use RKE2 you can configure this from the Helm chart before the first start:
cat << EOF > /var/lib/rancher/rke2/server/manifests/rke2-kube-proxy-config.yaml
apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
  name: rke2-kube-proxy
  namespace: kube-system
spec:
  valuesContent: |-
    metricsBindAddress: 0.0.0.0:10249
EOF
For the controller-manager down status:
nano /etc/kubernetes/manifests/kube-controller-manager.yaml
# OR
nano /var/lib/rancher/rke2/agent/pod-manifests/kube-controller-manager.yaml
apiVersion: v1
kind: Pod
metadata:
...
spec:
  containers:
  - command:
    - kube-controller-manager
    ...
    - --address=0.0.0.0
    ...
    - --bind-address=<your control-plane IP or 0.0.0.0>
    ...
    livenessProbe:
      failureThreshold: 8
      httpGet:
        host: 0.0.0.0
...
For the kube-scheduler down status:
nano /etc/kubernetes/manifests/kube-scheduler.yaml
# OR
nano /var/lib/rancher/rke2/agent/pod-manifests/kube-scheduler.yaml
apiVersion: v1
kind: Pod
metadata:
...
spec:
  containers:
  - command:
    - kube-scheduler
    - --address=0.0.0.0
    - --bind-address=0.0.0.0
    ...
    livenessProbe:
      failureThreshold: 8
      httpGet:
        host: 0.0.0.0
...
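The kubelet recreates these static pods automatically when the manifests change; a quick check that they came back up:

kubectl -n kube-system get pods | grep -E 'kube-(controller-manager|scheduler)'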
For the etcd down status, first we need to create a secret to authenticate to etcd:
# kubeadm
kubectl -n monitoring create secret generic etcd-client-cert \
--from-file=/etc/kubernetes/pki/etcd/ca.crt \
--from-file=/etc/kubernetes/pki/etcd/healthcheck-client.crt \
--from-file=/etc/kubernetes/pki/etcd/healthcheck-client.key
# rancher
kubectl -n monitoring create secret generic etcd-client-cert \
--from-file=/var/lib/rancher/rke2/server/tls/etcd/server-ca.crt \
--from-file=/var/lib/rancher/rke2/server/tls/etcd/server-client.crt \
--from-file=/var/lib/rancher/rke2/server/tls/etcd/server-client.key
Then we configure Prometheus to use it:
nano values.yaml
---
...
prometheus:
  enabled: true
  prometheusSpec:
    secrets: ['etcd-client-cert']
...
kubeEtcd:
  enabled: true
  service:
    port: 2379
    targetPort: 2379
    selector:
      component: etcd
  serviceMonitor:
    interval: ""
    scheme: https
    insecureSkipVerify: true
    serverName: ""
    metricRelabelings: []
    relabelings: []
    caFile: /etc/prometheus/secrets/etcd-client-cert/server-ca.crt
    certFile: /etc/prometheus/secrets/etcd-client-cert/server-client.crt
    keyFile: /etc/prometheus/secrets/etcd-client-cert/server-client.key
    # for kubeadm
    # caFile: /etc/prometheus/secrets/etcd-client-cert/ca.crt
    # certFile: /etc/prometheus/secrets/etcd-client-cert/healthcheck-client.crt
    # keyFile: /etc/prometheus/secrets/etcd-client-cert/healthcheck-client.key
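Then apply the changed values to the running release:

helm upgrade prometheus prometheus-community/kube-prometheus-stack -n monitoring -f values.yaml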
Monitoring Nginx
cat << EOF > /var/lib/rancher/rke2/server/manifests/rke2-ingress-nginx-config.yaml
apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
  name: rke2-ingress-nginx
  namespace: kube-system
spec:
  valuesContent: |-
    controller:
      metrics:
        enabled: true
        service:
          annotations:
            prometheus.io/scrape: "true"
            prometheus.io/port: "10254"
        serviceMonitor:
          enabled: true
          namespace: "monitoring"
EOF
kubectl apply -f /var/lib/rancher/rke2/server/manifests/rke2-ingress-nginx-config.yaml
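Afterwards the ingress-nginx ServiceMonitor should show up in the monitoring namespace (the exact name depends on the chart); a quick check:

kubectl -n monitoring get servicemonitor | grep ingress-nginx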
Monitoring CoreDNS
cat << EOF > default-network-dns-policy.yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-network-dns-policy
  namespace: kube-system
spec:
  ingress:
  - ports:
    - port: 53
      protocol: TCP
    - port: 53
      protocol: UDP
    - port: 9153
      protocol: TCP
  podSelector:
    matchLabels:
      k8s-app: kube-dns
  policyTypes:
  - Ingress
EOF
kubectl apply -f default-network-dns-policy.yaml
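kube-prometheus-stack already scrapes CoreDNS on port 9153; the NetworkPolicy above just keeps that port reachable. To check the endpoint by hand, a throwaway curl pod works (the image and pod IP are placeholders):

kubectl -n kube-system run curl-test --rm -it --restart=Never --image=curlimages/curl \
  -- curl -s http://<coredns-pod-ip>:9153/metrics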
Monitor cert-manager
nano 01-cert-manager.yaml
---
apiVersion: v1
kind: Namespace
metadata:
  name: ingress-system
---
apiVersion: helm.cattle.io/v1
kind: HelmChart
metadata:
  name: cert-manager
  namespace: ingress-system
spec:
  repo: "https://charts.jetstack.io"
  chart: cert-manager
  targetNamespace: ingress-system
  valuesContent: |-
    installCRDs: true
    clusterResourceNamespace: "ingress-system"
    prometheus:
      enabled: true
      servicemonitor:
        enabled: true
        namespace: "monitoring"
kubectl apply -f 01-cert-manager.yaml
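As before, you can verify that the cert-manager ServiceMonitor landed in the monitoring namespace:

kubectl -n monitoring get servicemonitor | grep cert-manager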
Longhorn
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: longhorn-prometheus-servicemonitor
  namespace: monitoring
  labels:
    name: longhorn-prometheus-servicemonitor
spec:
  selector:
    matchLabels:
      app: longhorn-manager
  namespaceSelector:
    matchNames:
    - longhorn-system
  endpoints:
  - port: manager
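Save the manifest (e.g. as longhorn-servicemonitor.yaml, the file name is up to you) and apply it:

kubectl apply -f longhorn-servicemonitor.yaml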
Logging
helm repo add grafana https://grafana.github.io/helm-charts
helm repo update
helm search repo loki
helm upgrade --install loki-stack grafana/loki-stack \
--create-namespace \
--namespace loki-stack \
--set promtail.enabled=true,loki.persistence.enabled=true,loki.persistence.size=10Gi
Promtail does not work with SELinux enabled, because the Promtail deployment stores some files on the host filesystem and SELinux does not allow writing there, so you need to use fluent-bit instead.
helm upgrade --install loki-stack grafana/loki-stack \
--create-namespace \
--namespace loki-stack \
--set fluent-bit.enabled=true,promtail.enabled=false \
--set loki.persistence.enabled=true,loki.persistence.size=10Gi
Add the Loki data source to Grafana:
type: loki
name: Loki
url: http://loki-stack.loki-stack:3100
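Instead of adding it by hand in the Grafana UI, the same data source can be provisioned through the kube-prometheus-stack values; a sketch using the chart's additionalDataSources field, with the URL pointing at the loki-stack service in the loki-stack namespace as above:

nano values.yaml
grafana:
  additionalDataSources:
    - name: Loki
      type: loki
      url: http://loki-stack.loki-stack:3100
      access: proxy

helm upgrade prometheus prometheus-community/kube-prometheus-stack -n monitoring -f values.yaml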