K8S Logging And Monitoring
In this tutorial I will show you how to install the Prometheus operator to monitor Kubernetes and Loki to gather logs.
Parts of the K8S Security series
- Part1: Best Practices to keeping Kubernetes Clusters Secure
- Part2: Kubernetes Hardening Guide with CIS 1.6 Benchmark
- Part3: RKE2 The Secure Kubernetes Engine
- Part4: RKE2 Install With cilium
- Part5: Kubernetes Certificate Rotation
- Part6: Hardening Kubernetes with seccomp
- Part7a: RKE2 Pod Security Policy
- Part7b: Kubernetes Pod Security Admission
- Part7c: Pod Security Standards using Kyverno
- Part8: Kubernetes Network Policy
- Part9: Kubernetes Cluster Policy with Kyverno
- Part10: Using Admission Controllers
- Part11a: Image security Admission Controller
- Part11b: Image security Admission Controller V2
- Part11c: Image security Admission Controller V3
- Part12: Continuous Image security
- Part13: K8S Logging And Monitoring
- Part14: Kubernetes audit logs and Falco
- Part15a: Image Signature Verification with Connaisseur
- Part15b: Image Signature Verification with Connaisseur 2.0
- Part15c: Image Signature Verification with Kyverno
- Part16a: Backup your Kubernetes Cluster
- Part16b: How to Backup Kubernetes to git?
- Part17a: Kubernetes and Vault integration
- Part17b: Kubernetes External Vault integration
- Part18a: ArgoCD and kubeseal to encrypt secrets
- Part18b: Flux2 and kubeseal to encrypt secrets
- Part18c: Flux2 and Mozilla SOPS to encrypt secrets
- Part19: ArgoCD auto image updater
- Part20: Secure k3s with gVisor
- Part21: How to use imagePullSecrets cluster-wide?
- Part22: Automatically change registry in pod definition
Prometheus is an open-source monitoring system with a built-in NoSQL time-series database. It offers a multi-dimensional data model, a flexible query language, and diverse visualization possibilities. Prometheus collects metrics from HTTP endpoints. Most services do not expose such an endpoint out of the box, so you need optional programs, called exporters, that generate and expose the metrics for them.
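For example, an exporter is just a process serving plain-text metrics over HTTP. Assuming node-exporter (which the kube-prometheus-stack installed below deploys) is listening on its default port 9100, the exposition format looks like this (the values are illustrative):
curl -s http://localhost:9100/metrics | grep ^node_cpu_seconds_total | head -n 2
# node_cpu_seconds_total{cpu="0",mode="idle"} 123456.78
# node_cpu_seconds_total{cpu="0",mode="user"} 1234.56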
Monitoring
nano values.yaml
---
global:
  rbac:
    create: true
    pspEnabled: true
alertmanager:
  alertmanagerSpec:
    storage:
      volumeClaimTemplate:
        spec:
          resources:
            requests:
              storage: 10Gi
  ingress:
    enabled: true
    annotations:
      cert-manager.io/cluster-issuer: ca-issuer
    hosts:
      - alertmanager.k8s.intra
    paths:
      - /
    pathType: ImplementationSpecific
    tls:
      - secretName: tls-alertmanager-cert
        hosts:
          - alertmanager.k8s.intra
grafana:
  rbac:
    enable: true
    pspEnabled: true
    pspUseAppArmor: false
  initChownData:
    enabled: false
  enabled: true
  adminPassword: Password1
  plugins:
    - grafana-piechart-panel
  persistence:
    enabled: true
    size: 10Gi
  ingress:
    enabled: true
    annotations:
      cert-manager.io/cluster-issuer: ca-issuer
    hosts:
      - grafana.k8s.intra
    paths:
      - /
    pathType: ImplementationSpecific
    tls:
      - secretName: tls-grafana-cert
        hosts:
          - grafana.k8s.intra
prometheus:
  enabled: true
  prometheusSpec:
    podMonitorSelectorNilUsesHelmValues: false
    serviceMonitorSelectorNilUsesHelmValues: false
    secrets: ['etcd-client-cert']
    storageSpec:
      volumeClaimTemplate:
        spec:
          resources:
            requests:
              storage: 10Gi
  ingress:
    enabled: true
    annotations:
      cert-manager.io/cluster-issuer: ca-issuer
    hosts:
      - prometheus.k8s.intra
    paths:
      - /
    pathType: ImplementationSpecific
    tls:
      - secretName: tls-prometheus-cert
        hosts:
          - prometheus.k8s.intra
There is a bug in the Grafana Helm chart, so it does not create the PSP correctly for the init container: https://github.com/grafana/helm-charts/issues/427
# solution
kubectl edit psp prometheus-grafana
...
runAsUser:
  rule: RunAsAny
...
kubectl get rs -n monitoring
NAME DESIRED CURRENT READY AGE
prometheus-grafana-74b5d957bc 1 0 0 12m
...
kubectl delete rs prometheus-grafana-74b5d957bc -n monitoring
#### grafana dashboards
## RKE2
# 14243
## NGINX Ingress controller
# 9614
## cert-manager
# 11001
## longhorn
# 13032
### kyverno
# https://raw.githubusercontent.com/kyverno/grafana-dashboard/master/grafana/dashboard.json
### calico
# 12175
# 3244
### cilium
# 6658
# 14500
# 14502
# 14501
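Instead of importing these dashboard IDs by hand in the Grafana UI, they can also be pre-provisioned through the chart values. A minimal sketch, assuming the bundled Grafana chart's dashboard provisioning via gnetId (only the RKE2 dashboard is shown; the revision may need adjusting):
nano values.yaml
---
...
grafana:
  dashboardProviders:
    dashboardproviders.yaml:
      apiVersion: 1
      providers:
        - name: default
          orgId: 1
          folder: ''
          type: file
          disableDeletion: false
          editable: true
          options:
            path: /var/lib/grafana/dashboards/default
  dashboards:
    default:
      rke2-cluster:
        gnetId: 14243
        revision: 1
        datasource: Prometheus
...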
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm install prometheus prometheus-community/kube-prometheus-stack -n monitoring --create-namespace -f values.yaml
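To verify the stack came up (the resource names assume the release name prometheus used above):
kubectl -n monitoring get pods
kubectl -n monitoring get prometheus,alertmanager,servicemonitors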
For the kube-proxy down status:
kubectl edit cm/kube-proxy -n kube-system
...
kind: KubeProxyConfiguration
metricsBindAddress: 0.0.0.0:10249
...
If you use RKE2, you can configure this through the Helm chart before the first start:
cat << EOF > /var/lib/rancher/rke2/server/manifests/rke2-kube-proxy-config.yaml
apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
  name: rke2-kube-proxy
  namespace: kube-system
spec:
  valuesContent: |-
    metricsBindAddress: 0.0.0.0:10249
EOF
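After kube-proxy restarts with the new configuration (on kubeadm you can restart it with kubectl -n kube-system rollout restart daemonset kube-proxy), you can check the metrics endpoint from a node; 10249 is the port configured above:
curl -s http://127.0.0.1:10249/metrics | head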
For the controller-manager down status:
nano /etc/kubernetes/manifests/kube-controller-manager.yaml
# OR
nano /var/lib/rancher/rke2/agent/pod-manifests/kube-controller-manager.yaml
apiVersion: v1
kind: Pod
metadata:
...
spec:
  containers:
  - command:
    - kube-controller-manager
    ...
    - --address=0.0.0.0
    ...
    - --bind-address=<your control-plane IP or 0.0.0.0>
    ...
    livenessProbe:
      failureThreshold: 8
      httpGet:
        host: 0.0.0.0
...
For the kube-scheduler down status:
nano /etc/kubernetes/manifests/kube-scheduler.yaml
# OR
nano /var/lib/rancher/rke2/agent/pod-manifests/kube-scheduler.yaml
apiVersion: v1
kind: Pod
metadata:
...
spec:
  containers:
  - command:
    - kube-scheduler
    - --address=0.0.0.0
    - --bind-address=0.0.0.0
    ...
    livenessProbe:
      failureThreshold: 8
      httpGet:
        host: 0.0.0.0
...
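Once the kubelet restarts these static pods, a quick way to confirm the new bind addresses is to check the listening ports on the control-plane node (10257 is the controller-manager's secure port, 10259 the scheduler's):
ss -tlnp | grep -E '10257|10259'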
For the etcd down status, first we need to create a secret so Prometheus can authenticate to etcd:
# kubeadm
kubectl -n monitoring create secret generic etcd-client-cert \
--from-file=/etc/kubernetes/pki/etcd/ca.crt \
--from-file=/etc/kubernetes/pki/etcd/healthcheck-client.crt \
--from-file=/etc/kubernetes/pki/etcd/healthcheck-client.key
# rancher
kubectl -n monitoring create secret generic etcd-client-cert \
--from-file=/var/lib/rancher/rke2/server/tls/etcd/server-ca.crt \
--from-file=/var/lib/rancher/rke2/server/tls/etcd/server-client.crt \
--from-file=/var/lib/rancher/rke2/server/tls/etcd/server-client.key
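A quick check that the secret exists and contains the three expected keys:
kubectl -n monitoring describe secret etcd-client-cert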
Then we configure Prometheus to use it:
nano values.yaml
---
...
prometheus:
  enabled: true
  prometheusSpec:
    secrets: ['etcd-client-cert']
...
kubeEtcd:
  enabled: true
  service:
    port: 2379
    targetPort: 2379
    selector:
      component: etcd
  serviceMonitor:
    interval: ""
    scheme: https
    insecureSkipVerify: true
    serverName: ""
    metricRelabelings: []
    relabelings: []
    caFile: /etc/prometheus/secrets/etcd-client-cert/server-ca.crt
    certFile: /etc/prometheus/secrets/etcd-client-cert/server-client.crt
    keyFile: /etc/prometheus/secrets/etcd-client-cert/server-client.key
    # for kubeadm
    # caFile: /etc/prometheus/secrets/etcd-client-cert/ca.crt
    # certFile: /etc/prometheus/secrets/etcd-client-cert/healthcheck-client.crt
    # keyFile: /etc/prometheus/secrets/etcd-client-cert/healthcheck-client.key
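Since the release is already installed, re-apply the updated values with a Helm upgrade (same release name and namespace as before):
helm upgrade prometheus prometheus-community/kube-prometheus-stack -n monitoring -f values.yaml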
Monitoring Nginx
cat << EOF > /var/lib/rancher/rke2/server/manifests/rke2-ingress-nginx-config.yaml
apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
  name: rke2-ingress-nginx
  namespace: kube-system
spec:
  valuesContent: |-
    controller:
      metrics:
        enabled: true
        service:
          annotations:
            prometheus.io/scrape: "true"
            prometheus.io/port: "10254"
        serviceMonitor:
          enabled: true
          namespace: "monitoring"
EOF
kubectl apply -f /var/lib/rancher/rke2/server/manifests/rke2-ingress-nginx-config.yaml
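Once RKE2 reconciles the chart, the ingress-nginx ServiceMonitor should show up in the monitoring namespace (the exact name depends on the chart version):
kubectl -n monitoring get servicemonitor | grep ingress-nginx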
Monitoring CoreDNS
cat << EOF > default-network-dns-policy.yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-network-dns-policy
  namespace: kube-system
spec:
  ingress:
    - ports:
        - port: 53
          protocol: TCP
        - port: 53
          protocol: UDP
        - port: 9153
          protocol: TCP
  podSelector:
    matchLabels:
      k8s-app: kube-dns
  policyTypes:
    - Ingress
EOF
kubectl apply -f default-network-dns-policy.yaml
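To confirm the policy is in place and that the CoreDNS pods expose the 9153 metrics port scraped by the stack:
kubectl -n kube-system describe networkpolicy default-network-dns-policy
kubectl -n kube-system get pods -l k8s-app=kube-dns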
Monitoring cert-manager
nano 01-cert-manager.yaml
---
apiVersion: v1
kind: Namespace
metadata:
  name: ingress-system
---
apiVersion: helm.cattle.io/v1
kind: HelmChart
metadata:
  name: cert-manager
  namespace: ingress-system
spec:
  repo: "https://charts.jetstack.io"
  chart: cert-manager
  targetNamespace: ingress-system
  valuesContent: |-
    installCRDs: true
    clusterResourceNamespace: "ingress-system"
    prometheus:
      enabled: true
      servicemonitor:
        enabled: true
        namespace: "monitoring"
kubectl apply -f 01-cert-manager.yaml
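After the HelmChart is reconciled, the cert-manager ServiceMonitor should appear in the monitoring namespace:
kubectl -n monitoring get servicemonitor | grep cert-manager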
Monitoring Longhorn
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: longhorn-prometheus-servicemonitor
  namespace: monitoring
  labels:
    name: longhorn-prometheus-servicemonitor
spec:
  selector:
    matchLabels:
      app: longhorn-manager
  namespaceSelector:
    matchNames:
      - longhorn-system
  endpoints:
    - port: manager
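Save the manifest to a file and apply it (the filename here is just an example):
kubectl apply -f longhorn-servicemonitor.yaml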
Logging
helm repo add grafana https://grafana.github.io/helm-charts
helm repo update
helm search repo loki
helm upgrade --install loki-stack grafana/loki-stack \
--create-namespace \
--namespace loki-stack \
--set promtail.enabled=true,loki.persistence.enabled=true,loki.persistence.size=10Gi
Promtail does not work with SELinux enabled, because the Promtail deployment stores some files on the host filesystem and SELinux does not allow writing to them, so you need to use Fluent Bit instead:
helm upgrade --install loki-stack grafana/loki-stack \
--create-namespace \
--namespace loki-stack \
--set fluent-bit.enabled=true,promtail.enabled=false \
--set loki.persistence.enabled=true,loki.persistence.size=10Gi
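Either way, check that Loki and the log shipper pods are running:
kubectl -n loki-stack get pods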
Add the Loki data source to Grafana:
type: loki
name: Loki
url: http://loki-stack.loki-stack:3100
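Alternatively, the Loki data source can be provisioned declaratively instead of through the Grafana UI. A sketch, assuming the kube-prometheus-stack chart's grafana.additionalDataSources value and the prometheus release installed earlier:
nano values.yaml
---
...
grafana:
  additionalDataSources:
    - name: Loki
      type: loki
      access: proxy
      url: http://loki-stack.loki-stack:3100
...
helm upgrade prometheus prometheus-community/kube-prometheus-stack -n monitoring -f values.yaml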