Workshop: Chaos Engineering
Chaos Engineering is defined as the discipline of experimenting on a system in order to build confidence in the system’s capability to withstand turbulent conditions in production.
This document captures some hands-on exercises I used during a chaos engineering workshop.
Application level experiments
Using a combination of OpenShift, Istio, Kiali, ArgoCD and Grafana, we can run application level chaos engineering experiments based on service mesh fault injection.
A guide for this portion of the workshop is available here.
Cluster level experiments
After completing the individual hands-on exercises above, the workshop group comes back together to discuss cluster level experiments and follows the outline below to run some basic experiments.
Ensure we are logged into our experiment cluster
oc login --token <token> --server <server>
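Before injecting any failures it is worth confirming that the login landed on the intended cluster. A minimal sketch of such a guard is below; the function name `require_server` and the example server URL are hypothetical, while `oc whoami` and `oc whoami --show-server` are standard `oc` subcommands.

```shell
# Hypothetical guard: refuse to continue unless the active API server
# matches the one we expect to run experiments against.
require_server() {
  expected="$1"
  # Ask oc which API server the current context points at.
  actual="$(oc whoami --show-server)" || return 1
  if [ "$actual" != "$expected" ]; then
    echo "refusing to continue: logged into $actual, expected $expected" >&2
    return 1
  fi
  echo "logged into $actual as $(oc whoami)"
}
```

For example, `require_server https://api.example.com:6443 || exit 1` at the top of a workshop script stops it early if the wrong cluster is active.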
Start a cerberus cluster monitoring instance
podman run --net=host --name=cerberus --env-host=true --privileged -d -v /home/james/.kube/config:/root/.kube/config:Z quay.io/openshift-scale/cerberus:kraken-hub
Test that cerberus is serving and the cluster is ready
curl -v localhost:8080
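The check above can also be scripted so that later steps wait for a healthy cluster. The sketch below assumes that cerberus answers with the literal text True as its go signal and False otherwise, which is how its documentation describes the endpoint. The helper names `is_go_signal` and `wait_for_cerberus`, and the default retry values, are hypothetical.

```shell
# Hypothetical helper: succeed only when the response body is "True".
is_go_signal() {
  # Strip whitespace (including a trailing newline) before comparing.
  [ "$(printf '%s' "$1" | tr -d '[:space:]')" = "True" ]
}

# Hypothetical helper: poll the cerberus endpoint until it reports go,
# or give up after a fixed number of attempts.
wait_for_cerberus() {
  url="${1:-http://localhost:8080}"
  attempts="${2:-30}"
  delay="${3:-10}"
  i=1
  while [ "$i" -le "$attempts" ]; do
    if is_go_signal "$(curl -s --max-time 5 "$url")"; then
      echo "cerberus reports go"
      return 0
    fi
    echo "attempt $i/$attempts: no go signal yet, retrying in ${delay}s" >&2
    sleep "$delay"
    i=$((i + 1))
  done
  return 1
}
```

Calling `wait_for_cerberus http://localhost:8080 30 10` blocks for up to roughly five minutes and returns non-zero if the go signal never appears.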
Start kraken with cerberus enabled and inject pod failures
export CERBERUS_ENABLED=true
export CERBERUS_URL=http://0.0.0.0:8080
export NAMESPACE=openshift-etcd
export POD_LABEL=app=etcd
export DISRUPTION_COUNT=1
export EXPECTED_POD_COUNT=3
podman run --privileged --name=kraken --net=host --env-host=true -v /home/james/.kube/config:/root/.kube/config:Z -d quay.io/openshift-scale/kraken:pod-scenarios
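Once kraken is running, the group can watch the experiment and confirm that the disrupted pod comes back. The sketch below reuses the `NAMESPACE`, `POD_LABEL` and `EXPECTED_POD_COUNT` variables exported above; the function name `pods_recovered` is hypothetical, and counting pods in the `Running` phase is only one possible definition of recovery.

```shell
# Hypothetical helper: succeed when the number of Running pods that match
# the label equals the expected count.
pods_recovered() {
  # One output line per Running pod; wc -l counts them.
  running="$(oc get pods -n "$NAMESPACE" -l "$POD_LABEL" \
    --field-selector=status.phase=Running --no-headers 2>/dev/null \
    | wc -l | tr -d '[:space:]')"
  [ "$running" -eq "$EXPECTED_POD_COUNT" ]
}
```

Running `podman logs -f kraken` in one terminal shows the scenario output, while `until pods_recovered; do sleep 5; done` in another returns once all expected pods are Running again.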