All COSSAS projects are hosted on GitLab with a push mirror to GitHub. For issues and contributions, see CONTRIBUTING.md.
RELEASED: March 6th, 2025
LANGUAGE: Python
LICENSE: Apache 2.0
Documentation of the source code can be found through docs.
In the continuous battle between cyber attackers and defenders, the ultimate objective is to make software and systems autonomously cyber resilient. One way to implement autonomous resilience on the software deployment level is provided by TNO’s Self-Healing for Cyber Security (SH4CS) software, inspired by biological defence mechanisms.
SH4CS draws inspiration from three fundamental properties of the human immune system to make systems autonomously resilient:
- Disposability: cell duplication and programmed or targeted cell death results in continuous cell regeneration, eliminating undetected abnormalities and reducing the likelihood of successful infections. Disposability of body cells is a prerequisite for the effectiveness of the immune system.
- Distribution: the more local the defence mechanism, the faster (but also less targeted) the response. The innate immune system acts much faster than the adaptive immune system, which in turn is faster than immunization.
- Response proportionality: the innate immune system is always the first line of defence. The more energy-consuming adaptive immune system is only activated to support the innate one when and where necessary.
The current SH4CS software primarily consists of Python code that implements (a) a decentralized rule system, also referred to as ‘Lymphocyte software’, that executes healing functionality for an individual application container (by running as a sidecar in the same Kubernetes Pod), and (b) a metrics processor that enables the specification of monitorable metrics (using the Prometheus open source software) that will alert the Lymphocyte software. The code was developed for deployment on modern container platforms powered by Kubernetes and Prometheus.
An architecture diagram can be found here, alongside a more elaborate diagram of what the vision of Self-Healing for Cyber Security looks like. The draw.io files are included.
First make sure that you have minikube installed, then start a local minikube cluster.
minikube start
Next build the development images
docker compose -f compose.build.yaml build
Then load the development images into the cluster so we don't have to deal with pulling images from a (private) repository
minikube image load ci.tno.nl:4567/tri/self-healing/sh4cs2-testbed/lymphocyte:development
minikube image load ci.tno.nl:4567/tri/self-healing/sh4cs2-testbed/testapp:development
minikube image load ci.tno.nl:4567/tri/self-healing/sh4cs2-testbed/scenario-tester:development
Next, apply all manifests at once
kubectl apply -k manifests/
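The setup steps above can also be scripted. The following is a minimal sketch and not part of the repository: the helper names `setup_commands` and `run_setup` are hypothetical, and it assumes that `minikube`, `docker`, and `kubectl` are on the PATH and that the script is run from the repository root.

```python
import subprocess

# Registry path and image names are copied from the commands above.
REGISTRY = "ci.tno.nl:4567/tri/self-healing/sh4cs2-testbed"
IMAGES = ["lymphocyte", "testapp", "scenario-tester"]


def setup_commands():
    """Return the setup commands, in order, as argument lists (hypothetical helper)."""
    commands = [
        ["minikube", "start"],
        ["docker", "compose", "-f", "compose.build.yaml", "build"],
    ]
    for image in IMAGES:
        commands.append(
            ["minikube", "image", "load", f"{REGISTRY}/{image}:development"]
        )
    commands.append(["kubectl", "apply", "-k", "manifests/"])
    return commands


def run_setup():
    """Run every step in order and stop at the first failing command."""
    for command in setup_commands():
        print("$", " ".join(command))
        subprocess.run(command, check=True)
```

Calling `run_setup()` executes the same six commands as the manual steps; `check=True` makes a failed step raise instead of silently continuing.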
The regeneration demo shows the pod becoming unready 450 seconds after the lympho has started, and the test application being restarted 600 seconds after the lympho has started (the observed timings may differ slightly).
This demo is already running after applying the manifests.
Simply watch the regeneration deployment
kubectl get pod --watch -l app=regeneration-demo
Or the events related to the pod
kubectl events --for "pod/$(kubectl get pods -l app=regeneration-demo --output jsonpath='{.items[0].metadata.name}')" --watch
Or using the included monitor
kubectl exec deploy/scenario-tester -it -- /opt/app/monitor.py
Then look at the regeneration-demo deployment.
Its readiness probe should turn red after 75% of the TTL has passed, and the test application should restart after 100% of the TTL has passed.
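The timing of the regeneration demo can be illustrated with a small sketch. It is not the Lymphocyte implementation: the names `Phase` and `regeneration_phase` are hypothetical, and the TTL of 600 seconds is inferred from the 450 s (75%) and 600 s (100%) figures mentioned for this demo.

```python
from enum import Enum


class Phase(Enum):
    READY = "ready"
    UNREADY = "unready"
    RESTART = "restart"


def regeneration_phase(elapsed_s, ttl_s=600.0, unready_fraction=0.75):
    """Map the time since the lympho started onto a phase of the demo.

    ttl_s and unready_fraction are assumptions inferred from the demo text.
    """
    if elapsed_s >= ttl_s:
        return Phase.RESTART
    if elapsed_s >= unready_fraction * ttl_s:
        return Phase.UNREADY
    return Phase.READY


for t in (0, 449, 450, 599, 600):
    print(t, regeneration_phase(t).value)
```

With the default values, the pod counts as ready until 450 seconds, unready from 450 up to 600 seconds, and due for a restart from 600 seconds onward.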
This demo demonstrates increments in threat levels. In this scenario there are two ways to fire Prometheus alerts:
- Perform failed logins
- Generate download file errors (status code 404)
Each of these alerts increments the threat-level by one. When the threat-level is 2 and either of the alerts fires again, the application is restarted.
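The escalation rule described for this demo can be sketched as a small state machine. This is an illustration only, not the actual Lymphocyte rule system: the class `ThreatLevel`, the alert names, and the reset of the threat-level to zero after a restart are all assumptions.

```python
class ThreatLevel:
    """Counts fired alerts and signals a restart at the configured level."""

    def __init__(self, restart_at=2):
        self.restart_at = restart_at
        self.level = 0
        self.seen = []  # names of the alerts handled so far

    def on_alert(self, alert_name):
        """Handle one fired alert; return True if the application should restart."""
        self.seen.append(alert_name)
        if self.level >= self.restart_at:
            # Assumption: the threat-level starts over after a restart.
            self.level = 0
            return True
        self.level += 1
        return False


threat = ThreatLevel()
# Hypothetical alert names standing in for the two alert types of the demo.
for alert in ("failed_login", "download_404", "failed_login"):
    restart = threat.on_alert(alert)
    print(alert, "-> level", threat.level, "restart" if restart else "")
```

The first two alerts raise the threat-level to 1 and 2; the third alert arrives while the level is 2 and therefore triggers the restart.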
To get this demo running, start the included monitor:
kubectl exec deploy/scenario-tester -it -- /opt/app/monitor.py
Next, in a different shell execute the script
kubectl exec deploy/scenario-tester -it -- /opt/app/test-scenario-bruteforce.py
This demo demonstrates threat levels being passed to other applications.
When the threat-level of testapp is zero, testapp2 is not rate-limited; once the threat-level of testapp becomes nonzero, testapp2 becomes rate-limited.
As in the previous demos, one way to raise the threat-level of testapp is to perform failed logins.
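The relationship between the two applications can be sketched as follows. How SH4CS actually transports threat levels between applications is not described here, so plain in-process callbacks stand in for that mechanism; the class names `ThreatLevelPublisher` and `RateLimitedApp` are hypothetical.

```python
class RateLimitedApp:
    """Stands in for testapp2: rate-limited while a peer's threat-level is nonzero."""

    def __init__(self, name):
        self.name = name
        self.rate_limited = False

    def on_peer_threat_level(self, peer_name, level):
        self.rate_limited = level > 0


class ThreatLevelPublisher:
    """Stands in for testapp: notifies subscribers when its threat-level changes."""

    def __init__(self, name):
        self.name = name
        self.level = 0
        self._subscribers = []

    def subscribe(self, callback):
        self._subscribers.append(callback)

    def set_level(self, level):
        self.level = level
        for callback in self._subscribers:
            callback(self.name, level)


testapp = ThreatLevelPublisher("testapp")
testapp2 = RateLimitedApp("testapp2")
testapp.subscribe(testapp2.on_peer_threat_level)

testapp.set_level(1)  # e.g. after a failed-login alert
print(testapp2.rate_limited)
testapp.set_level(0)
print(testapp2.rate_limited)
```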
To get this demo running, start the included monitor:
kubectl exec deploy/scenario-tester -it -- /opt/app/monitor.py
Next, in a different shell execute the script
kubectl exec deploy/scenario-tester -it -- /opt/app/test-scenario-rate-limiter.py
This demo demonstrates the ability to perform readiness and liveness probes from the lympho container to the application container. The application container's readiness and liveness probe endpoints can be set at will using an API. Meanwhile, the liveness probe of the application container points to the lympho container, which gives the latter the ability to restart the application container via probes.
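The probe indirection can be illustrated with a short sketch. It is not the Lymphocyte code: the class `ProbeState` and its methods are hypothetical. It relies only on the documented Kubernetes behaviour that an HTTP probe succeeds for status codes from 200 up to (but not including) 400, and that repeated liveness failures cause the container to be restarted.

```python
from http import HTTPStatus


class ProbeState:
    """What the lympho would answer when Kubernetes probes on behalf of the app."""

    def __init__(self):
        self.app_ready = True   # result of the lympho's own readiness check
        self.app_alive = True   # result of the lympho's own liveness check
        self.restart_requested = False

    def request_restart(self):
        """Make subsequent probes fail so that Kubernetes restarts the app."""
        self.restart_requested = True

    def readiness_status(self):
        ok = self.app_ready and not self.restart_requested
        return HTTPStatus.OK if ok else HTTPStatus.SERVICE_UNAVAILABLE

    def liveness_status(self):
        ok = self.app_alive and not self.restart_requested
        return HTTPStatus.OK if ok else HTTPStatus.SERVICE_UNAVAILABLE


state = ProbeState()
print(int(state.liveness_status()))
state.request_restart()
print(int(state.liveness_status()))
```

Because the answer to the liveness probe comes from the lympho rather than from the application itself, the lympho can force a restart simply by answering with a failing status code.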
To get this demo running, start the included monitor:
kubectl exec deploy/scenario-tester -it -- /opt/app/monitor.py
Next, in a different shell execute the script
kubectl exec deploy/scenario-tester -it -- /opt/app/test-scenario-healthcheck.py
V1.0 of this software was originally developed within the Partnership for Cyber Security Innovation (PCSI), a Dutch innovation ecosystem that features leading companies across several industries. V2.0 builds on feedback from PCSI partners and additional insights, and was fully developed by TNO.