In the 19th-century fairy tale “Snow White”, the Brothers Grimm unwittingly issued a warning against a lack of Kubernetes runtime security.
The Seven Dwarfs return home and notice that someone has breached the security of their house and sabotaged their rooms (they were cleaned). As it turned out, the burglar was Snow White, who had accidentally entered their cottage while wandering through the forest. Fortunately for the Seven Dwarfs, no private items were stolen from their rooms, and their cottage was not secretly used as a Bitcoin miner.
The story translates surprisingly well into today’s security issues with Kubernetes. Let’s imagine that the Seven Dwarfs are SRE team members, their cottage is a Kubernetes cluster, and Snow White is an evil attacker.
Prevention vs. Detection
When we think of Kubernetes security, we often think of features like RBAC, Network Policies, Admission Controllers, etc. These can be compared to door locks, or to a security guard vetting anyone entering the cottage. We definitely need these kinds of security measures; however, they only act as a preventive layer.
Apart from installing extra door locks, the Seven Dwarfs should also install an alarm system inside the cottage that scans for and detects suspicious activity: runtime security. With this kind of system in place, Snow White would immediately trigger an alarm when she used a privileged resource like a bed or the kitchen table. After detection, the system could also start a recovery action, for example destroying the cottage with the attacker inside and creating a new one… of course it would not be a fairy tale anymore, but at least the SRE Dwarfs would not have to investigate “Someone’s been eating from my plate” problems.
Falco is a cloud-native runtime security project, the first of its kind to be accepted by the CNCF. It takes advantage of the Linux kernel’s eBPF (extended Berkeley Packet Filter) mechanism, which allows Falco to load its code into the kernel safely. In contrast to kernel modules, which could potentially crash the system, eBPF programs are checked by the kernel’s verifier before they are loaded and run in a sandboxed environment, so they are designed not to crash the OS.
Attaching to kernel hook points allows Falco to analyze kernel events and, based on its rules engine, decide whether an event is suspicious. This opens up a wide range of attack-detection possibilities.
Here are just a few examples of the questions it can answer:
- Did the container spawn an additional process?
- Did someone connect directly to my container using
- Did the container change its config files while running?
- Did the container mount any sensitive host directory or file?
- Was the Kubernetes API accessed from inside the container?
- Did my container suddenly start listening on a new port?
Who’s been eating off my plate?
A great thing about Falco is that its rules are written in YAML and are very easy to understand and customize. An additional benefit is that it is aware of Kubernetes resources like pods, namespaces, etc. For example, if we want to know whether anyone started an interactive bash shell inside our Jenkins pod running in the default namespace, we can write the condition below:
condition: container.id != host and proc.name = bash and k8s.ns.name = default and k8s.pod.name contains jenkins and proc.tty != 0
and the whole rule would look like this:
- rule: Terminal shell in container
  desc: A shell was spawned in container with attached terminal
  condition: container.id != host and proc.name = bash and k8s.ns.name = default and k8s.pod.name contains jenkins and proc.tty != 0
  output: >
    A shell was spawned in a container with an attached terminal
    (user=%user.name %container.info shell=%proc.name parent=%proc.pname
    cmdline=%proc.cmdline terminal=%proc.tty container_id=%container.id
    image=%container.image.repository)
  priority: NOTICE
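To make the boolean logic of that condition explicit, here is a rough Python sketch that applies the same checks to a flat dictionary keyed by the Falco field names used in the rule. This is only an illustration of how the condition reads, not how Falco evaluates rules internally; the function name and the sample event are made up for the example.

```python
def is_interactive_shell_in_jenkins(event: dict) -> bool:
    """Apply the rule's condition to a dict keyed by Falco-style field names.

    Hypothetical helper for illustration; Falco has its own rules engine.
    """
    return (
        # container.id != host: the event comes from a container, not the host
        event.get("container.id", "host") != "host"
        # proc.name = bash
        and event.get("proc.name") == "bash"
        # k8s.ns.name = default
        and event.get("k8s.ns.name") == "default"
        # k8s.pod.name contains jenkins
        and "jenkins" in (event.get("k8s.pod.name") or "")
        # proc.tty != 0: the process has a terminal attached (interactive)
        and event.get("proc.tty", 0) != 0
    )


# Made-up sample event, shaped after the fields referenced in the rule.
sample_event = {
    "container.id": "3ad7b26ded6d",
    "proc.name": "bash",
    "k8s.ns.name": "default",
    "k8s.pod.name": "jenkins-0",
    "proc.tty": 34816,
}
```

Every clause has to hold for the rule to fire, so a bash process without a terminal (for example one started by a build script) or a shell in another namespace would not match.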
Falco can trigger an arbitrary command in response to a suspicious event. We definitely want to send a Slack/Teams notification when this happens. The Falco team has also created a small proxy service called Falcosidekick, which is capable of sending such notifications and including the rule outputs. The result of our Jenkins rule arrives as a notification in MS Teams that carries the rule output.
A great thing about Falco is that it can be deployed using a Helm chart. It will run as a DaemonSet on every host. Additionally, the official Helm chart comes with a predefined set of rules, ready to monitor Kubernetes clusters (although without customization they will create a lot of false-positive alerts).
helm install --name my-release stable/falco
We wanted to use Sidekick together with Falco, so we had to create our own Helm chart that combines the two containers. Sidekick runs inside the same pod as Falco, on port 2801, so the whole integration came down to the lines below in the falco.yaml config file:
json_output: true
json_include_output_property: true
http_output:
  enabled: true
  url: "http://localhost:2801/"
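To get a feel for what travels over that HTTP connection, here is a minimal Python sketch of a receiver that could stand in for Sidekick during local experiments. It assumes Falco POSTs one JSON object per alert containing `rule`, `priority`, and `output` keys; the handler class, the `format_alert` helper, and the `serve` function are hypothetical names for this example and say nothing about how Sidekick itself is implemented.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def format_alert(payload: dict) -> str:
    """Turn a Falco JSON alert into a one-line, chat-friendly message."""
    priority = payload.get("priority", "UNKNOWN")
    rule = payload.get("rule", "unknown rule")
    output = payload.get("output", "")
    return f"[{priority}] {rule}: {output}"


class FalcoAlertHandler(BaseHTTPRequestHandler):
    """Accepts POSTed alerts and prints them instead of forwarding to chat."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        try:
            payload = json.loads(self.rfile.read(length))
        except json.JSONDecodeError:
            payload = None
        if not isinstance(payload, dict):
            # Reject bodies that are not a single JSON object.
            self.send_response(400)
            self.end_headers()
            return
        print(format_alert(payload))
        self.send_response(200)
        self.end_headers()


def serve(port: int = 2801) -> None:
    """Listen on localhost, on the same port the config above points at."""
    HTTPServer(("127.0.0.1", port), FalcoAlertHandler).serve_forever()
```

Calling `serve()` starts the listener; a real deployment would hand the formatted message to a Slack or Teams webhook rather than printing it.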
And they lived happily ever after
The next step for us will be to add automated reactions to security events. We want to automatically delete a compromised pod, or even remove a whole node from the cluster. Our journey with Falco is just getting started, but we already benefit greatly from the deep insight it gives us into our clusters.
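As a rough idea of what such a reaction might look like, the Python sketch below maps a Falco JSON alert to a `kubectl delete pod` command. It is an assumption-laden illustration rather than our implementation: it assumes the alert carries an `output_fields` object with `k8s.ns.name` and `k8s.pod.name` (which depends on how Falco and the rule output are configured), and the allowlist and function name are hypothetical.

```python
from typing import List, Optional

# Hypothetical allowlist: only these rules may trigger an automated reaction.
AUTO_REMEDIATE_RULES = {"Terminal shell in container"}


def pod_delete_command(event: dict) -> Optional[List[str]]:
    """Build a kubectl command that deletes the pod named in a Falco alert.

    Returns None when the rule is not allowlisted or the pod cannot be
    identified, so the caller never deletes anything on a guess.
    """
    if event.get("rule") not in AUTO_REMEDIATE_RULES:
        return None
    # Assumption: the alert includes these Kubernetes fields in output_fields.
    fields = event.get("output_fields") or {}
    namespace = fields.get("k8s.ns.name")
    pod = fields.get("k8s.pod.name")
    if not namespace or not pod:
        return None
    return ["kubectl", "delete", "pod", pod, "--namespace", namespace]
```

Returning the command as a list keeps the sketch side-effect free; a caller could pass it to `subprocess.run` once the decision has been logged. Restricting reactions to an explicit set of rules is a deliberate choice here, since the default rules are noisy and a false positive should not take down a healthy pod.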