18 May 2020
By Mohammed Abubakar
Few things are more confusing or frustrating than getting stuck in the middle of a complex technical transformation. In the ever-expanding cloud native ecosystem, organisations more often than not embark on their Kubernetes journey unsure of which path to follow. Covering all of your bases and avoiding common pitfalls and mistakes are worthy goals. No one wants to make the wrong decision and pay for it in the future.
There is no one “right” path to Kubernetes success; instead, several good paths exist. In this series of blogs we dive into the core areas of Kubernetes: security, efficiency, and reliability. Our goal is to provide you with Kubernetes best practices for adoption and implementation so you can realise long-term value across your entire organisation.
As organisations transition to cloud native technologies like containers and Kubernetes, the core business challenge remains the same: figuring out how to accelerate development velocity while maintaining reliability, scalability, and security. Even in the world of Kubernetes and containers, these two business objectives are still in tension.
Kubernetes is becoming a mainstream solution for managing how stateless microservices run in a cluster because the technology enables teams to strike a balance between velocity and resilience. It abstracts away just enough of the infrastructure layer to enable developers to deploy freely without sacrificing important governance and risk controls. But all too often, those governance and risk controls go underutilised. Since everything is working, it’s easy to think that there aren’t any problems. It’s not until you get hit with a DoS attack or a security breach that you realise a Kubernetes deployment was misconfigured or that access control wasn’t properly scoped. Running Kubernetes securely is quite complicated, which can cause headaches for development, security, and operations folks.
The fastest way to set up a Kubernetes cluster is the most insecure way. Development teams new to Kubernetes may neglect some critical pieces of deployment configuration. For example, deployments may seem to work just fine without readiness and liveness probes in place or without resource requests and limits, but neglecting these pieces will almost certainly cause headaches down the line. And from a security perspective, it’s not always obvious when a Kubernetes deployment is over-permissioned — often the easiest way to get something working is to give it root access.
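To make those missing pieces concrete, here is a minimal Deployment sketch that includes the probes, resource requests and limits, and a non-root security context described above. All names, the image, and the `/healthz` endpoint are placeholders, not a prescribed configuration:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web                      # placeholder name
spec:
  replicas: 2
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
        - name: web
          image: example.com/web:1.2.3   # placeholder image
          ports:
            - containerPort: 8080
          readinessProbe:        # only route traffic once the app reports ready
            httpGet:
              path: /healthz
              port: 8080
            initialDelaySeconds: 5
          livenessProbe:         # restart the container if it stops responding
            httpGet:
              path: /healthz
              port: 8080
            periodSeconds: 10
          resources:
            requests:            # what the scheduler reserves for this pod
              cpu: 100m
              memory: 128Mi
            limits:              # hard caps that contain a runaway pod
              cpu: 500m
              memory: 256Mi
          securityContext:       # avoid the "just run it as root" shortcut
            runAsNonRoot: true
            allowPrivilegeEscalation: false
```

Without the `resources` block this pod would still run, but a single misbehaving replica could starve everything else on the node; without the `securityContext` it would default to whatever the image requests, which is often root.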
Security will always make life a bit harder before it makes it easier. Organisations tend to do things in an insecure way by default, because they don’t know what they don’t know, and Kubernetes is full of these unknown unknowns. It’s easy to think your job is done because the site is up and working. But if you haven’t tightened up the security posture in a way that adheres to best practices, it’s only a matter of time before you start learning lessons the hard way.
Below, I will highlight the following key Kubernetes best practices related to security:
- Limit the impact of DoS and DDoS attacks with ingress policies
- Keep Kubernetes versions, add-ons, and base images up to date
- Scope access with role-based access control (RBAC)
- Restrict traffic inside the cluster with network policy
- Encrypt secrets before checking them into version control
With a distributed-denial-of-service (DDoS) attack, an attacker who has access to many different machines (which they’ve probably broken into) can bombard a website with seemingly legitimate traffic. Sometimes these “attacks” aren’t even nefarious — it might just be one of your customers trying to use your API with a buggy script. Kubernetes allows applications to scale up and down in response to increases in traffic. That’s a huge benefit as increases in traffic (legitimate or nefarious) won’t result in end-users experiencing any degradation of performance. But, if you are attacked, your application will consume more resources in your cluster and you’ll get the bill.
While services like Cloudflare and CloudFront serve as a good first line of defense against DoS attacks, a well-designed Kubernetes ingress policy can add a second layer of protection. To help mitigate a DDoS threat, you can configure an ingress policy that sets limits on how much traffic a particular user can consume before they get shut off. You can set limits on the number of concurrent connections; the number of requests per second, minute, or hour; the size of request bodies; and you can even tune these limits for specific hostnames or paths.
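As a sketch of what such limits look like in practice, assuming the widely used ingress-nginx controller (other controllers expose similar settings through their own annotations), the hostname and service names below are hypothetical:

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: api-ingress              # hypothetical name
  annotations:
    # ingress-nginx rate-limiting annotations, applied per client IP:
    nginx.ingress.kubernetes.io/limit-connections: "20"  # concurrent connections
    nginx.ingress.kubernetes.io/limit-rps: "10"          # requests per second
    nginx.ingress.kubernetes.io/proxy-body-size: "1m"    # cap request body size
spec:
  rules:
    - host: api.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: api        # hypothetical backend service
                port:
                  number: 80
```

Clients that exceed these thresholds receive an error response from the ingress controller instead of reaching your pods, so a buggy script or flood of traffic burns far fewer cluster resources.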
Kubernetes comes out with a few releases a year, each of which fixes bugs and security holes. As painful as upgrading can be, keeping your Kubernetes version up to date is essential. Old versions quickly become stale, and new security holes are being announced all the time. On top of that, it’s common to have several add-ons installed in your cluster to enhance the functionality Kubernetes provides out of the box. For instance, you might use cert-manager to help keep your site’s external certificates up to date, Istio to handle mutual TLS encryption inside your cluster, or metrics-server and Prometheus to gather metrics about how applications are running. With each of these add-ons, your attack surface and your risks increase.
Staying up to date on bug fixes and new releases is important. Each time a new release comes out, you’ll need to test those updates to make sure they don’t break anything. Where possible, test on internal and staging clusters and roll updates out slowly, monitoring possible problems and making course corrections along the way. Finally, be sure to keep the underlying Docker image up to date for each of your applications. The base image you’re using can go stale quickly, and new Common Vulnerabilities and Exposures (CVEs) are always being announced. To fight back, you can use container scanning tools like Trivy to check every image for vulnerabilities. But making sure the base operating system and any installed libraries are up-to-date is the safest policy.
The easiest way to deploy a new application or provision a new user is to give away admin permissions. A person or application with admin permissions has free range to do whatever they want—create resources in the cluster, view application secrets, or delete an entire Kubernetes deployment. The problem is that if an attacker gains access to that account, they too can do anything they want. They could spin up new workloads that mine bitcoin, access your database credentials, or delete everything in the cluster.
If you’ve got an application that doesn’t need extensive control over the cluster, giving it admin-level access is quite dangerous. If all it needs to do is view logs, you can pare down its access so that an attacker can’t do anything more than that—no mining bitcoin, viewing secrets, or deleting resources.
To manage access, Kubernetes provides role-based access control (RBAC). RBAC is used to grant fine-grained permissions to access different resources in the cluster. Setting up thoughtful Kubernetes RBAC rules according to the principle of least privilege is important for reducing the potential for splash damage when an account is compromised. It’s a delicate balance, as you might end up withholding necessary permissions. But it’s worth that minor inconvenience to avoid the major headaches that come from a security breach. While RBAC configuration can be confusing and verbose, tools like rbac-manager can help simplify the syntax. This helps prevent mistakes and provides a clearer sense of who has access to what.
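Continuing the log-viewing example above, a least-privilege setup in plain Kubernetes RBAC might look like the following sketch. The namespace and service account names are placeholders:

```yaml
# A Role that can read pods and their logs in one namespace—nothing else.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: log-reader
  namespace: production          # hypothetical namespace
rules:
  - apiGroups: [""]
    resources: ["pods", "pods/log"]
    verbs: ["get", "list"]       # read-only: no secrets, no deletes
---
# Bind the Role to the application's service account.
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: log-reader-binding
  namespace: production
subjects:
  - kind: ServiceAccount
    name: log-shipper            # hypothetical service account
    namespace: production
roleRef:
  kind: Role
  name: log-reader
  apiGroup: rbac.authorization.k8s.io
```

If this account is compromised, the attacker can read logs in one namespace and nothing more—a far cry from a cluster-admin binding.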
Network policy is similar to RBAC, but instead of deciding who has access to which resources in your cluster, network policy focuses on who can talk to who inside your cluster. In a large enterprise, dozens of applications may run inside the same Kubernetes cluster, and by default every application has network access to everything else running inside the cluster. Of course, some network access is usually necessary. But while a given workload might need to talk to a database and a handful of microservices, that workload probably won’t need access to every other application inside the cluster. It’s up to you to write a network policy that cuts off communications to unnecessary parts of the cluster.
Without a strict network policy, an attacker will be able to probe the network and spread throughout the cluster. With proper network policies in place, however, an attacker who gains access to a particular workload will be restricted to that one workload and its dependencies. Network policy can also be used to manage cluster ingress and egress: where incoming traffic can come from and where outgoing traffic can go. You can make sure internal-only applications only accept traffic from IP addresses inside your firewall and make sure all partner IP addresses are whitelisted for partner-driven applications. For outgoing traffic, you may also want to whitelist allowed domains. This way, if a hacker gains access to the cluster and tries to push data out to an external URL, they’ll be stopped by your network policy. With strict ingress and egress rules, you can limit the potential attack surface of your applications.
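A NetworkPolicy that locks a workload down to exactly its dependencies could be sketched like this, where the `api`, `frontend`, and `postgres` labels and the namespace are hypothetical:

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: api-allow
  namespace: production          # hypothetical namespace
spec:
  podSelector:
    matchLabels:
      app: api                   # the workload being protected
  policyTypes: ["Ingress", "Egress"]
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app: frontend      # only the frontend may call the API
      ports:
        - protocol: TCP
          port: 8080
  egress:
    - to:
        - podSelector:
            matchLabels:
              app: postgres      # the API may only reach its database
      ports:
        - protocol: TCP
          port: 5432
```

Because any policy selecting a pod switches that pod to default-deny for the listed direction, everything not explicitly allowed here is dropped—an attacker inside the `api` pod can reach the database and nothing else.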
Network policy is easy to neglect, especially as you’re building out a Kubernetes cluster for the first time. But it’s a good way to harden your cluster from a security standpoint and limit the extent of damage after attackers find a security hole. As with RBAC, there’s a tradeoff between over-permissioning to make sure everything works properly versus limiting permissions and making sure any problems are contained. Again, you’re sacrificing short-term convenience to avoid the fallout from a major security breach.
Kubernetes empowers Infrastructure as Code (IaC) workflows more than any other platform. By encoding all of your infrastructure choices in YAML, Terraform, and other configuration formats, you ensure your infrastructure is 100% reproducible. Even if your cluster disappeared overnight, you’d be able to recreate it in a matter of hours or minutes so long as you’re utilising IaC.
But there’s one catch: your applications need access to secrets. Database credentials, API keys, admin passwords, and other bits of sensitive information are required for most applications to function properly. You may be tempted to check these credentials into your IaC repository, so that your builds are 100% reproducible. But once they’re checked in, they’re permanently exposed to anyone with access to your Git repository. If you care about security, it’s imperative to avoid this temptation.
The solution is to split the difference: by encrypting all of your secrets, you can safely check them into your repository without fear of exposing them. Then you’ll just need access to a single encryption key to “unlock” your IaC repository and have perfectly reproducible infrastructure. Tools like Mozilla’s SOPS make this easy. Simply create a single encryption key using Google’s or Amazon’s key management stores, and any YAML file can be fully encrypted and checked-in to your Git repository.
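For example, with SOPS a small `.sops.yaml` file at the repository root tells the tool which key encrypts which files. The KMS key ARN and paths below are placeholders:

```yaml
# .sops.yaml — repository-wide SOPS configuration (hypothetical values)
creation_rules:
  - path_regex: secrets/.*\.yaml
    kms: arn:aws:kms:us-east-1:111122223333:key/EXAMPLE-KEY-ID  # placeholder ARN
    encrypted_regex: ^(data|stringData)$  # encrypt Secret values, leave structure readable
```

Running `sops -e -i secrets/db-credentials.yaml` then encrypts the file in place; anyone with access to the KMS key can decrypt and apply it, while the Git history only ever contains ciphertext.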
Applications change constantly, and there’s no way to ensure that your application code is bulletproof. What Kubernetes does really well is mitigate the severity of attacks and contain splash damage. When someone penetrates your application and makes it through that first layer, they won’t get much (or any) farther if you’ve optimised security settings in accordance with the Kubernetes best practices described here.
With the proper know-how and attention, a Kubernetes implementation will be more secure and easier to maintain than other systems, specifically because it provides a single platform for everything related to cloud computing. Kubernetes has strong built-in security features, as well as a massive ecosystem of third-party security tooling. Although it can feel overwhelming to adopt Kubernetes, many tools are available that can help you manage the process. You also might explore a Kubernetes enablement platform that goes beyond some managed Kubernetes solutions. And without a platform like Kubernetes, it’s much harder to minimise the attack surface or even understand its breadth.
No code is 100% bug-free, and all applications have flaws. Since your applications need to serve traffic to the outside world, it’s a matter of if, not when, someone manages to find a hole. However, a properly configured Kubernetes cluster will severely limit the blast radius of an attack. It can make the difference between a minor security incident and a crippling breach.