26 May 2020
By Mohammed Abubakar
There is no one “right” path to Kubernetes success; instead, several good paths exist. In this series of blogs we dive into the core areas of Kubernetes: security, efficiency, and reliability. Our goal is to provide you with Kubernetes best practices for adoption and implementation so you can realise long-term value across your entire organisation.
In the first blog of the series, I discussed the security challenges of running Kubernetes at scale and ways to ensure the security of your clusters through best practices.
One reason container technology has surpassed the capabilities of traditional virtual machines is its inherent efficiency with regard to infrastructure utilisation. Whereas in a traditional virtual machine environment one application is typically run per host, in a containerised environment you can run multiple applications per host, each within its own container. Packing multiple applications per host reduces your overall number of compute instances and thus your infrastructure costs.
Kubernetes is a dynamic system that automatically adapts to your workload’s resource utilisation. Kubernetes has two levels of scaling. Each individual Kubernetes deployment can be scaled automatically using a Horizontal Pod Autoscaler (HPA), while the cluster at large is scaled using Cluster Autoscaler.
HPAs monitor the resource utilisation of individual pods within a deployment, and they add or remove pods as necessary to keep resource utilisation within specified targets per pod. Cluster Autoscaler, meanwhile, handles scaling of the cluster itself. It watches the resource utilisation of the cluster at large and adds or removes nodes to the cluster automatically.
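As a sketch, an HPA targeting a hypothetical deployment named `web` (the name, replica bounds, and utilisation target below are all placeholders to be tuned for your workload) might look like this:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web              # hypothetical deployment to scale
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 50   # add/remove pods to hold ~50% average CPU per pod
```

The HPA controller compares observed per-pod CPU utilisation against the 50% target and adjusts the replica count between the stated bounds accordingly.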
A key feature of Kubernetes that enables both of these scaling actions is the capability to set specific resource requests and limits on your workloads. A request tells the scheduler how much CPU and memory to reserve for a pod when placing it on a node, while a limit caps what the pod is allowed to consume at runtime. By setting sensible limits and requests on how much CPU and memory each pod uses, you can maximise the utilisation of your infrastructure while ensuring smooth application performance.
To maximise the efficient utilisation of your Kubernetes cluster, it is critical to set resource limits and requests correctly. Setting your limits too low on an application will cause problems. For example, if your memory limit is below what the application actually needs, the container will be terminated (OOMKilled) whenever it exceeds that limit, and a too-low CPU limit will throttle the application. Meanwhile, if you set your limits too high, you're inherently wasting resources by over-allocating, which means you will end up with a higher bill.
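For illustration, requests and limits are declared per container in the pod spec; the image and values below are placeholders meant to be dialled in against observed usage:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: web
spec:
  containers:
    - name: app
      image: example.com/app:1.0   # placeholder image
      resources:
        requests:
          cpu: 250m          # reserved for this pod when scheduling
          memory: 256Mi
        limits:
          cpu: 500m          # CPU usage above this is throttled
          memory: 512Mi      # memory usage above this gets the container killed
```

The gap between request and limit is a deliberate trade-off: requests drive bin-packing density, while limits protect neighbouring pods on the same node.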
While Kubernetes best practices dictate that you should always set resource limits and requests on your workloads, it is not always easy to know what values to use for each application. As a result, some teams never set requests or limits at all, while others set them too high during initial testing and then never course correct. The key to ensuring scaling actions work properly is dialing in your resource limits and requests on each pod so workloads run efficiently.
If you are interested in fine-tuning the instances that your workloads run on, you can use different instance group types and node labels to steer workloads onto specific instance types.
Different business systems often have different-sized resource needs, along with specialised hardware requirements (such as GPUs). The concept of node labels in Kubernetes allows you to put labels onto all of your various nodes. Pods, meanwhile, can be configured with a nodeSelector that matches specific node labels, which determines which nodes a pod can be scheduled onto. By utilising instance groups of different instance types with appropriate labelling, you can mix and match the underlying hardware available from your cloud provider of choice with your workloads in Kubernetes.
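As a sketch, you might label a GPU-equipped node and steer a workload onto it; the label key/value, node name, and image here are all illustrative:

```yaml
# First, label the node, e.g.:
#   kubectl label nodes node-1 hardware=gpu
apiVersion: v1
kind: Pod
metadata:
  name: training-job
spec:
  nodeSelector:
    hardware: gpu          # only schedule onto nodes carrying this label
  containers:
    - name: trainer
      image: example.com/trainer:1.0   # placeholder image
```

A pod with this nodeSelector stays Pending until a node with a matching label is available, so labels and instance groups need to be kept in sync.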
If you have different-sized workloads with different requirements, it can make sense strategically and economically to place those workloads on different instance types and use labels to steer your workloads onto those different instance types.
Spot instances (from AWS) and preemptible instances (from Google Cloud) tie into this idea. Most organisations are familiar with paying for instances on demand or on reserved terms over fixed durations. However, if you have workloads that can be interrupted, you may want to consider using spot instances on AWS or preemptible instances on Google Cloud. These instance types allow you to make use of the cloud provider’s leftover capacity at a significant discount—all at the risk of your instance being terminated when the demand for regular on-demand instances rises.
If the risk of random instance termination is something that some of your business workloads can tolerate, you can use the same concept of node labelling to specifically schedule those workloads onto these types of instance groups and gain substantial savings.
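On a cluster where spot or preemptible nodes carry a distinguishing label (and, commonly, a matching taint to keep other workloads off them), an interruption-tolerant workload can target them like this; the `lifecycle: spot` label/taint convention shown is an assumption, not a Kubernetes default:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: batch-worker
spec:
  nodeSelector:
    lifecycle: spot              # illustrative label applied to spot/preemptible nodes
  tolerations:
    - key: lifecycle             # tolerate the matching taint, if the spot nodes are tainted
      operator: Equal
      value: spot
      effect: NoSchedule
  containers:
    - name: worker
      image: example.com/batch:1.0   # placeholder image
```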
Setting up and managing clusters, then asking software developers to deploy their apps onto them, is a complex process, and it's not uncommon for developers to deploy apps without knowing how to set appropriate resource limits or requests. Enabling Cluster Autoscaler at least ensures that any extra nodes are removed automatically when they sit unused, which saves both time and money.
Setting resource limits and requests is key to operating applications on Kubernetes clusters as efficiently and reliably as possible.