Why high availability (HA)
The idea behind a high availability (HA) setup is that the application you created almost never goes down.
Of course this is partially up to the application itself and the way you roll it out.
These are factors that are part of being highly available, and they are usually within your own control.
Usually when people talk about HA setups, they are talking about the infrastructure and not the application itself.
Therefore the rollout and the application itself will not be part of this post.
The idea of a high availability setup is thus that we limit the effect that external factors have on the availability of the application.
For example, unexpectedly losing a node should not affect availability.
Not only a node though. The idea is that we can lose a whole data center or availability zone.
This can happen for example because of updates in the DC, switches that break, power outages, or cables that get cut.
These are factors that are outside of our own control, and something that we should limit the effect of.
And it is something we can do something about when we are running on Kubernetes.
Principles
Limit the single points of failure
- Limit the number of pods of the same application that run on the same node.
- Split those nodes across multiple datacenters / availability zones
- Multiple Countries / Regions
- Multiple Cloud providers
The last two are the ideal, though for many organizations they are not really feasible.
As they also bring a lot of extra complexity, we won’t be talking about them here.
With this out of the way, let’s see how we can do this.
What tools are available
Instead of going into detail about each and every tool available, I’ve added separate posts for each of them. So if you want more information about how a tool works or what it does, just follow the link.
Native tools
- pod(Anti)Affinity
  Defines which pods can and can’t run together on the same node.
- Topology spread constraints
  Defines how pods should be spread around.
- Cluster autoscaler
  Scales your cluster according to the resources you request.
- Pod requirements
  Define what resources your pod needs.
- Pod disruption budget
  Defines how many pods can be unavailable at a single time.
Non native tools
- Karpenter
  A better autoscaler.
- Descheduler
  Helps keep your spread in line with the spread configuration when your cluster scales down.
How to combine the tools
These tools are all powerful; however, each on its own only solves part of the high availability equation.
Combining them, however, does give you the complete set of tools for a high availability setup.
For example, having a cluster autoscaler has no use when you don’t know how many resources your pod requires.
And having a Pod disruption budget is great, but when all your pods run on a single node, it doesn’t really help a whole lot.
So in this part we are going to see how these tools can combine into a highly available setup.
I will assume that you already have a cluster autoscaler set up where possible.
Usually this will be managed by a techops | ops | infra team.
1: Set up your pod resources
As a first step we should determine how many resources our pods need.
You should define the CPU and memory that the pod needs to run properly.
apiVersion: v1
kind: Pod
metadata:
  name: application
spec:
  containers:
    - name: app
      image: example/app:latest
      resources:
        requests:
          memory: "512Mi"
          cpu: "1.0"
        limits:
          memory: "512Mi"
          cpu: "1.0"
Make sure your request is enough to run the pod. This is all you are guaranteed. The limit is how far your container can burst in resource usage.
If you are unsure, you can start high; from an availability standpoint a container can never have too many resources.
This is of course unless you are limited in the resources you can deploy, for example because you are running your own servers.
We should keep the request and the limit the same, especially for memory. Otherwise a container that bursts above its request can be killed with an out of memory error once the node runs out of resources.
For more information check the Pod requirements documentation.
2: Pod (anti-)affinity
So, you’ve set up your resources. However, all pods could still be running on a single giant node.
This can cause outages when that single node is unavailable for whatever reason.
So, let’s solve that.
First we need to know how to identify the pod. A good way of doing this is adding labels to your pods and referencing them in your anti-affinity rules.
Then we can write an anti-affinity rule for that:
apiVersion: v1
kind: Pod
metadata:
  name: with-pod-anti-affinity
  labels:
    app: app-name
spec:
  affinity:
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        - labelSelector:
            matchLabels:
              app: app-name
          topologyKey: kubernetes.io/hostname
  containers:
    - name: with-pod-anti-affinity
      image: example/app:latest
The podAntiAffinity combined with the topologyKey: kubernetes.io/hostname creates a rule not to place pods on the same node as the pods matched by the labelSelector we defined. Using requiredDuringSchedulingIgnoredDuringExecution means that this rule is required and not just preferred.
As you can see, the labelSelector describes the label app with the value app-name.
As the pod itself has that same label, it matches other pods that are deployed with the same configuration.
The combination means that two pods with this configuration will not be put on the same node.
Now there is a lot more you can do with affinity and anti-affinity.
For that, check out my post about pod(Anti)Affinity.
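To give a taste, anti-affinity can also be expressed as a preference instead of a requirement. This is a minimal sketch (the weight of 100 is just an illustrative value) that tells the scheduler to avoid co-locating the pods when it can, but to still schedule them when it cannot:
apiVersion: v1
kind: Pod
metadata:
  name: with-preferred-anti-affinity
  labels:
    app: app-name
spec:
  affinity:
    podAntiAffinity:
      # Preferred instead of required: the scheduler tries to avoid
      # putting these pods on the same node, but will not leave them
      # Pending if that is impossible.
      preferredDuringSchedulingIgnoredDuringExecution:
        - weight: 100
          podAffinityTerm:
            labelSelector:
              matchLabels:
                app: app-name
            topologyKey: kubernetes.io/hostname
  containers:
    - name: app
      image: example/app:latest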
3: Create a Pod Disruption Budget
So far, we’ve checked out anti-affinity and assigning resources to a pod.
I’m going to skip ahead here and assume you know how ReplicaSets and Deployments work, as for this topic we should use them.
A Pod Disruption Budget allows only a certain number of pods to be unavailable at a single time.
We can do this with either the minAvailable or the maxUnavailable property.
You can only set one of the two in a single PodDisruptionBudget; both accept an absolute number or a percentage, so pick the one that fits how you want to scale.
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: app-name-pdb
  labels:
    app: app-name
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: app-name
In this example we always want at least 2 healthy pods during a voluntary disruption. If we had used maxUnavailable: 2 instead and had 30 pods running, then only 2 could be unavailable at a time, meaning at least 28 pods would keep running.
You can read more about this here: https://kubernetes.io/docs/tasks/run-application/configure-pdb/
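As a sketch of that alternative, a budget that scales with the replica count can use a percentage (the 20% here is just an assumption to illustrate the idea):
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: app-name-pdb
  labels:
    app: app-name
spec:
  # At most 20% of the matching pods may be voluntarily disrupted at once.
  maxUnavailable: "20%"
  selector:
    matchLabels:
      app: app-name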
4: Create topology spread constraints
Now we have an awesome setup, however we are potentially still limited to a single datacenter or zone.
So, let’s fix that.
First, we need to find the topology key. For us that will be topology.kubernetes.io/zone. This will allow us to define a spread across zones.
apiVersion: v1
kind: Pod
metadata:
  name: example-pod
  labels:
    app: app-name
spec:
  topologySpreadConstraints:
    - maxSkew: 1
      labelSelector:
        matchLabels:
          app: app-name
      topologyKey: topology.kubernetes.io/zone
      whenUnsatisfiable: ScheduleAnyway
  containers:
    - name: pod-across-zones
      image: example/app:latest
The maxSkew means that we only ever want a maximum difference of 1 pod between our zones.
So if we have 2 zones, we want them to be 12 and 11. But not 12 and 10.
The topologyKey we defined is one of the keys supported by Kubernetes by default. It describes that we want this spread across zones.
The whenUnsatisfiable setting is important. ScheduleAnyway means that if, for whatever reason, we cannot adhere to the topologySpreadConstraints, we will not block the scheduling of the pod.
There is a lot more someone can do with these constraints. But that is outside of the scope of this post.
You can read more about this in my post about Topology spread constraints.
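As one example of going further, you can combine a hard zone spread with a soft per-node spread. This is just a sketch to show the shape of such a combination:
apiVersion: v1
kind: Pod
metadata:
  name: example-pod-strict
  labels:
    app: app-name
spec:
  topologySpreadConstraints:
    # Hard requirement: never let the skew between zones grow beyond 1.
    - maxSkew: 1
      topologyKey: topology.kubernetes.io/zone
      whenUnsatisfiable: DoNotSchedule
      labelSelector:
        matchLabels:
          app: app-name
    # Soft preference: also try to spread across individual nodes.
    - maxSkew: 1
      topologyKey: kubernetes.io/hostname
      whenUnsatisfiable: ScheduleAnyway
      labelSelector:
        matchLabels:
          app: app-name
  containers:
    - name: app
      image: example/app:latest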
5: Karpenter
Awesome, now we have everything for our high availability setup right?
Unfortunately no.
The Kubernetes cluster autoscaler is awesome, but it does have 2 important limitations.
The first one is downscaling (more on that next), the other is knowledge about the available availability zones.
See, to use topology spread constraints across multiple AZs, something needs to know which AZs are available, and Kubernetes itself only knows about the nodes that are already running inside the cluster.
This is where Karpenter comes into play. Karpenter knows about all the available AZs and thus knows how to use them.
To use Karpenter, you just need to deploy and configure it, which is outside of this post’s scope. When it is deployed, you are good to go!
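To give a rough idea of what that configuration involves, a Karpenter NodePool can be told which zones it may provision nodes in. This sketch assumes the karpenter.sh/v1 NodePool API and an AWS EC2NodeClass named default; the zone values are placeholders, so check the Karpenter documentation for your provider and version:
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: default
spec:
  template:
    spec:
      requirements:
        # Placeholder zones: Karpenter may provision nodes in any of these,
        # which gives the scheduler multiple zones to spread pods over.
        - key: topology.kubernetes.io/zone
          operator: In
          values: ["eu-west-1a", "eu-west-1b", "eu-west-1c"]
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default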
6: Descheduler
Now back to the first limitation mentioned.
When the cluster scales down, the topologySpreadConstraints are not re-evaluated for pods that are already running, so the spread can drift out of balance.
Yes, this is a bit weird; however, there is a tool that solves this problem.
It’s called Descheduler and you can find it on GitHub: Descheduler.
It is imho not a perfect tool, as you need to configure it using its ConfigMap, but it does do the job.
An example here is:
apiVersion: v1
kind: ConfigMap
metadata:
  name: my-app-descheduler
data:
  policy.yaml: |
    apiVersion: "descheduler/v1alpha2"
    kind: "DeschedulerPolicy"
    profiles:
      - name: ProfileName
        pluginConfig:
          - name: "DefaultEvictor"
            args:
              nodeFit: true
          - name: "RemovePodsViolatingTopologySpreadConstraint"
            args:
              constraints:
                - DoNotSchedule
                - ScheduleAnyway
        plugins:
          balance:
            enabled:
              - "RemovePodsViolatingTopologySpreadConstraint"
I’ve just added the most important part of the ConfigMap, which is the policy.yaml section.
In here you can create a policy to keep the topology spread in balance.
Now, you can do a whole lot more with it, but that is out of scope for this post.
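For example, one plugin worth looking at is RemoveDuplicates, which evicts extra pods of the same ReplicaSet that end up together on one node after a disruption. A minimal sketch of enabling it next to the spread-constraint plugin might look like this (double-check the plugin names against the Descheduler docs for your version):
apiVersion: "descheduler/v1alpha2"
kind: "DeschedulerPolicy"
profiles:
  - name: ProfileName
    pluginConfig:
      - name: "DefaultEvictor"
        args:
          nodeFit: true
    plugins:
      balance:
        enabled:
          # Re-balance pods that violate their topology spread constraints.
          - "RemovePodsViolatingTopologySpreadConstraint"
          # Evict duplicate pods of the same ReplicaSet from a single node.
          - "RemoveDuplicates"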
Conclusion
If you followed everything until now, you should have yourself a nice highly available Kubernetes setup. And from here you can start optimizing it even further.
As you could see, the basics are not that difficult, even though you do need some extra tools.
Below I have an example setup so you can see how everything works together.
Have Fun!
Examples
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
  namespace: my-namespace
  labels:
    app: my-app
spec:
  replicas: 10
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
        - name: my-app
          image: my-app:latest
          resources:
            limits:
              cpu: '2'
              memory: 512Mi
            requests:
              cpu: '2'
              memory: 512Mi
          imagePullPolicy: Always
      restartPolicy: Always
      topologySpreadConstraints:
        - maxSkew: 1
          topologyKey: topology.kubernetes.io/zone
          whenUnsatisfiable: ScheduleAnyway
          labelSelector:
            matchLabels:
              app: my-app
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: my-app-descheduler
data:
  policy.yaml: |
    apiVersion: "descheduler/v1alpha2"
    kind: "DeschedulerPolicy"
    profiles:
      - name: ProfileName
        pluginConfig:
          - name: "DefaultEvictor"
            args:
              nodeFit: true
          - name: "RemovePodsViolatingTopologySpreadConstraint"
            args:
              constraints:
                - DoNotSchedule
                - ScheduleAnyway
        plugins:
          balance:
            enabled:
              - "RemovePodsViolatingTopologySpreadConstraint"
More information
Kubernetes scheduling
Cluster autoscaler
Descheduler
Karpenter
pod(Anti)Affinity
Topology spread constraints