With TopologySpreadConstraints, Kubernetes gives you a tool to spread your pods across different topology domains.
Wait, topology domains? What are those? I hear you, as I had the exact same question.
A topology is simply a label key on a node.
A domain is then a distinct value of that label.
If, for example, we have 3 nodes where node-1 and node-2 carry the label topology.kubernetes.io/zone=eu-west-1a and node-3 carries topology.kubernetes.io/zone=eu-west-1b, we can reference the topology via that label key. For that topologyKey there are 2 domains: eu-west-1a and eu-west-1b.
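As a sketch, those node labels might look like this (the node names and zone values are taken from the example later in this post):

```yaml
# The label key topology.kubernetes.io/zone is the topology;
# its distinct values (eu-west-1a, eu-west-1b) are the domains.
apiVersion: v1
kind: Node
metadata:
  name: node-1
  labels:
    topology.kubernetes.io/zone: eu-west-1a
---
apiVersion: v1
kind: Node
metadata:
  name: node-2
  labels:
    topology.kubernetes.io/zone: eu-west-1a
---
apiVersion: v1
kind: Node
metadata:
  name: node-3
  labels:
    topology.kubernetes.io/zone: eu-west-1b
```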
Now that we know this, how can we use that?
As stated before, we can use TopologySpreadConstraints to spread pods around different nodes.
This way we can, for example, create a high-availability setup (I have a post about that: High availability in Kubernetes). We can also use it to ensure each domain runs a certain pod, which might be useful in more performance-oriented applications.
How does it do that?
The TopologySpreadConstraints are part of the pod spec and are used during scheduling of the pod.
During scheduling, the scheduler looks at the available nodes and checks which domains exist for the given topology.
In the previous example, for the topologyKey topology.kubernetes.io/zone, those were eu-west-1a and eu-west-1b.
It will check how many pods (matching the given label selector) are already running in each domain.
Then it checks the maxSkew, or in other words, how big the difference between the domains is allowed to be.
Say we have maxSkew=1, eu-west-1a has 2 matching pods, and eu-west-1b has 1 matching pod. Then Kubernetes will try to schedule the pod on node-3, as it is part of eu-west-1b. By default it also takes node taints and affinity policies into account. More about that in the spec below.
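A minimal sketch of such a constraint in a pod spec (the app: my-app label and nginx image are assumed placeholders):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: my-app
  labels:
    app: my-app
spec:
  topologySpreadConstraints:
    - maxSkew: 1                                # allowed pod-count difference between domains
      topologyKey: topology.kubernetes.io/zone  # spread over the zone domains
      whenUnsatisfiable: DoNotSchedule          # refuse to schedule if maxSkew would be exceeded
      labelSelector:
        matchLabels:
          app: my-app                           # which pods count towards the skew
  containers:
    - name: app
      image: nginx
```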
So, we’ve seen how to use the TopologySpreadConstraints. Now, can we combine multiple TopologySpreadConstraints?
The answer is: Yes! Yes, we can combine them.
When combining TopologySpreadConstraints, they act with an AND rule.
However, each constraint’s whenUnsatisfiable setting is still taken into account.
Now take the following example:
There are 2 topologyKeys we are looking at: the region and the nodegroup.
With this config we make sure that the pods are evenly spread across the different regions, and we try to evenly spread pods across nodegroups. The latter, however, is not a hard requirement.
That means that after we’ve scheduled 5 pods (1 on node-1, 2 on node-2, 2 on node-3), the 6th pod should be scheduled on node-3 again, as otherwise the topology.kubernetes.io/region constraint cannot be met.
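The original config for this example isn’t shown here, but a sketch of two combined constraints could look like this (the nodegroup label key eks.amazonaws.com/nodegroup and the app label are assumptions):

```yaml
topologySpreadConstraints:
  - maxSkew: 1
    topologyKey: topology.kubernetes.io/region  # hard requirement: spread across regions
    whenUnsatisfiable: DoNotSchedule
    labelSelector:
      matchLabels:
        app: my-app
  - maxSkew: 1
    topologyKey: eks.amazonaws.com/nodegroup    # soft preference: spread across nodegroups
    whenUnsatisfiable: ScheduleAnyway
    labelSelector:
      matchLabels:
        app: my-app
```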
Let’s check out the spec of the TopologySpreadConstraints (YAML copied from the Kubernetes docs):
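The field skeleton looks roughly like this (reconstructed from the Kubernetes docs, as the original snippet was not preserved here):

```yaml
spec:
  topologySpreadConstraints:
    - maxSkew: <integer>
      minDomains: <integer>                 # optional
      topologyKey: <string>
      whenUnsatisfiable: <string>           # DoNotSchedule | ScheduleAnyway
      labelSelector: <object>
      matchLabelKeys: <list>                # optional
      nodeAffinityPolicy: [Honor|Ignore]    # optional
      nodeTaintsPolicy: [Honor|Ignore]      # optional
```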
maxSkew is used to determine the permitted difference in pod counts between the topology domains.
It is an integer, measured in pods.
So when you want the pod counts of the domains to differ by at most 1, you set it to 1.
When whenUnsatisfiable: DoNotSchedule is used, the pods will not be scheduled when the difference would become too large.
It is also used as a minimum number of pods per topology domain! This is only the case when the number of matching domains is less than the value of minDomains; otherwise the minimum number of matching pods is zero.
When whenUnsatisfiable: ScheduleAnyway is used, the scheduler gives higher precedence to topology domains that have fewer pods (reducing the skew).
Used in combination with whenUnsatisfiable: DoNotSchedule.
It is a beta feature that is turned on by default (since 1.27).
When enabled, you can set a positive integer. This is the minimum number of topology domains that need to match for the maxSkew to act against the global minimum.
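For instance, a sketch of a constraint that insists on at least 3 zone domains (label values assumed):

```yaml
topologySpreadConstraints:
  - maxSkew: 1
    minDomains: 3                     # fewer than 3 matching domains triggers the global minimum of 0
    topologyKey: topology.kubernetes.io/zone
    whenUnsatisfiable: DoNotSchedule  # minDomains is used together with DoNotSchedule
    labelSelector:
      matchLabels:
        app: my-app
```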
This is the key used to select the topology. These are node labels, for example topology.kubernetes.io/zone. For a list of well-known labels, check the Kubernetes docs.
- DoNotSchedule: prevents scheduling when the topology skew cannot stay within the maxSkew.
- ScheduleAnyway: makes maxSkew a recommendation rather than a hard requirement.
Used to find the pods that count towards a topology domain. The value of this field determines for which pods the skew is calculated.
A list of label keys to refine the labelSelector even more. For example, you can have a labelSelector on app and a matchLabelKeys entry for pod-template-hash. That way, different revisions of the app are spread independently.
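A sketch of that pod-template-hash setup (the app label is an assumed placeholder):

```yaml
topologySpreadConstraints:
  - maxSkew: 1
    topologyKey: topology.kubernetes.io/zone
    whenUnsatisfiable: DoNotSchedule
    labelSelector:
      matchLabels:
        app: my-app           # selects all revisions of the app
    matchLabelKeys:
      - pod-template-hash     # each Deployment revision is spread separately
```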
During scheduling, the skew is calculated by analyzing the available nodes.
When we have 6 nodes spread over 3 different zones, we check where our pods are already deployed.
With this flag set to Honor, we only include nodes that adhere to the pod’s nodeAffinity.
So when 2 out of 5 nodes match the affinity selector, only those 2 nodes are included in the spread calculation.
This works the same way with the nodeSelector.
When it is set to Ignore, all nodes are taken into consideration when calculating the spread.
How tainted nodes are treated during the spread calculation.
There are 2 settings:
- Honor: excludes tainted nodes, unless the pod has a toleration for the taint.
- Ignore: ignores any taints on the nodes, including all tainted nodes that otherwise adhere to the constraints configured earlier.
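Both policies are set per constraint; a sketch (labels assumed):

```yaml
topologySpreadConstraints:
  - maxSkew: 1
    topologyKey: topology.kubernetes.io/zone
    whenUnsatisfiable: DoNotSchedule
    labelSelector:
      matchLabels:
        app: my-app
    nodeAffinityPolicy: Honor  # only count nodes matching the pod's nodeAffinity/nodeSelector
    nodeTaintsPolicy: Honor    # exclude tainted nodes the pod doesn't tolerate
```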
- There’s no guarantee that the constraints remain satisfied when pods are removed.
For example, take a Deployment with 10 pods spread over 2 topology domains with a maxSkew of 1. When scaling down to 3 pods, it can happen that all 3 pods end up in a single topology domain.
This can be mitigated by using a tool like Descheduler to rebalance the pod distribution.
- When calculating which topology domains are available, the scheduler only has knowledge of existing nodes. This can lead to problems on autoscaled clusters, where only a minimum number of nodes may be running, so not all possible topology domains are available.
There are autoscalers that do have prior knowledge of (certain) topology domains, like Karpenter.