Scale a Kubernetes cluster

Scaling a Kubernetes cluster means increasing or decreasing the number of worker nodes in your cluster to meet workload demands. This ensures your applications have the resources they need, and you’re not over-provisioning compute unnecessarily.

How you scale depends on where your Kubernetes cluster is running (cloud provider, on-prem, managed service) and whether you're using manual or autoscaling methods.

This page focuses on generic concepts. For Hybrid Manager clusters, refer to monitoring cluster resources and ensure your underlying infrastructure can handle resource scaling.


When to scale your cluster

You might want to scale your cluster when:

  • Your Pods are stuck in Pending state due to insufficient resources
  • You need to deploy more applications or higher-throughput workloads
  • Resource usage is constantly near the limit (CPU/memory)
  • You want to scale down during low-usage periods to save costs

Ways to scale

1. Manual scaling

You can manually add or remove nodes using your infrastructure provider or cluster management tooling.

Examples:

  • Amazon EKS:

  • Update desired node count via the EC2 Auto Scaling Group

  • Or use EKS-managed node groups via eksctl or the AWS Console

  • GKE (Google Kubernetes Engine):

  • Resize via gcloud CLI or GCP Console

  • OpenShift or RHOS:

  • Use oc scale or update MachineSet resources


2. Cluster autoscaler

Most cloud-managed Kubernetes clusters support an autoscaler that automatically adjusts node counts based on pending Pods and resource pressure.

Autoscalers typically only work if:

  • Pods have correct resource requests defined
  • There's a configured range for min/max node group size

3. Vertical and horizontal Pod autoscaling

Don't confuse cluster autoscaling (adding/removing nodes) with:

  • Horizontal Pod Autoscaler (HPA): Adds/removes Pod replicas based on CPU/memory or custom metrics
  • Vertical Pod Autoscaler (VPA): Adjusts resource requests/limits for a Pod

These help application scaling inside the cluster, not the infrastructure itself.


Best practices

  • Always set resource requests/limits for your workloads
  • Monitor node usage over time (CPU, memory, disk) using [Prometheus/Grafana dashboards]
  • Don’t scale beyond your cloud provider’s quota
  • Test failover and autoscaling behaviors in staging


Could this page be better? Report a problem or suggest an addition!