Scale a Kubernetes cluster

Scaling a Kubernetes cluster means increasing or decreasing the number of worker nodes in your cluster to meet workload demands. Scaling up gives your applications the resources they need, and scaling down keeps you from paying for compute you don’t use.

How you scale depends on where your Kubernetes cluster is running (cloud provider, on-prem, managed service) and whether you're using manual or autoscaling methods.

This page covers generic concepts. For Hybrid Manager clusters, refer to the guidance on monitoring cluster resources, and make sure your underlying infrastructure has the capacity for the nodes you plan to add.

When to scale your cluster

You might want to scale your cluster when:

Your Pods are stuck in the Pending state because of insufficient resources
You need to deploy more applications or higher-throughput workloads
Resource usage (CPU or memory) is consistently near the limit
You want to scale down during low-usage periods to save costs
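
The signals above can be turned into a simple check. The following is a minimal Python sketch, not a real autoscaler: the `ClusterSnapshot` type, the `recommend` function, and the 85% and 30% thresholds are all hypothetical and chosen only for illustration. You would feed it numbers from your own monitoring.

```python
from dataclasses import dataclass


@dataclass
class ClusterSnapshot:
    """Hypothetical summary of cluster state taken from your monitoring."""
    pending_pods: int        # Pods stuck in Pending for lack of resources
    cpu_utilization: float   # average CPU usage across nodes, 0.0 to 1.0
    mem_utilization: float   # average memory usage across nodes, 0.0 to 1.0


# Illustrative thresholds (assumptions; tune them for your workloads).
HIGH_WATERMARK = 0.85
LOW_WATERMARK = 0.30


def recommend(snapshot: ClusterSnapshot) -> str:
    """Return 'scale-up', 'scale-down', or 'hold' for a snapshot."""
    # The busier of the two resources drives the decision.
    busiest = max(snapshot.cpu_utilization, snapshot.mem_utilization)
    if snapshot.pending_pods > 0 or busiest >= HIGH_WATERMARK:
        return "scale-up"
    if busiest <= LOW_WATERMARK:
        return "scale-down"
    return "hold"
```

In practice you would evaluate a check like this over a time window rather than a single sample, so that a short spike doesn't trigger a scaling action.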

Ways to scale

1. Manual scaling

You can manually add or remove nodes using your infrastructure provider or cluster management tooling.

Examples:

Amazon EKS:
Update desired node count via the EC2 Auto Scaling Group
Or use EKS-managed node groups via eksctl or the AWS Console
GKE (Google Kubernetes Engine):
Resize via gcloud CLI or GCP Console
OpenShift or RHOS:
Use oc scale on a MachineSet, or edit the MachineSet's replica count directly
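
As a rough illustration of the manual options above, this Python sketch builds the command line for each tool without running it. The helper function names and the example cluster, node group, and MachineSet names are hypothetical. The sketch also assumes that the CLIs are installed and authenticated, that gcloud picks up the cluster location from its own configuration, and that the MachineSet lives in the `openshift-machine-api` namespace.

```python
from typing import List


def eks_scale_command(cluster: str, nodegroup: str, nodes: int) -> List[str]:
    """Build an eksctl command that sets the size of an EKS-managed node group."""
    return ["eksctl", "scale", "nodegroup",
            "--cluster", cluster, "--name", nodegroup, "--nodes", str(nodes)]


def gke_resize_command(cluster: str, node_pool: str, nodes: int) -> List[str]:
    """Build a gcloud command that resizes one GKE node pool.

    Assumes the zone or region comes from the active gcloud configuration.
    """
    return ["gcloud", "container", "clusters", "resize", cluster,
            "--node-pool", node_pool, "--num-nodes", str(nodes)]


def openshift_scale_command(machineset: str, replicas: int) -> List[str]:
    """Build an oc command that sets the replica count of a MachineSet."""
    return ["oc", "scale", "machineset", machineset,
            f"--replicas={replicas}", "-n", "openshift-machine-api"]
```

To run one of these, pass the returned list to `subprocess.run(cmd, check=True)`. Building the argument list separately makes it easy to log or review the command before it changes the cluster.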

2. Cluster autoscaler

Most cloud-managed Kubernetes clusters support an autoscaler that adjusts node counts automatically: it adds nodes when Pods can't be scheduled for lack of resources and removes nodes that stay underutilized.

EKS: Cluster Autoscaler for AWS
GKE: Built-in
AKS: Built-in
OpenShift: ClusterAutoscaler together with MachineAutoscaler resources

An autoscaler typically works only if:

Pods define accurate resource requests, because scaling decisions are based on requests rather than on actual usage
Each node group has a configured minimum and maximum size
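
To see why both conditions matter, consider this simplified Python sketch. It is not how a real cluster autoscaler works (a real one simulates scheduling for each Pod and also considers memory, affinity rules, and more). The `estimate_node_count` function and its parameters are hypothetical. It only shows that the node count follows from the sum of CPU requests and is then clamped to the configured range.

```python
import math
from typing import Sequence


def estimate_node_count(pod_cpu_requests_m: Sequence[int],
                        node_allocatable_cpu_m: int,
                        min_nodes: int,
                        max_nodes: int) -> int:
    """Estimate nodes needed from CPU requests (in millicores), within bounds.

    Simplification: treats CPU as the only resource and ignores bin-packing.
    """
    if node_allocatable_cpu_m <= 0:
        raise ValueError("node_allocatable_cpu_m must be positive")
    if min_nodes > max_nodes:
        raise ValueError("min_nodes must not exceed max_nodes")
    total_requested = sum(pod_cpu_requests_m)
    needed = math.ceil(total_requested / node_allocatable_cpu_m)
    # Clamp to the node group's configured range.
    return max(min_nodes, min(needed, max_nodes))
```

For example, ten Pods requesting 500m each on nodes with 2000m allocatable give `estimate_node_count([500] * 10, 2000, 1, 5) == 3`. If Pods declare no requests, the total is zero and the estimate never rises above the minimum, however busy the nodes are.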

3. Vertical and horizontal Pod autoscaling

Don't confuse cluster autoscaling (adding/removing nodes) with:

Horizontal Pod Autoscaler (HPA): Adds/removes Pod replicas based on CPU/memory or custom metrics
Vertical Pod Autoscaler (VPA): Adjusts resource requests/limits for a Pod

These scale applications inside the cluster, not the underlying infrastructure.
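
To make the HPA behavior concrete, the Kubernetes documentation describes its core calculation as desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue), skipped when the ratio is within a tolerance (0.1 by default). The Python sketch below reproduces that arithmetic only. The function name and the default replica bounds are assumptions for illustration, and the real controller handles additional cases, such as missing metrics and Pods that aren't ready.

```python
import math


def hpa_desired_replicas(current_replicas: int,
                         current_metric: float,
                         target_metric: float,
                         min_replicas: int = 1,
                         max_replicas: int = 10,
                         tolerance: float = 0.1) -> int:
    """Sketch of the HPA replica calculation, clamped to min/max replicas."""
    if target_metric <= 0:
        raise ValueError("target_metric must be positive")
    ratio = current_metric / target_metric
    # Within tolerance of the target: leave the replica count unchanged.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    desired = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(desired, max_replicas))
```

With 4 replicas averaging 90% CPU against a 60% target, the sketch returns 6. The new replicas still need nodes to run on, which is where cluster autoscaling comes in.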

Best practices

Always set resource requests/limits for your workloads
Monitor node usage over time (CPU, memory, disk) using Prometheus/Grafana dashboards
Check your cloud provider’s quotas before raising node limits, because scale-up requests beyond the quota fail
Test failover and autoscaling behaviors in staging
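
The first practice is easy to check automatically. As a minimal sketch, the Python function below takes a Pod manifest that is already parsed into a dictionary (for example, from YAML or JSON) and lists the containers that lack a CPU or memory request. The function name is hypothetical, and the sketch inspects only `spec.containers`, not init containers or limits.

```python
from typing import Any, Dict, List


def containers_missing_requests(pod_manifest: Dict[str, Any]) -> List[str]:
    """Return names of containers without both cpu and memory requests."""
    missing = []
    containers = pod_manifest.get("spec", {}).get("containers", [])
    for container in containers:
        # 'resources' or 'requests' may be absent or explicitly null.
        resources = container.get("resources") or {}
        requests = resources.get("requests") or {}
        if "cpu" not in requests or "memory" not in requests:
            missing.append(container.get("name", "<unnamed>"))
    return missing
```

A check like this could run in a CI pipeline against manifests before they reach staging, so that workloads without requests are caught before they affect scheduling or autoscaling.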
