Scale a Kubernetes cluster
Scaling a Kubernetes cluster means increasing or decreasing the number of worker nodes in your cluster to meet workload demands. This ensures your applications have the resources they need without over-provisioning compute.
How you scale depends on where your Kubernetes cluster is running (cloud provider, on-prem, managed service) and whether you're using manual or autoscaling methods.
This page focuses on generic concepts. For Hybrid Manager clusters, refer to the guidance on monitoring cluster resources, and ensure your underlying infrastructure can handle the scaled resources.
When to scale your cluster
You might want to scale your cluster when:
- Your Pods are stuck in Pending state due to insufficient resources
- You need to deploy more applications or higher-throughput workloads
- Resource usage (CPU or memory) is consistently near its limits
- You want to scale down during low-usage periods to save costs
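For example, you can confirm that Pods are stuck in Pending and see why with `kubectl` (the Pod and namespace names below are placeholders):

```shell
# List Pods stuck in Pending across all namespaces
kubectl get pods --all-namespaces --field-selector=status.phase=Pending

# Inspect scheduling events for a specific Pod; look for messages
# like "Insufficient cpu" or "Insufficient memory"
kubectl describe pod my-pod -n my-namespace
```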
Ways to scale
1. Manual scaling
You can manually add or remove nodes using your infrastructure provider or cluster management tooling.
Examples:
- Amazon EKS: Update the desired node count via the EC2 Auto Scaling Group, or use EKS-managed node groups via `eksctl` or the AWS Console
- GKE (Google Kubernetes Engine): Resize via the `gcloud` CLI or the GCP Console
- OpenShift or RHOS: Use `oc scale` or update MachineSet resources
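As a rough sketch, these commands show what manual scaling can look like on each platform (all cluster, node group, zone, and MachineSet names are placeholders to replace with your own):

```shell
# EKS: scale a managed node group with eksctl
eksctl scale nodegroup --cluster=my-cluster --name=my-nodegroup --nodes=4

# GKE: resize a node pool with gcloud
gcloud container clusters resize my-cluster --node-pool=default-pool \
  --num-nodes=4 --zone=us-central1-a

# OpenShift: scale a MachineSet
oc scale machineset my-machineset --replicas=4 -n openshift-machine-api
```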
2. Cluster autoscaler
Most cloud-managed Kubernetes clusters support an autoscaler that automatically adjusts node counts based on pending Pods and resource pressure.
- EKS: Cluster Autoscaler for AWS
- GKE: Built-in
- AKS: Built-in
- OpenShift: MachineAutoscaler
Autoscalers typically only work if:
- Pods have correct resource requests defined
- There's a configured min/max size range for each node group
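For illustration, here is one way to satisfy both conditions, using `kubectl set resources` for the requests and GKE's autoscaling flags for the node range (the Deployment, cluster, node pool, and zone names are placeholders; other providers expose equivalent min/max settings):

```shell
# Give Pods explicit resource requests so the autoscaler can
# reason about how much capacity pending Pods need
kubectl set resources deployment my-app --requests=cpu=250m,memory=256Mi

# GKE: enable node autoscaling with a min/max range for a node pool
gcloud container clusters update my-cluster --enable-autoscaling \
  --min-nodes=1 --max-nodes=5 --node-pool=default-pool --zone=us-central1-a
```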
3. Vertical and horizontal Pod autoscaling
Don't confuse cluster autoscaling (adding/removing nodes) with:
- Horizontal Pod Autoscaler (HPA): Adds/removes Pod replicas based on CPU/memory or custom metrics
- Vertical Pod Autoscaler (VPA): Adjusts resource requests/limits for a Pod
These mechanisms scale applications inside the cluster, not the cluster infrastructure itself.
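For example, you can create an HPA for an existing Deployment with `kubectl autoscale` (the Deployment name and thresholds below are illustrative):

```shell
# Keep average CPU utilization around 70%, scaling between 2 and 10 replicas
kubectl autoscale deployment my-app --cpu-percent=70 --min=2 --max=10

# Check the current state of HPAs in the namespace
kubectl get hpa
```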
Best practices
- Always set resource requests/limits for your workloads
- Monitor node usage over time (CPU, memory, disk) using Prometheus/Grafana dashboards
- Don’t scale beyond your cloud provider’s quota
- Test failover and autoscaling behaviors in staging
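For a quick point-in-time check of node usage from the CLI (assuming metrics-server is installed; the node name is a placeholder):

```shell
# Show current CPU/memory usage per node
kubectl top nodes

# Inspect a node's allocatable capacity and the requests/limits
# already committed to it
kubectl describe node my-node
```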