GKE Workload Scaling

This reference covers scaling workloads on GKE. The golden path enables VPA, the OPTIMIZE_UTILIZATION autoscaling profile, and Node Auto Provisioning (NAP) by default.

MCP Tools: get_k8s_resource, describe_k8s_resource, apply_k8s_manifest, patch_k8s_resource, get_cluster, update_cluster, update_node_pool

Golden Path Scaling Defaults

Setting	Golden Path Value	Notes
`autoscaling.autoscalingProfile`	`OPTIMIZE_UTILIZATION`	Aggressive scale-down for cost savings
`verticalPodAutoscaling.enabled`	`true`	VPA recommendations available
`autoscaling.enableNodeAutoprovisioning`	`true`	NAP creates node pools on demand
GPU resource limits (T4, A100)	`1000000000` each	NAP can provision GPU nodes

Scaling Mechanisms

1. Manual Scaling

kubectl-only — no MCP equivalent for kubectl scale. Use kubectl directly.

kubectl scale deployment <DEPLOYMENT> --replicas=<N> -n <NAMESPACE>

2. Horizontal Pod Autoscaling (HPA)

Scales the number of pods based on metrics.

Quick setup (kubectl-only — no MCP equivalent for kubectl autoscale):

kubectl autoscale deployment <DEPLOYMENT> --cpu-percent=50 --min=1 --max=10 -n <NAMESPACE>

Manifest approach (recommended — use MCP apply_k8s_manifest):

See assets/hpa-example.yaml for a template.

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: <DEPLOYMENT>-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: <DEPLOYMENT>
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50
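
The manifest above relies on the default scaling behavior. As a minimal sketch of how that behavior could be tuned, the variant below adds a `behavior` section to the same HPA. The window and policy values are illustrative assumptions, not golden path settings.

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: <DEPLOYMENT>-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: <DEPLOYMENT>
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50
  behavior:
    scaleDown:
      # Assumed value: shorter than the Kubernetes default of 300 seconds.
      stabilizationWindowSeconds: 120
      policies:
      - type: Percent
        value: 50          # remove at most 50% of current replicas...
        periodSeconds: 60  # ...per 60-second period
    scaleUp:
      # Kubernetes default: scale up without a stabilization delay.
      stabilizationWindowSeconds: 0
```

A shorter scale-down window releases capacity sooner but makes replica counts more likely to flap under bursty load, so the value is worth checking against real traffic.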

3. Vertical Pod Autoscaling (VPA)

Adjusts CPU and memory requests to match actual usage. Enabled by default on the golden path.

Update modes:

Off — recommendations only (safest, start here)
Initial — sets resources only at pod creation
Auto — restarts pods to apply new resource values
InPlaceOrRecreate — updates resources without restart when possible (GKE 1.34+)

Create VPA in recommendation mode:

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: <DEPLOYMENT>-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: <DEPLOYMENT>
  updatePolicy:
    updateMode: "Off"

Read recommendations (prefer MCP describe_k8s_resource):

# MCP (preferred)
describe_k8s_resource(parent="...", resourceType="verticalpodautoscaler", name="<DEPLOYMENT>-vpa", namespace="<NAMESPACE>")

# kubectl fallback
kubectl get vpa <DEPLOYMENT>-vpa -n <NAMESPACE> -o jsonpath='{.status.recommendation}'

See assets/vpa-example.yaml for a full template.
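
When an HPA already scales the same Deployment on CPU, one way to keep the two autoscalers from competing is to restrict the VPA to memory. The sketch below does this with `resourcePolicy`. The `minAllowed` and `maxAllowed` bounds are assumed placeholder values and should be chosen per workload.

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: <DEPLOYMENT>-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: <DEPLOYMENT>
  updatePolicy:
    updateMode: "Off"        # recommendations only, as in the example above
  resourcePolicy:
    containerPolicies:
    - containerName: "*"     # apply to every container in the pod
      controlledResources: ["memory"]   # leave CPU to the HPA
      minAllowed:
        memory: 128Mi        # assumed lower bound
      maxAllowed:
        memory: 2Gi          # assumed upper bound
```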

4. Cluster Autoscaler / Node Auto Provisioning (NAP)

On Autopilot (golden path), node scaling is fully managed. NAP automatically creates and sizes node pools based on workload demands.

For Standard clusters:

# Enable cluster autoscaler on a node pool
gcloud container clusters update <CLUSTER_NAME> --region <REGION> \
  --enable-autoscaling --node-pool <POOL_NAME> \
  --min-nodes <MIN> --max-nodes <MAX> \
  --quiet

# Enable NAP
gcloud container clusters update <CLUSTER_NAME> --region <REGION> \
  --enable-autoprovisioning \
  --min-cpu <MIN_CPU> --max-cpu <MAX_CPU> \
  --min-memory <MIN_MEM> --max-memory <MAX_MEM> \
  --quiet
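
NAP limits can also be kept in a file instead of individual flags, which is convenient when GPU limits are involved. The following is a minimal sketch, assuming the `resourceLimits` schema accepted by the `--autoprovisioning-config-file` flag. The file name and all numeric values are placeholders, not the golden path values.

```yaml
# nap-config.yaml (hypothetical file name)
resourceLimits:
  - resourceType: 'cpu'
    minimum: 4
    maximum: 64
  - resourceType: 'memory'            # expressed in GB
    minimum: 16
    maximum: 256
  - resourceType: 'nvidia-tesla-t4'   # lets NAP provision T4 nodes
    maximum: 4
  - resourceType: 'nvidia-tesla-a100' # lets NAP provision A100 nodes
    maximum: 4
```

Under these assumptions the file would be passed as `--enable-autoprovisioning --autoprovisioning-config-file nap-config.yaml` on the same `gcloud container clusters update` command.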

Autoscaling profiles:

Profile	Behavior	Golden Path?
`BALANCED`	Default GKE; conservative scale-down	No
`OPTIMIZE_UTILIZATION`	Aggressive scale-down; lower idle resources	Yes

Best Practices

Define resource requests: HPA and VPA rely on accurate requests. Always set them.
Avoid metric conflicts: Do not use HPA and VPA on the same metric. Typical pattern: HPA on CPU, VPA on memory.
Pod Disruption Budgets: Define PDBs for all production workloads to ensure availability during scaling events.
HPA stabilization: HPA applies a default 5-minute (300-second) stabilization window to scale-down; scale-up has no stabilization delay by default. Tune the `behavior` field for faster response if needed.
VPA "Auto" caution: Auto mode restarts pods. Ensure your app handles SIGTERM gracefully. VPA requires at least 2 replicas for evictions by default.
Use ComputeClasses: For workload-specific node targeting (Spot fallback, GPU, specific machine families), use ComputeClasses instead of node selectors.
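
The Pod Disruption Budget practice above can be sketched as follows. This is a minimal example, assuming the Deployment's pods carry the label `app: <DEPLOYMENT>`; the label and the `maxUnavailable` value are assumptions and should be adjusted to match the workload.

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: <DEPLOYMENT>-pdb
  namespace: <NAMESPACE>
spec:
  # Allow at most one pod to be evicted at a time during node
  # scale-down or VPA-driven restarts.
  maxUnavailable: 1
  selector:
    matchLabels:
      app: <DEPLOYMENT>   # assumed label; must match the pod template labels
```

With a single replica, `maxUnavailable: 1` provides no protection, so this pairs naturally with running at least 2 replicas.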

Rightsizing Workflow

Deploy VPA in Off mode for 24+ hours
Read recommendations: kubectl describe vpa <NAME>
Compare target values against current requests
Apply with 20% buffer: new_request = target * 1.2
Use patch format to update Deployment
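
As a worked example of the steps above, suppose the VPA reports a target of 250m CPU and 512Mi memory for a container. These numbers and the container name `app` are hypothetical. Applying the 20% buffer gives the patch below.

```yaml
# Hypothetical VPA target for container "app": cpu 250m, memory 512Mi
# new_request = target * 1.2
#   cpu:    250m  * 1.2 = 300m
#   memory: 512Mi * 1.2 = 614.4Mi, rounded up to 615Mi
spec:
  template:
    spec:
      containers:
      - name: app          # assumed container name; must match the Deployment
        resources:
          requests:
            cpu: 300m
            memory: 615Mi
```

Saved to a file, a patch in this shape can be applied with `kubectl patch deployment <DEPLOYMENT> -n <NAMESPACE> --patch-file <FILE>`, which uses a strategic merge by default and leaves other container fields untouched.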

Condition	Recommendation	Risk
CPU request >5x P95 actual	Reduce to `P95 * 1.2`	Medium
Memory request >3x P95 actual	Reduce to `P95 * 1.2`	Medium
CPU request >2x P95 actual	Rightsizing with 20% buffer	Low
No resource limits set	Add limits to prevent noisy-neighbor	Low
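
To illustrate how the table might be read, the fragment below works through one hypothetical container. All usage figures are invented for the example, and the limit value is an assumption, since the table says to add limits but not which values to use.

```yaml
# Hypothetical observations for one container:
#   CPU:    request 2000m, P95 usage 300m  -> 6.7x, matches the ">5x" row
#   Memory: request 1Gi,   P95 usage 400Mi -> 2.56x, below the ">3x" row
resources:
  requests:
    cpu: 360m      # P95 * 1.2 = 300m * 1.2
    memory: 1Gi    # unchanged; ratio is under the 3x threshold
  limits:
    memory: 1Gi    # assumed value, added because no limits were set
```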


Overall Score

82/100 (Grade: B, Good)

Category	Score
Safety	85
Quality	80
Clarity	85
Completeness	75

Summary

This skill guides AI agents through configuring GKE autoscaling mechanisms, including Horizontal Pod Autoscaling (HPA), Vertical Pod Autoscaling (VPA), and Node Auto-Provisioning (NAP). It documents the golden path (OPTIMIZE_UTILIZATION profile, VPA enabled, NAP enabled) with clear boundaries between mechanism types, example YAML manifests, and best practices for production use.

Detected Capabilities

k8s manifest deployment via apply_k8s_manifest MCP tool
k8s resource queries via get_k8s_resource and describe_k8s_resource
k8s resource patching via patch_k8s_resource
cluster configuration updates via update_cluster and update_node_pool
gcloud CLI execution for NAP and cluster autoscaler enablement
kubectl CLI execution for manual scaling and HPA inspection

Trigger Keywords

Phrases that MCP clients use to match this skill to user intent.

gke autoscaling setup
horizontal pod autoscaler
vertical pod autoscaler
gke node auto-provisioning
right-size workloads
hpa configuration
vpa recommendations

Referenced Domains

External domains referenced in skill content, detected by static analysis.

www.apache.org

Use Cases

Configure HPA for CPU-based pod scaling on GKE deployments
Enable VPA in recommendation mode to identify resource rightsizing opportunities
Set up Node Auto-Provisioning (NAP) on GKE Standard clusters
Apply autoscaling profiles (OPTIMIZE_UTILIZATION vs BALANCED) to existing clusters
Implement Pod Disruption Budgets and stabilization strategies for safe scaling
Right-size workloads using VPA recommendations with 20% buffer margins

Quality Notes

Clear scope boundaries: explicitly distinguishes HPA/VPA/NAP/manual scaling to avoid feature confusion
Excellent best practices section covering metric conflicts, PDBs, VPA restart handling, and stabilization windows
Golden path clearly defined in table format with rationale (OPTIMIZE_UTILIZATION for cost, VPA enabled by default, NAP enabled)
Supporting YAML templates provided (hpa-example.yaml, vpa-example.yaml) with inline comments explaining safest starting points (VPA Off mode)
Comprehensive rightsizing workflow with condition-based recommendations and risk levels
Good attention to guardrails: warns against Auto mode without graceful shutdown, requires >=2 replicas for VPA evictions
Explicitly documents MCP tool mapping (get_k8s_resource, apply_k8s_manifest, etc.) with fallback kubectl equivalents
Update modes clearly explained (Off, Initial, Auto, InPlaceOrRecreate) with GKE version requirements noted
Table comparing autoscaling profiles helps users understand trade-offs
Explicit note: do not use for static cluster sizing or node-level machine styles (cross-references gke-compute-classes skill)
Missing: no documented cluster prerequisites (e.g., metrics-server required for HPA, VPA requires controller setup), no troubleshooting section for common failures (HPA unable to compute metrics, VPA stuck in status.conditions)

Model: claude-haiku-4-5-20251001
Analyzed: Jun 24, 2026
