Capacity planning as fuel reserves

You can schedule Pods until nodes are full. That does not mean you should. I learned that lesson twice: once in a simulator when we ran the tanks dry on paper and still “landed,” and again years later when a marketing email hit a cluster that had no room left to breathe. The second time hurt more because I was supposed to know better.

[Image: Rows of servers in a data centre. Photo by Tom Swinnen on Pexels.]

Headroom is not waste

In flight planning, fuel reserves are not pessimism. They are the margin between a plan and reality. Weather diverts you. ATC holds you. An instrument approach eats more gas than the brief suggested. Nobody brags about landing with extra fuel, but plenty of reports mention people who wished they had it.

Clusters behave the same way, only the units are different. Surge during deploys means new Pods spin up before old ones terminate. A node failure means workloads need somewhere to land without a midnight rebuild. Traffic spikes mean the Horizontal Pod Autoscaler needs headroom before scheduling turns into a fight over the last half-empty node. Storage fills up. IP pools exhaust. These are ordinary events, not black swans.

I used to treat unused capacity as money left on the table. Finance teams often agree. That is fair up to a point. The mistake is measuring efficiency only at rest. Production is not at rest. It is a sequence of changes, failures, and Tuesdays.

What reserves look like on the ground

Fuel in aviation has names and categories: trip fuel, contingency, alternate, final reserve, and extra if the captain wants it. Operations teams translate those categories into policies. In Kubernetes, I think in layers:

Baseline headroom on nodes. I want enough allocatable CPU and memory left after normal scheduling that a single node loss does not immediately create a pile of Pending Pods. “Normal” here means honest resource requests, not fantasy numbers copied from a slide deck two years ago.

Deploy surge. Rolling updates, maxSurge, and blue-green cutovers all assume temporary duplication. If your cluster runs at ninety-eight percent allocation steady state, a routine Deployment can look like an outage.

Autoscaling runway. The replicas an HPA adds need nodes to land on, or you need time for the cluster autoscaler to add capacity. If max replicas times per-Pod requests exceeds what the fleet can run, you have planned a go-around without leaving airspace to do it.

Dependencies with their own tanks. Databases, message brokers, and API rate limits are fuel you do not store on the node. I have seen perfect node headroom while Postgres connection pools or upstream SaaS throttles became the actual ceiling.
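The baseline-headroom layer can be turned into a rough yes-or-no check. What follows is a minimal sketch, not a scheduler simulation: the `Node` fields, the function names, and the example numbers are all hypothetical, and the check only compares aggregate free requests on the surviving nodes against what the lost node was carrying. It ignores bin-packing, affinity, taints, and volume topology, so a "yes" here is optimistic and a "no" is worth taking seriously.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    allocatable_cpu_m: int   # millicores the scheduler may hand out
    requested_cpu_m: int     # sum of CPU requests already placed on the node
    allocatable_mem_mi: int  # MiB the scheduler may hand out
    requested_mem_mi: int    # sum of memory requests already placed on the node


def survives_loss_of(nodes: list[Node], lost_name: str) -> bool:
    """True if the other nodes have, in aggregate, enough unrequested CPU
    and memory to take over everything requested on the lost node."""
    victim = next(n for n in nodes if n.name == lost_name)
    survivors = [n for n in nodes if n.name != lost_name]
    free_cpu = sum(n.allocatable_cpu_m - n.requested_cpu_m for n in survivors)
    free_mem = sum(n.allocatable_mem_mi - n.requested_mem_mi for n in survivors)
    return (free_cpu >= victim.requested_cpu_m
            and free_mem >= victim.requested_mem_mi)


def nodes_we_cannot_lose(nodes: list[Node]) -> list[str]:
    """Names of nodes whose loss would leave requests with nowhere to go."""
    return [n.name for n in nodes if not survives_loss_of(nodes, n.name)]


if __name__ == "__main__":
    # Made-up fleet: three 4-core nodes, each allocated well past half.
    fleet = [
        Node("a", 4000, 3000, 16384, 8000),
        Node("b", 4000, 3200, 16384, 9000),
        Node("c", 4000, 2500, 16384, 7000),
    ]
    print(nodes_we_cannot_lose(fleet))  # ['a', 'b', 'c']
```

In the made-up fleet, memory is fine everywhere and CPU requests are the binding constraint: no single node can be lost without leaving Pods Pending.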

None of this is glamorous work. It is spreadsheets, dashboards, and occasional arguments with people who want the cluster “right-sized” to the penny. I still prefer that argument to the one where we explain why checkout was down during a sale.

Signals I actually watch

Metrics are easy to drown in. I try to keep a short list that maps to decisions, not vanity.

Allocation versus capacity, not just usage. Kubernetes scheduling looks at requests. A node at forty percent CPU usage can still have no room for new Pods if existing requests already claim most of its allocatable CPU. I watch allocatable versus requested CPU and memory per node, and I pay attention to imbalance across the fleet.

Pending Pods and scheduling events. A Pending Pod is not always an emergency, but a rising count is the equivalent of a low-fuel light. I read the events: insufficient cpu, insufficient memory, volume topology, taints. The message is usually blunt if you look.

HPA max replicas versus physical ceiling. If the autoscaler wants ten replicas but the cluster can only run eight at current requests, you have built a glass ceiling. The HPA will try; scheduling will refuse; on-call will learn.

Storage and IP exhaustion. These fail quietly until they fail loudly. PVC usage growth, subnet IP consumption, and load balancer quotas belong in the same mental folder as fuel: boring until they are not.

Latency and error budgets under load tests. I do not treat a one-off Gatling run as gospel for everyday sizing, but I do treat it as a stress test for reserves. If peak test traffic pins nodes and raises p99 latency with no node failures simulated, everyday traffic plus one bad deploy will be worse.

I am not claiming this list is complete. It is what I return to when I am tired and someone asks, “Are we fine?”
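The HPA-versus-ceiling signal from the list above lends itself to a quick calculation. This is a sketch under stated assumptions: the function names and the numbers are hypothetical, free capacity is measured in unrequested CPU and memory per node (assumed non-negative), and Pods are counted node by node because a Pod cannot be split across two half-empty nodes. Affinity rules, taints, and Pod-count limits are ignored.

```python
def extra_replicas_that_fit(free_per_node, pod_cpu_m, pod_mem_mi):
    """Count whole Pods that fit into each node's unrequested capacity.

    free_per_node is a list of (free_cpu_millicores, free_mem_mib) tuples.
    A Pod must fit on one node for both CPU and memory.
    """
    return sum(
        min(cpu // pod_cpu_m, mem // pod_mem_mi)
        for cpu, mem in free_per_node
    )


def reachable_replicas(current, max_replicas, free_per_node,
                       pod_cpu_m, pod_mem_mi):
    """Replica count the fleet can actually run, capped at the HPA maximum."""
    room = extra_replicas_that_fit(free_per_node, pod_cpu_m, pod_mem_mi)
    return min(max_replicas, current + room)


if __name__ == "__main__":
    # Made-up numbers: 6 replicas running, HPA maxReplicas of 10,
    # each Pod requesting 500m CPU and 512Mi memory.
    free = [(600, 2048), (900, 1024), (400, 4096)]
    print(reachable_replicas(6, 10, free, 500, 512))  # 8
```

Here the fleet has 1900m of free CPU in total, which looks like room for three more Pods, but only two fit once the capacity is counted per node. The HPA maximum says ten; the fleet says eight.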

Requests, limits, and honest math

Under-provisioned requests make scheduling look healthy until every Pod competes at once. Over-provisioned requests waste money and hide real utilization. The middle path is boring: measure actual usage over time, set requests near sustained need, set limits to protect neighbors, revisit quarterly or after major releases.
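As a rough illustration of "set requests near sustained need", the sketch below derives a CPU request from observed usage samples. The choices in it are assumptions, not a recommendation from any tool: a nearest-rank 90th percentile stands in for sustained need, a 15 percent margin is added on top, and the result is rounded up to a multiple of 10 millicores. The function names and sample values are hypothetical.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample with at least
    pct percent of all samples at or below it."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def suggest_request_m(cpu_samples_m, pct=90, margin_pct=15, round_to_m=10):
    """Suggest a CPU request in millicores from usage samples.

    Assumption: sustained need is approximated by a high percentile,
    padded by a margin, then rounded up to a tidy multiple.
    """
    base = percentile(cpu_samples_m, pct)
    padded = base * (100 + margin_pct) / 100
    return math.ceil(padded / round_to_m) * round_to_m


if __name__ == "__main__":
    # Made-up usage in millicores, with one short spike to 400m.
    samples = [120, 130, 125, 140, 400, 135, 128, 132, 138, 126]
    print(suggest_request_m(samples))  # 170
```

The single spike to 400m does not drag the request up, which is the point: the request covers sustained need, and the burst is a question for limits, autoscaling, and headroom. The percentile and margin deserve a second look after major releases, for the same reason the requests themselves do.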

I have made every mistake in that sentence. I copied requests from a tutorial. I set limits equal to requests “for safety” and wondered why nodes stranded CPU. I left requests unset and discovered why the scheduler is not your friend.

Limits are not free throttle pedals. They are guardrails. Requests are promises the scheduler believes. Treating them as decoration breaks capacity planning the same way writing minimum fuel on a form breaks flight planning: the paperwork works until it does not.

For bursty workloads, Vertical Pod Autoscaler or rightsizing tools help if you trust them and read the recommendations instead of auto-applying blindly. I have seen VPA suggest changes that were correct statistically and wrong for a specific release window. Automation still wants judgment.

How I plan without pretending to predict the weather

I do not forecast traffic perfectly. Nobody I know does. What I can do is define scenarios and attach reserves:

Lose one node. Does everything still schedule? If not, how fast can we add capacity, and is that fast enough for the service level we promised?

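Routine deploy at peak. Run the rollout math: the peak number of Pods that exist at once (for a rolling update, desired replicas plus maxSurge; for blue-green, old replicas plus new replicas), multiplied by per-Pod requests. If the answer exceeds allocatable resources, change the rollout or add headroom before Friday.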

Traffic step change. Marketing calendars, product launches, school nights for consumer apps. Ask early, even if the answer is a shrug. A shrug with dates is better than surprise.

Regional or zone loss. If you run multi-zone, test whether your idea of spread matches reality. Anti-affinity rules that look good in YAML still fail if image pull secrets, GPU counts, or PVC topology pin you to one zone.

I write these down in a short doc nobody reads until they need it. That is fine. The act of writing forces me to notice gaps.
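The "routine deploy at peak" scenario above is simple arithmetic, and a sketch makes it concrete. During a Kubernetes rolling update the Pod count can reach the desired replicas plus maxSurge, and a percentage maxSurge is rounded up. The helper names and example numbers below are hypothetical, and the fit check compares against aggregate free capacity only, so it ignores per-node packing.

```python
import math


def resolve_surge(max_surge, replicas):
    """Turn maxSurge (an int or a string like '25%') into a Pod count.
    Percentages are rounded up, matching rolling-update behaviour."""
    if isinstance(max_surge, str) and max_surge.endswith("%"):
        return math.ceil(int(max_surge[:-1]) * replicas / 100)
    return int(max_surge)


def peak_pods(replicas, max_surge="25%"):
    """Most Pods that can exist at once during the rollout."""
    return replicas + resolve_surge(max_surge, replicas)


def rollout_fits(replicas, pod_cpu_m, pod_mem_mi,
                 free_cpu_m, free_mem_mi, max_surge="25%"):
    """True if the surge Pods fit into the fleet's unrequested capacity.
    Aggregate check only: it does not model packing onto individual nodes."""
    extra = resolve_surge(max_surge, replicas)
    return (extra * pod_cpu_m <= free_cpu_m
            and extra * pod_mem_mi <= free_mem_mi)


if __name__ == "__main__":
    # Made-up numbers: 10 replicas at 500m / 512Mi each,
    # with 1000m CPU and 4096Mi memory unrequested across the fleet.
    print(peak_pods(10))                                        # 13
    print(rollout_fits(10, 500, 512, 1000, 4096))               # False
    print(rollout_fits(10, 500, 512, 1000, 4096, max_surge=2))  # True
```

With the default 25 percent surge, ten replicas need room for three extra Pods, and the made-up fleet only has CPU for two. Lowering maxSurge to 2 makes the rollout fit, at the cost of a slower deploy; adding headroom is the other option.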

When “efficient” clusters hurt people

The worst capacity incidents I have been near were not mysterious. They were policy choices.

Running nodes at sustained high allocation to save money. Turning off cluster autoscaling to avoid surprise bills. Setting HPA max replicas to “what we hope we need” instead of what physics allows. Treating stateful services like stateless ones for surge math. Assuming cloud provider APIs always succeed quickly when you are already in trouble.

On the human side, on-call engineers get told to “scale it” while the dashboard shows nowhere to scale. That is a cruel position. Reserves are partly kindness to the person holding the pager.

What I still get wrong

I treat peak load test numbers as everyday capacity more often than I admit. One good test at fifty thousand virtual users does not mean we need that footprint on a quiet morning, but it should inform how much reserve we want before marketing sends an email. I have forgotten that step.

I also underestimate non-compute ceilings: API gateways, DNS TTLs, certificate issuance, egress NAT ports. The cluster has room; the path out does not.

Finally, I confuse utilization with health. Busy nodes can be healthy. Idle nodes can be misconfigured. The question is whether the system can absorb the next ordinary bad thing without a hero.

Closing thought

Capacity work is unglamorous. So are fuel calculations. Both beat explaining why you had nowhere to land.

I am still learning the cloud-native versions of lessons aviation drilled early: plans are fiction, margins are real, and the goal is not to use every drop but to arrive with options intact. If you are sizing a cluster this week, ask one extra question before you declare victory: what happens if we lose one node and deploy at the same time? If the honest answer makes you uncomfortable, that discomfort is the reserve talking. Listen to it.