Cast AI report finds 5% GPU use in Kubernetes clusters
Cast AI has published its 2026 State of Kubernetes Optimisation Report, which found average GPU utilisation of 5% across 23,000 Kubernetes clusters.
The findings point to wide gaps between allocated and used infrastructure in environments running AI and machine learning workloads. Organisations are assigning about 20 times more GPU capacity than they actively use, while average CPU utilisation is 8% and memory utilisation is 20%.
The data adds to concerns about the cost of AI infrastructure, as businesses buy access to scarce, expensive graphics processors but fail to keep them busy. Across the clusters analysed, roughly 95% of GPU capacity sat unused on average, a level of waste that Cast AI argues now carries a larger financial penalty than traditional cloud inefficiency.
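The scale of that waste is easiest to see as a back-of-the-envelope calculation. The sketch below uses the report's 5% average GPU utilisation figure, but the $3/hour on-demand GPU price is an assumption for illustration only and does not come from the report.

```python
# Rough annual cost of idle GPU capacity, per GPU.
# 5% average utilisation is the report's figure; the $3/hour
# cloud GPU rate is a hypothetical number for illustration.

HOURS_PER_YEAR = 24 * 365


def idle_cost_per_gpu(hourly_rate: float, utilisation: float) -> float:
    """Annual spend on the unused share of one GPU."""
    return hourly_rate * HOURS_PER_YEAR * (1.0 - utilisation)


# Assumed $3/hour GPU at 5% average utilisation:
waste = idle_cost_per_gpu(3.0, 0.05)
print(f"${waste:,.0f} per GPU per year pays for idle capacity")
```

At those assumed prices, roughly $25,000 per GPU per year buys capacity that sits unused, which is why the report treats GPU waste as a different class of problem from idle CPU or memory.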
Rising Cost
Kubernetes has become a standard way to run containerised applications across cloud environments, and its growing role in AI has increased the number of clusters using GPU-equipped nodes. The report argues that efficiency has not improved with wider adoption and that the expansion of AI workloads is adding a more expensive layer to an existing problem.
Low CPU and memory utilisation have long been associated with overprovisioned cloud systems, but GPUs change the economics because the hardware is far more expensive. As a result, idle resources now hit infrastructure budgets harder than in earlier phases of cloud adoption.
The report also challenges the idea that infrastructure tuning can be handled as a one-off exercise during deployment. It argues that changing workloads, shifting traffic patterns, spot instance selection, autoscaler settings, commitment use, and node lifecycle management all require continuous adjustment rather than periodic manual review.
Cast AI presents the issue as one of operational discipline as much as technical set-up, arguing that many organisations still rely on static configurations even as demand changes over time.
A quote in the report highlights the scale of the cost issue:
"A GPU sitting idle costs dollars per hour. A CPU sitting idle costs cents. And 95% of GPU capacity is doing nothing. Cloud vendors just raised H200 prices 15%, breaking a 20-year trend of falling compute costs. That's not a configuration problem, it's a business emergency. Autonomous optimisation isn't a nice-to-have. It's the only rational response to infrastructure economics that are moving against you."
Spending Pressure
The findings come as companies face closer scrutiny over AI spending and pressure to show that infrastructure purchases are translating into productive use. Large language models, inference systems, and training workloads have driven a rush to secure GPU supply, but the report suggests much of that reserved capacity remains underused once deployed in Kubernetes environments.
That matters because many businesses have treated GPU access as a strategic constraint, often prioritising availability over efficiency. The Cast AI data suggests this approach may leave substantial amounts of costly compute stranded inside clusters configured for peak demand but operating far below those levels much of the time.
The report is based on Cast AI's analysis of real-world utilisation data across Kubernetes workloads and is intended to help engineering and infrastructure leaders identify where inefficiencies begin and what changes are needed to reduce them.
Its central argument is that the economics of AI infrastructure now make underutilisation harder to ignore. Across the clusters analysed, GPU utilisation averaged just 5%.