Self-Hosted LLM vs API Calculator

Compare running an open-weight model on your own GPUs against paying a commercial API. Self-hosting is only cheaper above a break-even volume and at high utilisation; this tool estimates where that line falls for your inputs.

Pricing updated 2026-06-19. Estimates only.

The formula

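Self-hosted monthly = GPUs × hourly rate × 24 × 30 + engineering + infra

API monthly = tokens ÷ 1,000,000 × (input share × input price + output share × output price)

Break-even tokens = self-hosted monthly ÷ blended API price per 1M × 1,000,000

Required utilisation = demand ÷ (GPUs × tokens/sec per GPU × seconds in month)

Prices are per 1M tokens, and the blended API price per 1M is the bracketed term in the API formula. Demand is measured in tokens per month. The formula assumes a 30-day month, which is 2,592,000 seconds.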
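As a rough illustration, the formula above can be sketched in Python. The function names and every number in the example are hypothetical placeholders rather than real prices or the calculator's own code, and the sketch assumes the input and output shares sum to 1 and that throughput is quoted per GPU.

```python
SECONDS_PER_MONTH = 30 * 24 * 3600  # 30-day month, matching the formula


def self_hosted_monthly(gpus, hourly_rate, engineering=0.0, infra=0.0):
    """GPU rental around the clock plus fixed monthly overheads."""
    return gpus * hourly_rate * 24 * 30 + engineering + infra


def blended_price_per_million(input_share, input_price, output_price):
    """Blended API price per 1M tokens; assumes output share = 1 - input share."""
    output_share = 1.0 - input_share
    return input_share * input_price + output_share * output_price


def api_monthly(tokens, blended_price):
    """Monthly API bill for a given token volume."""
    return tokens / 1_000_000 * blended_price


def break_even_tokens(self_hosted_cost, blended_price):
    """Monthly token volume at which the API bill equals the self-hosted cost."""
    return self_hosted_cost / blended_price * 1_000_000


def required_utilisation(demand_tokens, gpus, tokens_per_sec_per_gpu):
    """Fraction of total GPU capacity needed to serve the monthly demand."""
    capacity = gpus * tokens_per_sec_per_gpu * SECONDS_PER_MONTH
    return demand_tokens / capacity


# Hypothetical inputs, chosen only to make the arithmetic easy to follow.
cost = self_hosted_monthly(gpus=2, hourly_rate=2.00, engineering=2000, infra=500)
price = blended_price_per_million(input_share=0.75, input_price=1.00, output_price=4.00)
tokens = break_even_tokens(cost, price)
utilisation = required_utilisation(tokens, gpus=2, tokens_per_sec_per_gpu=1000)

print(f"Self-hosted monthly: {cost:,.2f}")
print(f"Blended API price per 1M: {price:.2f}")
print(f"Break-even tokens per month: {tokens:,.0f}")
print(f"Required utilisation at break-even: {utilisation:.1%}")
```

With these placeholder inputs the break-even lands at roughly 3.07 billion tokens per month, which would need about 59% utilisation of the two GPUs.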

Questions

Is self-hosting an LLM cheaper than using an API?

Only above a break-even token volume and at high GPU utilisation. Below that, and once you count engineering time, a commercial API is usually cheaper.

What break-even volume makes self-hosting worth it?

It depends on GPU price and the API model you'd otherwise use, but it is often hundreds of millions to billions of tokens per month. This calculator computes the break-even for the prices and volumes you enter.

Why does utilisation matter so much?

GPU rental is a fixed cost paid around the clock. Low utilisation spreads that fixed cost over few tokens, making each token expensive. High, steady utilisation is what makes self-hosting economical.
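The effect of utilisation on per-token cost can be sketched with a few lines of Python. The function name and all figures are hypothetical, and the sketch assumes a 30-day month and a throughput quoted per GPU.

```python
SECONDS_PER_MONTH = 30 * 24 * 3600  # 30-day month


def cost_per_million_tokens(monthly_cost, gpus, tokens_per_sec_per_gpu, utilisation):
    """Effective self-hosted cost per 1M tokens at a given utilisation (0 < u <= 1)."""
    if not 0 < utilisation <= 1:
        raise ValueError("utilisation must be in the range (0, 1]")
    tokens_served = gpus * tokens_per_sec_per_gpu * SECONDS_PER_MONTH * utilisation
    return monthly_cost / tokens_served * 1_000_000


# Hypothetical fixed cost and throughput; only the utilisation varies.
for u in (0.10, 0.50, 1.00):
    cost = cost_per_million_tokens(
        monthly_cost=5380.0, gpus=2, tokens_per_sec_per_gpu=1000, utilisation=u
    )
    print(f"utilisation {u:.0%}: {cost:.2f} per 1M tokens")
```

Because the monthly cost is fixed, the cost per token scales inversely with utilisation: running at 10% makes each token ten times as expensive as running at 100%.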
