The formula
Self-hosted monthly = GPUs × hourly rate × 24 × 30 + engineering + infra.
API monthly = tokens ÷ 1,000,000 × blended API price per 1M, where blended API price per 1M = input share × input price + output share × output price.
Break-even tokens = self-hosted monthly ÷ blended API price per 1M × 1,000,000.
Required utilisation = monthly token demand ÷ (GPUs × tokens/sec per GPU × seconds in month).
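As a rough illustration, the formulas can be sketched in Python. The function names and all input figures below are hypothetical placeholders, not real GPU or API prices; the sketch assumes a 30-day month, a tokens/sec figure quoted per GPU, and an output share equal to 1 minus the input share. Break-even is converted to raw tokens by multiplying by 1,000,000, since API prices are quoted per 1M tokens.

```python
HOURS_PER_MONTH = 24 * 30                  # 30-day month, as in the formula
SECONDS_PER_MONTH = HOURS_PER_MONTH * 3600


def self_hosted_monthly(gpus, hourly_rate, engineering, infra):
    """Fixed monthly cost of renting GPUs around the clock plus overheads."""
    return gpus * hourly_rate * HOURS_PER_MONTH + engineering + infra


def blended_price_per_million(input_share, input_price, output_price):
    """API price per 1M tokens, weighted by the input/output token mix."""
    output_share = 1.0 - input_share       # assumption: shares sum to 1
    return input_share * input_price + output_share * output_price


def api_monthly(tokens, blended_price):
    """Monthly API bill for a given token volume."""
    return tokens / 1_000_000 * blended_price


def break_even_tokens(self_hosted_cost, blended_price):
    """Monthly token volume at which the API bill equals the self-hosted cost."""
    return self_hosted_cost / blended_price * 1_000_000


def required_utilisation(demand_tokens, gpus, tokens_per_sec_per_gpu):
    """Fraction of total GPU capacity needed to serve the monthly demand."""
    capacity = gpus * tokens_per_sec_per_gpu * SECONDS_PER_MONTH
    return demand_tokens / capacity


if __name__ == "__main__":
    # Hypothetical inputs, chosen only to keep the arithmetic easy to follow.
    hosted = self_hosted_monthly(gpus=2, hourly_rate=2.0, engineering=4000, infra=120)
    blended = blended_price_per_million(input_share=0.75, input_price=2.0, output_price=6.0)
    tokens = break_even_tokens(hosted, blended)
    print(f"Self-hosted monthly: ${hosted:,.2f}")
    print(f"Blended API price per 1M: ${blended:.2f}")
    print(f"Break-even: {tokens:,.0f} tokens/month")
    print(f"Utilisation needed at break-even: {required_utilisation(tokens, 2, 1000):.0%}")
```

A required utilisation above 100% would mean the chosen number of GPUs cannot serve the demand at all, so more GPUs (and a higher fixed cost) would be needed.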
Questions
Is self-hosting an LLM cheaper than using an API?
Only above a break-even token volume and at high GPU utilisation. Below that, and once you count engineering time, a commercial API is usually cheaper.
What break-even volume makes self-hosting worth it?
It depends on GPU price and the API model you'd otherwise use, but it is often hundreds of millions to billions of tokens per month. This calculator computes your exact break-even.
Why does utilisation matter so much?
GPU rental is a fixed cost paid around the clock. Low utilisation spreads that fixed cost over few tokens, making each token expensive. High, steady utilisation is what makes self-hosting economical.
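To make the effect concrete, here is a minimal sketch of the effective cost per 1M tokens at different utilisation levels. The function name, the monthly cost, and the throughput figure are hypothetical, and the sketch assumes a 30-day month with throughput quoted per GPU.

```python
SECONDS_PER_MONTH = 30 * 24 * 3600  # 30-day month


def cost_per_million_tokens(monthly_cost, gpus, tokens_per_sec_per_gpu, utilisation):
    """Effective self-hosted cost per 1M tokens at a given utilisation (0 < u <= 1)."""
    if not 0 < utilisation <= 1:
        raise ValueError("utilisation must be in the range (0, 1]")
    # Tokens actually served: full capacity scaled down by utilisation.
    tokens_served = gpus * tokens_per_sec_per_gpu * SECONDS_PER_MONTH * utilisation
    return monthly_cost / tokens_served * 1_000_000


if __name__ == "__main__":
    # Hypothetical setup: the same fixed bill spread over fewer or more tokens.
    for u in (0.10, 0.50, 1.00):
        price = cost_per_million_tokens(
            monthly_cost=5184, gpus=2, tokens_per_sec_per_gpu=1000, utilisation=u
        )
        print(f"{u:>4.0%} utilisation: ${price:.2f} per 1M tokens")
```

Because the monthly bill is fixed, the cost per token scales inversely with utilisation: the same hardware is ten times more expensive per token at 10% utilisation than at 100%.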