Per-minute vs reserved GPU pricing: when each model wins

Clodei team · April 15, 2026 · 4 min read

Vendor calculators like to draw the per-minute vs reserved decision as a clean line. Above 60% utilisation, reserved wins. Below 60%, per-minute wins. Directionally fine, and almost useless in practice. The honest answer depends on three variables that the calculators usually hide.

The headline numbers

A typical 1-year reserved A100 80GB on a hyperscaler costs around 60% of the on-demand hourly rate. A 3-year commitment lands near 50%. Per-minute (or per-hour, depending on the vendor) billing is the on-demand baseline.

If you keep a single A100 100% busy 24/7, the 3-year reservation buys you about 50% off. That is a real saving and the right call when the workload is genuinely steady.

The trap is that "100% busy 24/7" rarely matches what teams actually do.
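The arithmetic behind the headline numbers can be sketched in a few lines. This is a rough illustration, not a vendor calculator: the hourly rate of 4.00 is a made-up placeholder, and the function names are hypothetical. The only inputs taken from the figures above are the reserved price ratios (about 0.6 for 1 year, about 0.5 for 3 years).

```python
HOURS_PER_MONTH = 730  # average hours in a month


def monthly_cost(on_demand_rate, utilisation, reserved_fraction=None):
    """Monthly cost of one GPU.

    on_demand_rate    -- price per GPU-hour on demand (placeholder value)
    utilisation       -- share of hours the GPU does useful work, 0.0 to 1.0
    reserved_fraction -- reserved price as a fraction of on-demand,
                         or None to bill on demand
    """
    if reserved_fraction is None:
        # On demand: you pay only for the hours you actually use.
        return on_demand_rate * HOURS_PER_MONTH * utilisation
    # Reserved: you pay for every hour of the month, busy or idle.
    return on_demand_rate * reserved_fraction * HOURS_PER_MONTH


def break_even_utilisation(reserved_fraction):
    """Utilisation at which both models cost the same.

    Setting rate * hours * u equal to rate * hours * fraction gives
    u = fraction, so the break-even point is the price ratio itself.
    """
    return reserved_fraction


if __name__ == "__main__":
    rate = 4.00  # hypothetical on-demand price per GPU-hour
    for u in (1.0, 0.6, 0.3):
        on_demand = monthly_cost(rate, u)
        one_year = monthly_cost(rate, u, reserved_fraction=0.6)
        print(f"utilisation {u:.0%}: on-demand {on_demand:.0f}, 1-year reserved {one_year:.0f}")
```

Under these assumptions the 60% line in vendor calculators is simply the 1-year price ratio restated as a utilisation threshold.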

Three variables that flip the math

1. Utilisation, measured properly

Most teams overestimate utilisation by 2–4x. A model that "trains overnight" usually trains for around 7 hours and the GPU sits idle the other 17. A serving cluster sized for peak traffic often runs at 30% of peak in the valleys.

The honest measurement is GPU-hours of actual work divided by GPU-hours reserved. Anything below 70% means your reservation is subsidising the on-demand price for everyone else.
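A minimal sketch of that ratio, and of what it does to the price of each useful GPU-hour, assuming you can pull busy hours from your own scheduler or metrics. The function names are hypothetical, and prices are expressed relative to the on-demand rate (on-demand = 1.0).

```python
def utilisation(busy_gpu_hours, reserved_gpu_hours):
    """Share of reserved GPU-hours in which the GPU did real work."""
    if reserved_gpu_hours <= 0:
        raise ValueError("reserved_gpu_hours must be positive")
    return busy_gpu_hours / reserved_gpu_hours


def effective_rate(reserved_rate, util):
    """Price actually paid per useful GPU-hour on a reservation.

    The reservation bills every hour, so idle hours are spread over
    the hours that did real work.
    """
    if util <= 0:
        raise ValueError("utilisation must be positive")
    return reserved_rate / util


if __name__ == "__main__":
    # The "trains overnight" case: 7 busy hours out of 24 reserved.
    u = utilisation(7, 24)
    # 0.6 = 1-year reserved price as a fraction of the on-demand price.
    print(f"utilisation: {u:.0%}")
    print(f"effective price vs on-demand: {effective_rate(0.6, u):.2f}x")
```

In the overnight-training case the reservation ends up costing roughly twice the on-demand price per useful hour, even though the headline rate is 40% lower.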

2. How volatile your workload looks

Reservations assume the workload looks roughly the same week to week. Many real workloads don't. Some examples:

A research team with no fixed sprint schedule. Some weeks 5 GPU-days, others 0.5.
A SaaS with a marketing campaign that quadruples traffic for two weeks then drops back.
A new product still finding traction, with no reliable demand baseline.

For volatile workloads, per-minute pricing is insurance you didn't have to buy in advance. The premium over reservation is usually smaller than what you'd pay for capacity you never used.

3. Switching cost

Reservations look cheap on paper but they lock you in. If a better GPU lands (the H200 to B200 transition is a textbook example), the reserved fleet on older hardware becomes a liability you have to amortise instead of a tool you can drop in a week.

Per-minute teams swap hardware in days. Reserved teams negotiate exit clauses with sales reps.

A more useful framework

Forget the 60% line. Ask three questions instead.

Can you predict GPU demand a quarter ahead with ±10% accuracy? If yes, you're a candidate for reservation. If no, you're not.

What is the alternative use of that money? If you'd otherwise spend it on more headcount or a bigger experiment, the marginal value of capital is high. Reservations consume capital up front.

How fast does your hardware tier rotate? Frontier compute has historically rotated every 18–24 months. A 3-year reservation on yesterday's GPU is rarely a good trade.

If any of the answers is "no", "high", or "fast", default to per-minute. The headline saving from reservation is real but small compared with the optionality you give up.
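The three questions can be written down as a small checklist. This is one possible encoding, not a formal rule: the field names are hypothetical, and treating "fast" as "the next hardware tier is expected before the reservation term ends" is an interpretation rather than something the questions define.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    # Expected error of a quarter-ahead demand forecast (0.10 means ±10%).
    forecast_error: float
    # True if the money has a better use, e.g. headcount or a bigger experiment.
    capital_is_scarce: bool
    # Your own estimate of months until the next hardware tier ships (an assumption).
    months_to_next_tier: int
    # Length of the reservation under consideration.
    term_months: int


def recommend(w: Workload) -> str:
    """Return "reserved" only if all three questions come out in its favour."""
    predictable = w.forecast_error <= 0.10
    hardware_outlives_term = w.months_to_next_tier >= w.term_months
    if predictable and not w.capital_is_scarce and hardware_outlives_term:
        return "reserved"
    # Any single "no", "high", or "fast" falls back to per-minute.
    return "per-minute"


if __name__ == "__main__":
    steady = Workload(0.05, False, 18, 12)
    research = Workload(0.50, False, 18, 12)
    print(recommend(steady))
    print(recommend(research))
```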

The hybrid play

The advanced version is a small reservation for the predictable baseline plus per-minute for the spiky top.

Reserve at the level your workload actually sustains, usually 30–50% of peak. Burst into per-minute for everything above the reservation. Re-evaluate the reserved size every quarter.

That setup tends to cut 20–30% off the total bill while keeping the optionality where it matters. The catch is operational: you need cost reporting that can split usage across the two pools, which most teams don't have on day one.
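To get a feel for the hybrid split, here is a rough sketch that prices a demand profile under different reservation sizes. The demand profile is invented for illustration (4 GPUs for 16 hours, 10 GPUs for an 8-hour peak), prices are normalised so the on-demand rate is 1.0, and the function name is hypothetical.

```python
def hybrid_cost(hourly_demand, reserved_gpus, on_demand_rate, reserved_fraction):
    """Total cost of serving a demand profile with a reserved baseline
    plus on-demand (per-minute) burst capacity.

    hourly_demand     -- GPUs needed in each hour of the period
    reserved_gpus     -- number of GPUs reserved for the whole period
    on_demand_rate    -- on-demand price per GPU-hour
    reserved_fraction -- reserved price as a fraction of on-demand
    """
    hours = len(hourly_demand)
    # The reserved pool is billed for every hour, busy or idle.
    reserved_cost = reserved_gpus * hours * on_demand_rate * reserved_fraction
    # Demand above the reservation is bought on demand, hour by hour.
    burst_gpu_hours = sum(max(0, d - reserved_gpus) for d in hourly_demand)
    return reserved_cost + burst_gpu_hours * on_demand_rate


if __name__ == "__main__":
    demand = [4] * 16 + [10] * 8  # made-up day: baseline 4 GPUs, peak 10
    for reserved in (0, 4, 10):
        cost = hybrid_cost(demand, reserved, on_demand_rate=1.0, reserved_fraction=0.6)
        print(f"reserve {reserved:>2} GPUs: cost {cost:.1f}")
```

With this particular profile, reserving the 4-GPU baseline (40% of peak) comes out about 27% below paying on demand for everything, while reserving for the full peak saves nothing. Real profiles will differ, so the same function is worth re-running on your own usage data when you re-evaluate the reserved size.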

Per-minute economics in EU GPU specialists

The vendors specialising in per-minute pricing (us included) have one structural advantage over hyperscalers. We don't backfill our spot pool from on-demand customers. The per-minute rate is the rate. There is no surge multiplier when the pool runs hot. There is no eviction penalty when a higher bidder wants the slot.

That predictability matters more than people expect. Burst-friendly workloads that try to live on hyperscaler spot end up budgeting for the on-demand rate anyway, because spot runs out exactly when you need it. EU specialists tend to be priced halfway between hyperscaler spot and hyperscaler on-demand, with availability closer to the on-demand side.

The decision in one sentence

Reserved is the right call when utilisation is high and predictable, and the hardware will still be relevant at the end of the term. Per-minute is the right call for everything else, which is most workloads most of the time. Don't let the headline percentage saving tempt you into pretending you're in the first category when you aren't.