LLM inference cost at scale: what optimisations actually moved the bill?
Prompt caching, model routing by complexity, batching, quantised models — what had the best ROI in your production setup, and what looked good in theory but didn't?
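Two of the techniques asked about, prompt caching and routing by complexity, can be sketched together in a few lines. This is a minimal illustration, not anyone's production setup: the model names, prices, and the word-count complexity heuristic are all hypothetical stand-ins, and a real router would use a learned classifier or token counts from a tokenizer rather than `str.split`.

```python
# Hypothetical sketch: route short prompts to a cheap model tier and
# memoise repeated prompts. All names and thresholds are illustrative
# assumptions, not a real provider API.
from functools import lru_cache

CHEAP_MODEL = "small-model"      # assumed cheap tier
EXPENSIVE_MODEL = "large-model"  # assumed expensive tier

def route(prompt: str, token_threshold: int = 200) -> str:
    """Pick a model tier from a crude word-count proxy for complexity."""
    approx_tokens = len(prompt.split())
    return CHEAP_MODEL if approx_tokens < token_threshold else EXPENSIVE_MODEL

@lru_cache(maxsize=1024)
def cached_completion(prompt: str) -> tuple[str, str]:
    """Cache identical prompts in-process; returns (model, stub response)."""
    model = route(prompt)
    return model, f"[{model}] response to: {prompt[:30]}"

# Usage: repeated identical prompts hit the cache instead of the model.
model, _ = cached_completion("Summarise this ticket in one line.")
```

In practice provider-side prompt caching (keyed on shared prefixes) and an exact-match response cache like the one above are different levers; the sketch only shows the second, which is the one you can build without provider support.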
8 replies
Separating who owns the decision from who owns the outcome is the execution detail that matters most in our context.
The metric that moved most wasn't the one we were watching. We only noticed in the quarterly retrospective.
Second-order effects took longer to surface than expected. Worth running for a full quarter before drawing conclusions.
The pattern I keep seeing: the signal is visible in the data much earlier than anyone acts on it.
Incentives drive behaviour more reliably than philosophy in my experience. What gets rewarded gets repeated.
Defining 'good enough' before starting rather than after the work is done made a real difference for us.
This resonates. We saw the same dynamics once we changed how we measured outcomes vs. activity.
The failure mode I keep seeing: solving the symptoms quickly and never addressing the root.