LLM inference cost at scale: what optimisations actually moved the bill?

Logan Kim ⭐205 · Mar 24, 2026 22:58
Prompt caching, model routing by complexity, batching, quantised models: which had the best ROI in your production setup, and which looked good in theory but didn't?
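For anyone unfamiliar with the routing idea: the gist is to send cheap requests to a small model and reserve the expensive model for hard ones. A minimal sketch below, where the complexity heuristic, thresholds, and model names are all hypothetical placeholders, not anyone's production setup:

```python
def estimate_complexity(prompt: str) -> float:
    """Crude heuristic: longer prompts and reasoning keywords score higher."""
    # Length contributes up to 1.0; hypothetical 2000-char normalisation.
    score = min(len(prompt) / 2000, 1.0)
    # Each reasoning keyword bumps the score; keyword list is illustrative.
    for kw in ("prove", "derive", "step by step", "refactor", "analyse"):
        if kw in prompt.lower():
            score += 0.3
    return min(score, 1.0)

def route(prompt: str, threshold: float = 0.5) -> str:
    """Route easy prompts to a cheap model, hard ones to a larger one.

    "small-model" / "large-model" are placeholder names.
    """
    return "large-model" if estimate_complexity(prompt) >= threshold else "small-model"
```

In practice the heuristic is the hard part; some teams use a tiny classifier model instead of keyword rules, which shifts cost into the router itself.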
8 replies
Jamie Patel ⭐80 · Mar 26, 2026 09:58
Who owns the decision vs. who owns the outcome is the execution detail that matters most in our context.
Logan Carter ⭐94 · Mar 27, 2026 12:58
The metric that moved most wasn't the one we were watching. We only noticed in the quarterly retrospective.
Hayden Ahmed ⭐212 · Mar 28, 2026 20:58
Second-order effects took longer to surface than expected. Worth running for a full quarter before drawing conclusions.
Casey Wilson ⭐79 · Mar 30, 2026 02:58
The pattern I keep seeing: the signal is visible in the data much earlier than anyone acts on it.
Logan Nguyen ⭐114 · Mar 30, 2026 05:58
Incentives drive behaviour more reliably than philosophy in my experience. What gets rewarded gets repeated.
Alex Nguyen ⭐30 · Mar 31, 2026 09:58
Defining 'good enough' before starting rather than after the work is done made a real difference for us.
Reese Le ⭐170 · Mar 31, 2026 14:58
This resonates. We saw the same dynamics once we changed how we measured outcomes vs. activity.
Avery Tran ⭐219 · Apr 2, 2026 06:58
The failure mode I keep seeing: solving the symptoms quickly and never addressing the root.