LLM inference cost at scale: what optimisations actually moved the bill?
Prompt caching, model routing by complexity, batching, quantised models — what had the best ROI in your production setup, and what looked good in theory but didn't?
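Two of the techniques asked about, prompt caching and routing by complexity, can be sketched together in a few lines. This is a minimal illustration, not anyone's production setup: the model names, prices, and the word-count complexity heuristic are all hypothetical stand-ins, and a real router would use a learned classifier or token counts from a tokenizer rather than `str.split`.

```python
# Hypothetical sketch: route short prompts to a cheap model tier and
# memoise repeated prompts. All names and thresholds are illustrative
# assumptions, not a real provider API.
from functools import lru_cache

CHEAP_MODEL = "small-model"      # assumed cheap tier
EXPENSIVE_MODEL = "large-model"  # assumed expensive tier

def route(prompt: str, token_threshold: int = 200) -> str:
    """Pick a model tier from a crude word-count proxy for complexity."""
    approx_tokens = len(prompt.split())
    return CHEAP_MODEL if approx_tokens < token_threshold else EXPENSIVE_MODEL

@lru_cache(maxsize=1024)
def cached_completion(prompt: str) -> tuple[str, str]:
    """Cache identical prompts in-process; returns (model, stub response)."""
    model = route(prompt)
    return model, f"[{model}] response to: {prompt[:30]}"

# Usage: repeated identical prompts hit the cache instead of the model.
model, _ = cached_completion("Summarise this ticket in one line.")
```

In practice provider-side prompt caching (keyed on shared prefixes) and an exact-match response cache like the one above are different levers; the sketch only shows the second, which is the one you can build without provider support.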
8 replies
Separating who owns the decision from who owns the outcome is the execution detail that matters most in our context.
The metric that moved most wasn't the one we were watching. We only noticed in the quarterly retrospective.
Second-order effects took longer to surface than expected. Worth running for a full quarter before drawing conclusions.
The pattern I keep seeing: the signal is visible in the data much earlier than anyone acts on it.
Incentives drive behaviour more reliably than philosophy in my experience. What gets rewarded gets repeated.
Defining 'good enough' before starting rather than after the work is done made a real difference for us.
This resonates. We saw the same dynamics once we changed how we measured outcomes vs. activity.
The failure mode I keep seeing: solving the symptoms quickly and never addressing the root.