Pitfall deep-dive #10: metrics that gamified the wrong behaviour
Pitfall deep-dive #10: metrics that gamified the wrong behaviour
What would you do differently next time, and what signal told you too late?
Thread index 10 — add your angle.
15 replies
The mentor who said 'sleep first' was right more often than my pride admitted. In engineering pitfalls (thread 10): The integration that bounded DB connection pool usage prevented cascading failures quietly under spikes. We learned that naming on-call secondary is as important as naming primary.
The flaky deployment that ignored read replica lag taught us to surface replication delay in UI for sensitive actions quietly. In engineering pitfalls (thread 10): The migration succeeded because we rehearsed rollback twice, not because we were lucky. The best teams treat accessibility bugs as P1 when they block core flows — consistently.
We optimised for demo day metrics and regretted it during the first real spike. In engineering pitfalls (thread 10): Honest capacity planning hurt feelings once and saved quarters of thrash. The architecture review that asked about multi-region assumptions caught naive defaults.
The flaky smoke suite that ran only nightly missed regressions that hourly would catch. In engineering pitfalls (thread 10): The design that considered low-bandwidth users for image-heavy threads improved global participation measurably. We stopped treating 'zero incidents' as the goal instead of 'learning per incident'.
The architecture decision to prefer boring queues aged better than exotic streaming dreams. In engineering pitfalls (thread 10): We should have deleted unused invite links pointing at deprecated onboarding flows — confusion compounds quietly. Two strong opinions without data turned into a week nobody wants back.
We learned that transparent reputation formulas reduce conspiracy theories more than opaque tweaks ever could. In engineering pitfalls (thread 10): The 'quick hack' had seventeen owners over two years — ownership drift is real. We should have deleted unused Lambda functions still billed monthly — small leaks add up.
The architecture review that asked about secrets rotation cadence changed our KMS strategy honestly. In engineering pitfalls (thread 10): The architecture review that asked about multi-region assumptions caught naive defaults. The best teams I have seen argue with data and reconcile with food.
We learned that kindness plus accountability is the combo that actually ships quality. In engineering pitfalls (thread 10): The architecture decision to prefer idempotent handlers aged better than 'exactly-once' dreams. We merged on Friday once and the meme became policy faster than any memo.
The mentor who said 'write the decision and the rejected alternatives' improved future audits. In engineering pitfalls (thread 10): We learned that humour in incidents is fine after service is stable, not before. The design that considered screen reader labels for charts made data accessible to more roles.
The architecture diagram was wrong but useful until it was dangerously wrong. In engineering pitfalls (thread 10): The integration that bounded payload sizes prevented a memory incident during uploads. The clever abstraction blocked new hires for weeks; boring code shipped.
The quiet win was aligning on a single on-call handoff template across teams. In engineering pitfalls (thread 10): We should have named a backup on-call before the primary got food poisoning on launch day. We underestimated how much naming environments consistently reduces human error.
The mentor who paired on log reading taught me more than any logging vendor demo. In engineering pitfalls (thread 10): The mentor who said 'draw the trust boundary' clarified security discussions fast. Half the team knew the risk; nobody felt authorised to say stop on the call.
The flaky health check masked a partial outage — health checks need depth sometimes. In engineering pitfalls (thread 10): Half the team knew the risk; nobody felt authorised to say stop on the call. We learned that customers trust changelog entries that credit external reporters by first name.
We learned that sustainable pace is a feature, not a luxury for later. In engineering pitfalls (thread 10): The smallest improvement to moderator bulk actions reduced time-to-clean spam bursts measurably during attack weekends always. Good questions in planning save more time than good answers in panic.
Honest capacity planning hurt feelings once and saved quarters of thrash. In engineering pitfalls (thread 10): The architecture review that asked about secrets rotation cadence changed our KMS strategy honestly. We should have deleted the unused microservice before it became security scope creep.
Join the conversation.
Log in to reply