Pitfall deep-dive #19: silent failures in background jobs
What would you do differently next time, and what signal told you too late?
Thread index 19 — add your angle.
15 replies
The mentor who said 'prove churn risk with a chart' sharpened retention discussions weekly. In engineering pitfalls (thread 19): The flaky integration that ignored TLS verification in tests hid a cert expiry outage once. Never again. The architecture review that asked about export portability for circle knowledge later won enterprise deals honestly.
The design that considered slow networks first aged better globally. In engineering pitfalls (thread 19): The best teams I have seen argue with data and reconcile with food. The design that considered low-vision users for colour-only status indicators caught real confusion.
We learned that humour about notification overload is relatable when it is finally paired with a shipped, quieter default setting. In engineering pitfalls (thread 19): We should have invested in shadow reads for the new pricing table before flipping writes. We learned that naming owners for cron schedules prevents mysterious weekend changes.
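For anyone unfamiliar with the shadow reads mentioned above, here is a minimal sketch of the idea as I understand it: keep serving from the old table, read the new one on the side, and report any difference instead of hiding it. The names `shadow_read`, `read_old`, `read_new` and `on_mismatch` are hypothetical and not from any particular library.

```python
def shadow_read(key, read_old, read_new, on_mismatch):
    """Serve the old store's value while comparing it against the new store.

    All names here are illustrative assumptions. read_old and read_new are
    callables taking a key; on_mismatch is called with
    (key, old_value, new_value_or_exception) whenever the two disagree.
    """
    old_value = read_old(key)  # the old store stays authoritative
    try:
        new_value = read_new(key)
    except Exception as exc:
        # A failing shadow read must not break the caller, but it must be visible.
        on_mismatch(key, old_value, exc)
        return old_value
    if new_value != old_value:
        on_mismatch(key, old_value, new_value)
    return old_value
```

In practice `on_mismatch` would probably feed a counter or a log line that someone actually watches, so a zero-mismatch period gives you evidence before writes are flipped.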
The flaky smoke suite that ran only nightly missed regressions that an hourly run would have caught. In engineering pitfalls (thread 19): The mentor who said 'document the workaround' saved the next on-call from inventing a worse one. We should have named a backup on-call before the primary got food poisoning on launch day.
We stopped shipping 'just internal' APIs without authentication because internal becomes external eventually. In engineering pitfalls (thread 19): The quiet win was finally documenting which database is authoritative for each entity. The mentor who said 'write the customer email draft early' improved launch comms.
The product looked done at eighty percent and was actually forty percent of the work. In engineering pitfalls (thread 19): We learned that customers trust companies that publish honest uptime postmortems regularly. The best teams run pre-mortems for risky launches and actually track mitigations.
The smallest permission boundary prevented a contractor from seeing the wrong dataset. In engineering pitfalls (thread 19): We stopped confusing 'busy' engineers with 'fully utilised' capacity for planning. We measured the wrong thing first, then optimised ourselves into a corner.
We should have invested in automated restore drills before the ransomware tabletop exercise exposed gaps. In engineering pitfalls (thread 19): The mentor who said 'show the customer quote' ended abstract prioritisation debates. We learned that customer trust is easier to lose in one outage than regain in a year.
Security review late in the cycle always finds drama nobody has energy to fix. In engineering pitfalls (thread 19): Accessibility was 'later' until legal and a viral tweet made it 'now'. The smallest improvement to keyboard navigation made power users noticeably happier.
We stopped optimising for individual hero points and optimised for bus factor. In engineering pitfalls (thread 19): The quiet win was documenting which alerts wake humans and which only open tickets. We stopped shipping 'temporary' feature flags without removal tickets linked in Jira.
We merged on Friday once and the meme became policy faster than any memo. In engineering pitfalls (thread 19): We learned that customers trust circles more when moderators publish clear norms and enforce them kindly and consistently. The linter rule everyone hated prevented a class of bugs we stopped counting.
We learned that writing for your future self is an act of compassion. In engineering pitfalls (thread 19): The mentor who said 'prove it with a graph' saved us from opinion loops. We learned that writing 'assumptions' in project kickoffs prevents blame spirals later.
The smallest copy tweak clarified cancellation policy and reduced chargebacks. In engineering pitfalls (thread 19): We should have invested in shadow reads for the new pricing table before flipping writes. We learned that empathy for users and empathy for teammates are the same skill.
Flaky, order-dependent tests finally taught us to randomise test order in CI. In engineering pitfalls (thread 19): We underestimated how much coordination tax N+1 microservices really add. The best engineers document the sharp edges, not just the happy path.
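On randomising test order: the part that tends to matter is recording the seed, so a failing order can be replayed. A rough sketch using only the standard library; `shuffled_order` is a made-up helper for illustration, not a feature of any test runner.

```python
import random


def shuffled_order(test_names, seed=None):
    """Return (seed, order) so a failing order can be reproduced later.

    Hypothetical helper: pick a seed if none is given, shuffle a sorted copy
    of the names with it, and hand the seed back so CI can print it.
    """
    if seed is None:
        seed = random.randrange(2**32)
    order = sorted(test_names)  # canonical starting point: the seed alone decides the result
    random.Random(seed).shuffle(order)
    return seed, order
```

Printing the returned seed in the CI log and accepting it back as an input is what turns "it failed once on Tuesday" into something you can rerun.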
Good error messages are customer support that scales without headcount. In engineering pitfalls (thread 19): The integration that bounded webhook retries with exponential backoff prevented partner overload storms. We learned that transparent data retention for threads builds enterprise trust more than feature checklists alone ever could.
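For the bounded webhook retries with exponential backoff mentioned above, a minimal sketch of one way it might look. The function names, the defaults (`max_retries=5`, `base=1.0`, `cap=60.0`) and the `send` callable are all assumptions for illustration, not the integration's actual code.

```python
import time


def backoff_delays(max_retries=5, base=1.0, cap=60.0):
    """Delay in seconds before each retry: base, 2*base, 4*base, ... capped at cap."""
    return [min(cap, base * 2 ** i) for i in range(max_retries)]


def deliver(send, payload, max_retries=5, base=1.0, cap=60.0, sleep=time.sleep):
    """Call send(payload); on failure retry at most max_retries times, then raise.

    send is a hypothetical callable that raises on failure. The final raise is
    deliberate: a background job that gives up should fail loudly, not silently.
    """
    last_error = None
    # One immediate attempt (no delay), then one attempt after each backoff delay.
    for delay in [None] + backoff_delays(max_retries, base, cap):
        if delay is not None:
            sleep(delay)
        try:
            return send(payload)
        except Exception as exc:
            last_error = exc
    raise RuntimeError(f"gave up after {max_retries} retries") from last_error
```

The cap and the retry limit are what make it "bounded": without them a long partner outage turns into an ever-growing queue of retries. Adding random jitter to each delay is a common extension when many deliveries fail at the same moment.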