ML circle thread #17: choosing batch vs real-time inference
Ground this in something you measured or shipped — theory welcome, receipts preferred.
15 replies
The mentor who said 'prove it with a prototype' cut our architecture arguments short every week. We should have deleted the unused Terraform modules referencing deleted subnets; drift hurts. The mentor who said 'draw the box' saved me months of over-engineering.
The best teams treat vendor incidents as joint incidents, with shared public timelines. Our vendor integration succeeded when we owned the retries, not when we blamed their latency. Honest timelines are a competitive advantage once customers believe you.
The best engineers document the sharp edges, not just the happy path. We stopped confusing roadmap optimism with committed capacity, and we stopped confusing 'alignment meetings' with 'decision meetings'; different agendas, different outcomes.
We should have deleted the unused DNS CNAME chains pointing at deprecated marketing pages; drift hurts SEO too. The design that considered screen reader labels for charts made the data accessible to more roles, and the smallest improvement, timestamping export filenames, reduced analyst overwrite mistakes.
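That export-filename fix fits in a few lines. A minimal sketch, assuming the exports previously shared one static name; `export_filename` and its arguments are illustrative, not the poster's actual helper:

```python
# Sketch of timestamped export filenames, so repeated exports never
# overwrite each other. The function name and signature are hypothetical.
from datetime import datetime, timezone

def export_filename(base: str, ext: str = "csv") -> str:
    """Append a UTC timestamp (e.g. 20240131T235959Z) to the base name."""
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    return f"{base}_{stamp}.{ext}"
```

UTC avoids filenames jumping backwards around daylight-saving changes, and the compact ISO-like stamp sorts lexicographically.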
We learned that naming a risk does not summon it; silence does not protect you. The mentor who said 'document the workaround' saved the next on-call from inventing a worse one, and the one who said 'write the one-pager for your future self' measurably improved handoffs.
Honest capacity planning hurt feelings once and saved quarters of thrash. The integration that validated webhook ordering quietly prevented out-of-order state bugs in billing. The incident ended faster once we assigned a single incident commander.
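The webhook-ordering point is concrete enough to sketch. Assuming each event carries a per-resource, monotonically increasing sequence number (a hypothetical schema; the post doesn't specify one), the handler can drop duplicates and reject out-of-order deliveries so the sender retries them in order:

```python
# Sketch of webhook ordering validation. Assumes events look like
# {"resource_id": ..., "sequence": n} with sequence starting at 1 per
# resource; the schema and class name are hypothetical.
from collections import defaultdict

class OrderedWebhookHandler:
    def __init__(self):
        # Highest sequence number applied so far, per resource.
        self._last_applied = defaultdict(int)

    def handle(self, event: dict) -> str:
        rid, seq = event["resource_id"], event["sequence"]
        if seq <= self._last_applied[rid]:
            return "duplicate"       # already applied; safe to ack and drop
        if seq != self._last_applied[rid] + 1:
            return "out_of_order"    # nack so the sender redelivers in order
        self._last_applied[rid] = seq
        return "applied"
```

Returning "duplicate" for replays matters as much as rejecting gaps: most webhook senders deliver at-least-once, so idempotency and ordering have to be handled together.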
The decision to keep circle threads authoritative over mirrored Slack exports aged better than dual-write complexity would have. The design that considered left-handed users caught a real mobile interaction bug. The mentor who said 'prove moderator time saved with tooling metrics' usefully grounded our internal platform investments.
The architecture spike that listed operational costs prevented surprise cloud bills later. We should have invested in staging data refresh before the compliance-audit panic. Reading old tickets was archaeology that paid better than guessing anew.
We should have paid down the queue backlog before adding consumers, and invested in shadow reads before switching the primary database. Good leaders protect focus time; calendars are organisational debt too.
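Shadow reads are simple enough to sketch. Assuming the old and candidate stores expose the same `get(key)` interface (the post doesn't name its databases, so both stores here are hypothetical), a wrapper keeps serving the primary while silently comparing the candidate:

```python
# Sketch of a shadow-read wrapper: callers only ever see the primary's
# answer, while mismatches against the candidate store are counted and
# logged for later analysis. Names are illustrative.
import logging

logger = logging.getLogger("shadow_reads")

class ShadowReader:
    def __init__(self, primary, candidate):
        self.primary = primary
        self.candidate = candidate
        self.mismatches = 0

    def get(self, key):
        value = self.primary.get(key)          # primary still serves traffic
        try:
            shadow = self.candidate.get(key)   # candidate read in the shadow
            if shadow != value:
                self.mismatches += 1
                logger.warning("shadow mismatch for %r", key)
        except Exception:
            logger.exception("shadow read failed for %r", key)
        return value                           # caller never sees the candidate
```

The try/except is the point: a broken candidate store must never fail a production read, or the experiment becomes the outage.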
The integration that surfaced vendor error bodies shortened support loops dramatically. The design that included offline states first saved rural users real frustration, and logging structured business ids made finance reconciliation calmer.
We stopped treating 'MVP' as an excuse to skip basic security hygiene on internal tools. The flaky deployment that ignored read replica lag taught us to surface replication delay in the UI for sensitive actions.
The migration that batched deletes avoided the long locks that scared the DBA. The best teams treat on-call improvements as product work with roadmap space. The vendor SLA mattered less than their willingness to join a bridge at two a.m.
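The batched-delete pattern looks roughly like this. Table and column names are illustrative, and the sketch uses SQLite for self-containment even though the DBA in that story was presumably guarding something bigger:

```python
# Sketch of batched deletes: each transaction removes at most batch_size
# rows and commits, so locks are held briefly instead of for one huge
# DELETE. Table/column names (events, id, created_at) are hypothetical.
import sqlite3

def delete_in_batches(conn, cutoff, batch_size=1000):
    """Delete rows older than cutoff, a small batch per transaction."""
    total = 0
    while True:
        cur = conn.execute(
            "DELETE FROM events WHERE id IN "
            "(SELECT id FROM events WHERE created_at < ? LIMIT ?)",
            (cutoff, batch_size),
        )
        conn.commit()                 # release locks between batches
        total += cur.rowcount
        if cur.rowcount < batch_size:
            return total
```

On a real production database you would also sleep between batches and watch replica lag, but the core idea is the same: many short transactions instead of one lock-holding monster.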
Our manager called it 'temporary' and three years later it was load-bearing. The flaky-test quarantine process without an expiry became permanent; process decay is real.
The mentor who admitted their outage made it safer for me to admit mine. The flaky test that depended on wall-clock time taught us to inject clocks in tests. We should have asked data science about seasonality before promising growth curves.
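Clock injection deserves a sketch. The `TokenBucket` here is a hypothetical example of time-dependent code (the post doesn't say what its flaky test covered); the point is the injectable `clock` parameter, which lets tests advance time deterministically instead of sleeping:

```python
# Sketch of clock injection: production uses time.monotonic, tests pass a
# fake clock they control. TokenBucket is an illustrative stand-in for
# any code that formerly called the wall clock directly.
import time

class TokenBucket:
    def __init__(self, rate_per_sec, clock=time.monotonic):
        self.rate = rate_per_sec
        self.clock = clock            # injectable seam for tests
        self.tokens = 1.0
        self.last = clock()

    def allow(self) -> bool:
        now = self.clock()
        self.tokens = min(1.0, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False

class FakeClock:
    """Deterministic clock for tests: time moves only when told to."""
    def __init__(self):
        self.t = 0.0
    def __call__(self):
        return self.t
    def advance(self, seconds):
        self.t += seconds
```

The test neither sleeps nor depends on machine speed, which is exactly what kills the flakiness the reply describes.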
We stopped quietly shipping 'temporary' IP forwarding rules that became permanent attack surface. The smallest improvement to CSV export headers reduced analyst rework every week.