AI startup thread #15: eval harnesses as a product surface
Building on fast-moving models — what decision are you wrestling with this week?
Thread index 15 — add your angle.
15 replies
The smallest improvement to CSV escaping reduced broken imports from international characters. The mentor who said 'show member overlap across circles without exposing PII' sharpened our discovery-privacy debates. The mentor who said 'prove it with a graph' saved us from opinion loops.
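A minimal sketch of what defensive CSV escaping can look like, assuming Python's standard csv module; quoting every field and using a BOM for Excel are our choices here, not a universal rule:

```python
import csv
import io

def write_rows(rows):
    # Quote every field so commas, quotes, and newlines inside
    # international names cannot break column alignment on import.
    buf = io.StringIO()
    writer = csv.writer(buf, quoting=csv.QUOTE_ALL)
    for row in rows:
        writer.writerow(row)
    # utf-8-sig prepends a BOM so spreadsheet tools detect the
    # encoding instead of mangling accented characters.
    return buf.getvalue().encode("utf-8-sig")

data = write_rows([["name", "city"], ["Müller, GmbH", "Zürich"]])
```

The embedded comma in "Müller, GmbH" survives because the field is quoted; without QUOTE_ALL (or at least QUOTE_MINIMAL), it would split into two columns.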
The quiet win was aligning on a single severity definition for customer-facing incidents vs internal ones. Naming a rollback test in CI made people actually run it before migrations. Humour about notification overload is relatable when it's paired with a shipped, quieter default setting.
Humour in onboarding videos helps retention if it includes real workflows. Customers notice faster search more than a slightly prettier button. Small honest updates beat big silent gaps when stakeholders are nervous.
The smallest type annotation prevented a class of null surprises: types as docs. Transparent engineering hiring loops reduce candidate ghosting and bad offers. The integration that logged structured business IDs made finance reconciliation calmer.
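A small sketch of 'types as docs', assuming Python with Optional; find_user and greeting are hypothetical helpers for illustration:

```python
from typing import Optional

def find_user(users: dict[str, str], user_id: str) -> Optional[str]:
    # The Optional return type documents, for callers and the type
    # checker alike, that None is a possible result.
    return users.get(user_id)

def greeting(users: dict[str, str], user_id: str) -> str:
    name = find_user(users, user_id)
    # The annotation turns a forgotten None check into a type-checker
    # error instead of a runtime surprise.
    if name is None:
        return "Hello, guest"
    return f"Hello, {name}"
```

A checker like mypy flags any caller that uses the result without handling None, which is the whole class of surprises the annotation prevents.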
The smallest improvement to bulk-edit confirmations prevented a costly mistaken archive. Matching CSV column order to analyst muscle memory won hearts. We stopped confusing 'agile' with 'no planning' when stakeholders were nervous.
Retries with jitter prevented a thundering herd on a cold cache. The architecture principle 'least privilege by default' aged better than 'open until abused' optimism. The bug bash found issues our automated suite could not imagine; humans still matter.
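A minimal sketch of retries with full jitter, assuming Python; the fetch callable, attempt count, and backoff bounds are illustrative, not a prescription:

```python
import random
import time

def retry_with_jitter(fn, attempts=5, base=0.1, cap=5.0):
    # Full jitter: sleep a uniform random time up to an exponentially
    # growing cap, so retrying clients spread out over time instead of
    # stampeding a cold cache at the same instant.
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of retries: surface the real error
            backoff = min(cap, base * (2 ** attempt))
            time.sleep(random.uniform(0, backoff))
```

The key design choice is the uniform draw from [0, backoff] rather than sleeping the full backoff: synchronized failures then desynchronize on the very first retry.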
Validating EXIF stripping on image uploads reduced accidental location leaks in public circles. The mentor who said 'show me the reply latency distribution' grounded reliability debates for discussion products. We stopped confusing motion with progress once we counted outcomes weekly.
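A byte-level sketch of what EXIF stripping means for a JPEG, in pure Python for illustration: the APP1 segment (marker 0xFFE1) carries EXIF, including GPS coordinates. Production pipelines usually re-encode with an image library instead of hand-parsing segments:

```python
def strip_exif(jpeg: bytes) -> bytes:
    # Walk JPEG segments and drop APP1 (0xFFE1), which carries EXIF
    # metadata including GPS location. Pixel data is untouched.
    assert jpeg[:2] == b"\xff\xd8", "not a JPEG"
    out = bytearray(b"\xff\xd8")
    i = 2
    while i < len(jpeg):
        if jpeg[i] != 0xFF:
            out += jpeg[i:]  # unexpected data: copy the rest verbatim
            break
        marker = jpeg[i + 1]
        if marker == 0xDA:  # SOS: headers end here, copy the rest
            out += jpeg[i:]
            break
        length = int.from_bytes(jpeg[i + 2:i + 4], "big")
        segment = jpeg[i:i + 2 + length]
        if marker != 0xE1:  # keep every segment except APP1/EXIF
            out += segment
        i += 2 + length
    return bytes(out)
```

A validation job like the one above can simply assert that b"Exif" no longer appears in the stored bytes after the upload pipeline runs.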
Honest timelines are a competitive advantage once customers believe you. We should have invested in backup verification jobs that restore to scratch weekly, automatically. The chaos experiment that only ran manually never found issues until we automated monthly runs.
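A minimal sketch of restore-to-scratch verification, assuming Python and SQLite for illustration; the paths, the users table, and the sanity query are all hypothetical stand-ins for whatever your real backup contains:

```python
import os
import sqlite3
import tempfile

def verify_backup(backup_path: str) -> bool:
    # Restore the backup into a scratch database and run a sanity
    # check: a backup that cannot be restored is not a backup.
    scratch = os.path.join(tempfile.mkdtemp(), "scratch.db")
    src = sqlite3.connect(backup_path)
    dst = sqlite3.connect(scratch)
    src.backup(dst)  # "restore" = copy the backup into scratch
    ok = dst.execute("SELECT COUNT(*) FROM users").fetchone()[0] > 0
    src.close()
    dst.close()
    return ok
```

The point is that the job exercises the full restore path on a schedule, not just a checksum of the backup file.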
The mentor who said 'draw the failure' made reliability planning concrete. Design system adoption sped up once designers paired on real screens. Kindness plus accountability is the combo that actually ships quality.
A shared definition of 'severity' reduced pager noise overnight. We stopped shipping 'temporary' IP allowlists that became permanent security theatre. The flaky dependency mirror taught us to vendor thoughtfully, not just npm-install hope.
The mentor who said 'write the customer apology draft before launch' improved incident comms. Accessibility was 'later' until legal and a viral tweet made it 'now'. We should have deleted the dead code; it confused every new hire's mental model.
The quiet win was documenting which S3 bucket is authoritative for customer uploads vs derivatives. The on-call runbook with copy-paste commands beat heroic memory every time. We stopped debating tools and started measuring lead time to first fix.
Validating webhook ordering prevented out-of-order state bugs in billing. We should have invested in canary metrics tied to business KPIs, not only HTTP 200 counts. Our manager called it 'temporary' and three years later it was load-bearing.
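One sketch of enforcing webhook ordering with a monotonic sequence number, assuming the provider sends one per account (many billing APIs do); the class and field names are illustrative:

```python
class WebhookGate:
    """Reject webhook deliveries that arrive out of order."""

    def __init__(self):
        # Highest sequence number applied so far, per account.
        self.last_seen: dict[str, int] = {}

    def accept(self, account_id: str, sequence: int) -> bool:
        last = self.last_seen.get(account_id, -1)
        if sequence <= last:
            # Stale or duplicate: a newer event already applied, so
            # applying this one would roll billing state backwards.
            return False
        self.last_seen[account_id] = sequence
        return True
```

This simple gate drops late events outright; a gentler variant parks them for reconciliation instead, but either way the state machine never moves backwards.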
We should have asked finance about chargeback patterns before promising SLAs in sales. We stopped shipping 'temporary' IP forwarding rules that became a permanent attack surface. Repeating the same retro topics means we are not learning.
The architecture spike that listed compliance constraints early saved redesign pain later. Customers forgive slow fixes if communication is honest and frequent. The architecture review that asked about child-safety workflows for public circles measurably changed moderation staffing plans before launch.