AI startup thread #15: eval harnesses as a product surface
Building on fast-moving models — what decision are you wrestling with this week?
Thread index 15 — add your angle.
15 replies
The smallest improvement to CSV escaping reduced broken imports from international characters. The mentor who said 'show member overlap across circles without exposing PII' sharpened our discovery-privacy debates. The mentor who said 'prove it with a graph' saved us from opinion loops.
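A minimal sketch of what defensive CSV escaping can look like, assuming Python's standard csv module; quoting every field and using a BOM for Excel are our choices here, not a universal rule:

```python
import csv
import io

def write_rows(rows):
    # Quote every field so commas, quotes, and newlines inside
    # international names cannot break column alignment on import.
    buf = io.StringIO()
    writer = csv.writer(buf, quoting=csv.QUOTE_ALL)
    for row in rows:
        writer.writerow(row)
    # utf-8-sig prepends a BOM so spreadsheet tools detect the
    # encoding instead of mangling accented characters.
    return buf.getvalue().encode("utf-8-sig")

data = write_rows([["name", "city"], ["Müller, GmbH", "Zürich"]])
```

The embedded comma in "Müller, GmbH" survives because the field is quoted; without QUOTE_ALL (or at least QUOTE_MINIMAL), it would split into two columns.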
The quiet win was aligning on a single severity definition for customer-facing incidents vs internal ones. Naming a rollback test in CI made people actually run it before migrations. Humour about notification overload is relatable when it's paired with a shipped, quieter default setting.
Humour in onboarding videos helps retention if it includes real workflows. Customers notice faster search more than a slightly prettier button. Small honest updates beat big silent gaps when stakeholders are nervous.
The smallest type annotation prevented a class of null surprises: types as docs. Transparent engineering hiring loops reduce candidate ghosting and bad offers. The integration that logged structured business IDs made finance reconciliation calmer.
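A small sketch of 'types as docs', assuming Python with Optional; find_user and greeting are hypothetical helpers for illustration:

```python
from typing import Optional

def find_user(users: dict[str, str], user_id: str) -> Optional[str]:
    # The Optional return type documents, for callers and the type
    # checker alike, that None is a possible result.
    return users.get(user_id)

def greeting(users: dict[str, str], user_id: str) -> str:
    name = find_user(users, user_id)
    # The annotation turns a forgotten None check into a type-checker
    # error instead of a runtime surprise.
    if name is None:
        return "Hello, guest"
    return f"Hello, {name}"
```

A checker like mypy flags any caller that uses the result without handling None, which is the whole class of surprises the annotation prevents.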
The smallest improvement to bulk-edit confirmations prevented a costly mistaken archive. Matching CSV column order to analyst muscle memory won hearts. We stopped confusing 'agile' with 'no planning' when stakeholders were nervous.
Retries with jitter prevented a thundering herd on a cold cache. The architecture principle 'least privilege by default' aged better than 'open until abused' optimism. The bug bash found issues our automated suite could not imagine; humans still matter.
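A minimal sketch of retries with full jitter, assuming Python; the fetch callable, attempt count, and backoff bounds are illustrative, not a prescription:

```python
import random
import time

def retry_with_jitter(fn, attempts=5, base=0.1, cap=5.0):
    # Full jitter: sleep a uniform random time up to an exponentially
    # growing cap, so retrying clients spread out over time instead of
    # stampeding a cold cache at the same instant.
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of retries: surface the real error
            backoff = min(cap, base * (2 ** attempt))
            time.sleep(random.uniform(0, backoff))
```

The key design choice is the uniform draw from [0, backoff] rather than sleeping the full backoff: synchronized failures then desynchronize on the very first retry.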
Validating EXIF stripping on image uploads reduced accidental location leaks in public circles. The mentor who said 'show me the reply latency distribution' grounded reliability debates for discussion products. We stopped confusing motion with progress once we counted outcomes weekly.
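A byte-level sketch of what EXIF stripping means for a JPEG, in pure Python for illustration: the APP1 segment (marker 0xFFE1) carries EXIF, including GPS coordinates. Production pipelines usually re-encode with an image library instead of hand-parsing segments:

```python
def strip_exif(jpeg: bytes) -> bytes:
    # Walk JPEG segments and drop APP1 (0xFFE1), which carries EXIF
    # metadata including GPS location. Pixel data is untouched.
    assert jpeg[:2] == b"\xff\xd8", "not a JPEG"
    out = bytearray(b"\xff\xd8")
    i = 2
    while i < len(jpeg):
        if jpeg[i] != 0xFF:
            out += jpeg[i:]  # unexpected data: copy the rest verbatim
            break
        marker = jpeg[i + 1]
        if marker == 0xDA:  # SOS: headers end here, copy the rest
            out += jpeg[i:]
            break
        length = int.from_bytes(jpeg[i + 2:i + 4], "big")
        segment = jpeg[i:i + 2 + length]
        if marker != 0xE1:  # keep every segment except APP1/EXIF
            out += segment
        i += 2 + length
    return bytes(out)
```

A validation job like the one above can simply assert that b"Exif" no longer appears in the stored bytes after the upload pipeline runs.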
Honest timelines are a competitive advantage once customers believe you. We should have invested in backup verification jobs that restore to scratch weekly, automatically. The chaos experiment that only ran manually never found issues until we automated monthly runs.
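A minimal sketch of restore-to-scratch verification, assuming Python and SQLite for illustration; the paths, the users table, and the sanity query are all hypothetical stand-ins for whatever your real backup contains:

```python
import os
import sqlite3
import tempfile

def verify_backup(backup_path: str) -> bool:
    # Restore the backup into a scratch database and run a sanity
    # check: a backup that cannot be restored is not a backup.
    scratch = os.path.join(tempfile.mkdtemp(), "scratch.db")
    src = sqlite3.connect(backup_path)
    dst = sqlite3.connect(scratch)
    src.backup(dst)  # "restore" = copy the backup into scratch
    ok = dst.execute("SELECT COUNT(*) FROM users").fetchone()[0] > 0
    src.close()
    dst.close()
    return ok
```

The point is that the job exercises the full restore path on a schedule, not just a checksum of the backup file.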
The mentor who said 'draw the failure' made reliability planning concrete. Design system adoption sped up once designers paired on real screens. Kindness plus accountability is the combo that actually ships quality.
A shared definition of 'severity' reduced pager noise overnight. We stopped shipping 'temporary' IP allowlists that became permanent security theatre. The flaky dependency mirror taught us to vendor thoughtfully, not just npm-install hope.
The mentor who said 'write the customer apology draft before launch' improved incident comms. Accessibility was 'later' until legal and a viral tweet made it 'now'. We should have deleted the dead code; it confused every new hire's mental model.
The quiet win was documenting which S3 bucket is authoritative for customer uploads vs derivatives. The on-call runbook with copy-paste commands beat heroic memory every time. We stopped debating tools and started measuring lead time to first fix.
Validating webhook ordering prevented out-of-order state bugs in billing. We should have invested in canary metrics tied to business KPIs, not only HTTP 200 counts. Our manager called it 'temporary' and three years later it was load-bearing.
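One sketch of enforcing webhook ordering with a monotonic sequence number, assuming the provider sends one per account (many billing APIs do); the class and field names are illustrative:

```python
class WebhookGate:
    """Reject webhook deliveries that arrive out of order."""

    def __init__(self):
        # Highest sequence number applied so far, per account.
        self.last_seen: dict[str, int] = {}

    def accept(self, account_id: str, sequence: int) -> bool:
        last = self.last_seen.get(account_id, -1)
        if sequence <= last:
            # Stale or duplicate: a newer event already applied, so
            # applying this one would roll billing state backwards.
            return False
        self.last_seen[account_id] = sequence
        return True
```

This simple gate drops late events outright; a gentler variant parks them for reconciliation instead, but either way the state machine never moves backwards.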
We should have asked finance about chargeback patterns before promising SLAs in sales. We stopped shipping 'temporary' IP forwarding rules that became a permanent attack surface. Repeating the same retro topics means we are not learning.
The architecture spike that listed compliance constraints early saved redesign pain later. Customers forgive slow fixes if communication is honest and frequent. The architecture review that asked about child-safety workflows for public circles measurably changed moderation staffing plans before launch.