Building a small internal copilot: what stack actually stayed maintainable?
We are not OpenAI scale — just forty engineers and a pile of Confluence no one reads. Curious what people shipped on Azure, AWS, or bare metal that did not rot in six months.
15 replies
We hosted on Azure OpenAI with private endpoints and kept the orchestration in a single ASP.NET worker — fewer moving parts.
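In case the shape helps anyone: the worker is just a BackgroundService draining an in-process channel. Trimmed sketch, and the type names (CopilotWorker, AnswerQuestionAsync) are illustrative, not our real ones:

```csharp
using System;
using System.Threading;
using System.Threading.Channels;
using System.Threading.Tasks;
using Microsoft.Extensions.Hosting;
using Microsoft.Extensions.Logging;

public record Question(string User, string Text);

public sealed class CopilotWorker : BackgroundService
{
    private readonly ChannelReader<Question> _questions;
    private readonly ILogger<CopilotWorker> _logger;

    public CopilotWorker(ChannelReader<Question> questions, ILogger<CopilotWorker> logger)
    {
        _questions = questions;
        _logger = logger;
    }

    protected override async Task ExecuteAsync(CancellationToken stoppingToken)
    {
        // Registered once via services.AddHostedService<CopilotWorker>();
        // retrieval and completion happen inline here, no separate orchestrator.
        await foreach (var question in _questions.ReadAllAsync(stoppingToken))
        {
            try
            {
                await AnswerQuestionAsync(question, stoppingToken);
            }
            catch (Exception ex)
            {
                _logger.LogError(ex, "Failed to answer question from {User}", question.User);
            }
        }
    }

    private Task AnswerQuestionAsync(Question q, CancellationToken ct) => Task.CompletedTask; // stub
}
```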
LangChain looked convenient until upgrades broke chains; we inlined the logic once patterns stabilised.
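Concretely, "inlined" for us meant one raw HTTP call to the Azure OpenAI chat completions endpoint instead of a chain abstraction. Rough sketch; the resource name, deployment, and api-version below are placeholders for your own:

```csharp
using System;
using System.Net.Http;
using System.Net.Http.Json;
using System.Text.Json;
using System.Threading.Tasks;

public static class InlineCompletion
{
    private static readonly HttpClient Http = new();

    public static async Task<string> CompleteAsync(string systemPrompt, string userQuestion)
    {
        // Placeholder resource and deployment; keep the key out of source.
        var uri = "https://YOUR-RESOURCE.openai.azure.com/openai/deployments/YOUR-DEPLOYMENT/chat/completions?api-version=2024-02-01";
        using var request = new HttpRequestMessage(HttpMethod.Post, uri)
        {
            Content = JsonContent.Create(new
            {
                messages = new object[]
                {
                    new { role = "system", content = systemPrompt },
                    new { role = "user", content = userQuestion },
                },
            }),
        };
        request.Headers.Add("api-key", Environment.GetEnvironmentVariable("AOAI_API_KEY"));

        using var response = await Http.SendAsync(request);
        response.EnsureSuccessStatusCode();
        using var doc = JsonDocument.Parse(await response.Content.ReadAsStringAsync());
        return doc.RootElement.GetProperty("choices")[0]
                  .GetProperty("message").GetProperty("content").GetString() ?? "";
    }
}
```

Fewer than fifty lines, no framework upgrades to chase.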
S3 for document blobs, Postgres for metadata, Redis for session state; a boring stack that is easy to hire for.
We avoided vector DB sprawl by starting with pgvector; migrated only when latency graphs told us to.
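The starting query was nothing fancy: cosine-distance top-k straight out of Postgres. Roughly like this, with table and column names made up for illustration:

```csharp
using System.Collections.Generic;
using System.Globalization;
using System.Linq;
using System.Threading.Tasks;
using Npgsql;

public static class PgVectorSearch
{
    // Top-k nearest chunks by cosine distance (pgvector's <=> operator).
    public static async Task<List<string>> TopChunksAsync(
        NpgsqlDataSource db, float[] queryEmbedding, int k = 5)
    {
        // pgvector accepts the '[0.1,0.2,...]' text form, so a cast is enough;
        // no special parameter binding required.
        var literal = "[" + string.Join(",",
            queryEmbedding.Select(f => f.ToString(CultureInfo.InvariantCulture))) + "]";

        await using var cmd = db.CreateCommand(
            "SELECT content FROM chunks ORDER BY embedding <=> $1::vector LIMIT $2");
        cmd.Parameters.Add(new() { Value = literal });
        cmd.Parameters.Add(new() { Value = k });

        var results = new List<string>();
        await using var reader = await cmd.ExecuteReaderAsync();
        while (await reader.ReadAsync())
            results.Add(reader.GetString(0));
        return results;
    }
}
```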
Keycloak for SSO integration took longer than the LLM wiring — plan identity first if you are enterprise.
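For anyone else doing this: it is standard OpenID Connect against a Keycloak realm, and the code is the easy part. Realm URL, client id, and secret below are placeholders:

```csharp
using Microsoft.AspNetCore.Authentication.Cookies;
using Microsoft.AspNetCore.Authentication.OpenIdConnect;
using Microsoft.AspNetCore.Builder;
using Microsoft.Extensions.DependencyInjection;

var builder = WebApplication.CreateBuilder(args);

builder.Services
    .AddAuthentication(options =>
    {
        options.DefaultScheme = CookieAuthenticationDefaults.AuthenticationScheme;
        options.DefaultChallengeScheme = OpenIdConnectDefaults.AuthenticationScheme;
    })
    .AddCookie()
    .AddOpenIdConnect(options =>
    {
        // Keycloak serves standard OIDC discovery under /realms/{realm}.
        options.Authority = "https://sso.example.internal/realms/engineering";
        options.ClientId = "internal-copilot";
        options.ClientSecret = builder.Configuration["Oidc:ClientSecret"];
        options.ResponseType = "code";
        options.SaveTokens = true;
    });

var app = builder.Build();
app.UseAuthentication();
app.UseAuthorization();
app.Run();
```

The slow part was the realm, role mapping, and group sync negotiations, not this file.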
Structured logging with correlation ids made debugging hallucinated citations almost tolerable.
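Concretely: one logging scope per request, so every retrieval hit and model call carries the same id. Sketch below; the property names are our convention, nothing standard:

```csharp
using System;
using System.Collections.Generic;
using Microsoft.Extensions.Logging;

public sealed class AnswerPipeline
{
    private readonly ILogger<AnswerPipeline> _logger;
    public AnswerPipeline(ILogger<AnswerPipeline> logger) => _logger = logger;

    public void Answer(string question)
    {
        var correlationId = Guid.NewGuid().ToString("N");
        using (_logger.BeginScope(new Dictionary<string, object>
        {
            ["CorrelationId"] = correlationId,
        }))
        {
            _logger.LogInformation("Retrieving context for {Question}", question);
            // Retrieval, completion, and citation checks all log inside this scope,
            // so a hallucinated citation traces back to the exact chunks retrieved.
        }
    }
}
```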
We wrapped retrieval behind an interface so we could swap BM25 experiments without touching the UI layer.
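The seam is small; the UI only ever sees the interface. Names illustrative:

```csharp
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

public record RetrievedChunk(string DocumentId, string Content, double Score);

public interface IRetriever
{
    Task<IReadOnlyList<RetrievedChunk>> RetrieveAsync(
        string query, int topK, CancellationToken ct = default);
}

// Swap implementations in DI without touching callers:
// services.AddSingleton<IRetriever, PgVectorRetriever>();   // default path
// services.AddSingleton<IRetriever, Bm25Retriever>();       // experiment branch
```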
Cost alarms on tokens per department caught a runaway script someone left in cron — worth every minute to configure.
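Ours is just an in-process counter plus an alert callback; sketch below, with the threshold and alert sink as placeholders:

```csharp
using System;
using System.Collections.Concurrent;

public sealed class TokenBudgetMonitor
{
    private readonly ConcurrentDictionary<string, long> _dailyTokens = new();
    private readonly long _dailyLimitPerDepartment;
    private readonly Action<string, long> _alert;

    public TokenBudgetMonitor(long dailyLimitPerDepartment, Action<string, long> alert)
    {
        _dailyLimitPerDepartment = dailyLimitPerDepartment;
        _alert = alert;
    }

    // Called after every completion with the usage numbers the API returns.
    public void Record(string department, int promptTokens, int completionTokens)
    {
        var total = _dailyTokens.AddOrUpdate(
            department,
            promptTokens + completionTokens,
            (_, running) => running + promptTokens + completionTokens);

        if (total > _dailyLimitPerDepartment)
            _alert(department, total); // e.g. page on-call or post to a channel

    }

    public void ResetDaily() => _dailyTokens.Clear();
}
```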
Python prototype was fast; rewriting the stable path in C# matched our ops skill set and reduced surprises.
We versioned embeddings separately from chunks so re-embedding did not force a full content reupload.
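The split, roughly: chunk content is immutable and keyed by a content hash, while embeddings live in separate rows keyed by chunk plus model version. Field names are illustrative:

```csharp
// Chunk rows never change once ingested.
public record Chunk(
    string ChunkId,          // stable hash of the source content
    string DocumentId,
    string Content);

// One row per (chunk, embedding model version).
public record ChunkEmbedding(
    string ChunkId,          // foreign key to Chunk
    string EmbeddingVersion, // e.g. "text-embedding-3-large/2025-01"
    float[] Vector);

// Re-embedding with a new model inserts ChunkEmbedding rows under a new
// EmbeddingVersion; queries pin the version, and no content is re-uploaded.
```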
Front-end is plain Razor partials — no SPA framework churn for an internal tool that five people maintain.
Synthetic load tests with recorded transcripts exposed a deadlock in our streaming parser before go-live.
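The harness was simple: replay recorded transcripts concurrently against the streaming endpoint and drain every response. Sketch with placeholder paths; the bug only showed up when many slow readers held server-side buffers open at once:

```csharp
using System;
using System.IO;
using System.Linq;
using System.Net.Http;
using System.Threading.Tasks;

public static class TranscriptReplay
{
    private static readonly HttpClient Http =
        new() { BaseAddress = new Uri("https://copilot.internal/") };

    public static async Task RunAsync(string transcriptDir, int concurrency = 50)
    {
        var questions = Directory.EnumerateFiles(transcriptDir, "*.txt")
                                 .Select(File.ReadAllText)
                                 .ToArray();

        var tasks = Enumerable.Range(0, concurrency).Select(async i =>
        {
            var question = questions[i % questions.Length];
            using var request = new HttpRequestMessage(HttpMethod.Post, "ask/stream")
            {
                Content = new StringContent(question),
            };
            // ResponseHeadersRead so we consume the body as it streams,
            // the same way the real UI does.
            using var response = await Http.SendAsync(
                request, HttpCompletionOption.ResponseHeadersRead);
            await using var stream = await response.Content.ReadAsStreamAsync();
            var buffer = new byte[8192];
            while (await stream.ReadAsync(buffer) > 0) { } // drain fully
        });

        await Task.WhenAll(tasks);
    }
}
```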
We publish a monthly 'known bad answers' digest so teams know which workflows still need human review.
Keeping prompts in git let us bisect behaviour the same way we bisect code — huge for incident response.
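Mechanically it is just prompts as plain files in the repo plus a tiny loader, so every release pins the exact prompt text it shipped with. The prompts/ layout below is our convention, nothing standard:

```csharp
using System.Collections.Concurrent;
using System.IO;

public static class PromptStore
{
    private static readonly ConcurrentDictionary<string, string> Cache = new();

    // Prompts live under prompts/<name>.txt, checked in and deployed with the app,
    // so a behaviour regression bisects exactly like a code regression.
    public static string Get(string name) =>
        Cache.GetOrAdd(name, n => File.ReadAllText(Path.Combine("prompts", n + ".txt")));
}
```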
Biggest regret: not budgeting time for content cleanup — garbage Confluence produced garbage retrieval.