Intent design is still 80% of the job.
The model behind the conversation matters less than getting the intent taxonomy right. If your bot can't classify what the user actually wants, no model upgrade will save it. That was true with rule-based NLU, it stayed true with LSTMs, and it's still true with frontier LLMs.
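To make that concrete, here's a minimal sketch of a fixed taxonomy with an explicit out-of-scope bucket. The keyword scoring is a stand-in for whatever classifier actually runs (LLM, NLU model, or rules), and every name here is illustrative, not this site's bot:

```python
# A fixed intent taxonomy with an explicit out-of-scope bucket.
# The keyword scoring is a placeholder for a real classifier;
# all names here are illustrative.
INTENTS = {
    "pricing": ["price", "cost", "plan", "subscription"],
    "support": ["broken", "error", "help", "bug"],
    "contact": ["email", "reach", "talk", "call"],
}
OUT_OF_SCOPE = "out_of_scope"

def classify_intent(message: str) -> str:
    """Best-matching intent, or OUT_OF_SCOPE if nothing clears the bar."""
    tokens = message.lower().split()
    scores = {
        intent: sum(token in keywords for token in tokens)
        for intent, keywords in INTENTS.items()
    }
    best = max(scores, key=scores.get)
    # The threshold is the real design decision: an uncertain match
    # routes to the fallback instead of guessing.
    return best if scores[best] > 0 else OUT_OF_SCOPE

print(classify_intent("how much does the pro plan cost"))   # pricing
print(classify_intent("what's the weather like in Oslo"))   # out_of_scope
```

Swap the scoring function for anything you like; the part that survives every model generation is the taxonomy plus the refusal to guess.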
Hallucination control is a design problem.
The right answer is rarely "tune the model." It's usually "narrow the scope, ground the answer, fall back gracefully." The bot running on this site has a fixed knowledge brief and a single fallback sentence for everything outside that brief. The fallback is the feature.
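Here's what that shape looks like as a sketch, assuming a simple lookup stands in for real retrieval; the brief entries, FALLBACK wording, and function names are invented for illustration:

```python
# "Narrow the scope, ground the answer, fall back gracefully" as code.
# A dict lookup stands in for real retrieval; BRIEF and FALLBACK are
# invented for illustration, not this site's actual bot.
FALLBACK = "I can only answer questions about this site and its author."

BRIEF = {
    "stack": "The bot answers from a fixed knowledge brief, not web search.",
    "latency": "Responses are budgeted to stay conversational.",
}

def retrieve(question: str) -> str | None:
    """Return the brief passage whose topic appears in the question, if any."""
    q = question.lower()
    for topic, passage in BRIEF.items():
        if topic in q:
            return passage
    return None

def answer(question: str) -> str:
    passage = retrieve(question)
    # Grounding rule: no retrieved passage, no generated answer.
    # One fallback sentence covers everything outside the brief.
    return FALLBACK if passage is None else passage

print(answer("what stack does this run on"))  # grounded in the brief
print(answer("who will win the election"))    # the fallback sentence
```

The interesting line is the `None` branch. Hallucination control lives there, not in the generation step.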
Latency budgets shape the product.
A response under 800ms feels conversational. Over two seconds, the conversation breaks. That budget dictates retrieval strategy, model choice, and streaming approach more than any benchmark chart.
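One way the budget shows up in code is as a hard timeout with a graceful degrade. This is a sketch assuming an async model call; generate() and the degrade message are hypothetical, the 800ms figure is the one above:

```python
# A hard latency budget enforced with a timeout. generate() is a
# stand-in for the model call; the degrade message is invented.
import asyncio

BUDGET_S = 0.8  # under ~800ms still feels conversational

async def generate(prompt: str) -> str:
    await asyncio.sleep(2.0)  # simulate a model call that blows the budget
    return "full answer"

async def respond(prompt: str) -> str:
    try:
        return await asyncio.wait_for(generate(prompt), timeout=BUDGET_S)
    except asyncio.TimeoutError:
        # The budget wins: degrade gracefully instead of making the
        # user sit through a broken-feeling pause.
        return "That one needs more time. Try a narrower question."

print(asyncio.run(respond("summarize everything")))
```

Streaming changes the arithmetic: the budget then applies to time-to-first-token rather than the full answer, which is usually what makes the 800ms target reachable at all.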
What changed, and what didn't.
LLMs raised the ceiling on what these systems can do. The floor still takes the same engineering discipline: intents, grounding, latency, fallbacks.
Treat the model as the engine, not the product.