
You are shipping AI hallucinations. Here’s how to stop.
When the AI hallucinates, whose mistake is it?
It’s easy to point at the model.
Easy to blame hallucinations on the complexity of LLMs, the training data, or the pace of AI research. But what if the failure isn’t just computational?
What if the failure is procedural?
What if it’s ours?
We have inherited a powerful toolset capable of generating ideas, patterns, words, screens, and research insights within seconds. But with that, we’ve also inherited a serious risk: losing our critical lens when content appears fluent.
AI outputs often feel finished. They look polished. But polished does not equal truthful.
When AI hands us something persuasive, we too often skip the question:
Where did this come from?
And—can I trust it?
📌 What’s Inside
- Hallucinations aren’t random. They emerge from gaps we failed to fill.
- Are we designing for speed or for trust?
- Prompting is now a core design skill
- Accountability and traceability
🧠Hallucinations aren’t random. They emerge from gaps we failed to fill.
Let’s set the record straight: LLMs don’t “know” facts. They produce statistical likelihoods, not truths. When we leave gaps in prompts or context, they invent. And they do so confidently.
“Hallucinations occur when a generative AI system generates output data that seems plausible but is incorrect or nonsensical,” as Nielsen Norman Group succinctly warns.
Or, some name it plainly:
“Your AI is making shit up.”
Perhaps that’s blunt. But that bluntness lays bare the risk: when a model fabricates, and we don’t validate, we become complicit.
And it’s not rare.
- Legal tools hallucinate 17–33% of the time.
- AI dashboards register 13.5–19% inaccuracies, even in optimistic assessments.
- Healthcare systems see 8–20% error rates.
High fluency gives false confidence. The output sounds right, so it gets through.
“Fundamentally, the problem is that LLMs don’t know when they’ve gotten something correct; they are NOT structured fact databases. Their ‘knowledge’ comes from having processed large parts of the internet (which, as you are probably aware, contains a lot of falsehoods) and learning patterns in that corpus.”
But when a hallucination ships, that’s not a model issue. It’s a design filter failure.
⚡Are we designing for speed or for trust?
Speed is seductive. A neat-looking AI output fast-tracks design. But if its claims aren’t provable, its insights aren’t stable, and its scope creep is invisible.
We must ask ourselves:
Are we accelerating clarity or accelerating risk?
If your workflow involves AI-generated UX copy, research summaries, or UI suggestions without traceability, what you’re shipping is volatility, not vision.
🛠️Prompting is now a core design skill
We’ve built design systems to reduce ambiguity. We define tokens, components, and voice guidelines. We test patterns. Why do we trust improvised, off-the-cuff prompts to behave any differently?
If prompt engineering remains ad hoc, your system has a silent blind spot: hallucinations.
Thought exercise: If the model invents a consent policy and legal catches it too late, did the model fail, or did the prompt?
“Treat every prompt like a design brief. If your prompt wouldn’t pass design review, neither should the output.”
This means the UX discipline must now formally include prompt architecture as part of its design system. Not ad hoc. Not siloed. Not treated as a “power user” skill. Prompting is now a core part of how we construct, test, and ship interface content.
So what belongs in a production-grade prompt?
1️⃣ Explicit role definition
For instance, start with: “You are a UX writer working within a medication reminder app for older adults.”
This primes the model toward behavioural patterns relevant to your audience and voice. NN/g confirms that a lack of role or user context increases hallucination likelihood by encouraging vague, generalised output.
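A minimal sketch of what that looks like in practice, assuming a typical chat-style API where a system message carries the role (the exact client call will vary by vendor):

```python
# Sketch: pin down who the model is before it sees any task.
# The "system" + "user" message shape follows the common chat-API convention;
# adapt it to whichever client your team actually uses.
ROLE = (
    "You are a UX writer working within a medication reminder app "
    "for older adults. Plain language, no jargon, no medical claims."
)

messages = [
    {"role": "system", "content": ROLE},                                   # who the model is
    {"role": "user", "content": "Draft the reminder confirmation copy."},  # the task
]
```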
2️⃣ Scoped task instruction
Don’t ask for “onboarding copy.” Ask for:
“3 onboarding microcopy snippets for screens 1–3 of our welcome flow. Each under 100 characters. Friendly, non-patronising, clear.”
Prompts that leave scope undefined invite the model to invent functionality or structure that doesn’t exist.
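One way to keep that scope from drifting is to parameterise it, so the numbers and tone rules travel with every request instead of living in someone’s head. A hypothetical helper:

```python
# Hypothetical helper: the scope is explicit and reusable, not improvised per request.
def scoped_task(screens: str, count: int, max_chars: int, tone: str) -> str:
    return (
        f"Write {count} onboarding microcopy snippets for screens {screens} "
        f"of our welcome flow. Each under {max_chars} characters. "
        f"Tone: {tone}. Do not invent features or screens beyond this scope."
    )

prompt = scoped_task(screens="1-3", count=3, max_chars=100,
                     tone="friendly, non-patronising, clear")
```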
3️⃣ Constraints and system truth injection
We work within limits. So should AI. Supply content length caps, terminology to avoid, tone rules, and any other constraints you’re working with.
Don’t rely on the model’s training. Feed it your product feature specs, internal documentation snippets, pre-approved legal language, and so on. Be as specific as you can.
“LLMs don’t know your company, your users, or your systems. They know approximations.”
If your output is based on unstated assumptions, you’ve already lost control of the truth.
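In code, that can be as simple as pasting your own documents into the prompt and telling the model to treat them as the only source of truth. A sketch, with illustrative file paths:

```python
from pathlib import Path

# Illustrative paths: point these at your real spec and approved legal copy.
feature_spec = Path("docs/reminders_spec.md").read_text()
approved_legal = Path("docs/legal_approved_blocks.md").read_text()

constraints = """
- Max 100 characters per snippet.
- Avoid the words "guarantee", "diagnose", "cure".
- If the material below does not cover something, answer "NOT IN SPEC" instead of guessing.
"""

grounded_prompt = (
    "Use ONLY the material below as your source of truth.\n\n"
    f"## Feature spec\n{feature_spec}\n\n"
    f"## Approved legal language\n{approved_legal}\n\n"
    f"## Constraints\n{constraints}\n"
    "## Task\nWrite the confirmation message shown after a reminder is set."
)
```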
4️⃣ Output format control
Define what good looks like. Don’t just ask “write copy.” Instead:
- Use tables with named fields
- Force bullet-point summaries
- Request JSON if the output feeds a dev workflow
- Spell out exclusions and structure: no brand names, no external links, structured tags where relevant
Structure is friction, but friction is what ultimately catches hallucinations.
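Asking for a machine-checkable format is what makes that friction enforceable: if the model is told to return JSON with known keys, a few lines of validation can reject anything that breaks the contract before a human ever reads it. A sketch:

```python
import json

# Format contract given to the model alongside the task.
FORMAT_RULES = (
    "Return ONLY a JSON array of objects with keys 'screen' (integer) and "
    "'copy' (string). No brand names, no external links, no extra prose."
)

def validate(raw: str) -> list[dict]:
    """Reject any output that breaks the contract before a human reads it."""
    items = json.loads(raw)  # raises on malformed or chatty output
    for item in items:
        assert set(item) == {"screen", "copy"}, f"unexpected keys: {item}"
        assert isinstance(item["screen"], int), "screen must be an integer"
        assert len(item["copy"]) <= 100, "copy exceeds the 100-character cap"
    return items
```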
🧾Accountability and traceability
Accountability means designing with friction. Intentional, protective, visible friction.
Every AI output should be scrutinised on:
What assumptions did it make?
Where are the sources?
Who validated this before it entered the product?
Perhaps the crux is treating hallucination as the default, not the edge case: every AI output is suspect until proven otherwise.
Generative AI tools are great for reducing tedium and filling skill gaps, but be careful about overextending your trust in these tools or overestimating their abilities. Users should interact with genAI only to the extent they can check the AI’s outputs using their own knowledge and skills.
AI sometimes hallucinates because it’s built to generate language, not to understand reality.
It doesn’t “know” things the way we do; it just predicts what words are likely to come next based on patterns it’s seen before.
So when we ask it something tricky, or something it hasn’t seen enough examples of, it fills in the blanks with whatever sounds plausible. We might call that lying; really, it’s guessing with confidence.
The problem is the confidence. Because when an AI speaks with authority, it’s easy to mistake its output for truth.
And unless we verify, it’s misleading by default.
If your interface implies a capability your product lacks, what will users conclude?
Unchecked hallucinations are ethical failures.
It’s therefore worth pausing to ask:
• Are our prompt templates clearly documented, version-controlled, and annotated for clarity?
• Do we require source material before generating outputs, or are we letting the model guess?
• Are we labelling confidence levels in AI-generated content so users know what’s solid and what’s speculative?
• Do we audit AI outputs with the same rigour we apply to localisation, accessibility, or UX writing?
If your team answered “no” to any of these, it’s a signal to rethink the approach. The conversation needs to evolve from “what can AI do for us?” to “what must we do to work responsibly with AI?”
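As a rough illustration of the first two questions on that list, a documented, version-controlled prompt template might look something like this (the fields and their names are assumptions, not a standard):

```python
# Hypothetical: a prompt template that lives in the repo, is reviewed like any
# other design asset, and refuses to run without source material attached.
PROMPT_TEMPLATE = {
    "id": "onboarding-microcopy",
    "version": "1.2.0",                 # bumped and reviewed like a component
    "owner": "ux-writing",
    "requires_source_material": True,   # no spec attached, no generation
    "template": (
        "You are a UX writer for {product}.\n"
        "Use ONLY the attached source material as your source of truth.\n"
        "{task}"
    ),
}

def render(template: dict, product: str, task: str, source_material: str = "") -> str:
    # Refuse to generate without grounding material attached.
    if template["requires_source_material"] and not source_material:
        raise ValueError("Refusing to generate without source material attached.")
    prompt = template["template"].format(product=product, task=task)
    return f"{prompt}\n\n## Source material\n{source_material}"
```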
AI will continue to hallucinate. That’s inevitable.
But shipping those hallucinations is a matter of choice.
📚 Sources & Further Reading
- Nielsen Norman Group – “AI Hallucinations: What Designers Need to Know”
- Nielsen Norman Group – “When Should We Trust AI? Magic-8-Ball Thinking”
- Wojciech Bolikowski – “Your AI Is Making Shit Up: The Brutal Reality of Hallucinations”