Getting a Handle on AI Hallucinations

When AI starts to see things, it's time to take immediate corrective action. Here's how to successfully address this challenging -- and potentially dangerous -- phenomenon.

John Edwards, Technology Journalist & Author

November 11, 2024

AI hallucination occurs when an AI model, such as a large language model (LLM) powering a generative AI chatbot, or a computer vision tool, perceives patterns or objects that are nonexistent or imperceptible to human observers, producing outputs that are inaccurate or nonsensical.

AI hallucinations can pose a significant challenge, particularly in high-stakes fields where accuracy is crucial, such as the energy industry, life sciences and healthcare, technology, finance, and legal sectors, says Beena Ammanath, head of technology trust and ethics at business advisory firm Deloitte. With the emergence of generative AI, validating outputs has become even more critical for risk mitigation and governance, she states in an email interview. "While AI systems are becoming more advanced, hallucinations can undermine trust and, therefore, limit the widespread adoption of AI technologies."

Primary Causes

AI hallucinations are primarily caused by the nature of generative AI and LLMs, which rely on vast amounts of data to generate predictions, Ammanath says. "When the AI model lacks sufficient context, it may attempt to fill in the gaps by creating plausible-sounding, but incorrect, information." This can occur due to incomplete training data, bias in the training data, or ambiguous prompts, she notes.

LLMs are generally trained for specific tasks, such as predicting the next word in a sequence, observes Swati Rallapalli, a senior machine learning research scientist in the AI division of the Carnegie Mellon University Software Engineering Institute. "These models are trained on terabytes of data from the Internet, which may include uncurated information," she explains in an online interview. "When generating text, the models produce outputs based on the probabilities learned during training, so outputs can be unpredictable and misrepresent facts."
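To make the point about probabilities concrete, here is a minimal Python sketch of how a language model might pick its next word: raw scores are turned into probabilities with a softmax, and a token is sampled from them. The vocabulary and scores below are invented for illustration and do not come from any real model; production systems also layer other decoding strategies on top. Even when the correct answer is the most likely token, a plausible wrong one is still sampled a sizable share of the time.

```python
import math
import random

def softmax(logits, temperature=1.0):
    """Convert raw scores into probabilities; a lower temperature sharpens the distribution."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def sample_next_token(vocab, logits, temperature=1.0, rng=random):
    """Pick the next token at random, weighted by its probability."""
    probs = softmax(logits, temperature)
    return rng.choices(vocab, weights=probs, k=1)[0]

# Hypothetical scores a model might assign after "The capital of Australia is"
vocab = ["Canberra", "Sydney", "Melbourne"]
logits = [2.0, 1.5, 0.5]

rng = random.Random(0)  # fixed seed so the run is repeatable
samples = [sample_next_token(vocab, logits, rng=rng) for _ in range(1000)]
for city in vocab:
    print(city, samples.count(city))
```

Lowering `temperature` concentrates probability on the top-scoring token, which makes output more repeatable, but it does not make the output correct if the model's learned scores are themselves wrong.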

Detection Approaches

Depending on the specific application, hallucination metric tools such as AlignScore can be trained to assess whether the information in one text input is consistent with, and supported by, another, such as a generated summary and its source document. Yet automated metrics don't always work effectively. "Using multiple metrics together, such as AlignScore with metrics like BERTScore, may improve the detection," Rallapalli says.
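The idea of combining metrics can be sketched in a few lines of Python. Because AlignScore and BERTScore are trained models, this example substitutes two deliberately simple, hypothetical stand-ins: a word-overlap F1 score for general similarity and a "support ratio" that checks how many of the output's content words appear in the source. An output is flagged if either score falls below its threshold; both thresholds are arbitrary assumptions.

```python
import re
from collections import Counter

STOPWORDS = {"the", "a", "an", "of", "in", "on", "to", "and", "is", "was", "for", "by", "with"}

def tokens(text):
    """Lowercase word/number tokens with common stopwords removed."""
    return [w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOPWORDS]

def unigram_f1(candidate, reference):
    """Crude lexical similarity; a stand-in for an embedding metric such as BERTScore."""
    cand, ref = Counter(tokens(candidate)), Counter(tokens(reference))
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)

def support_ratio(claim, source):
    """Share of the claim's content words found in the source; a stand-in
    for a consistency model such as AlignScore."""
    claim_words = set(tokens(claim))
    if not claim_words:
        return 1.0
    return len(claim_words & set(tokens(source))) / len(claim_words)

def flag_hallucination(output, source, f1_min=0.2, support_min=0.8):
    """Combine both metrics: flag the output if either score is below its threshold."""
    scores = {"f1": unigram_f1(output, source), "support": support_ratio(output, source)}
    flagged = scores["f1"] < f1_min or scores["support"] < support_min
    return flagged, scores

# Hypothetical source text and two generated outputs
source = "The plant opened in 2019 in Ohio and employs 300 workers."
faithful = "The plant opened in 2019 and employs 300 workers."
altered = "The plant opened in 2015 in Texas and employs 900 workers."
for label, text in [("faithful", faithful), ("altered", altered)]:
    print(label, flag_hallucination(text, source))
```

A real deployment would swap in the actual models and tune thresholds on labeled examples; the sketch only shows the structure, in which each metric gets a veto over outputs the other might let through.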

Another established way to minimize hallucinations is retrieval-augmented generation (RAG), in which text relevant to the request is retrieved from established databases and supplied to the model so that it can reference that text when generating its output. "There's also research in the area of fine-tuning models on curated datasets for factual correctness," Rallapalli says.
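A minimal RAG sketch in Python follows. It assumes a tiny in-memory knowledge base and ranks passages by shared words, a toy stand-in for the vector or keyword search a production system would use. The hypothetical `build_grounded_prompt` function places the retrieved text ahead of the question and instructs the model to answer only from it; the call to the model itself is omitted because it depends on the provider.

```python
import re

def words(text):
    """Lowercase word/number set used for naive matching."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(query, documents, k=2):
    """Rank documents by the number of words shared with the query and return the top k.
    This is a toy stand-in for a real search index."""
    q = words(query)
    ranked = sorted(documents, key=lambda d: len(q & words(d)), reverse=True)
    return ranked[:k]

def build_grounded_prompt(query, documents, k=2):
    """Prepend the retrieved passages and tell the model to answer only from them."""
    passages = retrieve(query, documents, k)
    context = "\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages))
    return (
        "Answer using only the numbered sources below. "
        "If they do not contain the answer, say you don't know.\n\n"
        f"Sources:\n{context}\n\nQuestion: {query}\nAnswer:"
    )

# Hypothetical knowledge base
knowledge_base = [
    "Policy 12: Refund requests must be filed within 30 days of purchase.",
    "Policy 7: Employees accrue 1.5 vacation days per month.",
    "Policy 3: Remote work requires manager approval.",
]
question = "How many days do customers have to request a refund?"
prompt = build_grounded_prompt(question, knowledge_base, k=1)
print(prompt)
```

The instruction to admit when the sources lack an answer matters as much as the retrieval: without it, a model given irrelevant passages may still fill the gap with invented detail.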

Yet even combining multiple existing metrics may not guarantee hallucination detection, so further research is needed to develop more effective metrics for detecting inaccuracies, Rallapalli says. "For example, comparing multiple AI outputs could detect if there are parts of the output that are inconsistent across different outputs or, in case of summarization, chunking up the summaries could better detect if the different chunks are aligned with facts within the original article." Such methods could help detect hallucinations better, she notes.
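The first idea Rallapalli describes, comparing several outputs and checking them chunk by chunk, can be sketched as follows. This hypothetical Python example splits one sampled answer into sentences and checks whether each sentence is supported by the other samples, using simple word overlap as an assumed proxy for agreement. Sentences that most of the other samples do not back up are flagged as likely hallucinations. The same loop could compare summary chunks against the original article instead of against other samples.

```python
import re

STOPWORDS = {"the", "a", "an", "of", "in", "and", "is", "was"}

def sentences(text):
    """Split text into sentence-sized chunks at ., !, or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def content_words(text):
    """Lowercase words and numbers, minus a few common stopwords."""
    return set(re.findall(r"[a-z0-9]+", text.lower())) - STOPWORDS

def is_supported(chunk, other_output, min_overlap=0.7):
    """Treat a chunk as supported if most of its content words appear in the other output."""
    words = content_words(chunk)
    if not words:
        return True
    return len(words & content_words(other_output)) / len(words) >= min_overlap

def inconsistent_chunks(outputs, min_agreement=0.5):
    """Return sentences of the first output that fewer than min_agreement
    of the remaining outputs support."""
    primary, others = outputs[0], outputs[1:]
    if not others:
        raise ValueError("need at least two sampled outputs to compare")
    flagged = []
    for chunk in sentences(primary):
        votes = sum(is_supported(chunk, other) for other in others)
        if votes / len(others) < min_agreement:
            flagged.append(chunk)
    return flagged

# Three hypothetical answers sampled from the same prompt
samples = [
    "The plant opened in 2019. It employs 300 workers. It runs entirely on solar power.",
    "Opened in 2019, the plant employs 300 workers.",
    "The plant opened in 2019 and employs 300 workers. It uses natural gas.",
]
print(inconsistent_chunks(samples))  # prints ['It runs entirely on solar power.']
```

Word overlap is a weak signal; in practice each chunk-versus-sample comparison would more likely use a trained consistency model, but the voting structure stays the same.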

Ammanath believes that detecting AI hallucinations requires a multi-pronged approach. She notes that human oversight, in which AI-generated content is reviewed by experts who can cross-check facts, is sometimes the only reliable way to curb hallucinations. "For example, if using generative AI to write a marketing email, the organization might have a higher tolerance for error, as faults or inaccuracies are likely to be easy to identify and the outcomes are lower stakes for the enterprise," Ammanath explains. Yet when it comes to applications that include mission-critical business decisions, error tolerance must be low. "This makes a 'human-in-the-loop,' someone who validates model outputs, more important than ever before."
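The tiered error tolerance Ammanath describes can be expressed as a simple routing rule. In this hypothetical Python sketch, every high-stakes output goes to a human reviewer, while low-stakes content is auto-approved only if an automated consistency score, assumed to range from 0 to 1, clears a threshold. The tier names and the 0.6 threshold are illustrative assumptions, not a recommended policy.

```python
from dataclasses import dataclass

@dataclass
class Draft:
    text: str
    stakes: str               # hypothetical tiers: "low" (e.g., marketing email) or "high" (e.g., business decision)
    consistency_score: float  # assumed output of an automated checker, 0.0 to 1.0

def route(draft, low_stakes_min_score=0.6):
    """Decide whether a generated draft can ship or needs a human-in-the-loop."""
    if draft.stakes == "high":
        return "human review"  # low error tolerance: a person validates every output
    if draft.consistency_score >= low_stakes_min_score:
        return "auto-approve"  # higher tolerance, and the checker found no obvious problem
    return "human review"

print(route(Draft("Spring sale announcement", "low", 0.9)))
print(route(Draft("Credit limit recommendation", "high", 0.99)))
```

Routing high-stakes drafts to a person regardless of score reflects the point that automated metrics alone do not guarantee detection.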

Hallucination Training

The best way to minimize hallucinations is to build your own pre-trained foundation generative AI model, advises Scott Zoldi, chief analytics officer at analytics software company FICO. He notes via email that many organizations are already using, or planning to use, this approach with focused-domain and task-based models. "By doing so, one can have critical control of the data used in pre-training -- where most hallucinations arise -- and can constrain the use of context augmentation to ensure that such use doesn't increase hallucinations but reinforces relationships already in the pre-training."

Beyond building their own focused generative models, organizations need to minimize the harm that hallucinations can cause, Zoldi says. "[Enterprise] policy should prioritize a process for how the output of these tools will be used in a business context and then validate everything," he suggests.

A Final Thought

To prepare the enterprise for a bold and successful future with generative AI, it's necessary to understand the nature and scale of the risks, as well as the governance tactics that can help mitigate them, Ammanath says. "AI hallucinations help to highlight both the power and limitations of current AI development and deployment."

About the Author

John Edwards

Technology Journalist & Author

John Edwards is a veteran business technology journalist. His work has appeared in The New York Times, The Washington Post, and numerous business and technology publications, including Computerworld, CFO Magazine, IBM Data Management Magazine, RFID Journal, and Electronic Design. He has also written columns for The Economist's Business Intelligence Unit and PricewaterhouseCoopers' Communications Direct. John has authored several books on business technology topics. His work began appearing online as early as 1983. Throughout the 1980s and 90s, he wrote daily news and feature articles for both the CompuServe and Prodigy online services. His "Behind the Screens" commentaries made him the world's first known professional blogger.
