Getting a Handle on AI Hallucinations

When AI starts to see things, it's time to take immediate corrective action. Here's how to successfully address this challenging -- and potentially dangerous -- phenomenon.

John Edwards, Technology Journalist & Author

November 11, 2024

5 Min Read
Abstract background of cyborg face and technology.Big data and learning machine.Algorithm programming and artificial intelligence concept
Carloscastilla via Alamy Stock Photo

AI hallucination occurs when a large language model (LLM) -- frequently a generative AI chatbot or computer vision tool -- perceives patterns or objects that are nonexistent or imperceptible to human observers, generating outputs that are either inaccurate or nonsensical. 

AI hallucinations can pose a significant challenge, particularly in high-stakes fields where accuracy is crucial, such as the energy industry, life sciences and healthcare, technology, finance, and legal sectors, says Beena Ammanath, head of technology trust and ethics at business advisory firm Deloitte. With generative AI's emergence, the importance of validating outputs has become even more critical for risk mitigation and governance, she states in an email interview. "While AI systems are becoming more advanced, hallucinations can undermine trust and, therefore, limit the widespread adoption of AI technologies." 

Primary Causes 

AI hallucinations are primarily caused by the nature of generative AI and LLMs, which rely on vast amounts of data to generate predictions, Ammanath says. "When the AI model lacks sufficient context, it may attempt to fill in the gaps by creating plausible sounding, but incorrect, information." This can occur due to incomplete training data, bias in the training data, or ambiguous prompts, she notes. 

Related:How to Find and Train Internal AI Talent

LLMs are generally trained for specific tasks, such as predicting the next word in a sequence, observes Swati Rallapalli, a senior machine learning research scientist in the AI division of the Carnegie Mellon University Software Engineering Institute. "These models are trained on terabytes of data from the Internet, which may include uncurated information," she explains in an online interview. "When generating text, the models produce outputs based on the probabilities learned during training, so outputs can be unpredictable and misrepresent facts." 

Detection Approaches 

Depending on the specific application, hallucination metrics tools, such as AlignScore, can be trained to capture any similarity between two text inputs. Yet automated metrics don't always work effectively. "Using multiple metrics together, such as AlignScore, with metrics like BERTScore, may improve the detection," Rallapalli says. 

Another established way to minimize hallucinations is by using retrieval augmented generation (RAG), in which the model references the text from established databases relevant to the output. "There's also research in the area of fine-tuning models on curated datasets for factual correctness," Rallapalli says. 

Related:Defining an AI Governance Policy

Yet even using existing multiple metrics may not fully guarantee hallucination detection. Therefore, further research is needed to develop more effective metrics to detect inaccuracies, Rallapalli says. "For example, comparing multiple AI outputs could detect if there are parts of the output that are inconsistent across different outputs or, in case of summarization, chunking up the summaries could better detect if the different chunks are aligned with facts within the original article." Such methods could help detect hallucinations better, she notes. 

Ammanath believes that detecting AI hallucinations requires a multi-pronged approach. She notes that human oversight, in which AI-generated content is reviewed by experts who can cross-check facts, is sometimes the only reliable way to curb hallucinations. "For example, if using generative AI to write a marketing e-mail, the organization might have a higher tolerance for error, as faults or inaccuracies are likely to be easy to identify and the outcomes are lower stakes for the enterprise," Ammanath explains. Yet when it comes to applications that include mission-critical business decisions, error tolerance must be low. "This makes a 'human-in the-loop', someone who validates model outputs, more important than ever before." 

Related:Preparing for AI-Augmented Software Engineering

Hallucination Training 

The best way to minimize hallucinations is by building your own pre-trained fundamental generative AI model, advises Scott Zoldi, chief analytics officer at analytics software company FICO. He notes, via email, that many organizations are now already using, or planning to use, this approach utilizing focused-domain and task-based models. "By doing so, one can have critical control of the data used in pre-training -- where most hallucinations arise -- and can constrain the use of context augmentation to ensure that such use doesn't increase hallucinations but re-enforces relationships already in the pre-training." 

Outside of building your own focused generative models, one needs to minimize harm created by hallucinations, Zoldi says. "[Enterprise] policy should prioritize a process for how the output of these tools will be used in a business context and then validate everything," he suggests. 

A Final Thought 

To prepare the enterprise for a bold and successful future with generative AI, it's necessary to understand the nature and scale of the risks, as well as the governance tactics that can help mitigate them, Ammanath says. "AI hallucinations help to highlight both the power and limitations of current AI development and deployment." 

About the Author

John Edwards

Technology Journalist & Author

John Edwards is a veteran business technology journalist. His work has appeared in The New York Times, The Washington Post, and numerous business and technology publications, including Computerworld, CFO Magazine, IBM Data Management Magazine, RFID Journal, and Electronic Design. He has also written columns for The Economist's Business Intelligence Unit and PricewaterhouseCoopers' Communications Direct. John has authored several books on business technology topics. His work began appearing online as early as 1983. Throughout the 1980s and 90s, he wrote daily news and feature articles for both the CompuServe and Prodigy online services. His "Behind the Screens" commentaries made him the world's first known professional blogger.

Never Miss a Beat: Get a snapshot of the issues affecting the IT industry straight to your inbox.

You May Also Like


More Insights