Large language models (LLMs) have, perhaps more than anything else within the field of artificial intelligence, embedded AI in the public psyche, opening people’s eyes to both its risks and rewards.
Their appeal is rooted in their extraordinary capabilities and the futuristic narrative that surrounds them. Although the record has since been surpassed by Meta's Threads rollout, ChatGPT became the fastest-growing app ever when it reached 100 million users within two months of its launch in November 2022. New users, however, swiftly encountered a phenomenon that has been a topic of discussion in the AI community for some time – LLMs hallucinate.
Examples of LLM hallucinations include answering confidently when asked for the weather forecast in a fictional city or providing fabricated references in an academic context. But while “hallucination” has emerged as a blanket term for all LLM inaccuracies, the situation is more nuanced in reality. There are, in fact, different types of hallucination, with specific characteristics separating them from one another.
In this article, we outline the four primary categories of hallucination and explore how each contributes to the unique, and sometimes puzzling, behaviour of these advanced AI systems.
A hallucination occurs when an AI model generates information that is neither present in, nor correctly inferred from, its training data. This stems from the probabilistic nature of these models: they are trained to maximise the likelihood of their output given the input, not to verify that the output is true, which often leads to plausible-sounding but incorrect or nonsensical outputs.
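As a rough illustration of that probabilistic behaviour, the toy sketch below turns a set of next-token scores into probabilities and picks the most likely continuation. The scores, the token names, and the prompt are all invented for illustration and do not come from any real model. The point is only that nothing in the selection step checks whether the chosen continuation is true.

```python
import math

def softmax(logits):
    """Convert raw scores into a probability distribution."""
    peak = max(logits.values())  # subtract the maximum for numerical stability
    exps = {tok: math.exp(score - peak) for tok, score in logits.items()}
    total = sum(exps.values())
    return {tok: value / total for tok, value in exps.items()}

# Hypothetical next-token scores (invented for illustration) for a prompt such as
# "The weather tomorrow in <fictional city> will be ..."
logits = {"sunny": 2.1, "rainy": 1.7, "unknown": 0.3}

probs = softmax(logits)
most_likely = max(probs, key=probs.get)

# The fluent continuation wins on probability alone; the honest "unknown"
# has the lowest score even though the city does not exist.
print(most_likely, round(probs[most_likely], 2))
```

Real models work over vocabularies of many thousands of tokens and usually sample rather than always taking the top choice, but the selection principle is the same.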
Engineers are actively taking steps towards preventing LLM hallucinations. The practice of hallucination detection – across all forms of generative AI, but particularly LLMs – identifies when machine learning models generate false or unsupported information. It aims to improve AI reliability by flagging such instances so that they can be corrected.
It has been suggested that the term 'hallucination' is an inaccurate metaphor for the AI training and response process. For example, in their paper 'False Responses From Artificial Intelligence Models Are Not Hallucinations', Professor Søren Dinesen Østergaard and Kristoffer Nielbo propose that "false analogy", "hasty generalizations", "false responses", or "misleading responses" are in fact more appropriate. Criticism of the term 'hallucination' in a machine learning context has also stemmed from those who believe that it reinforces stigma around neurological or mental illnesses, which is particularly pertinent given AI's links to the fields of medicine and psychiatry.
However, for the purposes of this article, we will describe the different types of model inaccuracies which are, in the field of AI and now more broadly across the general public, described as 'hallucinations'.
Dialogue history-based hallucinations occur when an LLM mixes up names or relations of entities. For example, if the user mentions that their friend John likes hiking, and later says their uncle Mark is coming to visit, the AI might incorrectly link John and Mark together as the same person due to faulty recall. Furthermore, during a conversation, an LLM can create new incorrect inferences based on previous errors within the dialogue history, further distorting the context and content of a conversation in a snowball effect.
It is important to remember what causes hallucinations in LLMs. These mistakes often occur in dialogue because LLMs rely on pattern recognition and statistics. Without a grounding in common sense or factual knowledge, LLMs can get lost and generate hallucinations.
Abstractive summarisation systems, a common application of LLMs, generate summaries of textual information in new wording, often for the purposes of making a piece of text more coherent and comprehensible.
Despite their usefulness in condensing information, abstractive summarisation systems can be prone to errors or semantic transformations between the original and generated data, triggering a hallucination in an LLM’s output.
Again, this is because they lack true comprehension of the source text, instead relying on pattern recognition and statistics. They may, as a result, distort or even entirely fabricate details, inferring unsupported causal relationships or retrieving unrelated background knowledge.
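One crude way to spot this kind of error is to check how much of each summary sentence can be traced back to the source. The sketch below is a minimal illustration under stated assumptions, not a production detector: the function names, the stopword list, the 0.6 threshold, and the example texts are all hypothetical. It relies on simple word overlap, so it will miss paraphrases and cannot judge meaning.

```python
import re

# Small illustrative stopword list (an assumption, not a standard resource)
STOPWORDS = {"a", "an", "the", "of", "in", "on", "to", "and",
             "is", "was", "were", "by", "for", "that", "it"}

def content_words(text):
    """Lowercase word tokens with common stopwords removed."""
    return {w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOPWORDS}

def unsupported_sentences(source, summary, threshold=0.6):
    """Return summary sentences whose content words are mostly absent from the source."""
    source_words = content_words(source)
    flagged = []
    for sentence in re.split(r"(?<=[.!?])\s+", summary.strip()):
        words = content_words(sentence)
        if not words:
            continue
        # Fraction of the sentence's content words that also appear in the source
        support = len(words & source_words) / len(words)
        if support < threshold:
            flagged.append(sentence)
    return flagged

# Invented example: the second summary sentence asserts a cause the source never states
source = "The company reported higher revenue in March. Staff numbers were unchanged."
summary = "The company reported higher revenue in March. Revenue rose because of new hires."

print(unsupported_sentences(source, summary))
```

Here the second summary sentence is flagged because it introduces a causal claim with almost no support in the source. More robust approaches typically compare meaning rather than surface words.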
In generative question answering, a hallucination occurs when an LLM makes an erroneous inference from its source information and arrives at an incorrect answer to a user's question. This can happen even when relevant source material is provided.
For example, if a user asks, "Which private research university is located in Chestnut Hill, Massachusetts - Boston College or Stanford University?" and context is provided stating Boston College is located in Chestnut Hill, an LLM may still incorrectly respond "Stanford University" due to its own prior knowledge about Stanford being a top private research university. Rather than accurately recalling the pre-existing source information, the model ignores the evidence and makes an unjustified inference based on its existing knowledge.
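The Chestnut Hill example can be turned into a simple grounding check: before accepting an answer, confirm that the provided context actually links it to the thing being asked about. The sketch below is a minimal illustration. The function name `supported_answers`, the `anchor` parameter, and the exact wording of the context string are assumptions made for this example, and plain substring matching is far weaker than what a real system would need.

```python
def supported_answers(candidates, context, anchor):
    """Return the candidates that share a sentence with the anchor phrase in the context."""
    # Naive sentence split on full stops; adequate for this short example only
    sentences = [s.strip().lower() for s in context.split(".") if s.strip()]
    anchor = anchor.lower()
    return [
        c for c in candidates
        if any(c.lower() in s and anchor in s for s in sentences)
    ]

# Hypothetical context passage, worded to match the example in the text
context = ("Boston College is a private research university "
           "located in Chestnut Hill, Massachusetts.")
candidates = ["Boston College", "Stanford University"]

grounded = supported_answers(candidates, context, anchor="Chestnut Hill")
print(grounded)

# A model response can then be compared against the grounded candidates
model_answer = "Stanford University"
print(model_answer in grounded)
```

An answer of "Stanford University" fails the check because the context never mentions it alongside Chestnut Hill, which is exactly the case where the model has fallen back on prior knowledge instead of the supplied evidence.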
In the context of large language models, a general data generation hallucination refers to a situation where the model generates outputs that may appear plausible or coherent but are not supported by factual or reliable information. It is a type of error where the model fabricates details or makes assumptions that go beyond the data it has been trained on. This can result in the generation of false or misleading information that may seem convincing to humans but lacks a proper factual basis. Unlike other types of hallucination, the root cause of a general data generation hallucination is an overextension beyond training data rather than an incorrect inference or lack of grounding. The model essentially imagines new information that isn't warranted by its training data.
Holistic AI are global leaders in AI governance, risk management, and regulatory compliance. We are dedicated to ensuring that organisations are able to navigate AI's complex terrain and harness the power of technologies like LLMs with confidence and efficiency.
Schedule a call with our expert team to discover how our innovative solutions and AI risk management platform can help.
Written by Adam Williams, Content Writer at Holistic AI.
DISCLAIMER: This blog article is for informational purposes only. This blog article is not intended to, and does not, provide legal advice or a legal opinion. It is not a do-it-yourself guide to resolving legal issues or handling litigation. This blog article is not a substitute for experienced legal counsel and does not provide legal advice regarding any situation or employer.