Artificial intelligence (AI) startup Mendel and the University of Massachusetts Amherst (UMass Amherst) have jointly published a study on detecting hallucinations in AI-generated medical summaries.
The study evaluated medical summaries generated by two large language models (LLMs), GPT-4o and Llama-3. It classifies hallucinations into five categories based on where they occur in the structure of a medical note: patient information, patient history, symptoms / diagnosis / surgical procedures, medicine-related instructions, and follow-up.
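As an illustration only, the five categories could be captured in a simple annotation schema like the Python sketch below. The class and field names are assumptions for the purpose of illustration; the study's actual labelling format is not described in this article.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class NoteSection(Enum):
    """Sections of a medical note where a hallucination can occur,
    mirroring the five categories described in the study."""
    PATIENT_INFORMATION = "patient information"
    PATIENT_HISTORY = "patient history"
    SYMPTOMS_DIAGNOSIS_PROCEDURES = "symptoms / diagnosis / surgical procedures"
    MEDICINE_INSTRUCTIONS = "medicine-related instructions"
    FOLLOW_UP = "follow-up"


@dataclass
class HallucinationLabel:
    """One labelled span in an AI-generated summary (illustrative schema only)."""
    section: NoteSection             # where in the note structure the error occurs
    summary_text: str                # the offending text in the generated summary
    source_evidence: Optional[str]   # supporting text in the source note, if any
    kind: str                        # e.g. "incorrect" or "too general"


# Hypothetical example of a labelled hallucination
label = HallucinationLabel(
    section=NoteSection.MEDICINE_INSTRUCTIONS,
    summary_text="Patient was advised to double the daily dose.",
    source_evidence=None,
    kind="incorrect",
)
print(label.section.value, "->", label.kind)
```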
The study found that summaries created by AI models can “generate content that is incorrect or too general according to information in the source clinical notes”, a phenomenon known as faithfulness hallucination. AI hallucinations are a well-documented problem. Google’s use of AI in its search engine has produced some absurd responses, such as recommending “eating one small rock per day” and “adding non-toxic glue to pizza to stop it from sticking”. In the case of medical summaries, however, such hallucinations can undermine the reliability and accuracy of medical records.
The pilot study prompted GPT-4o and Llama-3 to create 500-word summaries of 50 detailed medical notes. The researchers found that GPT-4o produced 21 summaries with incorrect information and 50 summaries with generalised information, while Llama-3 produced 19 and 47, respectively. The researchers noted that Llama-3 tended to report details “as is” in its summaries, whilst GPT-4o made “bold, two-step reasoning statements” that can lead to hallucinations.
The use of AI in healthcare has been increasing in recent years, and GlobalData expects global revenue for AI platforms across healthcare to reach an estimated $18.8bn by 2027. There have also been calls to integrate AI with electronic health records to support clinical decision-making.
GlobalData is the parent company of Clinical Trials Arena.
The UMass Amherst and Mendel study establishes the need for a hallucination detection system to boost the reliability and accuracy of AI-generated summaries. The research found that it took a well-trained clinician 92 minutes on average to label a single AI-generated summary, which can be expensive. To overcome this, the research team employed Mendel’s Hypercube system to detect hallucinations.
The study also found that while Hypercube tended to overestimate the number of hallucinations, it detected hallucinations that were otherwise missed by human experts. The research team proposed using the Hypercube system as “an initial hallucination detection step, which can then be integrated with human expert review to enhance overall detection accuracy”.
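As a rough sketch of the proposed two-stage workflow, and assuming a generic automated detector standing in for Hypercube (whose method and API are not described here), the approach could look like the Python below. The function names and the word-overlap heuristic are illustrative assumptions, not Mendel’s actual technique.

```python
from typing import Callable, List


def detect_hallucinations(summary: str, source_note: str) -> List[str]:
    """Return candidate hallucinated spans flagged by an automated detector.

    Placeholder logic: flag summary sentences with no word overlap against
    the source note. This is a crude stand-in, not Hypercube's method.
    """
    candidates = []
    source_words = set(source_note.lower().split())
    for sentence in summary.split("."):
        words = set(sentence.lower().split())
        if words and not (words & source_words):
            candidates.append(sentence.strip())
    return candidates


def review_pipeline(
    summary: str,
    source_note: str,
    human_review: Callable[[str], bool],
) -> List[str]:
    """Two-stage workflow: automated detection first, then human confirmation,
    reflecting the integration the researchers propose."""
    candidates = detect_hallucinations(summary, source_note)
    # A clinician reviews only the flagged candidates, which is cheaper than
    # labelling a whole summary (reported to take ~92 minutes on average).
    return [span for span in candidates if human_review(span)]


# Example usage with a trivial reviewer that confirms every flagged candidate
flagged = review_pipeline(
    summary="Patient to take 500mg aspirin daily. The moon is made of cheese.",
    source_note="Prescribed 500mg aspirin daily for headaches.",
    human_review=lambda span: True,
)
print(flagged)
```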