Google DeepMind Develops SAFE, an AI Fact-Checker, to Validate LLM Outputs with 72% Accuracy


Google DeepMind has introduced a new AI system named SAFE (Search-Augmented Factuality Evaluator), aimed at evaluating the factual accuracy of content generated by Large Language Models (LLMs) such as ChatGPT, as described in a paper published on arXiv.

LLMs are known for their ability to produce text, answer questions, and solve mathematical problems efficiently. However, their accuracy has been a concern, prompting the need for manual verification. The introduction of SAFE by Google DeepMind is a significant step toward addressing these accuracy issues in LLM-generated content.

This system works by breaking down the statements or facts presented by an LLM, then using Google Search to find credible sources that can confirm or refute these facts. The SAFE system then assesses the consistency between the LLM’s output and the search results to determine factuality. This method emulates the way humans use search engines to check facts, enhancing the reliability of the information produced by LLMs.
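The pipeline described above can be sketched in a few lines. This is a minimal illustrative outline, not DeepMind's implementation: the function names, the sentence-splitting heuristic, and the substring-matching "rating" step are all stand-ins for what SAFE actually delegates to an LLM and to Google Search.

```python
from dataclasses import dataclass

@dataclass
class FactVerdict:
    fact: str
    supported: bool

def split_into_facts(response: str) -> list[str]:
    # SAFE uses an LLM to decompose a long-form response into
    # individual facts; naive sentence splitting stands in here.
    return [s.strip() for s in response.split(".") if s.strip()]

def search_evidence(fact: str) -> list[str]:
    # Placeholder for the search step: SAFE issues multi-step
    # Google Search queries and collects the result snippets.
    return []

def rate_fact(fact: str, evidence: list[str]) -> bool:
    # Placeholder for the LLM judgment step (is the fact supported
    # by the evidence?). Toy heuristic: supported iff any snippet
    # contains the fact verbatim.
    return any(fact.lower() in snippet.lower() for snippet in evidence)

def safe_pipeline(response: str) -> list[FactVerdict]:
    """Split a response into facts, retrieve evidence for each,
    and rate each fact as supported or not."""
    verdicts = []
    for fact in split_into_facts(response):
        evidence = search_evidence(fact)
        verdicts.append(FactVerdict(fact, rate_fact(fact, evidence)))
    return verdicts
```

In the real system, each of these placeholder steps is performed by a language model prompting loop rather than a fixed heuristic, which is what lets SAFE mimic how a human would iteratively query a search engine.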

In their testing phase, the DeepMind team evaluated SAFE on around 16,000 statements drawn from the outputs of various LLMs. SAFE's fact-checking verdicts agreed with human evaluators 72% of the time, demonstrating its potential as a reliable fact-checking tool. Notably, in a sample of cases where SAFE and the human raters disagreed, SAFE's judgment was found to be correct 76% of the time, suggesting it can exceed human accuracy in certain situations.

The researchers also noted that SAFE offers a cost-effective and efficient alternative to human fact-checking, coming in at more than 20 times cheaper than human annotators. Their analysis across thirteen different language models revealed that larger models tend to perform better in factuality assessments.

By automating the fact-checking process, SAFE promises to reduce the manual labor involved in verifying the accuracy of AI-generated information. DeepMind has made SAFE’s code available on GitHub, encouraging further development and application of this tool within the AI community. This move allows other developers and researchers to explore and enhance SAFE’s capabilities.
