OpenAI has recently published a paper highlighting its advancements in tackling a common issue known as “hallucinations.” Hallucinations occur when an AI system generates fabricated information, often leading to inaccurate or misleading outputs.
In an effort to combat this problem, OpenAI has compared two approaches to training reward models, known as outcome supervision and process supervision, which aim to reduce hallucinations and improve the reliability of AI-generated content.
The outcome supervision approach involves training a reward model that provides feedback only on the final result produced by the AI system, whereas process supervision rewards each individual step of the model's reasoning. OpenAI tested both approaches on a math dataset and found that process supervision demonstrated “significantly superior performance” compared to outcome supervision. However, it is important to note that process supervision has primarily been evaluated in the field of mathematics, and its broader applicability requires further investigation and research.
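The difference between the two approaches can be illustrated with a minimal sketch. The stand-in reward functions below are hypothetical: OpenAI's actual reward models are learned from human feedback, not hand-written rules. The toy arithmetic checker exists only to make the contrast concrete.

```python
# Toy contrast: outcome supervision scores only the final answer,
# process supervision scores every intermediate step.
# These are illustrative stand-ins, not OpenAI's learned reward models.

def outcome_reward(final_answer, correct_answer):
    """Outcome supervision: one reward signal for the final result."""
    return 1.0 if final_answer == correct_answer else 0.0

def process_reward(steps, step_checker):
    """Process supervision: one reward signal per reasoning step."""
    return [1.0 if step_checker(step) else 0.0 for step in steps]

def check_step(step):
    # Hypothetical checker: verify arithmetic in "expr = result" strings.
    lhs, result = step.split(" = ")
    return eval(lhs) == int(result)  # safe only for this toy input

# A two-step worked solution to "(2 + 3) * 4".
solution_steps = ["2 + 3 = 5", "5 * 4 = 20"]

print(outcome_reward("20", "20"))                 # 1.0
print(process_reward(solution_steps, check_step)) # [1.0, 1.0]
```

Under outcome supervision, a solution that reaches the right answer through flawed reasoning still earns full reward; the per-step signal is what lets process supervision penalize exactly the step where the reasoning goes wrong.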
OpenAI envisions process supervision as a potential solution that combines strong performance with better-aligned behavior. The organization suggests that if the observed results generalize beyond mathematics, process supervision could offer a more effective and consistent means of addressing hallucinations.
While these step-by-step evaluation methods hold promise in combating hallucinations, it is crucial to acknowledge the need for further evaluation in diverse domains. The issue of hallucinations gained attention recently when a lawyer who relied on ChatGPT for his work submitted fabricated legal precedents that the model had invented, exposing the severity of the problem. Consequently, there is considerable anticipation that OpenAI's research will pave the way for mitigating this critical issue, which is currently a top concern in the field.
OpenAI has not provided a specific timeline for when process supervision will be implemented in the publicly available ChatGPT model. The research is still in its early stages and necessitates additional testing and refinement before it can be applied more broadly.
While initial results are promising, OpenAI cautions that a safer approach may come with a performance penalty referred to as an “alignment tax.” So far, the evaluation conducted on math problems has shown that process supervision does not incur this tax. However, it remains to be seen how it will perform on more general tasks.
In support of further research on this topic, OpenAI has made its PRM800K dataset available on GitHub. This comprehensive dataset comprises 800,000 step-level human feedback labels, which can aid researchers in developing and refining techniques related to AI supervision and monitoring.
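A dataset of step-level labels is naturally processed one solution at a time. The sketch below tallies ratings from records in a JSONL-style format; the field names (`problem`, `steps`, `rating`) and the rating scale are assumptions for illustration, so the actual PRM800K schema on GitHub should be consulted before use.

```python
import json

# Hypothetical records mimicking step-level human feedback labels.
# Field names and rating values are illustrative assumptions, not
# the confirmed PRM800K schema.
sample_jsonl = """\
{"problem": "What is 2 + 3?", "steps": [{"text": "2 + 3 = 5", "rating": 1}]}
{"problem": "What is 5 * 4?", "steps": [{"text": "5 * 4 = 21", "rating": -1}]}
"""

def count_ratings(jsonl_text):
    """Tally step-level ratings across all records in a JSONL string."""
    counts = {}
    for line in jsonl_text.splitlines():
        record = json.loads(line)
        for step in record["steps"]:
            rating = step["rating"]
            counts[rating] = counts.get(rating, 0) + 1
    return counts

print(count_ratings(sample_jsonl))  # {1: 1, -1: 1}
```

Simple aggregate statistics like these are a typical first step when using such a dataset to train or evaluate a step-level reward model.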