OpenAI has released a new paper outlining advances it has made toward reducing the common problem of hallucinations, where an AI simply makes things up. The paper compares two approaches, called outcome supervision and process supervision, for training reward models to weed out hallucinations, and reports how each performs.
With outcome supervision, OpenAI trains a reward model to provide feedback only on the final result the AI gives. With process supervision, the reward model provides feedback at every step along the way, encouraging a human-like chain of thought.
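To make the distinction concrete, here is a minimal Python sketch of the two feedback styles. It is not OpenAI's implementation: the scorer functions are hypothetical toy stand-ins for trained reward models, and the example only illustrates where the feedback is applied.

```python
from typing import List

# Hypothetical stand-ins for trained reward models; real systems would use
# learned scorers, not simple string checks.
def score_final_answer(answer: str) -> float:
    """Toy check: reward 1.0 only if the final answer matches the target."""
    return 1.0 if answer.strip() == "4" else 0.0

def score_step(step: str) -> float:
    """Toy check: reward steps that contain an explicit equation."""
    return 1.0 if "=" in step else 0.0

def outcome_supervision_reward(steps: List[str], final_answer: str) -> float:
    """Outcome supervision: feedback is given only on the final result."""
    return score_final_answer(final_answer)

def process_supervision_reward(steps: List[str], final_answer: str) -> float:
    """Process supervision: feedback is given on every step of the chain of thought."""
    step_scores = [score_step(step) for step in steps]
    return sum(step_scores) / len(step_scores) if step_scores else 0.0

if __name__ == "__main__":
    chain = ["2 + 2 = 4", "so the answer is 4"]
    print(outcome_supervision_reward(chain, "4"))  # 1.0 - only the answer is judged
    print(process_supervision_reward(chain, "4"))  # 0.5 - each step is judged
```

The point of the sketch is simply that process supervision scores the reasoning itself, so a wrong or unsupported step is penalized where it occurs rather than being masked by a correct-looking final answer.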
In its research paper, OpenAI tested both approaches on a math dataset and found that process supervision led to “significantly better performance”. It is worth noting that process supervision has so far only been tested on mathematics, and more work is needed to see how it performs more generally.
Explaining the possible outcomes of the process supervision method, OpenAI said:
“If these results generalize, we may find that process supervision gives us the best of both worlds – a method that is both more performant and more aligned than outcome supervision.”
It’s still too early to say how much this step-by-step verification will help address hallucinations more generally, but hopefully it will, because hallucinations are probably the number one issue with LLMs right now. Just this week, a lawyer who had used ChatGPT for his work ended up submitting a filing citing fake cases that the AI had dreamt up.
OpenAI has not given a timeline for bringing process supervision to the publicly available ChatGPT. The technique is still in the research phase and needs to be tested on more general information.
While the initial results are good, OpenAI does mention that safer methods can come with reduced performance, a cost known as an alignment tax. The results so far show that process supervision doesn’t incur this tax on math problems, but we don’t yet know whether that holds for more general information.