OpenAI is moving to publish the results of its internal AI model safety evaluations more regularly in what the company is pitching as an effort to increase transparency.
On Wednesday, OpenAI launched the Safety Evaluations Hub, a webpage showing how the company's models score on various tests for harmful content generation, jailbreaks, and hallucinations. OpenAI says that it'll use the hub to share metrics on an "ongoing basis," and that it intends to update the hub with "major model updates" going forward.
"As the science of AI evaluation evolves, we aim to share our progress on developing more scalable ways to measure model capability and safety," wrote OpenAI in a blog post. "By sharing a subset of our safety evaluation results here, we hope this will not only make it easier to understand the safety performance of OpenAI systems over time, but also support community efforts" to increase transparency across the field.
OpenAI says that it may add additional evaluations to the hub over time.
In recent months, OpenAI has raised the ire of some ethicists for reportedly rushing the safety testing of certain flagship models and failing to release technical reports for others. The company's CEO, Sam Altman, also stands accused of misleading OpenAI executives about model safety reviews prior to his brief ouster in November 2023.
Late last month, OpenAI was forced to roll back an update to the default model powering ChatGPT, GPT-4o, after users began reporting that it responded in an overly validating and agreeable way. X became flooded with screenshots of ChatGPT applauding all sorts of problematic, dangerous decisions and ideas.
OpenAI said that it would implement several fixes and changes to prevent similar incidents in the future, including introducing an opt-in "alpha phase" for some models that would allow certain ChatGPT users to test the models and give feedback prior to launch.