Jan Leike, a leading AI researcher who earlier this month resigned from OpenAI before publicly criticizing the company’s approach to AI safety, has joined OpenAI rival Anthropic to lead a new “superalignment” team.
In a post on X, Leike said that his team will focus on various aspects of AI safety and security, specifically “scalable oversight,” “weak-to-strong generalization” and automated alignment research.
A source familiar with the matter tells TechCrunch that Leike will report directly to Jared Kaplan, Anthropic’s chief science officer, and that Anthropic researchers currently working on scalable oversight (techniques to keep large-scale AI behaving in predictable and desirable ways) will move to report to Leike as his team spins up.
“I’m excited to join @AnthropicAI to continue the superalignment mission! My new team will work on scalable oversight, weak-to-strong generalization, and automated alignment research. If you’re interested in joining, my DMs are open,” Leike wrote.
In many ways, Leike’s team at Anthropic sounds similar in mission to OpenAI’s recently dissolved Superalignment team. That team, which Leike co-led, had the ambitious goal of solving the core technical challenges of controlling superintelligent AI within four years.
Anthropic has often attempted to position itself as more safety-focused than OpenAI.
Anthropic’s CEO, Dario Amodei, is a former VP of research at OpenAI who reportedly split with the company after a disagreement over its direction, namely its increasingly commercial focus. Amodei brought with him a number of OpenAI employees, including the company’s former policy lead Jack Clark.