Benchmark Modpack - Search News

'Humanity's Last Exam' benchmark is stumping top AI models - can you do any better?

On Thursday, Scale AI and the Center for AI Safety (CAIS) released Humanity's Last Exam (HLE), a new academic benchmark aiming to "test the limits of AI knowledge at the frontiers of human expertise," ...
