GDPval: Measuring Real-World AI Performance

OpenAI introduces GDPval, a new evaluation that measures AI model performance on economically valuable, real-world tasks across 44 occupations. Its aim is to communicate transparently how AI is progressing on real-world applications.

Bridging the Gap Between Benchmarks and Reality

The Problem

Traditional AI evaluations, such as academic tests, are crucial for advancing model reasoning, but they often fail to capture the complexity of the tasks people perform in their daily work. The result is a disconnect between performance in the lab and usefulness in the real world.

The Solution: GDPval

We introduce GDPval, an evaluation that measures model performance on tasks drawn directly from the knowledge work of experienced professionals across a wide range of economically significant occupations, providing a clearer picture of real-world capabilities.

Example of a GDPval task for a Manufacturing Engineer

Frequently Asked Questions