OpenAI open-sources evals for the new GPT-4 Turbo

Published April 11, 2024

OpenAI today announced that it is open-sourcing a GitHub repository for running popular evals on various models, including the new GPT-4 Turbo.

The company says it has improved writing, math, logical reasoning, and coding capabilities in the new GPT-4 Turbo. Responses are more direct and less verbose, and use more conversational language than those of its predecessor.

OpenAI GPT-4 Turbo (Image Credit: OpenAI)

The repository on GitHub contains a library for evaluating language models. The evals it currently includes are:

MMLU: Measuring Massive Multitask Language Understanding
MATH: Measuring Mathematical Problem Solving With the MATH Dataset
GPQA: A Graduate-Level Google-Proof Q&A Benchmark
DROP: A Reading Comprehension Benchmark Requiring Discrete Reasoning Over Paragraphs
MGSM: Multilingual Grade School Math Benchmark, from "Language Models are Multilingual Chain-of-Thought Reasoners"
HumanEval: Evaluating Large Language Models Trained on Code
MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI

Evals are sensitive to prompting, and the formulations used in recent publications and libraries vary. According to OpenAI, some of these prompting approaches are carryovers from evaluating base models and from models that were worse at following instructions.
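
To make the point about prompt sensitivity concrete, here is a minimal sketch of how a multiple-choice eval (in the style of MMLU) might be scored with a plain zero-shot instruction. The prompt wording, the item layout, and the function names `build_prompt`, `extract_answer`, and `accuracy` are all assumptions made for illustration. They are not taken from OpenAI's repository.

```python
import re
from typing import Optional

# Hypothetical zero-shot template. The wording is an assumption for
# illustration, not the prompt used in OpenAI's repository.
TEMPLATE = (
    "Answer the following multiple choice question. "
    "The last line of your response should be of the form 'Answer: X', "
    "where X is one of A, B, C, D.\n\n"
    "{question}\n\n"
    "A) {A}\nB) {B}\nC) {C}\nD) {D}"
)

# Matches "Answer: B" but not "Answer: Because ...".
ANSWER_RE = re.compile(r"Answer\s*:\s*([ABCD])\b")


def build_prompt(item: dict) -> str:
    """Render one question (assumed keys: question, choices) as a prompt."""
    return TEMPLATE.format(question=item["question"], **item["choices"])


def extract_answer(response: str) -> Optional[str]:
    """Return the last 'Answer: X' letter in the response, or None."""
    matches = ANSWER_RE.findall(response)
    return matches[-1] if matches else None


def accuracy(items: list, responses: list) -> float:
    """Fraction of responses whose extracted letter equals the gold answer."""
    correct = sum(
        extract_answer(resp) == item["answer"]
        for item, resp in zip(items, responses)
    )
    return correct / len(items)
```

In this sketch a response that never states its answer in the expected form counts as wrong, so the instruction in the template and the extraction pattern together affect the reported score. That is one way in which differently worded prompts can produce different numbers for the same model.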

For example, when writing with ChatGPT, responses will be more direct, less verbose, and use more conversational language. pic.twitter.com/PHxrmCtpyl

— OpenAI (@OpenAI) April 12, 2024
