

OpenAI open sourcing new GPT-4 Turbo evals



OpenAI GPT-4 Turbo Evals

OpenAI today announced that it is open-sourcing a GitHub repository to run popular evals on various models including the new GPT-4 Turbo.

According to the company, the new GPT-4 Turbo improves on writing, math, logical reasoning, and coding. Its responses are more direct and less verbose, and they use more conversational language than those of its predecessor.


OpenAI GPT-4 Turbo (Image Credit: OpenAI)

The repository on GitHub contains a library for evaluating language models. It currently includes the following evals:

  • MMLU: Measuring Massive Multitask Language Understanding
  • MATH: Measuring Mathematical Problem Solving With the MATH Dataset
  • GPQA: A Graduate-Level Google-Proof Q&A Benchmark
  • DROP: A Reading Comprehension Benchmark Requiring Discrete Reasoning Over Paragraphs
  • MGSM: Multilingual Grade School Math Benchmark, from Language Models are Multilingual Chain-of-Thought Reasoners
  • HumanEval: Evaluating Large Language Models Trained on Code
  • MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI

Evals are sensitive to prompting, and the formulations used in recent publications and libraries vary considerably. Many of these formulations are carryovers from evaluating base models and from models that were worse at following instructions.
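To make the point about prompt sensitivity concrete, here is a minimal Python sketch of two ways the same multiple-choice question could be put to a model: a plain instruction-style prompt and a few-shot prompt of the kind often used with base models. The function names, the prompt wording, and the "Answer: X" convention are assumptions made for illustration only; they are not taken from OpenAI's repository.

```python
import re


def format_item(question, choices):
    """Render a question and its lettered choices as plain text."""
    lines = [question]
    for label, text in choices.items():
        lines.append(f"{label}) {text}")
    return "\n".join(lines)


def zero_shot_prompt(question, choices):
    """Instruction-style prompt: no worked examples, just a task description."""
    return (
        "Answer the following multiple-choice question. "
        "Finish with a line of the form 'Answer: X', where X is the letter.\n\n"
        + format_item(question, choices)
    )


def few_shot_prompt(question, choices, examples):
    """Few-shot prompt: solved examples first, then the unanswered question."""
    blocks = []
    for ex_question, ex_choices, ex_answer in examples:
        blocks.append(format_item(ex_question, ex_choices) + f"\nAnswer: {ex_answer}")
    blocks.append(format_item(question, choices) + "\nAnswer:")
    return "\n\n".join(blocks)


def extract_answer(response):
    """Pull the chosen letter out of a model response, or None if absent."""
    match = re.search(r"Answer:\s*([A-D])", response)
    return match.group(1) if match else None
```

The same model can score differently depending on which of these prompts it receives and on how strictly the answer is parsed, which is why results from different eval libraries are hard to compare directly.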


Sophia says technology is raising the bar of human living and she is actively trying to promote awareness among people about the latest changes in social media platforms. Social media has the power to make many positive impacts and she is continuously sharing the latest updates with fellow readers. In some spare time, she likes to tag along with friends for a walk.
