Grok 0 vs Grok 1 vs Grok 1.5: AI Benchmark Comparison

Published March 28, 2024

AI company xAI has released Grok-1.5, its new large language model (LLM), with improved reasoning performance and a longer context window of up to 128,000 tokens. Using the benchmark results shared by xAI, we can now compare Grok-0, Grok-1, and Grok-1.5.

xAI reports results on the following benchmarks:

MMLU – Massive Multitask Language Understanding, a multiple-choice test covering 57 subjects, from STEM to the humanities and social sciences. It measures a model’s general knowledge and its ability to reason and solve problems in natural language.
MATH – A dataset of competition-level mathematics problems spanning several subjects and difficulty levels. It measures a model’s ability to solve multi-step mathematical problems.
GSM8K – A standardized dataset of roughly 8,500 grade-school math word problems that require multi-step arithmetic reasoning.
HumanEval – A code-generation benchmark of 164 hand-written Python programming problems. A model’s output is checked for correctness by running it against unit tests.

Comparison:

Grok-0 scored 65.7 percent (5-shot) on MMLU, 15.7 percent (4-shot) on MATH, 56.8 percent (8-shot) on GSM8K, and 39.7 percent (0-shot) on HumanEval.

Grok-1 scored 73 percent (5-shot) on MMLU, 23.9 percent (4-shot) on MATH, 62.9 percent (8-shot) on GSM8K, and 63.2 percent (0-shot) on HumanEval.

Grok-1.5 scored 81.3 percent (5-shot) on MMLU, 50.6 percent (4-shot) on MATH, 90 percent (8-shot) on GSM8K, and 74.1 percent (0-shot) on HumanEval.
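For a side-by-side view, the scores above can be collected into a small table. The following is a minimal Python sketch: the `SCORES` dictionary and the `format_table` helper are illustrative names chosen for this example, not anything published by xAI, and the numbers are simply the ones quoted above.

```python
# Benchmark scores (percent) as quoted in this article.
# Column order follows BENCHMARKS.
BENCHMARKS = ["MMLU", "MATH", "GSM8K", "HumanEval"]
SCORES = {
    "Grok-0": [65.7, 15.7, 56.8, 39.7],
    "Grok-1": [73.0, 23.9, 62.9, 63.2],
    "Grok-1.5": [81.3, 50.6, 90.0, 74.1],
}


def format_table(scores, benchmarks):
    """Return the scores as a list of fixed-width text rows (header first)."""
    header = f"{'Model':<10}" + "".join(f"{b:>11}" for b in benchmarks)
    rows = [header]
    for model, values in scores.items():
        # One decimal place plus a percent sign, right-aligned under each header.
        rows.append(f"{model:<10}" + "".join(f"{v:>10.1f}%" for v in values))
    return rows


if __name__ == "__main__":
    print("\n".join(format_table(SCORES, BENCHMARKS)))
```

Running the script prints one row per model, which makes the jump on MATH and GSM8K between Grok-1 and Grok-1.5 easy to spot.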

Advancements:

Grok-0 has 33 billion parameters, and the recent open-source release of Grok-1 revealed that it has 314 billion parameters, a massive increase in scale. The parameter count for Grok-1.5, however, has not been disclosed for the time being.

On MMLU, which covers reasoning and comprehension, Grok-1.5 scores 8.3 percentage points higher than Grok-1 and 15.6 points higher than Grok-0. On MATH, it gains 26.7 points over Grok-1 and 34.9 points over Grok-0. On GSM8K, the new model is 27.1 points ahead of Grok-1 and 33.2 points ahead of Grok-0. On HumanEval, it gains 10.9 points over Grok-1 and 34.4 points over Grok-0.
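The gains quoted above are best read as differences in percentage points, that is, one score subtracted from another. As a rough sketch of that arithmetic (the helper names `point_gain` and `relative_gain` are hypothetical, introduced only for this illustration), the following Python snippet recomputes the gaps from the published scores and also shows the relative improvement, which is a different quantity.

```python
# Scores (percent) as quoted in this article.
# Tuple order: (Grok-0, Grok-1, Grok-1.5).
SCORES = {
    "MMLU": (65.7, 73.0, 81.3),
    "MATH": (15.7, 23.9, 50.6),
    "GSM8K": (56.8, 62.9, 90.0),
    "HumanEval": (39.7, 63.2, 74.1),
}


def point_gain(new, old):
    """Absolute gain in percentage points, rounded to one decimal place."""
    return round(new - old, 1)


def relative_gain(new, old):
    """Gain as a percent of the old score, rounded to one decimal place."""
    return round((new - old) / old * 100, 1)


if __name__ == "__main__":
    for name, (g0, g1, g15) in SCORES.items():
        print(
            f"{name}: +{point_gain(g15, g1)} pts vs Grok-1, "
            f"+{point_gain(g15, g0)} pts vs Grok-0, "
            f"{relative_gain(g15, g1)}% relative gain vs Grok-1"
        )
```

The distinction matters: on MMLU, for example, an 8.3-point gain over Grok-1 corresponds to a relative improvement of about 11.4 percent, because the starting score was 73 percent.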

Grok-1.5 will soon be released in early access to X subscribers, with availability expanding gradually.

(Source)
