root@moksha:~/intexp/ai-iq
$ cat ai_iq.md

> AI_IQ: UNDERSTANDING_THE_PERCENTILES

IQ to Percentile Conversion

Here's how IQ scores translate to percentiles, giving you a sense of where an AI system stands relative to human performance:

| IQ Score | Percentile | Description |
|---|---|---|
| 70 | 2nd | Significantly below average |
| 85 | 16th | Below average |
| 100 | 50th | Average |
| 115 | 84th | Above average |
| 130 | 98th | Highly gifted |
| 145 | 99.9th | Exceptionally gifted |
| 160 | 99.99th | Profoundly gifted |

Remember: When an AI achieves "130 IQ," it means it outperformed 98% of humans on that specific test. The same AI might perform at 70 IQ on a different type of reasoning task.
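
The percentiles in the table can be reproduced with a short sketch, assuming the usual deviation-IQ convention of a normal distribution with mean 100 and standard deviation 15 (some tests use an SD of 16 or 24, which shifts the numbers). The function name `iq_to_percentile` is illustrative, not part of any library.

```python
import math

def iq_to_percentile(iq, mean=100.0, sd=15.0):
    """Percent of the norm population scoring below `iq` under a normal model.

    Assumes mean 100 and SD 15; other scales need different parameters.
    """
    z = (iq - mean) / sd
    # Standard normal CDF expressed through the error function.
    return 50.0 * (1.0 + math.erf(z / math.sqrt(2.0)))

for iq in (70, 85, 100, 115, 130, 145, 160):
    print(f"IQ {iq}: {iq_to_percentile(iq):.3f}th percentile")
```

The table rounds these values: 130 comes out near 97.7 and 145 near 99.87 under this model.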

Frontier AI IQ — June 2026

aiiq.org maps 12 hard benchmarks (FrontierMath, ARC-AGI-2, GPQA Diamond, SWE-bench, Humanity's Last Exam...) onto the human IQ scale. The frontier as of June 10, 2026:

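| Model | Est. IQ | Human percentile |
|---|---|---|
| GPT-5.5 | ~130–136 | 98th–99th |
| Claude Opus 4.8 | ~130 | 98th |
| Gemini 3.1 Pro | ~127 | 96th |
| Claude Opus 4.7 | ~126 | 96th |
| Kimi K2.6 | ~122 | 93rd |
| GLM-5.1 | ~121 | 92nd |
| Qwen 3.7 Max | ~118 | 88th |
| Grok 4.3 | ~117 | 87th |
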
  • One year ago the frontier was o3 at ~112. That's ≈ +1.3 IQ points per month, measured.
  • On novel, offline tests, scores drop 20–40 points; public-test results partly reflect memorization.
  • The jagged truth: the same models solving Erdős problems still fumble tasks a child finds trivial. IQ is one lens, not the territory.
  • Live trackers: aiiq.org, trackingai.org, metr.org
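
The arithmetic behind the first two bullets can be sketched as follows. This is a rough two-point check, assuming a 12-month gap and the ~112 and ~130–136 estimates quoted above; the helper names are hypothetical. A simple endpoint calculation gives a somewhat higher rate than the quoted ≈1.3 points per month, which may come from a trend fitted over intermediate data points or from a different baseline.

```python
def points_per_month(start_iq, end_iq, months):
    """Average IQ-point gain per month between two estimates."""
    return (end_iq - start_iq) / months

def offline_range(public_iq, drop_low=20, drop_high=40):
    """(lowest, highest) offline estimate after the quoted 20-40 point drop."""
    return public_iq - drop_high, public_iq - drop_low

# Figures are taken from the section above; the 12-month gap is an assumption.
print(points_per_month(112, 130, 12))  # low end of the frontier estimate
print(points_per_month(112, 136, 12))  # high end of the frontier estimate
print(offline_range(130))              # public ~130 after the offline drop
```

Under these assumptions, a public score of ~130 corresponds to somewhere between 90 and 110 on novel, offline tests.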

The "Highest Human IQ" Question

  • Real IQ tests stop measuring around 160 (99.99th percentile); beyond that there aren't enough humans to norm the test against.
  • Terence Tao's famous "220–230 IQ" is a ratio-IQ extrapolation from scoring 760 on the math SAT at age 8, not a measured deviation IQ. Tao himself calls the figure noise and has claimed only "greater than 175."
  • Marilyn vos Savant's record 228 was a childhood ratio score; her adult deviation-test result was ≈186. Guinness retired the "Highest IQ" category in 1990 as statistically unreliable.
  • So when honest AI scores reach ~160, "AI IQ" stops being a number at all. That's why the IntExp chart marks everything above 160 as off the human scale: from there on, you need different rulers, such as task horizons, percentile-of-experts, and things no human can do at any speed.
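
The difference between ratio and deviation scores can be illustrated with a minimal sketch. It assumes the classic ratio formula (mental age divided by chronological age, times 100) and a normal model with mean 100 and SD 15 for deviation scores. The function names and the example ages are hypothetical and are not drawn from any real test record.

```python
import math

def ratio_iq(mental_age, chronological_age):
    """Classic ratio IQ: 100 * mental age / chronological age."""
    return 100.0 * mental_age / chronological_age

def one_in_n(iq, mean=100.0, sd=15.0):
    """Approximate rarity ("1 in N") of scoring at or above `iq`.

    Assumes a normal distribution with mean 100 and SD 15.
    """
    z = (iq - mean) / sd
    # Upper-tail probability of the standard normal distribution.
    tail = 0.5 * math.erfc(z / math.sqrt(2.0))
    return 1.0 / tail

# Hypothetical child: age 8, performing like a typical 16-year-old.
print(ratio_iq(16, 8))

# Rarity of deviation scores near the top of the measurable range.
for iq in (130, 145, 160):
    print(f"IQ {iq}: about 1 in {one_in_n(iq):,.0f}")
```

Under this model, IQ 160 corresponds to roughly one person in 31,600. A norm sample of a few thousand people contains essentially no one at that level, so scores beyond it cannot be calibrated. A ratio score has no such ceiling, because it is a quotient of ages, not a position in a distribution.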

AI IQ Benchmarks & Progress

The charts below show the latest AI capability assessments:

GPQA Diamond Benchmark Progress

AI performance on graduate-level scientific questions (July 2023 to May 2025)

PhD-Level Scientific Reasoning

Benchmarking AI on doctoral-level scientific problems

AI IQ on Codeforces

Competitive programming performance showing AI reaching exceptional problem-solving levels

AI IQ Progress Tracking (11 Months)

MaximumTruth's tracking of AI IQ improvements over time (offline testing, not Mensa-certified)