Calibrated probabilistic model · United 2026

World Cup 2026
Prediction Engine

Honest, well-calibrated match probabilities for all 48 teams, built on team strength ratings, a Dixon-Coles scoreline model, and a 10,000-run Monte Carlo tournament simulation. No false certainty.

🇺🇸 🇨🇦 🇲🇽 Hosts · 48 teams · 12 groups · 104 matches
⚠️ Read this first: why no model can promise 99.99% accuracy

Football is high-variance by design. The best forecasting systems on earth (bookmakers, Opta's supercomputer, FiveThirtyEight's old model) land around 50–55% accuracy on win/draw/loss. The single most likely exact score is correct only about 10–12% of the time. The World Cup is exactly where favorites fall (Saudi Arabia beat Argentina in 2022; Germany went out in the 2018 group stage).

So this tool does the honest thing: it outputs probabilities and confidence, not fake certainty. A "60% win" means the favorite still fails to win 40% of the time, and that's the truth, not a bug.

Factors this model uses

  • Team strength ratings (World Football Elo)
  • Home advantage for host nations
  • Goal expectation → full scoreline distribution
  • Draw inflation (Dixon-Coles correction)
  • Monte Carlo tournament paths
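
To make the goal-expectation and draw-inflation steps above concrete, here is a minimal sketch of a Dixon-Coles scoreline grid. It assumes the two goal expectations (`lam` for the home side, `mu` for the away side) have already been derived from the strength ratings; how this engine maps ratings to goals is not stated here, and the `rho` value and function names are illustrative assumptions, not the engine's fitted parameters.

```python
import math


def poisson_pmf(k: int, rate: float) -> float:
    """Probability of exactly k goals under a Poisson model."""
    return math.exp(-rate) * rate**k / math.factorial(k)


def dc_tau(x: int, y: int, lam: float, mu: float, rho: float) -> float:
    """Dixon-Coles low-score adjustment; a negative rho inflates 0-0 and 1-1."""
    if x == 0 and y == 0:
        return 1.0 - lam * mu * rho
    if x == 0 and y == 1:
        return 1.0 + lam * rho
    if x == 1 and y == 0:
        return 1.0 + mu * rho
    if x == 1 and y == 1:
        return 1.0 - rho
    return 1.0


def scoreline_matrix(lam, mu, rho=-0.1, max_goals=10):
    """Grid m[x][y] = P(home scores x, away scores y), renormalised after truncation."""
    grid = [
        [dc_tau(x, y, lam, mu, rho) * poisson_pmf(x, lam) * poisson_pmf(y, mu)
         for y in range(max_goals + 1)]
        for x in range(max_goals + 1)
    ]
    total = sum(sum(row) for row in grid)
    return [[p / total for p in row] for row in grid]


def outcome_probs(grid):
    """Collapse the scoreline grid into (home win, draw, away win)."""
    n = len(grid)
    home = sum(grid[x][y] for x in range(n) for y in range(n) if x > y)
    draw = sum(grid[x][x] for x in range(n))
    away = sum(grid[x][y] for x in range(n) for y in range(n) if x < y)
    return home, draw, away


# Hypothetical goal expectations, for illustration only.
home, draw, away = outcome_probs(scoreline_matrix(1.6, 1.1))
print(f"home {home:.3f}  draw {draw:.3f}  away {away:.3f}")
```

Setting `rho=0` recovers two independent Poisson distributions, so comparing the two grids shows how much probability the correction moves onto draws.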

Factors deliberately excluded

  • "Coach mind-reading" / tactical-IQ scores
  • Social-media sentiment & harmony scores
  • Deep H2H beyond what strength captures
  • Weather micro-effects
  • Any "99.99% confidence" claim

These excluded factors are either unmeasurable, statistically noisy, or already baked into team strength. Adding them doesn't improve accuracy; it just adds false precision.

๐Ÿ“ How the 48โ†’32 format works โ€” and the 104-match breakdown โ–พ

Why 32 in the knockouts, not 24? A single-elimination bracket must halve cleanly every round, so the team count has to be a power of two. 32 works perfectly: 32 → 16 → 8 → 4 → 2 → 1. 24 does not: 24 → 12 → 6 → 3 leaves an odd number of teams, which would force unfair byes.

Why groups of 4, not 3? FIFA first planned 16 groups of 3 (top 2 → a clean Round of 32). But a group of 3 ends on a two-team final matchday, inviting collusion (the 1982 "Disgrace of Gijón"), and gives each team only two guaranteed games. So the format became 12 groups of 4.

Reaching 32: top 2 from each of the 12 groups = 24, plus the 8 best third-placed teams = a perfect 32.

Stage                          Matches
Group stage (12 groups × 6)    72
Round of 32                    16
Round of 16                    8
Quarter-finals                 4
Semi-finals                    2
Third-place play-off           1
Final                          1
Total (2026)                   104

That's +40 matches vs 2022 (which had 64 = 48 group + 16 knockout), a 62.5% jump. It also adds +24 over the abandoned 16×3 plan (80 matches), all in the group stage. The champion now plays 8 games (3 group + 5 knockout), one more than in 2022.
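
The arithmetic above can be checked in a few lines. This is a small sketch, not part of the engine: the function names are made up for illustration, and it assumes a round-robin within each group plus a single-elimination bracket with a third-place play-off.

```python
from math import comb


def is_power_of_two(n: int) -> bool:
    """A single-elimination bracket only halves cleanly from a power of two."""
    return n > 0 and n & (n - 1) == 0


def total_matches(groups: int, group_size: int, knockout_teams: int) -> int:
    """Round-robin group matches + knockout matches + third-place play-off."""
    group_matches = groups * comb(group_size, 2)  # every pair in a group meets once
    knockout_matches = knockout_teams - 1         # each match eliminates one team
    return group_matches + knockout_matches + 1   # +1 for the third-place play-off


print(is_power_of_two(32), is_power_of_two(24))  # True False
print(total_matches(12, 4, 32))  # 2026 format: 104
print(total_matches(8, 4, 16))   # 2022 format: 64
print(total_matches(16, 3, 32))  # abandoned 16x3 plan: 80
```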

How the 8 best third-placed teams are ranked. The twelve group-third teams are compared directly (no head-to-head, since they're in different groups), in this order:

  1. Most points
  2. Best goal difference
  3. Most goals scored
  4. Fewest disciplinary points (fair play)
  5. Drawing of lots by FIFA

This engine ranks thirds by points → goal difference → goals scored, then breaks any remaining ties at random, a fair stand-in for the fair-play and drawing-of-lots steps, which can't be modelled before the tournament. The exact Round-of-32 slot each qualifying third fills follows a FIFA lookup table; this model uses a randomised, winner-protected bracket instead (see footer).
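
One way to implement that ranking is sketched below. The `ThirdPlace` record and `best_thirds` function are hypothetical names, not the engine's own code; the sketch shuffles first and then relies on Python's stable sort, so records that are identical on all three criteria end up in random order.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class ThirdPlace:
    team: str
    points: int
    goal_diff: int
    goals_for: int


def best_thirds(thirds, n=8, rng=None):
    """Rank by points, then goal difference, then goals scored; ties are random."""
    if rng is None:
        rng = random.Random()
    pool = list(thirds)
    rng.shuffle(pool)  # random order first; the stable sort keeps it for exact ties
    ranked = sorted(
        pool,
        key=lambda t: (t.points, t.goal_diff, t.goals_for),
        reverse=True,
    )
    return ranked[:n]


# Made-up records, for illustration only.
sample = [
    ThirdPlace("A3", 4, 1, 3),
    ThirdPlace("B3", 4, 1, 2),
    ThirdPlace("C3", 4, 0, 5),
    ThirdPlace("D3", 3, 5, 9),
]
print([t.team for t in best_thirds(sample, n=2)])  # ['A3', 'B3']
```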

Step 1

Model controls & what-if

Adjust the global assumptions, then re-run the simulation. Every prediction below updates from these inputs: drop a team's rating to simulate a key injury, or boost home advantage to test host energy.

Apply temperature calibration (fixes overconfidence)
Tones down over-confident favorites using T = 1.845, learned from real 2018 + 2022 results.
How it works: We found the model was too sure about huge favorites. Temperature scaling tones down extreme predictions so that when the model says 80%, the event actually happens about 80% of the time. It's a statistical fix learned from real past World Cup results.
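
As a rough illustration of temperature scaling, the sketch below divides the log-probabilities of the three outcomes by T and renormalises. Only T = 1.845 comes from the text above; that the engine applies T to log-probabilities in exactly this way is an assumption, and `temperature_scale` is a hypothetical name.

```python
import math


def temperature_scale(probs, temperature=1.845):
    """Soften a probability vector: p_i ** (1/T), renormalised to sum to 1."""
    logits = [math.log(max(p, 1e-12)) / temperature for p in probs]
    peak = max(logits)  # subtract the max for numerical stability
    weights = [math.exp(z - peak) for z in logits]
    total = sum(weights)
    return [w / total for w in weights]


raw = [0.80, 0.13, 0.07]  # hypothetical raw (win, draw, loss) forecast
calibrated = temperature_scale(raw)
print([round(p, 3) for p in calibrated])
```

With T above 1 the ordering of outcomes is unchanged but the favorite's share shrinks; in this example the raw 80% drops to about 61%. T = 1 leaves the forecast untouched.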
Tournament outlook

Who wins it all?

Probability of winning the trophy and reaching each stage, from the full Monte Carlo run. Notice how flat the top is: that's the real shape of World Cup uncertainty, not a single "lock."
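
To show where title probabilities like these come from, here is a deliberately simplified Monte Carlo sketch. It skips the group stage entirely, draws a random single-elimination bracket each run, and uses the standard Elo expected-result formula as a stand-in for the chance of advancing. The team names, ratings, and function names are all hypothetical, and the real engine's match model is the scoreline model described earlier, not this shortcut.

```python
import random
from collections import Counter


def advance_prob(rating_a: float, rating_b: float) -> float:
    """Elo expected result for A, used here as a proxy for A advancing."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def simulate_bracket(ratings: dict, rng: random.Random) -> str:
    """Play one random single-elimination bracket; team count must be a power of two."""
    teams = list(ratings)
    rng.shuffle(teams)
    while len(teams) > 1:
        survivors = []
        for a, b in zip(teams[0::2], teams[1::2]):
            a_wins = rng.random() < advance_prob(ratings[a], ratings[b])
            survivors.append(a if a_wins else b)
        teams = survivors
    return teams[0]


def title_odds(ratings: dict, runs: int = 10_000, seed: int = 0) -> dict:
    """Share of simulated tournaments won by each team."""
    rng = random.Random(seed)
    wins = Counter(simulate_bracket(ratings, rng) for _ in range(runs))
    return {team: wins[team] / runs for team in ratings}


# Hypothetical ratings for an 8-team bracket.
ratings = {"A": 2000, "B": 1950, "C": 1900, "D": 1850,
           "E": 1800, "F": 1750, "G": 1700, "H": 1650}
for team, p in sorted(title_odds(ratings).items(), key=lambda kv: -kv[1]):
    print(f"{team}: {p:.1%}")
```

Even with a clear rating gap, the strongest side wins well under half the simulated tournaments, which is the flatness the chart is meant to show.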

Chart series: Win title · Reach final
📋 Groups & all 72 group matches
🔬 Match Lab (any matchup)

Every group with simulated qualification odds, plus all six match predictions per group. Click any match to see the three most likely scorelines and a plain-English breakdown.

Head-to-head simulator

Pick any two teams to model a hypothetical knockout-style matchup with current ratings.

The new rule, live

The third-place race

Eight of the twelve third-placed teams survive to the Round of 32. These are the sides most likely to grab one of those eight lifelines: finishing 3rd in their group and ranking among the best thirds (points → goal difference → goals scored). A high number usually means a solid team drawn into a tough group.

Advances as a best third
The evidence

Is this model actually any good?

Don't take "trust me" for an answer. Here we run this exact model against two real past World Cups, 2018 and 2022, using each team's strength rating from the eve of that tournament, and score every group-stage prediction with the metrics professional forecasters actually use. Fixtures and results are real; ratings are point-in-time approximations. These charts respond to the controls above: tune the model, or flip the temperature-calibration toggle, and watch its history change.
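
The text does not name the scoring metrics, so the following is only a sketch of two common choices for three-way forecasts, the multi-class Brier score and log loss. The function names and the sample forecasts are assumptions for illustration.

```python
import math

OUTCOMES = ("home", "draw", "away")


def brier_score(forecasts, results):
    """Mean squared error between forecast vectors and one-hot results (lower is better)."""
    total = 0.0
    for probs, result in zip(forecasts, results):
        total += sum(
            (p - (1.0 if outcome == result else 0.0)) ** 2
            for outcome, p in zip(OUTCOMES, probs)
        )
    return total / len(forecasts)


def log_loss(forecasts, results, eps=1e-12):
    """Mean negative log-probability assigned to what actually happened."""
    total = 0.0
    for probs, result in zip(forecasts, results):
        p_actual = probs[OUTCOMES.index(result)]
        total -= math.log(max(p_actual, eps))
    return total / len(forecasts)


# Made-up forecasts as (home, draw, away), paired with made-up results.
forecasts = [(0.60, 0.25, 0.15), (0.30, 0.30, 0.40)]
results = ["home", "draw"]
print(round(brier_score(forecasts, results), 4), round(log_loss(forecasts, results), 4))
```

A forecaster who always says one third for each outcome scores a Brier of about 0.667 and a log loss of ln 3 (about 1.099), which gives a useful baseline for reading either metric.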

Both tournaments · 96 matches
2018 only · 48
2022 only · 48

Reliability: are the probabilities honest?

When the model says "30%," does it happen ~30% of the time? Dots on the green diagonal = perfectly calibrated; dot size = number of forecasts. Faded grey = raw model, solid blue = calibrated. Watch the top-right (over-confident) dots pull toward the line.
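
The dots in a reliability chart like this one are typically produced by binning forecasts and comparing the average forecast in each bin with how often the event happened. A minimal sketch follows, assuming equal-width bins; the bin count and the function name are illustrative choices, not the page's actual settings.

```python
def reliability_bins(probs, outcomes, n_bins=10):
    """Return (mean forecast, observed frequency, count) for each non-empty bin.

    probs    -- forecast probabilities in [0, 1]
    outcomes -- 1 if the forecast event happened, else 0
    """
    counts = [0] * n_bins
    prob_sums = [0.0] * n_bins
    hits = [0] * n_bins
    for p, happened in zip(probs, outcomes):
        i = min(int(p * n_bins), n_bins - 1)  # p == 1.0 falls in the last bin
        counts[i] += 1
        prob_sums[i] += p
        hits[i] += happened
    return [
        (prob_sums[i] / counts[i], hits[i] / counts[i], counts[i])
        for i in range(n_bins)
        if counts[i] > 0
    ]


# Made-up forecasts and results, for illustration only.
probs = [0.1, 0.1, 0.9, 0.9]
outcomes = [0, 0, 1, 1]
for mean_p, freq, n in reliability_bins(probs, outcomes, n_bins=5):
    print(f"forecast {mean_p:.2f}  observed {freq:.2f}  n={n}")
```

Each tuple maps to one dot: x is the mean forecast, y is the observed frequency, and the count sets the dot size. For a three-way match forecast, every outcome probability can be fed in as its own (probability, happened) pair.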

Sharpness: does it make bold calls?

Distribution of the model's top probability per match. Bars near 33% = timid; bars to the right = confident distinctions. Sharp + calibrated is the goal.

Where the model got burned: calls it rated >60% that still lost