Get oriented (and read the one rule)
When the page loads, the engine has already simulated the whole tournament for you; there's nothing to install or press to begin. The very first thing you'll see is an honesty banner. Read it once. It explains the single most important idea in the whole tool:
Football is gloriously unpredictable. The best forecasters on earth get the result right only about 50–55% of the time, and the most likely exact score lands only ~10–12% of the time. So this engine gives you well-calibrated probabilities, never false certainty.
See who's likely to win it all
Scroll to "Who wins it all?" Each row is a team, with two bars:
- Solid blue bar + the % on the right: chance of winning the trophy.
- Faint bar underneath: chance of reaching the final.
Notice how even the favourite sits around 20%, and the chances taper off gently rather than dropping off a cliff. That flat top is the real shape of World Cup uncertainty: the best team in the world is still more likely not to win than to win.
Dive into the groups & match predictions
The Groups tab shows all 12 groups. Each card has two parts: a mini-table and the six match predictions for that group.
- Qual = chance of reaching the knockouts. Win = chance of finishing top of the group. (Colours: green = likely through, amber = on the bubble, grey = long shot.)
- Every match has a three-colour bar: green = left team wins · grey = draw · red = right team wins.
- Click any match to expand it: you'll see the three most likely scorelines and a one-line "Model read" explaining the edge.
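To get a feel for where a three-colour bar and a "three most likely scorelines" list can come from, here is a minimal sketch. It assumes each side's goals follow an independent Poisson distribution; the guide does not say this is the engine's actual scoring model, and the function names and the expected-goals values (1.6 and 1.0, which add up to the 2.6 default for Avg goals / match) are made up for illustration.

```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    """Probability of exactly k goals when lam goals are expected."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def match_outlook(lam_left: float, lam_right: float, max_goals: int = 10):
    """Return ((left win, draw, right win), top-3 scorelines).

    Assumption: independent Poisson goals for each side (illustrative only).
    """
    p_left = p_draw = p_right = 0.0
    scorelines = []
    for i in range(max_goals + 1):
        for j in range(max_goals + 1):
            p = poisson_pmf(i, lam_left) * poisson_pmf(j, lam_right)
            scorelines.append(((i, j), p))
            if i > j:
                p_left += p
            elif i == j:
                p_draw += p
            else:
                p_right += p
    # Renormalise to absorb the tiny tail beyond max_goals.
    total = p_left + p_draw + p_right
    top3 = sorted(scorelines, key=lambda s: s[1], reverse=True)[:3]
    return (p_left / total, p_draw / total, p_right / total), top3


if __name__ == "__main__":
    bar, top3 = match_outlook(1.6, 1.0)  # hypothetical expected goals
    print("left / draw / right:", [round(p, 3) for p in bar])
    for (left_goals, right_goals), p in top3:
        print(f"{left_goals}-{right_goals}: {p:.1%}")
```

Even with a clear favourite, no single scoreline in this sketch gets far above one chance in ten, which is the same point the honesty banner makes.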
Run your own what-ifs
The Model controls & what-if panel lets you bend the model's assumptions and watch everything update. The sliders are plain-English knobs:
- Host home advantage: how big a boost the USA, Canada & Mexico get at home.
- Avg goals / match: the tournament's overall scoring level.
- Strength sensitivity: how much a rating gap translates into a result. Lower = favourites dominate more.
- Draw tendency: nudges how often matches end level.
- Temperature (T): the calibration dial (see Step 6, "Check the evidence").
- Edit team ratings: the most fun button. Drop a team's rating to simulate a key injury, or boost one to test a hot streak.
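If you are wondering why a lower Strength sensitivity makes favourites dominate more, the sketch below may help. It assumes an Elo-style logistic curve in which the sensitivity is the scale that divides the rating gap, and the home boost is simply added to the host's rating. The engine's real formula is not given in this guide, so treat `win_expectancy` and the ratings 1900 and 1800 as hypothetical.

```python
def win_expectancy(rating_a: float, rating_b: float,
                   sensitivity: float = 180.0,
                   home_boost: float = 0.0) -> float:
    """Elo-style expectancy for team A (assumed form, not the engine's code).

    A smaller `sensitivity` stretches the same rating gap into a bigger edge.
    `home_boost` is added to team A's rating when A plays at home.
    """
    gap = rating_a + home_boost - rating_b
    return 1.0 / (1.0 + 10 ** (-gap / sensitivity))


if __name__ == "__main__":
    # Same 100-point gap, three different sensitivity settings.
    for s in (120.0, 180.0, 240.0):
        print(s, round(win_expectancy(1900, 1800, sensitivity=s), 3))
    # Equal teams, with and without the +70 host boost.
    print(round(win_expectancy(1800, 1800), 3),
          round(win_expectancy(1800, 1800, home_boost=70.0), 3))
```

The number returned here is a two-way expectancy that ignores draws; it is only meant to show the direction each slider pushes.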
Reset defaults puts everything back.
Pit any two teams in the Match Lab
Switch to the Match Lab tab to model any head-to-head, even teams in different groups. Pick Team A and Team B, choose a neutral venue or give one side home advantage, and you get a full breakdown:
Check the evidence (and tune the calibration)
This is what sets the engine apart: it doesn't just say "trust me." Scroll to "Is this model actually any good?" The engine replays its exact model over the real 2018 & 2022 World Cup group stages (96 matches) and scores itself. A few terms, in plain English:
- Accuracy: how often the top pick was right (~57%).
- Log-loss / RPS / Brier: quality scores that punish confident mistakes. Lower is better.
- Reliability diagram: the honesty test. When the model says "30%", does it happen ~30% of the time? Dots should sit on the diagonal line.
- Sharpness: whether it makes bold calls or hedges everything near 33%.
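The three quality scores in the list above can be sketched for a single match as follows. These are the standard textbook definitions for a three-way forecast ordered as (left win, draw, right win); whether the engine normalises them in exactly this way is an assumption, and the two example forecasts are made up.

```python
import math


def log_loss(probs, outcome: int) -> float:
    """Negative log of the probability given to what actually happened."""
    return -math.log(max(probs[outcome], 1e-15))


def brier(probs, outcome: int) -> float:
    """Sum of squared gaps between the forecast and the 0/1 result."""
    return sum((p - (1.0 if k == outcome else 0.0)) ** 2
               for k, p in enumerate(probs))


def rps(probs, outcome: int) -> float:
    """Ranked probability score: squared gaps between cumulative forecast
    and cumulative result, so missing by two steps costs more than one."""
    cum_p = cum_o = total = 0.0
    for k in range(len(probs) - 1):
        cum_p += probs[k]
        cum_o += 1.0 if k == outcome else 0.0
        total += (cum_p - cum_o) ** 2
    return total / (len(probs) - 1)


if __name__ == "__main__":
    confident = [0.92, 0.06, 0.02]  # hypothetical over-confident forecast
    hedged = [0.70, 0.18, 0.12]     # hypothetical softer forecast
    upset = 2                       # the right-hand team won
    for name, f in (("confident", confident), ("hedged", hedged)):
        print(name, round(log_loss(f, upset), 3),
              round(brier(f, upset), 3), round(rps(f, upset), 3))
```

When the upset happens, the confident forecast scores worse on all three measures, which is what "punish confident mistakes" means in practice.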
The headline finding is the Temperature (T) calibration. The raw model was over-confident on big favourites (it was 92% sure Brazil would beat Cameroon in 2022, and Brazil lost). Temperature scaling gently tones those extremes down. Here's the before/after:
The Temperature (T) slider in Model Controls lets you explore this yourself: slide it down for bolder, sharper calls; up for safer, smoother ones. The default 1.845 is the cross-validated sweet spot, and the green box shows the quality score updating live as you drag.
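For the curious, temperature scaling is commonly done by raising each probability to the power 1/T and renormalising, which is the same as dividing the log-probabilities by T before a softmax. The sketch below assumes the engine uses that standard form; the guide does not spell it out. The 92% figure is from the Brazil example, while the 6% / 2% split between draw and loss is invented to make the numbers add up.

```python
def apply_temperature(probs, temperature: float = 1.845):
    """Soften (T > 1) or sharpen (T < 1) a probability vector.

    Assumed form: p_i ** (1 / T), renormalised so the result sums to 1.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    powered = [p ** (1.0 / temperature) for p in probs]
    total = sum(powered)
    return [p / total for p in powered]


if __name__ == "__main__":
    raw = [0.92, 0.06, 0.02]  # win / draw / loss; the split is hypothetical
    for t in (1.0, 1.845, 3.0):
        print(t, [round(p, 3) for p in apply_temperature(raw, t)])
```

At T = 1 the probabilities come back unchanged. Larger values pull everything toward an even three-way split without changing which outcome is the favourite.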
The one rule: how to read a probability
A win probability is a frequency, not a verdict. "Spain 72% to win the group" means: if this group were played 100 times, Spain would top it about 72 times and miss out the other 28.
So even in a match Spain are favoured to win at 56%, there's a 44% chance they don't (draw + loss). The engine is being honest about that, and that honesty is the whole point.
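You can check the "played 100 times" reading with a few lines of code. This is only an illustration of the frequency idea, not the engine's simulator: `replay_group` is a hypothetical helper that flips a weighted coin once per replay.

```python
import random


def replay_group(p_top: float, replays: int, seed: int = 0) -> int:
    """Count how many replays the team finishes top, given probability p_top."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return sum(rng.random() < p_top for _ in range(replays))


if __name__ == "__main__":
    for n in (100, 10_000):
        tops = replay_group(0.72, n)
        print(f"{n} replays: topped the group {tops} times ({tops / n:.1%})")
```

With only 100 replays the count wobbles around 72; with many more it settles close to 72%, and the misses never go away.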
Controls cheat-sheet
Every knob, what it does, its default, and a quick experiment.
| Control | What it does | Default | Try |
|---|---|---|---|
| Host home advantage | Boost for USA / Canada / Mexico at home | +70 | Set 0 to neutralise home edge |
| Avg goals / match | Overall scoring level of the tournament | 2.6 | Raise it for a higher-scoring event |
| Strength sensitivity | How much a rating gap decides the result | 180 | Lower = favourites dominate more |
| Draw tendency (ρ) | Nudges how often matches end level | −0.08 | Push toward 0 for fewer draws |
| Temperature (T) | Calibration dial (confidence of probabilities) | 1.845 | Down = bolder, up = safer |
| Simulations | How many tournaments are played out | 10,000 | More = smoother title odds |
| Apply calibration | Master on/off for temperature scaling | ON | Toggle off to see the raw model |
| Edit team ratings | Hand-edit any team's strength | n/a | Simulate an injury or hot streak |
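On the Simulations row: "more = smoother" has a simple statistical reading. If each simulated tournament is treated as an independent trial, the noise in an estimated title chance is the usual binomial standard error. The helper name below is hypothetical, and the independence assumption is ours rather than something the guide states.

```python
import math


def title_odds_standard_error(p: float, simulations: int) -> float:
    """Binomial standard error of an estimated probability p."""
    return math.sqrt(p * (1.0 - p) / simulations)


if __name__ == "__main__":
    for n in (1_000, 10_000, 40_000):
        se = title_odds_standard_error(0.20, n)
        print(f"{n:>6} simulations: 20% title chance, +/- {se:.2%}")
```

At the default 10,000 simulations, a 20% title chance carries a standard error of about 0.4 percentage points, and you need four times as many runs to halve it.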