How the F3K Pilot Rating System Works
The Idea
Every F3K pilot has a single number — their rating — that reflects how strong they are right now. Win against tough opponents, your rating climbs. Lose to weaker pilots, it drops. Stop competing for a while, it gradually fades.
The system borrows from the Elo rating system used in chess but adapts it for F3K's structure: multiple pilots competing simultaneously in a flight group, where seconds of flight time determine who outperformed whom.
The Match Unit: Flight Groups
In F3K competitions, pilots are assigned to flight groups. Everyone in a group flies the same task, at the same time, under the same weather conditions. This is what makes the rating system fair — we never compare pilots who flew in different conditions.
Each group becomes one "match" for the rating system. A typical competition has 7–14 rounds with 2–4 groups each, so a weekend event generates roughly 14–56 group matches in total. Each pilot flies in one group per round, so an individual pilot takes part in 7–14 of them.
How a Single Group Updates Ratings
When a group finishes flying, the system compares every pilot against every other pilot in that group. For a group of 8 pilots, that's 7 matchups per pilot and 28 distinct pairs across the whole group.
Worked Example
Four pilots compete in Task B (last two flights, 4 min max, 10 min working time):
| Pilot | Flight Times | Total (seconds) | Rating Before |
|---|---|---|---|
| Alice | 3:58 + 3:55 | 473 | 1650 |
| Bob | 3:52 + 3:47 | 459 | 1580 |
| Carol | 3:41 + 3:38 | 439 | 1520 |
| Dave | 3:10 + 2:55 | 365 | 1490 |
For each pair, the system computes two scores, then combines them across all pairs into a rating change:
1. Expected Score — what should happen based on current ratings:
E = 1 / (1 + 10^((opponent_rating - your_rating) / 400))
- Alice vs Bob: E = 1 / (1 + 10^((1580-1650)/400)) = 0.60
- Alice vs Carol: E = 1 / (1 + 10^((1520-1650)/400)) = 0.68
- Alice vs Dave: E = 1 / (1 + 10^((1490-1650)/400)) = 0.72
Alice is expected to beat all three — she's the highest rated.
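The expected-score formula can be checked with a few lines of Python. This is a minimal sketch, not the system's actual code: the function name `expected_score` and the dictionary layout are illustrative assumptions, while the ratings come from the worked example table.

```python
def expected_score(your_rating: float, opponent_rating: float) -> float:
    """Elo-style expected score on the 400-point scale used in the formula."""
    return 1.0 / (1.0 + 10 ** ((opponent_rating - your_rating) / 400.0))

# Ratings from the worked example (Alice against the other three pilots).
alice = 1650
opponents = {"Bob": 1580, "Carol": 1520, "Dave": 1490}

expected = {name: expected_score(alice, r) for name, r in opponents.items()}
for name, value in expected.items():
    print(f"Alice vs {name}: E = {value:.2f}")
```

The two expected scores in any pairing always add up to 1, so Bob's expected score against Alice is simply 1 minus hers.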
2. Actual Score — what did happen, with margin of victory:
S = 1 / (1 + e^(-(your_points - their_points) / 50))
- Alice vs Bob (473 - 459 = +14): S = 0.57 (slight win)
- Alice vs Carol (473 - 439 = +34): S = 0.66 (solid win)
- Alice vs Dave (473 - 365 = +108): S = 0.90 (dominant win)
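The margin-aware actual score is a logistic function of the point difference. The sketch below assumes the 50-point scale from the formula; the name `margin_score` and the `scale` parameter are illustrative choices, not names taken from the system itself.

```python
import math

def margin_score(your_points: float, their_points: float, scale: float = 50.0) -> float:
    """Logistic 'actual score': 0.5 for a tie, approaching 1 for a large win."""
    return 1.0 / (1.0 + math.exp(-(your_points - their_points) / scale))

# Totals in seconds from the worked example.
alice_points = 473
others = {"Bob": 459, "Carol": 439, "Dave": 365}

actual = {name: margin_score(alice_points, pts) for name, pts in others.items()}
for name, value in actual.items():
    print(f"Alice vs {name}: S = {value:.2f}")
```

As with the expected score, the two actual scores in a pairing are complementary: whatever Alice scores against Dave, Dave scores 1 minus that against her.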
3. Rating Change:
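delta = K × (S_total - E_total) / sqrt(N - 1)
Here K is the pilot's K-factor (24 in this example, the value for established pilots) and N is the number of pilots in the group (4 here).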
Alice's totals: S_total = 0.57 + 0.66 + 0.90 = 2.13, E_total = 0.60 + 0.68 + 0.72 = 2.00
delta = 24 × (2.13 - 2.00) / sqrt(3) = 24 × 0.13 / 1.73 = +1.8
Alice performed slightly better than expected — she gains 1.8 points. Her rating moves from 1650 to 1651.8.
What about Dave?
Dave lost all three matchups, but the system considers how much he was expected to lose:
- Dave vs Alice: E = 0.28, S = 0.10
- Dave vs Bob: E = 0.37, S = 0.13
- Dave vs Carol: E = 0.46, S = 0.19
S_total = 0.42, E_total = 1.11
delta = 24 × (0.42 - 1.11) / 1.73 = -9.6
Dave loses 9.6 points. He was expected to lose, but he lost by larger margins than his rating predicted.
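Putting the three steps together, one group update might be sketched as follows. This is an illustration under stated assumptions: every pilot uses K = 24, and the function names (`expected_score`, `margin_score`, `group_deltas`) are hypothetical rather than taken from the real implementation.

```python
import math

def expected_score(your_rating, opponent_rating):
    return 1.0 / (1.0 + 10 ** ((opponent_rating - your_rating) / 400.0))

def margin_score(your_points, their_points, scale=50.0):
    return 1.0 / (1.0 + math.exp(-(your_points - their_points) / scale))

def group_deltas(ratings, points, k=24.0):
    """Rating change for each pilot in one flight group.

    ratings: pilot -> rating before the group flew
    points:  pilot -> total points (seconds) scored in the group
    k:       K-factor, assumed identical for all pilots in this sketch
    """
    pilots = list(ratings)
    norm = math.sqrt(len(pilots) - 1)  # group size normalization
    deltas = {}
    for pilot in pilots:
        s_total = sum(margin_score(points[pilot], points[o]) for o in pilots if o != pilot)
        e_total = sum(expected_score(ratings[pilot], ratings[o]) for o in pilots if o != pilot)
        deltas[pilot] = k * (s_total - e_total) / norm
    return deltas

ratings = {"Alice": 1650, "Bob": 1580, "Carol": 1520, "Dave": 1490}
points = {"Alice": 473, "Bob": 459, "Carol": 439, "Dave": 365}

deltas = group_deltas(ratings, points)
for pilot, delta in deltas.items():
    print(f"{pilot}: {delta:+.1f}")
```

Because the code keeps full precision, its results differ slightly from the hand calculation, which rounds every score to two decimals: Alice comes out near +1.9 rather than +1.8, while Dave still lands at about -9.6. With a shared K-factor the changes across the group sum to zero, since each pairing's actual and expected scores both add up to 1.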
Margin of Victory Matters
The system doesn't treat a win as just a win. Beating someone by 200 seconds counts more than beating them by 5 seconds.
The margin sigmoid converts point differences into scores between 0 and 1:
| Point Difference | Actual Score | Meaning |
|---|---|---|
| +200 | 0.98 | Dominant win |
| +100 | 0.88 | Strong win |
| +50 | 0.73 | Clear win |
| +25 | 0.62 | Moderate edge |
| +10 | 0.55 | Slight edge |
| 0 | 0.50 | Draw |
| -50 | 0.27 | Clear loss |
| -200 | 0.02 | Dominant loss |
This means maxing out a flight (landing exactly at the target time) matters — every second counts toward your rating.
Beating Stronger Pilots Rewards You More
The expected score is based on the rating gap between you and your opponent. When you beat someone rated much higher than you, your expected score was low — so the surprise factor is large.
Example: The Upset
Scenario: A 1500-rated pilot beats a 1700-rated pilot by 30 seconds.
- Expected score against the 1700 pilot: E = 1 / (1 + 10^((1700-1500)/400)) = 0.24
- Actual score (30-second margin): S = 1 / (1 + e^(-30/50)) = 0.65
Surprise = S - E = 0.65 - 0.24 = +0.41
Scenario: The same 1500-rated pilot beats a 1480-rated pilot by 30 seconds.
- Expected score: E = 1 / (1 + 10^((1480-1500)/400)) = 0.53
- Actual score: S = 0.65 (same margin, same score)
Surprise = 0.65 - 0.53 = +0.12
The same flight performance (winning by 30 seconds) generates about 3.4 times the rating gain when it's against a pilot rated 200 points above you versus one rated 20 points below you.
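The two scenarios can be compared directly in code. The helper `surprise` below is a hypothetical name for the quantity S - E; the sketch reuses the two formulas from the worked example and nothing else.

```python
import math

def expected_score(your_rating, opponent_rating):
    return 1.0 / (1.0 + 10 ** ((opponent_rating - your_rating) / 400.0))

def margin_score(point_difference, scale=50.0):
    return 1.0 / (1.0 + math.exp(-point_difference / scale))

def surprise(your_rating, opponent_rating, point_difference):
    """Actual minus expected score for a single pairing."""
    return margin_score(point_difference) - expected_score(your_rating, opponent_rating)

upset = surprise(1500, 1700, 30)    # beating a much stronger pilot by 30 s
routine = surprise(1500, 1480, 30)  # beating a slightly weaker pilot by 30 s
ratio = upset / routine

print(f"upset: {upset:+.2f}, routine: {routine:+.2f}, ratio: {ratio:.1f}")
```

Without intermediate rounding the ratio comes out close to 3.5, in line with the figure obtained from the two-decimal values in the text.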
This is why competing in strong groups at major events is the fastest path to a high rating — and why dominating weak local fields produces only modest gains.
K-Factor: How Fast Ratings Move
New pilots need their ratings to adjust quickly to find their true level. Established pilots should have more stable ratings. The K-factor controls this:
| Experience Level | Groups Played | K-Factor | Character |
|---|---|---|---|
| Provisional | 0–15 | 48 | Rating moves fast — finding your level |
| Intermediate | 16–50 | 36 | Settling in |
| Established | 51+ | 24 | Stable — only big performances shift it |
A provisional pilot gaining the same "surprise" as an established pilot will move their rating twice as fast (48/24 = 2x).
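The table translates into a simple lookup. This is a sketch assuming the boundaries are inclusive exactly as written in the table (0–15, 16–50, 51+); `k_factor` is an illustrative name.

```python
def k_factor(groups_played: int) -> int:
    """K-factor by experience, following the table's group-count bands."""
    if groups_played <= 15:
        return 48  # provisional
    if groups_played <= 50:
        return 36  # intermediate
    return 24      # established

for groups in (0, 15, 16, 50, 51, 200):
    print(groups, k_factor(groups))
```

Since the rating change is K multiplied by the normalized surprise, the ratio of two K-factors is also the ratio of the resulting rating changes for the same performance.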
What this looks like in practice
A new pilot enters their first competition rated 1500. In a group of 8 with average rating 1550, they score well, finishing 2nd. Their K-factor of 48 might give them +15 points per round. If that pace held over a 10-round competition, they could jump from 1500 to about 1650.
An established pilot (200+ groups) producing the same surprise with K=24 would gain about +7 per round, moving from, say, 1550 to 1620 over the same event.
Group Size Normalization
Groups range from 3 to 16 pilots. Without normalization, a 16-pilot group would move ratings far more than a 3-pilot group (because there are more pairwise comparisons). The system divides by sqrt(N-1) to keep the impact comparable:
| Group Size | Normalization Factor | Effect |
|---|---|---|
| 3 pilots | sqrt(2) = 1.41 | Moderate per-group impact |
| 6 pilots | sqrt(5) = 2.24 | Slightly diluted per matchup |
| 10 pilots | sqrt(9) = 3.00 | Many matchups, each counts less |
| 16 pilots | sqrt(15) = 3.87 | Lots of information, gentle per-pair |
Larger groups provide more statistical signal but each individual matchup within them has proportionally less weight.
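To see what the normalization does, consider a pilot who outperforms expectations by the same amount against every opponent. The sketch below assumes a constant per-pair surprise of 0.1 and K = 24, both made-up illustrative values, and compares the raw sum with the normalized rating change.

```python
import math

def raw_delta(per_pair_surprise, group_size, k=24.0):
    """Rating change if pairwise surprises were simply summed."""
    return k * per_pair_surprise * (group_size - 1)

def normalized_delta(per_pair_surprise, group_size, k=24.0):
    """Rating change after dividing by sqrt(N - 1), as in the delta formula."""
    return raw_delta(per_pair_surprise, group_size, k) / math.sqrt(group_size - 1)

for n in (3, 6, 10, 16):
    print(f"N={n:2d}  raw={raw_delta(0.1, n):5.1f}  normalized={normalized_delta(0.1, n):4.1f}")
```

Under these assumptions the raw change grows linearly with the number of opponents, while the normalized change grows only with its square root, so a 16-pilot group moves ratings by a few times as much as a 3-pilot group rather than by many times as much.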
Inactivity Decay
A rating should reflect current ability, not historical peak. If a pilot stops competing, their rating gradually decreases:
- Grace period: 6 months of inactivity before any decay begins
- Decay rate: 3 points per month after the grace period
- Floor: Decay cannot take a rating below the midpoint between 1500 and your peak rating
Decay Example
A pilot with a peak rating of 1800 stops competing:
Floor calculation:
floor = 1500 + (peak - 1500) × 0.5
= 1500 + (1800 - 1500) × 0.5
= 1650
- After 6 months: no change (grace period)
- After 12 months: 1800 - (6 × 3) = 1782
- After 18 months: 1800 - (12 × 3) = 1764
- After 24 months: 1800 - (18 × 3) = 1746
- After 56 months: would calculate to 1650, which hits the floor, so decay stops
The floor ensures that a pilot who was once world-class doesn't decay all the way to beginner level. Their history is acknowledged — but they still need to prove current form by competing.
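The decay rules can be sketched as a single function. The name `decayed_rating` and its parameter names are illustrative assumptions, and the sketch additionally assumes that decay never raises a rating that is already below the floor, a case the rules above do not spell out.

```python
def decayed_rating(rating, peak, months_inactive,
                   grace_months=6, points_per_month=3.0, base=1500.0):
    """Rating after a period of inactivity, following the three decay rules."""
    floor = base + (peak - base) * 0.5                 # midpoint between base and peak
    decay_months = max(0, months_inactive - grace_months)
    decayed = rating - decay_months * points_per_month
    # Stop at the floor, but never push a rating upward (assumption).
    return min(rating, max(floor, decayed))

for months in (6, 12, 18, 24, 56, 100):
    print(months, decayed_rating(1800, 1800, months))
```

Running the loop reproduces the example: no change at 6 months, then 3 points per month until the 1650 floor is reached at 56 months.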
Decay is Real, Not Cosmetic
When a pilot returns after a break, their rating has already been reduced. They re-enter the system at their decayed rating and must earn their way back up through results. This ensures the leaderboard always reflects who is performing well now.
Ranked vs Unranked
To appear as a ranked pilot on the leaderboard, you need:
- At least 30 group appearances (roughly 3–4 full competitions)
- At least 4 distinct competitions
Until these thresholds are met, a pilot has a rating (visible in their profile and history) but doesn't receive a numbered rank on the leaderboard. This prevents a single strong weekend from placing someone at #1.
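Both thresholds must hold at once, which a short check makes explicit. The function name `is_ranked` and the parameter names are hypothetical; only the two threshold values come from the list above.

```python
def is_ranked(group_appearances: int, distinct_competitions: int) -> bool:
    """True when a pilot qualifies for a numbered leaderboard rank."""
    return group_appearances >= 30 and distinct_competitions >= 4

print(is_ranked(12, 1))   # one strong weekend: rated but not ranked
print(is_ranked(45, 3))   # plenty of groups, too few competitions
print(is_ranked(30, 4))   # meets both thresholds
```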
Round Batching
Each competition round consists of several groups. The system processes all groups in a round using the same pre-round ratings, then applies all changes at once. This means:
- Group A and Group B in the same round don't influence each other's calculations
- Your round 3 performance is evaluated against ratings that include rounds 1 and 2
- No pilot gets an advantage from the order groups are processed
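Round batching amounts to computing every change from a frozen snapshot and applying the changes only afterwards. The sketch below assumes a single K-factor for all pilots and one group per pilot per round; the function names and the pilots Erin and Frank are hypothetical, added only to fill out two groups.

```python
import math

def expected_score(your_rating, opponent_rating):
    return 1.0 / (1.0 + 10 ** ((opponent_rating - your_rating) / 400.0))

def margin_score(your_points, their_points, scale=50.0):
    return 1.0 / (1.0 + math.exp(-(your_points - their_points) / scale))

def group_deltas(ratings, points, k=24.0):
    """Rating changes for the pilots listed in `points` (one flight group)."""
    pilots = list(points)
    deltas = {}
    for pilot in pilots:
        surprise = sum(
            margin_score(points[pilot], points[other])
            - expected_score(ratings[pilot], ratings[other])
            for other in pilots if other != pilot
        )
        deltas[pilot] = k * surprise / math.sqrt(len(pilots) - 1)
    return deltas

def process_round(ratings, groups, k=24.0):
    """Evaluate all groups against pre-round ratings, then apply every change."""
    snapshot = dict(ratings)          # frozen pre-round ratings
    pending = {}
    for points in groups:
        pending.update(group_deltas(snapshot, points, k))
    return {pilot: r + pending.get(pilot, 0.0) for pilot, r in snapshot.items()}

ratings = {"Alice": 1650, "Bob": 1580, "Carol": 1520,
           "Dave": 1490, "Erin": 1500, "Frank": 1610, "Grace": 1555}
group_a = {"Alice": 473, "Carol": 439, "Erin": 450}
group_b = {"Bob": 459, "Dave": 365, "Frank": 470}

after = process_round(ratings, [group_a, group_b])
swapped = process_round(ratings, [group_b, group_a])
print(after == swapped)
```

Because no group ever sees a rating changed by another group in the same round, swapping the processing order produces identical results, and a pilot who sat the round out keeps their rating.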
Tie Dampening
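In some tasks (especially Task A, Last Flight), large portions of a group may achieve the maximum score. When over 80% of pilots tie at the top, the system reduces the K-factor:
If >80% tied at max: effective_K = K × max(0.3, 1.0 - tie_fraction)
- A group where 90% tie: 1.0 - 0.90 = 0.10, which is raised to the 0.3 floor, so effective_K = K × 0.3 (minimal rating movement)
- A group where 85% tie: 1.0 - 0.85 = 0.15, also raised to the floor, so effective_K = K × 0.3
Because the rule only triggers above 80%, 1.0 - tie_fraction is always below 0.2, so in practice the 0.3 floor always applies.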
This prevents large rating swings from rounds with little meaningful differentiation.
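The dampening rule can be sketched as below. The function name `effective_k`, its parameters, and the idea of detecting ties by comparing each score to the task maximum are assumptions made for illustration.

```python
def effective_k(k, scores, max_score, threshold=0.8, floor=0.3):
    """K-factor after tie dampening for one group.

    scores:    points scored by each pilot in the group
    max_score: maximum achievable score for the task
    """
    tie_fraction = sum(1 for s in scores if s >= max_score) / len(scores)
    if tie_fraction > threshold:
        return k * max(floor, 1.0 - tie_fraction)
    return k

nine_of_ten = [300] * 9 + [280]       # 90% of the group maxed out
eight_of_ten = [300] * 8 + [290, 280] # exactly 80%: rule does not trigger
print(effective_k(24, nine_of_ten, 300))
print(effective_k(24, eight_of_ten, 300))
```

With a 0.8 threshold and a 0.3 floor, the multiplier is 0.3 whenever the rule triggers, so the dampening behaves like a switch between full K and 30% of K.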
Peak Rating
Every pilot has a peak rating — the highest rating they've ever achieved. This never decays. It represents the best form a pilot has shown and serves as a historical record, separate from the current (potentially decayed) rating.
Summary
| Mechanism | Purpose |
|---|---|
| Pairwise within groups | Fair comparison (same conditions) |
| Margin-aware scoring | Every second matters |
| Expected vs actual | Upsets rewarded, beating weaker pilots less so |
| K-factor scaling | New pilots settle fast, veterans stay stable |
| Group size normalization | Consistent impact regardless of group size |
| Inactivity decay with floor | Reflects current form, respects history |
| Round batching | No processing-order advantage |
| Tie dampening | Prevents noise from maxed-out tasks |
| Ranked thresholds | Proven track record required |
The result: a single number that tells you how a pilot stacks up against the global F3K field right now — updated after every competition, grounded in head-to-head performance, and resistant to gaming through weak fields or cherry-picked events.