TLDR: Double elimination is the best tournament format for producing results that reflect the strength of the teams. A stochastic MATLAB simulation was written to provide statistics supporting this claim.
Double elimination has long been one of the most popular tournament formats because it provides a cushion for when a team has a bad day or gets a bad seeding. Most people intuitively realize that single elimination leads to larger statistical fluctuations. Double elimination has downsides: more games need to be played (which could be good or bad), and the finals problem rears its ugly head. In a double elimination final, the team that reaches the finals without a loss must either be eliminated after only one series loss or be given an advantage in the finals, which makes for a much less exciting finale.
This analysis considers four eight-team tournament formats.
- DES: Double elimination standard. DES example
- DENS: Double elimination non-standard. Notice in this DENS example how Gamelanders are not swapped to the other side of the bracket when they drop to lower round 2.
- SE: Single elimination. SE example
- GSS: Two groups of four with double elimination, leading to a four-team single elimination playoff. GSS example
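The difference between DES and DENS can be sketched in a few lines of Python. This is only an illustration, not the MATLAB code used for the analysis. The bracket wiring, the function name `double_elimination`, the `crossover` flag and the `beats` callback are all assumptions, and the grand final is treated as a single series with no bracket reset.

```python
def double_elimination(seeds, beats, crossover=True):
    """Play an eight-team double elimination bracket (hypothetical wiring).

    seeds     -- eight team ranks in upper-bracket order
    beats     -- function(a, b) returning True when team a beats team b
    crossover -- True for DES (dropped teams switch sides), False for DENS
    """
    def play(a, b):
        # Return (winner, loser) for one series.
        return (a, b) if beats(a, b) else (b, a)

    # Upper bracket: round 1, round 2, final.
    ub1 = [play(seeds[i], seeds[i + 1]) for i in range(0, 8, 2)]
    ub2 = [play(ub1[0][0], ub1[1][0]), play(ub1[2][0], ub1[3][0])]
    ub_final = play(ub2[0][0], ub2[1][0])

    # Lower round 1: round-1 losers from the same half of the bracket meet.
    lb1 = [play(ub1[0][1], ub1[1][1]), play(ub1[2][1], ub1[3][1])]

    # Lower round 2: the two upper round-2 losers drop in.
    drops = [ub2[0][1], ub2[1][1]]
    if crossover:
        drops.reverse()  # DES: send each dropped team to the opposite half
    lb2_pairs = [(lb1[0][0], drops[0]), (lb1[1][0], drops[1])]
    lb2 = [play(a, b) for a, b in lb2_pairs]

    lb3 = play(lb2[0][0], lb2[1][0])
    lb_final = play(lb3[0], ub_final[1])

    # Assumption: the grand final is a single series with no bracket reset.
    champion, runner_up = play(ub_final[0], lb_final[0])
    return champion, runner_up, lb2_pairs


def higher_rank_wins(a, b):
    # Rank 1 is the strongest team, so the smaller number wins.
    return a < b


perfect = [1, 8, 4, 5, 2, 7, 3, 6]
des = double_elimination(perfect, higher_rank_wins, crossover=True)
dens = double_elimination(perfect, higher_rank_wins, crossover=False)
```

Under these assumptions the lower round 2 pairings are 5 vs 3 and 6 vs 4 for DES, but 5 vs 4 and 6 vs 3 for DENS. In the DENS case both pairings are rematches of upper round 1 series, which is what the crossover is meant to avoid.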
METHODOLOGY
Every team is given a rank from 1 to 8. This analysis is primarily concerned with two seeding scenarios: perfect seeding (seed 1 vs seed 8, seed 4 vs seed 5, etc.) and random seeding. Random seeding imitates the case where a better team gets a lower seed than a worse team. Obviously most real tournaments fall somewhere between these two extremes, but by looking at the extremes, the middle ground can be interpolated.
Each tournament is simulated in MATLAB at least ten thousand times to ensure the results converge. Two approaches were taken to deciding each matchup. The first assumes that the higher-ranked team always wins. The second assigns each team (1-8) a value along a Gaussian distribution (bell curve), uses these values to set the probability that the higher-ranked team wins, and then decides the winner with a random number. The Gaussian distribution used in this analysis gives a 90% chance that the rank 1 team beats the rank 8 team, written 1(90)8, a 57.3% chance that the rank 3 team beats the rank 6 team, 3(57.3)6, likewise 2(71.1)7, and so on. Most tournaments don’t follow an ideal bell curve, but the exact distribution does not matter as long as higher-ranked teams have a higher chance of beating lower-ranked teams. With a different distribution the exact numerical results will change, but the ranking of the tournament formats will be the same.
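As a rough illustration of deciding a matchup with a random number, here is a minimal Python sketch. It is not the MATLAB code behind the results. The normal-CDF model, the `SCALE` constant and the names `win_probability` and `play_match` are assumptions. `SCALE` is tuned only so that rank 1 beats rank 8 about 90% of the time, so the sketch is not expected to reproduce the 71.1% and 57.3% figures quoted above, since the exact distribution is not given.

```python
import math
import random

# Assumption: chosen so that rank 1 beats rank 8 about 90% of the time.
SCALE = 0.1831


def win_probability(rank_a, rank_b, scale=SCALE):
    """Chance that the team ranked rank_a beats the team ranked rank_b.

    Assumed model: a normal CDF applied to the scaled rank gap.
    """
    gap = rank_b - rank_a  # positive when team a is the better-ranked team
    return 0.5 * (1.0 + math.erf(scale * gap / math.sqrt(2.0)))


def play_match(rank_a, rank_b, rng=random):
    """Draw one random number and return the rank of the winning team."""
    return rank_a if rng.random() < win_probability(rank_a, rank_b) else rank_b


# Usage: count how often rank 1 beats rank 8 in ten thousand series.
rng = random.Random(0)
wins = sum(play_match(1, 8, rng) == 1 for _ in range(10_000))
```

Any function that gives the better-ranked team more than a 50% chance could be swapped in for `win_probability` without touching `play_match`.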
What result qualifies as a good result? Upsets are good for specific narratives and overall excitement, but if there are too many upsets, the best teams don’t reach the end, and the tournament winds up with very anticlimactic matches. (For example, most of Astralis’ opponents in the CSGO major finals had previous great upsets but were completely outmatched in the finals.) A tournament format that allows the best teams to more consistently reach the later matches creates better narratives, better viewing experiences, and a better quality of tournament. There are other factors that go into making a tournament great, but this is very important for the format.
After every simulated tournament, the placement of each team is compared to its rank. For instance, if the rank 1 team actually placed third, it would add 2 to a “variance” tally that is averaged over all the teams in the tournament. (Despite the name, this is an average absolute difference between placement and rank, not a statistical variance.) If the tournament tally averages out to 1.5, each team placed an average of 1.5 spots away from where it was expected to place based on its rank. These tallies are then averaged over the thousands of times each tournament format is simulated. A smaller number is better for the overall health of the tournament. (A very tiny number would be bad because only the best teams would have a chance, but all of these formats have reasonable numbers.)
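The tally described above can be written down in a few lines. This is a Python sketch rather than the original MATLAB, and the function names `placement_deviation` and `average_deviation` are made up for illustration.

```python
def placement_deviation(placements):
    """Average distance between each team's placement and its rank.

    placements[i] is the finishing place of the team ranked i + 1.
    """
    total = sum(abs(place - rank)
                for rank, place in enumerate(placements, start=1))
    return total / len(placements)


def average_deviation(runs):
    """Average the per-tournament tally over many simulated tournaments."""
    return sum(placement_deviation(p) for p in runs) / len(runs)


# Rank 1 placed third, rank 2 first, rank 3 second, everyone else as expected.
example = [3, 1, 2, 4, 5, 6, 7, 8]
tally = placement_deviation(example)  # (2 + 1 + 1) / 8 = 0.5
```

A tournament in which every team finishes exactly on its rank scores 0, so averaging the example with one such tournament gives 0.25.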
RESULTS
The easiest question to answer is “which format struggles with bad seeding?” Assume that the higher rank wins every single time and that the seedings are random. (Results with upset potential are dealt with later.) The numbers shown are the average “variance” tallies described in the previous section.
Format  Variance
DES     0.4744
DENS    0.5572
GSS     0.6624
SE      1.1429
If you randomly seed single elimination (SE), a lot of strong teams will face each other in the early rounds. That leaves teams on average 1.1429 placements away from their rank. Standard double elimination (DES) does best, at only 0.4744 placements on average per team. A tournament that does not have the capacity to seed properly through qualifiers or rankings should avoid single elimination. If double elimination would simply require too many games, as in large open qualifiers with tons of teams, single elimination is fine for that stage, but the playoffs should use double elimination.
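To see why random seeding hurts single elimination, the experiment can be sketched in Python for the SE case only. This is not the original MATLAB code, and the function names are hypothetical. It also makes one assumption the article does not state: teams knocked out in the same round share the midpoint of the places they cover (3.5 for the semifinal losers, 6.5 for the quarterfinal losers). Because of that, its output is not expected to match the table exactly.

```python
import random


def single_elimination_placements(bracket):
    """Placements for an SE bracket where the higher rank always wins.

    bracket -- team ranks in bracket order (neighbours play each other)
    Returns a dict mapping rank -> placement.
    """
    placements = {}
    alive = list(bracket)
    while len(alive) > 1:
        n = len(alive)
        # Assumption: losers of this round share the midpoint of places
        # n/2 + 1 .. n (6.5 for quarterfinals, 3.5 for semifinals, 2 for the final).
        shared_place = (n // 2 + 1 + n) / 2
        winners = []
        for a, b in zip(alive[0::2], alive[1::2]):
            winner, loser = (a, b) if a < b else (b, a)
            winners.append(winner)
            placements[loser] = shared_place
        alive = winners
    placements[alive[0]] = 1
    return placements


def mean_deviation(placements):
    """Average distance between placement and rank over all teams."""
    return sum(abs(place - rank)
               for rank, place in placements.items()) / len(placements)


def random_seeding_average(runs=10_000, seed=0):
    """Average the tally over many randomly seeded SE tournaments."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(runs):
        bracket = list(range(1, 9))
        rng.shuffle(bracket)
        total += mean_deviation(single_elimination_placements(bracket))
    return total / runs


perfect = single_elimination_placements([1, 8, 4, 5, 2, 7, 3, 6])
```

With the shared-place convention even perfect seeding scores 0.625 rather than 0, so only comparisons made under the same convention are meaningful.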
So double elimination is better at dealing with poor seeding, but how does it deal with upsets? As previously stated, upsets are not a bad thing, but the best teams should still be able to reach the later matches more consistently. Using the Gaussian distribution to create a % chance of an upset depending on the difference in rank between the teams, the four tournaments were simulated again. UV stands for “upset variance” and is the variance when the seeding is perfect. SV stands for “seeding variance” and is the variance when the seeding is random. Most real tournaments will fall somewhere in between.
Format  UV      SV
DES     1.7063  1.8892
DENS    1.6796  1.8913
GSS     1.7516  1.9249
SE      1.7724  2.0727
Interestingly enough, non-standard double elimination fares slightly better than the standard version with perfect seeding (UV 1.6796 vs 1.7063) because it makes upsets less common in lower bracket round 2. This is the only result close enough for the method used to determine the % chance of an upset to matter. With random seeding the two are nearly identical (SV 1.8913 vs 1.8892), and the non-standard version is usually slightly worse.
Another stat that might be of interest is the quality of the grand finals which is easily the most important match of the tournament. The finals quality FQ is calculated as the average rank of the two teams that make the grand finals. (For instance, if the rank 1 team and the rank 3 team made the finals, the FQ would be 2). FQR is the same stat but for random seeding instead of perfect seeding.
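The FQ calculation is simple enough to show directly. The Python sketch below is illustrative only, not the original MATLAB. It applies FQ to a single elimination bracket, and the names `finals_quality`, `single_elimination_finalists`, `average_finals_quality` and the `beats` callback are assumptions.

```python
import random


def finals_quality(finalist_a, finalist_b):
    """Average rank of the two grand finalists (lower is better)."""
    return (finalist_a + finalist_b) / 2


def single_elimination_finalists(bracket, beats):
    """Play an SE bracket down to the last two teams.

    bracket -- team ranks in bracket order (neighbours play each other)
    beats   -- function(a, b) returning True when team a beats team b
    """
    alive = list(bracket)
    while len(alive) > 2:
        alive = [a if beats(a, b) else b
                 for a, b in zip(alive[0::2], alive[1::2])]
    return alive[0], alive[1]


def average_finals_quality(beats, runs=10_000, shuffle=False, seed=0):
    """Average FQ over many runs; shuffle=True gives the random-seeding FQR."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(runs):
        bracket = [1, 8, 4, 5, 2, 7, 3, 6]  # perfect seeding
        if shuffle:
            rng.shuffle(bracket)
        total += finals_quality(*single_elimination_finalists(bracket, beats))
    return total / runs


def higher_rank_wins(a, b):
    # Rank 1 is the strongest team, so the smaller number wins.
    return a < b


fq_example = finals_quality(1, 3)  # the example from the text: 2.0
```

With perfect seeding and no upsets the final is always rank 1 vs rank 2, so FQ is 1.5, the best possible value. Passing a probabilistic `beats` function instead of `higher_rank_wins` brings upsets into the picture.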
Format  FQ      FQR
DES     2.8035  3.0098
DENS    2.7838  3.0118
GSS     3.0284  3.1399
SE      2.9785  3.3630
Pretty much everything that can be said about the UV and SV results can be said about the finals quality. The one change is that GSS replaces single elimination as the worst format with perfect seeding (FQ 3.0284 vs 2.9785), although it still fares better than single elimination with random seeding (FQR 3.1399 vs 3.3630).
CONCLUSIONS
If a tournament organizer is happy with having more games and dealing with the finals problem, double elimination is clearly the best format to run. A lot of tournaments run single elimination simply because that format is the simplest for human brains to understand and build hype around. Using double elimination groups to feed a single elimination playoff, as in GSS, can help alleviate the problems of single elimination, particularly when there are more than eight teams.