EIDRaS Explained

by Robert Steinke

The ARMADA Diplomacy club has decided to use EIDRaS as its player ranking system. A well-known platitude in the publishing business is that every equation in a document cuts the readership in half. The original EIDRaS article was rather equation-heavy, so I wrote this explanation to familiarize ARMADA members with the system. The end of the article discusses the differences between the originally published EIDRaS system and the version that ARMADA will implement as its ranking system.

The EIDRaS system is based on an assumption and some math. The assumption is that if you knew the exact skill, represented by the rating, of each player in a game you should be able to guess the expected result of that game. This is useful for rating players because players doing better than expected might be rated too low, and players that do worse than expected might be rated too high. So the ratings can be adjusted to be more accurate after every game. If a player's rating is correct they will sometimes do better than expected and sometimes worse so their rating should stabilize around the correct value.

The first job is to mathematically score the result of a game.

Score = players / winners (for players that won)
Score = 0 (for players that lost)

In traditional Diplomacy there are seven points available in each game, but the equations are generalized for variants with any number of players. The points are shared evenly among the winners. The losers get no points. I'll be using the example from the original EIDRaS paper to illustrate. Here it is again:

Name	Initial Rating	After ABC draw	After D solo	After ABCD draw
A	1300	1319	1290	1299
B	1000	1032	1015	1035
C	800	837	826	850
D	1400	1366	1475	1471
E	900	888	875	864
F	1100	1082	1064	1047
G	1200	1177	1156	1135

And here are the scores achieved by each player in each game:

Name	Score From ABC draw	Score From D solo	Score From ABCD draw
A	7/3	0	7/4
B	7/3	0	7/4
C	7/3	0	7/4
D	0	7	7/4
E	0	0	0
F	0	0	0
G	0	0	0
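
As a minimal sketch, the scoring rule can be written in a few lines of Python. The function name game_scores and the use of exact fractions are illustrative choices of mine, not part of the EIDRaS paper.

```python
from fractions import Fraction

def game_scores(players, winners):
    """Share one point per player evenly among the winners; losers get 0."""
    share = Fraction(len(players), len(winners))
    return {p: share if p in winners else Fraction(0) for p in players}

players = list("ABCDEFG")
abc_draw = game_scores(players, {"A", "B", "C"})        # three-way draw
d_solo = game_scores(players, {"D"})                    # solo win
abcd_draw = game_scores(players, {"A", "B", "C", "D"})  # four-way draw

# Print one row per player, mirroring the score table above.
for name in players:
    print(name, abc_draw[name], d_solo[name], abcd_draw[name])
```

Because the share is players / winners, the same function works for variants with any number of players.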

The scores by themselves don't mean very much. The important thing is the comparison of score with expected score. The expected score formula is probably the most complicated in the paper, so I'll start with something simpler. Each player gets a number which is related to their rating. We add up the numbers of all the players to find a total number for the game. The fraction of the 7 available points that we expect each player to get is equal to that player's share of the total number.

Expected Score = number of points available * (player's number / sum of all players' numbers)

Seven players with the same rating would each expect 1 point, the equivalent of a seven-way draw. In that case anyone who does better than a 7-way draw will raise their rating, and anyone who does worse will drop. The equation looks complicated because the player's number is calculated by this formula:

Player's number = e^(0.002 * player's rating)

An exponential function is chosen so that if a very highly rated player wins a low-rated game, or a low-rated player loses a highly rated game, there will be very little effect on their rating. Getting back to the example, we can now calculate the expected score for each player in the first game.

Name	Initial Rating	Player's Number	Fraction of Total	Expected Score
A	1300	13.46	.197	1.38
B	1000	7.39	.108	0.76
C	800	4.95	.072	0.51
D	1400	16.44	.241	1.68
E	900	6.05	.089	0.62
F	1100	9.03	.132	0.92
G	1200	11.02	.161	1.13
Total of Column:		68.34	1.00	7.00
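
The table above can be reproduced with a short Python sketch. The function name expected_scores and the dictionary layout are my own; the 0.002 exponent and the ratings come from the example.

```python
import math

def expected_scores(ratings):
    """Expected score = points available * (player's number / total number)."""
    numbers = {name: math.exp(0.002 * r) for name, r in ratings.items()}
    total = sum(numbers.values())
    points = len(ratings)  # one point per player, 7 in standard Diplomacy
    return {name: points * n / total for name, n in numbers.items()}

initial = {"A": 1300, "B": 1000, "C": 800, "D": 1400,
           "E": 900, "F": 1100, "G": 1200}
expected = expected_scores(initial)

# Print each player's number and expected score to two decimal places.
for name, rating in initial.items():
    print(f"{name}\t{math.exp(0.002 * rating):.2f}\t{expected[name]:.2f}")
```

The expected scores always sum to the number of points available, whatever the ratings are.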

Only players A, D, and G are expected to get better than a 7-way draw. The other players won't lose much rating if they lose the game. After the first game when A, B, and C draw we calculate the difference between each player's score and expected score.

Name	Score	Expected Score	Difference
A	2.33	1.38	0.95
B	2.33	0.76	1.57
C	2.33	0.51	1.82
D	0	1.68	-1.68
E	0	0.62	-0.62
F	0	0.92	-0.92
G	0	1.13	-1.13
Total of Column	6.99	7.00	-0.01 (rounding error)
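
The sketch below strings the steps so far together for the first game: score, expected score, difference, and then a rating change equal to the difference multiplied by the example's factor of 20 (the factor is discussed in the next paragraph). The function names are mine, and rounding to whole numbers only for display is an assumption; the paper may round differently.

```python
import math

def expected_scores(ratings):
    numbers = {n: math.exp(0.002 * r) for n, r in ratings.items()}
    total = sum(numbers.values())
    return {n: len(ratings) * x / total for n, x in numbers.items()}

def update_ratings(ratings, winners, factor=20):
    """New rating = old rating + factor * (score - expected score)."""
    expected = expected_scores(ratings)
    share = len(ratings) / len(winners)  # points each winner receives
    new = {}
    for name, rating in ratings.items():
        score = share if name in winners else 0.0
        new[name] = rating + factor * (score - expected[name])
    return new

initial = {"A": 1300, "B": 1000, "C": 800, "D": 1400,
           "E": 900, "F": 1100, "G": 1200}
after_abc = update_ratings(initial, {"A", "B", "C"})

# Ratings after the ABC draw, rounded for display only.
for name, rating in after_abc.items():
    print(name, round(rating))
```

Since the differences sum to zero, the total of all ratings in the game stays the same when every player uses the same factor.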

From the difference column we can make some qualitative judgements about changes in rating. C should gain about twice as much rating as A gains (+37 vs. +19). D should lose a little more rating than B gains (-34 vs. +32). The rating change is the difference multiplied by a rating change factor, which is 20 in this example. How do we decide this rating change factor? It is more or less arbitrary. A large factor will allow players to move quickly to their correct rating, but a small factor will make the ratings more stable once players are rated correctly. The answer is to make the factor large at first, but decrease it as a player plays more games. Even old players need rating mobility, so there is a base factor which is increased by a multiple for a player's first games.

rating change factor = max(50 * base factor / (games played + 5), base factor)

A new player in his first rated game will change his rating by ten times the base factor. After twenty games it will be twice the base, and after forty-five games it will be the base.
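
A quick Python check of those three milestones, assuming that "games played" counts the games completed before the one being rated (the function name is my own):

```python
def rating_change_factor(base_factor, games_played):
    """Large for new players, settling at the base factor after 45 games."""
    return max(50 * base_factor / (games_played + 5), base_factor)

# Multiples of the base factor at the milestones mentioned in the text.
for games in (0, 20, 45, 100):
    multiple = rating_change_factor(1, games)
    print(f"after {games} games: {multiple:g} x base factor")
```

The max() is what keeps the factor from falling below the base once a player has more than forty-five games behind them.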

The base factor is determined by provisional ratings and press. Games with full press are more accurate demonstrations of skill and should create larger changes in rating. Press is given these values: partial 20, broadcast 15, no-press 10. A player's rating is considered provisional for his or her first seven games. Provisional players' ratings are probably not accurate, so their opponents' ratings should change by less.

base factor = max(press value * fraction of non-provisional opponents, press value * 1/3)

For example, if you are playing against six opponents and one of them is provisionally rated, then your base factor is 5/6 of the press value. This can never go lower than 1/3 of the press value, so that games with all newbies will still affect their ratings.
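
Here is a minimal Python sketch of the base factor rule. The press values are the ones listed above; the function name, the argument order, and the use of exact fractions are assumptions made for illustration.

```python
from fractions import Fraction

# Press values as given in the text.
PRESS_VALUES = {"partial": 20, "broadcast": 15, "no-press": 10}

def base_factor(press_value, opponents, provisional_opponents):
    """Scale the press value by the share of non-provisional opponents,
    but never let it drop below one third of the press value."""
    established = Fraction(opponents - provisional_opponents, opponents)
    return press_value * max(established, Fraction(1, 3))

# Six opponents, one of them provisional: 5/6 of the press value.
print(base_factor(PRESS_VALUES["partial"], 6, 1))
# Six opponents, all provisional: the floor of 1/3 applies.
print(base_factor(PRESS_VALUES["partial"], 6, 6))
```

Taking the maximum of the two fractions before multiplying gives the same result as the max() of the two products in the formula, since the press value is positive.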

Implementing EIDRaS for ARMADA

The ARMADA club will be starting a brand new EIDRaS rating ladder. Only ARMADA members will be rated, and only ARMADA sanctioned games will be used to calculate the ratings. Every ARMADA member will be started with a provisional score of 1000, and as of this writing, the first ARMADA sanctioned email games are being played.

The first decision we made with the ARMADA rating system is that different variants should have completely different rating ladders as skill with one variant does not necessarily imply skill with another. We felt that this applies to different press settings too. So the main ARMADA rating ladder will only count games played with the original rules, and different rating change factors for different press settings are removed. A factor of 20 will be used for all games.

The ARMADA rating system will still use the EIDRaS rules to modify the base factor for provisionally rated players. Any non-ARMADA rated player participating in an ARMADA sanctioned game will be deemed to have a provisional rating of 1000.

The original EIDRaS article also gave rules for games where a replacement player is required. First, the country is deemed to have been played by a player whose rating is the average of the ratings of the players who played that country, weighted by the number of turns each played. Then the rating change is divided between the players in the ratio of the number of turns played, except that, to discourage abandonments, a player who abandons a position cannot increase in rating. For the ARMADA ladder we hope that we will not have too many abandonments, and that those we do have will be in good faith, where a player really cannot continue playing because of a vacation or the like. So we have decided to use the time-weighted average system, but allow abandoning players to share in positive rating changes.
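
The ARMADA version of the replacement rule can be sketched as below. The function names and the sample figures (a 1200-rated player for 6 turns, then a 1000-rated replacement for 18 turns, with a total change of +12) are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def effective_rating(stints):
    """Turn-weighted average rating for a country.

    stints is a list of (rating, turns_played) pairs, one per player
    who held the position.
    """
    total_turns = sum(turns for _, turns in stints)
    return sum(rating * turns for rating, turns in stints) / total_turns

def split_change(change, stints):
    """Divide the country's rating change in the ratio of turns played.

    Under the ARMADA rule described above, abandoning players share in
    positive changes too, so no special case is needed here.
    """
    total_turns = sum(turns for _, turns in stints)
    return [change * turns / total_turns for _, turns in stints]

# Hypothetical position: original player, then a replacement.
stints = [(1200, 6), (1000, 18)]
print(effective_rating(stints))   # rating used for the expected score
print(split_change(12, stints))   # each player's share of a +12 change
```

To follow the original EIDRaS rule instead, the share of an abandoning player would be set to zero whenever the change is positive.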

Look for the ARMADA Rating ladder at http://www.diplom.org/armada/, and we will keep the pouch updated as to the successes and troubles we encounter using this system.

Robert Steinke
(steinker@cs.colorado.edu)
