See YARS Later, Welcome EIDRaS

The observant amongst you may have noticed that the Pouch's YARS rating system is currently down. The reasons were partly technical (the way it was coded placed a great strain on the diplom.org machine) and partly that certain people questioned the meaningfulness of the YARS ratings. This article introduces a new Diplomacy rating system, based on the ELO chess methodology, that should be up and running at the Pouch by the time the next issue of the Zine hits your Web browser.

In many ways, both YARS and the Hall of Fame work well in their intent to judge the best and most prolific players. However, for us mere mortals the ratings are not comparable. For instance, in YARS a slightly negative rating may indicate frequent above-average play or dreadful but infrequent play. What we need is a rating system that judges only ability, not the number of games played.

EIDRaS is an abbreviation of ELO Inspired Diplomacy Rating System. As you may have guessed, it thus has the following advantageous ELO properties:

- Players of similar ability have similar ratings, allowing GMs to designate games for players rated 2000+ or between 1000 and 1400. Players joining such games can expect a more evenly matched, and thus hopefully better, game.
- The difference in rating between two players is indicative of how their performances should differ if paired in the same game. For instance, if you were rated 200 above France, you'd be expected to score half a point more, and this would be the case whether you were rated 3000 and France 2800 or you 800 to her 600.
- The abilities of opponents affect your rating. Beating up on a bunch of newbies will do your rating a lot less good than soloing against quality opposition. Conversely, losing to poor players will inflict greater damage on your rating.
- Recent results affect your rating more than old ones, hence the ability to improve, or regress, is recognised and with time those awful early results will stop dragging your rating down.
- Players will tend toward their true rating from 1000 over time, hence the top ratings will not be filled with players who have played one or two games but got a solo, nor will the system reward playing extra games once a relatively accurate rating is established.
- The rating system includes map variants on an equal basis.

Each player has a rating **R**, and after each game that rating
changes by an amount **ΔR** (delta R) depending on the result,
**S**, and the expected result, **X**, which itself depends on
how you compare with the opposition ratings-wise. Your rating should
be considered an approximate measure of your ability, which becomes
more accurate as results are added. The **K** factor represents
this by reducing the degree of rating change as the number of games
you have played increases. The formula is simply

**ΔR = K(S - X)**

**S = W.n ÷ N** is the game score, where **n** is the number of
players, **N** the number of winners, and **W** is 1 if you won or
shared in the draw and 0 if you didn't.
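As a minimal sketch of the two formulas so far, the Python below computes the game score and the resulting rating change. The function names `game_score` and `rating_change` are made up for illustration, and the expected result and K factor are simply passed in as numbers here.

```python
def game_score(n_players: int, n_winners: int, shared_in_result: bool) -> float:
    """S = W.n / N: the n points on offer are split among the N winners."""
    return n_players / n_winners if shared_in_result else 0.0


def rating_change(k: float, score: float, expected: float) -> float:
    """Delta R = K(S - X)."""
    return k * (score - expected)


# A seven-player game: a solo is worth 7, a share of a three-way draw 7/3.
print(game_score(7, 1, True), game_score(7, 3, True), game_score(7, 3, False))
```

Since the scores in any one game always add up to n, a solo in a seven-player game is worth exactly three times a share of a three-way draw.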

**X** is your expected result, which depends on your rating compared
to your opponents'. **X = n.e^[c.R] ÷ Σ_{j}(e^[c.R_{j}])** where c
= 0.002 and the R_{j} are the ratings of all n players in the game,
your own included.
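The expected result can be sketched in a few lines of Python. This assumes the sum in the denominator runs over every player in the game, yourself included, which is what makes X average 1; `expected_scores` is a hypothetical name, not part of any Pouch code.

```python
import math

C = 0.002  # the constant c from the formula


def expected_scores(ratings, c=C):
    """Return X for every player: n * e^(c.R) / sum_j e^(c.R_j)."""
    weights = [math.exp(c * r) for r in ratings]
    total = sum(weights)
    n = len(ratings)
    return [n * w / total for w in weights]


# Seven players of mixed strength; the expected results always sum to n.
field = [1300, 1000, 800, 1400, 900, 1100, 1200]
for rating, x in zip(field, expected_scores(field)):
    print(rating, round(x, 2))
```

Under this formula the ratio of two players' expected results depends only on the gap between their ratings: a 200 point lead is worth a factor of e^0.4, roughly 1.49, whatever the absolute ratings are.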

**K** is a rating change factor. It is a measure of
how much more accurate your rating has become as a result of this
game's information. Hence it depends on the number of games you have
played before, the press settings, and how many of your opponents were
provisionally rated. K is calculated by the formula
**K = max(50s/(g+5), s)** where **g** is the number of games
played; **s** is given by **max(f÷3, p.f)** where p is the
fraction of provisionally rated opponents and **f** is a press factor
taking the value of 20 for partial press, 15 for broadcast only, and
10 for no-press. For real time (RT) judge games, f is reduced by 4.
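Here is a rough sketch of the K calculation, following the formula exactly as written above. The function names and the press labels `"partial"`, `"broadcast"` and `"none"` are invented for the example.

```python
def press_factor(press: str, real_time: bool = False) -> float:
    """f: 20 for partial press, 15 for broadcast only, 10 for no-press; 4 less for RT games."""
    base = {"partial": 20.0, "broadcast": 15.0, "none": 10.0}[press]
    return base - 4.0 if real_time else base


def k_factor(games_played: int, f: float, p: float) -> float:
    """K = max(50s/(g+5), s) with s = max(f/3, p.f)."""
    s = max(f / 3.0, p * f)
    return max(50.0 * s / (games_played + 5), s)


# With s fixed, K starts at ten times s and shrinks until it bottoms out at s.
for g in (0, 5, 20, 45, 100):
    print(g, k_factor(g, press_factor("partial"), 1.0))
```

The first term of the outer max dominates for a newcomer and falls away as g grows; from g = 45 onward K simply equals s.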

The system will be seeded by an iterative method. All players are estimated to be worth 1000 rating points and HOF results are put through the above formulae to generate ratings. The output ratings are then used as new estimates and the results fed through the system again. This is repeated until the variation in output ratings is small.
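One way to picture the seeding loop is the sketch below. It is only an illustration under several assumptions of my own: a single fixed K of 20 instead of the full K formula, a result format of (players, winners) pairs, a stopping tolerance of one rating point, and a cap on the number of passes. None of these details come from the actual seeding code.

```python
import math

C = 0.002       # constant c from the X formula
K = 20.0        # assumption: one fixed K for every player and game
START = 1000.0  # everyone's initial estimate


def replay(results, estimates):
    """Feed every result through the formulae once, in order."""
    ratings = dict(estimates)
    for players, winners in results:
        # Expected results use the ratings held before this game.
        weights = {p: math.exp(C * ratings[p]) for p in players}
        total = sum(weights.values())
        n = len(players)
        for p in players:
            score = n / len(winners) if p in winners else 0.0
            expected = n * weights[p] / total
            ratings[p] += K * (score - expected)
    return ratings


def seed(results, tolerance=1.0, max_passes=100):
    """Repeat the replay until no rating moves by more than the tolerance."""
    names = {p for players, _ in results for p in players}
    estimates = {p: START for p in names}
    for _ in range(max_passes):
        new = replay(results, estimates)
        change = max(abs(new[p] - estimates[p]) for p in names)
        estimates = new
        if change < tolerance:
            break
    return estimates


toy_results = [
    (("a", "b", "c"), {"a"}),
    (("a", "b", "c"), {"b", "c"}),
    (("a", "b", "c"), {"a", "b", "c"}),
]
print(seed(toy_results))
```

With a single K for everyone, each game only moves points between its players, so the total stays at 1000 per player throughout.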

Newbies start with a 1000 point rating, which will vary as per the above formulae. For the first seven games their ratings will be considered provisional and will have less effect on the changes in fellow players' ratings via the K factor.

If more than one player plays a nation, the nation's rating is
assumed to be the time weighted average of the players concerned.
(Time being measured by the number of movement seasons each was at
the helm.)
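A time weighted average of this kind might be computed as follows. The helper name `nation_rating` and the (rating, seasons) pair format are hypothetical.

```python
def nation_rating(stints):
    """Average the players' ratings, weighted by movement seasons at the helm.

    stints: list of (rating, movement_seasons) pairs, one per player.
    """
    total_seasons = sum(seasons for _, seasons in stints)
    return sum(rating * seasons for rating, seasons in stints) / total_seasons


# A 1200-rated player for six seasons, then a 900-rated one for two.
print(nation_rating([(1200, 6), (900, 2)]))  # 1125.0
```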

An abandoning player's rating will change by: **ΔR = min(0, t.ΔR÷(t+T))**
where t is the number of seasons you played, T the number you
missed, and the ΔR on the right is the change the nation's final
result would otherwise earn. It is not the place of an ability rating
system to hurt the undedicated, however annoying they are, but I don't
think they should benefit either. It has proven very difficult to find
a fair formula for replacements, so their rating is unaffected by such
games. Like old age this is not ideal, just better than the alternative.
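A small sketch of the abandonment rule follows. It assumes the quantity being scaled on the right-hand side is the rating change the nation's final result would otherwise have earned; that reading, and the name `abandonment_change`, are my own interpretation for the example.

```python
def abandonment_change(full_change: float, seasons_played: int, seasons_missed: int) -> float:
    """min(0, t * change / (t + T)): a share of any loss, but never a gain."""
    share = seasons_played * full_change / (seasons_played + seasons_missed)
    return min(0.0, share)


# Six seasons played, four missed: 60% of a 30 point loss, none of a 25 point gain.
print(abandonment_change(-30.0, 6, 4))  # -18.0
print(abandonment_change(25.0, 6, 4))   # 0.0
```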

A group of seven established dippers play three games, one right after the other. Note that the players take into each game the rating they hold as a result of the last game. Here are the results:

Name | Initial Rating | After ABC draw | After D solo | After ABCD draw |
---|---|---|---|---|
Another Stabber | 1300 | 1319 | 1290 | 1299 |
Bobby Bull | 1000 | 1032 | 1015 | 1035 |
Cannon Fodder | 800 | 837 | 826 | 850 |
Dave Decent | 1400 | 1366 | 1475 | 1471 |
Elaine Egotist | 900 | 888 | 875 | 864 |
Fluent Liar | 1100 | 1082 | 1064 | 1047 |
Gil Gullible | 1200 | 1177 | 1156 | 1135 |

Note how the ratings of A, B, and C, who all achieved the same
results, converge, and similarly for E, F and G. Against this
opposition, D really needs to win, which still has a healthy effect on
his rating, but the four-way draw actually leads to a rating *decline*
for D because he should have been able to achieve better.
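For anyone who wants to replay the three games, here is a minimal sketch. It assumes a constant K of 20 for every player, which the article does not state but which fits established players in partial press games, and it carries unrounded ratings from one game to the next. The names are shortened to initials.

```python
import math

C = 0.002  # constant c from the X formula
K = 20.0   # assumption: the same K for all seven established players


def play_game(ratings, winners):
    """Apply Delta R = K(S - X) to every player for one finished game."""
    n = len(ratings)
    weights = {name: math.exp(C * r) for name, r in ratings.items()}
    total = sum(weights.values())
    updated = {}
    for name, r in ratings.items():
        score = n / len(winners) if name in winners else 0.0
        expected = n * weights[name] / total
        updated[name] = r + K * (score - expected)
    return updated


start = {"A": 1300, "B": 1000, "C": 800, "D": 1400, "E": 900, "F": 1100, "G": 1200}
history = [start]
for winners in ("ABC", "D", "ABCD"):
    history.append(play_game(history[-1], set(winners)))

for name in start:
    print(name, [round(h[name]) for h in history])
```

Under that assumption the sketch gives D a gain of about 108 points for the solo and a loss of about 4 for the four-way draw, in line with the effect described above.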

EIDRaS has been developed by George Heintzelman and myself. Thanks to Brahm Dorst for initiating the rec.games.diplomacy newsgroup thread that buoyed us into action and to all the r.g.d contributors for helping to shape the system. Thanks to Manus for agreeing to host EIDRaS at the Pouch and offering to help code it up (ready, Manus?)

[1] Actually the constant is different: we use the natural logarithm, and X averages 1 rather than the 0.5 of chess, so you cannot use this to compare your chess vs. Diplomacy ability, but the form is essentially identical.

Tony Nichols (anthony.nichols@virgin.net)
