Building a command line tool to compute Elo ratings
By John Lekberg on May 01, 2020.
This week's post will cover building a command line tool that computes Elo ratings. You will learn:
- How to calculate Elo ratings in Python.
- How to use casefolding to make case-insensitive comparisons.
- How to have argparse validate command line options. (E.g. checking that a number is positive.)
The Elo rating system calculates the relative skill levels of players in zero-sum games.
Script source code
elo
#!/usr/bin/env python3

import collections
import enum
import sys

Result = enum.Enum("Result", ["WinA", "WinB", "Draw"])
Result.__doc__ = """
The result of a match: A wins, B wins, or they draw.
"""

Player = collections.namedtuple("Player", ["A", "B"])
Player.__doc__ = """
Hold related data for players A and B.
This is just for convenience.
"""


def update_scores(*, ratings, results, K):
    """Update players' Elo ratings based on an iterable of
    results.

    ratings -- (Player) initial Elo ratings.
        E.g. Player(A=1000, B=1000)
    results -- (iterable[Result]) the match results.
        E.g. [Result.WinA, Result.Draw, Result.Draw]
    K -- The K-factor - the maximum possible adjustment per
        game. E.g. 24.

    For more information, see

    > https://en.wikipedia.org/wiki/Elo_rating_system
    """
    R = ratings
    for result in results:
        Q = Player(A=10 ** (R.A / 400), B=10 ** (R.B / 400))
        E = Player(A=Q.A / (Q.A + Q.B), B=Q.B / (Q.A + Q.B))
        if result is Result.WinA:
            S = Player(A=1, B=0)
        elif result is Result.WinB:
            S = Player(A=0, B=1)
        elif result is Result.Draw:
            S = Player(A=0.5, B=0.5)
        R = Player(
            A=R.A + K * (S.A - E.A), B=R.B + K * (S.B - E.B)
        )
    R = Player(A=round(R.A), B=round(R.B))
    return R


def parse_results(lines):
    """Yield Result objects from an iterable of lines.

    lines -- (iterable[str]) an iterable of lines. E.g.
        a file object.

    The translations are

        *line*    *result*
        "A"       Result.WinA
        "B"       Result.WinB
        "DRAW"    Result.Draw

    NOTE: Before translating, lines are casefolded and have
    whitespace stripped.
    """
    WinA = "A".casefold()
    WinB = "B".casefold()
    Draw = "DRAW".casefold()
    for line in lines:
        line = line.strip().casefold()
        if line == WinA:
            yield Result.WinA
        elif line == WinB:
            yield Result.WinB
        elif line == Draw:
            yield Result.Draw


def positive_int(x):
    """Parse an int. Assert that it is positive."""
    i = int(x)
    assert i > 0
    return i


def positive_float(x):
    """Parse a float. Assert that it is positive."""
    f = float(x)
    assert f > 0
    return f


if __name__ == "__main__":
    import argparse

    parser = argparse.ArgumentParser()
    parser.add_argument(
        "A", type=positive_int, help="Player A's Elo rating"
    )
    parser.add_argument(
        "B", type=positive_int, help="Player B's Elo rating"
    )
    parser.add_argument(
        "--k",
        type=positive_float,
        default=24,
        help="the K-factor (default 24)",
    )
    args = parser.parse_args()

    ratings_old = Player(A=args.A, B=args.B)
    ratings_new = update_scores(
        results=parse_results(sys.stdin),
        ratings=ratings_old,
        K=args.k,
    )

    ro, rn = ratings_old, ratings_new
    print(f"Original Ratings: [A={ro.A}] [B={ro.B}]")
    print(f"Updated Ratings: [A={rn.A}] [B={rn.B}]")
$ ./elo --help
usage: elo [-h] [--k K] A B

positional arguments:
  A           Player A's Elo rating
  B           Player B's Elo rating

optional arguments:
  -h, --help  show this help message and exit
  --k K       the K-factor (default 24)
Using the script to rank me and my friend in Mario Kart
My friend Nicole and I like to play Mario Kart together. We play a competitive game mode and I keep track of who wins. Here's that data file, listing who won on which date:
data.txt
2020-04-13,john
2020-04-14,john
2020-04-15,nicole
2020-04-16,nicole
2020-04-17,nicole
2020-04-18,john
2020-04-19,john
2020-04-20,nicole
2020-04-21,nicole
2020-04-22,john
2020-04-23,nicole
2020-04-24,john
2020-04-25,john
2020-04-26,draw
2020-04-27,nicole
To make this parseable by elo, I use cut and sed:
$ cut -f2 -d, data.txt | sed 's/john/A/;s/nicole/B/'
A
A
B
B
B
A
A
B
B
A
B
A
A
draw
B
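For readers without cut and sed at hand, the same preprocessing can be sketched in Python. This is a minimal sketch and not part of the elo script: the function name to_elo_input and its names mapping are assumptions made for illustration. Unlike the sed command, it matches whole names rather than substrings.

```python
import csv


def to_elo_input(lines, names):
    """Yield "A", "B", or "draw" for each "date,winner" line.

    names -- hypothetical mapping from a winner's name to a player
        letter, e.g. {"john": "A", "nicole": "B"}.
    """
    for row in csv.reader(lines):
        if len(row) < 2:
            continue  # skip blank or malformed lines
        winner = row[1].strip()
        # Names not in the mapping (such as "draw") pass through unchanged.
        yield names.get(winner, winner)


sample = ["2020-04-25,john", "2020-04-26,draw", "2020-04-27,nicole"]
converted = list(to_elo_input(sample, {"john": "A", "nicole": "B"}))
print(converted)
```

The resulting values are the same tokens that parse_results accepts: "A", "B", and "draw".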
I give Nicole and myself initial skill ratings of 1000. Then I use elo to calculate our current skill ratings.
$ cut -f2 -d, data.txt | sed 's/john/A/;s/nicole/B/' | ./elo 1000 1000
Original Ratings: [A=1000] [B=1000]
Updated Ratings: [A=998] [B=1002]
So, currently Nicole (rating 1002) is a bit more skillful than me (rating 998) at Mario Kart.
How the script works
I use an Enum object to represent the results of a match:
- Player A wins.
- Player B wins.
- It's a draw.
I use a namedtuple to hold the calculations for each Player. Instead of writing code like this:
Q_A = 10 ** (R_A / 400)
Q_B = 10 ** (R_B / 400)
I write this:
Q = Player(A=10 ** (R.A / 400), B=10 ** (R.B / 400))
If I were to use third party libraries, I would use numpy to vectorize the operations. Instead of writing code like this:
R = Player(A=1000, B=1000)
Q = Player(A=10 ** (R.A / 400), B=10 ** (R.B / 400))
E = Player(A=Q.A / (Q.A + Q.B), B=Q.B / (Q.A + Q.B))
I would write
import numpy
R = numpy.array([1000, 1000])
Q = 10 ** (R / 400)
E = Q / Q.sum()
A Result object is turned into a Player variable that holds "scores" this way:
- If Player A wins, the score is Player(A=1, B=0).
- If Player B wins, the score is Player(A=0, B=1).
- If it's a draw, the score is Player(A=0.5, B=0.5).
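The script does this translation with an if/elif chain. As an alternative, the same mapping could be written as a dictionary lookup. This is only a sketch: the name SCORES is an assumption and does not appear in the script.

```python
import collections
import enum

Result = enum.Enum("Result", ["WinA", "WinB", "Draw"])
Player = collections.namedtuple("Player", ["A", "B"])

# Hypothetical lookup table (not in the original script): one score
# pair per match result. Each pair sums to 1.
SCORES = {
    Result.WinA: Player(A=1, B=0),
    Result.WinB: Player(A=0, B=1),
    Result.Draw: Player(A=0.5, B=0.5),
}

S = SCORES[Result.Draw]
print(S)
```

A lookup like this raises KeyError for a result that has no entry, whereas the if/elif chain silently leaves S unchanged.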
I got the scores for a draw from "How are draws calculated in the ELO system" via Chess.com.
When parsing results from input, I use str.casefold to casefold the data. This allows me to do case-insensitive comparisons. As a result, "a" and "A" are treated the same, as are "draw", "Draw", and "DRAW". I use casefolding instead of converting to lower- or upper-case because:
- There are edge cases with using lower-case or upper-case for case-insensitive comparisons. (E.g. "ß".lower() stays "ß", so it never matches "ss".)
- In my opinion, using casefolding clearly communicates the intent to do a case-insensitive comparison.
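Here is a small illustration of one such edge case. The two example words are my own choice and do not come from the script.

```python
# str.lower() leaves "ß" (German sharp s) unchanged, while
# str.casefold() maps it to "ss".
word_a = "STRASSE"
word_b = "straße"

print(word_a.lower() == word_b.lower())        # False
print(word_a.casefold() == word_b.casefold())  # True
```

For the inputs that elo accepts ("A", "B", and "DRAW"), lower-casing and casefolding give the same result, so the choice here is mostly about communicating intent.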
positive_int and positive_float are functions that I created that act like int and float. However, they also check that the parsed number is positive. This is useful because errors from passing negative numbers into the equations are caught immediately as the command line arguments are parsed.
>>> positive_int("10")
10
>>> positive_int("-2")
AssertionError
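One caveat: argparse only turns ArgumentTypeError, TypeError, and ValueError from a type function into a usage error. An AssertionError reaches the user as a traceback, and assert statements are skipped entirely when Python runs with -O. A possible variant that raises argparse.ArgumentTypeError is sketched below. The name positive_int_strict is hypothetical and is not part of the script.

```python
import argparse


def positive_int_strict(x):
    """Parse an int; raise ArgumentTypeError unless it is positive.

    Hypothetical variant of positive_int, for illustration only.
    """
    i = int(x)  # a ValueError here is also reported by argparse
    if i <= 0:
        raise argparse.ArgumentTypeError(
            f"{x!r} is not a positive integer"
        )
    return i


parser = argparse.ArgumentParser(prog="elo")
parser.add_argument(
    "A", type=positive_int_strict, help="Player A's Elo rating"
)

args = parser.parse_args(["1000"])
print(args.A)
```

With this variant, a non-positive rating should make argparse print the usage line and the error message, then exit, instead of showing a traceback.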
I got the algorithm for updating Elo ratings from "Elo rating system" via Wikipedia. Read that document for more information on the mathematical details of the algorithm.
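For reference, the update that update_scores applies for each game can be written as follows, using the same symbols as the script (R for ratings, Q and E for the intermediate values, S for the scores, K for the K-factor). This is transcribed from the script's code, not a full derivation.

```latex
\begin{align*}
Q_A &= 10^{R_A / 400}, & Q_B &= 10^{R_B / 400} \\
E_A &= \frac{Q_A}{Q_A + Q_B}, & E_B &= \frac{Q_B}{Q_A + Q_B} \\
R_A' &= R_A + K\,(S_A - E_A), & R_B' &= R_B + K\,(S_B - E_B)
\end{align*}
```

For example, with R_A = R_B = 1000 and K = 24, both expected scores are 0.5. A win for Player A then gives R_A' = 1000 + 24 * (1 - 0.5) = 1012 and R_B' = 1000 + 24 * (0 - 0.5) = 988.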
In conclusion...
In this week's post you learned how to calculate Elo ratings in Python. Elo ratings are a useful way to compare the relative skills of players in two-player zero-sum games. Calculating skill of players in three-player (or more) games is more complicated. Read these documents to learn more about the work that Microsoft Research and others have done to calculate skill ratings for multiplayer games:
- "TrueSkill Ranking System" via Microsoft Research
- "Computing Your Skill" by Jeff Moser
- "Multiplayer Elo" by Tom Kerrigan
My challenge to you:
Create a new program, skill, that works like elo but calculates skill ratings for a three-player game.
If you enjoyed this week's post, share it with your friends and stay tuned for next week's post. See you then!
(If you spot any errors or typos on this post, contact me via my contact page.)