
Building a command line tool to compute Elo ratings

By John Lekberg on May 01, 2020.


This week's post will cover building a command line tool that computes Elo ratings. You will learn:

The Elo rating system calculates the relative skill levels of players in zero-sum games.

Script source code

elo

#!/usr/bin/env python3

import collections
import enum
import sys

Result = enum.Enum("Result", ["WinA", "WinB", "Draw"])
Result.__doc__ = """
The result of a match: A wins, B wins, or they draw.
"""

Player = collections.namedtuple("Player", ["A", "B"])
Player.__doc__ = """
Hold related data for players A and B.
This is just for convenience.
"""


def update_scores(*, ratings, results, K):
    """Update players' Elo ratings based on an iterable of
    results.

    ratings -- (Player) initial Elo ratings.
        E.g. Player(A=1000, B=1000)
    results -- (iterable[Result]) the match results.
        E.g. [Result.WinA, Result.Draw, Result.Draw]
    K -- The K-factor - the maximum possible adjustment per
        game. E.g. 24.

    For more information, see
    > https://en.wikipedia.org/wiki/Elo_rating_system
    """
    R = ratings

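    for result in results:
        # Q: each player's rating on an exponential scale.
        Q = Player(A=10 ** (R.A / 400), B=10 ** (R.B / 400))
        # E: each player's expected score (E.A + E.B == 1).
        E = Player(A=Q.A / (Q.A + Q.B), B=Q.B / (Q.A + Q.B))
        # S: each player's actual score for this match.
        if result is Result.WinA:
            S = Player(A=1, B=0)
        elif result is Result.WinB:
            S = Player(A=0, B=1)
        elif result is Result.Draw:
            S = Player(A=0.5, B=0.5)
        else:
            raise ValueError(f"unknown result: {result!r}")
        # Move each rating toward the actual outcome.
        R = Player(
            A=R.A + K * (S.A - E.A), B=R.B + K * (S.B - E.B)
        )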

    R = Player(A=round(R.A), B=round(R.B))

    return R


def parse_results(lines):
    """Yield Result objects from an iterable of lines.

    lines -- (iterable[str]) an iterable of lines. E.g.
        a file object.

    The translations are

    *line*  *result*
    "A"     Result.WinA
    "B"     Result.WinB
    "DRAW"  Result.Draw

    NOTE: Before translating, lines are casefolded and have
    whitespace stripped. Lines that match none of the above
    (including blank lines) are skipped.
    """
    WinA = "A".casefold()
    WinB = "B".casefold()
    Draw = "DRAW".casefold()
    for line in lines:
        line = line.strip().casefold()
        if line == WinA:
            yield Result.WinA
        elif line == WinB:
            yield Result.WinB
        elif line == Draw:
            yield Result.Draw


def positive_int(x):
    """Parse an int. Assert that it is positive."""
    i = int(x)
    assert i > 0
    return i


def positive_float(x):
    """Parse a float. Assert that it is positive."""
    f = float(x)
    assert f > 0
    return f


if __name__ == "__main__":
    import argparse

    parser = argparse.ArgumentParser()
    parser.add_argument(
        "A", type=positive_int, help="Player A's Elo rating"
    )
    parser.add_argument(
        "B", type=positive_int, help="Player B's Elo rating"
    )
    parser.add_argument(
        "--k",
        type=positive_float,
        default=24,
        help="the K-factor (default 24)",
    )
    args = parser.parse_args()

    ratings_old = Player(A=args.A, B=args.B)
    ratings_new = update_scores(
        results=parse_results(sys.stdin),
        ratings=ratings_old,
        K=args.k,
    )

    ro, rn = ratings_old, ratings_new
    print(f"Original Ratings: [A={ro.A}] [B={ro.B}]")
    print(f"Updated Ratings: [A={rn.A}] [B={rn.B}]")
$ ./elo --help
usage: elo [-h] [--k K] A B

positional arguments:
  A           Player A's Elo rating
  B           Player B's Elo rating

optional arguments:
  -h, --help  show this help message and exit
  --k K       the K-factor (default 24)

Using the script to rank me and my friend in Mario Kart

My friend Nicole and I like to play Mario Kart together. We play a competitive game mode and I keep track of who wins. Here's that data file, listing who won on which date:

data.txt

2020-04-13,john
2020-04-14,john
2020-04-15,nicole
2020-04-16,nicole
2020-04-17,nicole
2020-04-18,john
2020-04-19,john
2020-04-20,nicole
2020-04-21,nicole
2020-04-22,john
2020-04-23,nicole
2020-04-24,john
2020-04-25,john
2020-04-26,draw
2020-04-27,nicole

To make this parseable by elo, I use cut and sed:

$ cut -f2 -d, data.txt |
    sed 's/john/A/;s/nicole/B/'
A
A
B
B
B
A
A
B
B
A
B
A
A
draw
B
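
For readers without cut and sed, here is a rough Python equivalent of this pipeline. It is only a sketch: the names `to_elo_input` and `NAMES` are hypothetical and not part of the elo script. Unlike the sed command, it matches the whole field rather than a substring.

```python
import csv

# Hypothetical mapping from names in data.txt to the labels elo expects.
NAMES = {"john": "A", "nicole": "B"}


def to_elo_input(lines):
    """Yield the second CSV field of each line, renaming known players."""
    for row in csv.reader(lines):
        if len(row) < 2:
            continue  # skip blank or malformed lines
        winner = row[1].strip()
        # Values not in the mapping (e.g. "draw") pass through unchanged.
        yield NAMES.get(winner, winner)


sample = ["2020-04-13,john", "2020-04-15,nicole", "2020-04-26,draw"]
print(list(to_elo_input(sample)))  # ['A', 'B', 'draw']
```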

I give Nicole and myself initial skill ratings of 1000. Then I use elo to calculate our current skill ratings.

$ cut -f2 -d, data.txt |
    sed 's/john/A/;s/nicole/B/' |
    ./elo 1000 1000
Original Ratings: [A=1000] [B=1000]
Updated Ratings: [A=998] [B=1002]

So, currently Nicole (rating 1002) is a bit more skillful than me (rating 998) at Mario Kart.

How the script works

I use an Enum object to represent the results of a match:
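
For reference, this is the definition from the script source above, followed by a short usage sketch (the variable `outcome` is only for illustration):

```python
import enum

# Same definition as in the script: three members, valued 1, 2, 3.
Result = enum.Enum("Result", ["WinA", "WinB", "Draw"])

# Members are singletons, so they can be compared with `is`.
outcome = Result["Draw"]          # look a member up by name
print(outcome is Result.Draw)     # True
print([r.name for r in Result])   # ['WinA', 'WinB', 'Draw']
```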

I use a namedtuple to hold the calculations for each Player. Instead of writing code like this

Q_A = 10 ** (R_A / 400)
Q_B = 10 ** (R_B / 400)

I write this

Q = Player(A=10 ** (R.A / 400), B=10 ** (R.B / 400))

If I were to use third-party libraries, I would use NumPy to vectorize the operations. Instead of writing code like this:

R = Player(A=1000, B=1000)
Q = Player(A=10 ** (R.A / 400), B=10 ** (R.B / 400))
E = Player(A=Q.A / (Q.A + Q.B), B=Q.B / (Q.A + Q.B))

I would write

import numpy

R = numpy.array([1000, 1000])
Q = 10 ** (R / 400)
E = Q / Q.sum()

A Result object is turned into a Player variable that holds "scores" this way:
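
In the script source above, this is the if/elif chain inside update_scores. The same mapping can be sketched as a lookup table. Note that `SCORES` is a hypothetical name and a dict is a swapped-in technique, not what the script does:

```python
import collections
import enum

Result = enum.Enum("Result", ["WinA", "WinB", "Draw"])
Player = collections.namedtuple("Player", ["A", "B"])

# Hypothetical lookup table; the script itself uses an if/elif chain.
# Each entry holds the actual score of players A and B for that result.
SCORES = {
    Result.WinA: Player(A=1, B=0),
    Result.WinB: Player(A=0, B=1),
    Result.Draw: Player(A=0.5, B=0.5),
}

S = SCORES[Result.Draw]
print(S)  # Player(A=0.5, B=0.5)
```

In every case the two scores add up to 1, which is what makes the game zero-sum from the rating system's point of view.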

I got the scores for a draw from "How are draws calculated in the ELO system" via Chess.com.

When parsing results from input, I use str.casefold to casefold the data. This allows me to do case insensitive comparisons. As a result, "a" and "A" are treated the same; "draw", "Draw", and "DRAW" are treated the same; etc. I use casefolding instead of converting to lower- or upper-case because:
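
One difference that is easy to check: str.casefold is more aggressive than str.lower for some non-ASCII characters. A minimal sketch using the German sharp s (this may not be the full reasoning behind the choice):

```python
# German sharp s: lower() leaves it alone, casefold() expands it to "ss".
print("ß".lower())     # ß
print("ß".casefold())  # ss

# So only casefold() treats these two spellings as equal.
print("STRASSE".lower() == "straße".lower())        # False
print("STRASSE".casefold() == "straße".casefold())  # True
```

For the ASCII inputs that elo accepts ("A", "B", "DRAW"), both methods give the same result.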

positive_int and positive_float are functions that I created that act like int and float. However, they also check that the parsed number is positive. This is useful because errors from passing negative numbers into the equations are caught immediately as the command line arguments are parsed.

positive_int("10")
10
positive_int("-2")
AssertionError
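
One caveat: argparse turns ArgumentTypeError, TypeError, and ValueError raised by a type function into a usage error, but an AssertionError is not in that set, so it reaches the user as a traceback. Assert statements are also skipped when Python runs with -O. Here is a sketch of a variant that reports the problem through argparse. The name `positive_int_arg` is hypothetical and this is not the code used in elo:

```python
import argparse


def positive_int_arg(x):
    """Parse a positive int, failing in a way argparse understands."""
    i = int(x)  # argparse also turns a ValueError here into a usage error
    if i <= 0:
        raise argparse.ArgumentTypeError(
            f"{x!r} is not a positive integer"
        )
    return i


parser = argparse.ArgumentParser(prog="elo")
parser.add_argument(
    "A", type=positive_int_arg, help="Player A's Elo rating"
)

print(parser.parse_args(["10"]).A)  # 10
# parser.parse_args(["-2"]) prints a usage error and exits with status 2.
```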

I got the algorithm for updating Elo ratings from "Elo rating system" via Wikipedia. Read that document for more information on the mathematical details of the algorithm.
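
To make the update step concrete, here is a single match worked through in isolation. It follows the same formulas as update_scores, but the helper names `expected_score` and `update` are hypothetical, and it leaves out the final rounding:

```python
def expected_score(r_a, r_b):
    """Expected score of a player rated r_a against one rated r_b."""
    q_a = 10 ** (r_a / 400)
    q_b = 10 ** (r_b / 400)
    return q_a / (q_a + q_b)


def update(r_a, r_b, s_a, k=24):
    """Return both new ratings after one match.

    s_a is player A's actual score: 1 (win), 0.5 (draw), or 0 (loss).
    """
    e_a = expected_score(r_a, r_b)
    # B's actual and expected scores are the complements of A's.
    new_a = r_a + k * (s_a - e_a)
    new_b = r_b + k * ((1 - s_a) - (1 - e_a))
    return new_a, new_b


print(update(1000, 1000, s_a=1))  # (1012.0, 988.0)
```

With equal ratings the expected score is 0.5, so the winner gains K/2 = 12 points and the loser drops by the same amount. With the final ratings from the Mario Kart example, expected_score(1002, 998) is about 0.506, which is why a 4-point gap only means "a bit more skillful".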

In conclusion...

In this week's post you learned how to calculate Elo ratings in Python. Elo ratings are a useful way to compare the relative skills of players in two-player zero-sum games. Calculating the skill of players in games with three or more players is more complicated. Read these documents to learn more about the work that Microsoft Research and others have done to calculate skill ratings for multiplayer games:

My challenge to you:

Create a new program, skill, that works like elo but calculates skill ratings for a three-player game.

If you enjoyed this week's post, share it with your friends and stay tuned for next week's post. See you then!

