Return to Blog

Building a command line tool to find playable words in Scrabble

By John Lekberg on July 25, 2020.


I enjoy playing Scrabble with my family. And I know I could win more often if I only knew more words. So, I built a word-search tool.

In this week's post, I will show you how to build a command line to that finds playable words in Scrabble. You will learn:

Script source code

find-words

#!/usr/bin/env python3

from collections import Counter
from string import ascii_lowercase
import heapq
import re

valid_pattern_re = re.compile("[a-z?]+")
valid_letters_re = re.compile("[a-z?]+")
letter_re = re.compile("[a-z]")

_score_pairs = [
    [1, "eaionrtlsu"],
    [2, "dg"],
    [3, "bcmp"],
    [4, "fhvwy"],
    [5, "k"],
    [8, "jx"],
    [10, "qz"],
]

letter_score = {
    letter: points
    for points, letters in _score_pairs
    for letter in letters
}


def MAIN():
    import argparse

    parser = argparse.ArgumentParser()
    parser.add_argument(
        "word_list", help="path of word list."
    )
    parser.add_argument(
        "pattern",
        help="word patterns to match. Wildcard is '?'.",
        nargs="+",
    )
    parser.add_argument(
        "--letters",
        help="available letters (default None). Blank is '?'.",
    )
    parser.add_argument(
        "--top",
        help="how many matches to display (default 5).",
        type=int,
        default=5,
        metavar="N",
    )
    args = parser.parse_args()

    word_list = list_words(args.word_list)
    for pattern in args.pattern:
        words = find_words(
            pattern=pattern,
            letters=args.letters,
            word_list=word_list,
        )
        top_ws = heapq.nlargest(
            args.top,
            ((word_score(word), word) for word in words),
        )
        #
        header = f"* {pattern!r}"
        if args.letters:
            header += (
                f" using {''.join(sorted(args.letters))!r}"
            )
        if len(words) == 0:
            header += " (no matches)"
        elif len(words) > args.top:
            header += f" (top {args.top}/{len(words)})"
        print(header)
        #
        for score, word in top_ws:
            print(f"{score:2} {word}")


def find_words(*, pattern, letters, word_list):
    """Find matching, buildable words in a word list.

    pattern -- str. Pattern of letters and wildcards ("?").
        E.g. "h?ppy".
    letters -- str. Available letters and blanks ("?").
        E.g. "hhcdi"
    word_list -- list[str]. All possible words.

    For each word in the word list:

    - A word is matching if it matches the pattern.
    - If no letters are supplied, a word is always
      buildable.
    - If letters are supplied, a word is buildable if it can
      be built from
        - The letters in the pattern.
        - The letters supplied.
        - The available blanks.
    """
    assert valid_pattern_re.fullmatch(pattern)
    assert letters is None or valid_letters_re.fullmatch(
        letters
    )

    pat_re = re.compile(pattern.replace("?", "."))

    if letters is None:

        def buildable(word):
            """Determine if a word is buildable."""
            return True

    else:
        avail_letters = Counter(
            letter_re.findall(letters + pattern)
        )
        avail_blanks = letters.count("?")

        def buildable(word):
            """Determine if a world is buildable."""
            req_letters = Counter(word)
            deficit = req_letters - avail_letters
            req_blanks = sum(deficit.values())
            return req_blanks <= avail_blanks

    words = [
        word
        for word in word_list
        if pat_re.fullmatch(word)
        if buildable(word)
    ]
    return words


def list_words(path):
    """Open a word list and return a list of the words."""
    with open(path) as file:
        return file.read().split()


def word_score(word):
    """Compute the score of the word."""
    return sum(letter_score[letter] for letter in word)


if __name__ == "__main__":
    MAIN()
$ find-words --help
usage: find-words [-h] [--letters LETTERS] [--top N]
                  word_list pattern [pattern ...]

positional arguments:
  word_list          path of word list.
  pattern            word patterns to match. Wildcard is '?'.

optional arguments:
  -h, --help         show this help message and exit
  --letters LETTERS  available letters (default None). Blank is '?'.
  --top N            how many matches to display (default 5).

What is a "word list"?

I use the phrase "word list" to refer to a text file that is a list of valid words. E.g.

words_alpha.txt

a
aa
aaa
aah
aahed
[... 370,093 lines omitted ...]
zwinglianism
zwinglianist
zwitter
zwitterion
zwitterionic

You can build a word list yourself.

You can also find word lists online. E.g.

Using the script to help me play Scrabble

Here are some example scenarios:


I have the letters u, e, e, r, s, v, i. Can I build any 7-letter words?

$ find-words words_alpha.txt '???????' \
      --letters 'ueersvi'
* '???????' using 'eeirsuv' (no matches)

No matches. Can I build any 5- or 6-letter words?

$ find-words words_alpha.txt '??????' '?????' \
      --letters 'ueersvi'
* '??????' using 'eeirsuv'
 9 siever
 9 sevier
 9 revues
 9 revise
 9 reives
* '?????' using 'eeirsuv' (top 5/21)
 8 virus
 8 vires
 8 viers
 8 verus
 8 verse

Yes, I can. Now I know that I can play "virus", or "revise".


I have the letters f, b, o, g, w, a, j. I want to build upwards from the horizontal word car.

Can I build any 4 letter words that end in c, a, or r?

$ find-words words_alpha.txt '???'{c,a,r} \
      --letters 'fbogwaj'
* '???c' using 'abfgjow' (no matches)
* '???a' using 'abfgjow' (top 5/6)
13 baja
12 jaga
 9 faba
 7 boga
 7 baga
* '???r' using 'abfgjow'
 6 boar
 5 goar

Yes, I can play the word "boar".


NOTE: The syntax '???'{c,a,r} is Z Shell Brace Expansion. Here's what it does:

$ echo '???'{c,a,r}
???c ???a ???r

This makes it easy to describe multiple word patterns. E.g. I want to find all 2- or 3-letter words ending in 'x':

$ find-words words_alpha.txt {'?','??'}x 
* '?x' (top 5/10)
16 xx
11 bx
10 dx
 9 ux
 9 tx
* '??x' (top 5/57)
24 xxx
19 zax
17 xix
17 lxx
15 pyx

Other shells also have brace expansion:


I have the letters l, w, d, o, e, u, o. Can I build a word around the letter x?

$ find-words words_alpha.txt \
      {,'?','??','???'}'x'{,'?','??','???'} \
      --letters 'lwdoeuo'
* 'x' using 'deloouw'
 8 x
* 'x?' using 'deloouw'
12 xw
10 xd
 9 xu
* 'x??' using 'deloouw'
11 xed
* 'x???' using 'deloouw' (no matches)
* '?x' using 'deloouw'
10 dx
 9 ux
 9 ox
 9 lx
 9 ex
* '?x?' using 'deloouw' (no matches)
* '?x??' using 'deloouw'
11 exul
* '?x???' using 'deloouw' (no matches)
* '??x' using 'deloouw'
11 dux
11 dex
10 lux
10 lox
10 lex
* '??x?' using 'deloouw'
11 luxe
* '??x??' using 'deloouw'
13 loxed
* '??x???' using 'deloouw' (no matches)
* '???x' using 'deloouw'
12 doux
12 deux
11 ulex
* '???x?' using 'deloouw' (no matches)
* '???x??' using 'deloouw' (no matches)
* '???x???' using 'deloouw' (no matches)

I can play the word "loxed".

How the script works

The function list_words uses io.TextIOBase.read and str.split to turn a word list file into a list of strings.

I turn supplied patterns into a regular expression (regexp) by transforming "?" in the regexp wildcard character ".".

If letters are supplied, I have to determine if a given word is buildable from the pattern and the letters on hand.

I use a list comprehension to do a linear search to find all matching words.

Once I have a list of matching words, I use heapq.nlargest to find the top 5 words by score.

In conclusion...

In this week's post you learned how to build a command line tool that finds playable words in Scrabble. You used list comprehensions to search through a database of words to find all the words that match a pattern. Then you used heapq.nlargest to find the top scoring words. You also used collections.Counter as a multiset to determine if it's possible to build a word, given the letters that you have on hand.

Here's my challenge to you:

Patterns are specified like ???x (a four letter word that ends in "x"). It's awkward to have to type so many ? characters. So, your goal is to implement a run-length encoding feature for patterns. This means that a pattern like

3?x2?

Is turned into

???x??

before further processing.

If you enjoyed this week's post, share it with your friends and stay tuned for next week's post. See you then!


(If you spot any errors or typos on this post, contact me via my contact page.)