Building a command line tool to find playable words in Scrabble
By John Lekberg on July 25, 2020.
I enjoy playing Scrabble with my family. And I know I could win more often if I only knew more words. So, I built a word-search tool.
In this week's post, I will show you how to build a command line to that finds playable words in Scrabble. You will learn:
- How to use list comprehensions to search through a dataset of words to find all the words that match a pattern.
- How to use heapq.nlargest to find the top scoring words.
- How to use collections.Counter objects as multisets, to determine if it's possible to build a word, given the letters that you have on hand.
Script source code
find-words
#!/usr/bin/env python3
from collections import Counter
from string import ascii_lowercase
import heapq
import re
valid_pattern_re = re.compile("[a-z?]+")
valid_letters_re = re.compile("[a-z?]+")
letter_re = re.compile("[a-z]")
_score_pairs = [
[1, "eaionrtlsu"],
[2, "dg"],
[3, "bcmp"],
[4, "fhvwy"],
[5, "k"],
[8, "jx"],
[10, "qz"],
]
letter_score = {
letter: points
for points, letters in _score_pairs
for letter in letters
}
def MAIN():
import argparse
parser = argparse.ArgumentParser()
parser.add_argument(
"word_list", help="path of word list."
)
parser.add_argument(
"pattern",
help="word patterns to match. Wildcard is '?'.",
nargs="+",
)
parser.add_argument(
"--letters",
help="available letters (default None). Blank is '?'.",
)
parser.add_argument(
"--top",
help="how many matches to display (default 5).",
type=int,
default=5,
metavar="N",
)
args = parser.parse_args()
word_list = list_words(args.word_list)
for pattern in args.pattern:
words = find_words(
pattern=pattern,
letters=args.letters,
word_list=word_list,
)
top_ws = heapq.nlargest(
args.top,
((word_score(word), word) for word in words),
)
#
header = f"* {pattern!r}"
if args.letters:
header += (
f" using {''.join(sorted(args.letters))!r}"
)
if len(words) == 0:
header += " (no matches)"
elif len(words) > args.top:
header += f" (top {args.top}/{len(words)})"
print(header)
#
for score, word in top_ws:
print(f"{score:2} {word}")
def find_words(*, pattern, letters, word_list):
"""Find matching, buildable words in a word list.
pattern -- str. Pattern of letters and wildcards ("?").
E.g. "h?ppy".
letters -- str. Available letters and blanks ("?").
E.g. "hhcdi"
word_list -- list[str]. All possible words.
For each word in the word list:
- A word is matching if it matches the pattern.
- If no letters are supplied, a word is always
buildable.
- If letters are supplied, a word is buildable if it can
be built from
- The letters in the pattern.
- The letters supplied.
- The available blanks.
"""
assert valid_pattern_re.fullmatch(pattern)
assert letters is None or valid_letters_re.fullmatch(
letters
)
pat_re = re.compile(pattern.replace("?", "."))
if letters is None:
def buildable(word):
"""Determine if a word is buildable."""
return True
else:
avail_letters = Counter(
letter_re.findall(letters + pattern)
)
avail_blanks = letters.count("?")
def buildable(word):
"""Determine if a world is buildable."""
req_letters = Counter(word)
deficit = req_letters - avail_letters
req_blanks = sum(deficit.values())
return req_blanks <= avail_blanks
words = [
word
for word in word_list
if pat_re.fullmatch(word)
if buildable(word)
]
return words
def list_words(path):
"""Open a word list and return a list of the words."""
with open(path) as file:
return file.read().split()
def word_score(word):
"""Compute the score of the word."""
return sum(letter_score[letter] for letter in word)
if __name__ == "__main__":
MAIN()
$ find-words --help
usage: find-words [-h] [--letters LETTERS] [--top N]
word_list pattern [pattern ...]
positional arguments:
word_list path of word list.
pattern word patterns to match. Wildcard is '?'.
optional arguments:
-h, --help show this help message and exit
--letters LETTERS available letters (default None). Blank is '?'.
--top N how many matches to display (default 5).
What is a "word list"?
I use the phrase "word list" to refer to a text file that is a list of valid words. E.g.
words_alpha.txt
a
aa
aaa
aah
aahed
[... 370,093 lines omitted ...]
zwinglianism
zwinglianist
zwitter
zwitterion
zwitterionic
You can build a word list yourself.
You can also find word lists online. E.g.
Using the script to help me play Scrabble
Here are some example scenarios:
I have the letters u, e, e, r, s, v, i. Can I build any 7-letter words?
$ find-words words_alpha.txt '???????' \ --letters 'ueersvi'
* '???????' using 'eeirsuv' (no matches)
No matches. Can I build any 5- or 6-letter words?
$ find-words words_alpha.txt '??????' '?????' \ --letters 'ueersvi'
* '??????' using 'eeirsuv'
9 siever
9 sevier
9 revues
9 revise
9 reives
* '?????' using 'eeirsuv' (top 5/21)
8 virus
8 vires
8 viers
8 verus
8 verse
Yes, I can. Now I know that I can play "virus", or "revise".
I have the letters f, b, o, g, w, a, j. I want to build upwards from the horizontal word car.
Can I build any 4 letter words that end in c, a, or r?
$ find-words words_alpha.txt '???'{c,a,r} \ --letters 'fbogwaj'
* '???c' using 'abfgjow' (no matches)
* '???a' using 'abfgjow' (top 5/6)
13 baja
12 jaga
9 faba
7 boga
7 baga
* '???r' using 'abfgjow'
6 boar
5 goar
Yes, I can play the word "boar".
NOTE: The syntax '???'{c,a,r}
is Z
Shell Brace
Expansion.
Here's what it does:
$ echo '???'{c,a,r}
???c ???a ???r
This makes it easy to describe multiple word patterns. E.g. I want to find all 2- or 3-letter words ending in 'x':
$ find-words words_alpha.txt {'?','??'}x
* '?x' (top 5/10)
16 xx
11 bx
10 dx
9 ux
9 tx
* '??x' (top 5/57)
24 xxx
19 zax
17 xix
17 lxx
15 pyx
Other shells also have brace expansion:
I have the letters l, w, d, o, e, u, o. Can I build a word around the letter x?
$ find-words words_alpha.txt \ {,'?','??','???'}'x'{,'?','??','???'} \ --letters 'lwdoeuo'
* 'x' using 'deloouw'
8 x
* 'x?' using 'deloouw'
12 xw
10 xd
9 xu
* 'x??' using 'deloouw'
11 xed
* 'x???' using 'deloouw' (no matches)
* '?x' using 'deloouw'
10 dx
9 ux
9 ox
9 lx
9 ex
* '?x?' using 'deloouw' (no matches)
* '?x??' using 'deloouw'
11 exul
* '?x???' using 'deloouw' (no matches)
* '??x' using 'deloouw'
11 dux
11 dex
10 lux
10 lox
10 lex
* '??x?' using 'deloouw'
11 luxe
* '??x??' using 'deloouw'
13 loxed
* '??x???' using 'deloouw' (no matches)
* '???x' using 'deloouw'
12 doux
12 deux
11 ulex
* '???x?' using 'deloouw' (no matches)
* '???x??' using 'deloouw' (no matches)
* '???x???' using 'deloouw' (no matches)
I can play the word "loxed".
How the script works
The function list_words
uses
io.TextIOBase.read
and str.split to
turn a word list file into a
list
of
strings.
I turn supplied patterns into a regular expression (regexp) by transforming
"?"
in the regexp wildcard character "."
.
If letters are supplied, I have to determine if a given word is buildable from the pattern and the letters on hand.
- I start by determining the letters required to build a word, using collections.Counter as a multiset.
- Then I subtract the letters from the pattern and subtract the letters on hand.
- If there are any remaining letters, then I can only build the word if I have enough blank tiles to cover the rest of the letters.
I use a list comprehension to do a linear search to find all matching words.
Once I have a list of matching words, I use heapq.nlargest to find the top 5 words by score.
In conclusion...
In this week's post you learned how to build a command line tool that finds playable words in Scrabble. You used list comprehensions to search through a database of words to find all the words that match a pattern. Then you used heapq.nlargest to find the top scoring words. You also used collections.Counter as a multiset to determine if it's possible to build a word, given the letters that you have on hand.
Here's my challenge to you:
Patterns are specified like
???x
(a four letter word that ends in "x"). It's awkward to have to type so many?
characters. So, your goal is to implement a run-length encoding feature for patterns. This means that a pattern like
3?x2?
Is turned into
???x??
before further processing.
If you enjoyed this week's post, share it with your friends and stay tuned for next week's post. See you then!
(If you spot any errors or typos on this post, contact me via my contact page.)