Comprehensions and generator expressions in Python
By John Lekberg on July 16, 2020.
This week's post is about using comprehensions and generator expressions in Python. You will learn:
- How to use comprehensions to create lists, sets, and dictionaries.
- How to use generator expressions to keep memory usage low.
What are comprehensions?
Comprehensions are a way to create lists, sets, and dictionaries by transforming and filtering other iterables. E.g.
[ (x ** 2) for x in [1, 1, 2, 3, 5] ]
[1, 1, 4, 9, 25]
{ word.casefold(): word for word in ["Hello", "THERE", "geneRAL"] }
{'hello': 'Hello', 'there': 'THERE', 'general': 'geneRAL'}
real_commands = { "sit", "stay", "heel", "wait" }
issued_commands = { "SIT", "Sit", "Bark", "jump", "heel", "BARK" }
{ command for command in issued_commands if command.lower() not in real_commands }
{'BARK', 'Bark', 'jump'}
Here's how I would write the above code without using comprehensions.
- Instead of
[ (x ** 2) for x in [1, 1, 2, 3, 5] ]
[1, 1, 4, 9, 25]
I would write
a = []
for x in [1, 1, 2, 3, 5]:
    a.append(x ** 2)
a
[1, 1, 4, 9, 25]
- Instead of
{ word.casefold(): word for word in ["Hello", "THERE", "geneRAL"] }
{'hello': 'Hello', 'there': 'THERE', 'general': 'geneRAL'}
I would write
d = {}
for word in ["Hello", "THERE", "geneRAL"]:
    d[word.casefold()] = word
d
{'hello': 'Hello', 'there': 'THERE', 'general': 'geneRAL'}
- Instead of
real_commands = { "sit", "stay", "heel", "wait" }
issued_commands = { "SIT", "Sit", "Bark", "jump", "heel", "BARK" }
{ command for command in issued_commands if command.lower() not in real_commands }
{'BARK', 'Bark', 'jump'}
I would write
real_commands = { "sit", "stay", "heel", "wait" }
issued_commands = { "SIT", "Sit", "Bark", "jump", "heel", "BARK" }
s = set()
for command in issued_commands:
    if command.lower() not in real_commands:
        s.add(command)
s
{'BARK', 'Bark', 'jump'}
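One behavioral difference between a comprehension and its hand-written loop is worth knowing: in Python 3, a comprehension's loop variable lives in its own scope, while the variable of a for-statement stays bound after the loop. A small sketch (the variable names are just for illustration):

```python
# Comprehension: its x is private, so the outer x is left untouched.
x = "outer"
a = [x ** 2 for x in [1, 1, 2, 3, 5]]
x_after_comprehension = x   # still "outer"

# for-statement: the loop rebinds the outer x on every iteration.
x = "outer"
b = []
for x in [1, 1, 2, 3, 5]:
    b.append(x ** 2)
x_after_loop = x            # now 5, the last element
```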
There are three types of comprehensions:
- List comprehensions (PEP 202) create lists and look like
[ exp for-in ... ]
E.g.
[ x ** 2 for x in range(-4, 5) ]
[16, 9, 4, 1, 0, 1, 4, 9, 16]
- Set comprehensions create sets and look like
{ exp for-in ... }
E.g.
{ x ** 2 for x in range(-4, 5) }
{0, 1, 4, 9, 16}
- Dictionary comprehensions (PEP 274) create dicts and look like
{ exp: exp for-in ... }
E.g.
{ x: x ** 2 for x in range(-4, 5) }
{-4: 16, -3: 9, -2: 4, -1: 1, 0: 0, 1: 1, 2: 4, 3: 9, 4: 16}
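Two details about the brace-based forms are easy to trip over: an empty pair of braces is a dict, not a set, and a dict comprehension that produces the same key more than once keeps the last value it saw. A quick sketch:

```python
# {} is an empty dict; an empty set has to be written set().
empty_braces = {}
empty_set = set()

# Set comprehension: duplicate squares collapse into one element.
squares_set = {x ** 2 for x in range(-4, 5)}

# Dict comprehension: -4 and 4 both map to key 16, and the later pair wins.
root_by_square = {x ** 2: x for x in range(-4, 5)}
# root_by_square == {16: 4, 9: 3, 4: 2, 1: 1, 0: 0}
```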
Using multiple for-loops and if-statements in a comprehension
Comprehensions can have multiple for-loops and if-statements. E.g., here is a list comprehension that generates all Pythagorean triples (a, b, c) with a < b < c < 100:
domain = range(1, 100)
[ (a, b, c)
  for a in domain
  for b in domain
  for c in domain
  if a < b
  if b < c
  if a ** 2 + b ** 2 == c ** 2 ]
[(3, 4, 5),
(5, 12, 13),
(6, 8, 10),
...
(57, 76, 95),
(60, 63, 87),
(65, 72, 97)]
Without using a list comprehension, I would write this as:
domain = range(1, 100)
triples = []
for a in domain:
    for b in domain:
        for c in domain:
            if a < b:
                if b < c:
                    if a ** 2 + b ** 2 == c ** 2:
                        triples.append((a, b, c))
triples
[(3, 4, 5),
(5, 12, 13),
(6, 8, 10),
...
(57, 76, 95),
(60, 63, 87),
(65, 72, 97)]
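The general rule: the clauses of a comprehension map onto nested statements in the same left-to-right order, with the first for clause as the outermost loop. A smaller sketch that makes the correspondence easy to check by hand:

```python
# Comprehension: clauses read left to right, outermost first.
pairs = [(i, j) for i in range(3) for j in range(3) if i < j]

# The same logic as nested statements.
nested = []
for i in range(3):          # first for clause -> outermost loop
    for j in range(3):      # second for clause -> nested inside it
        if i < j:           # if clause -> innermost test
            nested.append((i, j))

# Both give [(0, 1), (0, 2), (1, 2)]
```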
The for-loops and if-statements can be interleaved. An if-clause placed right after the for-loop whose variable it tests filters out candidates before the inner loops run, which can lead to speedups:
- Compare 26.6 seconds
import cProfile
domain = range(1, 500)
cProfile.run("""
[ (a, b, c)
  for a in domain
  for b in domain
  for c in domain
  if a < b
  if b < c
  if a ** 2 + b ** 2 == c ** 2 ]
""")
4 function calls in 26.598 seconds ...
- ... to 23.0 seconds.
import cProfile
domain = range(1, 500)
cProfile.run("""
[ (a, b, c)
  for a in domain
  for b in domain
  if a < b
  for c in domain
  if b < c
  if a ** 2 + b ** 2 == c ** 2 ]
""")
4 function calls in 23.055 seconds ...
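Why the reordering helps: an if clause only prunes the loops to its right. With every filter at the end, the innermost loop runs for every (a, b, c); with if a < b moved before for c, the c-loop is skipped for roughly half of the (a, b) pairs. This sketch counts innermost iterations on a small domain (my choice, so it runs instantly) instead of timing them; the helper names are hypothetical:

```python
domain = range(1, 20)  # 19 values, small enough to run instantly

# Filters at the end: the c-loop runs for every (a, b) pair.
late = sum(1 for a in domain for b in domain for c in domain)

# "if a < b" before the c-loop: the c-loop runs only when a < b.
early = sum(1 for a in domain for b in domain if a < b for c in domain)

# late == 6859 (19 ** 3), early == 3249 (171 pairs with a < b, times 19)


def triples_late(domain):
    return [(a, b, c) for a in domain for b in domain for c in domain
            if a < b if b < c if a ** 2 + b ** 2 == c ** 2]


def triples_early(domain):
    return [(a, b, c) for a in domain for b in domain if a < b
            for c in domain if b < c if a ** 2 + b ** 2 == c ** 2]

# Both orderings produce the same triples; only the amount of work differs.
```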
What are generator expressions?
Generator expressions (PEP 289) are a way to create generators (a kind of iterator) by transforming and filtering other iterables. Think of generator expressions as "iterator comprehensions". E.g. Here's a generator expression of the first few square numbers:
( x ** 2 for x in range(10) )
<generator object <genexpr> at 0x10511cc80>
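Because a generator is an iterator, it computes values on demand, one at a time, and it can be consumed only once. A small sketch using the built-in next:

```python
squares = (x ** 2 for x in range(10))

first = next(squares)    # 0: nothing was computed until we asked
second = next(squares)   # 1
rest = list(squares)     # [4, 9, 16, 25, 36, 49, 64, 81]: the remaining values
again = list(squares)    # []: the generator is already exhausted
```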
Generator expressions have a similar syntax to comprehensions. They look like
( exp for-in ... )
To access the data in the generator, I need to consume the iterator using, e.g., list:
list(( x ** 2 for x in range(10) ))
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
NOTE: When a generator expression is the only argument to a function, the parentheses do not need to be written:
list(( x ** 2 for x in range(10) ))
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
list(x ** 2 for x in range(10))
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
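The parentheses can be dropped only when the generator expression is the sole argument. If the call has any other argument, including a keyword argument, the generator expression needs its own parentheses. A sketch (the data list is just for illustration):

```python
data = [3, 1, 2]

# Sole argument: the extra parentheses can be omitted.
total = sum(x * 10 for x in data)

# With another argument, the generator expression must be parenthesized.
ordered = sorted((x * 10 for x in data), reverse=True)

# Omitting them is a SyntaxError; compile() checks this without crashing.
try:
    compile("sorted(x * 10 for x in data, reverse=True)", "<example>", "eval")
    unparenthesized_ok = True
except SyntaxError:
    unparenthesized_ok = False
```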
What's the use of generator expressions?
- I can generate "comprehensions" for collections like tuple and frozenset:
tuple(x ** 2 for x in range(10))
(0, 1, 4, 9, 16, 25, 36, 49, 64, 81)
frozenset(x ** 2 for x in range(10))
frozenset({0, 1, 4, 9, 16, 25, 36, 49, 64, 81})
- Generator expressions produce their elements one at a time instead of loading them all into memory, which makes it easier to work with large amounts of data. E.g., here's a function that reports an object's memory usage (as measured by sys.getsizeof):
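import bisect
import sys

def report_memory(exp):
    # Evaluate the expression string (names like domain come from the module's globals).
    x = eval(exp)
    size_bytes = sys.getsizeof(x)
    # Thresholds for KiB and MiB; bisect picks the largest unit the size reaches.
    coefficients = [1024 ** n for n in [1, 2]]
    units = ["B", "KiB", "MiB"]
    idx = bisect.bisect(coefficients, size_bytes)
    coef = [1, *coefficients][idx]
    unit = units[idx]
    print(f"{exp!r} takes {size_bytes//coef} {unit}")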
Compare 819 mebibytes:
domain = range(100_000_000)
report_memory('[ x ** 2 for x in domain ]')
'[ x ** 2 for x in domain ]' takes 819 MiB
... to 112 bytes:
report_memory('( x ** 2 for x in domain )')
'( x ** 2 for x in domain )' takes 112 B
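The same comparison at a smaller scale, without eval. One caveat: sys.getsizeof measures only the object itself, so the list's figure counts its element pointers but not the integer objects they refer to. The helper name make_squares is hypothetical:

```python
import sys

def make_squares(n):
    """Return a generator expression of the first n squares."""
    return (x ** 2 for x in range(n))

n = 1_000_000  # smaller than the post's example so it runs quickly

list_bytes = sys.getsizeof([x ** 2 for x in range(n)])  # grows with n
gen_bytes = sys.getsizeof(make_squares(n))              # generator's paused state only

# The generator's size does not depend on how many items it will yield.
small_gen_bytes = sys.getsizeof(make_squares(10))
```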
Because generator expressions can keep memory usage lower than comprehensions, I like to pass them to functions that consume an iterable, e.g. sum, min, max, any, and all.
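A sketch of generator expressions feeding functions that consume an iterable. Besides saving memory, some of these functions stop early: any returns as soon as it sees a true value, so the remaining elements are never computed. The is_negative helper is hypothetical and records each value it tests:

```python
from collections import Counter

# Constructors and reducers that accept any iterable.
shout = "".join(c.upper() for c in "hello")        # 'HELLO'
remainders = Counter(x % 3 for x in range(10))    # counts: 0 -> 4, 1 -> 3, 2 -> 3

values = [3, -1, 4, 1, 5, 9]
total = sum(x ** 2 for x in values)               # 133

checked = []

def is_negative(x):
    checked.append(x)  # record every value that actually gets tested
    return x < 0

# any() stops at -1, so 4, 1, 5, and 9 are never tested.
found = any(is_negative(x) for x in values)
```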
In conclusion...
In this week's post, you learned about comprehensions and generator expressions, which are a concise way to create lists (and other containers) by transforming and subsetting iterables. The comprehension "notation" is similar to set-builder notation in mathematics.
My challenge to you:
I have a 3-byte passcode with this SHA-256 checksum:
6d6125cc4538aaec9dbef490ab1091a6cb4af5348f96a5cb0bfeeeda6edfebbe
Use a comprehension or generator expression to figure out which 3-byte passcode produces this checksum.
You can generate the checksum using hashlib.sha256:
from hashlib import sha256
sha256(b"Hello World").hexdigest()
'a591a6d40bf420404a011733cfb7b190d62c65bf0bcda32b57b277d9ad9f146e'
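One possible approach, sketched on a toy 2-byte target so it doesn't give the answer away: a generator expression enumerates every candidate byte string, and next stops at the first match. The name find_preimage is hypothetical, and itertools.product is my substitute for nested for clauses so the length can vary:

```python
from hashlib import sha256
from itertools import product

def find_preimage(target_hex, length):
    """Brute-force every byte string of the given length (a sketch)."""
    # Lazily build each candidate, e.g. b'\x00\x00', b'\x00\x01', ...
    candidates = (bytes(combo) for combo in product(range(256), repeat=length))
    # Return the first candidate whose digest matches, or None if none does.
    return next(
        (c for c in candidates if sha256(c).hexdigest() == target_hex),
        None,
    )

# Toy target: the hash of b"hi", not the challenge checksum.
toy_target = sha256(b"hi").hexdigest()
# find_preimage(toy_target, 2) -> b'hi'
```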
If you enjoyed this week's post, share it with your friends and stay tuned for next week's post. See you then!
(If you spot any errors or typos on this post, contact me via my contact page.)