Return to Blog

Sets and Frozen Sets in Python

By John Lekberg on August 08, 2020.


This week's post is about using sets and frozen sets in Python. You will learn:

Why should you care about sets?

Sets are useful because membership checks are a lot faster with sets than with lists:

import cProfile
data_list = list(range(10_000_000))
cProfile.run('9_999_999 in data_list')
3 function calls in 0.136 seconds
data_set = set(range(10_000_000))
cProfile.run('9_999_999 in data_set')
3 function calls in 0.000 seconds

However, creating a set is a bit slower than creating a list:

import cProfile
cProfile.run('list(range(10_000_000))')
3 function calls in 0.520 seconds
cProfile.run('set(range(10_000_000))')
3 function calls in 0.961 seconds

What are sets?

In Python, a set is collection of elements, based on a mathematical set. There are three ways to create a set:

Create an empty set with the expression set():

set()
set()

Do not use {} to create an empty set. Doing this creates a dictionary:

type(set())
set
type({})
dict

Here's how sets are different than Sequences like list and tuple:

For the full API, read "Set Types - set, frozenset" via Python.org.

What are frozen sets?

Frozen sets are immutable, hashable sets. Create them with the built-in function frozenset:

frozenset([1, 2, 3])
frozenset({1, 2, 3})

Because frozen sets are hashable, they can be contained in other sets:

{{1}, {2, 3}}
TypeError: unhashable type: 'set'
{frozenset([1]), frozenset([2, 3])}
{frozenset({1}), frozenset({2, 3})}

Because frozen sets are immutable, elements cannot be added or removed:

data_set = {1, 2, 3}
data_set.add(5)
data_set
{1, 2, 3, 5}
data_fset = frozenset([1, 2, 3])
data_fset.add(5)
AttributeError: 'frozenset' object has no attribute 'add'

Unlike lists and tuples, sets and frozen sets can be compared with each other:

[1, 2, 3] == (1, 2, 3)
False
{1, 2, 3} == frozenset({1, 2, 3})
True

And unlike lists and tuples, sets and frozen sets can be combined with set operations:

[1, 2, 3] + (1, 4, 5)
TypeError: can only concatenate list (not "tuple") to list
{1, 2, 3} | frozenset([1, 4, 5])
{1, 2, 3, 4, 5}

When sets and frozen sets are combined, the type of the result is the type of the left operand:

type(set() | frozenset())
set
type(frozenset() | set())
frozenset

For the full API, read "Set Types - set, frozenset" via Python.org.

Using sets for fast membership checks

Sets use hash tables to store their elements. This means that the time complexity of membership checks is O(1). With a list, membership checks are a linear search and take O(n) time. E.g.

Keep this in mind when writing code that does repeated membership checks.

Using sets to remove duplicates

Here's a simple way to remove duplicates from a list (or any other iterable):

data_list = [1, 1, 1, 2, 2, 2, 1, 1, 3, 2, 1, 1, 3, 1]

list(set(data_list))
[1, 2, 3]

If I'm dealing with generators and potentially infinite data, this technique doesn't work:

import random

stream = iter(lambda: random.randint(0, 100), None)
list(set(stream))
(this code never finishes)

In this case, I usually create a generator function that I call unique:

def unique(it):
    """Remove duplicates from an iterable using a set."""
    past = set()
    for x in it:
        if x not in past:
            yield x
            past.add(x)

stream = iter(lambda: random.randint(0, 100), None)
unique(stream)
<generator object unique at 0x1111f1c10>
from itertools import islice

list(islice(unique(stream), 10))
[96, 75, 25, 58, 49, 9, 64, 77, 95, 20]

Using set operations to simplify your code

Set operations like union, intersection, difference can be used to simplify your code.

Here are some examples:

For the full API, read "Set Types - set, frozenset" via Python.org.

In conclusion...

In this week's post, you learned about sets and frozen sets. You learned how to use sets for fast membership checking, and how to use sets to remove duplicate data. You also learned how to use set operations to simplify your code.

For more information about sets in Python, see these resources:

There is relationship between set theory and logic that you may have noticed. E.g. x in (A | B) is equivalent to (x in A) or (x in B). Here's a table of this relationship:

Set OperationLogical Operation
uniondisjunction (OR)
intersectionconjunction (AND)
differencenegation (NOT)
symmetric differenceexclusive or (XOR)

My challenge to you:

Read the documentation for these operations:

Then write your own versions of these operations. E.g. here's a solution for set.union:

def union(*iterables):
    """Return a new set with elements from
    the set and all others.
    """
    return {
        x
        for it in iterables
        for x in it
    }

union("hello", "there", {1, 2, 3})
{1, 2, 3, 'e', 'h', 'l', 'o', 'r', 't'}

If you enjoyed this week's post, share it with your friends and stay tuned for next week's post. See you then!


(If you spot any errors or typos on this post, contact me via my contact page.)