# Sets and Frozen Sets in Python

By John Lekberg on August 08, 2020.

This week's post is about using sets and frozen sets in Python. You will learn:

• What are sets and frozen sets?
• How to use sets for fast membership checking.
• How to use sets to remove duplicate data.
• How to use set operations to simplify your code.

# Why should you care about sets?

Sets are useful because membership checks are a lot faster with sets than with lists:

``````import cProfile
data_list = list(range(10_000_000))
cProfile.run('9_999_999 in data_list')
``````
``````3 function calls in 0.136 seconds
``````
``````data_set = set(range(10_000_000))
cProfile.run('9_999_999 in data_set')
``````
``````3 function calls in 0.000 seconds
``````

However, creating a set is a bit slower than creating a list:

``````import cProfile
cProfile.run('list(range(10_000_000))')
``````
``````3 function calls in 0.520 seconds
``````
``````cProfile.run('set(range(10_000_000))')
``````
``````3 function calls in 0.961 seconds
``````

# What are sets?

In Python, a set is collection of elements, based on a mathematical set. There are three ways to create a set:

• Use the built-in function set:

``````set([1, 2, 3])
``````
``````{1, 2, 3}
``````
• List the elements between `{` and `}`, comma separated:

``````{1, 2, 3}
``````
``````{1, 2, 3}
``````
• Use a set comprehension:

``````{ n ** 2 for n in range(1, 5) }
``````
``````{1, 4, 9, 16}
``````

Create an empty set with the expression `set()`:

``````set()
``````
``````set()
``````

Do not use `{}` to create an empty set. Doing this creates a dictionary:

``````type(set())
``````
``````set
``````
``````type({})
``````
``````dict
``````

Here's how sets are different than Sequences like list and tuple:

• Sets only contain hashable elements.

``````{"hello", "there"}
``````
``````{'hello', 'there'}
``````
``````{"hello", ["there"]}
``````
``````TypeError: unhashable type: 'list'
``````
• Sets can't be subscripted:

``````data_list = [1, 2, 3]
data_list
``````
``````1
``````
``````data_set = {1, 2, 3}
data_set
``````
``````TypeError: 'set' object is not subscriptable
``````

Instead you iterate over a set using, e.g., a for-loop:

``````for x in data_set:
print(x)
``````
``````1
2
3
``````
• Sets never contain duplicates:

``````{1, 1, 1, 2, 2, 2, 3}
``````
``````{1, 2, 3}
``````
• Sets don't care about what order the elements are in:

``````[1, 2, 3] == [3, 2, 1]
``````
``````False
``````
``````{1, 2, 3} == {3, 2, 1}
``````
``````True
``````

For the full API, read "Set Types - set, frozenset" via Python.org.

# What are frozen sets?

Frozen sets are immutable, hashable sets. Create them with the built-in function frozenset:

``````frozenset([1, 2, 3])
``````
``````frozenset({1, 2, 3})
``````

Because frozen sets are hashable, they can be contained in other sets:

``````{{1}, {2, 3}}
``````
``````TypeError: unhashable type: 'set'
``````
``````{frozenset(), frozenset([2, 3])}
``````
``````{frozenset({1}), frozenset({2, 3})}
``````

Because frozen sets are immutable, elements cannot be added or removed:

``````data_set = {1, 2, 3}
data_set
``````
``````{1, 2, 3, 5}
``````
``````data_fset = frozenset([1, 2, 3])
``````
``````AttributeError: 'frozenset' object has no attribute 'add'
``````

Unlike lists and tuples, sets and frozen sets can be compared with each other:

``````[1, 2, 3] == (1, 2, 3)
``````
``````False
``````
``````{1, 2, 3} == frozenset({1, 2, 3})
``````
``````True
``````

And unlike lists and tuples, sets and frozen sets can be combined with set operations:

``````[1, 2, 3] + (1, 4, 5)
``````
``````TypeError: can only concatenate list (not "tuple") to list
``````
``````{1, 2, 3} | frozenset([1, 4, 5])
``````
``````{1, 2, 3, 4, 5}
``````

When sets and frozen sets are combined, the type of the result is the type of the left operand:

``````type(set() | frozenset())
``````
``````set
``````
``````type(frozenset() | set())
``````
``````frozenset
``````

For the full API, read "Set Types - set, frozenset" via Python.org.

# Using sets for fast membership checks

Sets use hash tables to store their elements. This means that the time complexity of membership checks is O(1). With a list, membership checks are a linear search and take O(n) time. E.g.

• Compare 16 seconds:

``````import cProfile

domain = range(50_000)

def example_list():
data_list = list(domain)
for i in domain:
assert i in data_list

cProfile.run('example_list()')
``````
``````4 function calls in 16.138 seconds
``````
• ... to 0.007 seconds:

``````def example_set():
data_set = set(domain)
for i in domain:
assert i in data_set

cProfile.run('example_set()')
``````
``````4 function calls in 0.007 seconds
``````

Keep this in mind when writing code that does repeated membership checks.

# Using sets to remove duplicates

Here's a simple way to remove duplicates from a list (or any other iterable):

``````data_list = [1, 1, 1, 2, 2, 2, 1, 1, 3, 2, 1, 1, 3, 1]

list(set(data_list))
``````
``````[1, 2, 3]
``````

If I'm dealing with generators and potentially infinite data, this technique doesn't work:

``````import random

stream = iter(lambda: random.randint(0, 100), None)
list(set(stream))
``````
``````(this code never finishes)
``````

In this case, I usually create a generator function that I call `unique`:

``````def unique(it):
"""Remove duplicates from an iterable using a set."""
past = set()
for x in it:
if x not in past:
yield x

stream = iter(lambda: random.randint(0, 100), None)
unique(stream)
``````
``````<generator object unique at 0x1111f1c10>
``````
``````from itertools import islice

list(islice(unique(stream), 10))
``````
``````[96, 75, 25, 58, 49, 9, 64, 77, 95, 20]
``````

# Using set operations to simplify your code

Set operations like union, intersection, difference can be used to simplify your code.

Here are some examples:

• Here's code that combines a set of teachers and teacher assistants to create a set of employees:

``````teacher = {"T.Karen", "T.Robert"}
teacher_assistant = {"S.Susan", "S.John"}

employee = set()

for person in teacher:
for person in teacher_assistant:

employee
``````
``````{'S.John', 'S.Susan', 'T.Karen', 'T.Robert'}
``````

Simplify this code by using set.union (the `|` operator):

``````employee = teacher | teacher_assistant
employee
``````
``````{'S.John', 'S.Susan', 'T.Karen', 'T.Robert'}
``````
• Here's code that combines a set of students and a set of employees to create a set of student-employees:

``````student = {"S.Bobby", "S.Susan", "S.John"}
employee = {"S.John", "S.Susan", "T.Karen", "T.Robert"}

student_employee = set()

for person in student:
if person in employee:

student_employee
``````
``````{'S.John', 'S.Susan'}
``````

Simplify this code by using set.intersection (the `&` operator):

``````student_employee = student & employee
student_employee
``````
``````{'S.John', 'S.Susan'}
``````
• Here's code that combines a set of students and a set of employees to create a set of students that are not employees:

``````student = {"S.Bobby", "S.Susan", "S.John"}
employee = {"S.John", "S.Susan", "T.Karen", "T.Robert"}

student_only = set()

for person in student:
if person not in employee:

student_only
``````
``````{'S.Bobby'}
``````

Simplify this code by using set.difference (the `-` operator):

``````student_only = student - employee
student_only
``````
``````{'S.Bobby'}
``````

For the full API, read "Set Types - set, frozenset" via Python.org.

# In conclusion...

In this week's post, you learned about sets and frozen sets. You learned how to use sets for fast membership checking, and how to use sets to remove duplicate data. You also learned how to use set operations to simplify your code.

There is relationship between set theory and logic that you may have noticed. E.g. `x in (A | B)` is equivalent to `(x in A) or (x in B)`. Here's a table of this relationship:

Set OperationLogical Operation
uniondisjunction (OR)
intersectionconjunction (AND)
differencenegation (NOT)
symmetric differenceexclusive or (XOR)

My challenge to you:

Read the documentation for these operations:

Then write your own versions of these operations. E.g. here's a solution for `set.union`:

``````def union(*iterables):
"""Return a new set with elements from
the set and all others.
"""
return {
x
for it in iterables
for x in it
}

union("hello", "there", {1, 2, 3})
``````
``````{1, 2, 3, 'e', 'h', 'l', 'o', 'r', 't'}
``````

If you enjoyed this week's post, share it with your friends and stay tuned for next week's post. See you then!

(If you spot any errors or typos on this post, contact me via my contact page.)