Solving the Permutation Rank problem using Python
By John Lekberg on March 04, 2020.
This week's post is about solving an interview problem: the "Permutation Rank" problem. You will learn:
- How to solve this problem using a brute force algorithm.
- How to analyze the time complexity of the brute force algorithm.
- How to use combinatorics to develop an efficient algorithm.
Problem statement
Consider all permutations of the string "ERDOS". (E.g. "DESRO" is a permutation.) Order the permutations lexicographically, beginning with "DEORS" and ending with "SROED". What is the position of "ERDOS"? ("DEORS" is at position 1.)
A brute force solution
A brute force solution of the "Permutation Rank" problem will:
- Generate all permutations of "ERDOS".
- Order the permutations lexicographically.
- Find the position of "ERDOS".
"ERDOS" has 5 characters, so there 120 (5!) permutations. 120 permutations is a small enough state space that the brute force algorithm is feasible. Here's a Python implementation:
import itertools def permutation_position_brute_force(word): ordered_perms = itertools.permutations(sorted(word)) goal_perm = tuple(word) for position, perm in enumerate(ordered_perms, start=1): if perm == goal_perm: return position permutation_position_brute_force("ERDOS")
37
This code uses the built-in function sorted to generate the first permutation in the lexicographic ordering:
sorted("ERDOS")
['D', 'E', 'O', 'R', 'S']
(Don't worry that sorted
returns a list and not a string.
This won't be a problem for the algorithm.)
Then, itertools.permutations generates all permutations:
list(itertools.permutations(sorted("ERDOS")))
[('D', 'E', 'O', 'R', 'S'),
('D', 'E', 'O', 'S', 'R'),
('D', 'E', 'R', 'O', 'S'),
('D', 'E', 'R', 'S', 'O'),
...
('S', 'R', 'E', 'O', 'D'),
('S', 'R', 'O', 'D', 'E'),
('S', 'R', 'O', 'E', 'D')]
If the input is sorted, then itertools.permutations
will generate the
permutation tuples in lexicographic order.
(Read the documentation of itertools.permutations
for more information.)
itertools.permutations
generates tuples like ('D', 'E', 'O', 'R', 'S')
instead of strings like 'DEORS'
.
I have two ways to deal with this:
- I can examine each permutation tuple and use
"".join
to turn the tuple into a string. This is one operation for each permutation: n! operations. - I can turn the goal string
"ERDOS"
into a goal tuple("E", "R", "D", "O", "S")
. This is one operation.
I choose to turn the goal string, word
, into a goal tuple, goal_perm
,
because that uses fewer operations.
I use enumerate to associate a position with each
permutation, starting at position 1
.
How well does the brute force solution scale to larger strings with unique characters?
- For an n-character word, there are n! permutations.
So the time complexity of permutation_position_brute_force
is
O(n!)
This means that if running permutation_position_brute_force
on a 10-character
word took 1 second, then running the algorithm on a 20-character word would
take over 21,000 years!
There's got to be a quicker way to solve this.
Counting the number of permutations before the goal
The brute force algorithms counts how many permutations appear before the goal permutation by counting each permutation, one by one. But there is a way to count how many permutations appear before the goal permutation without counting permutations one-by-one.
Here are patterns of the permutations that come before ERDOS
:
D****
. Strings likeDEORS
andDRESO
. There are 4 unpicked characters in the patternD****
, so there are 24 (4!) permutations ofERDOS
that look likeD****
.ED***
. Strings likeEDORS
andEDSOR
. There are 3 unpicked characters in the patternED***
, so there are 6 (3!) permutations ofERDOS
that look likeED***
.EO***
. Strings likeEODSR
andEOSDR
. There are 3 unpicked characters in the patternEO***
, so there are 6 (3!) permutations ofERDOS
that look likeEO***
.
Because all the patterns are mutually exclusive, I
can sum their counts to get the total number of permutations before ERDOS
:
24 + 6 + 6 = 36
There are 36 permutations before ERDOS
, so ERDOS
is at position 37.
How do I know that the three patterns D****
, ED***
, and EO***
are the
only three patterns of permutations before ERDOS
?
- I look at the lexicographic ordering of the characters of
ERDOS
. The ordering isDEORS
. - I look at the goal permutation,
ERDOS
. - I start with the first character,
E
. In the lexicographic ordering,D
comes beforeE
, so I knowD****
is a pattern that only counts some permutations that come beforeERDOS
. (D****
definitely doesn't match a permutation that comes afterERDOS
.) - I continue to the second character,
R
. In the lexicographic ordering,D
,E
, andO
come beforeR
. ButE
is already chosen as the first character, so onlyED***
andEO***
are patterns that count. - I continue to the third letter,
D
. In the lexicographic ordering, nothing comes beforeD
, so I continue to the fourth character. - I keep creating patterns of permutations that come before
ERDOS
by examining the fourth character, and then examining the fifth character. - I end up with the patterns
D****
,ED***
, andEO***
.
Here's a Python function, permutation_position_quick
, that implements the
above procedure:
import math def permutation_position_quick(word): N = len(word) first = sorted(word) chars_before = { first[i]: first[:i] for i in range(N) } count = 0 for i in range(N): possible_chars = set(chars_before[word[i]]) possible_chars.difference_update(word[:i]) n_wildcards = N - (i + 1) count += ( len(possible_chars) * math.factorial(n_wildcards) ) return 1 + count permutation_position_quick("ERDOS")
37
permutation_position_quick
uses set objects to easily
remove patterns that would reuse already chosen characters (e.g. invalid
patterns like EE***
).
I confirm that permutation_position_quick
produces the same results as permutation_position_brute_force
:
p1 = permutation_position_quick p2 = permutation_position_brute_force all( p1(word) == p2(word) for word in itertools.permutations("ERDOS") )
True
What is the time complexity of permutation_position_quick
?
- The algorithm loops a number of times linear in the number of characters in the string. There are O(n) loop iterations.
- Each loop iteration creates a set of O(n) characters and removes a subset of O(n) characters. The time complexity of set.difference_update in this case is O(n).
Because there are O(n) loop iterations and each iteration takes
O(n) operations, the overall time complexity of
permutation_position_quick
is
O(n2)
This means that if running permutation_position_quick
on a 10-character word
took 1 second, then running the algorithm on a 20-character word would take 4
seconds.
(Compare this with permutation_position_brute_force
taking over 21,000
years.)
In conclusion...
In this post you learned how to use combinatorics to efficiently solve the "Permutation Rank" problem. The key idea is that you don't need to count every permutation one-by-one. To learn more about combinatorics, check out these documents:
- "The Mathematics of 2048: Counting States with Combinatorics" by John Lees-Miller.
- "Combinatorics and Probabilities" by GĂ©rard P. Michon, Ph.D.
My challenge to you:
This post covered solving the "Permutation Rank" problem for words with unique characters, like "ERDOS". How would you solve the "Permutation Rank" problem for words with duplicated characters, like "TERRENCE"?
If you enjoyed this post, let me know. Share this with your friends and stay tuned for next week's post. See you then!
(If you spot any errors or typos on this post, contact me via my contact page.)