Solving the Task Ordering problem in Python
By John Lekberg on August 01, 2020.
NOTE: Michal Porteš found a mistake with my code.
I didn't define the dict constraints
correctly.
I have fixed this.
Thanks, Michal!
This week's blog post is about solving the "Task Ordering" problem in Python. You will learn:
- How to solve the problem using brute force.
- How to use Khan's Algorithm to solve the problem more efficiently.
Problem statement
You have a list of tasks that you need to complete. E.g.
- Task A - do the dishes.
- Task B - wash your clothes.
- Task C - dry your clothes.
- Task D - go for a run.
- Task E - have dinner.
But you can't complete the tasks randomly. You have some constraints. E.g.
- Task E must be done before Task A.
- Task D must be done before Task B, Task C, and Task E.
- Task B must be done before Task C.
Your goal is to find a way to complete the tasks in order and not violate the constraints. E.g. one solution is
- Task D
- Task E
- Task B
- Task C
- Task A
How I represent the data in the problem
I represent the list of tasks as a set. E.g.
{ "A", "B", "C", "D", "E" }
I represent the constraints as a dictionary that maps each task to the set of tasks that must come after. E.g. this constraint dictionary
{
"E": {"A"},
"D": {"B", "C", "E"},
"B": {"C"},
}
represents these constraints:
- Task E must be done before Task A.
- Task D must be done before Task B, Task C, and Task E.
- Task B must be done before Task C.
I represent the solution as a sequence of tasks. E.g.
("D", "E", "B", "C", "A")
Creating a brute force solution
A simple way to use brute force to solve this problem is to generate every
permutation of tasks and check if it satisfies the constraints.
Here's a Python function, task_order_bf
, that does this:
from itertools import permutations, combinations def task_order_bf(*, tasks, constraints): """Solve the 'Task Ordering' problem using brute force. tasks -- a set of tasks. constraints -- a dictionary that maps each task to all the tasks that must come after. """ for candidate in permutations(tasks): good = all( before not in constraints[after] for before, after in combinations(candidate, 2) if after in constraints ) if good: return candidate tasks = {"A", "B", "C", "D", "E"} constraints = { "E": {"A"}, "D": {"B", "C", "E"}, "B": {"C"}, } task_order_bf(tasks=tasks, constraints=constraints)
('D', 'E', 'A', 'B', 'C')
This code generates all possible orders of the tasks (ignoring constraints) using itertools.permutations.
This code checks each candidate by using the built-in function all and itertools.combinations to make sure that for every pair of elements
( ... before ... after ... )
That Task before is allowed to be before Task after.
What's the time complexity of this solution? For n tasks:
- There are n! permutations of the n tasks.
- Checking that a permutation satisfies the constraints takes O(n2) operations.
As a result, the overall time complexity of this function is
O(n! n2)
Creating a more efficient solution
This problem of "Task Ordering" is also known also "Topological Sorting". The concepts of "tasks" and "constraints" map to vertices and edges of a directed graph.
For a directed graph with vertices V and edges E, Khan's Algorithm performs a topological sort in
O(|V| + |E|) steps
Here's a Python function, task_order_fast
, that solves the "Task Ordering"
problem using Khan's Algorithm:
from collections import defaultdict def task_order_fast(*, tasks, constraints): """Solve the 'Task Ordering' problem quickly, using Khan's Algorithm for topological sorting. tasks -- a set of tasks. constraints -- a dictionary that maps each task to all the tasks that must come after. """ return khans_algorithm(V=tasks, E=constraints) def khans_algorithm(*, V, E): """Use Khan's Algorithm to generate a topological sort of a directed graph given by vertices V and edges E. Time complexity is O(|V| + |E|). V -- set. Vertices. E -- dict. Map vertex to set of outgoing vertices. """ # Set up auxiliary data structures and functions. e_out = defaultdict(set) e_in = defaultdict(set) def link(tail, head): """Add the edge (tail, head) to e_out and e_in.""" e_out[tail].add(head) e_in[head].add(tail) def unlink(tail, head): """Remove the edge (tail, head) from e_out and e_in.""" e_out[tail].remove(head) e_in[head].remove(tail) indegree = lambda v: len(e_in[v]) outdegree = lambda v: len(e_out[v]) # "source" refers to "source vertex". source = lambda v: indegree(v) == 0 # Khan's Algorithm for tail in E: for head in E[tail]: link(tail, head) L = [] S = set(filter(source, V)) while len(S) > 0: tail = S.pop() L.append(tail) for head in tuple(e_out[tail]): unlink(tail, head) if source(head): S.add(head) # If cycles are present, then no solution exists. no_cycles = all( indegree(v) == 0 and outdegree(v) == 0 for v in V ) if no_cycles: return L tasks = {"A", "B", "C", "D", "E"} constraints = { "E": {"A"}, "D": {"B", "C", "E"}, "B": {"C"}, } task_order_fast(tasks=tasks, constraints=constraints)
['D', 'B', 'E', 'C', 'A']
For more information about Khan's algorithm, read these documents:
- "Topological Sorting" via Wikipedia
- "Lecture 20: Topo-Sort and Dijkstra’s Greedy Idea" by Rajesh Rao
- "Topological Sort" by Rashid Bin Muhammad
What's the time complexity of this algorithm? Khan's algorithm is
O(|V| + |E|)
For n tasks and m constraints,
- There are n vertices in V.
- There are m edges in E.
As a result, the overall time complexity for this function is
O(n + m)
In conclusion...
In this week's post, you learned how to solve the "Task Ordering" problem using brute force. Then you learned how to recast the problem in terms of topological sorting. This allowed you to use Khan's algorithm to create a faster solution to the "Task Ordering" problem.
Topological sorting is useful for dependency resolution in build systems (like GNU Make) and package managers (like npm).
In a mathematical sense, topological sorting is like creating a linear extension from a partial order to a total order.
My challenge to you:
A solution can't exist if the task constraints form a cycle. E.g.
- Task A comes before Task D.
- Task D comes before Task E.
- Task E comes before Task A.
As it is, the functions return None when a solution is not found.
Your goal is to modify the code to, instead of returning None, raise an Exception that describes that constraints that are in conflict. E.g.
ERROR: Unable to order tasks. Tasks "A", "D", "E" form a cycle.
If you enjoyed this week's post, share it with your friends and stay tuned for next week's post. See you then!
(If you spot any errors or typos on this post, contact me via my contact page.)