16.2 Basic Graph Traversals
As with all the data we have seen so far, to process a datum we have
to traverse it—
16.2.1 Reachability
Many uses of graphs need to address reachability: whether we can, using edges in the graph, get from one node to another. For instance, a social network might suggest as contacts all those who are reachable from existing contacts. On the Internet, traffic engineers care about whether packets can get from one machine to another. On the Web, we care about whether all public pages on a site are reachable from the home page. We will study how to compute reachability using our travel graph as a running example.
16.2.1.1 Simple Recursion
If they are the same, then clearly reachability is trivially satisfied.
If they are not, we have to iterate through the neighbors of the source node and ask whether the destination is reachable from each of those neighbors.
fun reach-1(src :: Key, dst :: Key, g :: Graph) -> Boolean:
if src == dst:
true
else:
<graph-reach-1-loop>
loop(neighbors(src, g))
end
end
src
is:
fun loop(ns):
cases (List) ns:
| empty => false
| link(f, r) =>
if reach-1(f, dst, g): true else: loop(r) end
end
end
check:
reach = reach-1
reach("was", "was", kn-cities) is true
reach("was", "chi", kn-cities) is true
reach("was", "bmg", kn-cities) is false
reach("was", "hou", kn-cities) is true
reach("was", "den", kn-cities) is true
reach("was", "saf", kn-cities) is true
end
Exercise
Which of the above examples leads to a cycle? Why?
16.2.1.2 Cleaning up the Loop
Before we continue, let’s try to improve the expression of the
loop. While the nested function above is a perfectly reasonable
definition, we can use Pyret’s for
to improve its readability.
fun ormap(fun-body, l):
cases (List) l:
| empty => false
| link(f, r) =>
if fun-body(f): true else: ormap(fun-body, r) end
end
end
for ormap(n from neighbors(src, g)):
reach-1(n, dst, g)
end
16.2.1.3 Traversal with Memory
Because we have cyclic data, we have to remember what nodes we’ve already visited and avoid traversing them again. Then, every time we begin traversing a new node, we add it to the set of nodes we’ve already started to visit so that. If we return to that node, because we can assume the graph has not changed in the meanwhile, we know that additional traversals from that node won’t make any difference to the outcome.This property is known as ☛ idempotence.
fun reach-2(src :: Key, dst :: Key, g :: Graph, visited :: List<Key>) -> Boolean:
if visited.member(src):
false
else if src == dst:
true
else:
new-visited = link(src, visited)
for ormap(n from neighbors(src, g)):
reach-2(n, dst, g, new-visited)
end
end
end
false
. (There may still be other parts of the graph to explore,
which other recursive calls will do.)Exercise
Does it matter if the first two conditions were swapped, i.e., the beginning ofreach-2
began withif src == dst: true else if visited.member(src): false
? Explain concretely with examples.
Exercise
We repeatedly talk about remembering the nodes that we have begun to visit, not the ones we’ve finished visiting. Does this distinction matter? How?
16.2.1.4 A Better Interface
reach-2
shows, we may have a better
implementation, but we’ve changed the function’s interface; now it has
a needless extra argument, which is not only a nuisance but might also
result in errors if we accidentally misuse it. Therefore, we should
clean up our definition by moving the core code to an internal
function:
fun reach-3(s :: Key, d :: Key, g :: Graph) -> Boolean:
fun reacher(src :: Key, dst :: Key, visited :: List<Key>) -> Boolean:
if visited.member(src):
false
else if src == dst:
true
else:
new-visited = link(src, visited)
for ormap(n from neighbors(src, g)):
reacher(n, dst, new-visited)
end
end
end
reacher(s, d, empty)
end
Exercise
Does this really gives us a correct implementation? In particular, does this address the problem that the
size
function above addressed? Create a test case that demonstrates the problem, and then fix it.
16.2.2 Depth- and Breadth-First Traversals
It is conventional for computer science texts to call these depth- and breadth-first search. However, searching is just a specific purpose; traversal is a general task that can be used for many purposes.
The reachability algorithm we have seen above has a special property. At every node it visits, there is usually a set of adjacent nodes at which it can continue the traversal. It has at least two choices: it can either visit each immediate neighbor first, then visit all of the neighbors’ neighbors; or it can choose a neighbor, recur, and visit the next immediate neighbor only after that visit is done. The former is known as breadth-first traversal, while the latter is depth-first traversal.
The algorithm we have designed uses a depth-first strategy: inside <graph-reach-1-loop>, we recur on the first element of the list of neighbors before we visit the second neighbor, and so on. The alternative would be to have a data structure into which we insert all the neighbors, then pull out an element at a time such that we first visit all the neighbors before their neighbors, and so on. This naturally corresponds to a queue [An Example: Queues from Lists].
Exercise
Using a queue, implement breadth-first traversal.
If we correctly check to ensure we don’t re-visit nodes, then both
breadth- and depth-first traversal will properly visit the entire
reachable graph without repetition (and hence not get into an infinite
loop). Each one traverses from a node only once, from which it
considers every single edge. Thus, if a graph has \(N\) nodes and
\(E\) edges, then a lower-bound on the complexity of traversal is
\(O([N, E \rightarrow N + E])\). We must also consider the cost of checking whether we
have already visited a node before (which is a set membership problem,
which we address elsewhere: Several Variations on Sets). Finally, we have to
consider the cost of maintaining the data structure that keeps track
of our traversal. In the case of depth-first traversal,
recursion—
This would suggest that depth-first traversal is always better than breadth-first traversal. However, breadth-first traversal has one very important and valuable property. Starting from a node \(N\), when it visits a node \(P\), count the number of edges taken to get to \(P\). Breadth-first traversal guarantees that there cannot have been a shorter path to \(P\): that is, it finds a shortest path to \(P\).
Exercise
Why “a” rather than “the” shortest path?
Exercise
Prove that breadth-first traversal finds a shortest path.