16.2 Basic Graph Traversals
As with all the data we have seen so far, to process a datum we have to traverse it.
16.2.1 Reachability
Many uses of graphs need to address reachability: whether we can, using edges in the graph, get from one node to another. For instance, a social network might suggest as contacts all those who are reachable from existing contacts. On the Internet, traffic engineers care about whether packets can get from one machine to another. On the Web, we care about whether all public pages on a site are reachable from the home page. We will study how to compute reachability using our travel graph as a running example.
16.2.1.1 Simple Recursion
Given a source and a destination node, if they are the same, then reachability is trivially satisfied. If they are not, we have to iterate through the neighbors of the source node and ask whether the destination is reachable from each of those neighbors.
fun reach-1(src :: Key, dst :: Key, g :: Graph) -> Boolean:
  if src == dst:
    true
  else:
    <graph-reach-1-loop>
  end
end

where <graph-reach-1-loop> is:

fun loop(ns):
  cases (List) ns:
    | empty => false
    | link(f, r) =>
      if reach-1(f, dst, g): true else: loop(r) end
  end
end
loop(neighbors(src, g))
check:
  reach = reach-1
  reach("was", "was", kn-cities) is true
  reach("was", "chi", kn-cities) is true
  reach("was", "bmg", kn-cities) is false
  reach("was", "hou", kn-cities) is true
  reach("was", "den", kn-cities) is true
  reach("was", "saf", kn-cities) is true
end
Which of the above examples leads to a cycle? Why?
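To see the idea outside Pyret, here is a rough Python translation of reach-1 over a hypothetical adjacency-dictionary graph (this is not the book's kn-cities data, just an illustrative example). Like reach-1, it keeps no memory of visited nodes, so it would recur forever on a cyclic graph; the example graph here is deliberately acyclic.

```python
# Hypothetical directed graph as an adjacency dictionary.
# Deliberately acyclic, because this naive version loops forever on cycles.
graph = {
    "was": ["chi", "hou"],
    "chi": ["den"],
    "hou": ["saf"],
    "den": [],
    "saf": [],
}

def reach_1(src, dst, g):
    """Naive recursive reachability, mirroring reach-1: no visited set."""
    if src == dst:
        return True
    # Ask whether dst is reachable from any neighbor of src.
    return any(reach_1(n, dst, g) for n in g[src])
```

Python's built-in any plays the role of the explicit loop in the Pyret version.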
16.2.1.2 Cleaning up the Loop
Before we continue, let's try to improve the expression of the loop. While the nested function above is a perfectly reasonable definition, we can use Pyret's for to improve its readability.
fun ormap(fun-body, l):
  cases (List) l:
    | empty => false
    | link(f, r) =>
      if fun-body(f): true else: ormap(fun-body, r) end
  end
end
for ormap(n from neighbors(src, g)):
  reach-1(n, dst, g)
end
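For comparison, the same helper can be written in Python, where the built-in any does essentially what for ormap does in Pyret:

```python
def ormap(fun_body, l):
    """Return True if fun_body holds for any element of l.

    Mirrors the Pyret ormap; equivalent to any(fun_body(x) for x in l).
    """
    for x in l:
        if fun_body(x):
            return True
    return False
```

Note that, like the Pyret version, this short-circuits: it stops at the first element for which the predicate holds.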
16.2.1.3 Traversal with Memory
Because we have cyclic data, we have to remember what nodes we've already visited and avoid traversing them again. Every time we begin traversing a new node, we add it to the set of nodes we've already started to visit, so that if we return to that node we can stop. Because we can assume the graph has not changed in the meanwhile, we know that additional traversals from that node won't make any difference to the outcome. This property is known as ☛ idempotence.
fun reach-2(src :: Key, dst :: Key, g :: Graph, visited :: List<Key>) -> Boolean:
  if visited.member(src):
    false
  else if src == dst:
    true
  else:
    new-visited = link(src, visited)
    for ormap(n from neighbors(src, g)):
      reach-2(n, dst, g, new-visited)
    end
  end
end
Observe that when we encounter an already-visited node, we return false. This does not mean the destination is unreachable; it only means that this particular path has nothing new to contribute. (There may still be other parts of the graph to explore, which other recursive calls will do.)
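A Python sketch of the same idea, again over a hypothetical adjacency dictionary, shows that the visited collection is threaded down each path (as in reach-2, sibling calls do not share it), and that the function now terminates even on a cyclic graph:

```python
def reach_2(src, dst, g, visited=()):
    """Reachability with memory, mirroring reach-2.

    visited is threaded down each recursive path as an immutable tuple,
    matching the Pyret version's list argument.
    """
    if src in visited:
        return False  # nothing new along this path
    if src == dst:
        return True
    new_visited = visited + (src,)
    return any(reach_2(n, dst, g, new_visited) for n in g[src])

# A cyclic example: a -> b -> a, and b -> c.
cyclic = {"a": ["b"], "b": ["a", "c"], "c": []}
```

Without the visited check, reach_2("a", "c", cyclic) would bounce between a and b forever; with it, the a branch from b fails finitely and the c branch succeeds.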
Exercise: Does it matter if the first two conditions are swapped, i.e., if the function instead begins if src == dst: true else if visited.member(src): false? Explain concretely with examples.
We repeatedly talk about remembering the nodes that we have begun to visit, not the ones we’ve finished visiting. Does this distinction matter? How?
16.2.1.4 A Better Interface
As reach-2 shows, we may have a better implementation, but we've changed the function's interface; now it has a needless extra argument, which is not only a nuisance but might also result in errors if we accidentally misuse it. Therefore, we should clean up our definition by moving the core code to an internal function:
fun reach-3(s :: Key, d :: Key, g :: Graph) -> Boolean:
  fun reacher(src :: Key, dst :: Key, visited :: List<Key>) -> Boolean:
    if visited.member(src):
      false
    else if src == dst:
      true
    else:
      new-visited = link(src, visited)
      for ormap(n from neighbors(src, g)):
        reacher(n, dst, new-visited)
      end
    end
  end
  reacher(s, d, empty)
end
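The same interface-hiding pattern in Python: a nested helper carries the extra visited argument and closes over the destination and graph, so callers see only the clean two-node signature. (This sketch also swaps the visited list for a frozenset, an assumption on my part rather than a translation of the book's list.)

```python
def reach_3(src, dst, g):
    """Clean two-argument interface; the visited bookkeeping stays internal."""
    def reacher(s, visited):
        if s in visited:
            return False
        if s == dst:
            return True
        # Extend the visited set for the recursive calls along this path.
        return any(reacher(n, visited | {s}) for n in g[s])
    return reacher(src, frozenset())

cyclic = {"a": ["b"], "b": ["a", "c"], "c": []}
```

Callers can no longer accidentally pass a wrong initial visited value, which was the misuse risk the text mentions.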
Exercise: Does this really give us a correct implementation? In particular, does this address the problem that the size function above addressed? Create a test case that demonstrates the problem, and then fix it.
16.2.2 Depth- and Breadth-First Traversals
It is conventional for computer science texts to call these depth- and breadth-first search. However, searching is just a specific purpose; traversal is a general task that can be used for many purposes.
The reachability algorithm we have seen above has a special property. At every node it visits, there is usually a set of adjacent nodes at which it can continue the traversal. It has at least two choices: it can either visit each immediate neighbor first, then visit all of the neighbors’ neighbors; or it can choose a neighbor, recur, and visit the next immediate neighbor only after that visit is done. The former is known as breadth-first traversal, while the latter is depth-first traversal.
The algorithm we have designed uses a depth-first strategy: inside <graph-reach-1-loop>, we recur on the first element of the list of neighbors before we visit the second neighbor, and so on. The alternative would be to have a data structure into which we insert all the neighbors, then pull out an element at a time such that we first visit all the neighbors before their neighbors, and so on. This naturally corresponds to a queue [An Example: Queues from Lists].
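The queue-based strategy described above can be sketched in Python using collections.deque as the FIFO queue (a hypothetical adjacency-dictionary graph again, not the book's data):

```python
from collections import deque

def bfs_reach(src, dst, g):
    """Breadth-first reachability: visit all of a node's neighbors
    before any of the neighbors' neighbors, via an explicit FIFO queue."""
    visited = {src}
    queue = deque([src])
    while queue:
        node = queue.popleft()   # oldest entry first: breadth-first order
        if node == dst:
            return True
        for n in g[node]:
            if n not in visited:
                visited.add(n)   # mark on enqueue, so each node enters once
                queue.append(n)
    return False

cyclic = {"a": ["b"], "b": ["a", "c"], "c": []}
```

Replacing popleft() with pop() would pull the newest entry instead, turning the same code into a depth-first traversal: the choice of data structure (queue versus stack) is exactly the choice between the two strategies.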
Using a queue, implement breadth-first traversal.
If we correctly check to ensure we don’t re-visit nodes, then both
breadth- and depth-first traversal will properly visit the entire
reachable graph without repetition (and hence not get into an infinite
loop). Each one traverses from a node only once, from which it
considers every single edge. Thus, if a graph has \(N\) nodes and
\(E\) edges, then a lower bound on the complexity of traversal is
\(O([N, E \rightarrow N + E])\). We must also consider the cost of checking whether we
have already visited a node before (which is a set membership problem,
which we address elsewhere: Several Variations on Sets). Finally, we have to
consider the cost of maintaining the data structure that keeps track
of our traversal. In the case of depth-first traversal,
This would suggest that depth-first traversal is always better than breadth-first traversal. However, breadth-first traversal has one very important and valuable property. Starting from a node \(N\), when it visits a node \(P\), count the number of edges taken to get to \(P\). Breadth-first traversal guarantees that there cannot have been a shorter path to \(P\): that is, it finds a shortest path to \(P\).
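The shortest-path property is easy to observe if the breadth-first traversal records, for each node, the number of edges taken to reach it. A hedged Python sketch (hypothetical example graph with two paths of different lengths to the same node):

```python
from collections import deque

def bfs_distances(src, g):
    """Return the fewest-edge distance from src to every reachable node.

    Because the queue processes nodes in non-decreasing order of distance,
    the first time a node is reached is along a shortest path to it.
    """
    dist = {src: 0}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        for n in g[node]:
            if n not in dist:
                dist[n] = dist[node] + 1
                queue.append(n)
    return dist

# Two paths from "a" to "d": a->b->d (2 edges) and a->c->e->d (3 edges).
g = {"a": ["b", "c"], "b": ["d"], "c": ["e"], "e": ["d"], "d": []}
```

Here bfs_distances("a", g) assigns "d" the distance 2, the shorter of the two routes; a depth-first traversal that happened to explore "c" first would reach "d" along the longer path.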
Why “a” rather than “the” shortest path?
Prove that breadth-first traversal finds a shortest path.