On this page:
20.1 A Running Example
20.2 Types
20.3 A First Checker
20.4 Complexity
20.5 A Fabulous Improvement
20.6 Testing

20 Detecting Cycles

    20.1 A Running Example

    20.2 Types

    20.3 A First Checker

    20.4 Complexity

    20.5 A Fabulous Improvement

    20.6 Testing

20.1 A Running Example

As you may have noticed, Pyret will check for and print cycles. For instance,

import lists as L

data Pair: p(hd, ref tl) end

p0 = p(0, 0)
p1 = p(1, 1)

p2 = p(2, 3)
p3 = p(3, 4)
p2!{tl: p3}
p3!{tl: p2}

p4 = p(4, p3)
p5 = p(5, p4)

p6 = p(6, "dummy")
p6!{tl: p6}

Do Now!

Sketch out the above pairs to make sure you see all the cycles.

So we have two that participate in no cyclic behavior (p0 and p1), two (p2 and p3 that are mutually-cyclic, one (p6) that is a self-cycle, and two (p4 and p5) that lead to a cycle.

20.2 Types

As an aside, imagine we try to type-check this program. We have to provide a type for tl, but it’s not clear what this can be: sometimes it’s a Number, and other times it’s a Pair. However, we might observe that if our goal is to create cyclic data, then we want tl to refer to a Pair or to nothing at all. That suggests that a useful type is:

data Pair: p(hd :: Number, ref tl :: Option<Pair>) end

so that we can write

p0 = p(0, none)
p1 = p(1, none)

p2 = p(2, none)
p3 = p(3, none)
p2!{tl: some(p3)}
p3!{tl: some(p2)}
p4 = p(4, some(p3))
p5 = p(5, some(p4))

p6 = p(6, none)
p6!{tl: some(p6)}

This works, but we have to deal with the Option everywhere. Since our goal is to focus on cycles, and this would become unwieldy, we ignore the typed version from now on.

20.3 A First Checker

Okay, back to the untyped version.

So let’s try to figure out whether, given a Pair, it leads to a cycle. What should the type be?

cc :: Pair -> Boolean

where cc stands for “check cycle”.

Critically, it’s important that this be a total function: i.e., it always terminates.

So let’s write the most obvious solution:

fun cc(e):
  fun loop(cur, hist):
    if is-p(cur):
      if L.member-identical(hist, cur):
        true
      else:
        loop(cur!tl, link(cur, hist))
      end
    else:
      false
    end
  end
  loop(e, empty)
end

First of all, does this even terminate? It could take a while to visit all the nodes, but a cycle demands that somewhere, we revisit a node we saw before. Since we track that, we can’t not terminate. Therefore, termination is guaranteed, and the function is total. Indeed, all these tests pass:

check:
  cc(p0) is false
  cc(p1) is false
  cc(p2) is true
  cc(p3) is true
  cc(p4) is true
  cc(p5) is true
  cc(p6) is true
end

As another aside, observe that we could have written these tests instead like

check:
  p0 violates cc
  p2 satisfies cc
end

which would be more concise, but that would also be misleading: it would suggest that cc is a desirable property, so p2 is a “good” instance and p0 is a “bad” one. However, cc is not a judgment of quality—its two responses have equal weight—so this would be confusing to a later reader.

20.4 Complexity

Now that we have determined that it terminates, we can ask for its time and space complexity. First we have to decide what we are even computing the complexity over. If the sequence is finite, then the size is clearly the size of the sequence. But if it’s infinite, we don’t want to traverse the “whole thing”: rather, we mean its finite part (excluding any repetition). So the meaningful measure in either case is the number of p nodes, i.e., the finite size. It may just be that some of these lead back to themselves, so that a naïve traversal will go on forever.

Okay, so we visit each node once. We keep track of all the nodes just in case we double back over, either until we run out of nodes or we repeat one. Therefore, the space complexity is linear in the length of the sum of the prefix (from the starting node) and the cycle. The time complexity is that but also, at each point, we have to check membership, so it’s quadratic in the length of that prefix + cycle. So: linear space, quadratic time, in the size of the prefix + cycle.

Now, some degree of linear behavior is unavoidable: we clearly have to keep going until we run out or hit a cycle, so for detecting the cycle having something be linear in the size of the prefix (get it out of the way) + length of the cycle (find the cycle) seems essential. But can we improve on this complexity? It seems unlikely: by definition, how can we check for a cycle if we don’t remember everything we’ve seen?

Our first hunch might be, “Maybe there’s another space-time tradeoff!” But it’s not so clear here. Our space is linear and time quadratic, so we may think we can flip those around. But the time can’t be less than the space! If, for instance, we had linear time and quadratic space, that wouldn’t make sense, because we’d need at least quadratic time just to fill the space. So that’s not going to work so well.

Instead, the best way to improve seems to have a better lookup data structure. We’d still take linear space—as we said, linear was unavoidable (and we can’t just be linear in the size of the cycle, because the whole point is we don’t even know we have a cycle, much less which parts are prefix and which parts cycle)—and the time complexity would hopefully reduce from quadratic to linear-times-something-sublinear.

20.5 A Fabulous Improvement

It turns out we can do a lot better! It’s called the tortoise-and-hare algorithm.

We start off with two references into the sequence, one called the tortoise and the other the hare.

At each step, the tortoise tries to advance by one node. If it cannot, we’ve run out of sequence, and we’re done. The hare, being a hare and not a tortoise, tries to advance by two nodes. Again, if it cannot, we’ve run out of sequence, and we’re done. Otherwise both advance, and check if they’re at the same place. If they are, because they started out being at distinct nodes, we’ve found a cycle! If they are not, then we iterate.

Why does this even work? In the finite case it’s clear, because the hare will run out of next nodes. We only have to think about the infinite case. There, in general, we have this kind of situation:

There is some prefix of nodes, followed by a cycle. Now, we don’t know how long the prefix is, so we don’t know how far ahead of the tortoise the hare is. Nevertheless, there is some first point at which the tortoise enters the cycle. (There must be, because the tortoise always makes progress, and the prefix can only be finite.) From this point on, we know that on each step, the relative speed of the two animals is 1. That means the hare “gains” 1 on the tortoise every step. We can see that eventually, the hare must catch up with the tortoise—or, alternatively, that the tortoise catches up with the hare!

Now let’s analyze this. The tortoise will get caught by the time it has completed one loop of the cycle. Because the tortoise moves one step at a time, the total time is the length of the prefix + length of the loop. In terms of space, however, we no longer need any history at all; we only need the current positions of the tortoise and hare. Therefore, our time complexity is linear, but the space complexity is now significantly smaller: down to constant!

Here is the code:

fun th(e):
  fun loop(tt, hr):
    if tt <=> hr:
      true
    else:
      if is-p(tt) and is-p(hr):
        new-tt = tt!tl
        int-hr = hr!tl
        if is-p(int-hr):
          new-hr = int-hr!tl
          loop(new-tt, new-hr)
        else:
          false
        end
      else:
        false
      end
    end
  end
  loop(e, e!tl)
end

20.6 Testing

While it might be tempting to write tests like

check:
  cc(p0) is false
  cc(p2) is true
end

(i.e., the same as before, but with cc replaced by ph), we should instead write them as follows:

check:
  cc(p0) is th(p0)
  cc(p1) is th(p1)
  cc(p2) is th(p2)
  cc(p3) is th(p3)
  cc(p4) is th(p4)
  cc(p5) is th(p5)
  cc(p6) is th(p6)
end

This confers two advantages. First, if we change the example, we don’t have to update two tests, only one. But the much more important reason is that we intend for pr to be an optimized version of cc. That is, we expect the two to produce the same result. We can think of cc as our clear, reference implementation. That is, this is another instance of model-based testing.

As an aside, this algorithm is not exactly what Pyret does, because we need to check for arbitrary graph-ness, not just cycles. It’s also complicated due to user-defined functions, etc.