19 Recursion and Cycles from Mutation

Earlier [From Acyclicity to Cycles], we saw the difficulty of constructing cyclic data, and saw how we could address this problem using state [Cyclic Data]. Let us now return to the earlier example of creating a cyclic list of alternating colors. We had tried to write:

web-colors = link("white", link("grey", web-colors))

which, as we noted, does not pass muster because web-colors is not bound on the right of the =. (Why not? Because otherwise, if we try to substitute web-colors on the right, we would end up in an infinite regress.)

Something about this should make you a little suspicious: we have been able to write recursive functions all the time, without difficulty. Why are they different? For two reasons:

The first reason is the fact that we’re defining a function. A function’s body is not evaluated right away—only when we apply it—so the language can wait for the body to finish being defined. (We’ll see what this might mean in a moment.)
The second reason isn’t actually a reason: function definitions actually are special. But we are about to expose what’s so special about them—it’s the use of a box! [Boxes: A Canonical Mutable Structure]—so that any definition can avail of it.

Returning to our example above, recall that we can’t make up our list using links, because we want the list to never terminate. Therefore, let us first define a new datatype to hold an cyclic list:

data Pair: p(hd, tl) end

You should think of this as analogous to a list, where hd is the first element and tl is the rest.

Observe that we have carefully avoided writing type definitions for the fields; we will instead try to figure them out as we go along. Also, however, this definition as written cannot work.

Do Now!
Do you see why not?

Let’s decompose the intended infinite list into two pieces: lists that begin with white and ones that begin with grey. What follows white? A grey list. What follows grey? A white list. It is clear we can’t write down these two definitions because one of them must precede the other, but each one depends on the other. (This is the same problem as trying to write a single definition above.)

19.1 Partial Definitions

What we need to instead do is to partially define each list, and then complete the definition using the other one. However, that is impossible using the above definition, because we cannot change anything once it is constructed. Instead, therefore, we need:

data Pair: p(hd, ref tl) end

Note that this datatype lacks a base case, which should remind you of definitions we saw in Streams From Functions.

Using this, we can define:

white-pair = p("white", "dummy")
grey-pair = p("grey", "dummy")

Each of these definitions is quite useless by itself, but they each represent what we want, and they have a mutable field for the rest, currently holding a dummy value. Therefore, it’s clear what we must do next: update the mutable field.

white-pair!{tl: grey-pair}
grey-pair!{tl: white-pair}

Because we have ordained that our colors must alternate beginning with white, this rounds up our definition:

web-colors = white-pair

If we ask Pyret to inspect the value of web-colors, we notice that it employs an algorithm to prevent traversing infinite objects. You can learn more about how that works separately [Detecting Cycles].

We can define a helper function, take, a variation of which we saw for streams [Streams From Functions], to inspect a finite prefix of an infinite list:

fun ctake(n :: Number, il :: Pair) -> List:
  if n == 0:
    empty
  else:
    link(il.hd, ctake(n - 1, il!tl))
  end
end

such that:

check:
  ctake(4, web-colors) is
  [list: "white", "grey", "white", "grey"]
end

19.2 Recursive Functions

Based on this, we can now understand recursive functions. Consider a very simple example, such as this:

fun sum(n):
  if n > 0:
    n + sum(n - 1)
  else:
    0
  end
end

We might like to think this is equivalent to:

sum =
  lam(n):
    if n > 0:
      n + sum(n - 1)
    else:
      0
    end
  end

but if you enter this, Pyret will complain that sum is not bound. We must instead write

rec sum =
  lam(n):
    if n > 0:
      n + sum(n - 1)
    else:
      0
    end
  end

What do you think rec does? It binds sum to a box initially containing a dummy value; it then defines the function in an environment where the name is bound, unboxing the use of the name; and finally, it replaces the box’s content with the defined function, following the same pattern we saw earlier for web-colors.

19.3 Premature Evaluation

Observe that the above description reveals that there is a time between the creation of the name and the assignment of a value to it. Can this intermediate state be observed? It sure can!

There are generally three solutions to this problem:

Make sure the value is sufficiently obscure so that it can never be used in a meaningful context. This means values like 0 are especially bad, and indeed most common datatypes should be shunned. Indeed, there is no value already in use that can be used here that might not be confusing in some context.
The language might create a new type of value just for use here. For instance, imagine this definition of CList:
```
data CList:
  | undef
  | clink(v, ref r)
end
```
undef appears to be a “base case”, thus making CList very similar to List. In truth, however, the undef is present only until the first mutation happens, after which it will never again be present: the intent is that r only contain a reference to other clinks.
The undef value can now be used by the language to check for premature uses of a cyclic list. However, while this is technically feasible, it imposes a run-time penalty. Therefore, this check is usually only performed by languages focused on teaching; professional programmers are assumed to be able to manage the consequences of such premature use by themselves.
Allow the recursion constructor to be used only in the case of binding functions, and then make sure that the right-hand side of the binding is syntactically a function. This solution precludes some reasonable programs, but is certainly safe.

19.4 Cyclic Lists Versus Streams

The color list example above is, as we have noted, very reminiscent of stream examples. What is the relationship between the two ways of defining infinite data?

Cyclic lists have on their side simplicity. The pattern of definition used above can actually be encapsulated into a language construct, so programmers do not need to wrestle with mutable fields (as above) or thunks (as streams demand). This simplicity, however, comes at a price: cyclic lists can only represent strictly repeating data, i.e., you cannot define nats or fibs as cyclic lists. In contrast, the function abstraction in a stream makes it generative: each invocation can create a truly novel datum (such as the next natural or Fibonacci number). Therefore, it is straightforward to implement cyclic lists as streams, but not vice versa.

contents ← prev up next →

I	Introduction
II	Introduction to Programming
III	From Pyret to Python
IV	Programming With State
V	Algorithm Analysis
VI	Data Structures with Analysis
VII	Advanced Topics
VIII	Interactive Programs
IX	Appendices

18	State and Equality
19	Recursion and Cycles from Mutation
20	Detecting Cycles
21	Avoiding Recomputation by Remembering Answers
22	Partial Domains
23	Staging
24	Factoring Numbers
25	Deconstructing Loops

19.1	Partial Definitions
19.2	Recursive Functions
19.3	Premature Evaluation
19.4	Cyclic Lists Versus Streams