### 18State and Equality

In State, Change, and Testing, we introduced the notion of mutable data. We also saw the impact it has on testing. Underlying testing is some notion of equality: when we write a test in Pyret using is, we are implicitly making a statement about equality between the two sides. Here we will examine equality in the presence of state in more detail.

#### 18.1Boxes: A Canonical Mutable Structure

In State, Change, and Testing we saw a motivating example using bank accounts. To focus our study of equality, it can be convenient to have an even simpler mutable data structure, called a box (which you will find in other programming languages as well). A box has only one field—the value being boxed—and supports just three operations:
1. box consumes a value and creates a mutable box containing that value.

2. unbox-now consumes a box and returns the value contained in the box.

3. set-box-now consumes a box, a new value, and changes the box to contain the value. All subsequent unbox-nows of that box will now return the new value—unless it is mutated again.

Here are the corresponding definitions in Pyret:

data Box<T>:
| box(ref v :: T)
end

fun unbox-now<T>(b :: Box<T>) -> T:
b!v
end

fun set-box-now<T>(b :: Box<T>, new-v :: T) -> Box<T>:
b!{v: new-v}
end

Observe that we use b!v to extract the current value, and use the naming convention of -now to make clear these are stateful operations, so the value now may not be the same as the value later.

#### 18.2Mutation and Types

In terms of types, whenever we replace the value in a box, we want it to be type-consistent with what was previously there. Otherwise it would be very difficult to program against a box, because the type of its content would keep changing.

These definitions obey the following tests:

check:
n1 = box(1)
n2 = box(2)
set-box-now(n1, 3)
set-box-now(n2, 4)
unbox-now(n1) is 3
unbox-now(n2) is 4
end

However, we cannot write set-box-now(n1, "hi"), because that would violate the type of n1, which is Box<Number>. We could make this explicit by writing

n1 :: Box<Number> = box(1)

if we wanted to be explicit. However, note that n1 being a box of numbers does not preclude us from having a box of strings:

n3 :: Box<String> = box("hello")

or indeed a box of any other type. We just need its type to remain consistent, whatever that type is.

This is a general rule we want to follow with mutable data: the new value must be the same type as the old value. This gives programs a consistent interface to program against. For instance, above, we know that we can always perform numeric operations against the value extracted from n1there is no danger that it will suddenly produce a string. This discipline can either be enforced by a system of annotations, or has to be manually maintained by the programmer.

#### 18.3Mutation and Equality

We’ve already seen [Re-Examining Equality] that equality is subtle. It’s about to become much subtler with the introduction of mutation!

As a running example, we’ll work with:
<three-boxes> ::=
 b1 = box(7) b2 = box(7) b3 = b1
Observe that b1 and b3 are referring to the same box, while b2 is referring to a different one. We can see this from a memory diagram:

Directory

• b1

1001

• b2

1002

• b3

1001

Heap

• 1001: box(7)

• 1002: box(7)

We can confirm this using the following tests:

check:
b1 is-not%(identical) b2
b1 is%(identical) b3
b2 is-not%(identical) b3
end

In other words, b1 and b3 are aliases for the same box, but neither is an alias to the box referred to by b2. Since identical is transitive, it follows from the first two tests that the third test must also pass (and thankfully, Pyret confirms this for us!).

Now, you might wonder why we have used identical and not equal-always [Notations for Equality], i.e., plain old is.

Do Now!

Let’s try that:

check:
b1 is b3
b1 is b2
end

What do you see?

It’s unsurprising that the first test, b1 is b3, passes. However, the second, b1 is b2, fails! And the name suggests why: the two are not guaranteed to always be equal. That is, suppose we were to modify the box referred to by b1:

set-box-now(b1, 8)

Sure enough, the values in the boxes are not the same, but because b1 and b3 are aliases, their values change in lock-step (more accurately, there is only one value—the box at 1001):

check:
unbox-now(b1) is-not unbox-now(b2)
unbox-now(b1) is unbox-now(b3)
end

#### 18.4Another Equality Predicate

Suppose we return to the state where we have defined the three boxes [<three-boxes>] but not mutated b1. That is, when printed, all three boxes have the same value, box(7). We have seen that b1 and b3 are both equal-always and identical to each other. However, we have also seen that b1 and b2 are neither of those. This is somewhat frustrating, because there is clearly some sense in which they are “equal”: at the moment, they contain the same value, even if later on one of them might not.

Therefore, Pyret offers a third equality predicate that is designed for just these situations: it is (as you might guess) called equal-now:

check:
b1 is%(equal-now) b2
b2 is%(equal-now) b3
end

The -now in the name reminds us that these values are equal at the moment, but may not be equal later. Sure enough, if we add

set-box-now(b1, 8)

back into the program, the above equal-now tests fail: now, they are no longer equal!

Recall that the other two equality predicates have an binary operator notation: == for equal-always and <=> for identical. Similarly, equal-now has the binary operator =~. You should view that as = with hand-waving ~: it’s equal for now, but don’t expect it to remain so. That is, we can rewrite the above tests as:

check:
equal-now(b1, b2) is true
(b2 =~ b3) is true
end

Whether they pass, of course, depends on the state of the program: whether b1, b2, or b3 has had its content modified.

#### 18.5A Hierarchy of Equality

As you might guess, the equality operators have a hierarchy of implication. That is, if one operator is true of two expressions, the other necessarily is, but not vice versa.

Do Now!

Can you work out this hierarchy of implication?

Observe that if two expressions are identical, then they are aliases, i.e., they are referring to one and the same value. Therefore, the values produced by those expressions must be equal-always. If they are always equal, then clearly at any given moment, they must also be equal-now.

Even if two expressions are not identical, they may be equal-always. This would never be true of mutable data (because there is the possibility of a future mutation), but it can be true of immutable data that have the same structure and contents. In that case, if they are always equal, then again they must be equal-now.

However, the converses are not true.

If two data are equal-now, they may not be equal-always: if they are mutable, a future mutation may change the equality, as we have seen above. Similarly, two data may be equal-always but not be identical, because they reside at different heap addresses and are therefore truly different data.

In most languages, it is common to have two equality operators, corresponding to identical (known as reference equality) and equal-now (known as structural equality). Pyret is rare in having a third operator, equal-always. For most programs, this is in fact the most useful equality operator: it is not overly bothered with details of aliasing, which can be difficult to predict; at the same time it makes decisions that stand the test of time, thereby forming a useful basis for various optimizations (which may not even be conscious of their temporal assumptions). This is why is in testing uses equal-always by default, and forces users to explicitly pick a different primitive if they want it.

#### 18.6Space and Time Complexity

identical always takes constant time. Indeed, some programs use identical precisely because they want constant-time equality, carefully structuring their program so that values that should be considered equal are aliases to the same value. Of course, maintaining this programming discipline is tricky.

equal-always and equal-now both must traverse at least the immutable part of data. Therefore, they take time proportional to the smaller datum (because if the two data are of different size, they must not be equal anyway, so there is no need to visit the extra data). The difference is that equal-always reduces to identical at references, thereby performing less computation than equal-now would.

#### 18.7What it Means to be Identical

Return for a moment to the state where we have just defined the three boxes [<three-boxes>]. We could have written the following:

hold-b1-value = unbox-now(b1)
set-box-now(b1, hold-b1-value + 1)

Now, we can compare the contents of the various boxes:

b1-id-b2 = unbox-now(b1) == unbox-now(b2)
b1-id-b3 = unbox-now(b1) == unbox-now(b3)

And at the end of performing comparisons, we can restore them:

set-box-now(b1, hold-b1-value)

Observe that b1-id-b2 would be false but b1-id-b3 would be true. And notice that this would always be true when the two expressions are identical, but not otherwise.

Thus, at the end there has been no change, but by making the change we can check which values are and aren’t aliases of others. In other words, thisrepresents the essence of identical.

In practice, identical does not behave this way: it would be too disruptive. It is also not the most efficient implementation possible, when Pyret can simply check the memory addresses being the same. Nevertheless, it does demonstrate the basic idea behind identical: two values are identical precisely when, when you make changes to one, you see the changes manifest on the “other” (i.e., there is really only one value, but with potentially multiple names for it).

#### 18.8Comparing Functions

We haven’t actually provided the full truth about equality because we haven’t discussed functions. Defining equality for functions—especially extensional equality, namely whether two functions have the same graph, i.e., for each input produce the same output—is complicated (a euphemism for impossible) due to the Halting Problem.

Because of this, most languages have tended to use approximations for function equality, most commonly reference equality. This is, however, a very weak approximation: even if the exact same function text in the same environment is allocated as two different closures, these would not be reference-equal. At least when this is done as part of the definition of identical, it makes sense; if other operators do this, however, they are actively lying, which is something the equality operators do not usually do.

There is one other approach we can take: simply disallow function comparison. This is what Pyret does: all three equality operators above will result in an error if you try to compare two functions. (You can compare against just one function, however, and you will get the answer false.) This ensures that the language’s comparison operators are never trusted falsely.

Pyret did have the choice of allowing reference equality for functions inside identical and erroring only in the other two cases. Had it done so, however, it would have violated the chain of implication above [A Hierarchy of Equality]. The present design is arguably more elegant. Programmers who do want to use reference equality on functions can simply embed the functions inside a mutable structure like boxes.

There is one problem with erroring when comparing two functions: a completely generic procedure that compares two arbitrary values may error if both of the values given are functions. Because this can cause unpredictable program failure, Pyret offers a three-valued version of each of the above three operators (identical3, equal-always3 and equal-now3), all of which return EqualityResult values that correspond to truth, falsity, and ignorance (returned in the case when both arguments are functions). Programmers can use this in place of the Boolean-valued comparison operators if they are uncertain about the types of the parameters.