18 State and Equality
In State, Change, and Testing, we introduced the notion of mutable
data. We also saw the impact it has on testing. Underlying testing is
some notion of equality: when we write a test in Pyret using
is
, we are implicitly making a statement about equality between
the two sides. Here we will examine equality in the presence of state
in more detail.
18.1 Boxes: A Canonical Mutable Structure
box consumes a value and creates a mutable box containing that value.
unbox-now consumes a box and returns the value contained in the box.
set-box-now consumes a box, a new value, and changes the box to contain the value. All subsequent unbox-nows of that box will now return the new value—
unless it is mutated again.
data Box<T>:
| box(ref v :: T)
end
fun unbox-now<T>(b :: Box<T>) -> T:
b!v
end
fun set-box-now<T>(b :: Box<T>, new-v :: T) -> Box<T>:
b!{v: new-v}
end
b!v
to extract the current value, and use
the naming convention of -now
to make clear these are stateful
operations, so the value now may not be the same as the value later.18.2 Mutation and Types
In terms of types, whenever we replace the value in a box, we want it to be type-consistent with what was previously there. Otherwise it would be very difficult to program against a box, because the type of its content would keep changing.
check:
n1 = box(1)
n2 = box(2)
set-box-now(n1, 3)
set-box-now(n2, 4)
unbox-now(n1) is 3
unbox-now(n2) is 4
end
set-box-now(n1, "hi")
, because that
would violate the type of n1
, which is Box<Number>
. We
could make this explicit by writing
n1 :: Box<Number> = box(1)
n1
being a box
of numbers does not preclude us from having a box of strings:
n3 :: Box<String> = box("hello")
This is a general rule we want to follow with mutable data: the new
value must be the same type as the old value. This gives programs a
consistent interface to program against. For instance, above, we know
that we can always perform numeric operations against the value
extracted from n1
—
18.3 Mutation and Equality
We’ve already seen [Re-Examining Equality] that equality is subtle. It’s about to become much subtler with the introduction of mutation!
b1 = box(7) |
b2 = box(7) |
b3 = b1 |
b1
and b3
are referring to the same
box, while b2
is referring to a different one. We can see this
from a memory diagram:Directory
b1
→ 1001b2
→ 1002b3
→ 1001
Heap
1001:
box(7)
1002:
box(7)
check:
b1 is-not%(identical) b2
b1 is%(identical) b3
b2 is-not%(identical) b3
end
b1
and b3
are aliases for the same box,
but neither is an alias to the box referred to by b2
. Since
identical
is transitive, it follows from the first two
tests that the third test must also pass (and thankfully, Pyret
confirms this for us!).Now, you might wonder why we have used identical
and not
equal-always
[Notations for Equality], i.e., plain old
is
.
Do Now!
Let’s try that:check: b1 is b3 b1 is b2 end
What do you see?
b1 is b3
,
passes. However, the second, b1 is b2
, fails! And the name
suggests why: the two are not guaranteed to always be
equal. That is, suppose we were to modify the box referred to by
b1
:
set-box-now(b1, 8)
b1
and b3
are aliases, their values change in lock-step
(more accurately, there is only one value—check:
unbox-now(b1) is-not unbox-now(b2)
unbox-now(b1) is unbox-now(b3)
end
18.4 Another Equality Predicate
Suppose we return to the state where we have defined the three boxes
[<three-boxes>] but not mutated b1
. That is, when
printed, all three boxes have the same value, box(7)
. We
have seen that b1
and b3
are both
equal-always
and identical
to each other. However, we
have also seen that b1
and b2
are neither of
those. This is somewhat frustrating, because there is clearly some
sense in which they are “equal”: at the moment, they contain the
same value, even if later on one of them might not.
equal-now
:
check:
b1 is%(equal-now) b2
b2 is%(equal-now) b3
end
-now
in the name reminds us that these values are equal at
the moment, but may not be equal later. Sure enough, if we add
set-box-now(b1, 8)
equal-now
tests fail: now,
they are no longer equal!==
for equal-always
and <=>
for
identical
. Similarly, equal-now
has the binary operator
=~
. You should view that as =
with hand-waving ~
:
it’s equal for now, but don’t expect it to remain so. That is, we can
rewrite the above tests as:
check:
equal-now(b1, b2) is true
(b2 =~ b3) is true
end
b1
, b2
, or b3
has had its content modified.18.5 A Hierarchy of Equality
As you might guess, the equality operators have a hierarchy of implication. That is, if one operator is true of two expressions, the other necessarily is, but not vice versa.
Do Now!
Can you work out this hierarchy of implication?
Observe that if two expressions are identical
, then they are
aliases, i.e., they are referring to one and the same
value. Therefore, the values produced by those expressions must be
equal-always
. If they are always equal, then clearly at any
given moment, they must also be equal-now
.
Even if two expressions are not identical
, they may be
equal-always
. This would never be true of mutable data (because
there is the possibility of a future mutation), but it can be true of
immutable data that have the same structure and contents. In that
case, if they are always equal, then again they must be
equal-now
.
However, the converses are not true.
If two data are equal-now
, they may not be
equal-always
: if they are mutable, a future mutation may change
the equality, as we have seen above. Similarly, two data may be
equal-always
but not be identical
, because they reside
at different heap addresses and are therefore truly different data.
In most languages, it is common to have two equality operators,
corresponding to identical
(known as reference equality)
and equal-now
(known as structural equality). Pyret is
rare in having a third operator, equal-always
. For
most programs, this is in fact the most useful equality operator: it
is not overly bothered with details of aliasing, which can be
difficult to predict; at the same time it makes decisions that stand
the test of time, thereby forming a useful basis for various
optimizations (which may not even be conscious of their temporal
assumptions). This is why is
in testing uses
equal-always
by default, and forces users to explicitly pick a
different primitive if they want it.
18.6 Space and Time Complexity
identical
always takes constant time. Indeed, some programs use
identical
precisely because they want constant-time
equality, carefully structuring their program so that values that
should be considered equal are aliases to the same value. Of course,
maintaining this programming discipline is tricky.
equal-always
and equal-now
both must traverse at least
the immutable part of data. Therefore, they take time proportional to
the smaller datum (because if the two data are of different size, they
must not be equal anyway, so there is no need to visit the extra
data). The difference is that equal-always
reduces to
identical
at references, thereby performing less computation
than equal-now
would.
18.7 What it Means to be Identical
hold-b1-value = unbox-now(b1)
set-box-now(b1, hold-b1-value + 1)
b1-id-b2 = unbox-now(b1) == unbox-now(b2)
b1-id-b3 = unbox-now(b1) == unbox-now(b3)
set-box-now(b1, hold-b1-value)
b1-id-b2
would be false
but b1-id-b3
would be true
. And notice that this would always be true when
the two expressions are identical, but not otherwise.Thus, at the end there has been no change, but by making the change we
can check which values are and aren’t aliases of others. In other
words, thisrepresents the essence of identical
.
In practice, identical
does not behave this way: it would be
too disruptive. It is also not the most efficient implementation
possible, when Pyret can simply check the memory addresses being the
same. Nevertheless, it does demonstrate the basic idea behind
identical
: two values are identical
precisely when, when
you make changes to one, you see the changes manifest on the “other”
(i.e., there is really only one value, but with potentially multiple
names for it).
18.8 Comparing Functions
We haven’t actually provided the full truth about equality because we
haven’t discussed functions. Defining equality for functions—
Because of this, most languages have tended to use approximations for
function equality, most commonly reference equality. This is, however,
a very weak approximation: even if the exact same function text in the
same environment is allocated as two different closures, these would
not be reference-equal. At least when this is done as part of the
definition of identical
, it makes sense; if other operators do
this, however, they are actively lying, which is something the
equality operators do not usually do.
There is one other approach we can take: simply disallow function
comparison. This is what Pyret does: all three equality operators
above will result in an error if you try to compare two
functions. (You can compare against just one function, however, and
you will get the answer false
.) This ensures that the
language’s comparison operators are never trusted falsely.
Pyret did have the choice of allowing reference equality for
functions inside identical
and erroring only in the other two
cases. Had it done so, however, it would have violated the chain of
implication above [A Hierarchy of Equality]. The present design
is arguably more elegant. Programmers who do want to use reference
equality on functions can simply embed the functions inside a mutable
structure like boxes.
There is one problem with erroring when comparing two functions: a
completely generic procedure that compares two arbitrary values may
error if both of the values given are functions. Because this can
cause unpredictable program failure, Pyret offers a three-valued
version of each of the above three operators (identical3
,
equal-always3
and equal-now3
), all of which return
EqualityResult
values that correspond to truth, falsity, and
ignorance (returned in the case when both arguments are
functions). Programmers can use this in place of the Boolean-valued
comparison operators if they are uncertain about the types of the
parameters.