11.2 Understanding Equality
11.2.1 Equality of Data
Now that we have the ability to mutate data, it’s worth asking what it means for two pieces of data to be equal. We’ll motivate this through a concrete example. Following the naming convention of Data Mutation and the Directory, we will write every name only once, using the upper-case name from Python, but everything we write will equally be true for Pyret.
First, consider these three statements:
a1 = Account(8603, 500)
a2 = Account(8603, 500)
a3 = Account(8603, 250)
Do Now!
Which of the above Accounts do you consider “equal”?
The third Account has a different balance than the first two, so it can’t be considered equal to either of them. The first two have the same contents, so arguably they can be considered equal.
Now, let’s consider the directory and heap that would result from running these three statements:
Directory
- a1 → 1120
- a2 → 1121
- a3 → 1122
Heap
- 1120: Account(8603, 500)
- 1121: Account(8603, 500)
- 1122: Account(8603, 250)
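As a rough way to observe this picture in Python, the sketch below prints an identity for each account using the built-in id(). It assumes Account is a dataclass with two fields; the field name acct_id is hypothetical, since the definition is not shown in this section. The numbers id() reports are only stand-ins for the illustrative addresses 1120 to 1122.

```python
from dataclasses import dataclass


@dataclass
class Account:
    acct_id: int  # hypothetical field name for the first argument
    balance: int


a1 = Account(8603, 500)
a2 = Account(8603, 500)
a3 = Account(8603, 250)

# Each constructor call creates a separate object, so the identities differ
# even where the contents match.
addresses = {name: id(acct) for name, acct in [("a1", a1), ("a2", a2), ("a3", a3)]}
for name, address in addresses.items():
    print(name, "->", address)

all_distinct = len(set(addresses.values())) == 3
print(all_distinct)  # True
```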
From the perspective of the heap, each account ends up at its own address. Those different addresses are a way in which the first two values are not the same: they have the same contents, but not the same address. Is that relevant? To explore this, let’s associate another name (a4) with the same address as a2, then change the balance in a2.
For now we will show just the Python version:
a1 = Account(8603, 500)
a2 = Account(8603, 500)
a3 = Account(8603, 250)
a4 = a2
# checkpoint 1
a2.balance = 800
# checkpoint 2
Directory
- a1 → 1130
- a2 → 1131
- a3 → 1132
- a4 → 1131
Heap
- 1130: Account(8603, 500)
- 1131: Account(8603, 500)
- 1132: Account(8603, 250)
At checkpoint 1, a1 and a2 refer to two different Accounts with the same contents. After the mutation (at checkpoint 2), those contents are different because we modified the balance field in a2:
Directory
- a1 → 1130
- a2 → 1131
- a3 → 1132
- a4 → 1131
Heap
- 1130: Account(8603, 500)
- 1131: Account(8603, 800)
- 1132: Account(8603, 250)
a2 and a4 are aliases for the same Account. Therefore, their values change in lockstep: asking to display the value of either one would now show an account with a balance of 800.
Do Now!
What do you think now? Are the first two accounts equal?
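The lockstep behavior of a2 and a4 described above can be seen in a short Python sketch. It assumes Account is a dataclass; the field name acct_id is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Account:
    acct_id: int  # hypothetical field name
    balance: int


a2 = Account(8603, 500)
a4 = a2  # a4 is a second name for the same object, not a copy

a2.balance = 800   # mutate through one name...
print(a4.balance)  # ...and the change is visible through the other: 800
print(a4)          # Account(acct_id=8603, balance=800)
```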
11.2.2 Different Equality Operations
This sequence of examples points out that we seem to be raising two possible notions of equality:
- Whether two values have the same contents. This is formally called structural equality; you can think of it as a “print equality”, namely, when displayed, do the two values look the same.
- Whether two values live at the same address, i.e., there is actually only one value in memory. This is formally called reference equality. Usually, we would refer to the two values by different names (so there is the possibility that they are different), and reference equality checks whether the names are aliases. Observe that a given value always prints the same way, so any two names that have reference equality also have structural equality, but not vice versa.
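To make the two notions concrete, here is a minimal Python sketch that writes each one as a helper function. Both helpers (prints_the_same and same_address) are hypothetical names introduced only for illustration, and Account is assumed to be a dataclass with a hypothetical acct_id field.

```python
from dataclasses import dataclass


@dataclass
class Account:
    acct_id: int  # hypothetical field name
    balance: int


def prints_the_same(x, y):
    """Hypothetical 'print equality': do the two values display identically?"""
    return repr(x) == repr(y)


def same_address(x, y):
    """Hypothetical reference equality: are x and y one value in memory?"""
    return id(x) == id(y)


a1 = Account(8603, 500)
a2 = Account(8603, 500)
a4 = a2

print(prints_the_same(a1, a2), same_address(a1, a2))  # True False
print(prints_the_same(a2, a4), same_address(a2, a4))  # True True
```

The second line of output illustrates the observation above: names that share an address necessarily print the same, while the first line shows that the converse need not hold.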
Which notion of equality is “correct”? It turns out that they are valuable in different contexts. For this reason, programming languages generally provide multiple equality operations, letting the programmer indicate which kind of equality they mean in their context.
Unfortunately, the names of equality operations, and their exact meaning, vary across languages. Therefore, we will examine each of Pyret and Python separately.
11.2.2.1 Equality in Python
The == operator that you learned in Pyret and we carried into Python checks for structural equality, independent of addresses. At checkpoint 1:
a1 == a2  # True
a2 == a4  # True
However, note that this is no longer true of a1 and a2 at checkpoint 2:
a1 == a2  # False
a2 == a4  # True
To check for aliasing instead, we use an operation called is (not to be confused with Pyret’s is, which is used for writing tests):
a1 is a2  # False
a2 is a4  # True
This explains why a2 == a4 was true both before and after the mutation, but a1 == a2 was no longer true after it. The latter seems to violate a very basic meaning of “equality”; the problem here is caused by the introduction of mutation.
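The behavior described above can be reproduced with a runnable sketch. It assumes Account is declared with @dataclass, which generates a field-by-field comparison for ==; the field name acct_id is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Account:
    acct_id: int  # hypothetical field name
    balance: int


a1 = Account(8603, 500)
a2 = Account(8603, 500)
a3 = Account(8603, 250)
a4 = a2

# checkpoint 1
before = (a1 == a2, a2 == a4, a1 is a2, a2 is a4)
print(before)  # (True, True, False, True)

a2.balance = 800

# checkpoint 2
after = (a1 == a2, a2 == a4, a1 is a2, a2 is a4)
print(after)  # (False, True, False, True)
```

Only the first entry changes across the mutation: the result of is depends on addresses, which the assignment to a2.balance does not touch.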
As we go forward, you’ll get more practice with when to use each kind of equality. The == operator is more accepting, so it is usually the right default. If you actually need to know whether two expressions evaluate to the same address, you should instead use is.
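One caveat when trying this yourself: in Python, == compares contents only when the class says how to do so. The sketch below uses a hypothetical PlainAccount class, written without @dataclass and without an __eq__ method, to show that == then falls back to comparing identities.

```python
class PlainAccount:
    """A hypothetical class with no __eq__ method and no @dataclass decorator."""

    def __init__(self, acct_id, balance):
        self.acct_id = acct_id
        self.balance = balance


p1 = PlainAccount(8603, 500)
p2 = PlainAccount(8603, 500)

same_contents_equal = (p1 == p2)  # no __eq__ defined, so identity is compared
same_object_equal = (p1 == p1)

print(same_contents_equal)  # False
print(same_object_equal)    # True
```

So the structural behavior of == shown in this section relies on Account being defined in a way that supplies a structural comparison, as a dataclass does.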
11.2.2.2 Equality in Pyret
Equality in Pyret is somewhat more detailed, because the language wants you to think harder about what is happening in your programs.
a1 = account(8603, 500)
a2 = account(8603, 500)
a3 = account(8603, 250)
a4 = a2
# checkpoint 1
a2!{balance: 800}
# checkpoint 2
In Python, we saw that a1 == a2 was true before the mutation. However, in Pyret, this produces false! Why? There are two different questions we could be asking:
- Are these two values structurally equal right now?
- Will these two values be structurally equal always?
By default, Pyret tends towards safer programming practices. Therefore, the standard (structural) equality predicate, ==, will only return true if the two values will always be equal. Thus:
a2 == a4  # true
Because the two values are actually aliases, no matter how one changes, the “other” will always change in the same way. Therefore, they will always “print the same”. We can confirm that they are aliases by using Pyret’s reference equality operator, <=>:
a2 <=> a4  # true
a1 <=> a2  # false
In contrast, that guarantee does not apply to a1 and a2; and indeed, at checkpoint 2, we see that they are no longer equal. Hence:
a1 == a2  # false
However, there is a time when a1 and a2 do print the same, namely before checkpoint 1. Therefore, Pyret provides another equality operator, =~, that checks whether values are equal at the moment. If we ask this before checkpoint 1, we get:
a1 =~ a2  # true
But if we ask the same question at checkpoint 2, we get:
a1 =~ a2  # false
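| Symbol | Function | Type | Meaning |
|---|---|---|---|
| == | equal-always | Structural | If it returns true, the two values will always be structurally equal |
| =~ | equal-now | Structural | If it returns true, the two values are structurally equal at this moment |
| <=> | identical | Reference | Returns true if the two values are aliases |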
After checkpoint 2, we no longer need to check any of the equal-always or identical relationships again, because by definition they cannot change. But we should check equal-now again. Sure enough:
equal-now(a1, a2)  # false
equal-now(a2, a4)  # true
Therefore, in Pyret, the == operator is the same as equal-always. When data contain mutable fields, this will produce false for any two values that are not aliases, because even if the values are structurally equal now, it’s possible that a future mutation will change that. This is to remind you to be careful in the presence of mutation. In situations where we really care only about equality at that instant, we can use =~, i.e., equal-now.
The examples above might suggest that only aliased values are equal-always. This is not true! If our data are immutable (which is the default in the language), then if two values are structurally equal now, they must remain structurally equal forever. For such data, equal-always will return true even when they are not aliases. This is a reminder that we get stronger guarantees about immutable data.
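The relationship between the three operations can be approximated in Python. The sketch below is a coarse model, not Pyret’s actual implementation: it treats a whole value as either mutable or immutable, and every name in it (FrozenAccount, IMMUTABLE_TYPES, identical, equal_now, equal_always, and the field acct_id) is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Account:  # mutable: balance can be reassigned
    acct_id: int
    balance: int


@dataclass(frozen=True)
class FrozenAccount:  # immutable: assigning to a field raises an error
    acct_id: int
    balance: int


# Types this model treats as immutable (an assumption of the sketch).
IMMUTABLE_TYPES = (FrozenAccount,)


def identical(x, y):
    """Model of <=>: true only for aliases."""
    return x is y


def equal_now(x, y):
    """Model of =~: same contents at this moment."""
    return x == y


def equal_always(x, y):
    """Model of ==: true only if the values can never come to differ."""
    if x is y:
        return True
    # Distinct values stay equal forever only if neither can be mutated.
    both_immutable = isinstance(x, IMMUTABLE_TYPES) and isinstance(y, IMMUTABLE_TYPES)
    return both_immutable and x == y


a1 = Account(8603, 500)
a2 = Account(8603, 500)
a4 = a2
f1 = FrozenAccount(8603, 500)
f2 = FrozenAccount(8603, 500)

print(equal_always(a1, a2), equal_now(a1, a2), identical(a1, a2))  # False True False
print(equal_always(a2, a4), equal_now(a2, a4), identical(a2, a4))  # True True True
print(equal_always(f1, f2), equal_now(f1, f2), identical(f1, f2))  # True True False
```

The last line is the immutable case from the paragraph above: f1 and f2 are not aliases, yet the model reports them as always equal because neither can change.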
It is worth noting that, up to this point, the equality we have used in Pyret (== in programs, and is in testing) has been equal-always rather than equal-now. Python made a different choice, which results in “equality” having a perhaps surprising meaning. (Python has no notion of equal-always, only equal-now or =~, which is written as ==, and identical or <=>, which is written as is.)