11.2 Understanding Equality

11.2.1 Equality of Data

Now that we have the ability to mutate data, it’s worth asking what it means for two pieces of data to be equal. We’ll motivate this through a concrete example. Following the naming convention of Data Mutation and the Directory, we will write every name only once, using the upper-case name from Python, but everything we write will equally be true for Pyret.

First, consider these three statements:

a1 = Account(8603, 500)
a2 = Account(8603, 500)
a3 = Account(8603, 250)

Do Now!
Which of the above Accounts do you consider “equal”?

The third Account has a different balance than the first two, so it can’t be considered equal to either of the first two. The first two have the same contents, so arguably they can be considered equal.

Now, let’s consider the directory and heap that would result from running these three statements:
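One way to inspect this situation from within Python is to ask each value for its identity. The sketch below is only illustrative: it assumes Account is a dataclass with fields named id and balance (the actual definition lives in Example: Bank Accounts), and it uses Python's built-in id function as a stand-in for a heap address.

```python
from dataclasses import dataclass

# Assumed definition, for illustration only; the chapter's own Account
# is the one from "Example: Bank Accounts".
@dataclass
class Account:
    id: int
    balance: int

a1 = Account(8603, 500)
a2 = Account(8603, 500)
a3 = Account(8603, 250)

# id() returns an integer that is unique to each live object, so it can
# play the role of a heap address in this sketch.
addresses = {name: id(value) for name, value in [("a1", a1), ("a2", a2), ("a3", a3)]}

# Each constructor call created a separate object, so the three names
# in the directory refer to three distinct heap locations.
distinct_count = len(set(addresses.values()))
print(distinct_count)  # 3
```

Even though a1 and a2 have the same contents, they occupy different locations, which is what makes the question of equality interesting.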

11.2.2 Different Equality Operations

This sequence of examples points out that we seem to be raising two possible notions of equality:

Whether two values have the same contents. This is formally called structural equality; you can think of it as a “print equality”: when displayed, do the two values look the same?
Whether two values live at the same address, i.e., there is actually only one value in memory. This is formally called reference equality. Usually, we would refer to the two values by different names (so there is the possibility that they are different), and reference equality checks whether the names are aliases. Observe that a given value always prints the same way, so any two names that have reference equality also have structural equality, but not vice versa.
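The relationship between the two notions can be sketched with ordinary Python lists. This is a small illustration rather than part of the bank-account example; the names xs, ys, and zs are made up for it, and it uses Python's == and is operators, which are discussed in the Python subsection that follows.

```python
xs = [1, 2, 3]
ys = [1, 2, 3]  # same contents as xs, but a separate list in memory
zs = xs         # a second name for the very same list (an alias)

# Structural equality holds for both pairs: they print the same.
same_contents_xs_ys = (xs == ys)
same_contents_xs_zs = (xs == zs)

# Reference equality holds only for the aliased pair.
same_address_xs_ys = (xs is ys)
same_address_xs_zs = (xs is zs)

print(same_contents_xs_ys, same_address_xs_ys)  # True False
print(same_contents_xs_zs, same_address_xs_zs)  # True True
```

The pair xs and ys shows why the implication runs only one way: the two lists are structurally equal without being the same value in memory.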

Which notion of equality is “correct”? It turns out that they are valuable in different contexts. For this reason, programming languages generally provide multiple equality operations, letting the programmer indicate which kind of equality they mean in their context.

Unfortunately, the names of equality operations, and their exact meaning, vary across languages. Therefore, we will examine each of Pyret and Python separately.

11.2.2.1 Equality in Python

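The == operator that you learned in Pyret and that we carried into Python checks for structural equality, independent of addresses. At checkpoint 1, where a4 has been defined as an alias of a2 and nothing has been mutated yet: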

a1 == a2

True

a2 == a4

True

However, the first of these no longer holds at checkpoint 2, after the balance of a2 has been mutated:

a1 == a2

False

a2 == a4

True

If we instead want to check for aliasing, we use an operation called is (not to be confused with Pyret’s is, which is used for writing tests):

a1 is a2

False

a2 is a4

True

This explains why a2 == a4 was true both before and after the mutation, but a1 == a2 was no longer true after it. The latter seems to violate a very basic meaning of “equality”; the problem here is caused by the introduction of mutation.

As we go forward, you’ll get more practice with when to use each kind of equality. The == operator is more accepting, so it is usually the right default. If you actually need to know whether two expressions evaluate to the same address, you should instead use is.
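The whole Python story can be replayed in one self-contained sketch. It makes two assumptions: that Account is a dataclass with fields id and balance (dataclasses get a field-by-field == by default), and that the mutation between the checkpoints sets the balance of a2 to 800, mirroring the Pyret version of this example.

```python
from dataclasses import dataclass

# Assumed definition, for illustration only.
@dataclass
class Account:
    id: int
    balance: int

a1 = Account(8603, 500)
a2 = Account(8603, 500)
a3 = Account(8603, 250)
a4 = a2  # a4 is a second name for the same Account as a2

# checkpoint 1: record both kinds of equality before any mutation
before = (a1 == a2, a2 == a4, a1 is a2, a2 is a4)

a2.balance = 800  # the mutation; it is visible through a4 as well

# checkpoint 2: ask the same four questions again
after = (a1 == a2, a2 == a4, a1 is a2, a2 is a4)

print(before)  # (True, True, False, True)
print(after)   # (False, True, False, True)
```

Only the first component changes: a1 == a2 depends on the current contents, whereas both answers from is, and the == answer for the aliased pair, are unaffected by the mutation.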

11.2.2.2 Equality in Pyret

Equality in Pyret is somewhat more detailed, because the language wants you to think harder about what is happening in your programs.

Recall that we are using the datatype in Example: Bank Accounts and have written the following definitions:

a1 = account(8603, 500)
a2 = account(8603, 500)
a3 = account(8603, 250)
a4 = a2
# checkpoint 1
a2!{balance: 800}
# checkpoint 2

In Python, we saw that a1 == a2 before the mutation. However, in Pyret, this produces false! Why?

The reason is that structural equality is actually complicated; there are two different questions we could be asking:

Are these two values structurally equal right now?
Will these two values be structurally equal always?

Pyret makes a distinction between these two.

By default, Pyret tends towards safer programming practices. Therefore, the standard (structural) equality predicate, ==, will only return true if the two values will always be equal. Thus:

a2 == a4

true

Because the two values are actually aliases, no matter how one changes, the “other” will always change in the same way. Therefore, they will always “print the same”. We can confirm that they are aliases by using Pyret’s reference equality operator, <=>:

a1 <=> a2

false

a2 <=> a4

true

In contrast, that guarantee does not apply to a1 and a2; indeed, at checkpoint 2, we see that they are no longer equal. Hence:

a1 == a2

false

However, there is a time when a1 and a2 do print the same, namely before checkpoint 1. Therefore, Pyret provides another equality operator that checks whether values are equal at the moment, =~. If we ask this before checkpoint 1, we get:

a1 =~ a2

true

But if we ask the same question at checkpoint 2, we get:

a1 =~ a2

false

These operators and their funny symbols may be hard to remember, but Pyret also gives them useful (if longer) names, and they can be used as ordinary functions:

Symbol	Function	Type	Meaning
`==`	`equal-always`	Structural	If it returns `true`, they will always be equal, irrespective of any future mutations.
`=~`	`equal-now`	Structural	If it returns `true`, they are currently equal, but that may change after future mutations.
`<=>`	`identical`	Reference	Returns `true` if the two arguments are aliases, `false` otherwise.

Thus, before checkpoint 1:

equal-now(a1, a2)

true

equal-now(a2, a4)

true

equal-always(a1, a2)

false

equal-always(a2, a4)

true

identical(a1, a2)

false

identical(a2, a4)

true

After checkpoint 2, we no longer need to check any of the equal-always or identical relationships again, because by definition they cannot change. But we should check equal-now again. Sure enough:

equal-now(a1, a2)

false

equal-now(a2, a4)

true

Therefore, in Pyret, the == operator is the same as equal-always. When two distinct values contain mutable fields, this will always produce false (only aliases, such as a2 and a4, produce true), because even if the values are structurally equal now, it’s possible that a future mutation will change that. This is to remind you to be careful in the presence of mutation. In situations where we really care only about equality at that instant, we can use =~, i.e., equal-now.

The examples above might suggest that only aliased values are equal-always. This is not true! If our data are immutable (which is the default in the language), then if two values are structurally equal now, they must remain structurally equal forever. For such data, equal-always will return true even when they are not aliases. This is a reminder that we get stronger guarantees about immutable data.
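The same intuition can be sketched in Python, with the caveat that this is an analogy and not how Pyret implements equal-always. The class FrozenAccount below is hypothetical, introduced only for this illustration; it uses a frozen dataclass, which rejects ordinary field assignment.

```python
from dataclasses import dataclass, FrozenInstanceError

# Hypothetical immutable variant of the account type, for illustration only.
@dataclass(frozen=True)
class FrozenAccount:
    id: int
    balance: int

f1 = FrozenAccount(8603, 500)
f2 = FrozenAccount(8603, 500)

equal_contents = (f1 == f2)  # structurally equal
aliases = (f1 is f2)         # but two separate values in memory

# Ordinary assignment to a field of a frozen dataclass raises an error,
# so a later statement cannot make f1 and f2 drift apart this way.
try:
    f1.balance = 800
    mutation_blocked = False
except FrozenInstanceError:
    mutation_blocked = True

still_equal = (f1 == f2)
print(equal_contents, aliases, mutation_blocked, still_equal)  # True False True True
```

Because the contents of f1 and f2 cannot be changed through assignment, "equal now" and "equal always" coincide for them even though they are not aliases.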

It is worth noting that up to this point we have used equal-always, in the form of both == and Pyret’s is in testing, without really bothering to understand very much about how it works, and yet have always gotten predictable answers. This suggests that there is something natural about working with immutable data. In contrast, with mutable data, something has to give. Pyret made a conscious design choice to reflect this in the distinction between equal-always and equal-now. Python made a different choice, which results in “equality” having a perhaps surprising meaning. (Python has no notion of equal-always. It has only equal-now, Pyret’s =~, which Python writes as ==, and identical, Pyret’s <=>, which Python writes as is.)
