V Programming with State
24 Modifying Structured Data
Many programs work with data that get updated over time: as users of common software applications, we mark email messages as read, add items to shopping lists, add entries to our calendars, and update the due dates of to-do items as our schedules change. We haven’t yet written programs that update data, but you now have the foundational skills you need to learn how to do this.
It turns out that there are a couple of different kinds of updates that we can make to data. Consider our to-do list example: we might want to add items to the list (or better still, take them out!), but we also need to update details of existing items (such as their descriptions or due dates). We’ll learn to update details of existing items first.
As a reminder, here is the dataclass for ToDoItem
, along with
examples of the data.
@dataclass
class ToDoItem:
descr : str
due : date
tags : list
milk_item = ToDoItem("buy milk", date(2020, 7, 27), ["shopping", "home"])
grading_item = ToDoItem("grade hwk", date(2020, 7, 28), ["teaching"])
paper_item = ToDoItem("meet students", date(2020, 7, 26), ["research"])
todo_list = [grading_item, milk_item, paper_item]
24.1 Modifying Fields of Structured Data
In Python, we modify a field of a dataclass by using = to give a new value to the field within the value. For example:
milk_item.due = date(11, 15, 2020)
paper_item.descr = "meet Jack"
Once we do this, the original value is gone, replaced by the new value. We can see this by accessing the field (or by looking in the program-directory pane if your programming environment has that feature):
print(milk_item.due)
Do Now!
Check the value of
todo_list
. Doesmilk_item
have the original date or the new date?
milk_item
, which was in the
list, todo_list
shows the new date. This is a big deal!
Changes we make through one name in our directory can be visible
through another name. This happens when we have set up relationships
between data (such as putting a dataclass item in a list, while having
names for both the item and the list). This subtlety is why we
deferred data updates until now (and we’ll take this chapter and the
next to work through the subtleties and consequences).Let’s now write a function to modify the date in a ToDoItem
.
def modify_duedate(forItem : ToDoItem, new_date : date):
"""change due date on given item to the new date"""
forItem.due = new_date
What does this function return? Nothing, as it turns out. The purpose
of this function isn’t to produce an answer, but rather to simply
update a piece of data. You’ll note there is no return
statement here. If you run the function, Python just gives you the
prompt again, without showing a result.
How, then, do we test a procedure/function that doesn’t return
anything? Our whole practice of testing until now has been based on
calling functions and checking their expected answers. When a function
updates data, we need to check that the update occurred properly. In
other words, we check for the expected effect of calling the
function. Here, the expected effect is that the due
field of
the provided ToDoItem
will have the new date. We also expect
that the other fields (descr
and tags
) have not
changed. Here’s an example test function:
def test_modify_duedate():
# SETUP
item = ToDoItem("register", date(2020, 11, 9), ["school"])
# PERFORM MODIFICATIONS
modify_duedate(item, date(2020, 11, 15))
# CHECK EFFECTS
assert item.due == date(2020, 11, 15)
assert item.descr == "register"
assert item.tags == ["school"]
This test function is longer than ones we have written before. The strings within the function are serving as labels for the three steps we take when testing for updates: we set up or create our test data, we call the function to modify the data, then we check the expected effects from the modification. While these strings are not strictly necessary, we find the labels are a useful guide for both you and the person who will read your code.
24.2 Modifications to Shared Data
Imagine that you have a very important phone call coming up, so you made multiple to-do items to make sure you don’t lose track of it. By the time you made the third item, you were tired of typing, so you decided to reuse the definition of the second one, just with a new name:
call1 = ToDoItem("call Fred", date(2020, 11, 19), ["urgent"])
call2 = ToDoItem("call Fred", date(2020, 11, 19), ["urgent"])
call3 = call2
Do Now!
Draw the program directory for this code
Directory
call1
→ToDoItem("call Fred", date(2020, 11, 19), ["urgent"])
call2
→ToDoItem("call Fred", date(2020, 11, 19), ["urgent"])
call3
→ToDoItem("call Fred", date(2020, 11, 19), ["urgent"])
The directory suggests that all three calls are the same. We can confirm this by running the following commands:
| |
|
| |
|
Fred is unexpectedly out of the office, so now you need to call Tina instead. You decide to modify the call reminders to reflect this change.
Do Now!
Write a command to change the description within
call3
to"call Tina"
instead of"call Fred"
(we’ll get to the other two in a moment).
Using what we just learned, we want the following:
call3.descr = "call Tina"
Let’s check whether call3
shows the new value:
| |
|
This corrects call3
. Presumably, call1
and
call2
still indicate that we need to call
Fred. Let’s check.
| |
|
| |
|
Wait! call2
has already been changed. Furthermore, the check
we made earlier that call1
and call2
referred to the
same values now returns False
, even though we didn’t
explicitly change any fields of either of them!
| |
|
If we again ask whether call2
and call3
have the
same values (which they did before the update), we find that they
still do:
| |
|
Perhaps this check is not surprising. We did, after all, create
call3
based on call2
when we wrote call3 =
call2
. But we are still left with the unsettling observation that
running a line of code (the update call3.descr = "call Tina"
)
that had nothing to do with call1
or call2
changed
the ==
relationship between call1
and
call2
. These observations suggest that equality is more
subtle than it first appeared. Furthermore, our program directory
isn’t currently sophisticated enough to show us that the relationship
between call1
and call2
is different than that
between call2
and call3
. We’ll start with the
program directory, then come back to equality.
24.3 Understanding Memory
Every time you use a constructor to create data, your programming environment stores in the memory of your computer. Memory consists of a (large) number of slots. Your newly-created datum goes into one of these slots. Each slot is labeled with an address. Just as a street address refers to a specific building, a memory address refers to a specific slot where a datum is stored. Memory slots are physical entities, not conceptual ones. A computer with a 500GB hard drive has 500 billion slots in which it can store data. Not all of that memory is available to your programming environment: your web browser, applications, operating system, etc all get stored in the memory. Your programming environment does get a portion of memory to use for storing its data. That portion is called the heap.
When you write a statement like
call1 = ToDoItem("call Fred", date(2020, 11, 19), ["urgent"])
ToDoItem
into a
physical slot in the heap, then associates the address of that
slot with the variable name in the directory. The name in the
directory doesn’t map to the value itself, but rather to the
address that holds the value. The address bridges
between the physical storage location and the conceptual name you want
to associate with the new datum. In other words, our directory really
looks like:Directory
call
→ 1001
Heap
1001:
ToDoItem("call Fred" ...)
This revised directory-plus-heap information explains what we recently
observed about the relationships among call1
, call2
,
and call3
around the update to call3.descr
. Let’s
revisit the code by which we introduced these names into the
directory. First, we made new two ToDoItem
data values and
associated them with call1
and call2
:
call1 = ToDoItem("call Fred", date(2020, 11, 19), ["urgent"])
call2 = ToDoItem("call Fred", date(2020, 11, 19), ["urgent"])
After running these two statements, the directory and heap appear as follows:
Directory
call1
→ 1001call2
→ 1002
Heap
1001:
ToDoItem("call Fred" ...)
1002:
ToDoItem("call Fred" ...)
What happens now when we evaluate call3 = call2
? This
statement evaluates just as it did before: we create a directory entry
for call3
and associate it with the result of the right side
of the =
. The “result” of the computation here is the
address where the value is stored. The value of call2
is in
1002, so we associate that same address with call3
.
Directory
call1
→ 1001call2
→ 1002call3
→ 1002
Heap
1001:
ToDoItem("call Fred" ...)
1002:
ToDoItem("call Fred" ...)
With the heap articulated separately from the directory, we now see
the relationship between call3
and call2
: they
refer to the same address, which in turn means that they refer to the
same value. call1
, however, refers to a different address
(that just happens to contain a value with the same field contents).
24.4 Variables and Equality
Given the existence of both the directory and a heao now lets us
return to the question of what equality means. Considering our
development of call1
, call2
, and call3
, we
have seen there are (at least) two notions of equality:
Two expressions (which include names) evaluate to the same address in the heap
Two expressions evaluate to values with the same types and with the same field contents, but may be in different addresses in the heap
==
operator that you learned in Pyret and we carried into
Python checks the second condition. Two expressions can be ==
while referring to different addresses. If we want to know whether two
names refer to the same address, we instead use an operation called
is
. Let’s revisit our ToDoItem
creation one last time:call1 = ToDoItem("call Fred", date(2020, 11, 19), ["urgent"])
call2 = ToDoItem("call Fred", date(2020, 11, 19), ["urgent"])
call3 = call2
| |
|
| |
|
| |
|
| |
|
Both notions of equality are useful in practice, and indeed most languages will give you two
different equality operators: one for equality of address, and one for
equality of (component) values. In Python, we use is
for
address equality and ==
for equality of values. If you did
Re-Examining Equality, you saw this difference in Pyret as <=>
versus
==
. Pyret also offers a temporal based notion of equality. (If you did not do anything in Re-Examining Equality, you weren’t exposed
to these equality variations in Pyret.)
As we go forward, you’ll get more practice with when to use each kind
of equality. The ==
operator is more accepting, so it is
usually the right default. If you actually need to know whether two
expressions refer to the same address, you can switch to is
.
24.5 Basic Data in Memory
We have shown how to place structured data in our new directory/heap structure, but what about basic data? For example, what if our original program to set up the call items also included the following two lines:
x = 4
y = 5
The corresponding directory/heap contents would be as follows:
Directory
call1
→ 1001call2
→ 1002call3
→ 1002x
→4
y
→5
Heap
1001:
ToDoItem("call Fred" ...)
1002:
ToDoItem("call Tina" ...)
This shows that basic data live in the directory, not the heap. The whole point of structured data is that they have both their own identity and multiple components. The heap gives access to both concepts. Basic data can’t be broken down (by definition). As such, there is nothing lost by putting them only in the directory.
What about strings, however? We’ve referred to them as basic data until now, but doesn’t they have components, in terms of the sequence of characters that make up the string? Yes, that is technically accurate. However, we are treating strings as basic data because we aren’t using operations that modify that sequence of strings. This is a fine point, one that usually comes up in later CS courses. This book will leave strings in the directory, but if you are writing programs that modify the internal characters, put them in the heap instead.
25 Modifying Variables
We’ve now seen two different forms of updates in programs: updates to
fields of structured data in Modifying Structured Data, and
updates to the values associated with names when computing over lists
with for
loops in Traversing Lists. At a
quick glance, these two forms of update look similar:
call3.descr = "call Tina"
run_total = run_total + fst
=
operator and compute a new value on the right
side. The left sides, however, are subtly different: one is a field
within a value, while the other is a name in the directory. This
difference turns out to be significant, as we can see by working
through how these two forms work at the level of the directory and the
heap. We covered this for the call3.descr
update in the last
chapter. Now, let’s look at the run_total
update.25.1 Modifying Variables in Memory
Here again is our earlier code for summing the numbers in a list:
run_total = 0
for num in [5, 1, 7, 3]:
run_total = run_total + num
The first line makes an entry in the directory:
Directory
run_total
→0
The for
loop also sets up a directory entry, this time for
the variable num
used to refer to the list elements. Thus,
the directory appears as:
Directory
run_total
→0
num
→5
Inside the for
loop, we compute a new value for
run_total
. The use of =
tells Python to modify the
value of run_total
in the directory.
Directory
run_total
→5
num
→5
This process continues: Python advances num
to the next list
element
Directory
run_total
→5
num
→1
run_total
Directory
run_total
→6
num
→1
What is the takeaway from this? There are two:
The
=
construct associates a name with a value. If the name is not already in the directory, it gets added. If the name is already in the directory, the new value replaces the old value. The old value is no longer accessible.for
loops also introduce a name into the directory, specifically the one the programmer chose to refer to the individual list elements. Python modifies the value of that name as part of processing thefor
loop.
The ability to modify the values associated with names probably doesn’t seem like a big deal just yet, but upcoming programs will show where this ability gets interesting.
Exercise
Draw the sequence of directory contents for the following program:
score = 0 score = score + 4 score = 10
Exercise
Draw the sequence of directory contents for the following program:
count_long = 0 for word in ["here", "are", "some", "words"]: if len(word) > 4: count_long = count_long + 1
In Modifying Structured Data, we saw how a statement of the
form call3 = call2
resulted in a change to
call3.descr
also affecting the value of call2
. Does
this same effect occur if we update the value of a variable directly,
rather than a field? Consider the following example:
y = 5
x = y
Do Now!
What do the directory and heap look like after running this code?
Since x
and y
are assigned basic values, there are
no values in the heap:
Directory
y
→5
x
→5
Do Now!
If we now evaluate
y = 3
, does the value ofx
change?
It does not. The value associated with y
in the directory
changes, but there is no connection between x
and y
in the directory. The =
command does associate the value
mapped to y
with x
, but that is not the same as
having the two variables track each other. Any time we update the
value associated with a name (as opposed to a field), only the
directory changes. The resulting directory therefore looks like:
Directory
y
→3
x
→5
That the value of x
did not change is because of the form of
the update: name = ...
. It has nothing to do with the fact
that x
and y
were associated with basic data. For
example, here is another way we might have updated our earlier program
involving call3
to tell us to call "Tina"
.
call1 = ToDoItem("call Fred", date(2020, 11, 19), ["urgent"])
call2 = ToDoItem("call Fred", date(2020, 11, 19), ["urgent"])
call3 = call2
call3 = ToDoItem("call Tina", date(2020, 11, 19), ["urgent"])
Do Now!
Write out the directory and heap after running this program.
You should have come up with the following:
Directory
call1
→ 1001call2
→ 1002call3
→ 1003
Heap
1001:
ToDoItem("call Fred" ...)
1002:
ToDoItem("call Fred" ...)
1003:
ToDoItem("call Tina" ...)
With this version, updating the ToDoItem
associated with
call3
does not affect the descr
field of
call2
. That updating via fields affects both variables while
updating the variables directly does not is
one of the most common confusions in introductory
programming. Statements like var1 = var2
do not set up
relationships that persist between variables over time. Moreover,
these statements are not commutative, meaning that var1 =
var2
and var2 = var1
do not make the same modifications to
the directory. The variable on the left of =
is reassigned to
a new value in the directory, and that value comes from evaluating
whatever is on the right. If the expression on the right of =
is just a variable, then the value is retrieved from the directory and
associated with the variable on the left.
Is there any way to get two variables to track each other’s values?
No, but as we saw with our first (field-based) approach to updating call3
, it is
possible for two variables to reference the same structured datum, and
thereby share modifications. We’ll see more examples of this as we go.
Ultimately, this book is trying to teach you how to work with various kinds of information that might show up in a real project. Up until now, we’ve focused on the structure of information (tables vs lists vs datatypes/dataclasses) and how to write programs to process those structures.
But now, we’re starting to consider an additional question: does any of our data need to be shared across different parts of the program, such that multiple parts of the program might need to change the data and see the changes? So far, we’ve seen how dataclasses work when they are shared. Can we also share lists? What does that look like? We’ll tackle those questions next.
Strategy: Rules for updating the directory and the heap
Summarizing, the rules for how the directory and memory update are as follows:
We add to memory when a data constructor is used
We update memory when a field of existing data is reassigned
We add to the directory when a name is used for the first time (this includes parameters and internal variables when a function is called)
We update the directory when a name that is already in the directory is reassigned to a different value)
25.2 Modifying Variables Associated with Lists
Let’s expand our study of updates yet again, this time looking at updating lists. We’ll start with a list of basic data: strings representing votes cast through a poll. We’ll need functions that let people cast votes, as well as functions to tally the votes cast for specific options and functions to determine which option had the most votes.
As a concrete example, let’s assume that students are responding to a poll about their favorite desserts. We’ll start by setting up an initial (empty) list of the votes that have been cast:
theVotes = []
This would create an entry in the directory:
Directory
theVotes
→[]
Do Now!
Write a sequence of 2 commands: one to add
"fruit"
totheVotes
and another to add"churros"
.
theVotes = ["fruit"] + theVotes
theVotes = ["churros"] + theVotes
Do Now!
Draw the directory after both of these modifications have been made.
Directory
theVotes
→["churros", "fruit"]
Do Now!
Why are new items at the front of the list? What would need to change to put new votes at the end of the list instead?
Exercise
Write a function
count_votes_for
that takes the name of a dessert and returns the number of times that the given dessert appears intheVotes
.
Exercise
Imagine that you were now asked to determine which option received the most votes. Develop two different task plans (Task Plans) that would perform this computation. You don’t have to write the code, just the plans.
25.3 Writing Functions that Modify Variables
Our two modification commands are (not surprisingly) the same except for the vote that we are casting. Seeing as we expect many votes to be cast, it would make sense to turn this repeated code into a function. As we did in Pyret, we create a parameter name for the information that differs across the commands, and make the remainder of the commands the body of the function. Here’s the proposed code:
def cast_vote(item: str):
"""add given item to the theVotes list"""
theVotes = [item] + theVotes
If this function were the only code in your file, Python would not
accept it. It would complain that the theVotes
variable is
unbound (which means it isn’t associated with a name). We need to have
created an initial value of theVotes
so that it exists in
the directory. Earlier in this section, we started with an initial
empty list of votes. We have to include that in our file.
theVotes = []
def cast_vote(item: str):
"""add given item to the theVotes list"""
theVotes = [item] + theVotes
theVotes
is unbound. But we just defined
it. What could be the problem here?The problem is has to do with where the variable is
defined. Here, we are trying to modify the variable inside a
function, but we created the variable outside the
function. When we wrote sum_list
, the variable with the
running total was created and modified within the same function. Here, the
theVotes
variable is defined outside the function. Why
might this matter? The answer lies in the directory.
25.3.1 The global
annotation
Whenever a function gets called, the
programming language creates a new directory area with a space for
local information. If we create an empty list of votes then call
cast_vote("fruit")
, the directory has the following contents
just before we modify the list:
Directory
theVotes
→[]
sum_list
, for example,
we want to sum each list independently, without the results of summing
one list affecting the sum reported for another.Sometimes, however, we want data from one function call to
affect later calls. That is, in fact, the whole point of having a
variable where we maintain our current list of votes. We therefore
have to tell Python that we want to modify the value of the
theVotes
variable that is outside of the local
area. The Python keyword global
achieves that:
theVotes = []
def cast_vote(item: str):
"""add given item to the theVotes list"""
global theVotes
theVotes = [item] + theVotes
The global
annotation is used with the name of a variable
that was defined outside the function that we want this function to
modify. You can’t use this to modify the values of variables that were
introduced in other functions, just ones that were defined at the same
level as the functions in your file (the so-called “top” level).
25.4 Testing Functions that Modify global
Variables
We developed cast_vote
by making a function from two commands
with shared content. We didn’t write tests for it though. Let’s add a
testing function now.
Do Now!
Try to write a test case for
cast_vote
. Write down either a concrete test (inpytest
form) or write down a question whose answer would help you figure out how to write a test.
Following the form of tests that we developed in Examples and Test Cases, we expect to write something like
def test_cast_vote():
assert cast_vote("basbousa") == ...
...
? As we saw when testing the function to
update ToDoItem
descriptions, we write tests that check the
effects of update functions, not their returned results.Do Now!
What effect do we expect
cast_vote("basbousa")
to have? It modifiestheVotes
, so refer totheVotes
or"basbousa"
in your answer.
One claim that directly follows from the problem statement would be
"theVotes
should contain "basbousa"
". Can we turn
this into code to form a test case? Yes, and here’s the code:
def test_cast_vote():
cast_vote("basbousa")
assert "basbousa" in theVotes
This example shows the essence of testing functions that modify
variables. Rather than put the function call in the assert
statement, we call the function first, then write an assert
that uses the variable itself to check for the expected change.
Do Now!
Are there other expected changes that we should check? Can you come up with a
cast_vote
function that would pass this test but not do what we expected?
Imagine that someone wrote the following cast_vote
function
instead:
theVotes = []
def cast_vote(item: str):
"""add given item to the theVotes list"""
global theVotes
theVotes = [item]
This modifies theVotes
to include the given item, but it
accidentally throws away the rest of the list! Our current
test_cast_vote
would approve of this version, which suggests
that additional tests are needed.
Do Now!
What other expectations do you have about
theVotes
after the function runs? Can you capture those asassert
statements about the variable?
One expectation we might have is that theVotes
would be 1
element longer after the call than it was before the call. Here’s how
to augment test_cast_vote
to include this assertion:
def test_cast_vote():
pre_length = len(theVotes)
cast_vote("basbousa")
assert "basbousa" in theVotes
assert len(theVotes) == pre_length + 1
The strategy here is to use a local variable to save the length of
theVotes
before calling cast_vote
, then use that
saved value to check the expected impact after the call.
Do Now!
Are there other impacts to consider? Could a program satisfy our current two tests and still not do what we expect?
If we start imagining worst-case scenarios, we could envision a
malicious voting machine that doubles the impact of a vote by removing
an old vote and replacing it with two copies of the newly cast
vote. Here’s code that wouild do that (the list[1:]
returns
the rest of list
. in general list[x:y]
takes the sublist from
index x
up to but not including index y
).
def cast_vote(item: str):
"""add given item to the theVotes list"""
global theVotes
theVotes = [item] + [item] + theVotes[1:]
Okay, yes, this could happen, but isn’t this sort of idea generation getting a little excessive? Yes and no. If you are writing test cases for a voting machine, you absolutely should be thinking about situations like this (because people trying to influence elections might try them). The ability to think about how adversaries might abuse a system is a valuable skill (and you can build a career around doing this well). But it does feel a bit open-ended, especially in a first programming course.
Fortunately, we have a much more systematic way to raise the
robustness of our tests. Recall that we asked you to write a function
called count_votes_for
as an exercise. Since both
cast_vote
and count_votes_for
work with the
theVotes
variable, we should check that the results of
count_votes_for
are what we would expect after
cast_vote
modifies the variable.
Do Now!
Modify
test_cast_vote
to also check that the results ofcount_votes_for
change as expected.
Here’s a revised test function:
def test_cast_vote():
pre_length = len(theVotes)
pre_count = count_votes_for("basbousa")
cast_vote("basbousa")
assert "basbousa" in cast_vote
assert len(theVotes) == pre_length + 1
assert count_votes_for("basbousa") == pre_count + 1
To be careful, we also should test that the vote count has not
changed for a dessert other than "basbousa"
.
Do Now!
Add a test for a dessert other than
"basbousa"
.
One challenge here is that we don’t know what other votes have been cast. From the perspective of this test, only one vote gets cast, so a test for any other dessert might only check desserts that aren’t in the list.
A better test function would set theVotes
to have specific
contents, to give us more data to work with.
def test_cast_vote():
theVotes = ["fruit", "churros", "fruit", "basbousa"]
pre_length = len(theVotes)
pre_count = count_votes_for("basbousa")
pre_fruit = count_votes_for("fruit")
cast_vote("basbousa")
assert "basbousa" in cast_vote
assert len(theVotes) == pre_length + 1
assert count_votes_for("basbousa") == pre_count + 1
assert count_votes_for("fruit") == pre_fruit
Do Now!
This code has fixed the content of
theVotes
. This suggests that we no longer need to save information about the results of functions before callingcast_vote
. We could edit theassert
statements to use manually-computed values (such as3
in place ofpre_length
). Is this a good idea?
You certainly could remove the pre_
definitions and just use
concrete numbers in the assert
statements. As you are getting
started, you may prefer to do that. However, the benefit to the
current setup is that we could modify the initial value of
theVotes
(if we think of another important case to check)
without then having to modify all of the assert
statements. The current setup also lets someone else read your
assert
statements and clearly understand the effect being
tested: having the right side of ==
say pre_count +
1
is more informative than a concrete number like
2
. Remember: examples and code are meant to be read by other people!
25.4.1 The Internal Structure of a Test Function
This testing function seems to be getting a bit long. If we look closely, however, we’ll see it has some structure to it, which we’ll highlight by adding some blank lines and labeling strings between key testing tasks. (Remember that Python uses strings as a commenting mechanism.) These are the same labels that we added to our earlier example of a test function for updating descriptions.
def test_cast_vote():
# SETUP
theVotes = ["fruit", "churros", "fruit", "basbousa"]
# SAVE CURRENT VALUES
pre_length = len(theVotes)
pre_count = count_votes_for("basbousa")
pre_fruit = count_votes_for("fruit")
# PERFORM MODIFICATIONS
cast_vote("basbousa")
# CHECK EFFECTS
assert "basbousa" in cast_vote
assert len(theVotes) == pre_length + 1
assert count_votes_for("basbousa") == pre_count + 1
assert count_votes_for("fruit") == pre_fruit
The first section sets up our data. The second saves any current
values that are needed for the tests. The third makes the function
calls that we want to test. The last checks the effects with the
assert
statements. Viewed this way, the long function seems
more manageable. The labels also remind us of the key tasks that go
into testing functions that modify variables.
25.4.2 Takeaways on Testing Modifications
What are the takeaways from all of this?
When testing a function that modifies variables, we test assertions about the effects of the function.
A systematic way to identify effects to test is to leverage other functions that use the variable that’s being modified. Testing basic properties like length can also be useful.
There are four tasks involved in testing a function that modifies variables: setting up the data, saving current values, calling the function to test, and checking the effects.
Does it feel like testing has gotten more complicated now that variables are being modified? It has. When there are no variables that are shared and modified across functions, we can test each function individually. But once one function modifies the value of a variable that other functions also use, we have to make sure that the modifying function isn’t breaking the expected behavior of the others. Variables are powerful, but that power comes with responsibility.
Strategy: Testing a Function that Modifies Data
In prose, write down the effects that you expect the modification to have, both on traits of the variable’s value and on the results of other functions that use the variable. Remember to consider situations that should not be effected by the modification.
Create a test-function skeleton with areas for setup, saving current values, calling the function, and checking the effects.
In the setup section, set the value of the variable to a concrete value for testing.
In the calling-the-function section, write the function call that you want to test.
Turn each of the effects that you want to check into an
assert
statement, naming and storing computations of any values from before the test was run in the saving-current-values section.
26 Revisiting Lists and Variables
26.4.2 Revisiting Variables: A Function to Create Accounts for New Customers |
26.1 Sharing List Updates
In Modifying Structured Data, we saw that we can have two names that refer to the same dataclass value in memory. In this way, changes made to one are visible in the other. Put differently, we can have two names that refer to the same datum. This same ability could also be useful for lists. For example, you and a roommate might share a shopping list, the registrar and the financial aid office might share a list of students, and so on. In both examples, an list update made by one person or office should be visible to the other. How do we implement this in Python?
More concretely, assume that Shaunae and Jonella are roommates who take turns shopping for groceries. Each of them has access to a shared shopping list (through their phones): any time one of them adds or removes an item, the other person should see the change. How could we mock this up in code?
Do Now!
Create a shopping list for Shaunae with a couple of items, then give the same list a second directory name corresponding to Jonella.
You might have written something like the following:
shaunae_list = ["bread", "coffee"]
jonella_list = shaunae_list
Do Now!
Jonella realizes they are also out of eggs. Run an command at the prompt to add
"eggs"
to Jonella’s list, then make sure"eggs"
appears under both names.
For
Loops in Functions that Produce Lists, perhaps you
wrote something like>>> jonella_list = ["eggs"] + jonella_list
>>> jonella_list
["eggs", "bread", "coffee"]
>>> shaunae_list
["bread", "coffee"]
+
to
combine two lists, Python creates a new list out of the elements of
the existing lists. Since the left side of the =
is a name in
the directory, Python modifies the directory to refer to this new
list. It does not maintain the relationship between
jonella_list
and shaunae_list
.Surely sharing lists is a common enough pattern that there is a way to do this, however. There is, but it requires us to work with lists in a different way than we first saw in Processing Lists.
26.1.1 Operations that Mutate Lists
So far, we have used +
to append two lists together. This
operation creates a new list. Python offers a different operation
(called append
) for modifying an existing list to include a
new element. Here’s a different way we could have added
"eggs"
to Jonella’s list.
>>> jonella_list.append("eggs")
>>> jonella_list
["bread", "coffee", "eggs"]
>>> shaunae_list
["bread", "coffee", "eggs"]
jonella_list
in
the directory, follows the .
over to the list datum in the
heap, then operates on the list itself to add "eggs"
. Unlike
in the previous version, we don’t use =
to assign a new value
to jonella_list
. The append
updates the list in the
heap. Since jonella_list
and shaunae_list
refer to
the same address in memory, the addition via jonella_list
also shows up when we access the list through the name
shaunae_list
.Similarly, if Shaunae buys "coffee"
, she can remove it from
the shared list using the remove
operator:
>>> shaunae_list.remove("coffee")
>>> jonella_list
["bread", "eggs"]
>>> shaunae_list
["bread", "eggs"]
The point here is that both append
and remove
modify
the contents of the list in the heap. If two names refer to the same
heap address, changes made through one name appear through the other.
Does this mean we always want to modify lists using append
and remove
? Not at all! Sometimes, we will want to build new
lists rather than modifying old ones (we’ll see some cases of this
later). If you want a modification to be visible through all names
that refer to a list, use append
. If you only want the
addition of an element to be visible through one name, use +
.
26.2 Lists in Memory
In Modifying Structured Data, we drew directory-heap diagrams
to show how dataclass values appear in the heap. What do lists look
like in the heap? In Python, lists are stored by putting a
dataclass-like datum into memory that contains the number of elements
in the list, followed by individual elements in the subsequent
addresses. For example, here’s what Shaunae and Jonella’s shopping
lists look like after adding "eggs"
:
Directory
shaunae_list
→ 1001jonella_list
→ 1001
Heap
1001:
List(len:3)
1002:
"bread"
1003:
"coffee"
1004:
"eggs"
There are no directory entries to the list elements, but the fact that
the List
datum reflects size 3
tells us that the
next three addresses are part of the list.
Do Now!
Work out the memory diagram for the following program:
scores = [89,72,92] colors = ["blue", "brown"] scores.append(83)
Does this raise any questions?
After defining the first two lists, your memory diagram should have looked like (perhaps with a different starting address):
Directory
scores
→ 1005colors
→ 1009
Heap
1005:
List(len:3)
1006:
89
1007:
72
1008:
92
1009:
List(2)
1010:
"blue"
1011:
"brown"
What now happens when we append 83
to the scores
list? Since lists are supposed to have all of their elements in
consecutive memory slots, 83
should go into
@1009
, but that address is already occupied! We shouldn’t
destroy the colors
list just because we added to the
scores
list.
In practice, lists are created with some extra space to allow for later additions. Here’s a more accurate diagram:
Directory
scores
→ 1005colors
→ 1014
Heap
1005:
List(len:3, slots:8)
1006:
89
1007:
72
1008:
92
1009:
None
1010:
None
1011:
None
1012:
None
1013:
None
1014:
List(len:2, slots:8)
1015:
"blue"
1016:
"brown"
1017:
None
1018:
None
1019:
None
1020:
None
1021:
None
1022:
None
The None
value is what Python uses to indicate "there’s
nothing here". Note that the List
datum has also changed: it
how has two fields, one for the number of elements, and another for
the number of slots. We also modified the directory to reflect the new
address for the scores
list.
There’s now room to perform the append
on the scores
list. The 83
would get inserted into @1009
and
the len
field of the List
would get updated to
4
.
Couldn’t a list still run out of room though? Absolutely. When that happens, the whole list gets moved to a new sequence of addresses with more room. The details of this are a more advanced topic. For now, the key takeaway for you is that lists are arranged in consecutive memory addresses. This layout will be particularly relevant when we get to Hashtables and Dictionaries.
26.3 Practice: Data for Shared Bank Accounts
Let’s apply what we’ve been learning about sharing modifications to a real-world problem: having bank accounts shared by multiple people (such as partners or spouses). A bank wants to allow multiple customers to manage the same account (meaning each person deposits to or withdraws from the same shared balance). Our task is to define data structures for managing bank customers and their (shared) accounts.
How might you set up the data about customers and their accounts? To keep things simple, we’ll limit the information about customers to their names (leaving out phone numbers, etc), and information about accounts to their balances (leaving out interest rates, etc).
In a full banking system, we might want to allow customers to be part of multiple accounts as well. This would suggest the following data structure:
@dataclass
class Account:
id : int
balance : int
owners : lst # of Accounts
@dataclass
class Customer:
name : str
accts : lst # of Accounts
You can have these kinds of circular connections, in which customers connect to accounts which connect to customers. In the interest of letting us focus on basic sharing and memory, however, let’s simplify our data as follows: accounts don’t track who their customers are, and each customer has only one account:
@dataclass
class Account:
id : int
balance : int
@dataclass
class Customer:
name : str
acct : Account
Do Now!
We could consider making the
acct
field hold an ID number, rather than anAccount
datum? Should we do this? Why or why not?
Do Now!
Write an expression that creates a new
Customer
named Tina with a newAccount
. Give the new account ID number1
and initial balance of100
.
Do Now!
Assume we gave the name
t_cust
to theCustomer
datum created in the previous exercise. Draw the memory diagram that containst_cust
and the new data values.
In code, you should have written:
t_cust = Customer("Tina", Account(1, 100))
Directory
t_cust
→ 1015
Heap
1015:
Customer("Tina", Account(1, 100))
Do Now!
Tina wishes to deposit $50 to her account. Write a statement using
t_cust
to perform the deposit. Also show the changes that this makes to the dictionary-memory diagram.
t_cust.acct.balance = t_cust.acct.balance + 50
t_cust
(rather than to the
t_cust
variable itself), we make the update in memory
(specifically, to the balance
field).Directory
t_cust
→ 1015
Heap
1015:
Customer("Tina", Account(1, 150))
Now, let’s try a more complicated situation. Maria and Jorge are new customers who want to share an account. Sharing means that either of them can make deposits or withdrawals and both will see any changes to their shared balance.
Do Now!
Draw a memory diagram with
Customer
data for each of Maria and Jorge and a sharedAccount
with a balance of250
. Don’t write the code, just draw the diagram that the code should produce.
You should have ended up with something like the following (perhaps
with a name for the Account
, perhaps not (either works).
Directory
m_cust1
→ 1016j_cust1
→ 1017
Heap
1015:
Account(id:2, balance:250)
1016:
Customer("Maria", 1015))
1017:
Customer("Jorge", 1015))
Now, we need to figure out the code to produce this diagram.
Do Now!
Here are four possible proposals for code to create the shared account. Which one(s) produce the desired memory diagram? You might find it useful to draw the memory diagrams for each approach and compare them to the one that we want.
Version 1
m_cust1 = Customer("Maria", Account(2, 250)) j_cust1 = Customer("Jorge", Account(2, 250))
Version 2
m_cust2 = Customer("Maria", Account(2, 250)) j_cust2 = Customer("Jorge", m_cust2.acct)
Version 3
new_acct = Account(2, 250) m_cust3 = Customer("Maria", new_acct) j_cust3 = Customer("Jorge", new_acct)
Version 4
init_bal = 250 m_cust4 = Customer("Maria", Account(2, init_bal)) j_cust4 = Customer("Jorge", Account(2, init_bal))
We want to end up with a single account in the heap. Versions 1 and 4
create two separate Account
data, so we can rule them
out. The difference between versions 2 and 3 lies in the directory:
version 3 names the shared Account
in the directory, while
version 2 does not. Either way, the Account
is accessible
from the directory (by going through the m_cust
and
j_cust
variables.
Do Now!
Let’s assume we set up the accounts with version 3. Jorge wants to make a deposit of
100
. Which of the following lines of code are appropriate if Maria should be able to access the deposited funds?1. new_acct = Account(2, new_acct.balance + 100) 2. new_acct.balance = new_acct.balance + 100 3. j_cust3.acct = Account(2, new_acct.balance + 100) 4. j_cust3.acct.balance = new_acct.balance + 100 5. j_cust3.acct.balance = j_cust3.acct.balance + 100
For the deposit to be visible to Maria, we have to modify the account that Maria shares with Jorge. We can’t make a new account, so lines 1 and 3 are not appropriate.
Whether lines 2 and 4 work depends on whether new_acct
still
refers to the memory location of Jorge’s (and Maria’s) account. If it
does, both lines work (though the intent isn’t as clear with line 2,
which doesn’t indicate that the goal is to update Jorge’s
balance). But if new_acct
now refers elsewhere, neither line
works: line 2 puts the adjusted balance in the wrong account, while
line 4 may compute the new balance from the wrong existing balance.
Line 5 works fine.
26.4 Circular References
At the start of this chapter, we said that we wanted each
Account
to also contain a reference to the
Customer
(s) who own the account. We deferred setting up this
reference, but return to it now. Here are the desired dataclasses (to
keep things simple, we’ll allow a Customer
to have only one
Account
.
@dataclass
class Account:
id: int
balance: int
owners: list # of Customer
@dataclass
class Customer:
name: str
acct: Account
Let’s create a new Customer
with a new Account
using
these dataclasses:
new_acct = Account(5, 150, Customer("Elena", __________))
How do we fill in the blank in the Customer
? We’d like to say
new_acct
but Python (and most other languages) will raise an
error that new_acct
isn’t defined. Why is that?
When given this assignment, Python first evaluates the right side, to
get the value or memory location that should be stored in the
directory for new_acct
. If we filled in the blank with
new_acct
, Python would start by running:
Account(5, 150, Customer("Elena", new_acct))
To do this, it needs to look up new_acct
in the directory, but
that name isn’t in the directory yet (it only goes in after we
compute the value to store for that name). Hence the error.
To get around this, we leverage the ability to update the contents of
memory locations after names for data are in place. We’ll create the
Account
partially, but without filling in the
Customer
. Then we create the Customer
to reference
the new Account
. Then we update the Account
owners with the
now-created Customer
:
new_acct = Account(5, 150, []) # note the empty Customer list
new_cust = Customer("Elena", new_acct)
new_acct.owners = [new_cust]
Note here that each part gets a spot in memory and an entry in the
directory, but the datum hasn’t been finished yet. Once we have the
datum set up in memory though, we can update the owners
field to
the correct value.
Here’s what this looks like at the level of memory and the directory after running the first two lines:
Directory
new_acct
→ 1016new_cust
→ 1017
Heap
1015:
[]
1016:
Account(5, 150, 1015)
1017:
Customer("Elena", 1016)
Then, when we run the third line, we create a new list containing
new_cust
and update the owner list within new_acct
:
Directory
new_acct
→ 1016new_cust
→ 1017
Heap
1015:
[]
1016:
Account(5, 150, 1018)
1017:
Customer("Elena", 1016)
1018:
[1017]
Notice that the two owners lists each live in memory but aren’t
associated with names in the directory. They are only reachable going
through new_acct
, and after the update, the empty list isn’t
reachable at all.
Do Now!
Why did a new list get created (at address 1018) rather than have
new_cust
get added to the list at address 1015?
We used new_acct.owners = [new_cust]
to finish setting up the
owners
field, and the right-hand side of =
creates a
new list. If we had instead written
new_acct.owners.append(new_cust)
, then 1017 would
have gone into the list stored at 1015, as follows:
Directory
new_acct
→ 1016new_cust
→ 1017
Heap
1015:
[1017]
1016:
Account(5, 150, 1017)
1017:
Customer("Elena", 1016)
Either approach works. However, not all languages give you an
operation like append
that modifies the contents of an
existing list (Pyret didn’t, at least not the portion that we saw). We
therefore chose to show you the more general approach here.
26.4.1 Testing Circular Data
When you want to write a test involving circular data, you can’t write out the circular data manually. For example, imagine that we wanted to write out new_acct from the previous examples:
test("data test", new_acct,
Account(5, 150, [Customer("Elena", Account(5, 150, ...)])
Because of the circularity, you can’t finish writing down the data. You have two options: write tests in terms of the names of data, or write tests on the components of the data.
Here’s an example that illustrates both. After setting up the account, we might want to check that the owner of the new account is the new customer:
test("new owner", new_acct.owner, new_cust)
Here, rather than write out the Customer
explicitly, we use the name
of the existing item in the directory. This doesn’t require you to write
ellipses. We also focused on just the owner
component, as a part of
the Account
value that we expected to change.
26.4.2 Revisiting Variables: A Function to Create Accounts for New Customers
What if we turned the sequence for creating dependencies between customers and their accounts into a function? We might get something like the following:
def create_acct(new_id: int, init_bal: int, cust_name: str) -> Account:
new_acct = Account(new_id, init_bal, []) # note the empty Customer list
new_cust = Customer(cust_name, new_acct)
new_acct.owners.append(new_cust)
return new_acct
This looks useful, but it does have a flaw: we could accidentally create two accounts with the same id number. It would be better to maintain a variable containing the next unused account id, which would guaranteed that a specific id gets used only once.
Do Now!
How might you modify the code to do this? Which concepts seem relevant?
The next available id keeps changing as the program runs. This suggests that we need a variable in the directory that gets modified after each account is created. Here’s how we might set that up:
next_id = 1 # stores the next available id number
def create_acct(init_bal: int, cust_name: str) -> Account:
global next_id
new_acct = Account(next_id, init_bal, [])
next_id = next_id + 1
new_cust = Customer(cust_name, new_acct)
new_acct.owners.append(new_cust)
return new_acct
Here, we create the next_id
variable to hold the next id number
to use. When we create an Account
, we update next_id
to the
next unused number. Note that we also took the new account’sid number
out of the parameters for the function. Problem solved!
26.5 The Many Roles of Variables
At this point, we have used the single coding construct of a variable in the directory for multiple purposes. It’s worth stepping back and calling those out explicitly. In general, variables serve one of the following purposes:
Tracking progress of a computation (e.g., the running value of a result in a
for
-loop)Sharing data across functions (e.g., casting and counting votes)
Maintaining information across multiple calls to a single function (e.g., the
next-id
variable)Naming a local or intermediate value in a computation
Each of these uses involves a different programming pattern. The first
creates a variable locally within a function. The second two create
top-level variables and require using global
in functions
that modify the contents. The third is different from the second,
however, in that the third is only meant to be used by a single
function. Ideally, there would be a way to not expose the variable to
all functions in the third case. Indeed, many programming languages
(including Pyret) make it easy to do that. This is harder to achieve
with introductory-level concepts in Python, however. The fourth is
more about local names rather than variables, in that our code never
updates the value after the variable is created.
We call out these three roles precisely because they invoke different code patterns, despite using the same fine-grained concept (assigning a new value to a variable). When you look at a new programming problem, you can ask yourself whether the problem involves one of these purposes, and use that to guide your choice of pattern to use.
26.6 Managing All Accounts
So far, we have created individual customers and accounts and named them in the directory. This is fine for a small number of accounts but a realistic bank manages tens of thousands (if not millions) of accounts. Naming individual accounts doesn’t scale with that many accounts. Instead, a bank would maintain something like a list or table of all of the accounts, with functions to locate specific accounts (say by their ID numbers). For example:
all_accts = [Account(8, 100, []),
Account(2, 300, []),
Account(10, 225, []),
Account(3, 200, []),
Account(1, 120, []),
...
]
Do Now!
Modify
create_acct
to add the newly-created account to theall_accts
list.
next_id = 1 # stores the next available id number
all_accts = [] # stores all of the accounts
def create_acct(init_bal: int, cust_name: str) -> Account:
global next_id
new_acct = Account(next_id, init_bal, [])
all_accts.append(new_acct) # <-- this is the new line
next_id = next_id + 1
new_cust = Customer(cust_name, new_acct)
new_acct.owners.append(new_cust)
return new_acct
Exercise
Draw the memory diagram that results from the following code:create_acct(100, "Tina") create_acct(250, "Elena")
When someone asks for their balance, they usually provide their
account number. This suggests that we need a function that takes an
account id and returns the Account
value with that id. From there,
we can dig into the account to get the balance.
Exercise
Write a function
find_acct
that takes an id number and returns theAccount
(fromall_accts
) with that id number.
Here’s one solution:
def find_acct(which_id : int) -> Account:
"""returns the account with the given id number"""
for acct in all_accts:
if acct.id == which_id:
return acct
raise ValueError("no account has id " + str(which_id))
def test_find():
test("match num", find_acct(3).id, 3)
test("match exactly", find_acct(1).id, 1)
testValueError("no account", lambda: find_acct(22))
This example shows how to report errors in Python. As in Pyret, there
is a raise
construct for reporting errors. In Python,
however, rather than raise just a string, you raise a particular type
of data called a ValueError
that contains the string. (There
are other kinds of errors too, but ValueError
will suffice
for what this book will do).
Exercise
Leverage
find_acct
to write the following two functions:def deposit(which_id: int, amount: double) {} def close(which_id: int) {}
As these exercises show, the find_acct
function becomes a
valuable helper for many other functions. This raises a question: how
fast is find_acct
? Here, we have to search through all of the
accounts to locate the one with a specific id before doing the
operation we actually care about. With millions of accounts, this
seems like it could get expensive. We’ll explore this more deeply in
the next chapter.
27 Hashtables and Dictionaries
At the end of Managing All Accounts, we had created a list of
Account
values and written a function to search through that
list to find the account with a specific ID number. We noted that
finding an individual account could require us to check every account
in the list (as we check accounts one at a time in the for
loop), which could start to get expensive as the bank scales to more
customers.
Let’s step back however, and make two observations about this problem:
Every account has a unique ID number
We have a function that needs to retrieve accounts based on this unique ID number
In this case, we can use a different data structure, called a hashtable or dictionary (different languages use different terms for this same concept; Python uses “dictionary”). Dictionaries are designed for precisely such a scenario: we have a set of values (the accounts) that we wish to access using a piece of information (the ID number) that is unique to each value. The piece of information that we use for access is called the key. The key does not need to be a field in the data value, but it can be (as with accounts).
To set up a dictionary, we write a series of entries of the form
key: value
Here’s what a dictionary mapping ID numbers to accounts might look like:
accts_dict = {5: Account(5, 225, []),
3: Account(3, 200, []),
2: Account(2, 300, []),
4: Account(4, 75, []),
1: Account(1, 100, [])
}
In terms of notation, we can create a dictionary by placing a
collection of key: value
pairs between curly braces,
separated by commas. The dictionary shown here has five account IDs
that map to accounts with those same ID numbers. There is no order
within a dictionary, so it doesn’t matter which order we write the
pairs within the sequence.
Now, if we want to get the account with ID 1 from the dictionary, we can simply write:
accts_dict[1]
This says "in the accts_dict
dictionary, get the value
associated with key 1
. Evaluting this expression will produce
Account(1, 100, [])
. If the key you request isn’t in the
dictionary, Python will raise an error. However, you check whether a
key is in the dictionary before trying to use it, as follows:
if 6 in accts_dict:
accts_dict(6)
If we want to change the value associated with a key, we use an assignment statement of the following form:
dictionary[key] = new_value
To add a new key (and its value) to a dictionary, we use the same notation as for updating a value, just with an unused key:
accts_dict[6] = Account(6, 150, [])
Finally, to remove a key (and its value) from the dictionary, we write
del dictionary[key]
27.1 Searching by Criteria Other than Keys
What if we wanted to find all of the accounts with balances below
100
? For this, we have to search through all of the key-value
pairs and check their balances. This again sounds like we need a
for
loop. What does that look like on a dictionary though?
Turns out, it looks much like writing a for
loop on a list
(at least in Python). Here’s a program that creates a list of the
accounts with balances below 100
:
below_100 = []
# the room variable takes on each key in the dictionary
for ID in accts_dict:
if accts_dict[ID].balance < 100:
below_100.append(accts_dict[ID])
Here, the for
-loop iterates over the keys. Within the loop,
we use each key to retrieve its corresponding Account
,
perform the balance check on the Account
, then put the
Account
in our running list if it meets our criterion.
Exercise
Create a dictionary that maps names of classrooms or meeting rooms to the numbers of seats that they have. Write expressions to:
Look up how many seats are in a specific room
Change the capacity of a specific room to have 10 more seats than it did initially
Find all rooms that can seat at least 50 students
27.2 Dictionaries with More Complex Values
Do Now!
A track-and-field tournament needs to manage the names of the players on the individual teams that will be competing. For example, “Team Red” has “Shaoming” and “Lijin”, “Team Green” contains “Obi” and ”Chinara”, and “Team Blue” has “Mateo” and “Sophia”. Come up with a way to organize the data that will allow the organizers to easily access the names of the players on each team, keeping in mind that there could be many more teams than just the three listed here.
This feels like a dictionary situation, in that we have a meaningful key (the team name) with which we want to access values (the names of the players). However, we have already said that dictionaries allow only one value per key. Consider the following code:
players = {}
players["Team Red"] = "Shaoming"
players["Team Red"] = "Lijin"
Do Now!
What would be in the dictionary after running this code? If you aren’t sure, try it out!
How do we store multiple player names under the same key? The insight here is that the collection of players, not an individual player, is what we want to associate with the team name. We should therefore store a list of players under each key, as follows:
players = {}
players["Team Red"] = ["Shaoming", "Lijin"]
players["Team Green"] = ["Obi", "Chinara"]
players["Team Blue"] = ["Mateo", "Sophia"]
The values in a dictionary aren’t limited to being basic values. They can be arbitrarily complex, including lists, tables, or even other dictionaries (and more!). There is still only one value per key, which is the requirement of a dictionary.
27.3 Using Structured Data as Keys
Using structured data as keys is also possible, but it’s more subtle
than using structured data as values. This has to do with how
dictionaries work. Dictionaries depend on the keys being
values that cannot be internally modified. What do we mean? An entire
Account
datum, for example, could not be used as a key
because we want to be able to modify the balance, which is a field
within the account.
A dataclass value can be used as a key as long as the fields will never be modified. For example, let’s return to the earlier example that mapped room names to their numbers of seats. Imagine that the office that maintains rooms wants to be able to access rooms by ranges of seats (such as 0-29, 30-49, and 50-100). In this case, they would want to make a dataclass for a range and use that as a key.
@dataclass
class Range:
low: int
high: str
{ Range(0,29): ["West Wing 200", "North Wing 110"],
Range(30,49): ["North Wing 150", "South Wing 320"],
...
}
Since the fields of a range aren’t going to change, this is a
reasonable datum to use as a key. To make Python accept this
dictionary, however, we have to indicate that we will not be modifying
the field values of Range
. You do this with an additional
piece on the dataclass
annotation:
@dataclass(frozen=True)
class Range:
low: int
high: str
The frozen annotation says "the components of this class cannot be
changed". If you try to assign to a component of a frozen class such
as Flight, Python will produce an error.
There actually are ways to use values with changing components as
keys, but they are more advanced than this book covers. If you want to
read details for yourself, look up __hash__
methods in Python.