IV Programming with State
23 From Pyret to Python
Through our work in Pyret to this point, we’ve covered several core
programming skills: how to work with tables, how to design good
examples, the basics of creating datatypes, and how to work with the
fundamental computational building blocks of functions, conditionals,
and repetition (through filter
and map
, as well as
recursion). You’ve got a solid initial toolkit, as well as a wide
world of other possible programs ahead of you!
But we’re going to shift gears for a little while and show you how to work in Python instead. Why?
Seeing how the same concepts play out in multiple languages can help you distinguish core computational ideas from the notations and idioms of specific languages. If you plan to write programs as part of your professional work, you’ll inevitably have to work in different languages at different times: we’re giving you a chance to practice that skill in a controlled and gentle setting.
Why do we call this gentle? Because the notations in Pyret were designed partly with this transition in mind. You’ll find many similarities between Pyret and Python at a notational level, yet also some interesting differences that highlight some philosophical differences that underlie languages. The next set of programs that we want to write (specifically, data-rich programs where the data must be updated and maintained over time) fit nicely with certain features of Python that you haven’t seen in Pyret. A future release will contain material that contrasts the strengths and weaknesses of the two languages.
We highlight the basic notational differences between Pyret and Python by redoing some of our earlier code examples in Python.
23.1 Expessions, Functions, and Types
Back in Functions Practice: Cost of pens, we introduced the notation for functions and types using an example of computing the cost of an order of pens. An order consisted of a number of pens and a message to be printed on the pens. Each pen cost 25 cents, plus 2 cents per character for the message. Here was the original Pyret code:
fun pen-cost(num-pens :: Number, message :: String) -> Number:
doc: ```total cost for pens, each 25 cents
plus 2 cents per message character```
num-pens * (0.25 + (string-length(message) * 0.02))
end
Here’s the corresponding Python code:
def pen_cost(num_pens: int, message: str) -> float:
'''total cost for pens, each at 25 cents plus
2 cents per message character'''
return num_pens * (0.25 + (len(message) * 0.02))
Do Now!
What notational differences do you see between the two versions?
Here’s a summary of the differences:
Python uses
def
instead offun
.Python uses underscores in names (like
pen_cost
) instead of hyphens as in Pyret.The type names are written differently: Python uses
str
andint
instead ofString
andNumber
. In addition, Python uses only a single colon before the type whereas Pyret uses a double colon.Python has different types for different kinds of numbers:
int
is for integers, whilefloat
allows decimal and real numbers as well. Pyret just used a single type (Number
) for all numbers.Python doesn’t label the documentation string (as Pyret does with
doc:
).There is no
end
annotation in Python.Python labels the outputs of functions with
return
.
These are minor differences in notation, which you will get used to as you write more programs in Python.
Exercise
Convert the followingmoon-weight
function from Functions Practice: Moon Weight into Python:fun moon-weight(earth-weight :: Number) -> Number: doc:" Compute weight on moon from weight on earth" earth-weight * 1/6 end
23.2 Returning Values from Functions
In Pyret, a function body consisted of optional statements to name intermediate values, followed by a single expression. The value of that single expression is the result of calling the function. In Pyret, every function produces a result, so there is no need to label where the result comes from.
As we will see, Python is different: not all "functions" return
results (note the name change from fun
to
def
).In mathematics, functions have results by
definition. Programmers sometimes distinguish between the terms "function"
and "procedure": both refer to parameterized computations, but only
the former returns a result to the surrounding computation. Some
programmers and languages do, however, use the term "function" more
loosely to cover both kinds of parameterized computations. Moreover,
the result isn’t necessarily the last expression of the
def
. In Python, the keyword return
explicitly labels
the expression whose value serves as the result of the function.
Do Now!
Put these two definitions in a Python file.def add1v1(x: int) -> int: return x + 1 def add1v2(x: int) -> int: x + 1
At the Python prompt, call each function in turn. What do you notice about the result from using each function?
Hopefully, you noticed that using add1v1
displays an answer
after the prompt, while using add1v2
does not. This
difference has consequences for composing functions.
Do Now!
Try evaluating the following two expressions at the Python prompt: what happens in each case?
3 * add1v1(4)
3 * add1v2(4)
This example illustrates why return
is essential in Python:
without it, no value is returned, which means you can’t use the result
of a function within another expression. So what use is
add1v2
then? Hold that question; we’ll return to it in Modifying Variables.
23.3 Examples and Test Cases
In Pyret, we included examples with every function using where:
blocks. We also had the ability to write check:
blocks for more
extensive tests. As a reminder, here was the pen-cost
code
including a where:
block:
fun pen-cost(num-pens :: Number, message :: String) -> Number:
doc: ```total cost for pens, each 25 cents
plus 2 cents per message character```
num-pens * (0.25 + (string-length(message) * 0.02))
where:
pen-cost(1, "hi") is 0.29
pen-cost(10, "smile") is 3.50
end
Python does not have a notion of where:
blocks, or a
distinction between examples and tests. There are a couple of
different testing packages for Python; here we will use pytest
,
a standard lightweight framework that resembles the form of testing
that we did in Pyret.How you set up pytest and your test
file contents will vary according to your Python IDE. We assume
instructors will provide separate instructions that align with their
tool choices. To use pytest
, we put both examples
and tests in a separate function. Here’s an example of this for the
pen_cost
function:
import pytest
def pen_cost(num_pens: int, message: str) -> float:
'''total cost for pens, each at 25 cents plus
2 cents per message character'''
return num_pens * (0.25 + (len(message) * 0.02))
def test_pens():
assert pen_cost(1, "hi") == 0.29
assert pen_cost(10, "smile") == 3.50
Things to note about this code:
We’ve imported
pytest
, the lightweight Python testing library.The examples have moved into a function (here
test_pens
) that takes no inputs. Note that the names of functions that contain test cases must have names that start withtest_
in order forpytest
to find them.- In Python, individual tests have the form
assert EXPRESSION == EXPECTED_ANS
rather than theis
form from Pyret.
Do Now!
Add one more test to the Python code, corresponding to the Pyret testpen-cost(3, "wow") is 0.93
Make sure to run the test.
Do Now!
Did you actually try to run the test?
Whoa! Something weird happened: the test failed. Stop and think about that: the same test that worked in Pyret failed in Python. How can that be?
23.4 An Aside on Numbers
It turns out that different programming languages make different
decisions about how to represent and manage real (non-integer)
numbers. Sometimes, differences in these representations lead to
subtle quantitative differences in computed values. As a simple
example, let’s look at two seemingly simple real numbers 1/2
and
1/3
. Here’s what we get when we type these two numbers at a
Pyret prompt:
| |
|
| |
|
| |
|
| |
|
Notice that the answers look different for 1/3
. As you may (or
may not!) recall from an earlier math class, 1/3
is an example of a
non-terminating, repeating decimal. In plain terms, if we tried to
write out the exact value of 1/3
in decimal form, we would need
to write an infinite sequence of 3
. Mathematicians denote this
by putting a horizontal bar over the 3
. This is the notation we
see in Pyret. Python, in contrast, writes out a partial sequence of
3
s.
Underneath this distinction lies some interesting details about representing numbers in computers. Computers don’t have infinite space to store numbers (or anything else, for that matter): when a program needs to work with a non-terminating decimal, the underlying language can either:
approximate the number (by chopping off the infinite sequence of digits at some point), then work only with the approximated value going forward, or
store additional information about the number that may enable doing more precise computation with it later (though there are always some numbers that cannot be represented exactly in finite space).
Python takes the first approach. As a result, computations with the
approximated values sometimes yield approximated results. This is what
happens with our new pen_cost
test case. While
mathematically, the computation should result in 0.93
, the
approximations yield 0.9299999999999999
instead.
So how do we write tests in this situation? We need to tell Python
that the answer should be "close" to 0.93
, within the error
range of approximations. Here’s what that looks like:
assert pen_cost(3, "wow") == pytest.approx(0.93)
pytest.approx
, to
indicate that we’ll accept any answer that is nearly the value we
specified. You can control the number of decimal points of precision
if you want to, but the default of ± 2.3e-06
often suffices.23.5 Conditionals
Continuing with our original pen_cost
example, here’s the
Python version of the function that computed shipping costs on an
order:
def add_shipping(order_amt: float) -> float:
'''increase order price by costs for shipping'''
if order_amt == 0:
return 0
elif order_amt <= 10:
return order_amt + 4
elif (order_amt > 10) and (order_amt < 30):
return order_amt + 8
else:
return order_amt + 12
The main difference to notice here is that else if
is written
as the single-word elif
in Python. We use return
to
mark the function’s results in each branch of the conditional.
Otherwise, the conditional constructs are quire similarly across the
two languages.
You may have noticed that Python does not require an explicit
end
annotation on if
-expressions or functions. Instead,
Python looks at the indentation of your code to determine when a
construct has ended. For example, in the code sample for
pen_cost
and test_pens
, Python determines that the
pen_cost
function has ended because it detects a new
definition (for test_pens
) at the left edge of the program
text. The same principle holds for ending conditionals.
We’ll return to this point about indentation, and see more examples, as we work more with Python.
23.6 Creating and Processing Lists
As an example of lists, let’s assume we’ve been playing a game that involves making words out of a collection of letters. In Pyret, we could have written a sample word list as follows:
words = [list: "banana", "bean", "falafel", "leaf"]
In Python, this definition would look like:
words = ["banana", "bean", "falafel", "leaf"]
The only difference here is that Python does not use the list:
label that is needed in Pyret.
23.6.1 Filters, Maps, and Friends
When we first learned about lists in Pyret, we started with common
built-in functions such as filter
, map
, member
and length
. We also saw the use of lambda
to help us use
some of these functions concisely. These same functions, including
lambda
, also exist in Python. Here are some samples:
words = ["banana", "bean", "falafel", "leaf"]
# filter and member
words_with_b = list(filter(lambda wd: "b" in wd, words))
# filter and length
short_words = list(filter(lambda wd: len(wd) < 5, words))
# map and length
word_lengths = list(map(len, words))
Note that you have to wrap calls to filter
(and map
)
with a use of list()
. Internally, Python has these functions
return a type of data that we haven’t yet discussed (and don’t
need). Using list
converts the returned data into a list. If
you omit the list
, you won’t be able to chain certain
functions together. For example, if we tried to compute the length of
the result of a map
without first converting to a
list
, we’d get an error:
| |
|
Don’t worry if this error message makes no sense at the moment (we
haven’t yet learned what an "object" is). The point is that if you see
an error like this while using the result of filter
or
map
, you likely forgot to wrap the result in list
.
Exercise
Practice Python’s list functions by writing expressions for the following problems. Use only the list functions we have shown you so far.
Given a list of numbers, convert it to a list of strings "pos", "neg", "zero", based on the sign of each number.
Given a list of strings, is the length of any string equal to 5?
Given a list of numbers, produce a list of the even numbers between 10 and 20 from that list.
We’re intentionally focusing on computations that use Python’s built-in functions for processing lists, rather than showing you how to write you own (as we did with recursion in Pyret). While you can write recursive functions to process lists in Pyret, a different style of program is more conventional for that purpose. We’ll look at that in the chapter on Modifying Variables.
23.7 Data with Components
The analog to a Pyret data definition is called a dataclass in Python.Those experienced with Python may wonder why we are using dataclasses instead of dictionaries or raw classes. Compared to dictionaries, dataclasses allow the use of type hints and capture that our data has a fixed collection of fields. Compared to raw classes, dataclasses generate a lot of boilerplate code that makes them much lighterweight than raw classes. Here’s an example of a todo-list datatype in Pyret and its corresponding Python code:
# a todo item in Pyret
data ToDoItemData:
| todoItem(descr :: String,
due :: Date,
tags :: List[String]
end
------------------------------------------
# the same todo item in Python
# to allow use of dataclasses
from dataclasses import dataclass
# to allow dates as a type (in the ToDoItem)
from datetime import date
@dataclass
class ToDoItem:
descr: str
due: date
tags: list
# a sample list of ToDoItem
MyTD = [ToDoItem("buy milk", date(2020, 7, 27), ["shopping", "home"]),
ToDoItem("grade hwk", date(2020, 7, 27), ["teaching"]),
ToDoItem("meet students", date(2020, 7, 26), ["research"])
]
Things to note:
There is a single name for the type and the constructor, rather than separate names as we had in Pyret.
There are no commas between field names (but each has to be on its own line in Python)
There is no way to specify the type of the contents of the list in Python (at least, not without using more advance packages for writing types)
The
@dataclass
annotation is needed beforeclass
.
Other than these notational differences, this concept carries over nicely from Pyret to Python. (Creating datatypes with multiple cases, as we did in Pyret, is a bit harder in Python, so we’ll set that aside for now.)
23.7.1 Accessing Fields within Dataclasses
In Pyret, we extracted a field from structured data by using a dot (period) to "dig into" the datum and access the field. The same notation works in Python:
|
| |
|
23.8 Traversing Lists
23.8.1 Introducing For
Loops
fun sum-list(numlist :: List[Number]) -> Number:
cases (List) numlist:
| empty => 0
| link(fst, rst) => fst + sum-list(rst)
end
end
In Python, it is unusual to break a list into its first and rest
components and process the rest recursively. Instead, we use a construct called a for
to
visit each element of a list in turn. Here’s the form of for
,
using a concrete (example) list of odd numbers:
for num in [5, 1, 7, 3]:
// do something with num
num
here is of our choosing, just as with the names
of parameters to a function in Pyret. When a for
loop
evaluates, each item in the list is referred to as num
in
turn. Thus, this for
example is equivalent to writing the
following:
// do something with 5
// do something with 1
// do something with 7
// do something with 3
for
construct saves us from writing the common code
multiple times, and also handles the fact that the lists we are
processing can be of arbitrary length (so we can’t predict how many
times to write the common code).Let’s now use for
to compute the running sum of a list. We’ll
start by figuring out the repeated computation with our concrete list
again. At first, let’s express the repeated computation just in
prose. In Pyret, our repeated computation was along the lines of "add
the first item to the sum of the rest of the items". We’ve already
said that we cannot easily access the "rest of the items" in Python,
so we need to rephrase this. Here’s an alternative:
// set a running total to 0
// add 5 to the running total
// add 1 to the running total
// add 7 to the running total
// add 3 to the running total
my-running-sum
: Examples and Code, this framing might be familiar.Let’s convert this prose sketch to code by replacing each line of the
sketch with concrete code. We do this by setting up a variable named
run_total
and updating its value for each element.
run_total = 0
run_total = run_total + 5
run_total = run_total + 1
run_total = run_total + 7
run_total = run_total + 3
for
:run_total = 0
for num in [5, 1, 7, 3]:
run_total = run_total + num
for
in a function as we have done for
other examples earlier in this chapter. This is the final version.def sum_list(numlist : list) -> float:
"sum a list of numbers"
run_total = 0
for num in numlist:
run_total = run_total + num
return(run_total)
Do Now!
Write a set of tests for
sum_list
(the Python version).
Now that the Python version is done, let’s compare it to the original Pyret version:
fun sum-list(numlist :: List[Number]) -> Number:
cases (List) numlist:
| empty => 0
| link(fst, rst) => fst + sum-list(rst)
end
end
Here are some things to notice about the two pieces of code:
The Python version needs a variable (here
run_total
) to hold the result of the computation as we build it up while traversing (working through) the list.The initial value of that variable is the answer we returned in the
empty
case in Pyret.The computation in the
link
case of the Pyret function is used to update that variable in the body of thefor
.After the
for
has finished processing all items in the list, the Python version returns the value in the variable as the result of the function.
23.8.1.1 An Aside on Order of Processing List Elements
There’s another subtlety here if we consider how the two programs run:
the Python version sums the elements from left to right, whereas
the Pyret version sums them right to left. Concretely, the sequence of
values of run_total
are computed as:
run_total = 0
run_total = 0 + 5
run_total = 5 + 1
run_total = 6 + 7
run_total = 13 + 3
sum_list([list: 5, 1, 7, 3])
5 + sum_list([list: 1, 7, 3])
5 + 1 + sum_list([list: 7, 3])
5 + 1 + 7 + sum_list([list: 3])
5 + 1 + 7 + 3 + sum_list([list:])
5 + 1 + 7 + 3 + 0
5 + 1 + 7 + 3
5 + 1 + 10
5 + 11
16
+
in the
link
case can only reduce to an answer once the sum of the rest
of the list has been computed. Even though we as humans see the chain
of +
operations in each line of the Pyret unrolling, Pyret sees
only the expression fst + sum-list(rst)
, which requires the
function call to finish before the +
executes.In the case of summing a list, we don’t notice the difference between the two versions because the sum is the same whether we compute it left-to-right or right-to-left. In other functions we write, this difference may start to matter.
23.8.2 Using For
Loops in Functions that Produce Lists
Let’s practice using for
loops on another function that
traverses lists, this time one that produces a list. Specifically,
let’s write a program that takes a list of strings and produces a list
of words within that list that contain the letter "z"
.
As in our sum_list
function, we will need a variable to store
the resulting list as we build it up. The following code calls this
zlist
. The code also shows how to use in
to check
whether a character is in a string (it also works for checking whether
an item is in a list) and how to add an element to the end of a list
(pythin{append}).
def all_z_words(wordlist : list) -> list:
"produce list of words from the input that contain z"
zlist = [] // start with an empty list
for wd in wordlist:
if "z" in wd:
zlist = [wd] + zlist
return(zlist)
sum_list
, in that we
update the value of zlist
using an expression similar to what
we would have used in Pyret. For those with prior Python
experience who would have used zlist.append
here, hold that
thought. We will get there in Sharing List Updates.Exercise
Write tests for
all_z_words
.
Exercise
Write a second version of
all_z_words
usingfilter
. Be sure to write tests for it!
Exercise
Contrast these two versions and the corresponding tests. Did you notice anything interesting?
23.8.3 Summary: The List-Processing Template for Python
Just as we had a template for writing list-processing functions in
Pyret, there is a corresponding template in Python based on
for
loops. As a reminder, that pattern is as follow:
def func(lst: list):
result = ... # what to return if the input list is empty
for item in lst:
# combine item with the result so far
result = ... item ... result
return result
Keep this template in mind as you learn to write functions over lists in Python.
24 Modifying Variables
Our list-processing functions in Traversing Lists used
a new concept that might seem minor but actually has significant
consequences in programming: they modified the value of an existing
variable (the one holding the running result during the for
loop).
In terms of programming notation, there was nothing new. We first saw
the name = value
notation for associating names with values back in
Example: Similar Flags. What is new is that we just reused a
name, thus changing the value associated with that name. Back in
Pyret, we said that once your code associated a name with a value, you
couldn’t associate that same name with a different value. In Python,
we have broken this rule, using the same name to hold the
current value of the running total, a value which changes over
time as the program runs. Pyret also allows reusing names
through its shadow
construct, but we did not need that in any
of the programs in the Foundations section.
24.1 Modifying Variables in Memory
Let’s map this out at the level of the directory, reusing our code for summing a list from the last chapter. Here is the code again:
run_total = 0
for num in [5, 1, 7, 3]:
run_total = run_total + num
The first line makes an entry in the directory:
Directory
run_total –> 0
The for
loop also sets up a directory entry, this time for
the variable num
used to refer to the list elements. Thus,
the directory appears as:
Directory
run_total –> 0 num –> 5
Inside the for
loop, we compute a new value for
run_total
. The use of =
tells Python to modify the
value of run_total
in the directory.
Directory
run_total –> 5 num –> 5
This process continues: Python advances num
to the next list
element
Directory
run_total –> 5 num –> 1
run_total
Directory
run_total –> 6 num –> 1
What is the takeaway from this? There are two:
The
=
construct associates a name with a value. If the name is not already in the directory, it gets added. If the name is already in the directory, the new value replaces the old value. The old value is no longer accessible.for
loops also introduce a name into the directory, specifically the one the programmer chose to refer to the individual list elements. Python modifies the value of that name as part of processing thefor
loop.
The ability to modify the values associated with names probably doesn’t seem like a big deal just yet, but upcoming programs will show where this ability gets interesting.
Exercise
Draw the sequence of directory contents for the following program:
score = 0 score = score + 4 score = 10
Exercise
Draw the sequence of directory contents for the following program:
count_long = 0 for word in ["here", "are", "some", "words"]: if len(word) > 4: count_long = count_long + 1
24.2 Modifying Variables Associated with Lists
Having the ability to modify values helps us imagine more interesting applications. Think about our to-do list example from Data with Components. We learned how to create to-do items and created an example list of to-do items. In the real-world, our set of to-do items changes frequently. We add new items and remove completed ones. Sometimes, we have to search for a to-do item on our list (say to check its due date). In other words, the contents of our to-do list changes over time. If we want to keep our to-do list current, then, we’ll want to store it in a variable whose value we modify over time.
Let’s build up to managing to-do lists (which are lists containing dataclass items) by starting with a simpler example: managing votes cast through a poll, survey, or election. Each option will be a string. We’ll need functions that let people cast votes, as well as functions to tally the votes cast for specific options and functions to determine which option had the most votes. (We’ll return to the to-do list example in Modifying Structured Data.)
As a concrete example, let’s assume that students are responding to a poll about their favorite desserts. We’ll start by setting up an initial (empty) list of the votes that have been cast:
theVotes = []
This would create an entry in the directory:
Directory
theVotes –> []
Do Now!
Write a sequence of 2 commands: one to add "fruit" to
theVotes
and another to add "churros".
theVotes = ["fruit"] + theVotes
theVotes = ["churros"] + theVotes
Do Now!
Draw the directory after both of these modifications have been made.
Directory
theVotes –> ["churros", "fruit"]
Do Now!
Why are new items at the front of the list? What would need to change to put new votes at the end of the list instead?
Exercise
Write a function
count_votes_for
that takes the name of a dessert and returns the number of times that the given dessert appears intheVotes
.
Exercise
Imagine that you were now asked to determine which option received the most votes. Develop two different task plans (Task Plans) that would perform this computation. You don’t have to write the code, just the plans.
24.3 Writing Functions that Modify Variables
Our two modification commands are (not surprisingly) the same except for the vote that we are casting. Seeing as we expect many votes to be cast, it would make sense to turn this repeated code into a function. As we did in Pyret, we create a parameter name for the information that differs across the commands, and make the remainder of the commands the body of the function. Here’s the proposed code:
def cast_vote(item: str):
"add given item to the theVotes list"
theVotes = [item] + theVotes
If this function were the only code in your file, Python would not
accept it. It would complain that the theVotes
variable is
unbound (which means it isn’t associated with a name). We need to have
created an initial value of theVotes
so that it exists in
the directory. Earlier in this section, we started with an initial
empty list of votes. We have to include that in our file.
theVotes = []
def cast_vote(item: str):
"add given item to the theVotes list"
theVotes = [item] + theVotes
theVotes
is unbound. But we just defined
it. What could be the problem here?The problem is has to do with where the variable is
defined. Here, we are trying to modify the variable inside a
function, but we created the variable outside the
function. When we wrote sum_list
, the variable with the
running total was created and modified within the same function. Here, the
theVotes
variable is defined outside the function. Why
might this matter? The answer lies in the directory.
24.3.1 The global
annotation
Whenever a function gets called, the
programming language creates a new directory area with a space for
local information. If we create an empty list of votes then call
cast_vote("fruit")
, the directory has the following contents
just before we modify the list:
Directory
theVotes –> []
***** area for cast_vote ***** item –> "fruit" ******************************
sum_list
, for example,
we want to sum each list independently, without the results of summing
one list affecting the sum reported for another.Sometimes, however, we want data from one function call to
affect later calls. That is, in fact, the whole point of having a
variable where we maintain our current list of votes. We therefore
have to tell Python that we want to modify the value of the
theVotes
variable that is outside of the local
area. The Python keyword global
achieves that:
theVotes = []
def cast_vote(item: str):
"add given item to the theVotes list"
global theVotes
theVotes = [item] + theVotes
The global
annotation is used with the name of a variable
that was defined outside the function that we want this function to
modify. You can’t use this to modify the values of variables that were
introduced in other functions, just ones that were defined at the same
level as the functions in your file (the so-called "top" level).
24.4 Testing Functions that Modify global
Variables
We developed cast_vote
by making a function from two commands
with shared content. We didn’t write tests for it though. Let’s add a
testing function now.
Do Now!
Try to write a test case for
cast_vote
. Write down either a concrete test (inpytest
form) or write down a question whose answer would help you figure out how to write a test.
Following the form of tests that we developed in Examples and Test Cases, we expect to write something like
def test_cast_vote():
assert cast_vote("basbousa") == ...
...
? That should be the result we expect
from calling the function, but our cast_vote
function doesn’t
return anything! It simply modifies the variable. We could replace the
...
with None
(Python’s value for "nothing"), but
that doesn’t tell us anything about whether the function actually has
the effect that we expected. We need a different testing strategy for
functions that modify variables but don’t return anything.The strategy is to test for the effect that we expected, not the value that we expected.
Do Now!
What effect do we expect
cast_vote("basbousa")
to have? It modifiestheVotes
, so refer totheVotes
or"basbousa"
in your answer.
One claim that directly follows from the problem statement would be
"theVotes
should contain "basbousa"
". Can we turn
this into code to form a test case? Yes, and here’s the code:
def test_cast_vote():
cast_vote("basbousa")
assert "basbousa" in theVotes
This example shows the essence of testing functions that modify
variables. Rather than put the function call in the assert
statement, we call the function first, then write an assert
that uses the variable itself to check for the expected change.
Do Now!
Are there other expected changes that we should check? Can you come up with a
cast_vote
function that would pass this test but not do what we expected?
Imagine that someone wrote the following cast_vote
function
instead:
theVotes = []
def cast_vote(item: str):
"add given item to the theVotes list"
global theVotes
theVotes = [item]
This modifies theVotes
to include the given item, but it
accidentally throws away the rest of the list! Our current
test_cast_vote
would approve of this version, which suggests
that additional tests are needed.
Do Now!
What other expectations do you have about
theVotes
after the function runs? Can you capture those asassert
statements about the variable?
One expectation we might have is that theVotes
would be 1
element longer after the call than it was before the call. Here’s how
to augment test_cast_vote
to include this assertion:
def test_cast_vote():
pre_length = len(theVotes)
cast_vote("basbousa")
assert "basbousa" in theVotes
assert len(theVotes) == pre_length + 1
The strategy here is to use a local variable to save the length of
theVotes
before calling cast_vote
, then use that
saved value to check the expected impact after the call.
Do Now!
Are there other impacts to consider? Could a program satisfy our current two tests and still not do what we expect?
If we start imagining worst-case scenarios, we could envision a
malicious voting machine that doubles the impact of a vote by removing
an old vote and replacing it with two copies of the newly cast
vote. Here’s code that wouild do that (the list[1:]
returns
the rest of list
. in general list[x:y]
takes the sublist from
index x
up to but not including index y
).
def cast_vote(item: str):
"add given item to the theVotes list"
global theVotes
theVotes = [item] + [item] + theVotes[1:]
Okay, yes, this could happen, but isn’t this sort of idea generation getting a little excessive? Yes and no. If you are writing test cases for a voting machine, you absolutely should be thinking about situations like this (because people trying to influence elections might try them). The ability to think about how adversaries might abuse a system is a valuable skill (and you can build a career around doing this well). But it does feel a bit open-ended, especially in a first programming course.
Fortunately, we have a much more systematic way to raise the
robustness of our tests. Recall that we asked you to write a function
called count_votes_for
as an exercise. Since both
cast_vote
and count_votes_for
work with the
theVotes
variable, we should check that the results of
count_votes_for
are what we would expect after
cast_vote
modifies the variable.
Do Now!
Modify
test_cast_vote
to also check that the results ofcount_votes_for
change as expected.
Here’s a revised test function:
def test_cast_vote():
pre_length = len(theVotes)
pre_count = count_votes_for("basbousa")
cast_vote("basbousa")
assert "basbousa" in cast_vote
assert len(theVotes) == pre_length + 1
assert count_votes_for("basbousa") == pre_count + 1
To be careful, we also should test that the vote count has not
changed for a dessert other than "basbousa"
.
Do Now!
Add a test for a dessert other than
"basbousa"
.
One challenge here is that we don’t know what other votes have been cast. From the perspective of this test, only one vote gets cast, so a test for any other dessert might only check desserts that aren’t in the list.
A better test function would set theVotes
to have specific
contents, to give us more data to work with.
def test_cast_vote():
theVotes = ["fruit", "churros", "fruit", "basbousa"]
pre_length = len(theVotes)
pre_count = count_votes_for("basbousa")
pre_fruit = count_votes_for("fruit")
cast_vote("basbousa")
assert "basbousa" in cast_vote
assert len(theVotes) == pre_length + 1
assert count_votes_for("basbousa") == pre_count + 1
assert count_votes_for("fruit") == pre_fruit
Do Now!
This code has fixed the content of
theVotes
. This suggests that we no longer need to save information about the results of functions before callingcast_vote
. We could edit theassert
statements to use manually-computed values (such as3
in place ofpre_length
). Is this a good idea?
You certainly could remove the pre_
definitions and just use
concrete numbers in the assert
statements. As you are getting
started, you may prefer to do that. However, the benefit to the
current setup is that we could modify the initial value of
theVotes
(if we think of another important case to check)
without then having to modify all of the assert
statements. The current setup also lets someone else read your
assert
statements and clearly understand the effect being
tested: having the right side of ==
say pre_count +
1
is more informative than a concrete number like
2
. Remember: examples and code are meant to be read by other people!
24.4.1 The Internal Structure of a Test Function
This testing function seems to be getting a bit long. If we look closely, however, we’ll see it has some structure to it, which we’ll highlight by adding some blank lines and labeling strings between key testing tasks. (Remember that Python uses strings as a commenting mechanism.)
def test_cast_vote():
"SETUP"
theVotes = ["fruit", "churros", "fruit", "basbousa"]
"SAVE CURRENT VALUES"
pre_length = len(theVotes)
pre_count = count_votes_for("basbousa")
pre_fruit = count_votes_for("fruit")
"PERFORM MODIFICATIONS"
cast_vote("basbousa")
"CHECK EFFECTS"
assert "basbousa" in cast_vote
assert len(theVotes) == pre_length + 1
assert count_votes_for("basbousa") == pre_count + 1
assert count_votes_for("fruit") == pre_fruit
The first section sets up our data. The second saves any current
values that are needed for the tests. The third makes the function
calls that we want to test. The last checks the effects with the
assert
statements. Viewed this way, the long function seems
more manageable. The labels also remind us of the key tasks that go
into testing functions that modify variables.
24.4.2 Takeaways on Testing Modifications
What are the takeaways from all of this?
When testing a function that modifies variables, we test assertions about the effects of the function.
A systematic way to identify effects to test is to leverage other functions that use the variable that’s being modified. Testing basic properties like length can also be useful.
There are four tasks involved in testing a function that modifies variables: setting up the data, saving current values, calling the function to test, and checking the effects.
Does it feel like testing has gotten more complicated now that variables are being modified? It has. When there are no variables that are shared and modified across functions, we can test each function individually. But once one function modifies the value of a variable that other functions also use, we have to make sure that the modifying function isn’t breaking the expected behavior of the others. Variables are powerful, but that power comes with responsibility.
Strategy: Testing a Function that Modifies Data
In prose, write down the effects that you expect the modification to have, both on traits of the variable’s value and on the results of other functions that use the variable. Remember to consider situations that should not be effected by the modification.
Create a test-function skeleton with areas for setup, saving current values, calling the function, and checking the effects.
In the setup section, set the value of the variable to a concrete value for testing.
In the calling-the-function section, write the function call that you want to test.
Turn each of the effects that you want to check into an
assert
statement, naming and storing computations of any values from before the test was run in the saving-current-values section.
25 Modifying Structured Data
In Modifying Variables, we learned how to modify variables, working with a list of votes. We deferred managing and modifying to-do items, as introduced inef["python-data-with-components"]. We now return to managing to-do lists.
As a reminder, here is the dataclass for ToDoItem
, along with
examples of the data.
@dataclass
class ToDoItem:
descr : str
due : date
tags : list
milk_item = ToDoItem("buy milk", date(2020, 7, 27), ["shopping", "home"])
grading_item = ToDoItem("grade hwk", date(2020, 7, 28), ["teaching"])
paper_item = ToDoItem("meet students", date(2020, 7, 26), ["research"])
todo_list = [grading_item, milk_item, paper_item]
Managing a to-do list involves two different kinds of modifications: modifying the list of items (which we introduced in Modifying Variables) and modifying details of individual entries (such as revised due dates). We’ll start with the latter.
25.1 Modifying Fields of Structured Data
In Python, we modify a field of a dataclass by using = to give a new value to the field within the value. For example:
milk_item.due = date(11, 15, 2020)
paper_item.descr = "meet Jack"
Once we do this, the original value is gone, replaced by the new value. We can see this by accessing the field (or by looking in the dictionary window if your programming environment has that feature):
print(milk_item.due)
Do Now!
Check the value of
todo_list
. Doesmilk_item
have the original date or the new date?
milk_item
, which was in the
list, todo_list
shows the new date.Let’s now write a function to modify the date in a ToDoItem
.
def modify_duedate(forItem : ToDoItem, new_date : date):
"""change due date on given item to the new date"""
forItem.due = new_date
What would the test for this look like? In Takeaways on Testing Modifications we discussed how to write tests for functions that modify data and variables. How should those lessons apply when we are modifying fields of data? As with variable modifications, we are testing the effect of a function, so the same rules apply. Let’s apply the testing strategy from that section:
Do Now!
What effects do you expect
modify_duedate
to have? Remember to include situations that should not be effected.
For modify_duedate
, we expect the due
field to
change, but the other fields of the given item should remain
unchanged. We haven’t written any other functions that use due dates,
so the field tests will suffice for now. To reinforce the parts of the
test, we include the testing task labels (but these would not normally
be included unless you find them useful).
def test_modify_duedate():
"SETUP"
item = ToDoItem("register", date(2020, 11, 9), ["school"])
"PERFORM MODIFICATIONS"
modify_duedate(item, date(2020, 11, 15))
"CHECK EFFECTS"
assert item.due == date(2020, 11, 15)
assert item.descr == "register"
assert item.tags == ["school"]
Here, we wrote a test that used concrete values from the original object rather than include a "save current values" section. Here, the current values are simply basic data from our setup data, not the result of other functions. In this case, the concrete-values approach better matches the complexity of the testing task.
25.2 Modifications to Shared Data
Imagine that you have a very important phone call to make, so you made multiple to-do items to make sure you don’t miss it. By the time you made the third item, you were tired of typing, so you decided to reuse the definition of the second one, just with a new name:
call1 = ToDoItem("call Fred", date(2020, 11, 19), ["urgent"])
call2 = ToDoItem("call Fred", date(2020, 11, 19), ["urgent"])
call3 = call2
Do Now!
Draw the program directory for this code
Directory
call1 –> ToDoItem("call Fred", date(2020, 11, 19), ["urgent"]) call2 –> ToDoItem("call Fred", date(2020, 11, 19), ["urgent"]) call3 –> ToDoItem("call Fred", date(2020, 11, 19), ["urgent"])
The directory indicates that all three calls are the same. We can confirm this by running the following commmads:
| |
|
| |
|
Fred is unexpectedly out of the office, so now you need to call Tina instead. You decide to modify the call reminders to reflect this change.
Do Now!
Write a command to change the description within
call1
to"call Tina"
instead of"call Fred"
(we’ll get to the other two in a moment).
With what we have learned, we might have written either of the following two commands:
call1 = ToDoItem("call Tina", date(2020, 11, 19), ["urgent"])
call1.descr = "call Tina"
In all three cases, the resulting program directory now associates
call1
with a call to Tina. The first one creates an
entirely new ToDoItem
and modifies the value of
call1
to this new item. The second one modifes a field within
the existing ToDoItem
. The same value prints out in each
case, so the difference doesn’t seem important.
Let’s similarly update call2
and call3
.
Do Now!
What do you expect to be printed if you run the following sequence of commands?
call2
call3
call2 = ToDoItem("call Tina", date(2020, 11, 19), ["urgent"])
call2
call3
After you’ve written down your hypothesis, run the commands and see what happens. Is it what you expected?
Hopefully you observed that call2
and call3
no
longer have the same value. On the one hand, this seems reasonable: we
only explicitly updated call2
. On the other hand, we appeared
to define a relationship between call2
and call3
when we wrote call2 = call3
, and we might have expected that
relationship to be maintained on update.
Let’s repeat the exercise, this time using one of our other
approaches. Reset call1
, call2
and call3
to
their original values and try the second of our possible modification
commands.
Do Now!
Evaluate the following commands. What do you notice about the results?
>>> call1 = ToDoItem("call Fred", date(2020, 11, 19), ["urgent"]) >>> call2 = ToDoItem("call Fred", date(2020, 11, 19), ["urgent"]) >>> call3 = call2 >>> call2 >>> call3 >>> call2.descr = "call Tina" >>> call2 >>> call3
Hopefully you now noticed that call2
and call3
were
both modified, even though we only explicitly modified
call2
. To be safe, let’s check the value of call1
.
Do Now!
Evaluate
call1
and compare its value to that ofcall2
:>>> call1 >>> call2
Modifying call2
seems to have only affected call3
,
not call1
. This suggests that the relationship between
call2
and call3
has indeed been recorded, and that
some modification approaches maintain that relationship while others
don’t. But what’s actually going on? This seems like the kind of
question that we usually turn to the program directory to answer.
As it turns out, there is more to the directory than we have explained
thus far. The directory organization that we have used until now
doesn’t reflect the relationship between call2
and
call3
. We need to refine our directory to capture connected
variables, then return to the question of modifying data.
25.3 Understanding Memory
Every time you use a constructor to create data, your programming environment stores in the memory of your computer. Memory consists of a (large) number of slots. Your newly-created datum goes into one of these slots. Each slot is labeled with an address. Just as a street address refers to a specific building, a memory address refers to a specific slot where a datum is stored. Memory slots are physical entities, not conceptual ones. A computer with a 500GB hard drive has 500 billion slots in which it can store data. Not all of that memory is available to your programming environment: your web browser, applications, operating system, etc all get stored in the memory. Your programming environment does get a portion of memory to use for storing its data. That portion is called the heap.
When you write a statement like
call1 = ToDoItem("call Fred", date(2020, 11, 19), ["urgent"])
ToDoItem
into a
physical slot in the heap, then associates the address of that
slot with the variable name in the directory. The name in the
directory doesn’t map to the value itself, but rather to the
address that holds the value. The address bridges
between the physical storage location and the conceptual name you want
to associate with the new datum. In other words, our directory really
looks like:Directory
call1 –> 1001
Heap
1001 :
ToDoItem("call Fred" ...)
This revised directory-plus-heap information explains what we observed
earlier about call2
and call3
. Making a new
ToDoItem
and naming it call2
updates the directory
and heap as follows:
Directory
call1 –> 1001 call2 –> 1002
Heap
1001 :
ToDoItem("call Fred" ...)
1002 :
ToDoItem("call Fred" ...)
What happens now when we evaluate call3 = call2
? This
statement evaluates just as it did before: we create a directory entry
for call3
and associate it with whatever call2
maps
to in the directory:
Directory
call1 –> 1001 call2 –> 1002 call3 –> 1002
Heap
1001 :
ToDoItem("call Fred" ...)
1002 :
ToDoItem("call Fred" ...)
With the heap articulated separately from the directory, we now see
the relationship between call3
and call2
: they
refer to the same address, which in turn means that they refer to the
same value. call1
, however, refers to a different address
(that just happens to contain a similar value).
25.4 Modifications to Structured Data, Revisited
With the directory/heap distinction in hand, let’s revisit our two
candidate approaches to modifying call2
to call Tina instead
of Fred. As a reminder, here was our first approach:
call2 = ToDoItem("call Tina", date(2020, 11, 19), ["urgent"])
call2
mentioning Tina but not
call3
. Let’s reconstruct the behavior in the directory and
heap.The use of the ToDoItem
constructor creates a new datum in
the heap, storing it in an unused address. The assignment
call2 =
associates call2
with the address of the
new datum. Thus, we’d end up with the following directory/heap structure:
Directory
call1 –> 1001 call2 –> 1003 call3 –> 1002
Heap
1001 :
ToDoItem("call Fred" ...)
1002 :
ToDoItem("call Fred" ...)
1003 :
ToDoItem("call Tina" ...)
call2
has been changed in the directory to
refer to 1003. The directory entry for call3
was not
affected (it’s still 1002).In contrast, return to the original directory/heap, and consider what happens if we instead perform the modification via
call2.descr = "call Tina"
There is no call to a constructor on the right side of the =
,
so new item gets created in the heap. On the left, we see
call2.descr
. We said earlier that we could read the
.
as “dig into”. At the level of memory, this means
“find the value associated with call2
in the heap and go to the
descr
field within the datum”. Because of the .
,
we update the descr
field within the heap itself. In
other words, we end up with the following directory/heap contents:
Directory
call1 –> 1001 call2 –> 1002 call3 –> 1002
Heap
1001 :
ToDoItem("call Fred" ...)
1002 :
ToDoItem("call Tina" ...)
call2
and call3
, which refer to the same
address 1002, get the modified description.The takeaway from this is that =
behaves differently
depending on whether the left side is a plain variable name or a field
reference off of a variable name. The former modifies the
directory. The latter modifies the heap. Under the latter, all
variables or fields that map to the same address will be affected by a
change to one of them. Under the former, only the value of the named
variable changes. Whether you want a change to affect only the current
variable or several is a design decision that we will practice making
going forward.
25.5 Basic Data in Memory
We have shown how to place structured data in our new directory/heap structure, but what about basic data? For example, what if our original program to set up the call items also included the following two lines:
x = 4
y = 5
The corresponding directory/heap contents would be as follows:
Directory
call1 –> 1001 call2 –> 1002 call3 –> 1002 x –> 4 y –> 5
Heap
1001 :
ToDoItem("call Fred" ...)
1002 :
ToDoItem("call Tina" ...)
This shows that basic data live in the directory, not the heap. The whole point of structured data is that they have both their own identity and multiple components. The heap gives access to both concepts. Basic data can’t be broken down (by definition). As such, there is nothing lost by putting them only in the directory.
What about strings, however? We’ve referred to them as basic data until now, but doesn’t they have components, in terms of the sequence of characters that make up the string? Yes, that is technically accurate. However, we are treating strings as basic data because we aren’t using operations that modify that sequence of strings. This is a fine point, one that usually comes up in later CS courses. This book will leave strings in the directory, but if you are writing programs that modify the internal characters, put them in the heap instead.
Do Now!
Edit the directory and heap to show what happens if we run the following statement:
x = 10
call1
, we evaluate the
right side of the =
and store the result in the
directory. Thus, x --> 10
would be in the directory.Do Now!
Edit the directory and heap to show what happens if we run the following statement:
x = y
Follow the same rule: evaluating y
yields value 5
,
which we then associate with x
, yielding the following:
Directory
call1 –> 1001 call2 –> 1002 call3 –> 1002 x –> 5 y –> 5
Heap
1001 :
ToDoItem("call Fred" ...)
1002 :
ToDoItem("call Tina" ...)
Do Now!
If we now evaluate
y = 3
, does the value ofx
change?
It does not. The value associated with y
in the directory
changes, but there is no connection between x
and y
in the directory. The =
command does associate the value
mapped to y
with x
, but that is not the same as
having the two variables track each other.
This is one of the most common misconceptions in introductory
programming. Statements like var1 = var2
do not set up
relationships that persist between variables over time. Moreover,
these statements are not commutative, meaning that var1 =
var2
and var2 = var1
do not make the same modifications to
the directory. The variable on the left of =
is reassigned to
a new value in the directory, and that value comes from evaluating
whatever is on the right. If the expression on the right of =
is just a variable, then the value is retrieved from the directory and
associated with the variable on the left.
Is there any way to get two variables to track each other’s values?
No, but as we saw with call2
and call3
, it is
possible for two variables to reference the same structured datum, and
thereby share modifications. We’ll see more examples of this as we go.
25.6 Variables and Equality
Let’s return one last time to our original collection of calls:
Directory
call1 –> 1001 call2 –> 1002 call3 –> 1002
Heap
1001 :
ToDoItem("call Fred" ...)
1002 :
ToDoItem("call Fred" ...)
Do Now!
If someone asked you whether
call2
andcall3
are the same, what would you say? What about if they asked aboutcall3
andcall1
?
On the one hand, all three variables refer to ToDoItem
s with
the same contents. But only two of them are at the same
address. When we say “equals”, which one do we mean?
Both notions are useful, and indeed most languages will give you two
different equality operators: one for equality of address, and one for
equality of (component) values. In Python, we use is
for
address equality and ==
for equality of values. If you did
Re-Examining Equality, you saw this difference in Pyret as <=>
versus
==
. (If you did not do anything in Re-Examining Equality, you weren’t exposed
to this concept in Pyret.)
As we go forward, you’ll get more practice with when to use each kind
of equality. The ==
operator is more accepting, so it is
usually the right default. If you actually need to know whether two
expressions refer to the same address, you can switch to is
.
25.7 Takeaways on Modifying Variables
Strategy: Rules for updating the directory andthe heap
Summarizing, the rules for how the directory and memory update are as follows:
We add to memory when a data constructor is used
We update memory when a field of existing data is reassigned
We add to the directory when a name is used for the first time (this includes parameters and internal variables when a function is called)
We update the directory when a name that is already in the directory is reassigned to a different value)
Ultimately, this book is trying to teach you how to work with various kinds of information that might show up in a real project. Up until now, we’ve focused on the structure of information (tables vs lists vs datatypes/dataclasses) and how to write programs to process those structures.
But now, we’re starting to consider an additional question: does any of our data need to be shared across different parts of the program, such that multiple parts of the program might need to change the data and see the changes? In this chapter, we saw how dataclasses work when they are shared. Can we also share lists? What does that look like? We turn to that, as well as how lists are stored in the heap, in the next chapter.
26 Revisiting Lists and Variables
26.4.2 Revisiting Variables: A Function to Create Accounts for New Customers |
26.1 Sharing List Updates
In Modifying Structured Data, we saw that we can have two names that refer to the same dataclass value in memory. In this way, changes made to one are visible in the other. Put differently, we can have two names that refer to the same datum. This same ability could also be useful for lists. For example, you and a roommate might share a shopping list, the registrar and the financial aid office might share a list of students, and so on. In both examples, an list update made by one person or office should be visible to the other. How do we implement this in Python?
More concretely, assume that Shaunae and Jonella are roommates who take turns shopping for groceries. Each of them has access to a shared shopping list (through their phones): any time one of them adds or removes an item, the other person should see the change. How could we mock this up in code?
Do Now!
Create a shopping list for Shaunae with a couple of items, then give the same list a second directory name corresponding to Jonella.
You might have written something like the following:
shaunae_list = ["bread", "coffee"]
jonella_list = shaunae_list
Do Now!
Jonella realizes they are also out of eggs. Run an command at the prompt to add
"eggs"
to Jonella’s list, then make sure"eggs"
appears under both names.
For
Loops in Functions that Produce Lists, perhaps you
wrote something like>>> jonella_list = ["eggs"] + jonella_list
>>> jonella_list
["eggs", "bread", "coffee"]
>>> shaunae_list
["bread", "coffee"]
+
to
combine two lists, Python creates a new list out of the elements of
the existing lists. Since the left side of the =
is a name in
the directory, Python modifies the directory to refer to this new
list. It does not maintain the relationship between
jonella_list
and shaunae_list
.Surely sharing lists is a common enough pattern that there is a way to do this, however. There is, but it requires us to work with lists in a different way than we first saw in Processing Lists.
26.1.1 Operations that Mutate Lists
So far, we have used +
to append two lists together. This
operation creates a new list. Python offers a different operation
(called append
) for modifying an existing list to include a
new element. Here’s a different way we could have added
"eggs"
to Jonella’s list.
>>> jonella_list.append("eggs")
>>> jonella_list
["bread", "coffee", "eggs"]
>>> shaunae_list
["bread", "coffee", "eggs"]
jonella_list
in
the directory, follows the .
over to the list datum in the
heap, then operates on the list itself to add "eggs"
. Unlike
in the previous version, we don’t use =
to assign a new value
to jonella_list
. The append
updates the list in the
heap. Since jonella_list
and shaunae_list
refer to
the same address in memory, the addition via jonella_list
also shows up when we access the list through the name
shaunae_list
.Similarly, if Shaunae buys "coffee"
, she can remove it from
the shared list using the remove
operator:
>>> shaunae_list.remove("coffee")
>>> jonella_list
["bread", "eggs"]
>>> shaunae_list
["bread", "eggs"]
The point here is that both append
and remove
modify
the contents of the list in the heap. If two names refer to the same
heap address, changes made through one name appear through the other.
Does this mean we always want to modify lists using append
and remove
? Not at all! Sometimes, we will want to build new
lists rather than modifying old ones (we’ll see some cases of this
later). If you want a modification to be visible through all names
that refer to a list, use append
. If you only want the
addition of an element to be visible through one name, use +
.
26.2 Lists in Memory
In Modifying Structured Data, we drew directory-heap diagrams
to show how dataclass values appear in the heap. What do lists look
like in the heap? In Python, lists are stored by putting a
dataclass-like datum into memory that contains the number of elements
in the list, followed by individual elements in the subsequent
addresses. For example, here’s what Shaunae and Jonella’s shopping
lists look like after adding "eggs"
:
Directory
shaunae_list –> 1001 jonella_list –> 1001
Heap
1001 :
List(len:3)
1002 :
"bread"
1003 :
"coffee"
1004 :
"eggs"
There are no directory entries to the list elements, but the fact that
the List
datum reflects size 3
tells us that the
next three addresses are part of the list.
Do Now!
Work out the memory diagram for the following program:
scores = [89,72,92] colors = ["blue", "brown"] scores.append(83)
Does this raise any questions?
After defining the first two lists, your memory diagram should have looked like (perhaps with a different starting address):
Directory
scores –> 1005 colors –> 1009
Heap
1005 :
List(len:3)
1006 :
89
1007 :
72
1008 :
92
1009 :
List(2)
1010 :
"blue"
1011 :
"brown"
What now happens when we append 83
to the scores
list? Since lists are supposed to have all of their elements in
consecutive memory slots, 83
should go into
@1009
, but that address is already occupied! We shouldn’t
destroy the colors
list just because we added to the
scores
list.
In practice, lists are created with some extra space to allow for later additions. Here’s a more accurate diagram:
Directory
scores –> 1005 colors –> 1014
Heap
1005 :
List(len:3, slots:8)
1006 :
89
1007 :
72
1008 :
92
1009 :
None
1010 :
None
1011 :
None
1012 :
None
1013 :
None
1014 :
List(len:2, slots:8)
1015 :
"blue"
1016 :
"brown"
1017 :
None
1018 :
None
1019 :
None
1020 :
None
1021 :
None
1022 :
None
The None
value is what Python uses to indicate "there’s
nothing here". Note that the List
datum has also changed: it
how has two fields, one for the number of elements, and another for
the number of slots. We also modified the directory to reflect the new
address for the scores
list.
There’s now room to perform the append
on the scores
list. The 83
would get inserted into @1009
and
the len
field of the List
would get updated to
4
.
Couldn’t a list still run out of room though? Absolutely. When that happens, the whole list gets moved to a new sequence of addresses with more room. The details of this are a more advanced topic. For now, the key takeaway for you is that lists are arranged in consecutive memory addresses. This layout will be particularly relevant when we get to Hashtables and Dictionaries.
26.3 Practice: Data for Shared Bank Accounts
Let’s apply what we’ve been learning about sharing modifications to a real-world problem: having bank accounts shared by multiple people (such as partners or spouses). A bank wants to allow multiple customers to manage the same account (meaning each person deposits to or withdraws from the same shared balance). Our task is to define data structures for managing bank customers and their (shared) accounts.
How might you set up the data about customers and their accounts? To keep things simple, we’ll limit the information about customers to their names (leaving out phone numbers, etc), and information about accounts to their balances (leaving out interest rates, etc).
In a full banking system, we might want to allow customers to be part of multiple accounts as well. This would suggest the following data structure:
@dataclass
class Account:
id : int
balance : int
owners : lst # of Accounts
@dataclass
class Customer:
name : str
accts : lst # of Accounts
You can have these kinds of circular connections, in which customers connect to accounts which connect to customers. In the interest of letting us focus on basic sharing and memory, however, let’s simplify our data as follows: accounts don’t track who their customers are, and each customer has only one account:
@dataclass
class Account:
id : int
balance : int
@dataclass
class Customer:
name : str
acct : Account
Do Now!
We could consider making the
acct
field hold an ID number, rather than anAccount
datum? Should we do this? Why or why not?
Do Now!
Write an expression that creates a new
Customer
named Tina with a newAccount
. Give the new account ID number1
and initial balance of100
.
Do Now!
Assume we gave the name
t_cust
to theCustomer
datum created in the previous exercise. Draw the memory diagram that containst_cust
and the new data values.
In code, you should have written:
t_cust = Customer("Tina", Account(1, 100))
Directory
t_cust –> 1015
Heap
1015 :
Customer("Tina", Account(1, 100))
Do Now!
Tina wishes to deposit $50 to her account. Write a statement using
t_cust
to perform the deposit. Also show the changes that this makes to the dictionary-memory diagram.
t_cust.acct.balance = t_cust.acct.balance + 50
t_cust
(rather than to the
t_cust
variable itself), we make the update in memory
(specifically, to the balance
field).Directory
t_cust –> 1015
Heap
1015 :
Customer("Tina", Account(1, 150))
Now, let’s try a more complicated situation. Maria and Jorge are new customers who want to share an account. Sharing means that either of them can make deposits or withdrawals and both will see any changes to their shared balance.
Do Now!
Draw a memory diagram with
Customer
data for each of Maria and Jorge and a sharedAccount
with a balance of250
. Don’t write the code, just draw the diagram that the code should produce.
You should have ended up with something like the following (perhaps
with a name for the Account
, perhaps not (either works).
Directory
m_cust1 –> 1016 j_cust1 –> 1017
Heap
1015 :
Account(id:2, balance:250)
1016 :
Customer("Maria", (loc 1015))
1017 :
Customer("Jorge", (loc 1015))
Now, we need to figure out the code to produce this diagram.
Do Now!
Here are four possible proposals for code to create the shared account. Which one(s) produce the desired memory diagram? You might find it useful to draw the memory diagrams for each approach and compare them to the one that we want.
Version 1
m_cust1 = Customer("Maria", Account(2, 250)) j_cust1 = Customer("Jorge", Account(2, 250))
Version 2
m_cust2 = Customer("Maria", Account(2, 250)) j_cust2 = Customer("Jorge", m_cust2.acct)
Version 3
new_acct = Account(2, 250) m_cust3 = Customer("Maria", new_acct) j_cust3 = Customer("Jorge", new_acct)
Version 4
init_bal = 250 m_cust4 = Customer("Maria", Account(2, init_bal)) j_cust4 = Customer("Jorge", Account(2, init_bal))
We want to end up with a single account in the heap. Versions 1 and 4
create two separate Account
data, so we can rule them
out. The difference between versions 2 and 3 lies in the directory:
version 3 names the shared Account
in the directory, while
version 2 does not. Either way, the Account
is accessible
from the directory (by going through the m_cust
and
j_cust
variables.
Do Now!
Let’s assume we set up the accounts with version 3. Jorge wants to make a deposit of
100
. Which of the following lines of code are appropriate if Maria should be able to access the deposited funds?1. new_acct = Account(2, new_acct.balance + 100) 2. new_acct.balance = new_acct.balance + 100 3. j_cust3.acct = Account(2, new_acct.balance + 100) 4. j_cust3.acct.balance = new_acct.balance + 100 5. j_cust3.acct.balance = j_cust3.acct.balance + 100
For the deposit to be visible to Maria, we have to modify the account that Maria shares with Jorge. We can’t make a new account, so lines 1 and 3 are not appropriate.
Whether lines 2 and 4 work depends on whether new_acct
still
refers to the memory location of Jorge’s (and Maria’s) account. If it
does, both lines work (though the intent isn’t as clear with line 2,
which doesn’t indicate that the goal is to update Jorge’s
balance). But if new_acct
now refers elsewhere, neither line
works: line 2 puts the adjusted balance in the wrong account, while
line 4 may compute the new balance from the wrong existing balance.
Line 5 works fine.
26.4 Circular References
At the start of this chapter, we said that we wanted each
Account
to also contain a reference to the
Customer
(s) who own the account. We deferred setting up this
reference, but return to it now. Here are the desired dataclasses (to
keep things simple, we’ll allow a Customer
to have only one
Account
.
@dataclass
class Account:
id: int
balance: int
owners: list # of Customer
@dataclass
class Customer:
name: str
acct: Account
Let’s create a new Customer
with a new Account
using
these dataclasses:
new_acct = Account(5, 150, Customer("Elena", __________))
How do we fill in the blank in the Customer
? We’d like to say
new_acct
but Python (and most other languages) will raise an
error that new_acct
isn’t defined. Why is that?
When given this assignment, Python first evaluates the right side, to
get the value or memory location that should be stored in the
directory for new_acct
. If we filled in the blank with
new_acct
, Python would start by running:
Account(5, 150, Customer("Elena", new_acct))
To do this, it needs to look up new_acct
in the directory, but
that name isn’t in the directory yet (it only goes in after we
compute the value to store for that name). Hence the error.
To get around this, we leverage the ability to update the contents of
memory locations after names for data are in place. We’ll create the
Account
partially, but without filling in the
Customer
. Then we create the Customer
to reference
the new Account
. Then we update the Account
owners with the
now-created Customer
:
new_acct = Account(5, 150, []) # note the empty Customer list
new_cust = Customer("Elena", new_acct)
new_acct.owners = [new_cust]
Note here that each part gets a spot in memory and an entry in the
directory, but the datum hasn’t been finished yet. Once we have the
datum set up in memory though, we can update the owners
field to
the correct value.
Here’s what this looks like at the level of memory and the directory after running the first two lines:
Directory
new_acct -> 1016 new_cust -> 1017
Heap
1015 :
[]
1016 :
Account(5, 150, @1015)
1017 :
Customer("Elena", @1016)
Then, when we run the third line, we create a new list containing
new_cust
and update the owner list within new_acct
:
Directory
new_acct -> 1016 new_cust -> 1017
Heap
1015 :
[]
1016 :
Account(5, 150, @1018)
1017 :
Customer("Elena", @1016)
1018 :
[@1017]
Notice that the two owners lists each live in memory but aren’t
associated with names in the directory. They are only reachable going
through new_acct
, and after the update, the empty list isn’t
reachable at all.
Do Now!
Why did a new list get created (at address 1018) rather than have
new_cust
get added to the list at address 1015?
We used new_acct.owners = [new_cust]
to finish setting up the
owners
field, and the right-hand side of =
creates a
new list. If we had instead written
new_acct.owners.append(new_cust)
, then 1017 would
have gone into the list stored at 1015, as follows:
Directory
new_acct -> 1016 new_cust -> 1017
Heap
1015 :
[@1017]
1016 :
Account(5, 150, @1018)
1017 :
Customer("Elena", @1016)
Either approach works. However, not all languages give you an
operation like append
that modifies the contents of an
existing list (Pyret didn’t, at least not the portion that we saw). We
therefore chose to show you the more general approach here.
26.4.1 Testing Circular Data
When you want to write a test involving circular data, you can’t write out the circular data manually. For example, imagine that we wanted to write out new_acct from the previous examples:
test("data test", new_acct,
Account(5, 150, [Customer("Elena", Account(5, 150, ...)])
Because of the circularity, you can’t finish writing down the data. You have two options: write tests in terms of the names of data, or write tests on the components of the data.
Here’s an example that illustrates both. After setting up the account, we might want to check that the owner of the new account is the new customer:
test("new owner", new_acct.owner, new_cust)
Here, rather than write out the Customer
explicitly, we use the name
of the existing item in the directory. This doesn’t require you to write
ellipses. We also focused on just the owner
component, as a part of
the Account
value that we expected to change.
26.4.2 Revisiting Variables: A Function to Create Accounts for New Customers
What if we turned the sequence for creating dependencies between customers and their accounts into a function? We might get something like the following:
def create_acct(new_id: int, init_bal: int, cust_name: str) -> Account:
new_acct = Account(new_id, init_bal, []) # note the empty Customer list
new_cust = Customer(cust_name, new_acct)
new_acct.owners.append(new_cust)
return new_acct
This looks useful, but it does have a flaw: we could accidentally create two accounts with the same id number. It would be better to maintain a variable containing the next unused account id, which would guaranteed that a specific id gets used only once.
Do Now!
How might you modify the code to do this? Which concepts seem relevant?
The next available id keeps changing as the program runs. This suggests that we need a variable in the directory that gets modified after each account is created. Here’s how we might set that up:
next_id = 1 # stores the next available id number
def create_acct(init_bal: int, cust_name: str) -> Account:
global next_id
new_acct = Account(next_id, init_bal, [])
next_id = next_id + 1
new_cust = Customer(cust_name, new_acct)
new_acct.owners.append(new_cust)
return new_acct
Here, we create the next_id
variable to hold the next id number
to use. When we create an Account
, we update next_id
to the
next unused number. Note that we also took the new account’sid number
out of the parameters for the function. Problem solved!
26.5 The Many Roles of Variables
At this point, we have used the single coding construct of a variable in the directory for multiple purposes. It’s worth stepping back and calling those out explicitly. In general, variables serve one of the following purposes:
Tracking progress of a computation (e.g., the running value of a result in a
for
-loopSharing data across functions (e.g., casting and counting votes)
Maintaining information across multiple calls to a single function (e.g., the
next-id
variable)Naming a local or intermediate value in a computation
Each of these uses involves a different programming pattern. The first
creates a variable locally within a function. The second two create
top-level variables and require using global
in functions
that modify the contents. The third is different from the second,
however, in that the third is only meant to be used by a single
function. Ideally, there would be a way to not expose the variable to
all functions in the third case. Indeed, many programming languages
(including Pyret) make it easy to do that. This is harder to achieve
with introductory-level concepts in Python, however. The fourth is
more about local names rather than variables, in that our code never
updates the value after the variable is created.
We call out these three roles precisely because they invoke different code patterns, despite using the same fine-grained concept (assigning a new value to a variable). When you look at a new programming problem, you can ask yourself whether the problem involves one of these purposes, and use that to guide your choice of pattern to use.
26.6 Managing All Accounts
So far, we have created individual customers and accounts and named them in the directory. This is fine for a small number of accounts but a realistic bank manages tens of thousands (if not millions) of accounts. Naming individual accounts doesn’t scale with that many accounts. Instead, a bank would maintain something like a list or table of all of the accounts, with functions to locate specific accounts (say by their ID numbers). For example:
all_accts = [Account(8, 100, []),
Account(2, 300, []),
Account(10, 225, []),
Account(3, 200, []),
Account(1, 120, []),
...
]
Do Now!
Modify
create_acct
to add the newly-created account to theall_accts
list.
next_id = 1 # stores the next available id number
all_accts = [] # stores all of the accounts
def create_acct(init_bal: int, cust_name: str) -> Account:
global next_id
new_acct = Account(next_id, init_bal, [])
all_accts.append(new_acct) # <-- this is the new line
next_id = next_id + 1
new_cust = Customer(cust_name, new_acct)
new_acct.owners.append(new_cust)
return new_acct
Exercise
Draw the memory diagram that results from the following code:create_acct(100, "Tina") create_acct(250, "Elena")
When someone asks for their balance, they usually provide their
account number. This suggests that we need a function that takes an
account id and returns the Account
value with that id. From there,
we can dig into the account to get the balance.
Exercise
Write a function
find_acct
that takes an id number and returns theAccount
(fromall_accts
) with that id number.
Here’s one solution:
def find_acct(which_id : int) -> Account:
"""returns the account with the given id number"""
for acct in all_accts:
if acct.id == which_id:
return acct
raise ValueError("no account has id " + str(which_id))
def test_find():
test("match num", find_acct(3).id, 3)
test("match exactly", find_acct(1).id, 1)
testValueError("no account", lambda: find_acct(22))
This example shows how to report errors in Python. As in Pyret, there
is a raise
construct for reporting errors. In Python,
however, rather than raise just a string, you raise a particular type
of data called a ValueError
that contains the string. (There
are other kinds of errors too, but ValueError
will suffice
for what this book will do).
Exercise
Leverage
find_acct
to write the following two functions:def deposit(which_id: int, amount: double) {} def close(which_id: int) {}
As these exercises show, the find_acct
function becomes a
valuable helper for many other functions. This raises a question: how
fast is find_acct
? Here, we have to search through all of the
accounts to locate the one with a specific id before doing the
operation we actually care about. With millions of accounts, this
seems like it could get expensive. We’ll explore this more deeply in
the next chapter.
27 Hashtables and Dictionaries
At the end of Managing All Accounts, we had created a list of
Account
values and written a function to search through that
list to find the account with a specific ID number. We noted that
finding an individual account could require us to check every account
in the list (as we check accounts one at a time in the for
loop), which could start to get expensive as the bank scales to more
customers.
Let’s step back however, and make two observations about this problem:
Every account has a unique ID number
We have a function that needs to retrieve accounts based on this unique ID number
In this case, we can use a different data structure, called a hashtable or dictionary (different languages use different terms for this same concept; Python uses “dictionary”). Dictionaries are designed for precisely such a scenario: we have a set of values (the accounts) that we wish to access using a piece of information (the ID number) that is unique to each value. The piece of information that we use for access is called the key. The key does not need to be a field in the data value, but it can be (as with accounts).
To set up a dictionary, we write a series of entries of the form
key: value
Here’s what a dictionary mapping ID numbers to accounts might look like:
accts_dict = {5: Account(5, 225, []),
3: Account(3, 200, []),
2: Account(2, 300, []),
4: Account(4, 75, []),
1: Account(1, 100, [])
}
In terms of notation, we can create a dictionary by placing a
collection of key: value
pairs between curly braces,
separated by commas. The dictionary shown here has five account IDs
that map to accounts with those same ID numbers. There is no order
within a dictionary, so it doesn’t matter which order we write the
pairs within the sequence.
Now, if we want to get the account with ID 1 from the dictionary, we can simply write:
accts_dict[1]
This says "in the accts_dict
dictionary, get the value
associated with key 1
. Evaluting this expression will produce
Account(1, 100, [])
. If the key you request isn’t in the
dictionary, Python will raise an error. However, you check whether a
key is in the dictionary before trying to use it, as follows:
if 6 in accts_dict:
accts_dict(6)
If we want to change the value associated with a key, we use an assignment statement of the following form:
dictionary[key] = new_value
To add a new key (and its value) to a dictionary, we use the same notation as for updating a value, just with an unused key:
accts_dict[6] = Account(6, 150, [])
Finally, to remove a key (and its value) from the dictionary, we write
del dictionary[key]
27.1 Searching by Criteria Other than Keys
What if we wanted to find all of the accounts with balances below
100
? For this, we have to search through all of the key-value
pairs and check their balances. This again sounds like we need a
for
loop. What does that look like on a dictionary though?
Turns out, it looks much like writing a for
loop on a list
(at least in Python). Here’s a program that creates a list of the
accounts with balances below 100
:
below_100 = []
# the room variable takes on each key in the dictionary
for ID in accts_dict:
if accts_dict[ID].balance < 100:
below_100.append(accts_dict[ID])
Here, the for
-loop iterates over the keys. Within the loop,
we use each key to retrieve its corresponding Account
,
perform the balance check on the Account
, then put the
Account
in our running list if it meets our criterion.
Exercise
Create a dictionary that maps names of classrooms or meeting rooms to the numbers of seats that they have. Write expressions to:
Look up how many seats are in a specific room
Change the capacity of a specific room to have 10 more seats than it did initially
Find all rooms that can seat at least 50 students
27.2 Dictionaries with More Complex Values
Do Now!
A track-and-field tournament needs to manage the names of the players on the individual teams that will be competing. For example, “Team Red” has “Shaoming” and “Lijin”, “Team Green” contains “Obi” and ”Chinara”, and “Team Blue” has “Mateo” and “Sophia”. Come up with a way to organize the data that will allow the organizers to easily access the names of the players on each team, keeping in mind that there could be many more teams than just the three listed here.
This feels like a dictionary situation, in that we have a meaningful key (the team name) with which we want to access values (the names of the players). However, we have already said that dictionaries allow only one value per key. Consider the following code:
players = {}
players["Team Red"] = "Shaoming"
players["Team Red"] = "Lijin"
Do Now!
What would be in the dictionary after running this code? If you aren’t sure, try it out!
How do we store multiple player names under the same key? The insight here is that the collection of players, not an individual player, is what we want to associate with the team name. We should therefore store a list of players under each key, as follows:
players = {}
players["Team Red"] = ["Shaoming", "Lijin"]
players["Team Green"] = ["Obi", "Chinara"]
players["Team Blue"] = ["Mateo", "Sophia"]
The values in a dictionary aren’t limited to being basic values. They can be arbitrarily complex, including lists, tables, or even other dictionaries (and more!). There is still only one value per key, which is the requirement of a dictionary.
27.3 Using Structured Data as Keys
Using structured data as keys is also possible, but it’s more subtle
than using structured data as values. This has to do with how
dictionaries work. Dictionaries depend on the keys being
values that cannot be internally modified. What do we mean? An entire
Account
datum, for example, could not be used as a key
because we want to be able to modify the balance, which is a field
within the account.
A dataclass value can be used as a key as long as the fields will never be modified. For example, let’s return to the earlier example that mapped room names to their numbers of seats. Imagine that the office that maintains rooms wants to be able to access rooms by ranges of seats (such as 0-29, 30-49, and 50-100). In this case, they would want to make a dataclass for a range and use that as a key.
@dataclass
class Range:
low: int
high: str
{ Range(0,29): ["West Wing 200", "North Wing 110"],
Range(30,49): ["North Wing 150", "South Wing 320"],
...
}
Since the fields of a range aren’t going to change, this is a
reasonable datum to use as a key. To make Python accept this
dictionary, however, we have to indicate that we will not be modifying
the field values of Range
. You do this with an additional
piece on the dataclass
annotation:
@dataclass(frozen=True)
class Range:
low: int
high: str
The frozen annotation says "the components of this class cannot be
changed". If you try to assign to a component of a frozen class such
as Flight, Python will produce an error.
Tthere actually are ways to use values with changing components as
keys, but they are more advanced than this book covers. If you want to
read details for yourself, look up _hash_
methods in Python.