12.1 Modifying Variables
12.1.1 Modifying Variables in Memory
Now that we have introduced the idea of the heap, let’s revisit our use of a variable to compute the sum of elements in a list. Here again is our code from earlier:
run_total = 0
for num in [5, 1, 7, 3]:
    run_total = run_total + num
Let’s see how the directory and heap update as we run this code. In
Basic Data and the Heap, we pointed out that basic data (such as
numbers, strings, and booleans) don’t get put in the heap because they
have no internal structure. Those values are stored in the directory
itself. Therefore, the initial value for run_total
is stored
within the directory.
Directory
run_total
→0
The for
loop also sets up a directory entry, this time for
the variable num
that is used to refer to the list
elements. When the loop starts, num
takes on the first value
in the list. Thus, the directory appears as:
Directory
run_total
→0
num
→5
Inside the for
loop, we compute a new value for
run_total
. The use of =
tells Python to modify the
value of run_total
.
Do Now!
Does this modification get made in the directory or the heap?
Since basic data values are stored only in the directory, this update modifies the contents of the directory. The heap isn’t involved:
Directory
run_total
→5
num
→5
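This process continues. Python advances num to the next list element,
Directory
run_total
→5
num
→1
and then updates run_total with the new sum:
Directory
run_total
→6
num
→1
The same two steps repeat for the remaining elements, 7 and 3. When the loop finishes, the directory contains:
Directory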
run_total
→16
num
→3
There are two takeaways from this example:
When we use = to update the value associated with a variable, the variable’s entry in the directory changes to reflect the new value.
for loops introduce a name into the directory (the one the programmer chose to refer to the individual list elements). As the loop progresses, Python updates the value associated with that name to refer to each successive element.
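The directory trace above can be reproduced by recording the two values after each update. This is a minimal sketch: the trace list and the names num_value and total are illustrative additions, not part of the original program.

```python
run_total = 0
trace = []  # illustrative helper: records (num, run_total) after each update

for num in [5, 1, 7, 3]:
    run_total = run_total + num
    trace.append((num, run_total))

# Print one line per loop iteration, mirroring the directory diagrams
for num_value, total in trace:
    print("num ->", num_value, "  run_total ->", total)
```

Each printed line corresponds to one directory snapshot taken just after run_total is updated.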
Exercise
Draw the sequence of directory contents for the following program:
score = 0
score = score + 4
score = 10
Exercise
Draw the sequence of directory contents for the following program:
count_long = 0
for word in ["here", "are", "some", "words"]:
    if len(word) > 4:
        count_long = count_long + 1
12.1.2 Variable Updates and Aliasing
In State, Change, and Testing, we saw how a statement of the form
elena.acct.balance = 500
resulted in a change to
jorge.acct.balance
. Does this same effect occur if we update the
value of a variable directly, rather than a field? Consider the
following example:
y = 5
x = y
Do Now!
What do the directory and heap look like after running this code?
Since x
and y
are assigned basic values, there are
no values in the heap:
Directory
y
→5
x
→5
Do Now!
If we now evaluate y = 3, does the value of x change?
It does not. The value associated with y
in the directory
changes, but there is no connection between x
and y
in the directory. The statement x = y
says "get the value of
y
and associate it with x
in the
directory". Immediately after this statement, y
and
x
refer to the same value, but this relationship is neither
tracked nor maintained. If we associate either variable with a new
value, as we do with y = 3, only the directory entry for that
variable changes. After running y = 3, the directory
appears as follows:
Directory
y
→3
x
→5
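A short check of this behavior, written as a sketch that simply runs the three assignments and prints both variables:

```python
y = 5
x = y      # x is associated with the value 5; no link to y is recorded
y = 3      # only the directory entry for y changes

print("x ->", x)
print("y ->", y)
```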
This example highlights that aliasing occurs only when two variables
refer to the same piece of data with components, not when variables
refer to basic data. This is because data with components are stored
in the heap, with the heap address stored in the directory. Note, though,
that uses of varname = ...
still affect the directory, even
when the values are data with components.
Do Now!
After running the following code, what is the value of ac2.balance?
ac1 = Account(8623, 600)
ac2 = ac1
ac1 = Account(8721, 350)
Draw the directory and heap contents for this program and check your prediction.
All three of these lines result in changes in the directory; the
first and third also result in changes in the heap, but only because they make new
pieces of data. ac1 and ac2 are aliases immediately
after running the second line, but the third line breaks that
relationship, so ac2.balance is still 600.
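One way to check this prediction is Python’s is operator, which tests whether two names refer to the same piece of data in the heap. The sketch below assumes Account is a dataclass with fields id and balance; that definition and the names aliased_before and aliased_after are assumptions made for illustration.

```python
from dataclasses import dataclass

@dataclass
class Account:          # assumed definition, for illustration only
    id: int
    balance: int

ac1 = Account(8623, 600)
ac2 = ac1
aliased_before = ac1 is ac2     # both names refer to the same heap data

ac1 = Account(8721, 350)        # new heap data; ac1's directory entry changes
aliased_after = ac1 is ac2      # the alias has been broken

print(aliased_before, aliased_after)
print(ac2.balance)
```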
Do Now!
After running the following code, what is the value of ac3.balance?
savings = 475
ac3 = Account(8722, savings)
savings = 500
Draw the directory and heap contents for this program and check your prediction.
Since the value of savings
is stored in ac3.balance
,
and not the name savings
itself, updating the value of
savings
on the third line does not affect ac3.balance
.
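The same point can be checked by running the program. This sketch again assumes that Account is a dataclass with id and balance fields.

```python
from dataclasses import dataclass

@dataclass
class Account:          # assumed definition, for illustration only
    id: int
    balance: int

savings = 475
ac3 = Account(8722, savings)   # the value 475 is stored in the new Account
savings = 500                  # only the directory entry for savings changes

print("ac3.balance ->", ac3.balance)
print("savings ->", savings)
```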
12.1.3 Updating Variables versus Updating Data Fields
We’ve now seen two different forms of updates in programs: updates to
fields of structured data in State, Change, and Testing, and updates to
the values associated with names when computing over lists with
for
loops. At a quick glance, these two forms of update look
similar:
acct1.balance = acct1.balance - 50
run_total = run_total + fst
Both use the = operator and compute a new value on the right
side. The left sides, however, are subtly different: one is a field
within structured data, while the other is a name in the directory. This
difference turns out to be significant: the first form changes a value
stored in the heap but leaves the directory unchanged, while the
second updates the directory but leaves the heap unchanged.
At this point, you might not appreciate why this difference is significant. For now, let’s summarize how each of these forms affects the directory and the heap.
Strategy: Rules for updating the directory and the heap
Summarizing, the rules for how the directory and heap update are as follows:
We add to the heap when a data constructor is used
We update the heap when a field of existing data is reassigned
We add to the directory when a name is used for the first time (this includes parameters and internal variables when a function is called)
We update the directory when a name that is already in the directory is subsequently assigned a new value
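The four rules can be seen side by side in a small program. This is a sketch under stated assumptions: Account is taken to be a dataclass with id and balance fields, and the names acct, same, old, and the ID values are made up for illustration.

```python
from dataclasses import dataclass

@dataclass
class Account:          # assumed definition, for illustration only
    id: int
    balance: int

acct = Account(8001, 100)   # constructor used: adds to the heap; new name: adds to the directory
same = acct                 # new name: adds to the directory; no new heap data

acct.balance = 250          # field reassigned: updates the heap
heap_update_shared = (same.balance == 250)   # visible through both names

old = acct                  # remember which heap data acct referred to
acct = Account(8002, 75)    # constructor: adds to the heap; existing name: updates the directory
directory_update_only = (same is old) and (same is not acct)

print(heap_update_shared, directory_update_only)
```

The is operator compares heap addresses rather than contents, so it shows which names still refer to the same piece of data.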
Do Now!
After running the following code, what is the value of ac3.balance?
ac2 = Account(8728, 200)
ac3 = ac2
print(ac3.balance)
ac2.balance = 500
print(ac3.balance)
ac2 = Account(8734, 350)
ac2.balance = 700
print(ac3.balance)
Draw the directory and heap contents for this program and check your prediction.
This example combines updates to variables and updates to fields. On
the third line, ac2
and ac3
refer to the same
address in the heap (which contains the Account with id 8728). Immediately after updating ac2.balance
on the
fourth line, the balance in both ac2
and ac3
is 500. Line
six, however, creates a new Account
in the heap and updates
the directory to have ac2
refer to that new
Account
. From that point on, ac2
and ac3
refer to different accounts, so the update to the balance in
ac2
on the seventh line does not affect ac3
.
This example illustrates the subtleties and impacts of different uses of
=
. Programs behave differently depending on whether the left
side of the =
is a variable name or a field reference, and on
whether the right side is basic data or data with components. We will
continue to work with these various combinations to build your
understanding of when and how to use each one.
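To check a hand-drawn diagram for the example above, the program can be instrumented to record what it would print and whether the two names are still aliases. This is a sketch: the Account dataclass is an assumed definition, and balances, same_before, and same_after are illustrative names.

```python
from dataclasses import dataclass

@dataclass
class Account:          # assumed definition, for illustration only
    id: int
    balance: int

ac2 = Account(8728, 200)
ac3 = ac2
balances = [ac3.balance]         # value at the first print
ac2.balance = 500                # field update: changes the shared heap data
balances.append(ac3.balance)     # value at the second print
same_before = ac2 is ac3

ac2 = Account(8734, 350)         # variable update: ac2 now refers to new data
ac2.balance = 700                # changes only the new Account
balances.append(ac3.balance)     # value at the third print
same_after = ac2 is ac3

print(balances)
print(same_before, same_after)
```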
12.1.4 Updating Parameters in Function Calls
When we first learned about the directory in [REFSEC], we showed how function calls created their own local directory segments to store any names that got introduced while running the function. Now that we have the ability to update the values associated with variables, we should revisit this topic to understand what happens when these updates occur within functions.
Consider the following two functions:
def add10(num: int):
    num = num + 10

def deposit10(ac: Account):
    ac.balance = ac.balance + 10
Let’s use these two functions in a program:
x = 15
a = Account(8435, 500)
add10(x)
deposit10(a)
Do Now!
What are the values of x and a when the program has finished?
Let’s draw out the directory and heap for this program.
After the first two lines but before the function calls, we have the following:
Directory
x
→15
a
→ 1014
Heap
- 1014:
Account(8435, 500)
Calling add10
creates a local directory containing the name
of the parameter:
Directory
num
→15
Heap
- 1014:
Account(8435, 500)
Wait – why is the heap listed alongside the local directory? Only the directory gets localized during function calls. The same heap is used at all times.
The body of add10
now updates the value of num
in
the directory to 25. This does not affect the value of x
in
the top-level directory, for the same reasons we explained in [REFSEC]
regarding the lack of aliasing between variables that refer to basic
data. Thus, once the function finishes and the local directory is
deleted, the value associated with x
is unchanged.
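A quick way to observe this is to print num inside the function and x after the call. The print statements in this sketch are additions for illustration; the rest matches the program above.

```python
def add10(num: int):
    num = num + 10
    print("inside add10, num ->", num)   # added to show the local value

x = 15
result = add10(x)
print("after the call, x ->", x)
```

The function has no return statement, so result is None; the updated value of num is discarded along with the local directory.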
Now, let’s evaluate the call deposit10(a)
. As with
add10
, we create a local directory and create an entry for
the parameter. What gets associated with that parameter in the
directory, however?
Directory
ac
→ 1014
Heap
- 1014:
Account(8435, 500)
Do Now!
Why didn’t we create a new
Account
datum when we made the function call?
Remember our rule for when we create new data in the heap: we only
create heap data when we explicitly use a constructor. The function
call does not involve creating a new Account
. Whatever is
associated with the name a
gets associated with the parameter
name ac
. In other words, we have created an alias between
a
and ac
.
In the body of deposit10
, we update the balance of
ac
, which is also the balance of a
due to the
aliasing. Since there is no local heap, when the function call is
over, the new balance persists in a
.
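The persistence of the new balance can be checked directly. This sketch assumes Account is a dataclass with id and balance fields; the name before is an illustrative addition used to confirm that a still refers to the same heap data after the call.

```python
from dataclasses import dataclass

@dataclass
class Account:          # assumed definition, for illustration only
    id: int
    balance: int

def deposit10(ac: Account):
    # ac is an alias for whatever data the caller passed in
    ac.balance = ac.balance + 10

a = Account(8435, 500)
before = a              # remember which heap data a refers to
deposit10(a)

print("a.balance ->", a.balance)
print("same heap data ->", a is before)
```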
All we’ve done here is put together pieces that we’ve already seen,
just in a new context. We’re passing parameters and updating either
the (local) directory or the heap according to how we have used
=
. But this example highlights a detail that initially
confuses many people when they start writing functions that update
variables.
Strategy: Updating Values within Functions
If you want a function to update a value and have that update persist after the function completes, you must put that value inside a piece of data. You cannot have it be basic data associated with a variable name.
12.1.5 Updating Top-Level Variables within Function Calls
Let’s return to our banking example to illustrate a situation where the ability to update variables is extremely useful. Consider our current process for creating new accounts in the bank by looking at the following example:
ac5 = Account(8702, 435)
ac6 = Account(8703, 280)
ac7 = Account(8704, 375)
Notice that each time we create an Account
we have to take
care to increase the id number? What if we made a typo or
accidentally forgot to do this?
ac5 = Account(8702, 435)
ac6 = Account(8703, 280)
ac7 = Account(8703, 375)
Now we’d have multiple accounts with the same ID number, when we really need these numbers to be unique across all accounts. To avoid such problems, we should instead have a function for creating accounts that takes the initial balance as input and uses a guaranteed-unique ID number.
How might we write such a function? The challenge is to be able to generate unique ID numbers each time. What if we used a variable to store the next available ID number, updating it each time we created a new account? That function might look as follows:
nextID = 8000 # stores the next available ID number

def create_acct(init_bal: float) -> Account:
    new_acct = Account(nextID, init_bal)
    nextID = nextID + 1
    return(new_acct)
Let’s run this program, creating new accounts as follows:
ac5 = create_acct(435)
ac6 = create_acct(280)
ac7 = create_acct(375)
Do Now!
Copy this code into Python and run it. Check whether ac5, ac6, and ac7 have unique ID numbers.
What happened? Python never creates even the first account. The call
create_acct(435) stops with an UnboundLocalError, which reports that
nextID was used before it had been given a value. That seems odd, since
nextID is sitting in the top-level directory with the value 8000. To
understand the error, we have to look at what happened in the directory.
Do Now!
Draw the memory diagram for this example.
After we set up nextID and define the function, our memory
diagram appears as:
Directory
nextID
→8000
Now, let’s evaluate ac5 = create_acct(435). We call
create_acct, which sets up the following local directory before
running the body of the function:
Directory
init_bal
→435
Do Now!
What do you think happens when we run
new_acct = Account(nextID, init_bal)
?
Let’s run this carefully. Python first evaluates the right side of the
=, which needs the value of nextID. The question here is where Python
looks for that value: we currently have both the local directory for the
function call and the top-level directory. Python settles this when it reads
the function definition. The body contains the assignment
nextID = nextID + 1, and an assignment inside a function places its name in
the local directory. Python therefore treats every use of nextID within
create_acct as referring to the local directory. The local directory has no
entry for nextID yet, so Python reports the error instead of using the
top-level value.
Let’s repeat that: an assignment to nextID inside create_acct can only set a
value in the local directory, and once a function assigns to a name, all of
that function’s uses of the name refer to the local entry. Had the body only
read nextID without assigning to it, Python would have retrieved 8000 from
the top-level directory. Thus, this version of create_acct cannot change the
value of nextID in the top-level directory.
The computation we are trying to do requires updating nextID in the
top-level directory. Here’s the version of create_acct that
does that:
def create_acct(init_bal: float) -> Account:
    global nextID
    new_acct = Account(nextID, init_bal)
    nextID = nextID + 1
    return(new_acct)
The global
keyword tells Python to make updates to the given
variable in the top-level directory, not the local directory. Once we
make this modification, each account we create will get a unique ID
number.
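Putting the pieces together gives a complete program that can be run to confirm the IDs are unique. This sketch assumes Account is a dataclass with id and balance fields; everything else follows the code above.

```python
from dataclasses import dataclass

@dataclass
class Account:          # assumed definition, for illustration only
    id: int
    balance: float

nextID = 8000  # stores the next available ID number

def create_acct(init_bal: float) -> Account:
    global nextID                      # assignments to nextID go to the top-level directory
    new_acct = Account(nextID, init_bal)
    nextID = nextID + 1
    return new_acct

ac5 = create_acct(435)
ac6 = create_acct(280)
ac7 = create_acct(375)

print(ac5.id, ac6.id, ac7.id)
print("nextID ->", nextID)
```

After the three calls, the top-level nextID holds the ID that the next account would receive.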
Responsible Computing: Keeping IDs Unpredictable
While this general pattern of generating unique IDs works, in practice we shouldn’t use consecutive numbers. Consecutive numbers are guessable: if there is an account 8000, there is likely an account 8001, and so on. Guessable account numbers make it easier for someone to find valid IDs by trial and error and then use them to log into websites or otherwise access information.
Instead, we would use a computation that is less predictable than “add 1” when storing the nextID value. For now, the pattern we have shown you is fine. If you were building a real system, however, you’d want to make that computation a bit more sophisticated.
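As one possible illustration of a less predictable computation, the sketch below draws random IDs with Python’s standard secrets module and remembers which ones have been issued. The names used_ids and fresh_id and the choice of 8-digit IDs are assumptions made for this example; a real system would have further requirements.

```python
import secrets

used_ids = set()  # illustrative record of IDs already handed out

def fresh_id() -> int:
    """Return a random 8-digit ID that has not been issued before."""
    while True:
        # randbelow(n) returns an int from 0 up to n - 1
        candidate = 10_000_000 + secrets.randbelow(90_000_000)
        if candidate not in used_ids:
            used_ids.add(candidate)
            return candidate

ids = [fresh_id() for _ in range(5)]
print(ids)
```

Note that fresh_id needs no global declaration: it never assigns to the name used_ids, it only updates the set that the name refers to, which is a change in the heap rather than in the directory.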
12.1.6 The Many Roles of Variables
At this point, we have used the single coding construct of a variable in the directory for multiple purposes. It’s worth stepping back and calling those out explicitly. In general, variables serve one of the following purposes:
Tracking progress of a computation (e.g., the running value of a result in a for-loop)
Sharing information across multiple functions
Maintaining information across multiple calls to a single function (e.g., the nextID variable)
Naming a local or intermediate value in a computation
Each of these uses involves a different programming pattern. The first
creates a variable locally within a function. The second two create
top-level variables and require using global
in functions
that modify the contents. The third is different from the second,
however, in that the third is only meant to be used by a single
function. Ideally, there would be a way to not expose the variable to
all functions in the third case. Indeed, many programming languages
(including Pyret) make it easy to do that. This is harder to achieve
with introductory-level concepts in Python, however. The fourth is
more about local names rather than variables, in that our code never
updates the value after the variable is created.
We call out these roles precisely because they call for different code patterns, despite using the same fine-grained concept (assigning a new value to a variable). When you look at a new programming problem, you can ask yourself whether the problem involves one of these purposes, and use that to guide your choice of pattern.