12.1 Modifying Variables

    12.1.1 Modifying Variables in Memory

    12.1.2 Variable Updates and Aliasing

    12.1.3 Updating Variables versus Updating Data Fields

    12.1.4 Updating Parameters in Function Calls

    12.1.5 Updating Top-Level Variables within Function Calls

    12.1.6 The Many Roles of Variables

12.1.1 Modifying Variables in Memory

Now that we have introduced the idea of the heap, let’s revisit our use of a variable to compute the sum of elements in a list. Here again is our code from earlier:

run_total = 0
for num in [5, 1, 7, 3]:
  run_total = run_total + num

Let’s see how the directory and heap update as we run this code. In Basic Data and the Heap, we pointed out that basic data (such as numbers, strings, and booleans) don’t get put in the heap because they have no internal structure. Those values are stored in the directory itself. Therefore, the initial value for run_total is stored within the directory.

Directory

  • run_total

      

    0

The for loop also sets up a directory entry, this time for the variable num that is used to refer to the list elements. When the loop starts, num takes on the first value in the list. Thus, the directory appears as:

Directory

  • run_total

      

    0

  • num

      

    5

Inside the for loop, we compute a new value for run_total. The use of = tells Python to modify the value of run_total.

Do Now!

Does this modification get made in the directory or the heap?

Since basic data values are stored only in the directory, this update modifies the contents of the directory. The heap isn’t involved:

Directory

  • run_total

      

    5

  • num

      

    5

This process continues: Python advances num to the next list element

Directory

  • run_total

      

    5

  • num

      

    1

then modifies the value of run_total

Directory

  • run_total

      

    6

  • num

      

    1

This process continues until all of the list elements have been processed. When the for-loop ends, the directory contents are:

Directory

  • run_total

      

    16

  • num

      

    3
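The sequence of directory contents above can be reproduced with a small trace. This is only a sketch: the snapshots list is a hypothetical helper added to record the values after each update, and it is not part of the original program.

```python
run_total = 0
snapshots = []  # hypothetical helper: records (num, run_total) after each update

for num in [5, 1, 7, 3]:
    run_total = run_total + num          # updates the directory entry for run_total
    snapshots.append((num, run_total))

# Show the recorded values, one line per loop iteration.
for num_value, total_value in snapshots:
    print("num =", num_value, "run_total =", total_value)
```

After the loop, run_total is 16 and num still holds the last list element, 3, matching the final diagram.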

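There are two takeaways from this example:

  • Assigning a new basic value to a variable with = updates the directory; the heap is not involved.

  • A for loop creates a directory entry for its loop variable (here, num), updates that entry for each list element, and leaves it in the directory after the loop ends.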

Exercise

Draw the sequence of directory contents for the following program:

score = 0
score = score + 4
score = 10

Exercise

Draw the sequence of directory contents for the following program:

count_long = 0
for word in ["here", "are", "some", "words"]:
  if len(word) > 4:
    count_long = count_long + 1

12.1.2 Variable Updates and Aliasing

In State, Change, and Testing, we saw how a statement of the form elena.acct.balance = 500 resulted in a change to jorge.acct.balance. Does this same effect occur if we update the value of a variable directly, rather than a field? Consider the following example:

y = 5
x = y

Do Now!

What do the directory and heap look like after running this code?

Since x and y are assigned basic values, there are no values in the heap:

Directory

  • y

      

    5

  • x

      

    5

Do Now!

If we now evaluate y = 3, does the value of x change?

It does not. The value associated with y in the directory changes, but there is no connection between x and y in the directory. The statement x = y says "get the value of y and associate it with x in the directory". Immediately after this statement, y and x refer to the same value, but this relationship is neither tracked nor maintained. If we associate either variable with a new value, as we do with y = 3, the directory entry for that variable (and only that entry) is changed to reflect the new value. Thus, the directory after we evaluate y = 3 appears as follows:

Directory

  • y

      

    3

  • x

      

    5
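A quick way to check this diagram is to run the three assignments and print both variables. This is a minimal sketch of the same steps:

```python
y = 5
x = y    # copies the value 5 into the directory entry for x
y = 3    # changes only the directory entry for y
print(x, y)  # prints: 5 3
```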

This example highlights that aliasing occurs only when two variables refer to the same piece of data with components, not when variables refer to basic data. This is because data with components are stored in the heap, with the heap address stored in the directory. Note, though, that uses of varname = ... still affect the directory, even when the values are data with components.

Do Now!

After running the following code, what is the value of ac2.balance?

ac1 = Account(8623, 600)
ac2 = ac1
ac1 = Account(8721, 350)

Draw the directory and heap contents for this program and check your prediction.

All three of these lines result in changes in the directory; the first and third also result in changes in the heap, but only because they make new pieces of data. ac1 and ac2 are aliases immediately after running the second line, but the third line breaks that relationship.
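One way to check this prediction is with Python's is operator, which reports whether two names refer to the same piece of data in the heap. The sketch below assumes Account is a dataclass with fields id and balance; that definition is our assumption, and the variables aliased_before and aliased_after are hypothetical names used only for this check.

```python
from dataclasses import dataclass

@dataclass
class Account:  # assumed definition, for illustration only
    id: int
    balance: int

ac1 = Account(8623, 600)
ac2 = ac1
aliased_before = ac1 is ac2   # both names hold the same heap address here

ac1 = Account(8721, 350)      # ac1 now refers to a new Account
aliased_after = ac1 is ac2    # the alias relationship is broken

print(aliased_before, aliased_after, ac2.balance)
```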

Do Now!

After running the following code, what is the value of ac3.balance?

savings = 475
ac3 = Account(8722, savings)
savings = 500

Draw the directory and heap contents for this program and check your prediction.

Since the value of savings is stored in ac3.balance, and not the name savings itself, updating the value of savings on the third line does not affect ac3.balance.
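The same check can be run directly. As a sketch, we again assume that Account is a dataclass with fields id and balance:

```python
from dataclasses import dataclass

@dataclass
class Account:  # assumed definition, for illustration only
    id: int
    balance: int

savings = 475
ac3 = Account(8722, savings)  # the value 475 is stored in the balance field
savings = 500                 # updates only the directory entry for savings
print(ac3.balance, savings)
```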

12.1.3 Updating Variables versus Updating Data Fields

We’ve now seen two different forms of updates in programs: updates to fields of structured data in State, Change, and Testing, and updates to the values associated with names when computing over lists with for loops. At a quick glance, these two forms of update look similar:

acct1.balance = acct1.balance - 50
run_total = run_total + num

Both use the = operator and compute a new value on the right side. The left sides, however, are subtly different: one is a field within structured data, while the other is a name in the directory. This difference turns out to be significant: the first form changes a value stored in the heap but leaves the directory unchanged, while the second updates the directory but leaves the heap unchanged.

At this point, you might not appreciate why this difference is significant. For now, let’s summarize how each of these forms affects the directory and the heap.

Strategy: Rules for updating the directory and the heap

Summarizing, the rules for how the directory and heap update are as follows:

  • We add to the heap when a data constructor is used

  • We update the heap when a field of existing data is reassigned

  • We add to the directory when a name is used for the first time (this includes parameters and internal variables when a function is called)

  • We update the directory when a name that is already in the directory is subsequently assigned a new value
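The four rules can be seen side by side in a few lines of code. This is a sketch under the assumption that Account is a dataclass with fields id and balance; the names acct and other are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Account:  # assumed definition, for illustration only
    id: int
    balance: int

acct = Account(8001, 100)   # constructor adds to the heap; new name adds to the directory
acct.balance = 150          # field of existing data reassigned: updates the heap
other = acct                # new name adds to the directory; nothing is added to the heap
other = Account(8002, 75)   # constructor adds to the heap; existing name updates the directory
```

The comment on each line names the rule or rules that apply to it.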

Do Now!

After running the following code, what is the value of ac3.balance?

ac2 = Account(8728, 200)
ac3 = ac2
print(ac3.balance)
ac2.balance = 500
print(ac3.balance)
ac2 = Account(8734, 350)
ac2.balance = 700
print(ac3.balance)

Draw the directory and heap contents for this program and check your prediction.

This example combines updates to variables and updates to fields. On the third line, ac2 and ac3 refer to the same address in the heap (which contains the Account with id 8728). Immediately after updating ac2.balance on the fourth line, the balance in both ac2 and ac3 is 500. Line six, however, creates a new Account in the heap and updates the directory to have ac2 refer to that new Account. From that point on, ac2 and ac3 refer to different accounts, so the update to the balance in ac2 on the seventh line does not affect ac3.

This example illustrates the subtleties and impacts of different uses of =. Programs behave differently depending on whether the left side of the = is a variable name or a field reference, and on whether the right side is basic data or data with components. We will continue to work with these various combinations to build your understanding of when and how to use each one.

12.1.4 Updating Parameters in Function Calls

When we first learned about the directory in [REFSEC], we showed how function calls created their own local directory segments to store any names that got introduced while running the function. Now that we have the ability to update the values associated with variables, we should revisit this topic to understand what happens when these updates occur within functions.

Consider the following two functions:

def add10(num: int):
  num = num + 10

def deposit10(ac: Account):
  ac.balance = ac.balance + 10

Let’s use these two functions in a program:

x = 15
a = Account(8435, 500)
add10(x)
deposit10(a)

Do Now!

What are the values of x and a when the program has finished?

Let’s draw out the directory and heap for this program.

After the first two lines but before the function calls, we have the following:

Directory

  • x

      15

  • a

      1014

Heap

  • 1014: 

    Account(8435, 500)

Calling add10 creates a local directory containing the name of the parameter:

Local Directory (add10)

  • num

      15

Heap

  • 1014: 

    Account(8435, 500)

Why is the heap listed alongside the local directory? Only the directory gets localized during function calls; the same heap is used at all times.

The body of add10 now updates the value of num in the directory to 25. This does not affect the value of x in the top-level directory, for the same reasons we explained in [REFSEC] regarding the lack of aliasing between variables that refer to basic data. Thus, once the function finishes and the local directory is deleted, the value associated with x is unchanged.

Now, let’s evaluate the call deposit10(a). As with add10, we create a local directory and create an entry for the parameter. What gets associated with that parameter in the directory, however?

Local Directory (deposit10)

  • ac

      1014

Heap

  • 1014: 

    Account(8435, 500)

Do Now!

Why didn’t we create a new Account datum when we made the function call?

Remember our rule for when we create new data in the heap: we only create heap data when we explicitly use a constructor. The function call does not involve creating a new Account. Whatever is associated with the name a gets associated with the parameter name ac. In other words, we have created an alias between a and ac.

In the body of deposit10, we update the balance of ac, which is also the balance of a due to the aliasing. Since there is no local heap, when the function call is over, the new balance persists in a.
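Running the whole program confirms both outcomes. The sketch below assumes Account is a dataclass with fields id and balance, which is our assumption about the earlier definition:

```python
from dataclasses import dataclass

@dataclass
class Account:  # assumed definition, for illustration only
    id: int
    balance: int

def add10(num: int):
    num = num + 10                 # updates num in the local directory only

def deposit10(ac: Account):
    ac.balance = ac.balance + 10   # updates the heap through the alias ac

x = 15
a = Account(8435, 500)
add10(x)
deposit10(a)
print(x, a.balance)  # prints: 15 510
```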

All we’ve done here is put together pieces that we’ve already seen, just in a new context. We’re passing parameters and updating either the (local) directory or the heap according to how we have used =. But this example highlights a detail that initially confuses many people when they start writing functions that update variables.

Strategy: Updating Values within Functions

If you want a function to update a value and have that update persist after the function completes, you must put that value inside a piece of data. You cannot have it be basic data associated with a variable name.
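As a sketch of this strategy, we can wrap the number in a small piece of data so that a function can update it through the heap. The Tally class and the add10_in_place function are hypothetical names introduced only for this illustration.

```python
from dataclasses import dataclass

@dataclass
class Tally:  # hypothetical wrapper around a single number
    value: int

def add10_in_place(t: Tally):
    t.value = t.value + 10   # field update: changes the heap, so it persists

score = Tally(15)
add10_in_place(score)
print(score.value)  # prints: 25
```

Compare this with add10, where the update was made to a name in the local directory and was lost when the function finished.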

12.1.5 Updating Top-Level Variables within Function Calls

Let’s return to our banking example to illustrate a situation where the ability to update variables is extremely useful. Consider our current process for creating new accounts in the bank by looking at the following example:

ac5 = Account(8702, 435)
ac6 = Account(8703, 280)
ac7 = Account(8704, 375)

Notice that each time we create an Account, we have to take care to increase the ID number. What if we made a typo or accidentally forgot to do this?

ac5 = Account(8702, 435)
ac6 = Account(8703, 280)
ac7 = Account(8703, 375)

Now we’d have multiple accounts with the same ID number, when we really need these numbers to be unique across all accounts. To avoid such problems, we should instead have a function for creating accounts that takes the initial balance as input and uses a guaranteed-unique ID number.

How might we write such a function? The challenge is to be able to generate unique ID numbers each time. What if we used a variable to store the next available ID number, updating it each time we created a new account? That function might look as follows:

nextID = 8000 # stores the next available ID number

def create_acct(init_bal: float) -> Account:
  new_acct = Account(nextID, init_bal)
  nextID = nextID + 1
  return(new_acct)

Let’s run this program, creating new accounts as follows:

ac5 = create_acct(435)
ac6 = create_acct(280)
ac7 = create_acct(375)

Do Now!

Copy this code into Python and run it. Check that each of ac5, ac6, and ac7 has a unique ID number.

What happened? Python did not create any accounts at all: the very first call stopped with an UnboundLocalError that names nextID. The top-level nextID clearly has a value, so why can’t create_acct use it? To understand this, we have to look at what happened in the directory.

Do Now!

Draw the memory diagram for this example.

After we set up nextID and define the function, our memory diagram appears as:

Directory

  • nextID

      8000

Now, let’s evaluate ac5 = create_acct(435). Calling create_acct creates a local directory containing the name of the parameter:

Local Directory (create_acct)

  • init_bal

      435

Do Now!

The body of create_acct uses nextID on two lines. In which directory do you think Python looks for nextID?

Let’s work through this carefully. We currently have both the local directory for the function call and the top-level directory. When a function only reads a name that is not in its local directory, Python retrieves the value from the top-level directory. An assignment is different: since the local directory is active, nextID = nextID + 1 sets nextID in the local directory.

Python makes this choice once for the entire function body, not line by line. Because create_acct assigns to nextID somewhere in its body, every use of nextID inside create_acct refers to the local directory, including the use in Account(nextID, init_bal) on the line before the assignment. The local directory has no value for nextID at that point, so Python reports the error rather than falling back to the top-level value of 8000.

Let’s repeat that: assigning to nextID inside create_acct made nextID a local name throughout the function, and that local name had no value when we tried to use it. The function stops before any Account is added to the heap, and the value of nextID in the top-level directory is unchanged.

The computation we are trying to do, updating the top-level variable, is just fine. The problem is that Python (reasonably) defaults to the local directory. To make this work, we need to tell Python that we want to make updates to nextID in the top-level directory. Here’s the version of create_acct that does that:

def create_acct(init_bal: float) -> Account:
  global nextID
  new_acct = Account(nextID, init_bal)
  nextID = nextID + 1
  return(new_acct)

The global keyword tells Python to make updates to the given variable in the top-level directory, not the local directory. Once we make this modification, each account we create will get a unique ID number.
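To check the version with global, we can run it end to end. This sketch assumes Account is a dataclass with fields id and balance; that definition is our assumption, and the rest follows the code above.

```python
from dataclasses import dataclass

@dataclass
class Account:  # assumed definition, for illustration only
    id: int
    balance: float

nextID = 8000  # stores the next available ID number

def create_acct(init_bal: float) -> Account:
    global nextID                         # updates go to the top-level directory
    new_acct = Account(nextID, init_bal)
    nextID = nextID + 1
    return new_acct

ac5 = create_acct(435)
ac6 = create_acct(280)
ac7 = create_acct(375)
print(ac5.id, ac6.id, ac7.id, nextID)  # prints: 8000 8001 8002 8003
```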

Responsible Computing: Keeping IDs Unpredictable

While this general pattern of generating unique IDs works, in practice we shouldn’t use consecutive numbers. Consecutive numbers are guessable: if there is an account 8000 there must be an account 8001, and so on. Guessable account numbers could make it easier for someone to find valid IDs by repeated guessing and then use them to log into websites or otherwise access information.

Instead, we would use a computation that is less predictable than “add 1” when storing the nextID value. For now, the pattern we have shown you is fine. If you were building a real system, however, you’d want to make that computation a bit more sophisticated.
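As one possible sketch of a less predictable computation, we could draw each ID at random and remember which ones have been handed out. This swaps in a different technique from the nextID counter: the names used_ids and fresh_id and the eight-digit range are all assumptions made for illustration, and a real system would need more care than this.

```python
import secrets

used_ids = set()  # hypothetical record of every ID handed out so far

def fresh_id() -> int:
    """Return an eight-digit ID that has not been used before."""
    while True:
        candidate = 10_000_000 + secrets.randbelow(90_000_000)
        if candidate not in used_ids:
            used_ids.add(candidate)   # changes the set in the heap, so no global is needed
            return candidate

new_ids = [fresh_id() for _ in range(5)]
print(new_ids)
```

Notice that fresh_id never assigns to the name used_ids. It only changes the contents of the set in the heap, so the update persists without global.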

12.1.6 The Many Roles of Variables

At this point, we have used the single coding construct of a variable in the directory for multiple purposes. It’s worth stepping back and calling those out explicitly. In general, variables serve one of the following purposes:

  1. Tracking progress of a computation (e.g., the running value of a result in a for-loop)

  2. Maintaining information across multiple calls to a single function (e.g., the nextID variable)

  3. Naming a local or intermediate value in a computation

Each of these uses involves a different programming pattern. The first creates a variable locally within a function. The second creates a top-level variable and requires using global in the function that modifies its contents. Because such a variable is only meant to be used by a single function, ideally there would be a way to not expose it to all functions. Indeed, many programming languages (including Pyret) make it easy to do that. This is harder to achieve with introductory-level concepts in Python, however. The third is more about local names than variables, in that our code never updates the value after the name is created.

We call out these three roles precisely because they invoke different code patterns, despite using the same fine-grained concept (assigning a new value to a variable). When you look at a new programming problem, you can ask yourself whether the problem involves one of these purposes, and use that to guide your choice of pattern to use.
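The three roles can appear together in a single small program. The sketch below is our own illustration: the function average and the variable call_count are hypothetical names, chosen only to show one variable in each role.

```python
call_count = 0  # role 2: maintains information across calls to average

def average(nums: list) -> float:
    global call_count
    call_count = call_count + 1   # update the top-level variable

    total = 0                     # role 1: tracks progress of the computation
    for n in nums:
        total = total + n

    size = len(nums)              # role 3: names an intermediate value, never updated
    return total / size

first = average([5, 1, 7, 3])
second = average([2, 4])
print(first, second, call_count)  # prints: 4.0 3.0 2
```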