8.1

IV Programming with State

23 From Pyret to Python

    23.1 Expessions, Functions, and Types

    23.2 Returning Values from Functions

    23.3 Examples and Test Cases

    23.4 An Aside on Numbers

    23.5 Conditionals

    23.6 Creating and Processing Lists

      23.6.1 Filters, Maps, and Friends

    23.7 Data with Components

      23.7.1 Accessing Fields within Dataclasses

    23.8 Traversing Lists

      23.8.1 Introducing For Loops

        23.8.1.1 An Aside on Order of Processing List Elements

      23.8.2 Using For Loops in Functions that Produce Lists

      23.8.3 Summary: The List-Processing Template for Python

Through our work in Pyret to this point, we’ve covered several core programming skills: how to work with tables, how to design good examples, the basics of creating datatypes, and how to work with the fundamental computational building blocks of functions, conditionals, and repetition (through filter and map, as well as recursion). You’ve got a solid initial toolkit, as well as a wide world of other possible programs ahead of you!

But we’re going to shift gears for a little while and show you how to work in Python instead. Why?

Seeing how the same concepts play out in multiple languages can help you distinguish core computational ideas from the notations and idioms of specific languages. If you plan to write programs as part of your professional work, you’ll inevitably have to work in different languages at different times: we’re giving you a chance to practice that skill in a controlled and gentle setting.

Why do we call this gentle? Because the notations in Pyret were designed partly with this transition in mind. You’ll find many similarities between Pyret and Python at a notational level, yet also some interesting differences that highlight some philosophical differences that underlie languages. The next set of programs that we want to write (specifically, data-rich programs where the data must be updated and maintained over time) fit nicely with certain features of Python that you haven’t seen in Pyret. A future release will contain material that contrasts the strengths and weaknesses of the two languages.

We highlight the basic notational differences between Pyret and Python by redoing some of our earlier code examples in Python.

23.1 Expessions, Functions, and Types

Back in Functions Practice: Cost of pens, we introduced the notation for functions and types using an example of computing the cost of an order of pens. An order consisted of a number of pens and a message to be printed on the pens. Each pen cost 25 cents, plus 2 cents per character for the message. Here was the original Pyret code:

fun pen-cost(num-pens :: Number, message :: String) -> Number:
  doc: ```total cost for pens, each 25 cents
          plus 2 cents per message character```
  num-pens * (0.25 + (string-length(message) * 0.02))
end

Here’s the corresponding Python code:

def pen_cost(num_pens: int, message: str) -> float:
    '''total cost for pens, each at 25 cents plus
       2 cents per message character'''
    return num_pens * (0.25 + (len(message) * 0.02))

Do Now!

What notational differences do you see between the two versions?

Here’s a summary of the differences:

These are minor differences in notation, which you will get used to as you write more programs in Python.

Exercise

Convert the following moon-weight function from Functions Practice: Moon Weight into Python:

fun moon-weight(earth-weight :: Number) -> Number:
  doc:" Compute weight on moon from weight on earth"
  earth-weight * 1/6
end

23.2 Returning Values from Functions

In Pyret, a function body consisted of optional statements to name intermediate values, followed by a single expression. The value of that single expression is the result of calling the function. In Pyret, every function produces a result, so there is no need to label where the result comes from.

As we will see, Python is different: not all "functions" return results (note the name change from fun to def).In mathematics, functions have results by definition. Programmers sometimes distinguish between the terms "function" and "procedure": both refer to parameterized computations, but only the former returns a result to the surrounding computation. Some programmers and languages do, however, use the term "function" more loosely to cover both kinds of parameterized computations. Moreover, the result isn’t necessarily the last expression of the def. In Python, the keyword return explicitly labels the expression whose value serves as the result of the function.

Do Now!

Put these two definitions in a Python file.

def add1v1(x: int) -> int:
    return x + 1

def add1v2(x: int) -> int:
    x + 1

At the Python prompt, call each function in turn. What do you notice about the result from using each function?

Hopefully, you noticed that using add1v1 displays an answer after the prompt, while using add1v2 does not. This difference has consequences for composing functions.

Do Now!

Try evaluating the following two expressions at the Python prompt: what happens in each case?

3 * add1v1(4)

3 * add1v2(4)

This example illustrates why return is essential in Python: without it, no value is returned, which means you can’t use the result of a function within another expression. So what use is add1v2 then? Hold that question; we’ll return to it in Modifying Variables.

23.3 Examples and Test Cases

In Pyret, we included examples with every function using where: blocks. We also had the ability to write check: blocks for more extensive tests. As a reminder, here was the pen-cost code including a where: block:

fun pen-cost(num-pens :: Number, message :: String) -> Number:
  doc: ```total cost for pens, each 25 cents
       plus 2 cents per message character```
  num-pens * (0.25 + (string-length(message) * 0.02))
where:
  pen-cost(1, "hi") is 0.29
  pen-cost(10, "smile") is 3.50
end

Python does not have a notion of where: blocks, or a distinction between examples and tests. There are a couple of different testing packages for Python; here we will use pytest, a standard lightweight framework that resembles the form of testing that we did in Pyret.How you set up pytest and your test file contents will vary according to your Python IDE. We assume instructors will provide separate instructions that align with their tool choices. To use pytest, we put both examples and tests in a separate function. Here’s an example of this for the pen_cost function:

import pytest

def pen_cost(num_pens: int, message: str) -> float:
    '''total cost for pens, each at 25 cents plus
       2 cents per message character'''
    return num_pens * (0.25 + (len(message) * 0.02))

def test_pens():
  assert pen_cost(1, "hi") == 0.29
  assert pen_cost(10, "smile") == 3.50

Things to note about this code:

Do Now!

Add one more test to the Python code, corresponding to the Pyret test

pen-cost(3, "wow") is 0.93

Make sure to run the test.

Do Now!

Did you actually try to run the test?

Whoa! Something weird happened: the test failed. Stop and think about that: the same test that worked in Pyret failed in Python. How can that be?

23.4 An Aside on Numbers

It turns out that different programming languages make different decisions about how to represent and manage real (non-integer) numbers. Sometimes, differences in these representations lead to subtle quantitative differences in computed values. As a simple example, let’s look at two seemingly simple real numbers 1/2 and 1/3. Here’s what we get when we type these two numbers at a Pyret prompt:

1/2

0.5

1/3

0.3

If we type these same two numbers in a Python console, we instead get:

1/2

0.5

1/3

0.3333333333333333

Notice that the answers look different for 1/3. As you may (or may not!) recall from an earlier math class, 1/3 is an example of a non-terminating, repeating decimal. In plain terms, if we tried to write out the exact value of 1/3 in decimal form, we would need to write an infinite sequence of 3. Mathematicians denote this by putting a horizontal bar over the 3. This is the notation we see in Pyret. Python, in contrast, writes out a partial sequence of 3s.

Underneath this distinction lies some interesting details about representing numbers in computers. Computers don’t have infinite space to store numbers (or anything else, for that matter): when a program needs to work with a non-terminating decimal, the underlying language can either:

Python takes the first approach. As a result, computations with the approximated values sometimes yield approximated results. This is what happens with our new pen_cost test case. While mathematically, the computation should result in 0.93, the approximations yield 0.9299999999999999 instead.

So how do we write tests in this situation? We need to tell Python that the answer should be "close" to 0.93, within the error range of approximations. Here’s what that looks like:

assert pen_cost(3, "wow") == pytest.approx(0.93)

We wrapped the exact answer we wanted in pytest.approx, to indicate that we’ll accept any answer that is nearly the value we specified. You can control the number of decimal points of precision if you want to, but the default of ± 2.3e-06 often suffices.

23.5 Conditionals

Continuing with our original pen_cost example, here’s the Python version of the function that computed shipping costs on an order:

def add_shipping(order_amt: float) -> float:
    '''increase order price by costs for shipping'''
    if order_amt == 0:
      return 0
    elif order_amt <= 10:
      return order_amt + 4
    elif (order_amt > 10) and (order_amt < 30):
      return order_amt + 8
    else:
      return order_amt + 12

The main difference to notice here is that else if is written as the single-word elif in Python. We use return to mark the function’s results in each branch of the conditional. Otherwise, the conditional constructs are quire similarly across the two languages.

You may have noticed that Python does not require an explicit end annotation on if-expressions or functions. Instead, Python looks at the indentation of your code to determine when a construct has ended. For example, in the code sample for pen_cost and test_pens, Python determines that the pen_cost function has ended because it detects a new definition (for test_pens) at the left edge of the program text. The same principle holds for ending conditionals.

We’ll return to this point about indentation, and see more examples, as we work more with Python.

23.6 Creating and Processing Lists

As an example of lists, let’s assume we’ve been playing a game that involves making words out of a collection of letters. In Pyret, we could have written a sample word list as follows:

words = [list: "banana", "bean", "falafel", "leaf"]

In Python, this definition would look like:

words = ["banana", "bean", "falafel", "leaf"]

The only difference here is that Python does not use the list: label that is needed in Pyret.

23.6.1 Filters, Maps, and Friends

When we first learned about lists in Pyret, we started with common built-in functions such as filter, map, member and length. We also saw the use of lambda to help us use some of these functions concisely. These same functions, including lambda, also exist in Python. Here are some samples:

words = ["banana", "bean", "falafel", "leaf"]

# filter and member
words_with_b = list(filter(lambda wd: "b" in wd, words))
# filter and length
short_words = list(filter(lambda wd: len(wd) < 5, words))
# map and length
word_lengths = list(map(len, words))

Note that you have to wrap calls to filter (and map) with a use of list(). Internally, Python has these functions return a type of data that we haven’t yet discussed (and don’t need). Using list converts the returned data into a list. If you omit the list, you won’t be able to chain certain functions together. For example, if we tried to compute the length of the result of a map without first converting to a list, we’d get an error:

len(map(len,b))

TypeError: object of type 'map' has no len()

Don’t worry if this error message makes no sense at the moment (we haven’t yet learned what an "object" is). The point is that if you see an error like this while using the result of filter or map, you likely forgot to wrap the result in list.

Exercise

Practice Python’s list functions by writing expressions for the following problems. Use only the list functions we have shown you so far.

  • Given a list of numbers, convert it to a list of strings "pos", "neg", "zero", based on the sign of each number.

  • Given a list of strings, is the length of any string equal to 5?

  • Given a list of numbers, produce a list of the even numbers between 10 and 20 from that list.

We’re intentionally focusing on computations that use Python’s built-in functions for processing lists, rather than showing you how to write you own (as we did with recursion in Pyret). While you can write recursive functions to process lists in Pyret, a different style of program is more conventional for that purpose. We’ll look at that in the chapter on Modifying Variables.

23.7 Data with Components

The analog to a Pyret data definition is called a dataclass in Python.Those experienced with Python may wonder why we are using dataclasses instead of dictionaries or raw classes. Compared to dictionaries, dataclasses allow the use of type hints and capture that our data has a fixed collection of fields. Compared to raw classes, dataclasses generate a lot of boilerplate code that makes them much lighterweight than raw classes. Here’s an example of a todo-list datatype in Pyret and its corresponding Python code:

# a todo item in Pyret
data ToDoItemData:
  | todoItem(descr :: String,
             due :: Date,
             tags :: List[String]
end

------------------------------------------
# the same todo item in Python

# to allow use of dataclasses
from dataclasses import dataclass
# to allow dates as a type (in the ToDoItem)
from datetime import date

@dataclass
class ToDoItem:
    descr: str
    due: date
    tags: list

# a sample list of ToDoItem
MyTD = [ToDoItem("buy milk", date(2020, 7, 27), ["shopping", "home"]),
        ToDoItem("grade hwk", date(2020, 7, 27), ["teaching"]),
        ToDoItem("meet students", date(2020, 7, 26), ["research"])
       ]

Things to note:

Other than these notational differences, this concept carries over nicely from Pyret to Python. (Creating datatypes with multiple cases, as we did in Pyret, is a bit harder in Python, so we’ll set that aside for now.)

23.7.1 Accessing Fields within Dataclasses

In Pyret, we extracted a field from structured data by using a dot (period) to "dig into" the datum and access the field. The same notation works in Python:

travel = ToDoItem("buy tickets", date(2020, 7, 30), ["vacation"])

travel.descr

"buy tickets"

23.8 Traversing Lists

23.8.1 Introducing For Loops

In Pyret, we wrote recursive functions to compute summary values over lists. As a reminder, here’s a Pyret function that sums the numbers in a list:

fun sum-list(numlist :: List[Number]) -> Number:
  cases (List) numlist:
    | empty => 0
    | link(fst, rst) => fst + sum-list(rst)
  end
end

In Python, it is unusual to break a list into its first and rest components and process the rest recursively. Instead, we use a construct called a for to visit each element of a list in turn. Here’s the form of for, using a concrete (example) list of odd numbers:

for num in [5, 1, 7, 3]:
   // do something with num

The name num here is of our choosing, just as with the names of parameters to a function in Pyret. When a for loop evaluates, each item in the list is referred to as num in turn. Thus, this for example is equivalent to writing the following:

// do something with 5
// do something with 1
// do something with 7
// do something with 3

The for construct saves us from writing the common code multiple times, and also handles the fact that the lists we are processing can be of arbitrary length (so we can’t predict how many times to write the common code).

Let’s now use for to compute the running sum of a list. We’ll start by figuring out the repeated computation with our concrete list again. At first, let’s express the repeated computation just in prose. In Pyret, our repeated computation was along the lines of "add the first item to the sum of the rest of the items". We’ve already said that we cannot easily access the "rest of the items" in Python, so we need to rephrase this. Here’s an alternative:

// set a running total to 0
// add 5 to the running total
// add 1 to the running total
// add 7 to the running total
// add 3 to the running total

Note that this framing refers not to the "rest of the computation", but rather to the computation that has happened so far (the "running total"). If you happened to work through the chapter on my-running-sum: Examples and Code, this framing might be familiar.

Let’s convert this prose sketch to code by replacing each line of the sketch with concrete code. We do this by setting up a variable named run_total and updating its value for each element.

run_total = 0
run_total = run_total + 5
run_total = run_total + 1
run_total = run_total + 7
run_total = run_total + 3

This idea that you can give a new value to an existing variable name is something we haven’t seen before. We’ll explore that ability in more depth shortly. First, let’s collapse the repeated lines of code into a single use of for:

run_total = 0
for num in [5, 1, 7, 3]:
   run_total = run_total + num

This code works fine for a specific list, but our Pyret version took the list to sum as a parameter to a function. To achieve this in Python, we wrap the for in a function as we have done for other examples earlier in this chapter. This is the final version.

def sum_list(numlist : list) -> float:
    "sum a list of numbers"
    run_total = 0
    for num in numlist:
        run_total = run_total + num
    return(run_total)

Do Now!

Write a set of tests for sum_list (the Python version).

Now that the Python version is done, let’s compare it to the original Pyret version:

fun sum-list(numlist :: List[Number]) -> Number:
  cases (List) numlist:
    | empty => 0
    | link(fst, rst) => fst + sum-list(rst)
  end
end

Here are some things to notice about the two pieces of code:

23.8.1.1 An Aside on Order of Processing List Elements

There’s another subtlety here if we consider how the two programs run: the Python version sums the elements from left to right, whereas the Pyret version sums them right to left. Concretely, the sequence of values of run_total are computed as:

run_total = 0
run_total = 0 + 5
run_total = 5 + 1
run_total = 6 + 7
run_total = 13 + 3

In contrast, the Pyret version unrolls as:

sum_list([list: 5, 1, 7, 3])
5 + sum_list([list: 1, 7, 3])
5 + 1 + sum_list([list: 7, 3])
5 + 1 + 7 + sum_list([list: 3])
5 + 1 + 7 + 3 + sum_list([list:])
5 + 1 + 7 + 3 + 0
5 + 1 + 7 + 3
5 + 1 + 10
5 + 11
16

As a reminder, the Pyret version did this because the + in the link case can only reduce to an answer once the sum of the rest of the list has been computed. Even though we as humans see the chain of + operations in each line of the Pyret unrolling, Pyret sees only the expression fst + sum-list(rst), which requires the function call to finish before the + executes.

In the case of summing a list, we don’t notice the difference between the two versions because the sum is the same whether we compute it left-to-right or right-to-left. In other functions we write, this difference may start to matter.

23.8.2 Using For Loops in Functions that Produce Lists

Let’s practice using for loops on another function that traverses lists, this time one that produces a list. Specifically, let’s write a program that takes a list of strings and produces a list of words within that list that contain the letter "z".

As in our sum_list function, we will need a variable to store the resulting list as we build it up. The following code calls this zlist. The code also shows how to use in to check whether a character is in a string (it also works for checking whether an item is in a list) and how to add an element to the end of a list (pythin{append}).

def all_z_words(wordlist : list) -> list:
    "produce list of words from the input that contain z"
    zlist = [] // start with an empty list
    for wd in wordlist:
        if "z" in wd:
            zlist = [wd] + zlist
    return(zlist)

This code follows the structure of sum_list, in that we update the value of zlist using an expression similar to what we would have used in Pyret. For those with prior Python experience who would have used zlist.append here, hold that thought. We will get there in Sharing List Updates.

Exercise

Write tests for all_z_words.

Exercise

Write a second version of all_z_words using filter. Be sure to write tests for it!

Exercise

Contrast these two versions and the corresponding tests. Did you notice anything interesting?

23.8.3 Summary: The List-Processing Template for Python

Just as we had a template for writing list-processing functions in Pyret, there is a corresponding template in Python based on for loops. As a reminder, that pattern is as follow:

def func(lst: list):
  result = ...  # what to return if the input list is empty
  for item in lst:
    # combine item with the result so far
    result = ... item ... result
  return result

Keep this template in mind as you learn to write functions over lists in Python.

24 Modifying Variables

    24.1 Modifying Variables in Memory

    24.2 Modifying Variables Associated with Lists

    24.3 Writing Functions that Modify Variables

      24.3.1 The global annotation

    24.4 Testing Functions that Modify global Variables

      24.4.1 The Internal Structure of a Test Function

      24.4.2 Takeaways on Testing Modifications

Our list-processing functions in Traversing Lists used a new concept that might seem minor but actually has significant consequences in programming: they modified the value of an existing variable (the one holding the running result during the for loop).

In terms of programming notation, there was nothing new. We first saw the name = value notation for associating names with values back in Example: Similar Flags. What is new is that we just reused a name, thus changing the value associated with that name. Back in Pyret, we said that once your code associated a name with a value, you couldn’t associate that same name with a different value. In Python, we have broken this rule, using the same name to hold the current value of the running total, a value which changes over time as the program runs. Pyret also allows reusing names through its shadow construct, but we did not need that in any of the programs in the Foundations section.

24.1 Modifying Variables in Memory

Let’s map this out at the level of the directory, reusing our code for summing a list from the last chapter. Here is the code again:

run_total = 0
for num in [5, 1, 7, 3]:
   run_total = run_total + num

The first line makes an entry in the directory:

Directory

run_total –> 0

The for loop also sets up a directory entry, this time for the variable num used to refer to the list elements. Thus, the directory appears as:

Directory

run_total –> 0 num –> 5

Inside the for loop, we compute a new value for run_total. The use of = tells Python to modify the value of run_total in the directory.

Directory

run_total –> 5 num –> 5

This process continues: Python advances num to the next list element

Directory

run_total –> 5 num –> 1

then modifies the value of run_total

Directory

run_total –> 6 num –> 1

This process continues until all of the list elements have been processed.

What is the takeaway from this? There are two:

  1. The = construct associates a name with a value. If the name is not already in the directory, it gets added. If the name is already in the directory, the new value replaces the old value. The old value is no longer accessible.

  2. for loops also introduce a name into the directory, specifically the one the programmer chose to refer to the individual list elements. Python modifies the value of that name as part of processing the for loop.

The ability to modify the values associated with names probably doesn’t seem like a big deal just yet, but upcoming programs will show where this ability gets interesting.

Exercise

Draw the sequence of directory contents for the following program:

score = 0
score = score + 4
score = 10

Exercise

Draw the sequence of directory contents for the following program:

count_long = 0
for word in ["here", "are", "some", "words"]:
  if len(word) > 4:
    count_long = count_long + 1

24.2 Modifying Variables Associated with Lists

Having the ability to modify values helps us imagine more interesting applications. Think about our to-do list example from Data with Components. We learned how to create to-do items and created an example list of to-do items. In the real-world, our set of to-do items changes frequently. We add new items and remove completed ones. Sometimes, we have to search for a to-do item on our list (say to check its due date). In other words, the contents of our to-do list changes over time. If we want to keep our to-do list current, then, we’ll want to store it in a variable whose value we modify over time.

Let’s build up to managing to-do lists (which are lists containing dataclass items) by starting with a simpler example: managing votes cast through a poll, survey, or election. Each option will be a string. We’ll need functions that let people cast votes, as well as functions to tally the votes cast for specific options and functions to determine which option had the most votes. (We’ll return to the to-do list example in Modifying Structured Data.)

As a concrete example, let’s assume that students are responding to a poll about their favorite desserts. We’ll start by setting up an initial (empty) list of the votes that have been cast:

theVotes = []

This would create an entry in the directory:

Directory

theVotes –> []

Do Now!

Write a sequence of 2 commands: one to add "fruit" to theVotes and another to add "churros".

Your code should look something like:

theVotes = ["fruit"] + theVotes
theVotes = ["churros"] + theVotes

Do Now!

Draw the directory after both of these modifications have been made.

Our directory now contains a single list with all of our items:

Directory

theVotes –> ["churros", "fruit"]

Do Now!

Why are new items at the front of the list? What would need to change to put new votes at the end of the list instead?

Exercise

Write a function count_votes_for that takes the name of a dessert and returns the number of times that the given dessert appears in theVotes.

Exercise

Imagine that you were now asked to determine which option received the most votes. Develop two different task plans (Task Plans) that would perform this computation. You don’t have to write the code, just the plans.

24.3 Writing Functions that Modify Variables

Our two modification commands are (not surprisingly) the same except for the vote that we are casting. Seeing as we expect many votes to be cast, it would make sense to turn this repeated code into a function. As we did in Pyret, we create a parameter name for the information that differs across the commands, and make the remainder of the commands the body of the function. Here’s the proposed code:

def cast_vote(item: str):
    "add given item to the theVotes list"
    theVotes = [item] + theVotes

}

If this function were the only code in your file, Python would not accept it. It would complain that the theVotes variable is unbound (which means it isn’t associated with a name). We need to have created an initial value of theVotes so that it exists in the directory. Earlier in this section, we started with an initial empty list of votes. We have to include that in our file.

theVotes = []

def cast_vote(item: str):
    "add given item to the theVotes list"
    theVotes = [item] + theVotes

If we try to load this file, Python is still complaining, again that the variable theVotes is unbound. But we just defined it. What could be the problem here?

The problem is has to do with where the variable is defined. Here, we are trying to modify the variable inside a function, but we created the variable outside the function. When we wrote sum_list, the variable with the running total was created and modified within the same function. Here, the theVotes variable is defined outside the function. Why might this matter? The answer lies in the directory.

24.3.1 The global annotation

Whenever a function gets called, the programming language creates a new directory area with a space for local information. If we create an empty list of votes then call cast_vote("fruit"), the directory has the following contents just before we modify the list:

Directory

theVotes –> []

***** area for cast_vote ***** item –> "fruit" ******************************

By default, Python restricts your program to modifying variables in the local directory. This is a good default, as it prevents function calls from interfering with each other. In sum_list, for example, we want to sum each list independently, without the results of summing one list affecting the sum reported for another.

Sometimes, however, we want data from one function call to affect later calls. That is, in fact, the whole point of having a variable where we maintain our current list of votes. We therefore have to tell Python that we want to modify the value of the theVotes variable that is outside of the local area. The Python keyword global achieves that:

theVotes = []

def cast_vote(item: str):
    "add given item to the theVotes list"
    global theVotes
    theVotes = [item] + theVotes

The global annotation is used with the name of a variable that was defined outside the function that we want this function to modify. You can’t use this to modify the values of variables that were introduced in other functions, just ones that were defined at the same level as the functions in your file (the so-called "top" level).

24.4 Testing Functions that Modify global Variables

We developed cast_vote by making a function from two commands with shared content. We didn’t write tests for it though. Let’s add a testing function now.

Do Now!

Try to write a test case for cast_vote. Write down either a concrete test (in pytest form) or write down a question whose answer would help you figure out how to write a test.

Following the form of tests that we developed in Examples and Test Cases, we expect to write something like

def test_cast_vote():
  assert cast_vote("basbousa") == ...

What fills in the ...? That should be the result we expect from calling the function, but our cast_vote function doesn’t return anything! It simply modifies the variable. We could replace the ... with None (Python’s value for "nothing"), but that doesn’t tell us anything about whether the function actually has the effect that we expected. We need a different testing strategy for functions that modify variables but don’t return anything.

The strategy is to test for the effect that we expected, not the value that we expected.

Do Now!

What effect do we expect cast_vote("basbousa") to have? It modifies theVotes, so refer to theVotes or "basbousa" in your answer.

One claim that directly follows from the problem statement would be "theVotes should contain "basbousa"". Can we turn this into code to form a test case? Yes, and here’s the code:

def test_cast_vote():
    cast_vote("basbousa")
    assert "basbousa" in theVotes

This example shows the essence of testing functions that modify variables. Rather than put the function call in the assert statement, we call the function first, then write an assert that uses the variable itself to check for the expected change.

Do Now!

Are there other expected changes that we should check? Can you come up with a cast_vote function that would pass this test but not do what we expected?

Imagine that someone wrote the following cast_vote function instead:

theVotes = []

def cast_vote(item: str):
    "add given item to the theVotes list"
    global theVotes
    theVotes = [item]

This modifies theVotes to include the given item, but it accidentally throws away the rest of the list! Our current test_cast_vote would approve of this version, which suggests that additional tests are needed.

Do Now!

What other expectations do you have about theVotes after the function runs? Can you capture those as assert statements about the variable?

One expectation we might have is that theVotes would be 1 element longer after the call than it was before the call. Here’s how to augment test_cast_vote to include this assertion:

def test_cast_vote():
    pre_length = len(theVotes)
    cast_vote("basbousa")
    assert "basbousa" in theVotes
    assert len(theVotes) == pre_length + 1

The strategy here is to use a local variable to save the length of theVotes before calling cast_vote, then use that saved value to check the expected impact after the call.

Do Now!

Are there other impacts to consider? Could a program satisfy our current two tests and still not do what we expect?

If we start imagining worst-case scenarios, we could envision a malicious voting machine that doubles the impact of a vote by removing an old vote and replacing it with two copies of the newly cast vote. Here’s code that wouild do that (the list[1:] returns the rest of list. in general list[x:y] takes the sublist from index x up to but not including index y).

def cast_vote(item: str):
    "add given item to the theVotes list"
    global theVotes
    theVotes = [item] + [item] + theVotes[1:]

Okay, yes, this could happen, but isn’t this sort of idea generation getting a little excessive? Yes and no. If you are writing test cases for a voting machine, you absolutely should be thinking about situations like this (because people trying to influence elections might try them). The ability to think about how adversaries might abuse a system is a valuable skill (and you can build a career around doing this well). But it does feel a bit open-ended, especially in a first programming course.

Fortunately, we have a much more systematic way to raise the robustness of our tests. Recall that we asked you to write a function called count_votes_for as an exercise. Since both cast_vote and count_votes_for work with the theVotes variable, we should check that the results of count_votes_for are what we would expect after cast_vote modifies the variable.

Do Now!

Modify test_cast_vote to also check that the results of count_votes_for change as expected.

Here’s a revised test function:

def test_cast_vote():
    pre_length = len(theVotes)
    pre_count = count_votes_for("basbousa")
    cast_vote("basbousa")
    assert "basbousa" in cast_vote
    assert len(theVotes) == pre_length + 1
    assert count_votes_for("basbousa") == pre_count + 1

To be careful, we also should test that the vote count has not changed for a dessert other than "basbousa".

Do Now!

Add a test for a dessert other than "basbousa".

One challenge here is that we don’t know what other votes have been cast. From the perspective of this test, only one vote gets cast, so a test for any other dessert might only check desserts that aren’t in the list.

A better test function would set theVotes to have specific contents, to give us more data to work with.

def test_cast_vote():
    theVotes = ["fruit", "churros", "fruit", "basbousa"]
    pre_length = len(theVotes)
    pre_count = count_votes_for("basbousa")
    pre_fruit = count_votes_for("fruit")
    cast_vote("basbousa")
    assert "basbousa" in cast_vote
    assert len(theVotes) == pre_length + 1
    assert count_votes_for("basbousa") == pre_count + 1
    assert count_votes_for("fruit") == pre_fruit

Do Now!

This code has fixed the content of theVotes. This suggests that we no longer need to save information about the results of functions before calling cast_vote. We could edit the assert statements to use manually-computed values (such as 3 in place of pre_length). Is this a good idea?

You certainly could remove the pre_ definitions and just use concrete numbers in the assert statements. As you are getting started, you may prefer to do that. However, the benefit to the current setup is that we could modify the initial value of theVotes (if we think of another important case to check) without then having to modify all of the assert statements. The current setup also lets someone else read your assert statements and clearly understand the effect being tested: having the right side of == say pre_count + 1 is more informative than a concrete number like 2. Remember: examples and code are meant to be read by other people!

24.4.1 The Internal Structure of a Test Function

This testing function seems to be getting a bit long. If we look closely, however, we’ll see it has some structure to it, which we’ll highlight by adding some blank lines and labeling strings between key testing tasks. (Remember that Python uses strings as a commenting mechanism.)

def test_cast_vote():
    "SETUP"
    theVotes = ["fruit", "churros", "fruit", "basbousa"]

    "SAVE CURRENT VALUES"
    pre_length = len(theVotes)
    pre_count = count_votes_for("basbousa")
    pre_fruit = count_votes_for("fruit")

    "PERFORM MODIFICATIONS"
    cast_vote("basbousa")

    "CHECK EFFECTS"
    assert "basbousa" in cast_vote
    assert len(theVotes) == pre_length + 1
    assert count_votes_for("basbousa") == pre_count + 1
    assert count_votes_for("fruit") == pre_fruit

The first section sets up our data. The second saves any current values that are needed for the tests. The third makes the function calls that we want to test. The last checks the effects with the assert statements. Viewed this way, the long function seems more manageable. The labels also remind us of the key tasks that go into testing functions that modify variables.

24.4.2 Takeaways on Testing Modifications

What are the takeaways from all of this?

Does it feel like testing has gotten more complicated now that variables are being modified? It has. When there are no variables that are shared and modified across functions, we can test each function individually. But once one function modifies the value of a variable that other functions also use, we have to make sure that the modifying function isn’t breaking the expected behavior of the others. Variables are powerful, but that power comes with responsibility.

Strategy: Testing a Function that Modifies Data

  1. In prose, write down the effects that you expect the modification to have, both on traits of the variable’s value and on the results of other functions that use the variable. Remember to consider situations that should not be effected by the modification.

  2. Create a test-function skeleton with areas for setup, saving current values, calling the function, and checking the effects.

  3. In the setup section, set the value of the variable to a concrete value for testing.

  4. In the calling-the-function section, write the function call that you want to test.

  5. Turn each of the effects that you want to check into an assert statement, naming and storing computations of any values from before the test was run in the saving-current-values section.

25 Modifying Structured Data

    25.1 Modifying Fields of Structured Data

    25.2 Modifications to Shared Data

    25.3 Understanding Memory

    25.4 Modifications to Structured Data, Revisited

    25.5 Basic Data in Memory

    25.6 Variables and Equality

    25.7 Takeaways on Modifying Variables

In Modifying Variables, we learned how to modify variables, working with a list of votes. We deferred managing and modifying to-do items, as introduced inef["python-data-with-components"]. We now return to managing to-do lists.

As a reminder, here is the dataclass for ToDoItem, along with examples of the data.

@dataclass
class ToDoItem:
  descr : str
  due : date
  tags : list

milk_item = ToDoItem("buy milk", date(2020, 7, 27), ["shopping", "home"])
grading_item = ToDoItem("grade hwk", date(2020, 7, 28), ["teaching"])
paper_item = ToDoItem("meet students", date(2020, 7, 26), ["research"])

todo_list = [grading_item, milk_item, paper_item]

Managing a to-do list involves two different kinds of modifications: modifying the list of items (which we introduced in Modifying Variables) and modifying details of individual entries (such as revised due dates). We’ll start with the latter.

25.1 Modifying Fields of Structured Data

In Python, we modify a field of a dataclass by using = to give a new value to the field within the value. For example:

milk_item.due = date(11, 15, 2020)
paper_item.descr = "meet Jack"

Once we do this, the original value is gone, replaced by the new value. We can see this by accessing the field (or by looking in the dictionary window if your programming environment has that feature):

print(milk_item.due)

Do Now!

Check the value of todo_list. Does milk_item have the original date or the new date?

Since we modified the contents of milk_item, which was in the list, todo_list shows the new date.

Let’s now write a function to modify the date in a ToDoItem.

def modify_duedate(forItem : ToDoItem, new_date : date):
    """change due date on given item to the new date"""
    forItem.due = new_date

What would the test for this look like? In Takeaways on Testing Modifications we discussed how to write tests for functions that modify data and variables. How should those lessons apply when we are modifying fields of data? As with variable modifications, we are testing the effect of a function, so the same rules apply. Let’s apply the testing strategy from that section:

Do Now!

What effects do you expect modify_duedate to have? Remember to include situations that should not be effected.

For modify_duedate, we expect the due field to change, but the other fields of the given item should remain unchanged. We haven’t written any other functions that use due dates, so the field tests will suffice for now. To reinforce the parts of the test, we include the testing task labels (but these would not normally be included unless you find them useful).

def test_modify_duedate():
    "SETUP"
    item = ToDoItem("register", date(2020, 11, 9), ["school"])

    "PERFORM MODIFICATIONS"
    modify_duedate(item, date(2020, 11, 15))

    "CHECK EFFECTS"
    assert item.due == date(2020, 11, 15)
    assert item.descr == "register"
    assert item.tags == ["school"]

Here, we wrote a test that used concrete values from the original object rather than include a "save current values" section. Here, the current values are simply basic data from our setup data, not the result of other functions. In this case, the concrete-values approach better matches the complexity of the testing task.

25.2 Modifications to Shared Data

Imagine that you have a very important phone call to make, so you made multiple to-do items to make sure you don’t miss it. By the time you made the third item, you were tired of typing, so you decided to reuse the definition of the second one, just with a new name:

call1 = ToDoItem("call Fred", date(2020, 11, 19), ["urgent"])
call2 = ToDoItem("call Fred", date(2020, 11, 19), ["urgent"])
call3 = call2

Do Now!

Draw the program directory for this code

Directory

call1 –> ToDoItem("call Fred", date(2020, 11, 19), ["urgent"]) call2 –> ToDoItem("call Fred", date(2020, 11, 19), ["urgent"]) call3 –> ToDoItem("call Fred", date(2020, 11, 19), ["urgent"])

The directory indicates that all three calls are the same. We can confirm this by running the following commmads:

call1 == call2

True

call2 == call3

True

Fred is unexpectedly out of the office, so now you need to call Tina instead. You decide to modify the call reminders to reflect this change.

Do Now!

Write a command to change the description within call1 to "call Tina" instead of "call Fred" (we’ll get to the other two in a moment).

With what we have learned, we might have written either of the following two commands:

call1 = ToDoItem("call Tina", date(2020, 11, 19), ["urgent"])
call1.descr = "call Tina"

In all three cases, the resulting program directory now associates call1 with a call to Tina. The first one creates an entirely new ToDoItem and modifies the value of call1 to this new item. The second one modifes a field within the existing ToDoItem. The same value prints out in each case, so the difference doesn’t seem important.

Let’s similarly update call2 and call3.

Do Now!

What do you expect to be printed if you run the following sequence of commands?

call2

call3

call2 = ToDoItem("call Tina", date(2020, 11, 19), ["urgent"])

call2

call3

After you’ve written down your hypothesis, run the commands and see what happens. Is it what you expected?

Hopefully you observed that call2 and call3 no longer have the same value. On the one hand, this seems reasonable: we only explicitly updated call2. On the other hand, we appeared to define a relationship between call2 and call3 when we wrote call2 = call3, and we might have expected that relationship to be maintained on update.

Let’s repeat the exercise, this time using one of our other approaches. Reset call1, call2 and call3 to their original values and try the second of our possible modification commands.

Do Now!

Evaluate the following commands. What do you notice about the results?

>>> call1 = ToDoItem("call Fred", date(2020, 11, 19), ["urgent"])
>>> call2 = ToDoItem("call Fred", date(2020, 11, 19), ["urgent"])
>>> call3 = call2

>>> call2
>>> call3

>>> call2.descr = "call Tina"
>>> call2
>>> call3

Hopefully you now noticed that call2 and call3 were both modified, even though we only explicitly modified call2. To be safe, let’s check the value of call1.

Do Now!

Evaluate call1 and compare its value to that of call2:

>>> call1
>>> call2

Modifying call2 seems to have only affected call3, not call1. This suggests that the relationship between call2 and call3 has indeed been recorded, and that some modification approaches maintain that relationship while others don’t. But what’s actually going on? This seems like the kind of question that we usually turn to the program directory to answer.

As it turns out, there is more to the directory than we have explained thus far. The directory organization that we have used until now doesn’t reflect the relationship between call2 and call3. We need to refine our directory to capture connected variables, then return to the question of modifying data.

25.3 Understanding Memory

Every time you use a constructor to create data, your programming environment stores in the memory of your computer. Memory consists of a (large) number of slots. Your newly-created datum goes into one of these slots. Each slot is labeled with an address. Just as a street address refers to a specific building, a memory address refers to a specific slot where a datum is stored. Memory slots are physical entities, not conceptual ones. A computer with a 500GB hard drive has 500 billion slots in which it can store data. Not all of that memory is available to your programming environment: your web browser, applications, operating system, etc all get stored in the memory. Your programming environment does get a portion of memory to use for storing its data. That portion is called the heap.

When you write a statement like

call1 = ToDoItem("call Fred", date(2020, 11, 19), ["urgent"])

your programming environment puts the new ToDoItem into a physical slot in the heap, then associates the address of that slot with the variable name in the directory. The name in the directory doesn’t map to the value itself, but rather to the address that holds the value. The address bridges between the physical storage location and the conceptual name you want to associate with the new datum. In other words, our directory really looks like:

Directory

call1 –> 1001

Heap

  • 1001 :  ToDoItem("call Fred" ...)

Our revised version has two separate areas: the directory (mapping names to addresses) and the heap (showing the values stored in the addresses). The initial address 1001 is arbitrary: you can start from any address number. We will use four digit numbers to distinguish addresses from smaller values that we will use as data in our programs.

This revised directory-plus-heap information explains what we observed earlier about call2 and call3. Making a new ToDoItem and naming it call2 updates the directory and heap as follows:

Directory

call1 –> 1001 call2 –> 1002

Heap

  • 1001 :  ToDoItem("call Fred" ...)

  • 1002 :  ToDoItem("call Fred" ...)

What happens now when we evaluate call3 = call2? This statement evaluates just as it did before: we create a directory entry for call3 and associate it with whatever call2 maps to in the directory:

Directory

call1 –> 1001 call2 –> 1002 call3 –> 1002

Heap

  • 1001 :  ToDoItem("call Fred" ...)

  • 1002 :  ToDoItem("call Fred" ...)

You can hover over the location markers to see the connections between the directory and memory areas.

With the heap articulated separately from the directory, we now see the relationship between call3 and call2: they refer to the same address, which in turn means that they refer to the same value. call1, however, refers to a different address (that just happens to contain a similar value).

25.4 Modifications to Structured Data, Revisited

With the directory/heap distinction in hand, let’s revisit our two candidate approaches to modifying call2 to call Tina instead of Fred. As a reminder, here was our first approach:

call2 = ToDoItem("call Tina", date(2020, 11, 19), ["urgent"])

This approach resulted in call2 mentioning Tina but not call3. Let’s reconstruct the behavior in the directory and heap.

The use of the ToDoItem constructor creates a new datum in the heap, storing it in an unused address. The assignment call2 = associates call2 with the address of the new datum. Thus, we’d end up with the following directory/heap structure:

Directory

call1 –> 1001 call2 –> 1003 call3 –> 1002

Heap

  • 1001 :  ToDoItem("call Fred" ...)

  • 1002 :  ToDoItem("call Fred" ...)

  • 1003 :  ToDoItem("call Tina" ...)

Note that the new item to call Tina is in a new address (1003) in the heap, and call2 has been changed in the directory to refer to 1003. The directory entry for call3 was not affected (it’s still 1002).

In contrast, return to the original directory/heap, and consider what happens if we instead perform the modification via

call2.descr = "call Tina"

There is no call to a constructor on the right side of the =, so new item gets created in the heap. On the left, we see call2.descr. We said earlier that we could read the . as “dig into”. At the level of memory, this means “find the value associated with call2 in the heap and go to the descr field within the datum”. Because of the ., we update the descr field within the heap itself. In other words, we end up with the following directory/heap contents:

Directory

call1 –> 1001 call2 –> 1002 call3 –> 1002

Heap

  • 1001 :  ToDoItem("call Fred" ...)

  • 1002 :  ToDoItem("call Tina" ...)

Now, both call2 and call3, which refer to the same address 1002, get the modified description.

The takeaway from this is that = behaves differently depending on whether the left side is a plain variable name or a field reference off of a variable name. The former modifies the directory. The latter modifies the heap. Under the latter, all variables or fields that map to the same address will be affected by a change to one of them. Under the former, only the value of the named variable changes. Whether you want a change to affect only the current variable or several is a design decision that we will practice making going forward.

25.5 Basic Data in Memory

We have shown how to place structured data in our new directory/heap structure, but what about basic data? For example, what if our original program to set up the call items also included the following two lines:

x = 4
y = 5

The corresponding directory/heap contents would be as follows:

Directory

call1 –> 1001 call2 –> 1002 call3 –> 1002 x –> 4 y –> 5

Heap

  • 1001 :  ToDoItem("call Fred" ...)

  • 1002 :  ToDoItem("call Tina" ...)

This shows that basic data live in the directory, not the heap. The whole point of structured data is that they have both their own identity and multiple components. The heap gives access to both concepts. Basic data can’t be broken down (by definition). As such, there is nothing lost by putting them only in the directory.

What about strings, however? We’ve referred to them as basic data until now, but doesn’t they have components, in terms of the sequence of characters that make up the string? Yes, that is technically accurate. However, we are treating strings as basic data because we aren’t using operations that modify that sequence of strings. This is a fine point, one that usually comes up in later CS courses. This book will leave strings in the directory, but if you are writing programs that modify the internal characters, put them in the heap instead.

Do Now!

Edit the directory and heap to show what happens if we run the following statement:

x = 10

Following what we did for modifying call1, we evaluate the right side of the = and store the result in the directory. Thus, x --> 10 would be in the directory.

Do Now!

Edit the directory and heap to show what happens if we run the following statement:

x = y

Follow the same rule: evaluating y yields value 5, which we then associate with x, yielding the following:

Directory

call1 –> 1001 call2 –> 1002 call3 –> 1002 x –> 5 y –> 5

Heap

  • 1001 :  ToDoItem("call Fred" ...)

  • 1002 :  ToDoItem("call Tina" ...)

Do Now!

If we now evaluate y = 3, does the value of x change?

It does not. The value associated with y in the directory changes, but there is no connection between x and y in the directory. The = command does associate the value mapped to y with x, but that is not the same as having the two variables track each other.

This is one of the most common misconceptions in introductory programming. Statements like var1 = var2 do not set up relationships that persist between variables over time. Moreover, these statements are not commutative, meaning that var1 = var2 and var2 = var1 do not make the same modifications to the directory. The variable on the left of = is reassigned to a new value in the directory, and that value comes from evaluating whatever is on the right. If the expression on the right of = is just a variable, then the value is retrieved from the directory and associated with the variable on the left.

Is there any way to get two variables to track each other’s values? No, but as we saw with call2 and call3, it is possible for two variables to reference the same structured datum, and thereby share modifications. We’ll see more examples of this as we go.

25.6 Variables and Equality

Let’s return one last time to our original collection of calls:

Directory

call1 –> 1001 call2 –> 1002 call3 –> 1002

Heap

  • 1001 :  ToDoItem("call Fred" ...)

  • 1002 :  ToDoItem("call Fred" ...)

Do Now!

If someone asked you whether call2 and call3 are the same, what would you say? What about if they asked about call3 and call1?

On the one hand, all three variables refer to ToDoItems with the same contents. But only two of them are at the same address. When we say “equals”, which one do we mean?

Both notions are useful, and indeed most languages will give you two different equality operators: one for equality of address, and one for equality of (component) values. In Python, we use is for address equality and == for equality of values. If you did Re-Examining Equality, you saw this difference in Pyret as <=> versus ==. (If you did not do anything in Re-Examining Equality, you weren’t exposed to this concept in Pyret.)

As we go forward, you’ll get more practice with when to use each kind of equality. The == operator is more accepting, so it is usually the right default. If you actually need to know whether two expressions refer to the same address, you can switch to is.

25.7 Takeaways on Modifying Variables

Strategy: Rules for updating the directory andthe heap

Summarizing, the rules for how the directory and memory update are as follows:

  • We add to memory when a data constructor is used

  • We update memory when a field of existing data is reassigned

  • We add to the directory when a name is used for the first time (this includes parameters and internal variables when a function is called)

  • We update the directory when a name that is already in the directory is reassigned to a different value)

Ultimately, this book is trying to teach you how to work with various kinds of information that might show up in a real project. Up until now, we’ve focused on the structure of information (tables vs lists vs datatypes/dataclasses) and how to write programs to process those structures.

But now, we’re starting to consider an additional question: does any of our data need to be shared across different parts of the program, such that multiple parts of the program might need to change the data and see the changes? In this chapter, we saw how dataclasses work when they are shared. Can we also share lists? What does that look like? We turn to that, as well as how lists are stored in the heap, in the next chapter.

26 Revisiting Lists and Variables

    26.1 Sharing List Updates

      26.1.1 Operations that Mutate Lists

    26.2 Lists in Memory

    26.3 Practice: Data for Shared Bank Accounts

    26.4 Circular References

      26.4.1 Testing Circular Data

      26.4.2 Revisiting Variables: A Function to Create Accounts for New Customers

    26.5 The Many Roles of Variables

    26.6 Managing All Accounts

26.1 Sharing List Updates

In Modifying Structured Data, we saw that we can have two names that refer to the same dataclass value in memory. In this way, changes made to one are visible in the other. Put differently, we can have two names that refer to the same datum. This same ability could also be useful for lists. For example, you and a roommate might share a shopping list, the registrar and the financial aid office might share a list of students, and so on. In both examples, an list update made by one person or office should be visible to the other. How do we implement this in Python?

More concretely, assume that Shaunae and Jonella are roommates who take turns shopping for groceries. Each of them has access to a shared shopping list (through their phones): any time one of them adds or removes an item, the other person should see the change. How could we mock this up in code?

Do Now!

Create a shopping list for Shaunae with a couple of items, then give the same list a second directory name corresponding to Jonella.

You might have written something like the following:

shaunae_list = ["bread", "coffee"]
jonella_list = shaunae_list

If you load this code at the prompt and look at both lists, you’ll see they have the same values.

Do Now!

Jonella realizes they are also out of eggs. Run an command at the prompt to add "eggs" to Jonella’s list, then make sure "eggs" appears under both names.

Based on what we learned about adding to lists in Using For Loops in Functions that Produce Lists, perhaps you wrote something like

>>> jonella_list = ["eggs"] + jonella_list
>>> jonella_list
["eggs", "bread", "coffee"]
>>> shaunae_list
["bread", "coffee"]

What happened? Only Jonella’s list changed. When we use + to combine two lists, Python creates a new list out of the elements of the existing lists. Since the left side of the = is a name in the directory, Python modifies the directory to refer to this new list. It does not maintain the relationship between jonella_list and shaunae_list.

Surely sharing lists is a common enough pattern that there is a way to do this, however. There is, but it requires us to work with lists in a different way than we first saw in Processing Lists.

26.1.1 Operations that Mutate Lists

So far, we have used + to append two lists together. This operation creates a new list. Python offers a different operation (called append) for modifying an existing list to include a new element. Here’s a different way we could have added "eggs" to Jonella’s list.

>>> jonella_list.append("eggs")
>>> jonella_list
["bread", "coffee", "eggs"]
>>> shaunae_list
["bread", "coffee", "eggs"]

In this version, Python finds the address of jonella_list in the directory, follows the . over to the list datum in the heap, then operates on the list itself to add "eggs". Unlike in the previous version, we don’t use = to assign a new value to jonella_list. The append updates the list in the heap. Since jonella_list and shaunae_list refer to the same address in memory, the addition via jonella_list also shows up when we access the list through the name shaunae_list.

Similarly, if Shaunae buys "coffee", she can remove it from the shared list using the remove operator:

>>> shaunae_list.remove("coffee")
>>> jonella_list
["bread", "eggs"]
>>> shaunae_list
["bread", "eggs"]

The point here is that both append and remove modify the contents of the list in the heap. If two names refer to the same heap address, changes made through one name appear through the other.

Does this mean we always want to modify lists using append and remove? Not at all! Sometimes, we will want to build new lists rather than modifying old ones (we’ll see some cases of this later). If you want a modification to be visible through all names that refer to a list, use append. If you only want the addition of an element to be visible through one name, use +.

26.2 Lists in Memory

In Modifying Structured Data, we drew directory-heap diagrams to show how dataclass values appear in the heap. What do lists look like in the heap? In Python, lists are stored by putting a dataclass-like datum into memory that contains the number of elements in the list, followed by individual elements in the subsequent addresses. For example, here’s what Shaunae and Jonella’s shopping lists look like after adding "eggs":

Directory

shaunae_list –> 1001 jonella_list –> 1001

Heap

  • 1001 :  List(len:3)

  • 1002 :  "bread"

  • 1003 :  "coffee"

  • 1004 :  "eggs"

Instructors: Python’s default list implementation is an array-based list, not a linked list. A section on linked lists will be added in a later release of the book.

There are no directory entries to the list elements, but the fact that the List datum reflects size 3 tells us that the next three addresses are part of the list.

Do Now!

Work out the memory diagram for the following program:

scores = [89,72,92]
colors = ["blue", "brown"]
scores.append(83)

Does this raise any questions?

After defining the first two lists, your memory diagram should have looked like (perhaps with a different starting address):

Directory

scores –> 1005 colors –> 1009

Heap

  • 1005 :  List(len:3)

  • 1006 :  89

  • 1007 :  72

  • 1008 :  92

  • 1009 :  List(2)

  • 1010 :  "blue"

  • 1011 :  "brown"

What now happens when we append 83 to the scores list? Since lists are supposed to have all of their elements in consecutive memory slots, 83 should go into @1009, but that address is already occupied! We shouldn’t destroy the colors list just because we added to the scores list.

In practice, lists are created with some extra space to allow for later additions. Here’s a more accurate diagram:

Directory

scores –> 1005 colors –> 1014

Heap

  • 1005 :  List(len:3, slots:8)

  • 1006 :  89

  • 1007 :  72

  • 1008 :  92

  • 1009 :  None

  • 1010 :  None

  • 1011 :  None

  • 1012 :  None

  • 1013 :  None

  • 1014 :  List(len:2, slots:8)

  • 1015 :  "blue"

  • 1016 :  "brown"

  • 1017 :  None

  • 1018 :  None

  • 1019 :  None

  • 1020 :  None

  • 1021 :  None

  • 1022 :  None

The None value is what Python uses to indicate "there’s nothing here". Note that the List datum has also changed: it how has two fields, one for the number of elements, and another for the number of slots. We also modified the directory to reflect the new address for the scores list.

There’s now room to perform the append on the scores list. The 83 would get inserted into @1009 and the len field of the List would get updated to 4.

Couldn’t a list still run out of room though? Absolutely. When that happens, the whole list gets moved to a new sequence of addresses with more room. The details of this are a more advanced topic. For now, the key takeaway for you is that lists are arranged in consecutive memory addresses. This layout will be particularly relevant when we get to Hashtables and Dictionaries.

26.3 Practice: Data for Shared Bank Accounts

Let’s apply what we’ve been learning about sharing modifications to a real-world problem: having bank accounts shared by multiple people (such as partners or spouses). A bank wants to allow multiple customers to manage the same account (meaning each person deposits to or withdraws from the same shared balance). Our task is to define data structures for managing bank customers and their (shared) accounts.

How might you set up the data about customers and their accounts? To keep things simple, we’ll limit the information about customers to their names (leaving out phone numbers, etc), and information about accounts to their balances (leaving out interest rates, etc).

In a full banking system, we might want to allow customers to be part of multiple accounts as well. This would suggest the following data structure:

@dataclass
class Account:
    id : int
    balance : int
    owners : lst  # of Accounts

@dataclass
class Customer:
    name : str
    accts : lst  # of Accounts

You can have these kinds of circular connections, in which customers connect to accounts which connect to customers. In the interest of letting us focus on basic sharing and memory, however, let’s simplify our data as follows: accounts don’t track who their customers are, and each customer has only one account:

@dataclass
class Account:
    id : int
    balance : int

@dataclass
class Customer:
    name : str
    acct : Account

Do Now!

We could consider making the acct field hold an ID number, rather than an Account datum? Should we do this? Why or why not?

This question revisits a point that we made when linking people into ancestry trees rather than store ancestors through their names: direct references to other values, rather than going through names or id numbers, speeds up our access to related data.

Do Now!

Write an expression that creates a new Customer named Tina with a new Account. Give the new account ID number 1 and initial balance of 100.

Do Now!

Assume we gave the name t_cust to the Customer datum created in the previous exercise. Draw the memory diagram that contains t_cust and the new data values.

In code, you should have written:

t_cust = Customer("Tina", Account(1, 100))

The corresponding directory-memory diagram would be:

Directory

t_cust –> 1015

Heap

  • 1015 :  Customer("Tina", Account(1, 100))

Do Now!

Tina wishes to deposit $50 to her account. Write a statement using t_cust to perform the deposit. Also show the changes that this makes to the dictionary-memory diagram.

The following code performs the update:

t_cust.acct.balance = t_cust.acct.balance + 50

Since the update is to a field of t_cust (rather than to the t_cust variable itself), we make the update in memory (specifically, to the balance field).

Directory

t_cust –> 1015

Heap

  • 1015 :  Customer("Tina", Account(1, 150))

Now, let’s try a more complicated situation. Maria and Jorge are new customers who want to share an account. Sharing means that either of them can make deposits or withdrawals and both will see any changes to their shared balance.

Do Now!

Draw a memory diagram with Customer data for each of Maria and Jorge and a shared Account with a balance of 250. Don’t write the code, just draw the diagram that the code should produce.

You should have ended up with something like the following (perhaps with a name for the Account, perhaps not (either works).

Directory

m_cust1 –> 1016 j_cust1 –> 1017

Heap

  • 1015 :  Account(id:2, balance:250)

  • 1016 :  Customer("Maria", (loc 1015))

  • 1017 :  Customer("Jorge", (loc 1015))

Now, we need to figure out the code to produce this diagram.

Do Now!

Here are four possible proposals for code to create the shared account. Which one(s) produce the desired memory diagram? You might find it useful to draw the memory diagrams for each approach and compare them to the one that we want.

Version 1

m_cust1 = Customer("Maria", Account(2, 250))
j_cust1 = Customer("Jorge", Account(2, 250))

Version 2

m_cust2 = Customer("Maria", Account(2, 250))
j_cust2 = Customer("Jorge", m_cust2.acct)

Version 3

new_acct = Account(2, 250)
m_cust3 = Customer("Maria", new_acct)
j_cust3 = Customer("Jorge", new_acct)

Version 4

init_bal = 250
m_cust4 = Customer("Maria", Account(2, init_bal))
j_cust4 = Customer("Jorge", Account(2, init_bal))

We want to end up with a single account in the heap. Versions 1 and 4 create two separate Account data, so we can rule them out. The difference between versions 2 and 3 lies in the directory: version 3 names the shared Account in the directory, while version 2 does not. Either way, the Account is accessible from the directory (by going through the m_cust and j_cust variables.

Do Now!

Let’s assume we set up the accounts with version 3. Jorge wants to make a deposit of 100. Which of the following lines of code are appropriate if Maria should be able to access the deposited funds?

1. new_acct = Account(2, new_acct.balance + 100)

2. new_acct.balance = new_acct.balance + 100

3. j_cust3.acct = Account(2, new_acct.balance + 100)

4. j_cust3.acct.balance = new_acct.balance + 100

5. j_cust3.acct.balance = j_cust3.acct.balance + 100

For the deposit to be visible to Maria, we have to modify the account that Maria shares with Jorge. We can’t make a new account, so lines 1 and 3 are not appropriate.

Whether lines 2 and 4 work depends on whether new_acct still refers to the memory location of Jorge’s (and Maria’s) account. If it does, both lines work (though the intent isn’t as clear with line 2, which doesn’t indicate that the goal is to update Jorge’s balance). But if new_acct now refers elsewhere, neither line works: line 2 puts the adjusted balance in the wrong account, while line 4 may compute the new balance from the wrong existing balance.

Line 5 works fine.

26.4 Circular References

At the start of this chapter, we said that we wanted each Account to also contain a reference to the Customer(s) who own the account. We deferred setting up this reference, but return to it now. Here are the desired dataclasses (to keep things simple, we’ll allow a Customer to have only one Account.

@dataclass
class Account:
    id: int
    balance: int
    owners: list # of Customer

@dataclass
class Customer:
    name: str
    acct: Account

Let’s create a new Customer with a new Account using these dataclasses:

new_acct = Account(5, 150, Customer("Elena", __________))

How do we fill in the blank in the Customer? We’d like to say new_acct but Python (and most other languages) will raise an error that new_acct isn’t defined. Why is that?

When given this assignment, Python first evaluates the right side, to get the value or memory location that should be stored in the directory for new_acct. If we filled in the blank with new_acct, Python would start by running:

Account(5, 150, Customer("Elena", new_acct))

To do this, it needs to look up new_acct in the directory, but that name isn’t in the directory yet (it only goes in after we compute the value to store for that name). Hence the error.

To get around this, we leverage the ability to update the contents of memory locations after names for data are in place. We’ll create the Account partially, but without filling in the Customer. Then we create the Customer to reference the new Account. Then we update the Account owners with the now-created Customer:

new_acct = Account(5, 150, [])  # note the empty Customer list
new_cust = Customer("Elena", new_acct)
new_acct.owners = [new_cust]

Note here that each part gets a spot in memory and an entry in the directory, but the datum hasn’t been finished yet. Once we have the datum set up in memory though, we can update the owners field to the correct value.

Here’s what this looks like at the level of memory and the directory after running the first two lines:

Directory

new_acct -> 1016 new_cust -> 1017

Heap

  • 1015 :  []

  • 1016 :  Account(5, 150, @1015)

  • 1017 :  Customer("Elena", @1016)

Then, when we run the third line, we create a new list containing new_cust and update the owner list within new_acct:

Directory

new_acct -> 1016 new_cust -> 1017

Heap

  • 1015 :  []

  • 1016 :  Account(5, 150, @1018)

  • 1017 :  Customer("Elena", @1016)

  • 1018 :  [@1017]

Notice that the two owners lists each live in memory but aren’t associated with names in the directory. They are only reachable going through new_acct, and after the update, the empty list isn’t reachable at all.

Do Now!

Why did a new list get created (at address 1018) rather than have new_cust get added to the list at address 1015?

We used new_acct.owners = [new_cust] to finish setting up the owners field, and the right-hand side of = creates a new list. If we had instead written new_acct.owners.append(new_cust), then 1017 would have gone into the list stored at 1015, as follows:

Directory

new_acct -> 1016 new_cust -> 1017

Heap

  • 1015 :  [@1017]

  • 1016 :  Account(5, 150, @1018)

  • 1017 :  Customer("Elena", @1016)

Either approach works. However, not all languages give you an operation like append that modifies the contents of an existing list (Pyret didn’t, at least not the portion that we saw). We therefore chose to show you the more general approach here.

26.4.1 Testing Circular Data

When you want to write a test involving circular data, you can’t write out the circular data manually. For example, imagine that we wanted to write out new_acct from the previous examples:

test("data test", new_acct,
     Account(5, 150, [Customer("Elena", Account(5, 150, ...)])

Because of the circularity, you can’t finish writing down the data. You have two options: write tests in terms of the names of data, or write tests on the components of the data.

Here’s an example that illustrates both. After setting up the account, we might want to check that the owner of the new account is the new customer:

test("new owner", new_acct.owner, new_cust)

Here, rather than write out the Customer explicitly, we use the name of the existing item in the directory. This doesn’t require you to write ellipses. We also focused on just the owner component, as a part of the Account value that we expected to change.

26.4.2 Revisiting Variables: A Function to Create Accounts for New Customers

What if we turned the sequence for creating dependencies between customers and their accounts into a function? We might get something like the following:

def create_acct(new_id: int, init_bal: int, cust_name: str) -> Account:
  new_acct = Account(new_id, init_bal, [])  # note the empty Customer list
  new_cust = Customer(cust_name, new_acct)
  new_acct.owners.append(new_cust)
  return new_acct

This looks useful, but it does have a flaw: we could accidentally create two accounts with the same id number. It would be better to maintain a variable containing the next unused account id, which would guaranteed that a specific id gets used only once.

Do Now!

How might you modify the code to do this? Which concepts seem relevant?

The next available id keeps changing as the program runs. This suggests that we need a variable in the directory that gets modified after each account is created. Here’s how we might set that up:

next_id = 1 # stores the next available id number

def create_acct(init_bal: int, cust_name: str) -> Account:
  global next_id
  new_acct = Account(next_id, init_bal, [])
  next_id = next_id + 1
  new_cust = Customer(cust_name, new_acct)
  new_acct.owners.append(new_cust)
  return new_acct

Here, we create the next_id variable to hold the next id number to use. When we create an Account, we update next_id to the next unused number. Note that we also took the new account’sid number out of the parameters for the function. Problem solved!

26.5 The Many Roles of Variables

At this point, we have used the single coding construct of a variable in the directory for multiple purposes. It’s worth stepping back and calling those out explicitly. In general, variables serve one of the following purposes:

  1. Tracking progress of a computation (e.g., the running value of a result in a for-loop

  2. Sharing data across functions (e.g., casting and counting votes)

  3. Maintaining information across multiple calls to a single function (e.g., the next-id variable)

  4. Naming a local or intermediate value in a computation

Each of these uses involves a different programming pattern. The first creates a variable locally within a function. The second two create top-level variables and require using global in functions that modify the contents. The third is different from the second, however, in that the third is only meant to be used by a single function. Ideally, there would be a way to not expose the variable to all functions in the third case. Indeed, many programming languages (including Pyret) make it easy to do that. This is harder to achieve with introductory-level concepts in Python, however. The fourth is more about local names rather than variables, in that our code never updates the value after the variable is created.

We call out these three roles precisely because they invoke different code patterns, despite using the same fine-grained concept (assigning a new value to a variable). When you look at a new programming problem, you can ask yourself whether the problem involves one of these purposes, and use that to guide your choice of pattern to use.

26.6 Managing All Accounts

So far, we have created individual customers and accounts and named them in the directory. This is fine for a small number of accounts but a realistic bank manages tens of thousands (if not millions) of accounts. Naming individual accounts doesn’t scale with that many accounts. Instead, a bank would maintain something like a list or table of all of the accounts, with functions to locate specific accounts (say by their ID numbers). For example:

all_accts = [Account(8, 100, []),
             Account(2, 300, []),
             Account(10, 225, []),
             Account(3, 200, []),
             Account(1, 120, []),
             ...
             ]

Do Now!

Modify create_acct to add the newly-created account to theall_accts list.

next_id = 1    # stores the next available id number
all_accts = [] # stores all of the accounts

def create_acct(init_bal: int, cust_name: str) -> Account:
  global next_id
  new_acct = Account(next_id, init_bal, [])
  all_accts.append(new_acct) # <-- this is the new line
  next_id = next_id + 1
  new_cust = Customer(cust_name, new_acct)
  new_acct.owners.append(new_cust)
  return new_acct

Exercise

Draw the memory diagram that results from the following code:

create_acct(100, "Tina")
create_acct(250, "Elena")

When someone asks for their balance, they usually provide their account number. This suggests that we need a function that takes an account id and returns the Account value with that id. From there, we can dig into the account to get the balance.

Exercise

Write a function find_acct that takes an id number and returns the Account (from all_accts) with that id number.

Here’s one solution:

def find_acct(which_id : int) -> Account:
    """returns the account with the given id number"""
    for acct in all_accts:
        if acct.id == which_id:
            return acct
    raise ValueError("no account has id " + str(which_id))

def test_find():
    test("match num", find_acct(3).id, 3)
    test("match exactly", find_acct(1).id, 1)
    testValueError("no account", lambda: find_acct(22))

This example shows how to report errors in Python. As in Pyret, there is a raise construct for reporting errors. In Python, however, rather than raise just a string, you raise a particular type of data called a ValueError that contains the string. (There are other kinds of errors too, but ValueError will suffice for what this book will do).

Exercise

Leverage find_acct to write the following two functions:

def deposit(which_id: int, amount: double) {}

def close(which_id: int) {}

As these exercises show, the find_acct function becomes a valuable helper for many other functions. This raises a question: how fast is find_acct? Here, we have to search through all of the accounts to locate the one with a specific id before doing the operation we actually care about. With millions of accounts, this seems like it could get expensive. We’ll explore this more deeply in the next chapter.

27 Hashtables and Dictionaries

    27.1 Searching by Criteria Other than Keys

    27.2 Dictionaries with More Complex Values

    27.3 Using Structured Data as Keys

At the end of Managing All Accounts, we had created a list of Account values and written a function to search through that list to find the account with a specific ID number. We noted that finding an individual account could require us to check every account in the list (as we check accounts one at a time in the for loop), which could start to get expensive as the bank scales to more customers.

Let’s step back however, and make two observations about this problem:

  1. Every account has a unique ID number

  2. We have a function that needs to retrieve accounts based on this unique ID number

In this case, we can use a different data structure, called a hashtable or dictionary (different languages use different terms for this same concept; Python uses “dictionary”). Dictionaries are designed for precisely such a scenario: we have a set of values (the accounts) that we wish to access using a piece of information (the ID number) that is unique to each value. The piece of information that we use for access is called the key. The key does not need to be a field in the data value, but it can be (as with accounts).

To set up a dictionary, we write a series of entries of the form

key: value

Here’s what a dictionary mapping ID numbers to accounts might look like:

accts_dict = {5: Account(5, 225, []),
              3: Account(3, 200, []),
              2: Account(2, 300, []),
              4: Account(4,  75, []),
              1: Account(1, 100, [])
             }

In terms of notation, we can create a dictionary by placing a collection of key: value pairs between curly braces, separated by commas. The dictionary shown here has five account IDs that map to accounts with those same ID numbers. There is no order within a dictionary, so it doesn’t matter which order we write the pairs within the sequence.

Now, if we want to get the account with ID 1 from the dictionary, we can simply write:

accts_dict[1]

This says "in the accts_dict dictionary, get the value associated with key 1. Evaluting this expression will produce Account(1, 100, []). If the key you request isn’t in the dictionary, Python will raise an error. However, you check whether a key is in the dictionary before trying to use it, as follows:

if 6 in accts_dict:
  accts_dict(6)

If we want to change the value associated with a key, we use an assignment statement of the following form:

dictionary[key] = new_value

To add a new key (and its value) to a dictionary, we use the same notation as for updating a value, just with an unused key:

accts_dict[6] = Account(6, 150, [])

Finally, to remove a key (and its value) from the dictionary, we write

del dictionary[key]

27.1 Searching by Criteria Other than Keys

What if we wanted to find all of the accounts with balances below 100? For this, we have to search through all of the key-value pairs and check their balances. This again sounds like we need a for loop. What does that look like on a dictionary though?

Turns out, it looks much like writing a for loop on a list (at least in Python). Here’s a program that creates a list of the accounts with balances below 100:

below_100 = []

# the room variable takes on each key in the dictionary
for ID in accts_dict:
    if accts_dict[ID].balance < 100:
        below_100.append(accts_dict[ID])

Here, the for-loop iterates over the keys. Within the loop, we use each key to retrieve its corresponding Account, perform the balance check on the Account, then put the Account in our running list if it meets our criterion.

Exercise

Create a dictionary that maps names of classrooms or meeting rooms to the numbers of seats that they have. Write expressions to:

  1. Look up how many seats are in a specific room

  2. Change the capacity of a specific room to have 10 more seats than it did initially

  3. Find all rooms that can seat at least 50 students

27.2 Dictionaries with More Complex Values

Do Now!

A track-and-field tournament needs to manage the names of the players on the individual teams that will be competing. For example, “Team Red” has “Shaoming” and “Lijin”, “Team Green” contains “Obi” and ”Chinara”, and “Team Blue” has “Mateo” and “Sophia”. Come up with a way to organize the data that will allow the organizers to easily access the names of the players on each team, keeping in mind that there could be many more teams than just the three listed here.

This feels like a dictionary situation, in that we have a meaningful key (the team name) with which we want to access values (the names of the players). However, we have already said that dictionaries allow only one value per key. Consider the following code:

players = {}
players["Team Red"] = "Shaoming"
players["Team Red"] = "Lijin"

Do Now!

What would be in the dictionary after running this code? If you aren’t sure, try it out!

How do we store multiple player names under the same key? The insight here is that the collection of players, not an individual player, is what we want to associate with the team name. We should therefore store a list of players under each key, as follows:

players = {}
players["Team Red"] = ["Shaoming", "Lijin"]
players["Team Green"] = ["Obi", "Chinara"]
players["Team Blue"] = ["Mateo", "Sophia"]

The values in a dictionary aren’t limited to being basic values. They can be arbitrarily complex, including lists, tables, or even other dictionaries (and more!). There is still only one value per key, which is the requirement of a dictionary.

27.3 Using Structured Data as Keys

Using structured data as keys is also possible, but it’s more subtle than using structured data as values. This has to do with how dictionaries work. Dictionaries depend on the keys being values that cannot be internally modified. What do we mean? An entire Account datum, for example, could not be used as a key because we want to be able to modify the balance, which is a field within the account.

A dataclass value can be used as a key as long as the fields will never be modified. For example, let’s return to the earlier example that mapped room names to their numbers of seats. Imagine that the office that maintains rooms wants to be able to access rooms by ranges of seats (such as 0-29, 30-49, and 50-100). In this case, they would want to make a dataclass for a range and use that as a key.

@dataclass
class Range:
    low: int
    high: str

{ Range(0,29): ["West Wing 200", "North Wing 110"],
  Range(30,49): ["North Wing 150", "South Wing 320"],
  ...
}

Since the fields of a range aren’t going to change, this is a reasonable datum to use as a key. To make Python accept this dictionary, however, we have to indicate that we will not be modifying the field values of Range. You do this with an additional piece on the dataclass annotation:

@dataclass(frozen=True)
class Range:
    low: int
    high: str

The frozen annotation says "the components of this class cannot be changed". If you try to assign to a component of a frozen class such as Flight, Python will produce an error. Tthere actually are ways to use values with changing components as keys, but they are more advanced than this book covers. If you want to read details for yourself, look up _hash_ methods in Python.