8.5

VII Programming with State

29 Modifying Structured Data

    29.1 Modifying Fields of Structured Data

    29.2 Modifications to Shared Data

    29.3 Understanding Memory

    29.4 Variables and Equality

    29.5 Basic Data in Memory

Many programs work with data that get updated over time: as users of common software applications, we mark email messages as read, add items to shopping lists, add entries to our calendars, and update the due dates of to-do items as our schedules change. We haven’t yet written programs that update data, but you now have the foundational skills you need to learn how to do this.

It turns out that there are a couple of different kinds of updates that we can make to data. Consider our to-do list example: we might want to add items to the list (or better still, take them out!), but we also need to update details of existing items (such as their descriptions or due dates). We’ll learn to update details of existing items first.

As a reminder, here is the dataclass for ToDoItem, along with examples of the data.

@dataclass
class ToDoItem:
  descr : str
  due : date
  tags : list

milk_item = ToDoItem("buy milk", date(2020, 7, 27), ["shopping", "home"])
grading_item = ToDoItem("grade hwk", date(2020, 7, 28), ["teaching"])
paper_item = ToDoItem("meet students", date(2020, 7, 26), ["research"])

todo_list = [grading_item, milk_item, paper_item]

29.1 Modifying Fields of Structured Data

In Python, we modify a field of a dataclass by using = to give a new value to the field within the value. For example:

milk_item.due = date(11, 15, 2020)
paper_item.descr = "meet Jack"

Once we do this, the original value is gone, replaced by the new value. We can see this by accessing the field (or by looking in the program-directory pane if your programming environment has that feature):

print(milk_item.due)

Do Now!

Check the value of todo_list. Does milk_item have the original date or the new date?

Since we modified the contents of milk_item, which was in the list, todo_list shows the new date. This is a big deal! Changes we make through one name in our directory can be visible through another name. This happens when we have set up relationships between data (such as putting a dataclass item in a list, while having names for both the item and the list). This subtlety is why we deferred data updates until now (and we’ll take this chapter and the next to work through the subtleties and consequences).

Let’s now write a function to modify the date in a ToDoItem.

def modify_duedate(forItem : ToDoItem, new_date : date):
    """change due date on given item to the new date"""
    forItem.due = new_date

What does this function return? Nothing, as it turns out. The purpose of this function isn’t to produce an answer, but rather to simply update a piece of data. You’ll note there is no return statement here. If you run the function, Python just gives you the prompt again, without showing a result.

How, then, do we test a procedure/function that doesn’t return anything? Our whole practice of testing until now has been based on calling functions and checking their expected answers. When a function updates data, we need to check that the update occurred properly. In other words, we check for the expected effect of calling the function. Here, the expected effect is that the due field of the provided ToDoItem will have the new date. We also expect that the other fields (descr and tags) have not changed. Here’s an example test function:

def test_modify_duedate():
    # SETUP
    item = ToDoItem("register", date(2020, 11, 9), ["school"])

    # PERFORM MODIFICATIONS
    modify_duedate(item, date(2020, 11, 15))

    # CHECK EFFECTS
    assert item.due == date(2020, 11, 15)
    assert item.descr == "register"
    assert item.tags == ["school"]

This test function is longer than ones we have written before. The strings within the function are serving as labels for the three steps we take when testing for updates: we set up or create our test data, we call the function to modify the data, then we check the expected effects from the modification. While these strings are not strictly necessary, we find the labels are a useful guide for both you and the person who will read your code.

29.2 Modifications to Shared Data

Imagine that you have a very important phone call coming up, so you made multiple to-do items to make sure you don’t lose track of it. By the time you made the third item, you were tired of typing, so you decided to reuse the definition of the second one, just with a new name:

call1 = ToDoItem("call Fred", date(2020, 11, 19), ["urgent"])
call2 = ToDoItem("call Fred", date(2020, 11, 19), ["urgent"])
call3 = call2

Do Now!

Draw the program directory for this code

Directory

  • call1

      

    ToDoItem("call Fred", date(2020, 11, 19), ["urgent"])

  • call2

      

    ToDoItem("call Fred", date(2020, 11, 19), ["urgent"])

  • call3

      

    ToDoItem("call Fred", date(2020, 11, 19), ["urgent"])

The directory suggests that all three calls are the same. We can confirm this by running the following commands:

call1 == call2

True

call2 == call3

True

Fred is unexpectedly out of the office, so now you need to call Tina instead. You decide to modify the call reminders to reflect this change.

Do Now!

Write a command to change the description within call3 to "call Tina" instead of "call Fred" (we’ll get to the other two in a moment).

Using what we just learned, we want the following:

call3.descr = "call Tina"

Let’s check whether call3 shows the new value:

call3

ToDoItem("call Tina", date(2020, 11, 19), ["urgent"])

This corrects call3. Presumably, call1 and call2 still indicate that we need to call Fred. Let’s check.

call1

ToDoItem("call Fred", date(2020, 11, 19), ["urgent"])

call2

ToDoItem("call Tina", date(2020, 11, 19), ["urgent"])

Wait! call2 has already been changed. Furthermore, the check we made earlier that call1 and call2 referred to the same values now returns False, even though we didn’t explicitly change any fields of either of them!

call1 == call2

False

If we again ask whether call2 and call3 have the same values (which they did before the update), we find that they still do:

call2 == call3

True

Perhaps this check is not surprising. We did, after all, create call3 based on call2 when we wrote call3 = call2. But we are still left with the unsettling observation that running a line of code (the update call3.descr = "call Tina") that had nothing to do with call1 or call2 changed the == relationship between call1 and call2. These observations suggest that equality is more subtle than it first appeared. Furthermore, our program directory isn’t currently sophisticated enough to show us that the relationship between call1 and call2 is different than that between call2 and call3. We’ll start with the program directory, then come back to equality.

29.3 Understanding Memory

Every time you use a constructor to create data, your programming environment stores in the memory of your computer. Memory consists of a (large) number of slots. Your newly-created datum goes into one of these slots. Each slot is labeled with an address. Just as a street address refers to a specific building, a memory address refers to a specific slot where a datum is stored. Memory slots are physical entities, not conceptual ones. A computer with a 500GB hard drive has 500 billion slots in which it can store data. Not all of that memory is available to your programming environment: your web browser, applications, operating system, etc all get stored in the memory. Your programming environment does get a portion of memory to use for storing its data. That portion is called the heap.

When you write a statement like

call1 = ToDoItem("call Fred", date(2020, 11, 19), ["urgent"])

your programming environment puts the new ToDoItem into a physical slot in the heap, then associates the address of that slot with the variable name in the directory. The name in the directory doesn’t map to the value itself, but rather to the address that holds the value. The address bridges between the physical storage location and the conceptual name you want to associate with the new datum. In other words, our directory really looks like:

Directory

  • call

      1001

Heap

  • 1001: ToDoItem("call Fred" ...)

Our revised version has two separate areas: the directory (mapping names to addresses) and the heap (showing the values stored in the addresses). The initial address 1001 is arbitrary: you can start from any address number. We will use four digit numbers to distinguish addresses from smaller values that we will use as data in our programs.

This revised directory-plus-heap information explains what we recently observed about the relationships among call1, call2, and call3 around the update to call3.descr. Let’s revisit the code by which we introduced these names into the directory. First, we made new two ToDoItem data values and associated them with call1 and call2:

call1 = ToDoItem("call Fred", date(2020, 11, 19), ["urgent"])
call2 = ToDoItem("call Fred", date(2020, 11, 19), ["urgent"])

After running these two statements, the directory and heap appear as follows:

Directory

  • call1

      1001

  • call2

      1002

Heap

  • 1001: ToDoItem("call Fred" ...)

  • 1002: ToDoItem("call Fred" ...)

What happens now when we evaluate call3 = call2? This statement evaluates just as it did before: we create a directory entry for call3 and associate it with the result of the right side of the =. The “result” of the computation here is the address where the value is stored. The value of call2 is in 1002, so we associate that same address with call3.

Directory

  • call1

      1001

  • call2

      1002

  • call3

      1002

Heap

  • 1001: ToDoItem("call Fred" ...)

  • 1002: ToDoItem("call Fred" ...)

You can hover over the location markers to see the connections between the directory and memory areas.

With the heap articulated separately from the directory, we now see the relationship between call3 and call2: they refer to the same address, which in turn means that they refer to the same value. call1, however, refers to a different address (that just happens to contain a value with the same field contents).

29.4 Variables and Equality

Given the existence of both the directory and a heao now lets us return to the question of what equality means. Considering our development of call1, call2, and call3, we have seen there are (at least) two notions of equality:

  • Two expressions (which include names) evaluate to the same address in the heap

  • Two expressions evaluate to values with the same types and with the same field contents, but may be in different addresses in the heap

The == operator that you learned in Pyret and we carried into Python checks the second condition. Two expressions can be == while referring to different addresses. If we want to know whether two names refer to the same address, we instead use an operation called is. Let’s revisit our ToDoItem creation one last time:

call1 = ToDoItem("call Fred", date(2020, 11, 19), ["urgent"])
call2 = ToDoItem("call Fred", date(2020, 11, 19), ["urgent"])
call3 = call2

call1 == call2

True

call2 == call3

True

call1 is call2

False

call2 is call3

True

Both notions of equality are useful in practice, and indeed most languages will give you two different equality operators: one for equality of address, and one for equality of (component) values. In Python, we use is for address equality and == for equality of values. If you did Re-Examining Equality, you saw this difference in Pyret as <=> versus ==. Pyret also offers a temporal based notion of equality. (If you did not do anything in Re-Examining Equality, you weren’t exposed to these equality variations in Pyret.)

As we go forward, you’ll get more practice with when to use each kind of equality. The == operator is more accepting, so it is usually the right default. If you actually need to know whether two expressions refer to the same address, you can switch to is.

29.5 Basic Data in Memory

We have shown how to place structured data in our new directory/heap structure, but what about basic data? For example, what if our original program to set up the call items also included the following two lines:

x = 4
y = 5

The corresponding directory/heap contents would be as follows:

Directory

  • call1

      1001

  • call2

      1002

  • call3

      1002

  • x

      

    4

  • y

      

    5

Heap

  • 1001: ToDoItem("call Fred" ...)

  • 1002: ToDoItem("call Tina" ...)

This shows that basic data live in the directory, not the heap. The whole point of structured data is that they have both their own identity and multiple components. The heap gives access to both concepts. Basic data can’t be broken down (by definition). As such, there is nothing lost by putting them only in the directory.

What about strings, however? We’ve referred to them as basic data until now, but doesn’t they have components, in terms of the sequence of characters that make up the string? Yes, that is technically accurate. However, we are treating strings as basic data because we aren’t using operations that modify that sequence of strings. This is a fine point, one that usually comes up in later CS courses. This book will leave strings in the directory, but if you are writing programs that modify the internal characters, put them in the heap instead.

30 Modifying Variables

    30.1 Modifying Variables in Memory

    30.2 Modifying Variables Associated with Lists

    30.3 Writing Functions that Modify Variables

      30.3.1 The global annotation

    30.4 Testing Functions that Modify global Variables

      30.4.1 The Internal Structure of a Test Function

      30.4.2 Takeaways on Testing Modifications

We’ve now seen two different forms of updates in programs: updates to fields of structured data in Modifying Structured Data, and updates to the values associated with names when computing over lists with for loops in Traversing Lists. At a quick glance, these two forms of update look similar:

call3.descr = "call Tina"
run_total = run_total + fst

Both use the = operator and compute a new value on the right side. The left sides, however, are subtly different: one is a field within a value, while the other is a name in the directory. This difference turns out to be significant, as we can see by working through how these two forms work at the level of the directory and the heap. We covered this for the call3.descr update in the last chapter. Now, let’s look at the run_total update.

30.1 Modifying Variables in Memory

Here again is our earlier code for summing the numbers in a list:

run_total = 0
for num in [5, 1, 7, 3]:
   run_total = run_total + num

The first line makes an entry in the directory:

Directory

  • run_total

      

    0

The for loop also sets up a directory entry, this time for the variable num used to refer to the list elements. Thus, the directory appears as:

Directory

  • run_total

      

    0

  • num

      

    5

Inside the for loop, we compute a new value for run_total. The use of = tells Python to modify the value of run_total in the directory.

Directory

  • run_total

      

    5

  • num

      

    5

This process continues: Python advances num to the next list element

Directory

  • run_total

      

    5

  • num

      

    1

then modifies the value of run_total

Directory

  • run_total

      

    6

  • num

      

    1

This process continues until all of the list elements have been processed.

What is the takeaway from this? There are two:

  1. The = construct associates a name with a value. If the name is not already in the directory, it gets added. If the name is already in the directory, the new value replaces the old value. The old value is no longer accessible.

  2. for loops also introduce a name into the directory, specifically the one the programmer chose to refer to the individual list elements. Python modifies the value of that name as part of processing the for loop.

The ability to modify the values associated with names probably doesn’t seem like a big deal just yet, but upcoming programs will show where this ability gets interesting.

Exercise

Draw the sequence of directory contents for the following program:

score = 0
score = score + 4
score = 10

Exercise

Draw the sequence of directory contents for the following program:

count_long = 0
for word in ["here", "are", "some", "words"]:
  if len(word) > 4:
    count_long = count_long + 1

In Modifying Structured Data, we saw how a statement of the form call3 = call2 resulted in a change to call3.descr also affecting the value of call2. Does this same effect occur if we update the value of a variable directly, rather than a field? Consider the following example:

y = 5
x = y

Do Now!

What do the directory and heap look like after running this code?

Since x and y are assigned basic values, there are no values in the heap:

Directory

  • y

      

    5

  • x

      

    5

Do Now!

If we now evaluate y = 3, does the value of x change?

It does not. The value associated with y in the directory changes, but there is no connection between x and y in the directory. The = command does associate the value mapped to y with x, but that is not the same as having the two variables track each other. Any time we update the value associated with a name (as opposed to a field), only the directory changes. The resulting directory therefore looks like:

Directory

  • y

      

    3

  • x

      

    5

That the value of x did not change is because of the form of the update: name = .... It has nothing to do with the fact that x and y were associated with basic data. For example, here is another way we might have updated our earlier program involving call3 to tell us to call "Tina".

call1 = ToDoItem("call Fred", date(2020, 11, 19), ["urgent"])
call2 = ToDoItem("call Fred", date(2020, 11, 19), ["urgent"])
call3 = call2
call3 = ToDoItem("call Tina", date(2020, 11, 19), ["urgent"])

Do Now!

Write out the directory and heap after running this program.

You should have come up with the following:

Directory

  • call1

      1001

  • call2

      1002

  • call3

      1003

Heap

  • 1001: ToDoItem("call Fred" ...)

  • 1002: ToDoItem("call Fred" ...)

  • 1003: ToDoItem("call Tina" ...)

With this version, updating the ToDoItem associated with call3 does not affect the descr field of call2. That updating via fields affects both variables while updating the variables directly does not is one of the most common confusions in introductory programming. Statements like var1 = var2 do not set up relationships that persist between variables over time. Moreover, these statements are not commutative, meaning that var1 = var2 and var2 = var1 do not make the same modifications to the directory. The variable on the left of = is reassigned to a new value in the directory, and that value comes from evaluating whatever is on the right. If the expression on the right of = is just a variable, then the value is retrieved from the directory and associated with the variable on the left.

Is there any way to get two variables to track each other’s values? No, but as we saw with our first (field-based) approach to updating call3, it is possible for two variables to reference the same structured datum, and thereby share modifications. We’ll see more examples of this as we go.

Ultimately, this book is trying to teach you how to work with various kinds of information that might show up in a real project. Up until now, we’ve focused on the structure of information (tables vs lists vs datatypes/dataclasses) and how to write programs to process those structures.

But now, we’re starting to consider an additional question: does any of our data need to be shared across different parts of the program, such that multiple parts of the program might need to change the data and see the changes? So far, we’ve seen how dataclasses work when they are shared. Can we also share lists? What does that look like? We’ll tackle those questions next.

Strategy: Rules for updating the directory and the heap

Summarizing, the rules for how the directory and memory update are as follows:

  • We add to memory when a data constructor is used

  • We update memory when a field of existing data is reassigned

  • We add to the directory when a name is used for the first time (this includes parameters and internal variables when a function is called)

  • We update the directory when a name that is already in the directory is reassigned to a different value)

30.2 Modifying Variables Associated with Lists

Let’s expand our study of updates yet again, this time looking at updating lists. We’ll start with a list of basic data: strings representing votes cast through a poll. We’ll need functions that let people cast votes, as well as functions to tally the votes cast for specific options and functions to determine which option had the most votes.

As a concrete example, let’s assume that students are responding to a poll about their favorite desserts. We’ll start by setting up an initial (empty) list of the votes that have been cast:

theVotes = []

This would create an entry in the directory:

Directory

  • theVotes

      

    []

Do Now!

Write a sequence of 2 commands: one to add "fruit" to theVotes and another to add "churros".

Your code should look something like:

theVotes = ["fruit"] + theVotes
theVotes = ["churros"] + theVotes

Do Now!

Draw the directory after both of these modifications have been made.

Our directory now contains a single list with all of our items:

Directory

  • theVotes

      

    ["churros", "fruit"]

Do Now!

Why are new items at the front of the list? What would need to change to put new votes at the end of the list instead?

Exercise

Write a function count_votes_for that takes the name of a dessert and returns the number of times that the given dessert appears in theVotes.

Exercise

Imagine that you were now asked to determine which option received the most votes. Develop two different task plans (Task Plans) that would perform this computation. You don’t have to write the code, just the plans.

30.3 Writing Functions that Modify Variables

Our two modification commands are (not surprisingly) the same except for the vote that we are casting. Seeing as we expect many votes to be cast, it would make sense to turn this repeated code into a function. As we did in Pyret, we create a parameter name for the information that differs across the commands, and make the remainder of the commands the body of the function. Here’s the proposed code:

def cast_vote(item: str):
    """add given item to the theVotes list"""
    theVotes = [item] + theVotes

If this function were the only code in your file, Python would not accept it. It would complain that the theVotes variable is unbound (which means it isn’t associated with a name). We need to have created an initial value of theVotes so that it exists in the directory. Earlier in this section, we started with an initial empty list of votes. We have to include that in our file.

theVotes = []

def cast_vote(item: str):
    """add given item to the theVotes list"""
    theVotes = [item] + theVotes

If we try to load this file, Python is still complaining, again that the variable theVotes is unbound. But we just defined it. What could be the problem here?

The problem is has to do with where the variable is defined. Here, we are trying to modify the variable inside a function, but we created the variable outside the function. When we wrote sum_list, the variable with the running total was created and modified within the same function. Here, the theVotes variable is defined outside the function. Why might this matter? The answer lies in the directory.

30.3.1 The global annotation

Whenever a function gets called, the programming language creates a new directory area with a space for local information. If we create an empty list of votes then call cast_vote("fruit"), the directory has the following contents just before we modify the list:

Directory

  • theVotes

      

    []

By default, Python restricts your program to modifying variables in the local directory. This is a good default, as it prevents function calls from interfering with each other. In sum_list, for example, we want to sum each list independently, without the results of summing one list affecting the sum reported for another.

Sometimes, however, we want data from one function call to affect later calls. That is, in fact, the whole point of having a variable where we maintain our current list of votes. We therefore have to tell Python that we want to modify the value of the theVotes variable that is outside of the local area. The Python keyword global achieves that:

theVotes = []

def cast_vote(item: str):
    """add given item to the theVotes list"""
    global theVotes
    theVotes = [item] + theVotes

The global annotation is used with the name of a variable that was defined outside the function that we want this function to modify. You can’t use this to modify the values of variables that were introduced in other functions, just ones that were defined at the same level as the functions in your file (the so-called “top” level).

30.4 Testing Functions that Modify global Variables

We developed cast_vote by making a function from two commands with shared content. We didn’t write tests for it though. Let’s add a testing function now.

Do Now!

Try to write a test case for cast_vote. Write down either a concrete test (in pytest form) or write down a question whose answer would help you figure out how to write a test.

Following the form of tests that we developed in Examples and Test Cases, we expect to write something like

def test_cast_vote():
  assert cast_vote("basbousa") == ...

What fills in the ...? As we saw when testing the function to update ToDoItem descriptions, we write tests that check the effects of update functions, not their returned results.

Do Now!

What effect do we expect cast_vote("basbousa") to have? It modifies theVotes, so refer to theVotes or "basbousa" in your answer.

One claim that directly follows from the problem statement would be "theVotes should contain "basbousa"". Can we turn this into code to form a test case? Yes, and here’s the code:

def test_cast_vote():
    cast_vote("basbousa")
    assert "basbousa" in theVotes

This example shows the essence of testing functions that modify variables. Rather than put the function call in the assert statement, we call the function first, then write an assert that uses the variable itself to check for the expected change.

Do Now!

Are there other expected changes that we should check? Can you come up with a cast_vote function that would pass this test but not do what we expected?

Imagine that someone wrote the following cast_vote function instead:

theVotes = []

def cast_vote(item: str):
    """add given item to the theVotes list"""
    global theVotes
    theVotes = [item]

This modifies theVotes to include the given item, but it accidentally throws away the rest of the list! Our current test_cast_vote would approve of this version, which suggests that additional tests are needed.

Do Now!

What other expectations do you have about theVotes after the function runs? Can you capture those as assert statements about the variable?

One expectation we might have is that theVotes would be 1 element longer after the call than it was before the call. Here’s how to augment test_cast_vote to include this assertion:

def test_cast_vote():
    pre_length = len(theVotes)
    cast_vote("basbousa")
    assert "basbousa" in theVotes
    assert len(theVotes) == pre_length + 1

The strategy here is to use a local variable to save the length of theVotes before calling cast_vote, then use that saved value to check the expected impact after the call.

Do Now!

Are there other impacts to consider? Could a program satisfy our current two tests and still not do what we expect?

If we start imagining worst-case scenarios, we could envision a malicious voting machine that doubles the impact of a vote by removing an old vote and replacing it with two copies of the newly cast vote. Here’s code that wouild do that (the list[1:] returns the rest of list. in general list[x:y] takes the sublist from index x up to but not including index y).

def cast_vote(item: str):
    """add given item to the theVotes list"""
    global theVotes
    theVotes = [item] + [item] + theVotes[1:]

Okay, yes, this could happen, but isn’t this sort of idea generation getting a little excessive? Yes and no. If you are writing test cases for a voting machine, you absolutely should be thinking about situations like this (because people trying to influence elections might try them). The ability to think about how adversaries might abuse a system is a valuable skill (and you can build a career around doing this well). But it does feel a bit open-ended, especially in a first programming course.

Fortunately, we have a much more systematic way to raise the robustness of our tests. Recall that we asked you to write a function called count_votes_for as an exercise. Since both cast_vote and count_votes_for work with the theVotes variable, we should check that the results of count_votes_for are what we would expect after cast_vote modifies the variable.

Do Now!

Modify test_cast_vote to also check that the results of count_votes_for change as expected.

Here’s a revised test function:

def test_cast_vote():
    pre_length = len(theVotes)
    pre_count = count_votes_for("basbousa")
    cast_vote("basbousa")
    assert "basbousa" in cast_vote
    assert len(theVotes) == pre_length + 1
    assert count_votes_for("basbousa") == pre_count + 1

To be careful, we also should test that the vote count has not changed for a dessert other than "basbousa".

Do Now!

Add a test for a dessert other than "basbousa".

One challenge here is that we don’t know what other votes have been cast. From the perspective of this test, only one vote gets cast, so a test for any other dessert might only check desserts that aren’t in the list.

A better test function would set theVotes to have specific contents, to give us more data to work with.

def test_cast_vote():
    theVotes = ["fruit", "churros", "fruit", "basbousa"]
    pre_length = len(theVotes)
    pre_count = count_votes_for("basbousa")
    pre_fruit = count_votes_for("fruit")
    cast_vote("basbousa")
    assert "basbousa" in cast_vote
    assert len(theVotes) == pre_length + 1
    assert count_votes_for("basbousa") == pre_count + 1
    assert count_votes_for("fruit") == pre_fruit

Do Now!

This code has fixed the content of theVotes. This suggests that we no longer need to save information about the results of functions before calling cast_vote. We could edit the assert statements to use manually-computed values (such as 3 in place of pre_length). Is this a good idea?

You certainly could remove the pre_ definitions and just use concrete numbers in the assert statements. As you are getting started, you may prefer to do that. However, the benefit to the current setup is that we could modify the initial value of theVotes (if we think of another important case to check) without then having to modify all of the assert statements. The current setup also lets someone else read your assert statements and clearly understand the effect being tested: having the right side of == say pre_count + 1 is more informative than a concrete number like 2. Remember: examples and code are meant to be read by other people!

30.4.1 The Internal Structure of a Test Function

This testing function seems to be getting a bit long. If we look closely, however, we’ll see it has some structure to it, which we’ll highlight by adding some blank lines and labeling strings between key testing tasks. (Remember that Python uses strings as a commenting mechanism.) These are the same labels that we added to our earlier example of a test function for updating descriptions.

def test_cast_vote():
    # SETUP
    theVotes = ["fruit", "churros", "fruit", "basbousa"]

    # SAVE CURRENT VALUES
    pre_length = len(theVotes)
    pre_count = count_votes_for("basbousa")
    pre_fruit = count_votes_for("fruit")

    # PERFORM MODIFICATIONS
    cast_vote("basbousa")

    # CHECK EFFECTS
    assert "basbousa" in cast_vote
    assert len(theVotes) == pre_length + 1
    assert count_votes_for("basbousa") == pre_count + 1
    assert count_votes_for("fruit") == pre_fruit

The first section sets up our data. The second saves any current values that are needed for the tests. The third makes the function calls that we want to test. The last checks the effects with the assert statements. Viewed this way, the long function seems more manageable. The labels also remind us of the key tasks that go into testing functions that modify variables.

30.4.2 Takeaways on Testing Modifications

What are the takeaways from all of this?

Does it feel like testing has gotten more complicated now that variables are being modified? It has. When there are no variables that are shared and modified across functions, we can test each function individually. But once one function modifies the value of a variable that other functions also use, we have to make sure that the modifying function isn’t breaking the expected behavior of the others. Variables are powerful, but that power comes with responsibility.

Strategy: Testing a Function that Modifies Data

  1. In prose, write down the effects that you expect the modification to have, both on traits of the variable’s value and on the results of other functions that use the variable. Remember to consider situations that should not be effected by the modification.

  2. Create a test-function skeleton with areas for setup, saving current values, calling the function, and checking the effects.

  3. In the setup section, set the value of the variable to a concrete value for testing.

  4. In the calling-the-function section, write the function call that you want to test.

  5. Turn each of the effects that you want to check into an assert statement, naming and storing computations of any values from before the test was run in the saving-current-values section.

31 Revisiting Lists and Variables

    31.1 Sharing List Updates

      31.1.1 Operations that Mutate Lists

    31.2 Lists in Memory

    31.3 Practice: Data for Shared Bank Accounts

    31.4 Circular References

      31.4.1 Testing Circular Data

      31.4.2 Revisiting Variables: A Function to Create Accounts for New Customers

    31.5 The Many Roles of Variables

    31.6 Managing All Accounts

31.1 Sharing List Updates

In Modifying Structured Data, we saw that we can have two names that refer to the same dataclass value in memory. In this way, changes made to one are visible in the other. Put differently, we can have two names that refer to the same datum. This same ability could also be useful for lists. For example, you and a roommate might share a shopping list, the registrar and the financial aid office might share a list of students, and so on. In both examples, an list update made by one person or office should be visible to the other. How do we implement this in Python?

More concretely, assume that Shaunae and Jonella are roommates who take turns shopping for groceries. Each of them has access to a shared shopping list (through their phones): any time one of them adds or removes an item, the other person should see the change. How could we mock this up in code?

Do Now!

Create a shopping list for Shaunae with a couple of items, then give the same list a second directory name corresponding to Jonella.

You might have written something like the following:

shaunae_list = ["bread", "coffee"]
jonella_list = shaunae_list

If you load this code at the prompt and look at both lists, you’ll see they have the same values.

Do Now!

Jonella realizes they are also out of eggs. Run an command at the prompt to add "eggs" to Jonella’s list, then make sure "eggs" appears under both names.

Based on what we learned about adding to lists in Using For Loops in Functions that Produce Lists, perhaps you wrote something like

>>> jonella_list = ["eggs"] + jonella_list
>>> jonella_list
["eggs", "bread", "coffee"]
>>> shaunae_list
["bread", "coffee"]

What happened? Only Jonella’s list changed. When we use + to combine two lists, Python creates a new list out of the elements of the existing lists. Since the left side of the = is a name in the directory, Python modifies the directory to refer to this new list. It does not maintain the relationship between jonella_list and shaunae_list.

Surely sharing lists is a common enough pattern that there is a way to do this, however. There is, but it requires us to work with lists in a different way than we first saw in Processing Lists.

31.1.1 Operations that Mutate Lists

So far, we have used + to append two lists together. This operation creates a new list. Python offers a different operation (called append) for modifying an existing list to include a new element. Here’s a different way we could have added "eggs" to Jonella’s list.

>>> jonella_list.append("eggs")
>>> jonella_list
["bread", "coffee", "eggs"]
>>> shaunae_list
["bread", "coffee", "eggs"]

In this version, Python finds the address of jonella_list in the directory, follows the . over to the list datum in the heap, then operates on the list itself to add "eggs". Unlike in the previous version, we don’t use = to assign a new value to jonella_list. The append updates the list in the heap. Since jonella_list and shaunae_list refer to the same address in memory, the addition via jonella_list also shows up when we access the list through the name shaunae_list.

Similarly, if Shaunae buys "coffee", she can remove it from the shared list using the remove operator:

>>> shaunae_list.remove("coffee")
>>> jonella_list
["bread", "eggs"]
>>> shaunae_list
["bread", "eggs"]

The point here is that both append and remove modify the contents of the list in the heap. If two names refer to the same heap address, changes made through one name appear through the other.

Does this mean we always want to modify lists using append and remove? Not at all! Sometimes, we will want to build new lists rather than modifying old ones (we’ll see some cases of this later). If you want a modification to be visible through all names that refer to a list, use append. If you only want the addition of an element to be visible through one name, use +.

31.2 Lists in Memory

In Modifying Structured Data, we drew directory-heap diagrams to show how dataclass values appear in the heap. What do lists look like in the heap? In Python, lists are stored by putting a dataclass-like datum into memory that contains the number of elements in the list, followed by individual elements in the subsequent addresses. For example, here’s what Shaunae and Jonella’s shopping lists look like after adding "eggs":

Directory

  • shaunae_list

      1001

  • jonella_list

      1001

Heap

  • 1001: List(len:3)

  • 1002: "bread"

  • 1003: "coffee"

  • 1004: "eggs"

Instructors: Python’s default list implementation is an array-based list, not a linked list. A section on linked lists will be added in a later release of the book.

There are no directory entries to the list elements, but the fact that the List datum reflects size 3 tells us that the next three addresses are part of the list.

Do Now!

Work out the memory diagram for the following program:

scores = [89,72,92]
colors = ["blue", "brown"]
scores.append(83)

Does this raise any questions?

After defining the first two lists, your memory diagram should have looked like (perhaps with a different starting address):

Directory

  • scores

      1005

  • colors

      1009

Heap

  • 1005: List(len:3)

  • 1006: 89

  • 1007: 72

  • 1008: 92

  • 1009: List(2)

  • 1010: "blue"

  • 1011: "brown"

What now happens when we append 83 to the scores list? Since lists are supposed to have all of their elements in consecutive memory slots, 83 should go into @1009, but that address is already occupied! We shouldn’t destroy the colors list just because we added to the scores list.

In practice, lists are created with some extra space to allow for later additions. Here’s a more accurate diagram:

Directory

  • scores

      1005

  • colors

      1014

Heap

  • 1005: List(len:3, slots:8)

  • 1006: 89

  • 1007: 72

  • 1008: 92

  • 1009: None

  • 1010: None

  • 1011: None

  • 1012: None

  • 1013: None

  • 1014: List(len:2, slots:8)

  • 1015: "blue"

  • 1016: "brown"

  • 1017: None

  • 1018: None

  • 1019: None

  • 1020: None

  • 1021: None

  • 1022: None

The None value is what Python uses to indicate "there’s nothing here". Note that the List datum has also changed: it how has two fields, one for the number of elements, and another for the number of slots. We also modified the directory to reflect the new address for the scores list.

There’s now room to perform the append on the scores list. The 83 would get inserted into @1009 and the len field of the List would get updated to 4.

Couldn’t a list still run out of room though? Absolutely. When that happens, the whole list gets moved to a new sequence of addresses with more room. The details of this are a more advanced topic. For now, the key takeaway for you is that lists are arranged in consecutive memory addresses. This layout will be particularly relevant when we get to Hashtables and Dictionaries.

31.3 Practice: Data for Shared Bank Accounts

Let’s apply what we’ve been learning about sharing modifications to a real-world problem: having bank accounts shared by multiple people (such as partners or spouses). A bank wants to allow multiple customers to manage the same account (meaning each person deposits to or withdraws from the same shared balance). Our task is to define data structures for managing bank customers and their (shared) accounts.

How might you set up the data about customers and their accounts? To keep things simple, we’ll limit the information about customers to their names (leaving out phone numbers, etc), and information about accounts to their balances (leaving out interest rates, etc).

In a full banking system, we might want to allow customers to be part of multiple accounts as well. This would suggest the following data structure:

@dataclass
class Account:
    id : int
    balance : int
    owners : lst  # of Accounts

@dataclass
class Customer:
    name : str
    accts : lst  # of Accounts

You can have these kinds of circular connections, in which customers connect to accounts which connect to customers. In the interest of letting us focus on basic sharing and memory, however, let’s simplify our data as follows: accounts don’t track who their customers are, and each customer has only one account:

@dataclass
class Account:
    id : int
    balance : int

@dataclass
class Customer:
    name : str
    acct : Account

Do Now!

We could consider making the acct field hold an ID number, rather than an Account datum? Should we do this? Why or why not?

This question revisits a point that we made when linking people into ancestry trees rather than store ancestors through their names: direct references to other values, rather than going through names or id numbers, speeds up our access to related data.

Do Now!

Write an expression that creates a new Customer named Tina with a new Account. Give the new account ID number 1 and initial balance of 100.

Do Now!

Assume we gave the name t_cust to the Customer datum created in the previous exercise. Draw the memory diagram that contains t_cust and the new data values.

In code, you should have written:

t_cust = Customer("Tina", Account(1, 100))

The corresponding directory-memory diagram would be:

Directory

  • t_cust

      1015

Heap

  • 1015: Customer("Tina", Account(1, 100))

Do Now!

Tina wishes to deposit $50 to her account. Write a statement using t_cust to perform the deposit. Also show the changes that this makes to the dictionary-memory diagram.

The following code performs the update:

t_cust.acct.balance = t_cust.acct.balance + 50

Since the update is to a field of t_cust (rather than to the t_cust variable itself), we make the update in memory (specifically, to the balance field).

Directory

  • t_cust

      1015

Heap

  • 1015: Customer("Tina", Account(1, 150))

Now, let’s try a more complicated situation. Maria and Jorge are new customers who want to share an account. Sharing means that either of them can make deposits or withdrawals and both will see any changes to their shared balance.

Do Now!

Draw a memory diagram with Customer data for each of Maria and Jorge and a shared Account with a balance of 250. Don’t write the code, just draw the diagram that the code should produce.

You should have ended up with something like the following (perhaps with a name for the Account, perhaps not (either works).

Directory

  • m_cust1

      1016

  • j_cust1

      1017

Heap

  • 1015: Account(id:2, balance:250)

  • 1016: Customer("Maria", 1015))

  • 1017: Customer("Jorge", 1015))

Now, we need to figure out the code to produce this diagram.

Do Now!

Here are four possible proposals for code to create the shared account. Which one(s) produce the desired memory diagram? You might find it useful to draw the memory diagrams for each approach and compare them to the one that we want.

Version 1

m_cust1 = Customer("Maria", Account(2, 250))
j_cust1 = Customer("Jorge", Account(2, 250))

Version 2

m_cust2 = Customer("Maria", Account(2, 250))
j_cust2 = Customer("Jorge", m_cust2.acct)

Version 3

new_acct = Account(2, 250)
m_cust3 = Customer("Maria", new_acct)
j_cust3 = Customer("Jorge", new_acct)

Version 4

init_bal = 250
m_cust4 = Customer("Maria", Account(2, init_bal))
j_cust4 = Customer("Jorge", Account(2, init_bal))

We want to end up with a single account in the heap. Versions 1 and 4 create two separate Account data, so we can rule them out. The difference between versions 2 and 3 lies in the directory: version 3 names the shared Account in the directory, while version 2 does not. Either way, the Account is accessible from the directory (by going through the m_cust and j_cust variables.

Do Now!

Let’s assume we set up the accounts with version 3. Jorge wants to make a deposit of 100. Which of the following lines of code are appropriate if Maria should be able to access the deposited funds?

1. new_acct = Account(2, new_acct.balance + 100)

2. new_acct.balance = new_acct.balance + 100

3. j_cust3.acct = Account(2, new_acct.balance + 100)

4. j_cust3.acct.balance = new_acct.balance + 100

5. j_cust3.acct.balance = j_cust3.acct.balance + 100

For the deposit to be visible to Maria, we have to modify the account that Maria shares with Jorge. We can’t make a new account, so lines 1 and 3 are not appropriate.

Whether lines 2 and 4 work depends on whether new_acct still refers to the memory location of Jorge’s (and Maria’s) account. If it does, both lines work (though the intent isn’t as clear with line 2, which doesn’t indicate that the goal is to update Jorge’s balance). But if new_acct now refers elsewhere, neither line works: line 2 puts the adjusted balance in the wrong account, while line 4 may compute the new balance from the wrong existing balance.

Line 5 works fine.

31.4 Circular References

At the start of this chapter, we said that we wanted each Account to also contain a reference to the Customer(s) who own the account. We deferred setting up this reference, but return to it now. Here are the desired dataclasses (to keep things simple, we’ll allow a Customer to have only one Account.

@dataclass
class Account:
    id: int
    balance: int
    owners: list # of Customer

@dataclass
class Customer:
    name: str
    acct: Account

Let’s create a new Customer with a new Account using these dataclasses:

new_acct = Account(5, 150, Customer("Elena", __________))

How do we fill in the blank in the Customer? We’d like to say new_acct but Python (and most other languages) will raise an error that new_acct isn’t defined. Why is that?

When given this assignment, Python first evaluates the right side, to get the value or memory location that should be stored in the directory for new_acct. If we filled in the blank with new_acct, Python would start by running:

Account(5, 150, Customer("Elena", new_acct))

To do this, it needs to look up new_acct in the directory, but that name isn’t in the directory yet (it only goes in after we compute the value to store for that name). Hence the error.

To get around this, we leverage the ability to update the contents of memory locations after names for data are in place. We’ll create the Account partially, but without filling in the Customer. Then we create the Customer to reference the new Account. Then we update the Account owners with the now-created Customer:

new_acct = Account(5, 150, [])  # note the empty Customer list
new_cust = Customer("Elena", new_acct)
new_acct.owners = [new_cust]

Note here that each part gets a spot in memory and an entry in the directory, but the datum hasn’t been finished yet. Once we have the datum set up in memory though, we can update the owners field to the correct value.

Here’s what this looks like at the level of memory and the directory after running the first two lines:

Directory

  • new_acct

      1016

  • new_cust

      1017

Heap

  • 1015: []

  • 1016: Account(5, 150, 1015)

  • 1017: Customer("Elena", 1016)

Then, when we run the third line, we create a new list containing new_cust and update the owner list within new_acct:

Directory

  • new_acct

      1016

  • new_cust

      1017

Heap

  • 1015: []

  • 1016: Account(5, 150, 1018)

  • 1017: Customer("Elena", 1016)

  • 1018: [1017]

Notice that the two owners lists each live in memory but aren’t associated with names in the directory. They are only reachable going through new_acct, and after the update, the empty list isn’t reachable at all.

Do Now!

Why did a new list get created (at address 1018) rather than have new_cust get added to the list at address 1015?

We used new_acct.owners = [new_cust] to finish setting up the owners field, and the right-hand side of = creates a new list. If we had instead written new_acct.owners.append(new_cust), then 1017 would have gone into the list stored at 1015, as follows:

Directory

  • new_acct

      1016

  • new_cust

      1017

Heap

  • 1015: [1017]

  • 1016: Account(5, 150, 1017)

  • 1017: Customer("Elena", 1016)

Either approach works. However, not all languages give you an operation like append that modifies the contents of an existing list (Pyret didn’t, at least not the portion that we saw). We therefore chose to show you the more general approach here.

31.4.1 Testing Circular Data

When you want to write a test involving circular data, you can’t write out the circular data manually. For example, imagine that we wanted to write out new_acct from the previous examples:

test("data test", new_acct,
     Account(5, 150, [Customer("Elena", Account(5, 150, ...)])

Because of the circularity, you can’t finish writing down the data. You have two options: write tests in terms of the names of data, or write tests on the components of the data.

Here’s an example that illustrates both. After setting up the account, we might want to check that the owner of the new account is the new customer:

test("new owner", new_acct.owner, new_cust)

Here, rather than write out the Customer explicitly, we use the name of the existing item in the directory. This doesn’t require you to write ellipses. We also focused on just the owner component, as a part of the Account value that we expected to change.

31.4.2 Revisiting Variables: A Function to Create Accounts for New Customers

What if we turned the sequence for creating dependencies between customers and their accounts into a function? We might get something like the following:

def create_acct(new_id: int, init_bal: int, cust_name: str) -> Account:
  new_acct = Account(new_id, init_bal, [])  # note the empty Customer list
  new_cust = Customer(cust_name, new_acct)
  new_acct.owners.append(new_cust)
  return new_acct

This looks useful, but it does have a flaw: we could accidentally create two accounts with the same id number. It would be better to maintain a variable containing the next unused account id, which would guaranteed that a specific id gets used only once.

Do Now!

How might you modify the code to do this? Which concepts seem relevant?

The next available id keeps changing as the program runs. This suggests that we need a variable in the directory that gets modified after each account is created. Here’s how we might set that up:

next_id = 1 # stores the next available id number

def create_acct(init_bal: int, cust_name: str) -> Account:
  global next_id
  new_acct = Account(next_id, init_bal, [])
  next_id = next_id + 1
  new_cust = Customer(cust_name, new_acct)
  new_acct.owners.append(new_cust)
  return new_acct

Here, we create the next_id variable to hold the next id number to use. When we create an Account, we update next_id to the next unused number. Note that we also took the new account’sid number out of the parameters for the function. Problem solved!

31.5 The Many Roles of Variables

At this point, we have used the single coding construct of a variable in the directory for multiple purposes. It’s worth stepping back and calling those out explicitly. In general, variables serve one of the following purposes:

  1. Tracking progress of a computation (e.g., the running value of a result in a for-loop)

  2. Sharing data across functions (e.g., casting and counting votes)

  3. Maintaining information across multiple calls to a single function (e.g., the next-id variable)

  4. Naming a local or intermediate value in a computation

Each of these uses involves a different programming pattern. The first creates a variable locally within a function. The second two create top-level variables and require using global in functions that modify the contents. The third is different from the second, however, in that the third is only meant to be used by a single function. Ideally, there would be a way to not expose the variable to all functions in the third case. Indeed, many programming languages (including Pyret) make it easy to do that. This is harder to achieve with introductory-level concepts in Python, however. The fourth is more about local names rather than variables, in that our code never updates the value after the variable is created.

We call out these three roles precisely because they invoke different code patterns, despite using the same fine-grained concept (assigning a new value to a variable). When you look at a new programming problem, you can ask yourself whether the problem involves one of these purposes, and use that to guide your choice of pattern to use.

31.6 Managing All Accounts

So far, we have created individual customers and accounts and named them in the directory. This is fine for a small number of accounts but a realistic bank manages tens of thousands (if not millions) of accounts. Naming individual accounts doesn’t scale with that many accounts. Instead, a bank would maintain something like a list or table of all of the accounts, with functions to locate specific accounts (say by their ID numbers). For example:

all_accts = [Account(8, 100, []),
             Account(2, 300, []),
             Account(10, 225, []),
             Account(3, 200, []),
             Account(1, 120, []),
             ...
             ]

Do Now!

Modify create_acct to add the newly-created account to theall_accts list.

next_id = 1    # stores the next available id number
all_accts = [] # stores all of the accounts

def create_acct(init_bal: int, cust_name: str) -> Account:
  global next_id
  new_acct = Account(next_id, init_bal, [])
  all_accts.append(new_acct) # <-- this is the new line
  next_id = next_id + 1
  new_cust = Customer(cust_name, new_acct)
  new_acct.owners.append(new_cust)
  return new_acct

Exercise

Draw the memory diagram that results from the following code:

create_acct(100, "Tina")
create_acct(250, "Elena")

When someone asks for their balance, they usually provide their account number. This suggests that we need a function that takes an account id and returns the Account value with that id. From there, we can dig into the account to get the balance.

Exercise

Write a function find_acct that takes an id number and returns the Account (from all_accts) with that id number.

Here’s one solution:

def find_acct(which_id : int) -> Account:
    """returns the account with the given id number"""
    for acct in all_accts:
        if acct.id == which_id:
            return acct
    raise ValueError("no account has id " + str(which_id))

def test_find():
    test("match num", find_acct(3).id, 3)
    test("match exactly", find_acct(1).id, 1)
    testValueError("no account", lambda: find_acct(22))

This example shows how to report errors in Python. As in Pyret, there is a raise construct for reporting errors. In Python, however, rather than raise just a string, you raise a particular type of data called a ValueError that contains the string. (There are other kinds of errors too, but ValueError will suffice for what this book will do).

Exercise

Leverage find_acct to write the following two functions:

def deposit(which_id: int, amount: double) {}

def close(which_id: int) {}

As these exercises show, the find_acct function becomes a valuable helper for many other functions. This raises a question: how fast is find_acct? Here, we have to search through all of the accounts to locate the one with a specific id before doing the operation we actually care about. With millions of accounts, this seems like it could get expensive. We’ll explore this more deeply in the next chapter.

32 Hashtables and Dictionaries

    32.1 Searching by Criteria Other than Keys

    32.2 Dictionaries with More Complex Values

    32.3 Using Structured Data as Keys

At the end of Managing All Accounts, we had created a list of Account values and written a function to search through that list to find the account with a specific ID number. We noted that finding an individual account could require us to check every account in the list (as we check accounts one at a time in the for loop), which could start to get expensive as the bank scales to more customers.

Let’s step back however, and make two observations about this problem:

  1. Every account has a unique ID number

  2. We have a function that needs to retrieve accounts based on this unique ID number

In this case, we can use a different data structure, called a hashtable or dictionary (different languages use different terms for this same concept; Python uses “dictionary”). Dictionaries are designed for precisely such a scenario: we have a set of values (the accounts) that we wish to access using a piece of information (the ID number) that is unique to each value. The piece of information that we use for access is called the key. The key does not need to be a field in the data value, but it can be (as with accounts).

To set up a dictionary, we write a series of entries of the form

key: value

Here’s what a dictionary mapping ID numbers to accounts might look like:

accts_dict = {5: Account(5, 225, []),
              3: Account(3, 200, []),
              2: Account(2, 300, []),
              4: Account(4,  75, []),
              1: Account(1, 100, [])
             }

In terms of notation, we can create a dictionary by placing a collection of key: value pairs between curly braces, separated by commas. The dictionary shown here has five account IDs that map to accounts with those same ID numbers. There is no order within a dictionary, so it doesn’t matter which order we write the pairs within the sequence.

Now, if we want to get the account with ID 1 from the dictionary, we can simply write:

accts_dict[1]

This says "in the accts_dict dictionary, get the value associated with key 1. Evaluting this expression will produce Account(1, 100, []). If the key you request isn’t in the dictionary, Python will raise an error. However, you check whether a key is in the dictionary before trying to use it, as follows:

if 6 in accts_dict:
  accts_dict(6)

If we want to change the value associated with a key, we use an assignment statement of the following form:

dictionary[key] = new_value

To add a new key (and its value) to a dictionary, we use the same notation as for updating a value, just with an unused key:

accts_dict[6] = Account(6, 150, [])

Finally, to remove a key (and its value) from the dictionary, we write

del dictionary[key]

32.1 Searching by Criteria Other than Keys

What if we wanted to find all of the accounts with balances below 100? For this, we have to search through all of the key-value pairs and check their balances. This again sounds like we need a for loop. What does that look like on a dictionary though?

Turns out, it looks much like writing a for loop on a list (at least in Python). Here’s a program that creates a list of the accounts with balances below 100:

below_100 = []

# the room variable takes on each key in the dictionary
for ID in accts_dict:
    if accts_dict[ID].balance < 100:
        below_100.append(accts_dict[ID])

Here, the for-loop iterates over the keys. Within the loop, we use each key to retrieve its corresponding Account, perform the balance check on the Account, then put the Account in our running list if it meets our criterion.

Exercise

Create a dictionary that maps names of classrooms or meeting rooms to the numbers of seats that they have. Write expressions to:

  1. Look up how many seats are in a specific room

  2. Change the capacity of a specific room to have 10 more seats than it did initially

  3. Find all rooms that can seat at least 50 students

32.2 Dictionaries with More Complex Values

Do Now!

A track-and-field tournament needs to manage the names of the players on the individual teams that will be competing. For example, “Team Red” has “Shaoming” and “Lijin”, “Team Green” contains “Obi” and ”Chinara”, and “Team Blue” has “Mateo” and “Sophia”. Come up with a way to organize the data that will allow the organizers to easily access the names of the players on each team, keeping in mind that there could be many more teams than just the three listed here.

This feels like a dictionary situation, in that we have a meaningful key (the team name) with which we want to access values (the names of the players). However, we have already said that dictionaries allow only one value per key. Consider the following code:

players = {}
players["Team Red"] = "Shaoming"
players["Team Red"] = "Lijin"

Do Now!

What would be in the dictionary after running this code? If you aren’t sure, try it out!

How do we store multiple player names under the same key? The insight here is that the collection of players, not an individual player, is what we want to associate with the team name. We should therefore store a list of players under each key, as follows:

players = {}
players["Team Red"] = ["Shaoming", "Lijin"]
players["Team Green"] = ["Obi", "Chinara"]
players["Team Blue"] = ["Mateo", "Sophia"]

The values in a dictionary aren’t limited to being basic values. They can be arbitrarily complex, including lists, tables, or even other dictionaries (and more!). There is still only one value per key, which is the requirement of a dictionary.

32.3 Using Structured Data as Keys

Using structured data as keys is also possible, but it’s more subtle than using structured data as values. This has to do with how dictionaries work. Dictionaries depend on the keys being values that cannot be internally modified. What do we mean? An entire Account datum, for example, could not be used as a key because we want to be able to modify the balance, which is a field within the account.

A dataclass value can be used as a key as long as the fields will never be modified. For example, let’s return to the earlier example that mapped room names to their numbers of seats. Imagine that the office that maintains rooms wants to be able to access rooms by ranges of seats (such as 0-29, 30-49, and 50-100). In this case, they would want to make a dataclass for a range and use that as a key.

@dataclass
class Range:
    low: int
    high: str

{ Range(0,29): ["West Wing 200", "North Wing 110"],
  Range(30,49): ["North Wing 150", "South Wing 320"],
  ...
}

Since the fields of a range aren’t going to change, this is a reasonable datum to use as a key. To make Python accept this dictionary, however, we have to indicate that we will not be modifying the field values of Range. You do this with an additional piece on the dataclass annotation:

@dataclass(frozen=True)
class Range:
    low: int
    high: str

The frozen annotation says "the components of this class cannot be changed". If you try to assign to a component of a frozen class such as Flight, Python will produce an error. There actually are ways to use values with changing components as keys, but they are more advanced than this book covers. If you want to read details for yourself, look up __hash__ methods in Python.