### 14Trees

#### 14.1Data Design Problem – Ancestry Data

Imagine that we wanted to manage ancestry information for purposes of a medical research study. Specifically, we want to record people’s birthyear, eye colors, and genetic parents. Here’s a sample table of such data, with one row for each person:

ancestors = table: name, birthyear, eyecolor, female-parent, male-parent
row: "Anna", 1997, "blue", "Susan", "Charlie"
row: "Susan", 1971, "blue", "Ellen", "Bill"
row: "Charlie", 1972, "green", "", ""
row: "Ellen", 1945, "brown", "Laura", "John"
row: "John", 1922, "brown", "", "Robert"
row: "Laura", 1922, "brown", "", ""
row: "Robert", 1895, "blue", "", ""
end

For our research, we want to be able to answer questions such as the following:

• Who are the genetic grandparents of a specific person?

• How frequent is each eye color?

• Is one specific person an ancestor of another specific person?

• How many generations do we have information for?

• Does one’s eye color correlate with the ages of their genetic parents when they were born?

Do Now!

How would you compute a list of the known grandparents for a given person? For purposes of this chapter, you may assume that each person has a unique name (while this isn’t realistic in practice, it will simplify our computations for the time being; we will revisit it later in the chapter).

(Hint: Make a task plan. Does it suggest any particular helper functions?)

Our task plan has two key steps: find the names of the genetic parents of the named person, then find the names of the parents of each of those people. Both steps share the need to compute the known parents from a name, so we should create a helper function for that (we’ll call it parents-of). Since this sounds like a routine table program, we can use it for a bit of review:

##### 14.1.1Computing Genetic Parents from an Ancestry Table

How do we compute a list of someone’s genetic parents? Let’s sketch a task plan for that:

• filter the table to find the person

• extract the name of the female parent

• extract the name of the male parent

• make a list of those names

These are tasks we have seen before, so we can translate this plan directly into code:

fun parents-of(t :: Table, who :: String) -> List<String>:
doc: "Return list of names of known parents of given name"
matches = filter-with(t, lam(r): r["name"] == who end)
if matches.length() > 0:
person-row = matches.row-n(0)
[list:
person-row["female-parent"],
person-row["male-parent"]]
else:
empty
end
where:
parents-of(ancestors, "Anna")
is [list: "Susan", "Charlie"]
parents-of(ancestors, "Kathi") is empty
end

Do Now!

Are you satisfied with this program? With the examples included in the where block? Write down any critiques you have.

There are arguably some issues here. How many of these did you catch?

• The examples are weak: none of them consider people for whom we are missing information on at least one parent.

• The list of names returned in the case of an unknown parent includes the empty string, which isn’t actually a name. This could cause problems if we use this list of names in a subsequent computation (such as to compute the names of someone’s grandparents).

• If empty strings are not part of the output list, then we’d get the same result from asking for the parents of "Robert" (who is in the table) as for "Kathi" (who is not). These are fundamentally different cases, which arguably demand different outputs so we can tell them apart.

To fix these problems, we need to remove the empty strings from the produced list of parents and return something other than the empty list when a name is not in the table. Since the output of this function is a list of strings, it’s hard to see what to return that couldn’t be confused for a valid list of names. Our solution for now is to have Pyret throw an error (like the ones you get when Pyret is not able to finish running your program). Here’s a solution that handles both problems:

fun parents-of(t :: Table, who :: String) -> List<String>:
doc: "Return list of names of known parents of given name"
matches = filter-with(t, lam(r): r["name"] == who end)
if matches.length() > 0:
person-row = matches.row-n(0)
names =
[list: person-row["female-parent"],
person-row["male-parent"]]
L.filter(lam(n): not(n == "") end, names)
else:
raise("No such person " + who)
end
where:
parents-of(ancestors, "Anna") is [list: "Susan", "Charlie"]
parents-of(ancestors, "John") is [list: "Robert"]
parents-of(ancestors, "Robert") is empty
parents-of(ancestors, "Kathi") raises "No such person"
end

The raise construct tells Pyret to halt the program and produce an error message. The error message does not have to match the expected output type of the program. If you run this function with a name that is not in the table, you’ll see an error appear in the interactions pane, with no result returned.

Within the where block, we see how to check whether an expression will yield an error: instead of using is to check the equality of values, we use raises to check whether the provided string is a sub-string of the actual error produced by the program.

##### 14.1.2Computing Grandparents from an Ancestry Table

Once we have the parents-of function, we should be able to compute the grandparents by computing parents of parents, as follows:

fun grandparents-of(anc-table: Table, person: String) -> List[String]:
doc: "compute list of known grandparents in the table"
# glue together lists of mother's parents and father's parents
plist = parents-of(anc-table, person) # gives a list of two names
parents-of(anc-table, plist.first) +
parents-of(anc-table, plist.rest.first)
where:
grandparents("Anna") is [list: "Laura", "John"]
grandparents("Laura") is [list:]
grandparents("Kathi") is [list:]
end

Do Now!

Look back at our sample ancestry tree: for which people would this correctly compute the list of grandparents?

This grandparents-of code works fine for someone who has both parents in the table. For someone without two parents, however, the plist will have fewer than two names, so the expression plist.rest.first (if not plist.first) will yield an error.

Here’s a version that checks the number of parents before computing the set of grandparents:

fun grandparents-of(anc-table :: Table, name :: String) -> List<String>:
doc: "compute list of known grandparents in the table"
# glue together lists of mother's parents and father's parents
plist = parents-of(anc-table, name) # gives a list of two names
if plist.length == 2:
parents-of(anc-table, plist.first) + parents-of(anc-table, plist.rest.first)
else if plist.length == 1:
parents-of(anc-table, plist.first)
else: empty
end
end

What if we now wanted to gather up all of someone’s ancestors? Since we don’t know how many generations there are, we’d need to use recursion. This approach would also be expensive, since we’d end up filtering over the table over and over, which checks every row of the table in each use of filter.

Look back at the ancestry tree picture. We don’t do any complicated filtering there – we just follow the line in the picture immediately from a person to their mother or father. Can we get that idea in code instead? Yes, through datatypes.

##### 14.1.3Creating a Datatype for Ancestor Trees

For this approach, we want to create a datatype for Ancestor Trees that has a variant (constructor) for setting up a person. Look back at our picture – what information makes up a person? Their name, their mother, and their father (along with birthyear and eyecolor, which aren’t shown in the picture). This suggests the following datatype, which basically turns a row into a person value:

data AncTree:
| person(
name :: String,
birthyear :: Number,
eye :: String,
mother :: ________,
father :: ________
)
end

For example, anna’s row might look like:

anna-row = person("Anna", 1997, "blue", ???, ???)

What type do we put in the blanks? A quick brainstorm yields several ideas:
• person

• List<person>

• some new datatype

• AncTree

• String

Which should it be?

If we use a String, we’re back to the table row, and we don’t end up with a way to easily get from one person to another. We should therefore make this an AncTree.

data AncTree:
| person(
name :: String,
birthyear :: Number,
eye :: String,
mother :: AncTree,
father :: AncTree
)
end

Do Now!

Write the AncTree starting from Anna using this definition.

Did you get stuck? What do we do when we run out of known people? To handle that, we must add an option in the AncTree definition to capture people for whom we don’t know anything.

data AncTree:
| noInfo
| person(
name :: String,
birthyear :: Number,
eye :: String,
mother :: AncTree,
father :: AncTree
)
end

Here’s Anna’s tree written in this datatype:

anna-tree =
person("Anna", 1997, "blue",
person("Susan", 1971, "blue",
person("Ellen", 1945, "brown",
person("Laura", 1920, "blue", noInfo, noInfo),
person("John", 1920, "green",
noInfo,
person("Robert", 1893, "brown", noInfo, noInfo))),
person("Bill", 1946, "blue", noInfo, noInfo)),
person("Charlie", 1972, "green", noInfo, noInfo))

We could also have named each person data individually.

robert-tree = person("Robert", 1893, "brown", noInfo, noInfo)
laura-tree = person("Laura", 1920, "blue", noInfo, noInfo)
john-tree = person("John", 1920, "green", noInfo, robert-tree)
ellen-tree = person("Ellen", 1945, "brown", laura-tree, john-tree)
bill-tree = person("Bill", 1946, "blue", noInfo, noInfo)
susan-tree = person("Susan", 1971, "blue", ellen-tree, bill-tree)
charlie-tree = person("Charlie", 1972, "green", noInfo, noInfo)
anna-tree2 = person("Anna", 1997, "blue", susan-tree, charlie-tree)

The latter gives you pieces of the tree to use as other examples, but loses the structure that is visible in the indentation of the first version. You could get to pieces of the first version by digging into the data, such as writing anna-tree.mother.mother to get to the tree starting from "Ellen".

Here’s the parents-of function written against AncTree:

fun parents-of-tree(tr :: AncTree) -> List<String>:
cases (AncTree) tr:
| noInfo => empty
| person(n, y, e, m, f) => [list: m.name, f.name]
# person bit more complicated if parent is missing
end
end

#### 14.2Programs to Process Ancestor Trees

How would we write a function to determine whether anyone in the tree had a particular name? To be clear, we are trying to fill in the following code:

fun in-tree(at :: AncTree, name :: String) -> Boolean:
doc: "determine whether name is in the tree"
...

How do we get started? Add some examples, remembering to check both cases of the AncTree definition:

fun in-tree(at :: AncTree, name :: String) -> Boolean:
doc: "determine whether name is in the tree"
...
where:
in-tree(anna-tree, "Anna") is true
in-tree(anna-tree, "Ellen") is true
in-tree(ellen-tree, "Anna") is false
in-tree(noInfo, "Ellen") is false
end

What next? When we were working on lists, we talked about the template, a skeleton of code that we knew we could write based on the structure of the data. The template names the pieces of each kind of data, and makes recursive calls on pieces that have the same type. Here’s the template over the AncTree filled in:

fun in-tree(at :: AncTree, name :: String) -> Boolean:
doc: "determine whether name is in the tree"
cases (AncTree) at:     # comes from AncTree being data with cases
| noInfo => ...
| person(n, y, e, m, f) => ... in-tree(m, name) ... in-tree(f, name)
end
where:
in-tree(anna-tree, "Anna") is true
in-tree(anna-tree, "Ellen") is true
in-tree(ellen-tree, "Anna") is false
in-tree(noInfo, "Ellen") is false
end

To finish the code, we need to think about how to fill in the ellipses.

• When the tree is noInfo, it has no more people, so the answer should be false (as worked out in the examples).

• When the tree is a person, there are three possibilities: we could be at a person with the name we’re looking for, or the name could be in the mother’s tree, or the name could be in the father’s tree.

We know how to check whether the person’s name matches the one we are looking for. The recursive calls already ask about the name being in the mother’s tree or father’s tree. We just need to combine those pieces into one Boolean answer. Since there are three possibilities, we should combine them with or

Here’s the final code:

fun in-tree(at :: AncTree, name :: String) -> Boolean:
doc: "determine whether name is in the tree"
cases (AncTree) at:     # comes from AncTree being data with cases
| noInfo => false
| person(n, y, e, m, f) => (name == n) or in-tree(m, name) or in-tree(f, name)
# n is the same as at.name
# m is the same as at.mother
end
where:
in-tree(anna-tree, "Anna") is true
in-tree(anna-tree, "Ellen") is true
in-tree(ellen-tree, "Anna") is false
in-tree(noInfo, "Ellen") is false
end

#### 14.3Summarizing How to Approach Tree Problems

We design tree programs using the same design recipe that we covered on lists:

Strategy: Writing a Program Over Trees

• Write the datatype for your tree, including a base/leaf case

• Write examples of your trees for use in testing

• Write the function name, parameters, and types (the fun line)

• Write where checks for your code

• Write the template, including the cases and recursive calls. Here’s the template again for an ancestor tree, for an arbitrary function called treeF:

fun treeF(name :: String, t :: AncTree) -> Boolean:
cases (AncTree) anct:
| unknown => ...
| person(n, y, e, m, f) =>
... treeF(name, m) ... treeF(name, f)
end
end

• Fill in the template with details specific to the problem

#### 14.4Study Questions

• Think of writing in-tree on a table (using filter-by) vs writing it on a tree. How many times might each approach compare the name being sought against a name in the table/tree?

• Why do we need to use a recursive function to process the tree?

• In what order will we check the names in the tree version?

For practice, try problems such as

• How many blue-eyed people are in the tree?

• How many people are in the tree?

• How many generations are in the tree?

• How many people have a given name in a tree?

• How many people have names starting with "A"?

• ... and so on