27 Appendices
27.1 Pyret for Racketeers and Schemers
If you’ve programmed before in a language like Scheme or the student levels of Racket (or the WeScheme programming environment), or for that matter even in certain parts of OCaml, Haskell, Scala, Erlang, Clojure, or other languages, you will find many parts of Pyret very familiar. This chapter is specifically written to help you make the transition from (student) Racket/Scheme/WeScheme (abbreviated “RSW”) to Pyret by showing you how to convert the syntax. Most of what we say applies to all these languages, though in some cases we will refer specifically to Racket (and WeScheme) features not found in Scheme.
In every example below, the two programs will produce the same results.
27.1.1 Numbers, Strings, and Booleans
Numbers are very similar between the two. Like Scheme, Pyret implements arbitrary-precision numbers and rationals. Some of the more exotic numeric systems of Scheme (such as complex numbers) aren’t in Pyret; Pyret also treats imprecise numbers slightly differently.
RSW | Pyret |
1 |
|
RSW | Pyret |
1/2 |
|
RSW | Pyret |
#i3.14 |
|
Strings are also very similar, though Pyret allows you to use single-quotes as well.
RSW | Pyret |
"Hello, world!" |
|
RSW | Pyret |
"\"Hello\", he said" |
|
RSW | Pyret |
"\"Hello\", he said" |
|
Booleans have the same names:
RSW | Pyret |
true |
|
RSW | Pyret |
false |
|
27.1.2 Infix Expressions
Pyret uses an infix syntax, reminiscent of many other textual programming languages:
RSW | Pyret |
(+ 1 2) |
|
RSW | Pyret |
(* (- 4 2) 5) |
|
RSW | Pyret |
(/ 1 2 3 4) |
|
27.1.3 Function Definition and Application
RSW | Pyret |
(dist 3 4) |
|
RSW | Pyret | |||
|
|
27.1.4 Tests
RSW | Pyret |
(check-expect 1 1) |
|
RSW | Pyret | ||
|
|
The second way is this: as an alias for check
we can also write
examples
. The two are functionally identical, but they capture
the human difference between examples (which explore the
problem, and are written before attempting a solution) and
tests (which try to find bugs in the solution, and are written
to probe its design).
where
block to accompany a function
definition. For instance:
fun double(n):
n + n
where:
double(0) is 0
double(10) is 20
double(-1) is -2
end
check "squaring always produces non-negatives":
(0 * 0) is 0
(-2 * -2) is 4
(3 * 3) is 9
end
Just as in Racket, there are many testing operators in Pyret (in
addition to is
). See
the
documentation.
27.1.5 Variable Names
RSW | Pyret |
this-is-a-name |
|
this-name
(a variable) from
this - name
(a subtraction expression) because the -
in
the latter must be surrounded by spaces.
(define e^i*pi -1)
27.1.6 Data Definitions
RSW | Pyret |
(define-struct pt (x y)) |
|
|
in the middle:
data Point: | pt(x, y) end
|
, resulting
in the more readable
data Point: pt(x, y) end
;; A Point is either ;; - (pt number number), or ;; - (pt3d number number number)
data Point:
| pt(x, y)
| pt3d(x, y, z)
end
RSW | Pyret |
(pt 1 2) |
|
RSW | Pyret |
(pt? x) | is-pt(x) |
cases
):
RSW | Pyret |
(pt-x v) |
|
.x
extracts an x
field of any value
that has such a field, without attention to how it was
constructed. Thus, we can use .x
on a value whether it was
constructed by pt
or pt3d
(or indeed anything else with
that field). In contrast, cases
does pay attention to this
distinction.27.1.7 Conditionals
There are several kinds of conditionals in Pyret, one more than in the Racket student languages.
General conditionals can be written using if
, corresponding to
Racket’s if
but with more syntax.
RSW | Pyret | |||
|
|
RSW | Pyret | |||||
|
|
if
includes else if
, which makes it possible
to list a collection of questions at the same level of indentation,
which if in Racket does not have. The corresponding code in
Racket would be written
(cond [full-moon "howl"] [new-moon "bark"] [else "meow"])
ask
, designed to parallel cond
:
ask:
| full-moon then: "howl"
| new-moon then: "bark"
| otherwise: "meow"
end
cond
to dispatch on a datatype:
(cond [(pt? v) (+ (pt-x v) (pt-y v))] [(pt3d? v) (+ (pt-x v) (pt-z v))])
ask:
| is-pt(v) then: v.x + v.y
| is-pt3d(v) then: v.x + v.z
end
if is-pt(v):
v.x + v.y
else if is-pt3d(v):
v.x + v.z
end
cases (Point) v:
| pt(x, y) => x + y
| pt3d(x, y, z) => x + z
end
v
is a Point
, provides a clean
syntactic way of identifying the different branches, and makes
it possible to give a concise local name to each field position
instead of having to use selectors like .x
. In general, in
Pyret we prefer to use cases
to process data
definitions. However, there are times when, for instance, there many
variants of data but a function processes only very few of them. In
such situations, it makes more sense to explicitly use predicates and
selectors.27.1.8 Lists
In Racket, depending on the language level, lists are created using
either cons or list, with empty for the empty
list. The corresponding notions in Pyret are called link
,
list
, and empty
, respectively. link
is a
two-argument function, just as in Racket:
RSW | Pyret |
(cons 1 empty) |
|
RSW | Pyret |
(list 1 2 3) |
|
Note that the syntax [1, 2, 3]
, which represents lists in many
languages, is not legal in Pyret: lists are not privileged with
their own syntax. Rather, we must use an explicit constructor:
just as [list: 1, 2, 3]
constructs a list, [set: 1, 2,
3]
constructs a set instead of a list.In fact, we can
create our own constructors
and use them with this syntax.
Exercise
Try typing
[1, 2, 3]
and see the error message.
cases
. There are two variants, empty
and link
(which we used to construct the lists):
RSW | Pyret | |||||
|
|
f
and r
(for
“first” and “rest”). Of course, this convention does not work if
there are other things by the same name; in particular, when writing a
nested destructuring of a list, we conventionally write fr
and
rr
(for “first of the rest” and “rest of the rest”).27.1.9 First-Class Functions
lam
:
RSW | Pyret |
(lambda (x y) (+ x y)) |
|
27.1.10 Annotations
; square: Number -> Number ; sort-nums: List<Number> -> List<Number> ; sort: List<T> * (T * T -> Boolean) -> List<T>
fun square(n :: Number) -> Number: ...
fun sort-nums(l :: List<Number>) -> List<Number>: ...
fun sort<T>(l :: List<T>, cmp :: (T, T -> Boolean)) -> List<T>: ...
27.1.11 What Else?
If there are other parts of Scheme or Racket syntax that you would like to see translated, please let us know.
27.2 Pyret vs. Python
For the curious, we offer a few examples here to justify our frustration with Python for early programming.
Python | Pyret |
Python exposes machine arithmetic by default. Thus, by default,
| Pyret implements exact arithmetic, including rationals, by default. In
Pyret, |
Python | Pyret |
Understanding the difference between creating a variable and updating its value is a key learning outcome, along with understanding variables’ scopes. Python explicitly conflates declaration with update, and has a tangled history with scope. | Pyret is statically scoped, and goes to great lengths— |
Python | Pyret |
Python has a weakly-defined, optional mechanism of annotations that was added late in the language’s design, which conflates values and types. | Drawing on lessons learned from our several prior research projects on adding types to languages after-the-fact, Pyret was designed with typability from the start, with several subtle design choices to enable this. Pyret also has support (currently dynamic) for refinement-type annotations. |
Python | Pyret |
Python’s annotation mechanism has no notion of refinements. | To prepare students for modern programming languages with rich type systems, Pyret’s annotation syntax supports refinements. However, these are checked dynamically, so that students do not need to satisfy the vagaries of any particular proof assistant. |
Python | Pyret |
Python has weak built-in support for testing. While it has extensive professional libraries to test software, these impose a non-trivial burden on learners, as a result of which most introductory curricula do not use them. | First, a curriculum that proclaims reliability must put testing at its
heart. Second, our pedagogy places heavy emphasis on the use of examples, and
in particular the building-up of abstractions from concrete instances. For both
these reasons, Pyret has extensive support in the language itself— |
Python | Pyret |
Modern testing goes well beyond unit-tests. Furthermore, property-based testing is a very useful gateway to thinking about formal properties. In Python, this is only available through libraries. | Pyret has convenient language features— |
Python | Pyret |
State is ubiquitous in libraries. | State is an important but also complicated part of programming. Pyret nudges students to program without state while still permitting the full range of stateful programming. This comes with safeguards both linguistic (e.g., variables are immutable unless declared otherwise) and in output (e.g., mutable fields are displayed to alert the student that the value may change or may even have already changed). |
Python | Pyret |
Equality comparison is simplistic and in line with most other professional languages. | Equality is in fact subtle, and useful as a pedagogic device. Therefore, Pyret has a carefully-designed family of equality operators that are not only of practical value but also have pedagogic use. |
Python | Pyret |
Images are not values in the language. You can write a program to produce an image, but you can’t just view it in your programming environment. | Images are values. Pyret can print an image just like it can a string or a number (and why not?). Images are fun values, but they aren’t frivolous: they are especially useful for demystifying and explaining important but abstract issues like function composition. |
Python | Pyret |
The language doesn’t have a built-in notion of reactive programs. | Reactivity is a core concept in the language, and the subject of both design and implementation research. |
Python | Pyret |
Python’s error messages are not added with novices as a primary audience. | Novices make many errors. They can be especially intimidated by error reports, and can feel discouraged about causing errors. Thus, Pyret’s error messages are the result of nearly a decade of research. In fact, some educators have created pedagogic techniques that explicitly rely on the nature and presentation of information in Pyret’s errors. |
Python | Pyret |
Python has begun to suffer from complexity creep that we believe serves professionals at the expense of novices. For example, the result of map in Python is actually a special generator value. This can lead to outcomes requiring extra explanation, like map(str, [1, 2, 3]) producing <map object at 0x1045f4940>. Type hints (discussed above) are another example. | Since Pyret’s target audience is novice programmers programming in the style of this book, our primary goal when adding any feature is to preserve the early experience and avoid surprises. |
Python | Pyret |
Data definitions are central to computer science, but Python over-relies on built-in data structures (especially dictionaries) and makes user-defined ones unwieldy to create. | Pyret borrows from the rich tradition of languages like Standard ML, OCaml, and Haskell to provide algebraic datatypes, whose absence often forces programmers to engage in unwieldy (and inefficient) encoding tricks. |
Python | Pyret |
Python has several more rough corners that can lead to unexpected and
undesirable outcomes. For instance, | Pyret is designed from the ground-up to avoid all these problems. |
27.3 Comparing This Book to HtDP
This book (DCIC) is often compared to How to Design Programs (HtDP), from which it draws enormous inspiration. Here we briefly describe how the two books compare.
Both are built around the centrality of data structure. Both want to provide methods for designing programs. Both start with functional programming but transition to (and take very seriously) stateful imperative programming.
Both are built around languages carefully designed with education in mind. The languages provide special support for writing examples and tests; error reporting designed for beginners; built-in images and reactivity. The languages eschew weird gotchas (in a way that Python does not: see Pyret vs. Python or, if you want to read much more, this paper).
The most obvious is that DCIC is in Pyret. HtDP has tons of good ideas, all ignored because it uses Racket, whose syntax some people (especially some educators) dislike. We built Pyret to embody good ideas we’d learned from the Racket student languages and other good ideas of our own, but package them in a familiar syntax. But as you can see, the two languages are not actually that far apart: Pyret for Racketeers and Schemers.
The next most obvious thing is that DCIC also includes Python. HtDP has a (not formally published) follow-up that teaches program design in Java. In contrast, we wanted to integrate the transition to Python into DCIC itself. There’s much to be learned from the contrast! In particular, Pyret and its environment were carefully designed around pedagogic ideas for teaching state. Python was not, despite the ubiquity and difficulty of state! So there’s a lot to be gained, when introducing state, to contrast them.
Next, DCIC has a lot algorithmic content, whereas HtDP has almost none. DCIC covers, for instance, Big-O analysis [Predicting Growth]. It even has a section on amortized analysis [Halloween Analysis]. It goes up through some graph algorithms. This is far more advanced material than HtDP covers.
HtDP is built around a beautiful idea: the data structures shown grow in complexity in set-theoretic terms. Therefore it begins with atomic data, then has fixed-size data (structures), then unbounded collections (lists) of atomic data, pairs of lists, lists of structures, and so on. All built up, systematically, in a neat progression.
However, this has a downside. You have to imagine what the data represent (this number is an age, that string is a name, that list is of GDPs), but they’re idealized. In a way the most real data are actually images! After that (which come early), all the data are “virtualized” and imaginary.
Our view is that the most interesting data are lists of structures. (Remember those? They’re complicated and come some ways down the progression.) You might find this surprising; if so, we give you another name for them: tables. Tables are ubiquitous. Even companies process and publish them; even primary school students recognize and use them. They are perhaps our most important universal form of structured data.
Even better, lots of real-world data are provided as tables. You don’t have to
imagine things or make up fake GDPs like 1, 2, and 3. You can get actual
GDPs or populations or movie revenues or sports standings or whatever interests
you. (Ideally, cleansed and curated.) We believe that just about every
student—
Buut there’s a big catch! A key feature of HtDP is that for every level of datatype, it provides a Design Recipe for programming over that datatype. Lists-of-structs are complex. So is their programming recipe. And we want to put them near the beginning! Furthermore, the Design Recipe is dangerous to ignore. Students struggle with blank pages and often fill them up with bad code, which they then get attached to. The Design Recipe provides structure, scaffolding, reviewability, and much more. It’s cognitively grounded in schemas.
So over the past few years, we’ve been working on different program design methods that address the same ends through different means. A lot of our recent education research has been putting new foundations in place. It’s very much work in progress. And DCIC is the distillation of those efforts. As we have new results, we’ll be weaving them into DCIC (and probably HtDP too). Stay tuned!
27.4 Release Notes
This is a summary of updates made with each release of the book (excluding typos and other minor fixes).
Version 2023-02-21
The book has been broken down into a collection of booklets, to give it a clearer structure and organization. See Organization of the Material for details.
Several huge chapters in the earlier version have been broken down into smaller chapters and split across booklets.
The Introduction to Programming material has been substantially revised and expanded.
The ordering of materials has changed. The material on Algorithm Analysis has been moved to after the Python material.
The From Pyret to Python transition has been improved substantially.
There is now a chapter on using Pandas that builds off the corresponding Pyret example on tables.
Python dictionaries have moved into From Pyret to Python. Even though we don’t cover Pyret’s dictionaries, having the dictionaries material come before state is a more natural flow with regards to students learning Python. We then develop the implementation of dictionaries using state in Hashes, Sets, and Key-Values.
There is now a whole new unified approach to Programming With State that shows the Pyret and Python versions side-by-side. Seeing this material in two different languages helps focus on similarities and differences and can improve transfer between languages.
Material that depends on algorithm analysis has now been separated from material that does not. Furthermore, in keeping with the book’s theme, the focus is (again) on data structures. The result, in Data Structures with Analysis, refactors a lot of prior material to present it much more cleanly.
Some material has further been refactored into Advanced Topics. In addition, there are several more new advanced goodies!
Version 2022-08-28
Besides numerous small improvements, we added some new bonus material.
Version 2022-01-25
Consistently renamed the definitions and interactions window to the definitions and interactions pane.
Moved the material on working with variables out of the intro to Python section and into the Programming with State section. Mutation of structured data moved before variable mutation within the Programming with State section.
Added a comparison to How to Design Programs.
The include line for the DCIC libraries at this version is
include shared-gdrive( "dcic-2021", "1wyQZj_L0qqV9Ekgr9au6RX2iqt2Ga8Ep")
Version 2021-08-21
The original release! Based on the prior book Programming and Programming Languages.
27.5 Glossary
The bandwidth between two network nodes is the quantity of data that can be transferred in a unit of time between the nodes.
A cache is an instance of a ☛ space-time tradeoff: it trades space for time by using the space to avoid recomputing an answer. The act of using a cache is called caching. The word “cache” is often used loosely; we use it only for information that can be perfectly reconstructed even if it were lost: this enables a program that needs to reverse the trade—
i.e., use less space in return for more time— to do so safely, knowing it will lose no information and thus not sacrifice correctness.
Coinduction is a proof principle for mathematical structures that are equipped with methods of observation rather than of construction. Conversely, functions over inductive data take them apart; functions over coinductive data construct them. The classic tutorial on the topic will be useful to mathematically sophisticated readers.
An idempotent operator is one whose repeated application to any value in its domain yields the same result as a single application (note that this implies the range is a subset of the domain). Thus, a function \(f\) is idempotent if, for all \(x\) in its domain, \(f(f(x)) = f(x)\) (and by induction this holds for additional applications of \(f\)).
Invariants are assertions about programs that are intended to always be true (“in-vary-ant”—
never varying). For instance, a sorting routine may have as an invariant that the list it returns is sorted.
The latency between two network nodes is the time it takes for packets to go between the nodes.
A metasyntactic variable is one that lives outside the language, and ranges over a fragment of syntax. For instance, if we write “for expressions
e1
ande2
, the sume1 + e2
”, we do not mean the programmer literally wrote “e1
” in the program; rather we are usinge1
to refer to whatever the programmer might write on the left of the addition sign. Therefore,e1
is metasyntax.
At the machine level, a packed representation is one that ignores traditional alignment boundaries (in older or smaller machines, bytes; on most contemporary machines, words) to let multiple values fit inside or even spill over the boundary.
For instance, say we wish to store a vector of four values, each of which represents one of four options. A traditional representation would store one value per alignment boundary, thereby consuming four units of memory. A packed representation would recognize that each value requires two bits, and four of them can fit into eight bits, so a single byte can hold all four values. Suppose instead we wished to store four values representing five options each, therefore requiring three bits for each value. A byte- or word-aligned representation would not fundamentally change, but the packed representation would use two bytes to store the twelve bits, even permitting the third value’s three bits to be split across a byte boundary.
Of course, packed representations have a cost. Extracting the values requires more careful and complex operations. Thus, they represent a classic ☛ space-time tradeoff: using more time to shrink space consumption. More subtly, packed representations can confound certain run-time systems that may have expected data to be aligned.
Parsing is, very broadly speaking, the act of converting content in one kind of structured input into content in another. The structures could be very similar, but usually they are quite different. Often, the input format is simple while the output format is expected to capture rich information about the content of the input. For instance, the input might be a linear sequence of characters on an input stream, and the output might be expected to be rich and tree-structured according to some datatype: most program and natural-language parsers are faced with this task.
Reduction is a relationship between a pair of situations—
problems, functions, data structures, etc.— where one is defined in terms of the other. A reduction R is a function from situations of the form P to ones of the form Q if, for every instance of P, R can construct an instance of Q such that it preserves the meaning of P. Note that the converse strictly does not need to hold.
Suppose you have an expensive computation that always produces the same answer for a given set of inputs. Once you have computed the answer once, you now have a choice: store the answer so that you can simply look it up when you need it again, or throw it away and re-compute it the next time. The former uses more space, but saves time; the latter uses less space, but consumes more time. This, at its heart, is the space-time tradeoff. Memoization [Avoiding Recomputation by Remembering Answers] and using a ☛ cache are both instances of it.
Type variables are identifiers in the type language that (usually) range over actual types.
A notation used to transmit data across, as opposed to within, a closed platform (such as a virtual machine). These are usually expected to be relatively simple because they must be implemented in many languages and on weak processes. They are also expected to be unambiguous to aid simple, fast, and correct parsing. Popular examples include XML, JSON, and s-expressions.