If you’ve programmed before in a language like Scheme or the student levels of Racket (or the WeScheme programming environment), or for that matter even in certain parts of OCaml, Haskell, Scala, Erlang, Clojure, or other languages, you will find many parts of Pyret very familiar. This chapter is specifically written to help you make the transition from (student) Racket/Scheme/WeScheme (abbreviated “RSW”) to Pyret by showing you how to convert the syntax. Most of what we say applies to all these languages, though in some cases we will refer specifically to Racket (and WeScheme) features not found in Scheme.
In every example below, the two programs will produce the same results.
Numbers are very similar between the two. Like Scheme, Pyret implements arbitrary-precision numbers and rationals. Some of the more exotic numeric systems of Scheme (such as complex numbers) aren’t in Pyret; Pyret also treats imprecise numbers slightly differently.
Strings are also very similar, though Pyret allows you to use single-quotes as well.
"\"Hello\", he said"
"\"Hello\", he said"
Booleans have the same names:
Pyret uses an infix syntax, reminiscent of many other textual programming languages:
(+ 1 2)
(* (- 4 2) 5)
(/ 1 2 3 4)
(dist 3 4)
(check-expect 1 1)
The second way is this: as an alias for check we can also write examples. The two are functionally identical, but they capture the human difference between examples (which explore the problem, and are written before attempting a solution) and tests (which try to find bugs in the solution, and are written to probe its design).
In addition, we can write a where block to accompany a function definition. For instance:
fun double(n):
  n + n
where:
  double(0) is 0
  double(10) is 20
  double(-1) is -2
end
check "squaring always produces non-negatives":
  (0 * 0) is 0
  (-2 * -2) is 4
  (3 * 3) is 9
end
Just as in Racket, there are many testing operators in Pyret.
Pyret distinguishes this-name (a variable) from this - name (a subtraction expression), because the - in the latter must be surrounded by spaces.
(define e^i*pi -1)
(define-struct pt (x y))
which corresponds to a Pyret datatype whose single variant is marked by a | in the middle:
data Point:
  | pt(x, y)
end
When there is only one variant, we can drop the |, resulting in the more readable
data Point: pt(x, y) end
;; A Point is either
;; - (pt number number), or
;; - (pt3d number number number)
data Point:
  | pt(x, y)
  | pt3d(x, y, z)
end
(pt 1 2)
In Pyret, the .x accessor obtains the x field of any value that has such a field, without attention to how it was constructed. Thus, we can use .x on a value whether it was constructed by pt or pt3d (or indeed anything else with that field). In contrast, cases does pay attention to this distinction.
There are several kinds of conditionals in Pyret, one more than in the Racket student languages.
General conditionals can be written using if, corresponding to Racket’s if but with more syntax. Pyret’s if also supports else if, which makes it possible to list a collection of questions at the same level of indentation, which Racket’s if does not have. The corresponding code in Racket would be written
(cond
  [full-moon "howl"]
  [new-moon "bark"]
  [else "meow"])
Pyret also offers ask, designed to parallel cond:
ask:
  | full-moon then: "howl"
  | new-moon then: "bark"
  | otherwise: "meow"
end
In Racket, we also use cond to dispatch on a datatype:
(cond
  [(pt? v) (+ (pt-x v) (pt-y v))]
  [(pt3d? v) (+ (pt-x v) (pt-z v))])
ask:
  | is-pt(v) then: v.x + v.y
  | is-pt3d(v) then: v.x + v.z
end
if is-pt(v): v.x + v.y
else if is-pt3d(v): v.x + v.z
end
cases (Point) v:
  | pt(x, y) => x + y
  | pt3d(x, y, z) => x + z
end
The cases form makes clear that the value is expected to be a Point, provides a clean syntactic way of identifying the different branches, and makes it possible to give a concise local name to each field position instead of having to use selectors like .x. In general, in Pyret we prefer to use cases to process data definitions. However, there are times when, for instance, there are many variants of data but a function processes only very few of them. In such situations, it makes more sense to explicitly use predicates and selectors.
In Racket, depending on the language level, lists are created using either cons or list, with empty for the empty list. The corresponding notions in Pyret are called link and empty; link is a two-argument function, just as in Racket:
(cons 1 empty)
(list 1 2 3)
Note that the syntax
[1, 2, 3], which represents lists in many
languages, is not legal in Pyret: lists are not privileged with
their own syntax. Rather, we must use an explicit constructor:
[list: 1, 2, 3] constructs a list, while [set: 1, 2, 3] constructs a set instead of a list. In fact, we can create our own constructors and use them with this syntax.
Try entering [1, 2, 3] and see the error message.
Lists are taken apart in Pyret using cases. There are two variants, empty and link (which we used to construct the lists). Conventionally, we name the fields of link f and r (for “first” and “rest”). Of course, this convention does not work if there are other things by the same name; in particular, when writing a nested destructuring of a list, we conventionally write fr and rr (for “first of the rest” and “rest of the rest”).
(lambda (x y) (+ x y))
; square: Number -> Number
; sort-nums: List<Number> -> List<Number>
; sort: List<T> * (T * T -> Boolean) -> List<T>
fun square(n :: Number) -> Number: ...
fun sort-nums(l :: List<Number>) -> List<Number>: ...
fun sort<T>(l :: List<T>, cmp :: (T, T -> Boolean)) -> List<T>: ...
If there are other parts of Scheme or Racket syntax that you would like to see translated, please let us know.
For the curious, we offer a few examples here to justify our frustration with Python for early programming.
Python exposes machine arithmetic by default, so even simple decimal arithmetic is inexact. Pyret implements exact arithmetic, including rationals, by default.
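For instance, in Python (standard library only), the default-arithmetic difference shows up immediately; exactness has to be requested explicitly:

```python
from fractions import Fraction

# Machine (IEEE 754) floats: a classic identity fails by default.
print(0.1 + 0.2 == 0.3)   # False: 0.1 + 0.2 is 0.30000000000000004

# Exact rational arithmetic (Pyret's default) is opt-in in Python:
print(Fraction(1, 10) + Fraction(2, 10) == Fraction(3, 10))   # True
```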
Understanding the difference between creating a variable and updating its value is a key learning outcome, along with understanding variables’ scopes. Python explicitly conflates declaration with update, and has a tangled history with scope.
Pyret is statically scoped, and goes to great lengths to help students understand variables’ scopes.
Python has a weakly-defined, optional mechanism of annotations that was added late in the language’s design, which conflates values and types.
Drawing on lessons learned from our several prior research projects on adding types to languages after-the-fact, Pyret was designed with typability from the start, with several subtle design choices to enable this. Pyret also has support (currently dynamic) for refinement-type annotations.
Python has weak built-in support for testing. While it has extensive professional libraries to test software, these impose a non-trivial burden on learners, as a result of which most introductory curricula do not use them.
First, a curriculum that proclaims reliability must put testing at its heart. Second, our pedagogy places heavy emphasis on the use of examples, and in particular the building-up of abstractions from concrete instances. For both of these reasons, Pyret has extensive support in the language itself for writing examples and tests.
Images are not values in the language. You can write a program to produce an image, but you can’t just view it in your programming environment.
Images are values. Pyret can print an image just like it can a string or a number (and why not?). Images are fun values, but they aren’t frivolous: they are especially useful for demystifying and explaining important but abstract issues like function composition.
The language doesn’t have a built-in notion of reactive programs.
Python’s error messages are not written with novices as a primary audience.
Novices make many errors. They can be especially intimidated by error reports, and can feel discouraged about causing errors. Thus, Pyret’s error messages are the result of nearly a decade of research. In fact, some educators have created pedagogic techniques that explicitly rely on the nature and presentation of information in Pyret’s errors.
Python has begun to suffer from complexity creep that we believe serves professionals at the expense of novices. For example, the result of map in Python is actually a special generator value. This can lead to outcomes requiring extra explanation, like map(str, [1, 2, 3]) producing <map object at 0x1045f4940>. Type hints (discussed above) are another example.
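To see that behavior concretely, here is a minimal sketch:

```python
m = map(str, [1, 2, 3])
print(m)         # something like <map object at 0x...>, not a list
print(list(m))   # ['1', '2', '3']: forcing the generator yields the list
```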
Since Pyret’s target audience is novice programmers programming in the style of this book, our primary goal when adding any feature is to preserve the early experience and avoid surprises.
Data definitions are central to computer science, but Python over-relies on built-in data structures (especially dictionaries) and makes user-defined ones unwieldy to create.
Pyret borrows from the rich tradition of languages like Standard ML, OCaml, and Haskell to provide algebraic datatypes, whose absence often forces programmers to engage in unwieldy (and inefficient) encoding tricks.
Python has several more rough corners that can lead to unexpected and undesirable outcomes.
Pyret is designed from the ground-up to avoid all these problems.
This book (DCIC) is often compared to How to Design Programs (HtDP), from which it draws enormous inspiration. Here we briefly describe how the two books compare.
Both are built around the centrality of data structure. Both want to provide methods for designing programs. Both start with functional programming but transition to (and take very seriously) stateful imperative programming.
Both are built around languages carefully designed with education in mind. The languages provide special support for writing examples and tests; error reporting designed for beginners; built-in images and reactivity. The languages eschew weird gotchas (in a way that Python does not: see Pyret vs. Python or, if you want to read much more, this paper).
The most obvious is that DCIC is in Pyret. HtDP has tons of good ideas, all ignored because it uses Racket, whose syntax some people (especially some educators) dislike. We built Pyret to embody good ideas we’d learned from the Racket student languages and other good ideas of our own, but package them in a familiar syntax. But as you can see, the two languages are not actually that far apart: see Pyret for Racketeers and Schemers.
The next most obvious thing is that DCIC also includes Python. HtDP has a (not formally published) follow-up that teaches program design in Java. In contrast, we wanted to integrate the transition to Python into DCIC itself. There’s much to be learned from the contrast! In particular, Pyret and its environment were carefully designed around pedagogic ideas for teaching state. Python was not, despite the ubiquity and difficulty of state! So there’s a lot to be gained, when introducing state, to contrast them.
Next, DCIC has a lot of algorithmic content, whereas HtDP has almost none. DCIC covers, for instance, Big-O analysis [Predicting Growth]. It even has a section on amortized analysis [Halloween Analysis]. It goes up through some graph algorithms. This is far more advanced material than HtDP covers.
HtDP is built around a beautiful idea: the data structures shown grow in complexity in set-theoretic terms. Therefore it begins with atomic data, then has fixed-size data (structures), then unbounded collections (lists) of atomic data, pairs of lists, lists of structures, and so on. All built up, systematically, in a neat progression.
However, this has a downside. You have to imagine what the data represent (this number is an age, that string is a name, that list is of GDPs), but they’re idealized. In a way the most real data are actually images! After that (which come early), all the data are “virtualized” and imaginary.
Our view is that the most interesting data are lists of structures. (Remember those? They’re complicated and come some ways down the progression.) You might find this surprising; if so, we give you another name for them: tables. Tables are ubiquitous. Even companies process and publish them; even primary school students recognize and use them. They are perhaps our most important universal form of structured data.
Even better, lots of real-world data are provided as tables. You don’t have to
imagine things or make up fake GDPs like 1, 2, and 3. You can get actual
GDPs or populations or movie revenues or sports standings or whatever interests
you. (Ideally, cleansed and curated.)
But there’s a big catch! A key feature of HtDP is that for every level of datatype, it provides a Design Recipe for programming over that datatype. Lists-of-structs are complex. So is their programming recipe. And we want to put them near the beginning! Furthermore, the Design Recipe is dangerous to ignore. Students struggle with blank pages and often fill them up with bad code, which they then get attached to. The Design Recipe provides structure, scaffolding, reviewability, and much more. It’s cognitively grounded in schemas.
So over the past few years, we’ve been working on different program design methods that address the same ends through different means. A lot of our recent education research has been putting new foundations in place. It’s very much work in progress. And DCIC is the distillation of those efforts. As we have new results, we’ll be weaving them into DCIC (and probably HtDP too). Stay tuned!
This is a summary of updates made with each release of the book (excluding typos and other minor fixes).
Consistently renamed the definitions and interactions window to the definitions and interactions pane.
Moved the material on working with variables out of the intro to Python section and into the Programming with State section. Mutation of structured data moved before variable mutation within the Programming with State section.
Added a comparison between DCIC and HtDP.
The include line for the DCIC libraries at this version is
include shared-gdrive( "dcic-2021", "1wyQZj_L0qqV9Ekgr9au6RX2iqt2Ga8Ep")
Version 2021-08-21 – the original release
The bandwidth between two network nodes is the quantity of data that can be transferred in a unit of time between the nodes.
A cache is an instance of a ☛ space-time tradeoff: it trades space for time by using the space to avoid recomputing an answer. The act of using a cache is called caching. The word “cache” is often used loosely; we use it only for information that can be perfectly reconstructed even if it were lost: this enables a program that needs to reverse the trade (i.e., use less space in return for more time) to do so safely, knowing it will lose no information and thus not sacrifice correctness.
Coinduction is a proof principle for mathematical structures that are equipped with methods of observation rather than of construction. Conversely, functions over inductive data take them apart; functions over coinductive data construct them. The classic tutorial on the topic will be useful to mathematically sophisticated readers.
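A loose analogy (not a formal coinductive definition), sketched in Python: a generator describes an infinite stream entirely by how it responds to observation, namely asking for the next element, rather than by finite construction.

```python
# An "infinite" stream of natural numbers: never fully constructed,
# only observed one element at a time.
def nats():
    n = 0
    while True:
        yield n
        n += 1

s = nats()
print([next(s) for _ in range(5)])  # [0, 1, 2, 3, 4]
```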
An idempotent operator is one whose repeated application to any value in its domain yields the same result as a single application (note that this implies the range is a subset of the domain). Thus, a function \(f\) is idempotent if, for all \(x\) in its domain, \(f(f(x)) = f(x)\) (and by induction this holds for additional applications of \(f\)).
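Two everyday idempotent operators, sketched in Python:

```python
# abs is idempotent: a second application changes nothing.
assert abs(abs(-7)) == abs(-7)

# So is sorting, viewed as a function on lists:
xs = [3, 1, 2]
assert sorted(sorted(xs)) == sorted(xs)

# Negation, by contrast, is not idempotent: -(-5) != -5.
```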
Invariants are assertions about programs that are intended to always be true (“in-vary-ant”: never varying). For instance, a sorting routine may have as an invariant that the list it returns is sorted.
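That sorting invariant can be expressed as a runtime assertion; here is a sketch in Python (sort_nums is an illustrative name):

```python
def sort_nums(xs):
    result = sorted(xs)
    # Invariant: each element is <= its successor.
    assert all(result[i] <= result[i + 1] for i in range(len(result) - 1))
    return result

print(sort_nums([3, 1, 2]))  # [1, 2, 3]
```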
The latency between two network nodes is the time it takes for packets to go between the nodes.
A metasyntactic variable is one that lives outside the language, and ranges over a fragment of syntax. For instance, if we write “for expressions e1 and e2, the sum e1 + e2”, we do not mean the programmer literally wrote “e1” in the program; rather, we are using e1 to refer to whatever the programmer might write on the left of the addition sign. Therefore, e1 is a metasyntactic variable.
At the machine level, a packed representation is one that ignores traditional alignment boundaries (in older or smaller machines, bytes; on most contemporary machines, words) to let multiple values fit inside or even spill over the boundary.
For instance, say we wish to store a vector of four values, each of which represents one of four options. A traditional representation would store one value per alignment boundary, thereby consuming four units of memory. A packed representation would recognize that each value requires two bits, and four of them can fit into eight bits, so a single byte can hold all four values. Suppose instead we wished to store four values representing five options each, therefore requiring three bits for each value. A byte- or word-aligned representation would not fundamentally change, but the packed representation would use two bytes to store the twelve bits, even permitting the third value’s three bits to be split across a byte boundary.
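The two-bit example above can be sketched in Python (pack4 and unpack4 are illustrative names):

```python
def pack4(vals):
    # Pack four values, each in 0..3 (two bits each), into a single byte.
    assert len(vals) == 4 and all(0 <= v <= 3 for v in vals)
    b = 0
    for i, v in enumerate(vals):
        b |= v << (2 * i)   # value i occupies bits 2i and 2i+1
    return b

def unpack4(b):
    # Extracting packed values requires explicit shifting and masking.
    return [(b >> (2 * i)) & 0b11 for i in range(4)]

packed = pack4([1, 3, 0, 2])
print(packed)            # 141: all four values fit in one byte
print(unpack4(packed))   # [1, 3, 0, 2]
```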
Of course, packed representations have a cost. Extracting the values requires more careful and complex operations. Thus, they represent a classic ☛ space-time tradeoff: using more time to shrink space consumption. More subtly, packed representations can confound certain run-time systems that may have expected data to be aligned.
Parsing is, very broadly speaking, the act of converting content in one kind of structured input into content in another. The structures could be very similar, but usually they are quite different. Often, the input format is simple while the output format is expected to capture rich information about the content of the input. For instance, the input might be a linear sequence of characters on an input stream, and the output might be expected to be rich and tree-structured according to some datatype: most program and natural-language parsers are faced with this task.
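A deliberately tiny instance of the idea, in Python: flat text in, structured data out. Real parsers differ mainly in how rich the output structure is.

```python
def parse_int_list(s):
    # "1,2,3" (a linear sequence of characters) -> [1, 2, 3] (structured data)
    return [int(tok) for tok in s.split(",")]

print(parse_int_list("1,2,3"))  # [1, 2, 3]
```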
Reduction is a relationship between a pair of situations (problems, functions, data structures, etc.) where one is defined in terms of the other. A reduction R is a function from situations of the form P to ones of the form Q if, for every instance of P, R can construct an instance of Q such that it preserves the meaning of P. Note that the converse need not hold.
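A small illustrative reduction, sketched in Python: finding a maximum (P) reduces to sorting (Q), since any correct sort yields a correct maximum; the converse does not hold.

```python
def max_via_sort(xs):
    # Reduce "find the maximum" to "sort": the last element of the
    # sorted list is the maximum.
    return sorted(xs)[-1]

print(max_via_sort([3, 1, 4, 1, 5]))  # 5
```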
Suppose you have an expensive computation that always produces the same answer for a given set of inputs. Once you have computed the answer once, you now have a choice: store the answer so that you can simply look it up when you need it again, or throw it away and re-compute it the next time. The former uses more space, but saves time; the latter uses less space, but consumes more time. This, at its heart, is the space-time tradeoff. Memoization [Avoiding Recomputation by Remembering Answers] and using a ☛ cache are both instances of it.
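A minimal sketch of the store-the-answer side of the tradeoff in Python (cache and squared are illustrative names):

```python
cache = {}

def squared(n):
    # First call computes and stores; later calls are just a lookup.
    if n not in cache:
        cache[n] = n * n   # stands in for an expensive computation
    return cache[n]

print(squared(12))   # computed
print(squared(12))   # looked up: the dict's space is traded for time
```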
Type variables are identifiers in the type language that (usually) range over actual types.
A wire format is a notation used to transmit data across, as opposed to within, a closed platform (such as a virtual machine). Wire formats are usually expected to be relatively simple because they must be implemented in many languages and on weak processors. They are also expected to be unambiguous to aid simple, fast, and correct parsing. Popular examples include XML, JSON, and s-expressions.
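For instance, JSON used as a wire format from Python:

```python
import json

record = {"name": "Ada", "scores": [1, 2, 3]}
wire = json.dumps(record)            # flat, unambiguous text for transmission
print(wire)
assert json.loads(wire) == record    # the receiver reconstructs it exactly
```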