I Introduction
1 Overview
1.1 What This Book is About
This book is an introduction to computer science. It will teach you to program, and do so in ways that are of practical value and importance. However, it will also go beyond programming to computer science, a rich, deep, fascinating, and beautiful intellectual discipline. You will learn many useful things that you can apply right away, but we will also show you some of what lies beneath and beyond.
Most of all, we want to give you ways of thinking about solving problems using computation. Some of these ways are technical methods, such as working from data and examples to construct solutions to problems. Others are scientific methods, such as ways of making sure that programs are reliable and do what they claim. Finally, some are social, thinking about the impacts that programs have on people.
1.2 The Values That Drive This Book
Software is not written only to be run. It must also be written to be read and maintained by others. Often, that “other” person is you, six months later, who has forgotten what they did and why.
Programmers are responsible for their software meeting its desired goals and being reliable. This is reflected in a variety of disciplines inside computer science, such as testing and verification.
Programs ought to be be amenable to prediction. We need to know, as much as possible, before a program runs, how it will behave. This behavior includes not only technical characteristics such as running time, space, power, and so on, but also social impacts, benefits, and harms. Programmers have been notoriously poor at thinking about the latter.
1.3 Our Perspective on Data
These concerns intersect with our belief about how computer science has evolved as a discipline. It is a truism that we live in a world awash with data, but what consequence does that have?
At a computational level, data have had a profound effect. Traditionally, the only way to make a program better was to improve the program directly, which often meant making it more complicated and impacting the values we discuss above. But there are classes of programs for which there is another method: simply give the same program more or better data, and the program can improve. These data-driven programs lie at the heart of many innovations we see around us.
In addition to this technical effect, data can have a profound pedagogic
impact, too. Most introductory programming is plagued by artificial data
that have no real meaning, interest, or consequence (and often, artificial
problems to accompany them). With real data, learners can personalize their
education, focusing on problems they find meaningful, enriching, or just plain
fun—
These phenomena have given rise to whole new areas of study, typically
called data science. However, typical data science curricula also have many
limitations. They pay little attention to what we know about the difficulties
of learning to program. They have little emphasis on software reliability. And
they fail to recognize that their data are often quite limited in their
structure. These limitations, where data science typically ends, are where
computer science begins. In particular, the structure of data serve as a point
of departure for thinking about and achieving some of the values above—
1.4 What Makes This Book Unique
data centric = data science + data structures
Second, computing education talks a great deal about
notional machines—
Third, we weave content on socially-responsible computing into the text. Unlike other efforts that focus on exposing students to ethics or the pitfalls of technology in general, we aim to show students how the constructs and concepts that they are turning into code right now can lead to adverse impacts unless used with care. In keeping with our focus on testing and concrete examples, we introduce several topics by getting students to think about assumptions at the level of concrete data. This material is called out explicitly throughout the book.
Finally, this book is deeply informed by recent and ongoing research results. Our choices of material, order of presentation, programming methods, and more are driven by what we know from the research literature. In many cases, we ourselves are the ones doing the research, so the curriculum and research live in a symbiotic relationship. You can find our papers (some with each other, others not) on our respective pages.
1.5 Who This Book is For
This book is written primarily for students who are in the early stages of
computing education at the tertiary level (college or university). However,
many—
1.6 The Structure of This Book
Unlike some other textbooks, this one does not follow a top-down narrative. Rather it has the flow of a conversation, with backtracking. We will often build up programs incrementally, just as a pair of programmers would. We will include mistakes, not because we don’t know better, but because this is the best way for you to learn. Including mistakes makes it impossible for you to read passively: you must instead engage with the material, because you can never be sure of the veracity of what you’re reading.
At the end, you’ll always get to the right answer. However, this non-linear path is more frustrating in the short term (you will often be tempted to say, “Just tell me the answer, already!”), and it makes the book a poor reference guide (you can’t open up to a random page and be sure what it says is correct). However, that feeling of frustration is the sensation of learning. We don’t know of a way around it.
We use visual formatting to highlight some of these points. Thus, in several places you will encounter this:
Exercise
This is an exercise. Do try it.
This is a traditional textbook exercise. It’s something you need to do on your own. If you’re using this book as part of a course, this may very well have been assigned as homework. In contrast, you will also find exercise-like questions that look like this:
Do Now!
There’s an activity here! Do you see it?
When you get to one of these, stop. Read, think, and formulate
an answer before you proceed. You must do this because this is
actually an exercise, but the answer is already in the
book—
Specific strategies for program design and development get highlighted in boxes that look like this:
Strategy: How to ...
here’s a summary of how to do something.
Finally, we also call out content on socially-responsible computing with visually distinctive regions like this:
Responsible Computing: Did you consider ...
Here are social pitfalls from using material naively.
1.7 Organization of the Material
This book contains four parts:
Foundations: A introduction to programming for beginners that teaches programming and rudimentary data analysis. It introduces core programming concepts through composing images and processing tables, before covering lists, trees, and writing reactive programs, all through a data-centric lens. The notional machine throughout this section is based on substitution.
Algorithms: Covers asymptotic complexity, recurrences, and fundamental graph algorithms.
Programming with State: Covers working with mutable variables and mutable structured data, building up to understanding (and working with) mutable lists and hashtables. This section transitions to Python. It extends testing to cover the nuances of programs with mutation. The notional machine in this section separate the naming environment (here called the directory) from a heap of structured data values.
Advanced Topics: Returns to algorithms topics that build on an understanding of state and stateful data structures.
An introductory course can use Foundations and Programming with State (without Algorithms) to cover the data-centric view of computer science and leaving students with basic skills in Python.
A more advanced course that assumes students already know some beginning functional programming (e.g., from the early parts of How to Design Programs) could start directly in Algorithms, perhaps with select sections of Foundations either to cover missing material (such as working with tables). This course could continue into Programming with State, followed by Advanced Topics.
These correspond, respectively, to CSCI 0111 and CSCI 0190 at Brown University. The course pages archive all prior instances of the courses, which include all the assignments and related materials. Readers are welcome to use these in their own courses.
Many of these courses will have entering students who have programmed with state before (in Python, Java, Scratch, or other languages). In our experience, most of these students have been given either vastly incomplete, or outright misleading, explanations of and metaphors for state (e.g., “a variable is a box”). Thus, they have a poor understanding of it beyond the absolute basics, especially when they get to important topics like aliasing. As a result, many of these students have found it both novel and insightful to properly understand how state really works through our notional machine. For that reason, we recommend going through that material slowly and carefully.
We of course invite readers to create their own mashups of the chapters within the sections. We would love to hear about others’ designs.
1.8 Our Programming Language Choice
If we wanted to get rich, we’d have written this book entirely in Python. As of this writing, Python is enjoying its instructional-use heyday (just like Java before it, C++ before that, C before that, Pascal earlier, and so on). And there are, indeed, many attractive aspects of Python, not least its presence next to bullet points on job listings. However, we’ve been repeatedly frustrated by Python as an entrypoint into learning programming.
As a result, this book features two programming languages. It starts with a language, called Pyret, that we designed to address our needs and frustrations. It has been expressly designed for the style of programming in this book, so the two can grow in harmony. It draws on Python, but also on many other excellent programming languages. Beginning programmers can therefore rest in the knowledge they are being cared for, while programmers with past acquaintance of the language menagerie, from serpents to dromedaries, should find Pyret familiar and comfortable.
Then, recognizing the value of Python both as a standard language of communication and for its extensive libraries, the Programming with State part of this book explicitly covers Python. Rather than starting from scratch in Python, we present a systematic and gradual transition to it from the earlier material. We believe this will make you learn general programming better than if you had seen only one programming language. However, we believe this will help you understand Python better, too: just like you learn to appreciate your own language, country, or culture better once you’ve stepped outside and been exposed to other ones.
Programming Tools: The programming environment for Pyret is called CPO (an abbreviation of code.pyret.org). It runs entirely in the browser and uses Google Drive for authentication and file storage. We do not recommend any particular Python environment. Any Python editor that allows you to use pytest and load external data files should work fine.
1.9 Sending Feedback, Errors, and Comments
As you work through the book, you may spot typos, notice points where we could have been clearer, or have a suggestion for a future release. You can pass these along to us by filing an issue on our public GitHub site. Thanks in advance!
2 Acknowledgments
This book has benefited from the attention of many.
Special thanks to the students at Brown University, who have been
drafted into acting as a crucible for every iteration of this
book. They have supported it with unusual grace, creating a welcoming
and rewarding environment for pedagogic effort. Thanks also to our
academic homes—
Abhabongse Janthong, Alex Hayworth, Alex Kleiman, Athyuttam Eleti, Benjamin S. Shapiro, Cheng Xie, Danil Braun, Dave Lee, Doug Kearns, Ebube Chuba, Evelyn Mitchell, frodokomodo (on github), Graeme McCutcheon, gregshubert (on github), Harrison Pincket, Igor Moreno Santos, Iuliu Balibanu, Jason Bennett, John (Spike) Hughes, Jon Sailor, Josh Paley, Kelechi Ukadike, Kendrick Cole, Kishore Vancheeshwaran, Marc Smith, Mehmet Fatih Köksal, Michael Morehouse, Noah Tye, Rafał Gwoździński, rawalplawit (on github), Raymond Plante, Samuel Ainsworth, Samuel Kortchmar.
Dorai Sitaram, John Palmer, Kartik Singhal, Kenichi Asai, Lev Litichevskiy.
Sorawee Porncharoenwase.
This book is completely dependent on Pyret, and thus on the many people who have created and sustained it.
We thank Matthew Butterick for his help with book styling (though the ultimate style is ours, so don’t blame him!).
Many, many years ago, Alejandro Schäffer introduced SK to the idea of nature as a fat-fingered typist. Alejandro’s fingerprints are over many parts of this book, even if he wouldn’t necessarily approve of what has come of his patient instruction.
We are deeply inspired by the work and ideas of Matthias Felleisen, Matthew Flatt, and Robby Findler. Matthias, in particular, inspired our ideas on program design. Even where we disagree, he continues to engage with and challenge our ideas in ways that force us to grow and improve. Our work is better than it would be in incalculable ways due to his influence.
The chapter on Interactive Games as Reactive Systems is translated from How to Design Worlds, and owes thanks to all the people acknowledged there.
This book is written in Scribble, the authoring tool of choice for the discerning programmer.
We thank cloudconvert for their free conversion tools.