On this page:
A Data-Centric Introduction to Computing
Tuesday, February 21st, 2023 4:43:42pm

A Data-Centric Introduction to Computing

Kathi Fisler, Shriram Krishnamurthi, Benjamin S. Lerner, Joe Gibbs Politz

    I Introduction

      1 Overview

        1.1 What This Book is About

        1.2 The Values That Drive This Book

        1.3 Our Perspective on Data

        1.4 What Makes This Book Unique

        1.5 Who This Book is For

        1.6 The Structure of This Book

        1.7 Organization of the Material

        1.8 Our Programming Language Choice

        1.9 Sending Us Feedback, Errors, and Comments

        1.10 Staying Up-To-Date

      2 Acknowledgments

    II Introduction to Programming

      3 Basic Data

        3.1 Getting Started

          3.1.1 Motivating Example: Flags

          3.1.2 Numbers

          3.1.3 Expressions

          3.1.4 Terminology

          3.1.5 Strings

          3.1.6 Images

            3.1.6.1 Combining Images

            3.1.6.2 Making a Flag

          3.1.7 Stepping Back: Types, Errors, and Documentation

            3.1.7.1 Types and Contracts

            3.1.7.2 Format and Notation Errors

            3.1.7.3 Finding Other Functions: Documentation

        3.2 Naming Values

          3.2.1 The Definitions Pane

          3.2.2 Naming Values

            3.2.2.1 Names Versus Strings

            3.2.2.2 Expressions versus Statements

          3.2.3 The Program Directory

            3.2.3.1 Understanding the Run Button

          3.2.4 Using Names to Streamline Building Images

        3.3 From Repeated Expressions to Functions

          3.3.1 Example: Similar Flags

          3.3.2 Defining Functions

            3.3.2.1 How Functions Evaluate

            3.3.2.2 Type Annotations

            3.3.2.3 Documentation

          3.3.3 Functions Practice: Moon Weight

          3.3.4 Documenting Functions with Examples

          3.3.5 Functions Practice: Cost of pens

          3.3.6 Recap: Defining Functions

        3.4 Conditionals and Booleans

          3.4.1 Motivating Example: Shipping Costs

          3.4.2 Conditionals: Computations with Decisions

          3.4.3 Booleans

            3.4.3.1 Other Boolean Operations

            3.4.3.2 Combining Booleans

          3.4.4 Asking Multiple Questions

          3.4.5 Evaluating by Reducing Expressions

          3.4.6 Composing Functions

            3.4.6.1 How Function Compositions Evaluate

            3.4.6.2 Function Composition and the Directory

          3.4.7 Nested Conditionals

          3.4.8 Recap: Booleans and Conditionals

      4 Tabular Data

        4.1 Introduction to Tabular Data

          4.1.1 Creating Tabular Data

          4.1.2 Extracting Rows and Cell Values

          4.1.3 Functions over Rows

          4.1.4 Processing Rows

            4.1.4.1 Finding Rows

            4.1.4.2 Ordering Rows

            4.1.4.3 Adding New Columns

            4.1.4.4 Calculating New Column Values

          4.1.5 Examples for Table-Producing Functions

        4.2 Processing Tables

          4.2.1 Cleaning Data Tables

            4.2.1.1 Loading Data Tables

            4.2.1.2 Dealing with Missing Entries

            4.2.1.3 Normalizing Data

            4.2.1.4 Normalization, Systematically

              4.2.1.4.1 Using Programs to Detect Data Errors

          4.2.2 Task Plans

          4.2.3 Preparing Data Tables

            4.2.3.1 Creating bins

            4.2.3.2 Splitting Columns

          4.2.4 Managing and Naming Data Tables

          4.2.5 Visualizations and Plots

          4.2.6 Summary: Managing a Data Analysis

      5 Lists

        5.1 From Tables to Lists

          5.1.1 Basic Statistical Questions

          5.1.2 Extracting a Column from a Table

          5.1.3 Understanding Lists

            5.1.3.1 Lists as Anonymous Data

            5.1.3.2 Creating Literal Lists

          5.1.4 Operating on Lists

            5.1.4.1 Built-In Operations on Lists of Numbers

            5.1.4.2 Built-In Operations on Lists in General

            5.1.4.3 An Aside on Naming Conventions

            5.1.4.4 Getting Elements By Position

            5.1.4.5 Transforming Lists

            5.1.4.6 Recap: Summary of List Operations

          5.1.5 Lambda: Anonymous Functions

          5.1.6 Combining Lists and Tables

        5.2 Processing Lists

          5.2.1 Making Lists and Taking Them Apart

          5.2.2 Some Example Exercises

          5.2.3 Structural Problems with Scalar Answers

            5.2.3.1 my-len: Examples

            5.2.3.2 my-sum: Examples

            5.2.3.3 From Examples to Code

          5.2.4 Structural Problems that Transform Lists

            5.2.4.1 my-doubles: Examples and Code

            5.2.4.2 my-str-len: Examples and Code

          5.2.5 Structural Problems that Select from Lists

            5.2.5.1 my-pos-nums: Examples and Code

            5.2.5.2 my-alternating: Examples and Code

          5.2.6 Structural Problems Over Relaxed Domains

            5.2.6.1 my-max: Examples

            5.2.6.2 my-max: From Examples to Code

          5.2.7 More Structural Problems with Scalar Answers

            5.2.7.1 my-avg: Examples

          5.2.8 Structural Problems with Accumulators

            5.2.8.1 my-running-sum: First Attempt

            5.2.8.2 my-running-sum: Examples and Code

            5.2.8.3 my-alternating: Examples and Code

          5.2.9 Dealing with Multiple Answers

            5.2.9.1 uniq: Problem Setup

            5.2.9.2 uniq: Examples

            5.2.9.3 uniq: Code

            5.2.9.4 uniq: Reducing Computation

            5.2.9.5 uniq: Example and Code Variations

            5.2.9.6 uniq: Why Produce a List?

          5.2.10 Monomorphic Lists and Polymorphic Types

        5.3 Recursive Data

          5.3.1 Functions to Process Recursive Data

          5.3.2 A Template for Processing Recursive Data

      6 Structured Data

        6.1 Introduction to Structured Data

          6.1.1 Understanding the Kinds of Compound Data

            6.1.1.1 A First Peek at Structured Data

            6.1.1.2 A First Peek at Conditional Data

          6.1.2 Defining and Creating Structured and Conditional Data

            6.1.2.1 Defining and Creating Structured Data

            6.1.2.2 Annotations for Structured Data

            6.1.2.3 Defining and Creating Conditional Data

          6.1.3 Programming with Structured and Conditional Data

            6.1.3.1 Extracting Fields from Structured Data

            6.1.3.2 Telling Apart Variants of Conditional Data

            6.1.3.3 Processing Fields of Variants

        6.2 Collections of Structured Data

          6.2.1 Lists as Collective Data

          6.2.2 Sets as Collective Data

            6.2.2.1 Picking Elements from Sets

            6.2.2.2 Computing with Sets

          6.2.3 Combining Structured and Collective Data

          6.2.4 Data Design Problem: Representing Quizzes

      7 Trees

        7.1 Trees

          7.1.1 Data Design Problem – Ancestry Data

            7.1.1.1 Computing Genetic Parents from an Ancestry Table

            7.1.1.2 Computing Grandparents from an Ancestry Table

            7.1.1.3 Creating a Datatype for Ancestor Trees

          7.1.2 Programs to Process Ancestor Trees

          7.1.3 Summarizing How to Approach Tree Problems

          7.1.4 Study Questions

      8 Foundations: Bonus Materials

        8.1 Functions as Data

          8.1.1 A Little Calculus

          8.1.2 A Helpful Shorthand for Anonymous Functions

          8.1.3 Streams From Functions

          8.1.4 Combining Forces: Streams of Derivatives

        8.2 Queues from Lists

          8.2.1 Using a Wrapper Datatype

          8.2.2 Combining Answers

          8.2.3 Using a Picker

          8.2.4 Using Tuples

          8.2.5 A Picker Method

        8.3 Examples, Testing, and Program Checking

          8.3.1 From Examples to Tests

          8.3.2 More Refined Comparisons

          8.3.3 When Tests Fail

          8.3.4 Oracles for Testing

    III From Pyret to Python

      9 From Pyret to Python

        9.1 From Pyret to Python

          9.1.1 Expressions, Functions, and Types

          9.1.2 Returning Values from Functions

          9.1.3 Examples and Test Cases

          9.1.4 An Aside on Numbers

          9.1.5 Conditionals

          9.1.6 Creating and Processing Lists

            9.1.6.1 Filters, Maps, and Friends

          9.1.7 Data with Components

            9.1.7.1 Accessing Fields within Dataclasses

          9.1.8 Traversing Lists

            9.1.8.1 Introducing For Loops

              9.1.8.1.1 An Aside on Order of Processing List Elements

            9.1.8.2 Using For Loops in Functions that Produce Lists

            9.1.8.3 Summary: The List-Processing Template for Python

        9.2 Dictionaries

          9.2.1 Creating and Using a Dictionary

          9.2.2 Searching Through the Values in a Dictionary

          9.2.3 Dictionaries with More Complex Values

          9.2.4 Dictionaries versus Dataclasses

            Summary

        9.3 Arrays

          9.3.1 Two Memory Layouts for Ordered Items

          9.3.2 Iterating Partly through an Ordered Datum

      10 Tables in Python via Pandas

        10.1 Introduction to Pandas

          10.1.1 Pandas Table Basics

            10.1.1.1 Core Datatypes: DataFrame and Series

            10.1.1.2 Creating and Loading DataFrames

            10.1.1.3 Using Labels and Indices to Access Cells

          10.1.2 Filtering Rows

          10.1.3 Cleaning and Normalizing Data

            10.1.3.1 Clearing out unknown values

            10.1.3.2 Repairing Values and Column Types

          10.1.4 Computing New Columns

          10.1.5 Aggregating and Grouping Columns

          10.1.6 Wide Versus Tall Data

            Converting Between Wide and Tall Data

          10.1.7 Plotting Data

          10.1.8 Takeaways

        10.2 Reshaping Tables

          10.2.1 Binning Rows

          10.2.2 Wide versus Tall Datasets

    IV Programming With State

      11 Programming with State (in Both Pyret and Python)

        11.1 State, Change, and Testing

          11.1.1 Example: Bank Accounts

          11.1.2 Testing Functions that Mutate Data

          11.1.3 Aliasing

          11.1.4 Data Mutation and the Directory

            11.1.4.1 Introducing the Heap

            11.1.4.2 Basic Data and the Heap

        11.2 Understanding Equality

          11.2.1 Equality of Data

          11.2.2 Different Equality Operations

            11.2.2.1 Equality in Python

            11.2.2.2 Equality in Pyret

        11.3 Arrays and Lists in Memory

        11.4 Cyclic Data

          11.4.1 Creating Cyclic Data

          11.4.2 Testing Cyclic Data

          11.4.3 Cycles in Practice

      12 More Programming with State: Python

        12.1 Modifying Variables

          12.1.1 Modifying Variables in Memory

          12.1.2 Variable Updates and Aliasing

          12.1.3 Updating Variables versus Updating Data Fields

          12.1.4 Updating Parameters in Function Calls

          12.1.5 Updating Top-Level Variables within Function Calls

          12.1.6 The Many Roles of Variables

        12.2 Mutable Lists

    V Algorithm Analysis

      13 Predicting Growth

        13.1 A Little (True) Story

        13.2 The Analytical Idea

        13.3 A Cost Model for Pyret Running Time

        13.4 The Size of the Input

        13.5 The Tabular Method for Singly-Structurally-Recursive Functions

        13.6 Creating Recurrences

        13.7 A Notation for Functions

        13.8 Comparing Functions

        13.9 Combining Big-Oh Without Woe

        13.10 Solving Recurrences

      14 Halloween Analysis

        14.1 A First Example

        14.2 The New Form of Analysis

        14.3 An Example: Queues from Lists

          14.3.1 List Representations

          14.3.2 A First Analysis

          14.3.3 More Liberal Sequences of Operations

          14.3.4 A Second Analysis

          14.3.5 Amortization Versus Individual Operations

        14.4 Reading More

    VI Data Structures with Analysis

      15 Directed Acyclic Graphs

        15.1 Sharing and Equality

          15.1.1 Re-Examining Equality

          15.1.2 The Cost of Evaluating References

          15.1.3 Notations for Equality

          15.1.4 On the Internet, Nobody Knows You’re a DAG

          15.1.5 It’s Always Been a DAG

          15.1.6 From Acyclicity to Cycles

        15.2 The Size of a DAG

          15.2.1 Stage 1

          15.2.2 Stage 2

          15.2.3 Stage 3

          15.2.4 Stage 4

          15.2.5 Stage 5

          15.2.6 What We’ve Learned

          15.2.7 More on Value Printing: An Aside from Racket

      16 Graphs

        16.1 Introducing Graphs

          16.1.1 Understanding Graphs

          16.1.2 Representations

            16.1.2.1 Links by Name

            16.1.2.2 Links by Indices

            16.1.2.3 A List of Edges

            16.1.2.4 Abstracting Representations

          16.1.3 Measuring Complexity for Graphs

        16.2 Basic Graph Traversals

          16.2.1 Reachability

            16.2.1.1 Simple Recursion

            16.2.1.2 Cleaning up the Loop

            16.2.1.3 Traversal with Memory

            16.2.1.4 A Better Interface

          16.2.2 Depth- and Breadth-First Traversals

        16.3 Graphs With Weighted Edges

        16.4 Shortest (or Lightest) Paths

        16.5 Moravian Spanning Trees

          16.5.1 The Problem

          16.5.2 A Greedy Solution

          16.5.3 Another Greedy Solution

          16.5.4 A Third Solution

          16.5.5 Checking Component Connectedness

      17 Several Variations on Sets

        17.1 Representing Sets as Lists

          17.1.1 Representation Choices

          17.1.2 Time Complexity

          17.1.3 Choosing Between Representations

          17.1.4 Other Operations

        17.2 Making Sets Grow on Trees

          17.2.1 Using Binary Trees

          17.2.2 Checking the Complexity

          17.2.3 A Fine Balance: Tree Surgery

            17.2.3.1 Left-Left Case

            17.2.3.2 Left-Right Case

            17.2.3.3 Any Other Cases?

        17.3 Union-Find

          17.3.1 Implementing with State

          17.3.2 Optimizations

          17.3.3 Analysis

        17.4 Hashes, Sets, and Key-Values

          17.4.1 A Hash Function for Strings

          17.4.2 Sets from Hashing

          17.4.3 Arrays

          17.4.4 Sets from Hashing and Arrays

          17.4.5 Collisions

          17.4.6 Resolving Collisions

          17.4.7 Complexity

          17.4.8 Bloom Filters

          17.4.9 Generalizing from Sets to Key-Values

        17.5 Equality, Ordering, and Hashing

          17.5.1 Converting Values to Ordered Values

          17.5.2 Hashing in Practice

          17.5.3 Equality and Ordering

        17.6 Sets as a Case Study

          17.6.1 Nature of the Data

          17.6.2 Nature of the Operations

          17.6.3 Nature of the Guarantee

    VII Advanced Topics

      18 State and Equality

        18.1 Boxes: A Canonical Mutable Structure

        18.2 Mutation and Types

        18.3 Mutation and Equality

        18.4 Another Equality Predicate

        18.5 A Hierarchy of Equality

        18.6 Space and Time Complexity

        18.7 What it Means to be Identical

        18.8 Comparing Functions

      19 Recursion and Cycles from Mutation

        19.1 Partial Definitions

        19.2 Recursive Functions

        19.3 Premature Evaluation

        19.4 Cyclic Lists Versus Streams

      20 Detecting Cycles

        20.1 A Running Example

        20.2 Types

        20.3 A First Checker

        20.4 Complexity

        20.5 A Fabulous Improvement

        20.6 Testing

      21 Avoiding Recomputation by Remembering Answers

        21.1 An Interesting Numeric Sequence

          21.1.1 Using State to Remember Past Answers

          21.1.2 From a Tree of Computation to a DAG

          21.1.3 The Complexity of Numbers

          21.1.4 Abstracting Memoization

        21.2 Edit-Distance for Spelling Correction

        21.3 Nature as a Fat-Fingered Typist

        21.4 Dynamic Programming

          21.4.1 Catalan Numbers with Dynamic Programming

          21.4.2 Levenshtein Distance and Dynamic Programming

        21.5 Contrasting Memoization and Dynamic Programming

      22 Partial Domains

        22.1 A Non-Solution

        22.2 Exceptions

        22.3 The Option Type

        22.4 Total Domains, Dynamically

        22.5 Total Domains, Statically

        22.6 Summary

        22.7 A Note on Notation

      23 Staging

        23.1 Problem Definition

        23.2 Initial Solution

        23.3 Refactoring

        23.4 Separating Parameters

        23.5 Context

      24 Factoring Numbers

      25 Deconstructing Loops

        25.1 Setup: Two Functions

        25.2 Abstracting a Loop

        25.3 Is It Really a Loop?

        25.4 Re-Examining for

        25.5 Rewriting Pollard-Rho

        25.6 Nested Loops

        25.7 Loops, Values, and Customization

    VIII Interactive Programs

      26 Interactive Games as Reactive Systems

        26.1 About Reactive Animations

        26.2 Preliminaries

        26.3 Version: Airplane Moving Across the Screen

          26.3.1 Updating the World State

          26.3.2 Displaying the World State

          26.3.3 Observing Time (and Combining the Pieces)

        26.4 Version: Wrapping Around

        26.5 Version: Descending

          26.5.1 Moving the Airplane

          26.5.2 Drawing the Scene

          26.5.3 Finishing Touches

        26.6 Version: Responding to Keystrokes

        26.7 Version: Landing

        26.8 Version: A Fixed Balloon

        26.9 Version: Keep Your Eye on the Tank

        26.10 Version: The Balloon Moves, Too

        26.11 Version: One, Two, ..., Ninety-Nine Luftballons!

    IX Appendices

      27 Appendices

        27.1 Pyret for Racketeers and Schemers

          27.1.1 Numbers, Strings, and Booleans

          27.1.2 Infix Expressions

          27.1.3 Function Definition and Application

          27.1.4 Tests

          27.1.5 Variable Names

          27.1.6 Data Definitions

          27.1.7 Conditionals

          27.1.8 Lists

          27.1.9 First-Class Functions

          27.1.10 Annotations

          27.1.11 What Else?

        27.2 Pyret vs. Python

        27.3 Comparing This Book to HtDP

        27.4 Release Notes

        27.5 Glossary