On this page:
17.6.1 Nature of the Data
17.6.2 Nature of the Operations
17.6.3 Nature of the Guarantee

17.6 Sets as a Case Study

    17.6.1 Nature of the Data

    17.6.2 Nature of the Operations

    17.6.3 Nature of the Guarantee

We have spent a lot of time on sets. That is not only because they are useful in their own right, but also because they offer a window into a variety of possible designs. In particular, they illustrate several tradeoffs that we can make in the design of data structures, based on our needs.

There are several dimensions along which we can divide our designs.

17.6.1 Nature of the Data

If the data cannot even be comparable for quality, then we can’t construct sets out of them, because equality is central to the definition of a set.

If the data can be compared for equality but not for ordering, then we can only construct list-sets [Representing Sets as Lists], with their linear-time complexity. However, if we can hash the values [Converting Values to Ordered Values], then we can construct trees [Making Sets Grow on Trees] and hashtables [Sets from Hashing and Arrays]. Trees give us logarithmic complexity for the most expensive atomic operations, while hashtables give us constant-to-linear complexity.

17.6.2 Nature of the Operations

Another dimension of variation is the collection of operations we need. We began with a fairly ambitious, but standard, collection of operations [<set-operations>], but gradually ignored many of them. In particular, some interpretations of sets, like Union-Find, achieve excellent complexity at the cost of most of these operations. Bloom Filters provide another instance of this. There is a general computer science principle at work here: the fewer operations we need to support, the better we can (sometimes) make the complexity of the remaining ones.

17.6.3 Nature of the Guarantee

Most subtly, there was another distinction: whether or not we needed reliable results. Most of our set representations are reliable. However, we also saw one situation [Bloom Filters] where we intentionally abandoned complete reliability, replacing it with a statistical guarantee. In return, this gave us (potentially) much higher performance.

Thus, sets provide a useful microcosm of computer science itself.