On the taxonomy of types

# Jan 6, 2021

Some time between 1992 and 1994 I wrote an article for MacTech Journal about protocol-oriented programming. The general idea was to explain how you can get most of the advantages of object-oriented programming, without some of the disadvantages, and without relying on the language or runtime support offered by object-oriented languages. I adopted an approach that emphasized protocols, rather than classes or prototypes.

Philip Wadler had described a somewhat similar approach in a paper titled “How to make ad-hoc polymorphism less ad hoc” in 1988, the paper that proposed the concept of typeclasses. Typeclasses were later folded into the Haskell programming language.

I didn’t know about his paper at the time. I was just barely aware of functional programming languages, and had just begun to fool around with Concurrent Clean and Standard ML. I was writing about practices that had emerged organically in my day-to-day work as an adaptation to working in a mixed-language programming environment, and didn’t see any connection to functional programming at the time.

A few years later I immersed myself in OCaml and then Haskell, and I ran into Haskell’s typeclasses. It was sort of like meeting a group of congenial strangers who bear an uncanny resemblance to your own family.

A few years later I was pining for the lost glory of early Dylan (back when it was still an old-fashioned Lisp), and I started working on interpreters and compilers meant to give me back some of my pleasure in programming. Dylan had evolved into a language that didn’t interest me.

By that time I was well aware of Haskell’s typeclasses, and had figured out that they were doing something analogous to what I was talking about in that old article.

I created a Lisp named “Bard” that was intended to be a very simple version of Dylan–that is to say, a subset of Scheme, but with a type system based on a subset of CLOS. I never build anything without messing around with its design, though, and this simple project almost immediately turned into a long-running series of experiments in language design and implementation. I wanted to get the goodness of early Dylan back again, but I also wanted to honor and explore some things I had learned from Haskell and the ML family, including typeclasses. And, of course, I just wanted to try some things out.

Along the way, I also learned Clojure and used it for a number of projects. I liked it without reservation at first, but gradually lost some of my enthusiasm as I got to know it better. Its design and its current implementations lack some affordances that I consider essential (mostly in the area of robust support for Programming as Teaching, but it did go onto my list of favored tools.

Clojure has a unique approach to ad hoc polymorphism. I wouldn’t say that I actually like it, but it did influence me to experiment with alternate approaches, myself. That influence, plus the influences of Haskell and of my own article from years before led me to write an experimental object system named “Categories”.

Categories is an object system that tries to separate the three concepts that I called out way back in my MacTech article: representation, behavior, and taxonomy.

Representation means the structure of data, in the sense of how parts are fitted together to form values.

Behavior means what you can do with values; that is, what operations they support.

Taxonomy means how values are related to other values. Is this value the same kind of thing as that one? Is it an extension of that one? Do they share some behavior? If a value is an instance of this, does that mean it’s also an instance of that?

Most programming languages tend to conflate these concepts to some degree, though they are logically distinct. I tried to separate them in the MacTech article, though it took a few years of trying before I arrived at a succinct expression of the idea.

Haskell’s typeclasses got it, though. They are pretty much exactly what I had in mind originally. They cleanly separate representation, behavior, and taxonomy into distinct concerns.

Haskell types handle representation. Typeclasses declare taxonomic relationships among types and the functions that operate on them, and function definitions on particular types–instances in Haskell parlance–define behavior.

In Categories I tried defining a dynamic-language object system that did something similar, but that also enabled you to dynamically choose the kind of type system you want. Categories defines representations as schemas (concrete data structures), behaviors as protocols (sets of polymorphic functions), and taxonomony using domains (data structures that represent dispatching strategies and the taxonomic relations used to implement them).

Categories worked, though the implementations were not particularly efficient. There isn’t anything to prevent further optimization, and there are some well-known strategies that could be used to make performance better, but the design of categories was just a little unaesthetic by my standards. The abstract definitions of the concepts intrude a little bit too much into the practical tools. To put it another way, it forces you to declare things that you probably don’t care about and don’t want to know when you’re working on something.

The last time I actively worked on Categories was about eight years ago. I sort of moved on, looking for ways to simplify it, to make it more concrete and less about its own internals, and to simplify Bard, the Lisp that I built in parallel with Categories.

I wrote a Common Lisp library based on Bard and Categories, named folio. In response to some feedback from users, I overhauled it to make it easier to use pieces of it without importing the whole thing, and to adopt a package-naming convention less likely to lead to name collisions. The current version, folio2, is presently my most popular open-source library, though I doubt that many of its users known anything about its heritage in Categories or Bard.

There is a folio3 in development, and a Bard 0.7, and both of them represent attempts to further simplify these concepts and make them more convenient, without losing the separation of representation, behavior, and taxonomy. The particular balance I’m grappling with right now is trying to find a middle ground between a clean separation of concepts and a convenient surface language for writing programs.

Let me give an example: in Bard 0.7, a class is a named collection of types. A type is either a class or a structure. A structure is a concrete description of how to construct a value.

In the current version, classes have no internal structure and no taxonomic relation to other types. They’re just named containers for groups of types that participate in polymorphic functions in a particular role.

An example might be a protocol function like:

(function add-first [Element List])

In the vocabulary of Bard 0,7, this expression declares a function named add-first that accepts two arguments of classes Element and List. From this declaration we know nothing about Element and List, except that in order to be an instance of one of them, there has to be some method on add-first that works on them.

For example, suppose there’s also a method definition like this:

(define method (add-first item list)
  where: {item: <small-integer>
          list: <vector>}
  ...)

This method definition means that I can call add-first with arguments of type <small-integer> and <vector>. It also means that <small-integer> is a member of class Element, and <vector> is a member of class List.

Classes, remember, are not data structures. They’re named collections of types that fulfill some defined role in some protocol. There’s some protocol that defines the function add-first, which accepts as inputs an Element and a List. By defining the above method, I establish that instances of <small-integer> are also instances of Element, and instances of <vector> are also instances of List.

This approach works, and is not entirely different from how Haskell handles typeclasses. I’m not quite satisfied, though, with how it works out in practice, especially in the context of an interactive programming environment.

For example, one of the best things about interactive programming is that you can start with what you know and build some naive structures and functions that work with that partial knowledge. You can use that partial construction to discover more details and refine your approach. You can then revise your program (as it runs!) to reflect your improved understanding.

The approach to types reflected in the above descrption works fine if you start with protocols and define all of their functions and the abstract classes they operate on. Then you just need to fill in the concrete types needed to make the classes into real data, and write the method definitions that specialize them properly.

It also works fine if you start out with a sketch of an idea and then, as you refine your knowledge, you rewrite your definitions and rebuild your whole program from the ground up with the new knowledge embodied in the changed definitions. That’s how Haskell tends to work.

But that’s pretty much the opposite of exploratory interactive programming, which is the singular strength of old-fashioned Lisp and Smalltalk environments. Those environments embody the way I prefer to work. I’m much more likely to want to start with the concrete values and methods that I understand and connect them together bottom-up, figuring out the proper relations as I go, building and changing the program as it runs. That means I need Bard to respond reasonably when I evaluate define method with no relevant classes defined yet. I need it to do the right thing when I discover a class and define it, fitting the appropriate structures into the new classes or, better yet, discovering potential classes for me and suggesting them.

It’s not a problem of the code working right. I have implementations in which the structures and methods and functions all do the right thing. It’s a matter of making it convenient and comfortable to derive the taxonomies as I discover them. I can of course go through my code and set up the right taxonomies and make all of the definitions reconcile, but that feels a lot like drudgery. I’d like it if the environment could infer the right taxonomic relationships for me (and optimize them when that turns out to be possible!) without my having to explicitly define the relations. I’m not sure it works, though. Not yet.

Inheritance is a possible solution. Currently, Bard does not include inheritance, because I’ve been trying to see how far I could get without it. But it does make it easier to figure out where in the taxonomic graph a new structure should fit. The tradeoff is that the traditional approaches to inheritance conflate representation with taxonomy in a way I don’t care for.

Inheritance is also handy for conveniences like defining a default method that gets run when the arguments to a function don’t match any more specific case, or for when you want to do the same thing in a lot of different cases without repeating yourself. There are other ways to do those things, but inheritance is a pretty clean way to handle them.

I’m still thinking things over. One of Categories' domains implemented multiple inheritance and it worked fine, so I can fall back on that, if I decide I need to.

Maybe Bard will end up with inheritance. Maybe not.

engine of joy

On the taxonomy of types