Familiarity with a statically-typed functional language makes this post easier to follow. The examples are written in PureScript.
Some introductions to datatype-generic programming begin with technical details
That can be confusing if the premise isn’t clear.
This is an attempt to explain datatype-generic programming or “generics” in
terms of the big picture, addressing my own points of confusion.
What does “generic” mean?
The term “generic” might evoke associations with parametric polymorphism.
forall a. a -> a are sometimes said to be generic.
But datatype-generic programming is not a reference to parametric polymorphism.
They are only related as two varieties of polymorphism, two ways to generalize
Parametric and ad hoc are perhaps the most common varieties of polymorphism.
Parametric polymorphism expresses a universal generalization.
In the type
forall a. a -> a, the variable
a stands for any type.
Ad hoc polymorphism is a more limited generalization, where
a is any type that
satisfies a constraint.
Differences of generalization reflect differences of abstraction.
A function of any type
a abstracts over logic that does not need to know (and
cannot know) the type of
The function is implemented once for all types.
Type constraints abstract over interfaces but not implementations.
The implementations are type-specific.
Ad hoc and parametric are not the only varieties of polymorphism. Some logic is concerned exclusively with the structure of a datatype. That logic is not universally polymorphic; it depends on some information about the type. However, the implementation need not be type-specific, since datatypes all share the same structural components. Neither ad hoc nor parametric, this is a third variety called structural polymorphism.1
What is structural polymorphism?
The beauty of all of these patterns of computation is the direct relationship between their shape and that of the data they manipulate…—Jeremy Gibbons2
show function is often an example of latent structural polymorphism.
show returns a string representation of a datatype.
Those strings often follow the same pattern: the name of the data constructor
followed by its arguments.
This logic, while not universally general, is not type-specific. It depends only on knowing the structure of the type; not the type itself.
Datatype-generic programming is an approach to writing functions, like
that abstract over the structure of a datatype.
To do this, we need a “generic” representation of that structure.
A generic representation decouples information about the structure from the type
All types given a generic representation can share abstractions over structure.
What is a generic structural representation?
This is more approachable than it might sound. We already use generic language to describe datatypes. We say that:
- Datatypes have constructors
- Constructors have names
- Some constructors have one argument
- Some constructors have more than one argument
- Some constructors have no arguments
- Some datatypes have more than one constructor
With this limited and generic vocabulary, we can describe the structure of any datatype.
A toy example
We want to implement
show once for values of type
Maybe String and values
Tuple Number String.
We begin with a generic representation for values of those types, a
generalization of their structure.
The structure of any value inhabiting either type can be expressed in terms of
Here are some examples:
Rep enables us to write functions that are structurally polymorphic
Maybe String and
Tuple Number String, and any other type described
in terms of the
Our purpose is to write a generic show function, so next we write a
Now we can stringify any value given this generic representation.
But the question remains: how to write a show function that is polymorphic for
values of types
Maybe String and
Tuple Number String?
We’ve only implemented a monomorphic show function for
The trick is to convert from the original types to
Rep, and then leverage the
Generic class specifies a conversion interface.
This interface enables us to convert at the boundary of our generic functions without knowing what type we’re converting from.
And thus we have a structurally polymorphic implementation of
This example illustrates the principle of datatype-generic programming. The approach taken by libraries is often a little different.
Datatype-generic programming libraries
A generics library has two essential components: a generic representation of a
datatype, and conversions to and from the data and the representation.
In the last example, we used
Rep for the representation.
But this is not the only, or even the most popular, approach.
A more common approach is to encode structure on the type level.
Rep contains no structural information.
Structural information is only found on the value level, encoded by the
When structure is encoded on the type level, the type of a generic
representation describes the structure of the datatype it represents.
Here’s a bit of the last example to demonstrate this approach using PureScript’s generics-rep library.
Type-level encoding is achieved by giving each structural component its own type. Type constructors, like data constructors, are named for the corresponding structural components. Those type constructors are combined, just as the data constructors are combined, to describe the structure of a datatype.
Notice that the generic representations of
Maybe String and
Tuple Number String now have different types.
Maybe is represented as type
Tuple as type
This changes two things: the
Generic class and the implementation of our
Generic class assumed one representation type,
Now there are several representation types.
Depending on the structure of the datatype, the type of
from could be
or it could be
from must be a function from
a to any generic type.
So we must parameterize
Generic not only by the datatype but also by the
a -> rep states that there is only one type
rep for each type
Our implementation of
genericShow also expected one type,
With multiple types, we can no longer use one implementation for all
We must implement
genericShow for each structural type we support.
Then a typeclass is used to wrap all implementations in one generic interface.
In the last example, the type of
In this approach, that type is modified to include a
An instance of
GenericShow defines how to stringify a structural type.
Product all have instances of
While more than one, that is still far fewer than one implementation for each
These are implementation details.
The strategy remains the same—convert from a datatype to its generic
representation and then pass the generic representation to the structurally
Generic programming is made convenient by the compiler
Datatype-generic programming, as a way of reducing boilerplate, would not succeed without help from the compiler. Generic representations are verbose and tedious to write by hand. Doing so would mean exchanging one kind of boilerplate for another. Fortunately, this is not required.
class specifies a functional dependency between datatype and generic
This enables the compiler to infer the generic representation from the datatype.
Thus we can derive the
Generic instance without specifying the generic type
The boilerplate associated with generics is work we can delegate to the
Generic programming is structural programming
All datatypes are formed of the same structural elements. Logic concerned with structure should not be specific to one datatype. The same logic applies to all datatypes because they all share the same structural components. Such logic is structurally polymorphic.
Structural polymorphism is achieved by giving a generic representation to the structure of a datatype. A generic representation describes structure in general terms like “constructor” and “argument”. Generic functions are implemented for any datatype given this generic representation. The result is a new level of expressivity.
Thanks to Vaibhav for reading a draft of this post.
The approach extends the familiar notion of parametric polymorphism by allowing the definition of functions which are generic with respect to data structures as well as to individual types. For example, structural polymorphism accommodates generalizations of the usual length and map functions which may be applied not only to lists, but also to trees, binary trees or similar algebraic structures. Under traditional polymorphic type systems, these functions may be defined for arbitrary component types, but must be (laboriously) re-defined for every distinct data structure.