Familiarity with a statically-typed functional language makes this post easier to follow. The examples are written in PureScript.
Some introductions to datatype-generic programming begin with technical details
like the Generic
typeclass.
That can be confusing if the premise isn’t clear.
This is an attempt to explain datatype-generic programming or “generics” in
terms of the big picture, addressing my own points of confusion.
What does “generic” mean?
The term “generic” might evoke associations with parametric polymorphism.
Types like forall a. a -> a
are sometimes said to be generic.
But datatype-generic programming is not a reference to parametric polymorphism.
They are only related as two varieties of polymorphism, two ways to generalize
about types.
Parametric and ad hoc are perhaps the most common varieties of polymorphism.
Parametric polymorphism expresses a universal generalization.
In the type forall a. a -> a
, the variable a
stands for any type.
Ad hoc polymorphism is a more limited generalization, where a
is any type that
satisfies a constraint.
Differences of generalization reflect differences of abstraction.
A function of any type a
abstracts over logic that does not need to know (and
cannot know) the type of a
.
The function is implemented once for all types.
Type constraints abstract over interfaces but not implementations.
The implementations are type-specific.
Ad hoc and parametric are not the only varieties of polymorphism. Some logic is concerned exclusively with the structure of a datatype. That logic is not universally polymorphic; it depends on some information about the type. However, the implementation need not be type-specific, since datatypes all share the same structural components. Neither ad hoc nor parametric, this is a third variety called structural polymorphism.1
What is structural polymorphism?
The beauty of all of these patterns of computation is the direct relationship between their shape and that of the data they manipulate…
—Jeremy Gibbons2
The show
function is often an example of latent structural polymorphism.
show
returns a string representation of a datatype.
Those strings often follow the same pattern: the name of the data constructor
followed by its arguments.
This logic, while not universally general, is not type-specific. It depends only on knowing the structure of the type; not the type itself.
Datatype-generic programming is an approach to writing functions, like show
,
that abstract over the structure of a datatype.
To do this, we need a “generic” representation of that structure.
A generic representation decouples information about the structure from the type
itself.
All types given a generic representation can share abstractions over structure.
What is a generic structural representation?
This is more approachable than it might sound. We already use generic language to describe datatypes. We say that:
- Datatypes have constructors
- Constructors have names
- Some constructors have one argument
- Some constructors have more than one argument
- Some constructors have no arguments
- Some datatypes have more than one constructor
With this limited and generic vocabulary, we can describe the structure of any datatype.
A toy example
We want to implement show
once for values of type Maybe String
and values
of type Tuple Number String
.
We begin with a generic representation for values of those types, a
generalization of their structure.
The structure of any value inhabiting either type can be expressed in terms of
this Rep
type.
Here are some examples:
Rep
enables us to write functions that are structurally polymorphic
for types Maybe String
and Tuple Number String
, and any other type described
in terms of the Rep
constructors.
Our purpose is to write a generic show function, so next we write a Show
instance for Rep
.
Now we can stringify any value given this generic representation.
But the question remains: how to write a show function that is polymorphic for
values of types Maybe String
and Tuple Number String
?
We’ve only implemented a monomorphic show function for Rep
.
The trick is to convert from the original types to Rep
, and then leverage the
Show
instance.
The Generic
class specifies a conversion interface.
This interface enables us to convert at the boundary of our generic functions without knowing what type we’re converting from.
And thus we have a structurally polymorphic implementation of show
.
This example illustrates the principle of datatype-generic programming. The approach taken by libraries is often a little different.
Datatype-generic programming libraries
A generics library has two essential components: a generic representation of a
datatype, and conversions to and from the data and the representation.
In the last example, we used Rep
for the representation.
But this is not the only, or even the most popular, approach.
A more common approach is to encode structure on the type level.
The type Rep
contains no structural information.
Structural information is only found on the value level, encoded by the Rep
constructors.
When structure is encoded on the type level, the type of a generic
representation describes the structure of the datatype it represents.
Here’s a bit of the last example to demonstrate this approach using PureScript’s generics-rep library.
Type-level encoding is achieved by giving each structural component its own type. Type constructors, like data constructors, are named for the corresponding structural components. Those type constructors are combined, just as the data constructors are combined, to describe the structure of a datatype.
Notice that the generic representations of Maybe String
and
Tuple Number String
now have different types.
Maybe
is represented as type Sum
and Tuple
as type Constructor
.
This changes two things: the Generic
class and the implementation of our
genericShow
function.
Our Generic
class assumed one representation type, Rep
.
Now there are several representation types.
Depending on the structure of the datatype, the type of from
could be
or it could be
from
must be a function from a
to any generic type.
So we must parameterize Generic
not only by the datatype but also by the
representation.
The notation a -> rep
states that there is only one type rep
for each type
a
.
Our implementation of genericShow
also expected one type, Rep
.
With multiple types, we can no longer use one implementation for all
representations.
We must implement genericShow
for each structural type we support.
Then a typeclass is used to wrap all implementations in one generic interface.
In the last example, the type of genericShow
was:
In this approach, that type is modified to include a GenericShow
constraint:
An instance of GenericShow
defines how to stringify a structural type.
NoConstructors
, Constructor
, and Product
all have instances of
GenericShow
.
While more than one, that is still far fewer than one implementation for each
datatype.
These are implementation details.
The strategy remains the same—convert from a datatype to its generic
representation and then pass the generic representation to the structurally
polymorphic genericShow
.
Generic programming is made convenient by the compiler
Datatype-generic programming, as a way of reducing boilerplate, would not succeed without help from the compiler. Generic representations are verbose and tedious to write by hand. Doing so would mean exchanging one kind of boilerplate for another. Fortunately, this is not required.
The Generic
class specifies a functional dependency between datatype and generic
representation.
This enables the compiler to infer the generic representation from the datatype.
Thus we can derive the Generic
instance without specifying the generic type
explicitly.
The boilerplate associated with generics is work we can delegate to the
compiler.
Generic programming is structural programming
All datatypes are formed of the same structural elements. Logic concerned with structure should not be specific to one datatype. The same logic applies to all datatypes because they all share the same structural components. Such logic is structurally polymorphic.
Structural polymorphism is achieved by giving a generic representation to the structure of a datatype. A generic representation describes structure in general terms like “constructor” and “argument”. Generic functions are implemented for any datatype given this generic representation. The result is a new level of expressivity.
Thanks to Vaibhav for reading a draft of this post.
-
The approach extends the familiar notion of parametric polymorphism by allowing the definition of functions which are generic with respect to data structures as well as to individual types. For example, structural polymorphism accommodates generalizations of the usual length and map functions which may be applied not only to lists, but also to trees, binary trees or similar algebraic structures. Under traditional polymorphic type systems, these functions may be defined for arbitrary component types, but must be (laboriously) re-defined for every distinct data structure.
-
Datatype-Generic Programming by Jeremy Gibbons, page 18. ↵