Typesafe builders with GADTs
15 Feb 2022I wrote something a while back about typesafe builders (builders which let you add information in any order you like, but fail to compile if you try and build the result without having added all the required information). The implementation was in Java, so I used subtyping to represent types that do include information, don’t include information, or are indifferent to whether they include information. But I’ve been wondering, how would you implement this in a language without subtyping? It turns out one answer is generalised abstract data types (GADTs).
Returning to the example I used in the previous post, some kind of API client where you supply various configuration options before creating the client. In haskell, you might use it something like:
let client = defaultConfiguration &
withRegion "us-east-1" &
withCredentials (Credentials {token = "username", secret = "password "}) &
build
One option is to use a Maybe
type to represent the presence or absence of a
configuration value, e.g.:
data ClientConfiguration = ClientConfiguration
{ confRegion :: Maybe String,
confCredentials :: Maybe Credentials
}
defaultConfiguration :: ClientConfiguration
defaultConfiguration = ClientConfiguration Nothing Nothing
withRegion :: String -> ClientConfiguration -> ClientConfiguration
withRegion region configuration = configuration {confRegion = Just region}
However, in this case you don’t know at compile time whether a value has been
specified, so your build
function has to take into account that it might not
have all the information it needs, and so would fail at compile time
build :: ClientConfiguration -> Maybe Client
build configuration = case configuration of
ClientConfiguration (Just region) (Just credentials) ->
Just (createClient region credentials)
_ -> Nothing
What we want instead is for the build
function to fail to compile if we
haven’t added the required configuration values. Instead of a Maybe
type, one
type which expresses the fact that we don’t know whether we have a value or not,
we want there to be a difference at the type level depending on whether we have
a value or not. One way to do this is:
data PresentT
data AbsentT
data Option p t = Present t | Absent
This is known as a “phantom type”. The type variable p
doesn’t appear in the
body of the type definition, so a value of type Option
doesn’t actually
contain a value of type p
; instead, we fill in p
with either PresentT
or
AbsentT
, to make a type level difference between Option
s that have a value,
or don’t. We would use it like:
data ClientConfiguration regionPresence credentialsPresence =
ClientConfiguration {
confRegion :: Option regionPresence String,
confCredentials :: Option credentialsPresence Credentials
}
Here, regionPresence
and credentialsPresence
are type variables we fill in
with PresentT
or AbsentT
to denote the presence or absence of the value in
the configuration:
defaultConfiguration :: ClientConfiguration AbsentT AbsentT
defaultConfiguration = ClientConfiguration Absent Absent
withRegion :: String ->
ClientConfiguration p credentialsPresence ->
ClientConfiguration PresentT credentialsPresence
withRegion region configuration = configuration {confRegion = Present region}
Here defaultConfiguration
has no values, so fills in both variables with
AbsentT
. withRegion
adds the region, so fills in regionPresence
with
PresentT
, while leaving credentialsPresence
unchanged (if we’ve already
added credentials, it will be PresentT
; if not, AbsentT
).
Then, we can make the build
function only accept ClientConfiguration
s where
both type variables are PresentT
.
build :: ClientConfiguration PresentT PresentT -> Client
build (ClientConfiguration (Present region) (Present credentials)) =
createClient region credentials
Now, build defaultConfiguration
will fail at compile time, because
defaultConfiguration
has type ClientConfiguration AbsentT AbsentT
, and build
needs a ClientConfiguration PresentT PresentT
.
This fulfills our requirements, but there are some problems with the Option
type. Although we’ve been filling in the p
type variable with PresentT
and
AbsentT
, nothing requires us to use only these two types. We could use any
type, giving meaningless values like
meaningless :: Option Int String
meaningless = Present "This is meaningless"
This is a bit confusing, but maybe not terrible; as build
requires,
specifically, a ClientConfiguration PresentT PresentT
, we could treat anything
other than PresentT
as meaning the same as AbsentT
. However, we can also
create contradictory instances of Option
, which is more of a problem:
contradictory :: Option PresentT String
contradictory = Absent
We’ve made an Option
which, at the type level, purports to contain a value,
but at the value level, does not. So we could have a call to build
which type
checks, but which would fail at run time. One way around this is to add “smart
constructors” - functions which will only allow us to create meaningful
instances of Option
:
present :: t -> Option PresentT t
present x = Present x
absent :: Option AbsentT t
absent = Absent
Then, if we put Option
in a module and only export the smart constructors, and
not the regular constructors, we can prevent users of Option
from creating
meaningless or contradictory instances.
There’s one more problem, though. Looking again at the definition of build
:
build :: ClientConfiguration PresentT PresentT -> Client
build (ClientConfiguration (Present region) (Present credentials)) =
createClient region credentials
There’s a pattern match in the function definition here, and when we compile it,
haskell will warn us that the pattern match is missing cases for
ClientConfiguration
s with Absent
values. We know that the only way to get an
Option PresentT t
is to call the present
smart constructor, so we know that
Present
is the only possible case for an Option PresentT t
. But the compiler
doesn’t know that; it’s still worrying about the contradictory
example above.
However, we can tell the compiler this that the smart constructors are the only
constructors for Option
, by changing the definition to
data Option p t where
Present :: t -> Option PresentT t
Absent :: Option AbsentT t
Now the compiler knows that, if the Present
constructor is used, the result
will have type Option PresentT t
; so it’s impossible for the Present
constructor to match an Option AbsentT t
. And, the Absent
constructor will
never match an Option PresentT t
. The compiler now knows what we know: that
Option PresentT t
must be Present
, and Option AbsentT t
must be Absent
.
These types, with constrained constructors so that only certain combinations of type variables are possible, are called generalised abstract data types. They allow us to give the compiler more information about the space of possible values in our program, and so allow us to express more complicated situations, type safely.