Typesafe builders with GADTs

I wrote something a while back about typesafe builders (builders which let you add information in any order you like, but fail to compile if you try and build the result without having added all the required information). The implementation was in Java, so I used subtyping to represent types that do include information, don’t include information, or are indifferent to whether they include information. But I’ve been wondering, how would you implement this in a language without subtyping? It turns out one answer is generalised abstract data types (GADTs).

Returning to the example I used in the previous post, some kind of API client where you supply various configuration options before creating the client. In haskell, you might use it something like:

let client = defaultConfiguration & 
      withRegion "us-east-1" & 
      withCredentials (Credentials {token = "username", secret = "password "}) &
      build

One option is to use a Maybe type to represent the presence or absence of a configuration value, e.g.:

data ClientConfiguration = ClientConfiguration
  { confRegion :: Maybe String,
    confCredentials :: Maybe Credentials
  }

defaultConfiguration :: ClientConfiguration
defaultConfiguration = ClientConfiguration Nothing Nothing

withRegion :: String -> ClientConfiguration -> ClientConfiguration
withRegion region configuration = configuration {confRegion = Just region}

However, in this case you don’t know at compile time whether a value has been specified, so your build function has to take into account that it might not have all the information it needs, and so would fail at compile time

build :: ClientConfiguration -> Maybe Client
build configuration = case configuration of
  ClientConfiguration (Just region) (Just credentials) -> 
      Just (createClient region credentials)
  _ -> Nothing

What we want instead is for the build function to fail to compile if we haven’t added the required configuration values. Instead of a Maybe type, one type which expresses the fact that we don’t know whether we have a value or not, we want there to be a difference at the type level depending on whether we have a value or not. One way to do this is:

data PresentT
data AbsentT

data Option p t = Present t | Absent

This is known as a “phantom type”. The type variable p doesn’t appear in the body of the type definition, so a value of type Option doesn’t actually contain a value of type p; instead, we fill in p with either PresentT or AbsentT, to make a type level difference between Options that have a value, or don’t. We would use it like:

data ClientConfiguration regionPresence credentialsPresence = 
  ClientConfiguration { 
    confRegion :: Option regionPresence String,
    confCredentials :: Option credentialsPresence Credentials
  }

Here, regionPresence and credentialsPresence are type variables we fill in with PresentT or AbsentT to denote the presence or absence of the value in the configuration:

defaultConfiguration :: ClientConfiguration AbsentT AbsentT
defaultConfiguration = ClientConfiguration Absent Absent

withRegion :: String -> 
    ClientConfiguration p credentialsPresence -> 
    ClientConfiguration PresentT credentialsPresence
withRegion region configuration = configuration {confRegion = Present region}

Here defaultConfiguration has no values, so fills in both variables with AbsentT. withRegion adds the region, so fills in regionPresence with PresentT, while leaving credentialsPresence unchanged (if we’ve already added credentials, it will be PresentT; if not, AbsentT).

Then, we can make the build function only accept ClientConfigurations where both type variables are PresentT.

build :: ClientConfiguration PresentT PresentT -> Client
build (ClientConfiguration (Present region) (Present credentials)) = 
    createClient region credentials

Now, build defaultConfiguration will fail at compile time, because defaultConfiguration has type ClientConfiguration AbsentT AbsentT, and build needs a ClientConfiguration PresentT PresentT.

This fulfills our requirements, but there are some problems with the Option type. Although we’ve been filling in the p type variable with PresentT and AbsentT, nothing requires us to use only these two types. We could use any type, giving meaningless values like

meaningless :: Option Int String
meaningless = Present "This is meaningless"

This is a bit confusing, but maybe not terrible; as build requires, specifically, a ClientConfiguration PresentT PresentT, we could treat anything other than PresentT as meaning the same as AbsentT. However, we can also create contradictory instances of Option, which is more of a problem:

contradictory :: Option PresentT String
contradictory = Absent

We’ve made an Option which, at the type level, purports to contain a value, but at the value level, does not. So we could have a call to build which type checks, but which would fail at run time. One way around this is to add “smart constructors” - functions which will only allow us to create meaningful instances of Option:

present :: t -> Option PresentT t
present x = Present x

absent :: Option AbsentT t
absent = Absent

Then, if we put Option in a module and only export the smart constructors, and not the regular constructors, we can prevent users of Option from creating meaningless or contradictory instances.

There’s one more problem, though. Looking again at the definition of build:

build :: ClientConfiguration PresentT PresentT -> Client
build (ClientConfiguration (Present region) (Present credentials)) = 
    createClient region credentials

There’s a pattern match in the function definition here, and when we compile it, haskell will warn us that the pattern match is missing cases for ClientConfigurations with Absent values. We know that the only way to get an Option PresentT t is to call the present smart constructor, so we know that Present is the only possible case for an Option PresentT t. But the compiler doesn’t know that; it’s still worrying about the contradictory example above.

However, we can tell the compiler this that the smart constructors are the only constructors for Option, by changing the definition to

data Option p t where
  Present :: t -> Option PresentT t
  Absent :: Option AbsentT t

Now the compiler knows that, if the Present constructor is used, the result will have type Option PresentT t; so it’s impossible for the Present constructor to match an Option AbsentT t. And, the Absent constructor will never match an Option PresentT t. The compiler now knows what we know: that Option PresentT t must be Present, and Option AbsentT t must be Absent.

These types, with constrained constructors so that only certain combinations of type variables are possible, are called generalised abstract data types. They allow us to give the compiler more information about the space of possible values in our program, and so allow us to express more complicated situations, type safely.