Language Feature

Exploring the power of union types in Scala

Illustrated by Leandro Lassmar

One obvious advantage of typed languages is that they force you to consider all your data: where it came from and where it is going. If you try to force a String down a path it's not meant to go, the program will not compile. You'll need to retreat or change more code. The compiler is equivalent to millions of little unit tests that you don't get to opt out of.

However, sometimes it can get overbearing. You may have a UserId, RequestId, ProductId, and sundry others. You know that in some circumstances all of these types are equivalent (perhaps designing an API structure around them) and in some cases they are not equivalent (putting them in a database).

A traditional approach to solving the problem of shared functionality between disparate types would be method overloading: the same method defined n times, once for each type. However, that's a lot of boilerplate. It's difficult to refactor, error prone if the definitions diverge, and how do you overload a class properly anyway?

Union types can help.


A union type would generally be considered some constructed type T, which, when examined, could resolve to one of several types A, B, C, etc.

You can think of it as the opposite of the tuple, or product. A tuple T is a constructed type, which, when examined, resolves to one of each of several types A, B, C, etc.

More formally, a union type is a coproduct, the dual of a product.

In Scala, you might immediately be drawn to the Either[_, _] type as an example. And it is indeed a poor man's union type---we can do the following:

type T = Either[Int, String]
val t: T = ???
t match {
  case Left(_) => "It's an int!"
  case Right(_) => "It's ~a boy~ a string!"
}

You see we have a value of type T, which is (roughly) of type Int or String, and upon examination we find out which one it is and get some result back accordingly. This, for future reference, is a fold on the structure of Either.

So, what's wrong with it?

The first, and most major, impedance is that it is quite clearly intended to be limited to just two types. There aren't many places to go once you've used up left and right. Up and down, forward and backward. Spinward, hubward, rimward,
and widdershins, perhaps.

A tuple, for all its faults (and there are many---I imagine I'll be writing about those faults in the future), is at least available in more than just a pair:

type T2 = (Int, String)
type T3 = (Int, String, Boolean)
type T4 = (Int, String, Boolean, Char)

Scala, for reasons lost to the mists of time, lets you go up to a tuple with twenty-two elements and no further.

You could of course do some nesting of an Either, to emulate more options than just left or right:

type Or[A, B, C] = Either[Either[A, B], C]

val t: Or[Int, String, Boolean] = ???

t match {
  case Left(Left(_)) => "An integer"
  case Left(Right(_)) => "A string"
  case Right(_) => "A boolean"
}

It's pretty horrid though. You have to know that Left(Right(_)) is a possibility, but not Right(Left(_)). And imagine further nesting: Where do you put the extra Either? On the outside? Around the C? Around the B?

No, it's a disaster. Your future user (the next coder who has to use an Or7) won't thank you. And God forbid someone comes along and defines Or8 in the wrong way.

Thankfully, Scala does not leave you hanging. With type-level programming---programming that purely manipulates type structures with no values involved at all---you can program in extra rules on top of the compiler to enable new functionality.

Typeclasses

We're functional developers! Let's write a typeclass.

What do we want out of it? Let's start by saying we want a way of telling whether a type T is an Int or a String.

So:

sealed trait IntOrString[T]

object IntOrString {
  implicit val intInstance: IntOrString[Int] = new IntOrString[Int] {}
  implicit val stringInstance: IntOrString[String] = new IntOrString[String] {}
}

The trait IntOrString is sealed to prevent a mischievous client from defining an instance of IntOrString[Boolean] in their code. A Boolean is, by definition, not an int or a string.

It looks...ok. You can easily imagine how to extend it to more things: Adding in OrBoolean is as simple as an extra instance (and a rename!).

This is how IntOrString might be used:

def foo[T: IntOrString](t: T): String = t match {
  case i: Int => "Integer!"
  case s: String => "String!"
}

foo(5)      // compiles
foo("bar")  // compiles
foo(true)   // does not compile!

Now, the pattern match here is not checked for exhaustivity by the compiler. T is a free type, and it is constrained not by the type system but by our implementations on top of it (our typeclass). Therefore the compiler can't help us.

We could add extra structure to IntOrString to extract our value if we wanted to. Perhaps something like this:

sealed trait IntOrString[T] {
  def asInt(t: T): Option[Int]
  def asString(t: T): Option[String]
}

But these approaches get unwieldy to use, since there will always be an impossible case to handle
(the case where both asInt and asString are both None is impossible, but the compiler does not know this).

It's a trade-off between exhausting boilerplate, and losing exhaustivity matching. Let's choose the latter.

Going Further

The eagle-eyed among you will notice that above we just defined a union type for Int and String. That is almost useless for our purposes; we want something generic. Such fixed approaches are fine for limited use cases, but if you have more than a few, you'll quickly get sick of them.

What we want to do is abstract over Int and String: the types themselves and the number of them.

Abstracting the Types

Let's give it a shot. The first thing to do is simply replace Int and String with generic types. We quickly hit a roadblock:

trait XorY[T]

...what are X and Y? They have to be added somewhere. The fact we were using Int or String before---things already known to us at the coding level and known to the compiler at compile time---hid the fact that we need some new structure.

We need a typeclass, which, when given three types, has instances when the first is equal to one of the latter two.

Let's give it a go:

trait IsOneOf[T, X, Y]

object IsOneOf {
  def apply[T, X, Y](implicit ev: IsOneOf[T, X, Y]): IsOneOf[T, X, Y] = ev
  implicit def xInstance[X, Y]: IsOneOf[X, X, Y] = null
  implicit def yInstance[X, Y]: IsOneOf[Y, X, Y] = null
}

This is simply saying IsOneOf[T, X, Y] has two instances, one where T = X and one where T = Y.

We can take it for a spin:

IsOneOf[Int, Int, Boolean]  // Compiles
IsOneOf[Boolean, Int, Boolean]  // Compiles
IsOneOf[Float, Int, Boolean]  // Does not compile, Float is  neither Int nor Boolean

You might use it like this:

def foo[T: IsOneOf[*, Int, String]](id: T): String = id match {
  case i: Int => "We got an Int"
  case i: String => "We got a String"
}

foo(5)
foo("hello")
foo(true)  // bad!

(Note the use of * in type-position in how we used IsOneOf, which requires the excellent 'kind-projector' compiler plugin. Before scala 2.13, ? was used instead).

Again, note the pattern match is not exhaustive. If you leave out String, it won't warn you.

But, it works! The Int and String we chose are completely arbitrary. They could be any types we wanted at all, for hardly any more work.

Abstracting the Arity

The above was easy. This next bit could be easy, but deeply unsatisfying.

You'll notice that IsOneOf is hard-coded to two possibilities---T must be one of X or Y. It has two implicit instances.

We could, of course, just define one for three possibiltiies:

trait IsOneOf3[T, X, Y, Z]

or five:

trait IsOneOf5[T, A1, A2, A3, A4, A5]

but...each has to have a separate definition. A separate name, separate implicits (IsOneOfN needs N implicits).

We also have a new special consideration when dealing with higher arities: If T is X or Y, then T is certainly X or Y or Z.

Implementing these relationships thoroughly in N different traits will get old, fast. We're back to tuples by a different name. You would stop at twenty-two, too, if you had to write all this stuff out.

No, we need a different, more robust approach.

Abstracting the Tuple

We will be using HLists.

If you've not come across them before, they are just lists of types, implemented just like List is for values.

Each element of type HList has a head, which is a defined type, and a tail, which is another HList. All except the emptyHListthat is, usually denotedHNil`.

We can define our own very simply:

sealed trait HList

trait ::[+H, +T <: HList] extends HList

sealed trait HNil extends HList

You see ::, the cons operation, completely analagous to normal Lists ::. Our definition is a bit simpler than you might find elsewhere, since we're not interested in the value level. The type we're dealing with T is completely free, and the HList is just going to act as information for the compiler. We never will construct a value of type HList---true type-level programming.

Here's how you might define an HList type:

type Possibilities = Int :: String :: HNil

Ideally, we want to be able to define this:

trait IsOneOf[T, L <: HList]

object IsOneOf {
  def apply[T, L <: HList](implicit ev: T IsOneOf L): T IsOneOf L = ev

  // implicits here
}

// where:
IsOneOf[Int, Int :: String :: Boolean :: HNil]  // compiles
IsOneOf[Boolean, Boolean :: HNil]  // compiles
IsOneOf[Unit, Int :: Char :: HNil]  // does not compile

That syntax is everything we want---dynamic lengths, dynamic types. We just need to implement whatever implicits drive IsOneOf.

Let's start with an IsMemberOf typeclass. The reason why this is separate and not just operating on our original IsOneOf will become clear. The purpose of this IsMemberOf[T, L <: HList] typeclass is to be available if and only if T is an element of L.

Here's the definition:

trait IsMemberOf[T, L <: HList]

How would you write this algorithm at value-level? How would you decide whether 5 was an element of x: List[Int]?

A particularly functional and terse way of writing this value-level algorithm would be:

def isMemberOf(value: Int, list: List[Int]): Boolean = list match {
  case Nil => false
  case head :: tail if value == head => true
  case _ :: tail => isMemberOf(value, tail)
}

Three cases to consider:

  • If the list is empty, 5 cannot be a member of it.
  • If the head of the list is equal to 5, then of course the list contains it.
  • Otherwise, just test the tail and recurse.

These are completely analagous to the type-level algorithm:

(a) The false case means there is no implicit avaialable (so a bad example woulnd't compile).
(b) true means there does exist an implicit satisfying that case.
(c) The recursive case is just a recursive implicit call.

Well, let's give the simple case a go:

implicit def headCase[T, Tail <: HList]: IsMemberOf[T, T :: Tail] = null

This reads as "In the case where our HList takes the form T :: SomethingElse, T is a member of it". Read the return type to figure out which run-time case match it corresponds too.

Simple! The recursive case almost writes itself too:

implicit def recursveCase[T, Head, Tail <: HList](
  implicit recurse: T IsMemberOf Tail
): IsMemberOf[T, Head :: Tail] = null

This reads as "IfTis a member ofTail, thenTis a member ofHead :: Tail, no matter whatHead` is".

These are all we need. You'll notice that neither implicit will satisfy T IsMemberOf HNil (Both require our HList to have a tail)---our third and final requirement.

IsMemberOf[String, Int :: String :: HNil]  // compiles
IsMemberOf[Boolean, Int :: String :: HNil]  // does not compile

We're so close to being done! We're unfortunately missing one component I mentioned above: subunions.

Suppose we had this setup:

def foo[T: IsMemberOf[*, Int :: String :: HNil]](value: T): String = bar(value)
def bar[T: IsMemberOf[*, Int :: String :: Boolean :: HNil]]: String = ???

At the moment, this will not compile. When we call bar(value), we have an implicit IsMemberOf[T, Int :: String :: HNil].

But to call bar, we need an implicit IsMemberOf[T, Int :: String :: Boolean :: HNil]. These types aren't equal---we need some more implicit massaging to make them fit into place.

Subunions

We need a typeclass SubListOf. This typeclass will take two HLists, and be available implicitly if the first is a subHList of the second:

trait IsSubListOf[X <: HList, Y <: HList]

What does SubList here mean? It means simply that every element of X is in Y, somewhere.

Let's again defer to a value-level argument to make it clear how we will proceed. We need a function that takes two lists and says whether one is a "sublist" of the other:

def isSubListOf(x: List[Int], y: List[Int]): Boolean = (x, y) match {

  // Remember we wrote isMemberOf above! We can delegate to it here

  // The recursive case: The head must be a member, and the tail must be a sublist
  case (h :: tail, list) if isMemberOf(h, list) && isSubListOf(tail, list) => true

  // The termination case: An empty list is always a sublist
  case (Nil, _) => true

  // Every other case returns false:
  case _ => false

}

This doesn't really look much more complex than IsMemberOf. Let's do the empty-list case:

implicit def hnilCase[Y <: HList]: IsSubListOf[HNil, Y] = null

This simply says HNil is a sublist of Y, for any Y <: HList

And the second case:

implicit def recursiveCase[X <: HList, H, Y <: HList](
  implicit member: H IsMemberOf Y,
  sublist: X IsSubListOf Y
): IsSubListOf[H :: X, Y] = null

Again, it reads as an ordinary logical statement:

  • If your argument is of the form H :: X,
  • And H is a member of Y,
  • And X is a sublist of Y,
  • Then H::X (your original argument) is a sublist of Y.

...and that's all we need. The third case, the false case, is again translated into compile-time recursive implicits by simply not being there.

Here it is working:

type A = Int :: String :: HNil
type B = String :: Boolean :: Int :: HNil
type C = Char :: String :: HNil

IsSubListOf[A, B]  // compiles - even though the types are in different orders!
IsSubListOf[A, C]  // does not compile, C is missing Int
IsSubListOf[B, A]  // does not compile, Boolean is not present in A

Putting It All Together

We're finally ready to wrap everything up inside IsOneOf:

trait IsOneOf[T, L <: HList]

object IsOneOf {
  def apply[T, L <: HList](implicit ev: T IsOneOf L): T IsOneOf L = ev

  implicit def isMemberOfInstance[A, L <: HList](implicit m: IsMemberOf[A, L]): IsOneOf[A, L] = null

  implicit def isSubListInstance[X, Y <: HList, Z <: HList](
    implicit o: IsOneOf[X, Y],
    s: IsSubListOf[Y, Z]
  ): IsOneOf[X, Z] = null

}

A quick step through:

  • isMemberOfInstance just delegates to the IsMemberOf typeclass we defined above. It's the simple case where we're just considering simple elements.
  • isSubListInstance provides the transitivity: X is one of Y and Y is a sublist of Z, implies X is one of Z.

And, now, everything works beautifully:

def testing1[T: IsOneOf[*, Int :: String :: Boolean :: HNil]](t: T): String = t match {
  case i: Int => i.toString
  case b: Boolean => b.toString
  case s: String => s
}

// These compile:
testing1(5)
testing1(true)
testing1("testing!")

// This fails:
testing1('c')

It even works with multiple unions and their intersections! Observe:

def testing2[
  T: IsOneOf[*, Int :: String :: HNil]
   : IsOneOf[*, Boolean :: Int :: HNil]
](t: T): String = ???

testing2(17)  // compiles
testing2("string")  // does not compile: String is not Boolean or Int
testing2(true)  // does not compile: Boolean is not Int or String

And, finally, proof that it works for subunions:

  // Compiles:
  def bar[T: IsOneOf[*, Int :: String :: Boolean :: HNil]](t: T): String = ???
  def foo[T: IsOneOf[*, Int :: String :: HNil]](value: T): String = bar(value)

You see bar requires a wider union than foo, and so foo is able to call it successfully. This will
work for unions of any size, as long as foos are strictly contained within bars.


So what have we achieved?

With these capabilities, union intersections and union widening, we have an unboxed, arbitrary arity union type.

You can program your unions in a completely abstract way, coding to a L <: HList rather than a fixed union. The subunion and intersection relations displayed above ensure that you can rely on the compiler machinery to automatically validate all union logic in your program, based entirely on how you define them at the very end of the world---for free.

James Phillips

author

James is a functional and scala enthusiast with a keen interest in type-level programming. He runs a consultancy in London and Brighton, and occasionally cycles to Paris.

Leandro Lassmar

illustrator

Leandro Lassmar is an illustrator living in Minas Gerais, Brazil.
He worked in animation studios, currently works for magazines, books and advertising.
It won the SND (society of news design) and ÑH - Lo Mejor del Diseño Periodístico awards.