Associated Types and Haskell
Or how to write a type-class containing a function returning a value of another related type-class.
Lets assume you’re using type-classes extensively to make your business logic independent of your data-representation. Some may argue whether this is a good idea, but that’s not relevant to what follows. This article explains why you’ll encounter a bug, and how to use the TypeFamilies
extension to work around it.
Assume you have a type-class that defines a few functions:
type Position = (Int, Int)
class Player p where
playerPosition :: p -> Position
playerMoveTo :: Position -> p -> p
-- etc. and lots more
Now lets say this is all part of your game state, which you also define via a type-class
class GameState g where
getPlayer :: (Player p) => g -> p
getMonsterPositions :: g -> [Position]
So far, so good. All of this compiles. Next we create the implementations:
data PlayerData = PlayerData { _pos :: Position, ... }
instance Player PlayerData where
playerPosition = _pos
playerMoveTo pos player = player { _pos = pos }
-- etc..
data GameStateData = GameStateData PlayerData [Position]
instance GameState GameStateData where
getPlayer (GameStateData p _) = p
getMonsterPositions (GameStateData _ mPoses) = mPoses
We want use this to write a wholly generic function like follows
checkForCollisions :: GameState s => s -> [Position] -> Bool
checkForCollisions s ps =
let
p = getPlayer s
pPos = playerPosition p
in
pPos `elem` ps
The problem is that none of this compiles!
- The implementation of
GameState
forGameStateData
won’t compile, you’ll get an error ofCould not deduce (p ~ PlayerData)
. - The code for checkForCollisions won’t compile, with a similar error:
Could not deduce (Player p0) arising from a use of ‘getPlayer’
Polymorphism vs Monomorphism
To someone coming from the imperative object-orientated world, this seems mysterious, as one could trivially achieve the same effect in imperative OOP languages using interfaces.
The issue is the difference between polymorphism and monomorphism.
Consider the following Java code
interace MyInterface {
public int someFunction();
}
public static MyInterface genericFunc(int a) { ... }
public static void int callGenericFunc() {
MyInterface mi = genericFunc(42);
return mi.someFunction();
}
What we are saying is that
- There is a family of types that implement
MyInterface
genericFunc
will return a value of a single, specific type in that family[1]callGenericFunc
must be able to handle a value of any type in theMyInterface
family, regardless of the underlying representation (i.e. it must be polymorphic)- It is
genericFunc
that chooses the particular type.
The following Haskell code looks very similar:
class MyTypeClass t where
someFunction :: t -> Int
genericFunc :: (MyTypeClass t) => Int -> t
genericFunc = ...
callGenericFunc :: Int
callGenericFunc =
let mt = genericFunc 42 in
someFunction mt
In the Haskell version, we are saying that
- There is a family of types that implement
MyTypeClass
genericFunc
can return a value of any type in that family.
Whereas the Java version returned a value of a single specific type.callGenericFunc
will only work on one specific type in that family.
Whereas the Java version worked on values of any type in the family- It is
callGenericFunc
which decides which typegenericFunc
should return.
Whereas the Java version hadgenericFunc
make the decision
So while the Java compiler renders this generic code polymorphic, by adapting callGenericFunc
to work with any value in the MyInterface
family, Haskell makes the code monomorphic by choosing a single specific type in the MyTypeClass
family and generating variants of genericFunc
and callGenericFunc
which work on that type.
There are a few advantages to this. On a machine level, forcing everything to a single concrete type allows for static dispatch and therefore function-inlining which is a performance optimisation[2]. This is is why you see monomorphism appearing in recent ML-derivatives like Swift and Rust.
The second is that it means the typeclass code you write is incredibly generic, and can work in whichever way the caller requires.
However in order for monomorphism to work, the compiler needs to be able to identify the particular type that callGenericFunc
will use.
Associated Types to the Rescue
If we look at our example again, we can see the problem
checkForCollisions :: GameState s => s -> [Position] -> Bool
checkForCollisions s ps =
let
p = getPlayer s
pPos = playerPosition p
in
pPos `elem` ps
GameState
is a generic typeclass, so the compiler will inspect the code that calls checkForCollisions
to choose the specific implementation.
Once it’s chosen an implementation in the GameState
family, the typechecker looks at checkForCollision
and sees getPlayer
returns a value of another generic typeclass Player
.
Remember it’s not the implementation of GameState
that must determine the type, it’s checkForCollisions
, so that’s where the type-checker looks.
Unfortunately, all the code in checkForCollisions
is completely generic, so it can’t choose a single concrete type: hence Could not deduce (Player p0)
.
The solution to this to allow the implementation of GameState
to additionally specify the particular type in the Player
family to use.
To this this we use the TypeFamilies
extension.
First we alter our type-class to add a type placeholder called PlayerType
{-# LANGUAGE TypeFamilies #-}
{-# LANGUAGE FlexibleContexts #-}
class Player (PlayerType s) => GameState s where
type PlayerType s :: *
getPlayer :: s -> PlayerType s
getMonsterPositions :: s -> [Position]
Essentially PlayerType
is a variable that contains a type rather than a value. Consequently it’s annotated with a kind (recall, that the “type of a type” is called a kind). In this case the single asterisk means that this should be a concrete type.
Associated types must be tied (i.e. associated with) the type defined in the type class, which is why it’s PlayerType s
and not just PlayerType
.
However we can still constrain the associated type to be in our Player
type-class as you can see. We need the FlexibleContexts
extension in order to perform this constraint however.
You can have as many associated types as you want by the way, I’ve just used one in this example for simplicity.
The next step is to assign a particular concrete type in our implementation:
{-# LANGUAGE TypeFamilies #-}
instance GameState GameStateData where
type PlayerType GameStateData = PlayerData
getPlayer (GameStateData p _) = p
getMonsterPositions (GameStateData _ mPoses) = mPoses
This then solves our problem. The code that called checkForCollisions
has already chosen the particular type in the GameState
family, and in this example lets assume the type is GameStateData
.
The compiler next looks at checkForCollisions
, but now it knows from the GameStateData
implementation that the associated type of Player
used for getPlayer
is PlayerData
. Hence the code type-checks, and the compiler has the information it needs to monomorphise it.
And we’ve managed to do this while keeping checkForCollisions
completely generic.
Final Thoughts
This only really rears it’s head once you start making extensive use of type-classes. Since defining and altering types is so easy in Haskell, you can argue that there’s no need for typeclasses. In fact, since abstraction bulks out code, and so can make it harder to read, there’s an argument to be made against the use of typeclasses for abstraction.
However there are many Haskellers that use typeclasses as a way to write code their monadic code in an “effectful” style (emulating the effects features in pure functional languages such as PureScript) and it’s here that they can run into issues, as I did.
In my case, I had a function like
goto :: (HasLevel r, MonadReader r m, HasPlayer p, MonadState p s)
=> Direction -> m MoveResult
And in this case I’d defined HasLevel
to return a value in a Level
type-class so the game engine could work equally well with different sorts of Level
implementations. As it turned out, in the end, I only had the one implementation, so this was an unnecessary and premature abstraction.
In short, I wouldn’t encourage developers to use this on a regular basis. It is a useful trick to know, however, particularly since it’s begun to appear in other, more mainstream ML-derivatives like Swift and Rust.
1. This is not strictly true, you can have an if-statement in Java code which will return one type in one branch, and another type in another branch. The point being the Java code can only return values from a small subset of types in the MyInterface
family whereas the Haskell code can return a value of any of the types in the MyTypeClass
family.
2. Polymorphic code looks up a table for every function call, then calls the function. Static dispatch calls the function directly, without a run-time lookup and so is faster. Inlining skips the function call overhead entirely, copying the function body into the calling code, and so is faster still. However as this copying makes the overall size of your code greater, it can overflow the cache, which will make your code run much much slower. As a result, inlining is not an automatic win.