A Mixture of Musings

Understand Cassandra by Implementing It

Over the years Cassandra has begun to look more and more like a normal SQL database, which is why people are increasingly confused when using it: its various limitations on what can appear in WHERE clauses, its peculiar performance characteristics, and its odd data-modelling advice are all contrary to one’s expectations of how a SQL RDBMS should behave.

In this post I’ll go through how one would implement a database like Cassandra, in very high-level terms, which should hopefully explain why it works the way it does, and give a better insight into how data should be modelled.

Basic Tables and Queries

At the simplest level, Cassandra is a map (or dictionary) from a key to a value. Using some Scala-ish pseudocode, here’s how we’d define a simple table:

type CdbTable[K <: Hashable, V] = HashMap[K, V]

Here K is the type of our key, and V is the type of our value. In this pseudocode we have to explicitly say things are Hashable, and we presume Hashable things are also Equatable (i.e. have an equalsTo method).

So we could simply store the word-counts for books as

wordCounts : CdbTable[String,Int] = new HashMap() 

But let’s say we want more than one lookup value: say, word counts for each book by each author. To do this we have to create a composite key type

class HKeySet(keys : List[Hashable]) : Hashable {

	override def hash() : Int {
		// return the hash of the hashes of the keys
	}
}

type CdbTable[V] = HashMap[HKeySet, V]

Now our word-count database is

wordCounts : CdbTable[Int] = new HashMap()

With this model we can do lookups by existence: e.g. fetch the word-count for this book by that author; however we can’t do range queries. If we had a third item in our key, chapter say, and we wanted the word counts for chapters 2 <= c <= 10, we would have to manually enumerate all the candidate keys and do a lookup for each one, which is costly1.
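To make the cost concrete, here’s what such a “range query” would look like against the hash-only model, in the same Scala-ish pseudocode. This is only a sketch: chapterWordCounts is a made-up helper, and it assumes the chapter is the third element of the key.

// Sketch: a "range query" with nothing but a HashMap.
// With no ordering, all we can do is enumerate every candidate key
// and probe the map once per chapter -- one lookup per value.
def chapterWordCounts(title : String, author : String,
                      fromCh : Int, toCh : Int) : Map[Int, Int] = {
	(fromCh to toCh).flatMap { chapter =>
		wordCounts.get(new HKeySet([title, author, chapter]))
		          .map { count => (chapter, count) }
	}.toMap
}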

What if we want to delete something? Well, one option is just to remove it entirely from the HashMap, but let’s say – for the sake of argument – that we’re using some fancy thread-safe concurrent HashMap that doesn’t support deletion. In that case we need a marker to say when a value is gone. So we write an “envelope” for our value with this metadata:

class Envelope[V] {
	var wrappedValue : V
	var timeStamp : TimeStamp
	var ttl : Option[TimePeriod] // maybe delete this after a while
	var deleted : Bool
	
	Envelope(theWrappedValue : V) {
		wrappedValue = theWrappedValue
		timeStamp    = now()
		deleted      = false
		ttl          = None
	}
}

The timestamp is updated on every change; its purpose will become clear later. The time-to-live (TTL) is a special Cassandra feature that lets a value expire automatically after a set period. Neither matters much for now.

Our table of word-counts by book and author thus becomes

wordCounts : CdbTable[Envelope[Int]] = new HashMap()

And if we delete something we can just set the deleted property to true. Of course this leaves a key in the HashMap recording the existence of our long-lost value, like a tombstone in a graveyard, which we hope will eventually be removed by some background process2.
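As a sketch in the same pseudocode, a delete in this model is nothing more than flipping the flag and stamping the time (delete here is a made-up helper, not Cassandra’s actual API):

// Sketch: "deleting" a value just turns its envelope into a tombstone.
def delete(table : CdbTable[Envelope[Int]], key : HKeySet) {
	table.get(key) match {
		case Some(envelope) => {
			envelope.deleted   = true    // the value is now a tombstone...
			envelope.timeStamp = now()   // ...and the change is timestamped
		}
		case None => {}                  // nothing stored under this key
	}
}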

But let’s go back to the chapter problem. Let’s say we decide to add a second layer of keys to our database, one that allows range queries. This time we choose a map that keeps things compactly ordered on disk and in memory. The easiest implementation is a sorted associative array

class ArrayMap[K <: Comparable,V] {
	var entries : ArrayList[Tuple2[K,V]]
		
	def insert(key : K, value : V) {
		// binary search for key's position i
		// shift things right to make room
		entries[i] = (key, value)
	}
	
	def get(key : K) : Option[V] {
		// binary search for key's position
		// return the value or None if it's absent
	}
}

In order for this to work our keys have to be comparable, and it’s precisely this ordering that makes range queries possible.

Our table is now:

type CdbTable[V] = HashMap[HKeySet, ArrayMap[CKeySet, Envelope[V]]]

The type CKeySet is a list of keys like HKeySet, except in this case it’s Comparable rather than Hashable, and it compares element by element from the first key onwards, i.e. the map is sorted by the first key, then the second, then the third, and so on.

With values ordered in memory, it should now be a lot easier to do range queries, as we can just use the HashMap lookup to fetch a block of values (a partition of the total values) and then iterate through the second ArrayMap.
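Here’s a rough sketch of that two-step lookup in the same pseudocode (rangeQuery is a made-up helper, and it assumes CKeySet values can be compared with <=):

// Sketch: a range query in the two-level model.
// Step 1: one hash lookup fetches the whole partition.
// Step 2: walk the ordered ArrayMap between the two cluster keys.
def rangeQuery[V](table : CdbTable[V], hashKey : HKeySet,
                  from : CKeySet, to : CKeySet) : List[Tuple2[CKeySet, V]] = {
	val partition = table.getOrDefault(hashKey)    // O(1) hash lookup
	partition.entries
	         .filter { case (ck, env) => from <= ck && ck <= to && !env.deleted }
	         .map    { case (ck, env) => (ck, env.wrappedValue) }
}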

Hence in Cassandra when one writes


CREATE TABLE word_counts (
	title   Text,
	author  Text,
	edition Int,
	chapter Int,
	count   Int,
	PRIMARY KEY ((title, author), edition, chapter)
)

INSERT INTO word_counts(
		title, author, edition, chapter, count)
	VALUES ('Misery', 'S. King', 2, 12, 3040)

You should now understand that this is represented as

word_counts : CdbTable[Int] = ....
word_counts.getOrDefault(new HKeySet(["Misery", "S. King"]))
           .put(new CKeySet([2, 12]), new Envelope(3040))

In Cassandra terminology the “partition key” is the HKeySet in the HashMap and identifies a particular block (“partition”) of rows3, and the “cluster key” is the CKeySet in the ArrayMap that identifies a specific subset of rows within that partition. Together these form the primary key.

With this insight it should be clear why:

  1. It’s strongly advised to use all parts of a partition key when doing a lookup
  2. It’s often better to use the cluster key for range queries. (Note, however, that if your hash function is ordered with respect to its values, you can do a range query on hashes using the TOKEN function, which returns the hash value of its input.)
  3. There are so many edge cases in Cassandra where certain where-clauses (e.g. IN) are allowed on the last column only.
  4. You can only order on keys; why it’s recommended to use the cluster-keys for ordering; and why you can specify whether the cluster-key ordering is ascending or descending when creating the table4

It should also be clear that doing a range query, even on a cluster-key, can involve iterating through an entire partition, which can be costly. Hence Cassandra often requires that you add ALLOW FILTERING to range-based WHERE queries, forcing the developer to explicitly acknowledge that the query could take a long time to execute.

Distributed Computing

At this point we have a solid basis for a simple database that supports range queries and existence queries.

The Cassandra developers went one step further, of course: they wanted to distribute this across several machines.

At the time Cassandra was written, several distributed HashMaps already existed (Ehcache, Memcached, etc.), so it was straightforward for the Cassandra developers to adopt the same approach. Hence our model is

type CdbTable[V] = DistributedHashMap[HKeySet, ArrayMap[CKeySet, Envelope[V]]]

And now different HashMap keys’ values – the partitions – are stored on different machines.

But what if a machine dies? To work around such cases, writes are replicated, usually to three machines, with control typically returning when a majority, or “quorum”, of those writes have succeeded.

When reading, you query all the machines holding a replica, and once a quorum of them have returned values5, you return the value with the most recent timestamp6.
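A rough sketch of that read path in the same pseudocode (Node, asyncRead and awaitFirst are all made-up names standing in for the real machinery):

// Sketch: a quorum read. Ask every replica for its copy of the value,
// wait for a majority to answer, then keep the copy with the newest timestamp.
def quorumRead[V](key : HKeySet, replicas : List[Node]) : Option[Envelope[V]] = {
	val quorum    = replicas.size / 2 + 1
	val responses = replicas.map { node => node.asyncRead[V](key) }   // scatter
	val answers   = awaitFirst(responses, quorum)                     // gather a majority
	if (answers.isEmpty) None
	else Some(answers.maxBy { env => env.timeStamp })                 // newest write wins
}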

This means of course that it’s possible to obtain a stale value since a machine busy doing a write may not return before the time-limit is up. This is why Cassandra offers guaranteed availability and partition-tolerance but not data-consistency (the idea that the most recent read reflects the most recent write). By contrast Redis offers consistency and partition-tolerance, but therefore cannot guarantee availability within a time-limit. For more on these trade-offs the CAP Theorem page on Wikipedia is a good place to start.

This is also why map-reduce works on Cassandra. Since the data is already split into partitions (equivalent to “blocks” in HDFS) and replicated across machines, it’s possible to dispatch code to each machine, have it work on the partitions stored there, and then report the individual results back to a single machine for reduction to the final output.

This approach of dispatching (“scattering”) requests out to several machines, and then collecting (“gathering”) the results is usually referred to as a scatter/gather pass when discussing Cassandra performance.

Column Families, not Column Orientation

I mentioned storage when talking about distribution. Storage in Cassandra is something that’s worth looking into.

Let’s say that instead of a single value like “word_count” we actually have many different values in our table, say

CREATE TABLE authors (
	id         UUID,
	name       Text,
	email_user Text,
	email_host Text,
	roles      Set<Text>,
	town       Text,
	age        Int,
	PRIMARY KEY (id)
)

-- Note we skip a few columns here
INSERT INTO authors (id, name, email_user, email_host, roles)
	VALUES(uuid(), 'S. King', 'sking', 'gmail.com', { 'Writer' } )

A common misconception with Cassandra is that it is a column-orientated database, which is to say that data is stored as a collection of multi-row columns, rather than the usual approach of multi-column rows. Column orientation does make compression very easy – one can imagine 90% of the email_host column above is just gmail.com – and is ideal for OLAP7 workloads where you read in a large portion of the data for a small subset of columns.

Since the primary key, in both its partition and cluster-key parts, identifies a single row, it should be clear that Cassandra is row-orientated. However, Cassandra rows don’t store every column, only the subset that was actually written: in the example above we skipped age and town. This subset of columns is called a column-family. Since we only store a subset, we need to store the column names as well as the values, which is just a HashMap[ColName,Any]

Hence our true representation is really

type CdbTable = DistributedHashMap[HKeySet,ArrayMap[CKeySet,HashMap[ColName,Envelope[Any]]]]

Note that each column value has its own envelope, and consequently its own deletion marker and its own time-stamp8.

This does lead to one particular issue, however. Cassandra stores NULL values as tombstones: i.e. empty envelopes with the deletion marker set to true. So, for example, the following stores 5 values in a single column family

-- note age and town columns are absent in the column list
INSERT INTO authors (id, name, email_user, email_host, roles)
	VALUES(uuid(), 'S. King', 'sking', 'gmail.com', { 'Writer' } )

whereas the query below stores 7 values in a single column family, two of which are tombstones that will be removed later in a compaction.

INSERT INTO authors (id, name, email_user, email_host, roles, town, age)
	VALUES(uuid(), 'S. King', 'sking', 'gmail.com', { 'Writer' }, NULL, NULL )
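In terms of the pseudocode model, that second insert lands in the row’s column-family map roughly as follows. Again, just a sketch: Envelope.tombstone() is a made-up helper standing for an empty envelope with deleted set to true.

// Sketch: how the NULL insert above is stored.
// (The id itself becomes the partition key, i.e. the HKeySet.)
// town and age get empty, deleted envelopes -- tombstones --
// even though no value was ever written to them.
val row = new HashMap[ColName, Envelope[Any]]()
row.put("name",       new Envelope("S. King"))
row.put("email_user", new Envelope("sking"))
row.put("email_host", new Envelope("gmail.com"))
row.put("roles",      new Envelope({ "Writer" }))
row.put("town",       Envelope.tombstone())   // NULL => tombstone
row.put("age",        Envelope.tombstone())   // NULL => tombstone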

As a result, when using ORMs it’s important to pay attention to how they handle nulls, to avoid excessively bloating the size of your database and increasing the time spent in compactions.

Another thing that should be clear from this is that adding columns to a Cassandra table is very very9 cheap. All it does is change the schema.

Indexes

At this stage it should be clear why you can only use partition and cluster keys in the WHERE clauses of your query. If you want to filter on a field in the column family you need to add an index. Indexes in Cassandra are stored as part of the structure that holds the ArrayMap for a single partition, which is identified by the partition-key part of the primary key.

class Partition (
	contents : ArrayMap[CKeySet,Map[ColName, Envelope[Any]]], 
	indexes : Map[ColName, Map[Any, CKeySet]]
) {

}

type CdbTable = DistributedHashMap[HKeySet, Partition]

A quick thing to note is that if you query on something that’s not a part of the partition key, you will go through every index on every single partition. This is hugely costly, particularly in a distributed HashMap where it means hitting and locking every single machine. In such cases, as with range-queries and IN queries, it’s best to create a second table with the same data using a different primary key structure tailored to the query. Historically this was done manually, but in very recent versions Cassandra can automate this using materialized views.
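In terms of the pseudocode model, a query on an indexed, non-partition-key column ends up doing something like the following (lookupByColumn and allPartitions are made-up names, purely to illustrate the shape of the work):

// Sketch: querying an indexed column without a partition key.
// Every partition -- and so every machine -- must be consulted.
def lookupByColumn(table : CdbTable, col : ColName, value : Any) : List[CKeySet] = {
	table.allPartitions.flatMap { partition =>           // hits every node
		partition.indexes.get(col) match {
			case Some(index) => index.get(value).toList  // per-partition index lookup
			case None        => []
		}
	}
}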

What an SSTable actually is

An SSTable, or “sorted-strings table”, is how Cassandra stores information on disk. An SSTable:

  1. Is immutable: it cannot be changed
  2. Represents only a fixed number of rows10

If it’s immutable, how does Cassandra handle mutations? The answer is that it stores changes. So for example

UPDATE authors SET name = 'Stephen King' WHERE id = xxxx-uuid

Will just store an additional record noting that the name field is now Stephen King. This means our true (and thankfully final) representation is

class Partition (
	contents : ArrayMap[CKeySet,List[Map[ColName, Envelope[Any]]]], 
	indexes : Map[ColName, Map[Any, CKeySet]]
) {

}

type CdbTable = DistributedHashMap[HKeySet, Partition]

and the series of CQL statements

INSERT INTO authors (id, name, email_user, email_host, roles)
	VALUES(xxxx-uuid, 'S. King', 'sking', 'gmail.com', { 'Writer' } )
UPDATE authors SET name = 'Stephen King' WHERE id = xxxx-uuid

corresponds to the following code

// no cluster key
authors.getOrDefault(new HKeySet(["xxxx-uuid"]))
       .contents
       .getOrDefault(NULL_CLUSTER_KEY)
       .add ({
			"name"        -> new Envelope ("S. King"),
			"email_user"  -> new Envelope ("sking"),
			"email_host"  -> new Envelope ("gmail.com"),
			"roles"       -> new Envelope ({ "Writer" })	
		})
authors.get(new HKeySet(["xxxx-uuid"]))
       .contents
       .get(NULL_CLUSTER_KEY)
       .add ({
       	"name" -> new Envelope ("Stephen King")
       })

So to retrieve a single record Cassandra has to go through the list of updates, keeping track of individual column-values’ timestamps, until it has recovered a complete record.
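A sketch of that read-time reconciliation in the same pseudocode (mergeUpdates is a made-up name):

// Sketch: rebuilding a row from its list of delta-updates.
// Later writes win on a per-column basis, judged by timestamp.
def mergeUpdates(updates : List[Map[ColName, Envelope[Any]]]) : Map[ColName, Envelope[Any]] = {
	val merged = new HashMap[ColName, Envelope[Any]]()
	for (update <- updates; (col, env) <- update) {
		merged.get(col) match {
			case Some(existing) if existing.timeStamp >= env.timeStamp => {}   // keep the newer value
			case _ => merged.put(col, env)                                     // first sighting, or a newer write
		}
	}
	merged   // tombstoned columns are then filtered out of the result
}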

It is this list of updates that is periodically serialized from memory to an SSTable on disk.

Outdated information is eventually discarded by a background compaction service operating on these SSTables. This happens automatically (e.g. see the gc_grace_seconds configuration flag) or can be triggered manually using the nodetool compact command.

It’s this process of aggregating delta-updates in memory, and periodically serializing them, that makes Cassandra so fast at writes. Of course if a machine dies before it’s had the opportunity to flush updates to disk then that information is lost, but only on that machine. Ideally the other machines to which data has been replicated will not also crash before flushing the update to disk.

If you want to have a look at an SSTable you can convert it to human-readable JSON using sstabledump. To get the current state of the database onto disk you can use nodetool flush to force a flush. Finally, you can also see how much of your table consists of uncompacted tombstone markers using sstablemetadata.

Conclusions

I hope this helps explain all the peculiar ways that Cassandra differs from standard SQL databases.

The idea of a collection of nested HashMaps is probably the best way of thinking about how Cassandra works, how to model data within it, and why certain operations are fast and other operations are slow.

It goes without saying that the true implementation of Cassandra is radically different to this, but this should give an insight into its architecture.

Addendum: Collections & Counters

I’ve already mentioned that in Cassandra it’s pretty common to denormalise tables and have a table for every possible query11. However, one final thing that needs mentioning is that in the layout I’ve described thus far the type of the value is unbounded: it can be anything.

In practice, Cassandra’s support for datatypes is very rich. As well as the usual number, text, timestamp and UUID types, it also supports maps, sets, lists and tuples.

Hence, in place of the many-to-many tables relational DBs often set up, you can often just create one-to-many tables of IDs, e.g.

CREATE TABLE song_tags (
	song_id  UUID,
	tags     SET<TEXT>,
	PRIMARY KEY (song_id)
)

If you ever want to use a set, list, map or tuple of values as a partition-key or (less advisedly) a cluster key you need to “freeze” it into a blob.

Another esoteric type offered by Cassandra is the COUNTER type. This can only be updated, and the update can only increment it by some amount. In a table with a counter, every other column has to be part of the primary key. Counters cannot be indexed.


  1. Of course this is where quick and space-efficient key-existence checks like Bloom filters come in handy

  2. The real reason for tombstones is a combination of the fact that Cassandra uses immutable files for storage, that it replicates and distributes updates over several machines for safety, and that it offers eventual consistency when reading potentially outdated versions from a random subset of those machines. Due to all this, the only safe way to delete something is to write a deletion marker and wait for it to be replicated across the appropriate machines. Note that Cassandra stores a marker for each column separately, so setting a single column to null will create a column-specific tombstone. For more on tombstones, read this

  3. Stored on a particular machine

  4. A particularly neat trick is that one can use the fact that data are explicitly ordered by cluster-keys and so just use LIMIT clauses instead of ORDER BY clauses. It’s very easy to arrange for a table to have the most recent entries at the very top for example.

  5. In fact this can be configured. It’s called the consistency-level, where typically values are ONE to return the very first thing returned, QUORUM to wait for a majority of machines to return, and ALL to wait for all machines to return. The latter causes Cassandra to act much more like a Consistent-Partitioned database like Redis instead of the Available-Partitioned database it’s designed to be. Consistency level affects both writes and reads.

  6. You can directly access this yourself by using the WRITETIME function. This will report the most recent time a value for any column was written.

  7. OLAP, or online analytical processing, refers to database tasks that generate summary statistics (SUM, AVG, etc.) according to certain criteria (or in the lingo, summaries of measures for certain dimensions). OLTP, or online transactional processing, is about adding, deleting and mutating entire records, and collections of records, at a time in a safe, consistent way where every read perfectly reflects the most recently completed write.

  8. You might think this means that the WRITETIME() function has to inspect all columns of a row in order to return the most recent timestamp. In fact if you use the low-level list() function on a table you’ll see every write includes a special value for a column with no name. This anonymous column represents the overall list of columns, and it’s this one that is queried to get the most recent timestamp. You can also get the timestamp for a particular column by selecting WRITETIME(col_name).

  9. Compared to standard RDBMSs. By Cassandra’s standards it’s relatively expensive, as the cluster is locked while the schema change percolates through. Nevertheless, we’re talking a second or two compared to what could be hours or days in RDBMSs with the same amount of data.

  10. This can actually be tuned by different CompactionStrategy classes. SizeTieredCompactionStrategy is the default, but if you do a lot of work with time-series, it may make sense to use DateTieredCompactionStrategy in conjunction with a specific cluster-key ordering to make the most recent data available first.

  11. This data-modelling guide gives a decent overview of what data-modelling on Cassandra involves for those coming from relational databases.

How Apple lost its Scientists

Or how Jeremy Clarkson helped me understand why Apple’s Macbook Pro was such a disappointment.

As a machine-learning professional I can’t avoid deep-learning, and since 14 of the 16 deep-learning toolkits are NVidia-only I need a machine which has an NVidia GPU to do my job. Apple stopped selling such computers back in 2014. Therefore, when Apple released its new Macbook Pros with ATI cards, I joined the chorus of dismay, writing, among many other things:

What’s astonishing is Apple built a pro computer completely around GPUs, the Mac Pro, but chose an ATI GPU. Did they not talk to any end-users?

Recently, Jeremy Clarkson helped me realise where Apple’s gone wrong.

First, let’s consider Apple’s reasoning. They correctly anticipated the need for scientific computing on the GPU, but 80% of the computers they sell are iPhones and iPads, for which NVidia sells no suitable chipset. Therefore they couldn’t use CUDA, NVidia’s proprietary maths library1.

So instead they decided to promote a cross-platform API: OpenCL. NVidia was already number one, so they asked the number two – AMD/ATI – to be a partner.

ATI GPUs have the further advantage of a much lower power draw than NVidia GPUs, which made them a great choice for graphics cards in consumer laptops.

For a prosumer laptop, one can get a lot more OpenCL power by just adding a faster ATI card. And one can make a minimalist desktop like the iMac by reusing a lot of laptop components.

At this stage you’re using ATI the whole way up, so there are significant efficiencies in scale in just standardising across the line, and putting ATI GPUs in the only remaining computer, your professional computer for scientific users, the Mac Pro.

The only problem is you can no more do scientific computing on the Mac Pro than you can write computer games on a machine unsupported by either the Unreal or Unity game engines.

And this is where Jeremy Clarkson comes in.

In the Censored to Censored episode of the Grand Tour, Hammond, May and Clarkson reviewed three SUVs, the Jaguar F-Pace, the Bentley Bentayga and a Range Rover. It all ended with a race around the track, which Clarkson won by cheating: he simply left the dirt road and went cross country.

Finally, it was the turn of the best car here. However I had no intention of relying on my supreme driving skills.

You see the thing is, the Jaguar and the Bentley were designed as road cars and then given some off-road ability

Whereas the Range Rover was designed as an off-road car, and then given some ability to work on the road

This car senses what sort of terrain it’s driving over and then engages or disengages the differentials accordingly.

You could not come up here in the Bentley or the Jaguar. Look at that, look at it! What a machine you are!

This epitomises the difference between the prosumer and the professional. The Bentley and Jaguar are big enough and burly enough to work better on dirt tracks than a standard car, but if you take them off-road, they’ll get stuck instantly. For people whose jobs require off-road capability, only the Range Rover makes sense, because only it has the odd, peculiar, things a road-car would never have: things such as adjustable suspension; front & rear electronic locking differentials; or extremely low gearings.

Great professional hardware – motoring or computing – is created when one starts from needs and works back to a chassis. You will never succeed starting with a chassis and trying to scale up. Form has to follow function, not dictate it.

Yet this is exactly what Apple’s been doing the last five years. Final Cut X had a great new UI, and was an improvement on iMovie, but on release lacked the peculiar features professional video editors need. Photos.app is better than iPhoto, but it lacks all the editing and curation features Lightroom has. The Macbook Pro is faster than a Macbook, but lacks an NVidia GPU powerful enough for scientific computing or game development. The iMacs and Mac Pros are faster than Macbook Pros, but feature the same hardware trade-offs, and so are similarly disqualified from many professions.

Apple did once ship great professional tools. The 2010 15" Macbook Pros had an NVidia graphics card and a Unix OS, but Apple gave it the easy ergonomics of a consumer laptop by using a separate embedded GPU whenever possible to save battery, and providing a macOS shell to make Unix easy. That was a great professional laptop.

In the last five years, Apple has moved away from this philosophy, selling F-Paces instead of Range Rovers. For a while it was safe to do so, since the PC industry was still selling the computing equivalent of Land Rover Defenders: bulky ungainly things that were tolerable at work and a chore at home.

Unfortunately for Apple, Microsoft and Dell are now selling the computing equivalents of Range Rovers in their Surface and XPS ranges, and Apple is in the invidious position where it must sell dongles at a discount in order to lure professionals into purchasing its prosumer PCs. I suspect many won’t: in my case I’ll need a new computer this year, and Apple isn’t selling anything I can use.


  1. If you’re unfamiliar with the way math libraries are structured, here’s an analogy to graphics. At the top end you have math and deep-learning toolkits like Theano and Caffe, which are like game engines such as Unreal or Unity. These are built on BLAS and Lapack libraries, which are like OpenGL and GLUT. CPU vendors often release libraries following the BLAS/Lapack API to expose their chips’ features (e.g. the Intel MKL). Once one starts doing maths on the GPU, there is one additional layer: NVidia CUDA or OpenCL, which are analogous to Metal or Vulkan.

Poached Frogs and Professional Mac Users

The old adage says that if you drop a frog in boiling water he’ll jump out, but if you put him in cold water and slowly heat it up, he’ll happily sit there till he’s poached to fatal perfection.

It’s an awful metaphor. Our English forebears had some grim imaginations.1

Anyway, if one were to ask the frog if he was happy with his new situation, he’d probably say no. Why? Well, the frog would notice it was a bit warmer, but it had been getting warm for a while, so he’d discount that. What he would see was that suddenly there were a lot of bubbles around, and – looking for some tangible detail – the frog would decide it was the bubbles that were to blame for his discomfort.

This is the problem of criticism. People know when they don’t like something, but they often have trouble articulating the root cause of their dislike. Instead they latch onto the most obvious, tangible difference. Film Critic Hulk2 has discussed this in the past in the context of movie reviews, where people may focus on a tangible detail, such as the silly emo-Tobey-Maguire scene in Spider-Man 3, instead of the broader issue, which in that case was the fact that the movie didn’t find a consistent tone in which such a scene could work.

Something similar is happening with the Macbook Pro and its critics. They’re blinded by bubbles – ports and RAM – and haven’t taken the long view necessary to see the true cause of their unease.

Gently Poached Professionals

Things have been getting slowly, inexorably, and continuously worse for Apple’s professional customers over the last five years.

In 2005 Apple suggested professional photographers should use its new app, Aperture. In 2014 Apple discontinued it. Out of the upgrade cycle, users had to pay full whack for Lightroom and re-train.

Final Cut Pro, another app from Apple, stagnated for several years in the noughties, with few updates and no 64-bit support. So it was a great relief when Apple released Final Cut X in 2011. Professionals’ relief turned to ashes when they realised Apple had dropped all the awkward, pernickety little features necessary to get actual work done. Apple promised plugins would address this, but as it had not forewarned developers, plugins were slow to arrive. During this time Final Cut’s users struggled in a way Adobe After Effects’ users didn’t.

Apple has abandoned its scientific computing users. NVidia has massively invested in maths on the GPU – both in software and hardware – while ATI has bleated about open standards and spent a pittance. The result is that of the 16 deep-learning toolkits 14 support NVidia while just two support ATI3. A similar situation exists with general-purpose linear-algebra toolkits. Apple stopped selling computers, of any kind, with NVidia cards in 2014.

This particularly affects me, as I do machine-learning research. There is no machine that Apple sells that I can justify buying for professional use.

What’s astonishing is Apple built a pro computer completely around GPUs, the Mac Pro, but chose an ATI GPU. Did they not talk to any end-users?

Indeed the worst affected are corporate buyers of the Mac Pro, which has seen no update in three years. Not only does it have frequently faulty GPUs and no upgradability, but there’s no hope of any new model in the future, and even if a new model were to arrive, there’s no second-hand market for such old hardware. Like Aperture users, Mac Pro owners have to write off the entire cost and buy something completely new, out of the upgrade cycle4.

This was the background to the Apple event on October 27th, and to the response that followed.

Prosumers and Professionals

Just before Apple’s event, Microsoft had one of its own. They launched two computers – a laptop and a desktop – with touch-screens and styli. The laptop’s screen could be detached to form a tablet; the desktop – which looked like an iMac – could be arranged into an easel. This desktop optionally came with a touch-sensitive dial for fine-grained adjustments. Thanks to their work with Adobe, Microsoft could use Photoshop and Illustrator to demonstrate how well these new computers worked.

These were genuinely novel.5

By contrast Apple announced a slightly faster, slightly smaller laptop. It had largely the same form factor, but with reduced functionality to support its smaller size. It also had a touchbar as a new input-method, and it was very evident that in execution, and even ambition, it was more limited than the dial.

In the thinking that saw both Pixelmator and Photoshop take equal billing in the launch, in Final Cut X’s absent features, in the switch to ATI and the rationale for the 16GB limit on the Macbook Pro, we find the root cause of Apple’s professional malaise:

Apple has conflated professionals – whose needs are awkward and particular – with prosumers who just want a little more of everything. This has coincided with, and may be a result of, a general lack of interest in developing professional software and hardware.

Consider the GPU for instance. ATI GPUs have a far lower power draw than NVidia GPUs. Since casual users value portability, an ATI GPU makes sense. For professionals, only an NVidia GPU is good enough for scientific computing, VR, or game development.

Thus a professional computer should ship with NVidia GPUs irrespective of the adverse effect on battery life and portability. Portability is a secondary concern: professionals work in offices, not coffee shops.

This too is why many professionals would be happy if the most expensive Macbook Pro sacrificed battery-life to raise the RAM ceiling.

Fundamentally, if you try to scale up a machine aimed at casual users, you’ll miss the things professionals need to get work done, since professional needs are often slightly esoteric. Instead one should start with what professionals need, then work down to a chassis.

Modern Apple doesn’t seem to know what professionals need. Microsoft invited several artists to discuss the Surface Studio during its design; film-makers were excluded from the development of Final Cut X.

What’s worse is the laptop-isation extends to the entire line. Having chosen ATI as the best supplier of GPUs for portables, Apple standardised on ATI as a supplier, and now includes mediocre ATI GPUs in desktops and the Mac Pro. These computers consequently fail to meet the needs of gamers, VR developers, or scientific-computing professionals.

Why Bother at All?

As things stand, there is a significant business risk in choosing to purchase professional software or hardware from Apple rather than from Adobe or PC makers. The best possible computer that one can use for Photoshop or Illustrator is a Surface Studio. Apple sells no computer suitable for professionals in machine-learning or scientific computing.

But so what? Macs make up only 12% of Apple’s revenue, and creative professionals and scientists account for a small minority of that 12% in turn.

The reason is Beats Audio.

Apple, you’ll recall, paid $3bn for a maker of mediocre headphones with a second-best streaming service. I couldn’t understand it at first, but in the end I could come up with only one reason: the importance of the humanities to Apple’s brand.

The iPod, and iTunes store, completely changed Apple. What once was a niche PC maker, suddenly became a maker of fashionable, electronic life-style accessories. This required that certain sense of cool that a love of music can provide.

I think Apple bought Beats to reaffirm its commitment to music, and through it, fashion and lifestyle.

The public, conspicuous use of Apple hardware by professionals in art, publishing, music and science similarly adds to Apple’s brand. In particular, it’s an affirmation of Apple’s reputation for creating best-of-class hardware, and of that hardware’s potential to engender successful professional careers.

Apple is now at risk of losing these users and, by extension, damaging its brand. By upscaling casual computers for prosumers, and blithely forgetting to cater to the particular needs of its professional customers, Apple risks losing its professionals altogether.

If that were to happen, Apple would return to where it was in the nineties: a purveyor of undeniably fashionable computers for people who only need to pretend to work.


  1. And quite possibly some terrible eating habits.

  2. Ostensibly the Incredible Hulk writing film reviews, though they occasionally break character to suggest they’re a professional screen-writer in Hollywood pretending to be the Incredible Hulk writing film reviews. Then again they may well be a randomer in a basement pretending to be a screen-writer pretending to be the Hulk writing movie reviews. The point is, they write good movie reviews.

  3. Of course plugins and forks exist to provide partial OpenCL support for some of these, but you would be foolish to bet your professional career on such unsupported software.

  4. In a dismal irony, the Mac Pro, championed as the computer that would reaffirm Apple’s commitment to its professionals, has instead become a monument to their neglect. Not only has Apple not updated the computer, they even forgot to update its product webpage, which, until the furore this month, had embarrassingly touted how well the Mac Pro ran Aperture.

  5. Apologists snarkily griped that the stylus wasn’t as good as the Pencil on an iPad Pro, but there is no version of Adobe Photoshop or Illustrator for the iPad Pro. In fact, due to Apple’s curation of the iOS store, there are very few people developing pro apps for iOS at all.

How Did Donald Do it?

So America’s racist then

Not quite. I think it’s more likely that Americans decided to vote for the guy who promised to make them richer, over the woman who promised to inspire them.

But the poor voted for Hillary

If you look at absolute numbers, yes. However, if you look at the trends, you’ll see that 16 percent of voters earning less than $30,000 swapped from Democrat to Republican this election.

But Donald’s plans would make them poorer

This is true. It’s also irrelevant in an environment where even the mainstream news prefers to report on the frivolities of the race rather than the detail of the policies.

What matters is narrative. Donald had a simple proposal: he’d block all imports from abroad, raise tariffs if anyone tried to outsource jobs, and so protect jobs at home.

Then he’d throw out all the immigrants working for nothing, creating more jobs in turn for proper Americans.

But it’s all nonsense

It’s simple and it sounds right, and that’s all that matters.

In particular, this is why people with jobs, particularly union workers in places like Michigan, voted for Trump. It also explains why almost 1-in-3 “Hispanics” voted for Trump as well.

Union workers and legal-immigrants weren’t going to vote for the woman who secretly told Goldman Sachs she wanted free trade and unlimited immigration.

Incidentally, this also explains why Bernie, who also favoured trade-restrictions, tended to do better against Trump in national polls than Hillary.

So there’s no room for honesty in politics?

There is, it’s just hard.

The thing is, Hillary never came out and said “this is how I’ll make you richer”. She had no simple stump speech. She focused on how awful Donald and his voters were, and how inspirational and right she was.

Meanwhile Donald said vote for me and I’ll protect your job and make you richer.

This should have been obvious. Bill Clinton always knew it was “the economy, stupid”. The example of Mitt Romney is instructive too: he was nearly blown away by Herman Cain’s 9-9-9 plan. It was a terrible plan1, but Herman was the only one who presented Americans with a simple plan to get them rich, quick. It almost won him the GOP nomination.

But still, who could possibly vote for someone so awful?

Here are a few numbers about America in 2016

America is quite a diverse place, and its inhabitants are very different to the staff of most mainstream media.

But more than these “deplorables”, there are people who just don’t care about racism. These are the really dangerous ones, as Edmund Burke eloquently pointed out, and they have always been the majority. Inspiration & justice just don’t feed families the way cash does.

And now we’ve got the world’s most awful president

As bad as Trump is, he’s still much more liberal than the rest of the Republican contenders. For example, while he clearly doesn’t respect women, he doesn’t appear to actively resent them: he was the only primary candidate to support Planned Parenthood. Equally, while he’s egotistical, he doesn’t believe he’s God’s chosen candidate on earth, like Ted Cruz. And as daft as his economic plan is, so too were many of the other Republicans’ plans (consider, for example, Cruz and Rubio, or what Brownback and Jindal did to Kansas and Louisiana).

Even his support for extremists has precedent: in 2000 George W Bush visited Bob Jones University for an endorsement while it still had a ban on interracial relationships on campus, refused to admit gay students and called the Pope an antichrist. It reluctantly dropped the ban shortly after, but replaced overt hostility with covert dog-whistling, and was still visited (and so implicitly endorsed) by John McCain and Mitt Romney subsequently.

In short, Donald Trump comes from the liberal wing of the Republican party, and his views are very much in tune with a good chunk of those of the US electorate.

So I should be reassured?

If anything, no: it means the Senate, Congress and the states – all of which are under GOP control – are unlikely to check Donald’s worst instincts.

Gee, thanks

Yeah, I feel great too…

So let me get this right, you’re saying it’s poor people’s fault

Not quite. Donald did win the far (“alt”) right, gamer-gate vote too.

Trump won voters earning more than $100,000, Hillary won those under $50,000 (the median salary in the US is $55,000). Proper analysis of his primary successes and campaign in the run-up to the election showed that Donald Trump consistently won the vote of prosperous white people, not the working or unemployed poor. Generally the more negative their view of women and minorities, the more voters preferred Trump.

So it’s the fault of union workers and rich racists?

Also people who didn’t bother to vote, roughly 50% of American voters.

So it’s the fault of poor people, racists, and layabouts?

It’s Hillary’s fault too.

Hillary?! But everything she planned would have made people richer!

True, but politics isn’t about being right; neither is it about convincing other people you’re right; politics is about convincing other people that the right thing is worth doing.

As good as she is at policy, Hillary is terrible at politics. It was no accident she trailed not only Barack Obama, but John Edwards, in the 2008 primaries.

She absolutely failed to mobilise votes: if you look at the figures, the number of GOP voters has stayed constant over the last three elections; all that happened this election is that the number of Democrat voters fell. Voters either stayed at home or voted for third parties.

Hillary could easily have rectified this by putting Elizabeth Warren or Bernie Sanders on the ticket beside her: instead she chose an unthreatening, uninspirational nonentity.

What’s more, like Mitt Romney, she was two-faced. She got caught admitting to it in her Goldman Sachs speech, with her discourse on the need for public and private personas. Her speeches to Goldman Sachs didn’t match her speeches on the stump, and were as fatal to her as Romney’s 47-percent speech was to him.

People thought she was two-faced?

They thought a lot worse. During a trip through Nevada, one of the Economist’s reporters found that Trump supporters believed absolutely in things that were absolutely false (Cached version).

In the modern world the gatekeepers of news have been overwhelmed by a huge number of retailers selling news on demand. Consumers have the opportunity to shop around until they find the news that feels right, in short that confirms their biases. The Economist reporter found most Trump voters were finding their preferred “news” on Facebook.

The state of modern journalism is dire. Good reporters who do the research simply don’t generate content fast enough to get ad-clicks: the money is in being first, not correct. In the modern world the audience forgives errors: worse, it forgets them.

Accusations stick in a way exonerations can never clean away.

The result is an electorate that has never been more misinformed.

So what do we learn from all this?

The lesson of Herman Cain, of Mitt Romney, of Brexit and of Donald Trump, is three-fold.

First the majority of voters don’t care about inspiration: they will vote for the person who most plausibly promises to make them richer.

Secondly, many, if not most, people believe that opposing globalisation is a surefire way to get rich.

Thirdly, everyone believes politicians are corrupt and duplicitous, and the longer one stays in politics, the worse one is.

Left-wingers therefore need to find a way of convincing voters that they can make people richer, and present it in a positive way2. Bernie’s approach seemed to resonate, and so provides a good template.

They also need to find a way to sell the electorate on globalisation, which is a hard one, as the negatives (your job is going to China) seem more obvious than the positives (but look how cheap my iPhone is!).

Finally, counterintuitively, they need to choose candidates with minimal experience for leadership. People prefer fresh faces.

Hillary should just have said that if you earn less than $80,000 a year, I’ll make you richer and Donald will make you poorer. So see how much you earn and vote accordingly. Everything else – racism, anti-semitism, feminism – was meaningless to the majority of voters.

So what’s going to happen

It’s going to be a bad four years. Americans will end up poorer. Russia and China will probably win some geopolitical battles as America deliberately enfeebles itself abroad. Trade will worsen, and so will the US deficit.

In Europe, Trumpism is already well underway. Poland and Hungary are led by increasingly autocratic governments robbing their citizens of their rights and oppressing women, gays and minorities. Marine Le Pen will win a substantial vote in France, and Angela Merkel is at risk from the Pegida movement in Germany. Britain is fatally wounded by the Brexit vote, and in the race between political expediency and national prosperity, it appears politics is winning (I’m a UK resident).

The world in four years will be a smaller, poorer, and more vulnerable place.

And then people will see what’s wrong?

In Britain left-wing politics are lost in the wilderness. Meanwhile right-wing parties are well organised. The same is true in the USA.

Even if people see what’s wrong, there may not be any good alternatives.

Um, so, give up then?

For the last 25 years governments have been dominated by career politicians, people with no normal life experience, whose parties and careers have been funded by minorities of people with an excess of free-time and extreme views, both on the right and the left (the latter concentrated in unions).

The only way to change that is for “ordinary” people with ordinary careers to get more involved in politics than ever they would want.

Basically, we all need to join a political party and change it.


  1. By lowering taxes and entitlements, it would have reduced the amount of money in the pockets of the poorer half of America, and increased borrowing, to pay for a tax break for the richer half of America, who already have a greater share of the nation’s wealth than at any time in the last fifty years

  2. People’s reaction to being called a deplorable isn’t to consider their life-choices; it’s to shout go-to-hell at the judgmental asshole who made the accusation. You can only change someone’s mind once you’ve befriended them.

Associated Types and Haskell

Or how to write a type-class containing a function returning a value of another related type-class.

Let’s assume you’re using type-classes extensively to make your business logic independent of your data-representation. Some may argue about whether this is a good idea, but that’s not relevant to what follows. This article explains why you’ll encounter a problem, and how to use the TypeFamilies extension to work around it.

Assume you have a type-class that defines a few functions:

type Position = (Int, Int)

class Player p where
  playerPosition :: p -> Position
  playerMoveTo :: Position -> p -> p
  -- etc. and lots more

Now let’s say this is all part of your game state, which you also define via a type-class:

class GameState g where
  getPlayer :: (Player p) => g -> p
  getMonsterPositions :: g -> [Position]

So far, so good. All of this compiles. Next we create the implementations:

data PlayerData = PlayerData { _pos :: Position, ... }
instance Player PlayerData where
  playerPosition = _pos
  playerMoveTo pos player = player { _pos = pos }
  -- etc..

data GameStateData = GameStateData PlayerData [Position]
instance GameState GameStateData where
  getPlayer           (GameStateData p _) = p
  getMonsterPositions (GameStateData _ mPoses) = mPoses

We want to use this to write a wholly generic function, as follows:

checkForCollisions :: GameState s => s -> [Position] -> Bool
checkForCollisions s ps =
  let
    p    = getPlayer s
    pPos = playerPosition p
  in
  pPos `elem` ps

The problem is that none of this compiles!

Polymorphism vs Monomorphism

To someone coming from the imperative object-orientated world, this seems mysterious, as one could trivially achieve the same effect in imperative OOP languages using interfaces.

The issue is the difference between polymorphism and monomorphism.

Consider the following Java code

interface MyInterface {
  public int someFunction();
}

public static MyInterface genericFunc(int a) { ... }

public static int callGenericFunc() {
  MyInterface mi = genericFunc(42);
  return mi.someFunction();
}

What we are saying is that genericFunc decides which concrete type it returns: it can hand back a value of any class that implements MyInterface, but the choice is made inside genericFunc itself[1]. The caller, callGenericFunc, neither knows nor cares which class it receives, and interacts with the value purely through the interface, via dynamic dispatch.

The following Haskell code looks very similar:

class MyTypeClass t where
  someFunction :: t -> Int

genericFunc :: (MyTypeClass t) => Int -> t
genericFunc = ...

callGenericFunc :: Int
callGenericFunc =
  let mt = genericFunc 42 in
  someFunction mt

In the Haskell version, we are saying the reverse: genericFunc can produce a value of any type t in the MyTypeClass family, and it’s the caller that determines which concrete type t will be. The compiler must therefore pin that choice down to a single type at compile time.

So while the Java compiler renders this generic code polymorphic, by adapting callGenericFunc to work with any value in the MyInterface family, Haskell makes the code monomorphic by choosing a single specific type in the MyTypeClass family and generating variants of genericFunc and callGenericFunc which work on that type.

There are a few advantages to this. On a machine level, forcing everything to a single concrete type allows for static dispatch and therefore function-inlining, which is a performance optimisation[2]. This is why you see monomorphism appearing in recent ML-derivatives like Swift and Rust.

The second advantage is that the typeclass code you write is incredibly generic, and can work in whichever way the caller requires.

However in order for monomorphism to work, the compiler needs to be able to identify the particular type that callGenericFunc will use.

Associated Types to the Rescue

If we look at our example again, we can see the problem

checkForCollisions :: GameState s => s -> [Position] -> Bool
checkForCollisions s ps =
  let
    p    = getPlayer s
    pPos = playerPosition p
  in
  pPos `elem` ps

GameState is a generic typeclass, so the compiler will inspect the code that calls checkForCollisions to choose the specific implementation.

Once it’s chosen an implementation in the GameState family, the typechecker looks at checkForCollisions and sees that getPlayer returns a value of another generic typeclass, Player.

Remember it’s not the implementation of GameState that must determine the type, it’s checkForCollisions, so that’s where the type-checker looks.

Unfortunately, all the code in checkForCollisions is completely generic, so it can’t choose a single concrete type: hence Could not deduce (Player p0).

The solution to this is to allow the implementation of GameState to additionally specify the particular type in the Player family to use.

To do this we use the TypeFamilies extension.

First we alter our type-class to add a type placeholder called PlayerType

{-# LANGUAGE TypeFamilies #-}
{-# LANGUAGE FlexibleContexts #-}

class Player (PlayerType s) => GameState s where
  type PlayerType s :: *
  getPlayer :: s -> PlayerType s
  getMonsterPositions :: s -> [Position]

Essentially PlayerType is a variable that contains a type rather than a value. Consequently it’s annotated with a kind (recall that the “type of a type” is called a kind). In this case the single asterisk means that this should be a concrete type.

Associated types must be tied to (i.e. associated with) the type defined in the type-class, which is why it’s PlayerType s and not just PlayerType.

However, as you can see, we can still constrain the associated type to be in our Player type-class. We need the FlexibleContexts extension in order to express this constraint.

You can have as many associated types as you want, by the way; I’ve just used one in this example for simplicity.

The next step is to assign a particular concrete type in our implementation:

{-# LANGUAGE TypeFamilies #-}

instance GameState GameStateData where
  type PlayerType GameStateData = PlayerData
  getPlayer           (GameStateData p _) = p
  getMonsterPositions (GameStateData _ mPoses) = mPoses

This then solves our problem. The code that called checkForCollisions has already chosen the particular type in the GameState family, and in this example let’s assume the type is GameStateData.

The compiler next looks at checkForCollisions, but now it knows from the GameStateData implementation that the associated type of Player used for getPlayer is PlayerData. Hence the code type-checks, and the compiler has the information it needs to monomorphise it.

And we’ve managed to do this while keeping checkForCollisions completely generic.

Final Thoughts

This only really rears its head once you start making extensive use of type-classes. Since defining and altering types is so easy in Haskell, you can argue that there’s no need for typeclasses. In fact, since abstraction bulks out code, and so can make it harder to read, there’s an argument to be made against the use of typeclasses for abstraction.

However, there are many Haskellers who use typeclasses as a way to write their monadic code in an “effectful” style (emulating the effects features of pure functional languages such as PureScript), and it’s here that they can run into issues, as I did.

In my case, I had a function like

goto :: (HasLevel r, MonadReader r m, HasPlayer p, MonadState p m)
     => Direction -> m MoveResult

And in this case I’d defined HasLevel to return a value in a Level type-class so the game engine could work equally well with different sorts of Level implementations. As it turned out, in the end, I only had the one implementation, so this was an unnecessary and premature abstraction.

In short, I wouldn’t encourage developers to use this on a regular basis. It is a useful trick to know, however, particularly since it’s begun to appear in other, more mainstream ML-derivatives like Swift and Rust.


1. This is not strictly true: you can have an if-statement in Java code which will return one type in one branch, and another type in another branch. The point being that the Java code can only return values from a small subset of types in the MyInterface family, whereas the Haskell code can return a value of any of the types in the MyTypeClass family.
2. Polymorphic code looks up a table for every function call, then calls the function. Static dispatch calls the function directly, without a run-time lookup and so is faster. Inlining skips the function call overhead entirely, copying the function body into the calling code, and so is faster still. However as this copying makes the overall size of your code greater, it can overflow the cache, which will make your code run much much slower. As a result, inlining is not an automatic win.