Thin and Graphy
Normal disclaimer – this is not GraphQL. This is me making stuff up based on a 30-minute conference talk and a gist – and now a tweet!
I spent 10 days working on an implementation of GraphQL queries using Prismatic Graph. It didn’t really work out – through no fault of the library, but because of my misuse of it. I started to write up what I had done and found myself writing pseudocode to explain the actual working code, which seemed like a bad sign: “Pretend this code looked like this much more obvious and explainable code – that’s what it does”.
Then Nick Schrock posted a screenshot of a GraphQL mapping file with the comment:
Key takeaway is thinness of this file
This turned my doubt about whether my current attempt at an implementation was any good into outright disgust at the code I had written.
So I threw it away and started over.
One of the things I struggled with was separating APIs. For my example I’m using the GitHub API, which is an HTTP / REST API. I’m going to write some amount of code which is a client of this API, and then some amount of code which is a GraphQL mapping onto that client code. I kept letting these two things mix. GraphQL was responsible for too much – it was not thin.
If you look at the mapping file you’ll see the code:
This refers to the already existing and working internal Facebook API. It can live fully on its own, oblivious to the existence of GraphQL. At least that is what I tell myself.
Then you have the field definitions:
This maps what GraphQL exposes externally – name – to the internal representation – getName. It doesn’t care if that works over HTTP or SQL or Mongo or a CSV file. That is the job of EntBootcampPet.
Again, just a guess.
This keeps it thin – it just needs to know how to get from point A to point B.
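As a rough Clojure-flavored illustration (my guess at the idea, not Facebook’s actual code, and get-name is a made-up stand-in for getName), a thin field definition just pairs the externally exposed name with a call into the existing client code:

```clojure
;; Hypothetical sketch of a "thin" field definition: the GraphQL layer only
;; knows how to get from the exposed name to the existing internal getter.
(declare get-name) ;; stand-in for the existing client code (think getName)

(def name-field
  {:name     "name"
   :executor (fn [pet _args] (get-name pet))})
```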
Instaparse!
I had previously put off writing a parser. I don’t know how to write a parser. I don’t know what a Context Free Grammar is or anything of that sort. But I decided to take a look at Instaparse over lunch last week. Instaparse is a library by Mark Engelberg for creating parsers in Clojure(Script).
It is amazing.
I love it.
I followed along with the readme during lunch – about 1 hour. The readme is great. Every time I got through a section and thought up a question, the next section immediately answered it.
I got home and within the hour I was able to turn this:
Nodes(1, 2) { name, id, link(desktop), birthday { month, year }, friends.after(20).first(10) as amigos { count, edges { cursor, node { name, location } } } }
into this:
The library is truly awesome. It also works on ClojureScript so I can eventually do those cool Relay demo things and merge queries and stuff on the client side. Booyah.
For full disclosure here is the syntax I came up with:
My tip to aspiring parser writers – when it doesn’t work, keep adding <whitespace> until it does.
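For a flavor of what that looks like – this is a toy grammar I’m sketching here, not the actual syntax from above – a few lines of EBNF get you a working parser, hidden <whitespace> and all:

```clojure
;; A toy Instaparse grammar for a GraphQL-ish query, not the one from this
;; post. Angle brackets hide a rule or literal from the output tree, which is
;; where all the <ws> goes.
(require '[instaparse.core :as insta])

(def parse-query
  (insta/parser
    "field  = name args? <ws>? fields?
     fields = <'{'> <ws>? field (<ws>? <','> <ws>? field)* <ws>? <'}'>
     args   = <'('> arg (<','> <ws>? arg)* <')'>
     <arg>  = #'[^,)]+'
     <name> = #'[a-zA-Z_][a-zA-Z0-9_]*'
     ws     = #'\\s+'"))

(parse-query "created_at { format(YYYY-MM-dd), unix }")
;; => something like
;; [:field "created_at" [:fields [:field "format" [:args "YYYY-MM-dd"]]
;;                               [:field "unix"]]]
```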
Graphs
One of the nice things about my previous implementation (even though it sucked) was that it had an honest-to-goodness graph with nodes and edges. I wanted to keep that idea. I found the writeup behind Zach Tellman’s rhizome library very enlightening on representing graphs with immutable data structures in Clojure.
I started with the following:
And ended up with this:
But really, they are just the same thing. A type is a ‘node’ and a field is an ‘edge’ between two nodes (types) with an executor function. I should really go back and rename ‘node’ and ‘edge’ to ‘type’ and ‘field’ … but it really helped me start off thinking about them in graph terms.
The equivalent of my phpType is just a Clojure map of the organization response from the GitHub API.
Also, as a reminder, keywords are functions in Clojure, which makes simple getters nice to write.
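To make that concrete, here is roughly the shape I mean – the names and exact layout are my own sketch, not the code from the repo:

```clojure
;; Hypothetical graph-as-data: types are nodes, fields are edges between two
;; types, and every edge carries an executor taking the value produced so far
;; plus any arguments parsed out of the query.
(declare fetch-organization) ;; made-up GitHub client call, lives outside GraphQL

(def graph
  {:nodes #{:name :Organization :DateTime :string}
   :edges {:organization {:from :name
                          :to   :Organization
                          :executor (fn [org-name _args] (fetch-organization org-name))}
           :login        {:from :Organization
                          :to   :string
                          :executor (fn [org _args] (:login org))}      ;; keyword as getter
           :created_at   {:from :Organization
                          :to   :DateTime
                          :executor (fn [org _args] (:created_at org))}}})
```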
At this point I have a parsed query tree and a graph of types and fields with each field having an executor function. I now want to do a joint traversal of the query tree and graph executing each field on the way.
I have hard-coded the root call, but an example traversal would look like the following:
expand is the function which does the walk (maybe walk is a better name … I suck at naming). For the graph, our starting point is an edge from name to Organization with the executor function name->organization. The starting point for the query is the root of the parsed query tree.
This ends up being the meat of the implementation. The actual code isn’t that important – it is just worth noting that the joint walk is very little code. This handles scalars, complex objects, one-to-manys, fields which take arguments (e.g. format(YYYY-MM-dd)), etc.
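For a feel of the shape, here is a much-simplified, blocking sketch of such a joint walk – my reconstruction against the made-up data shapes above, not the actual implementation:

```clojure
;; Assumes edges keyed by field name (each with an :executor and optional
;; :cardinality) and a parsed query tree of {:name ... :args ... :fields ...}.
(defn expand [graph value {:keys [name args fields]}]
  (let [edge   (get-in graph [:edges name])
        result ((:executor edge) value args)]
    (cond
      (empty? fields)
      result                                   ;; scalar leaf, just return the value

      (= :many (:cardinality edge))            ;; one-to-many: expand each item
      (mapv (fn [item]
              (into {} (map (fn [f] [(:name f) (expand graph item f)]) fields)))
            result)

      :else                                    ;; single complex object
      (into {} (map (fn [f] [(:name f) (expand graph result f)]) fields)))))
```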
Execute
The execute method is kind of interesting. When we parse a query like “created_at { format(YYYY-MM-dd) }” we create a field which has arguments attached to it.
Execute then applies these arguments to the edge’s executor.
The executor in this case is a function which takes two arguments. We use the clj-time library to turn the datetime into a string with the provided format.
The first argument is the value generated while running the query – for example the organization’s created_at field – while the second argument is the one parsed out of the query itself – the “YYYY-MM-dd” format string.
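A guess at what that executor might look like with clj-time (created-at->formatted is my own name for it):

```clojure
(require '[clj-time.format :as tf]
         '[clj-time.coerce :as tc])

;; First argument: the value produced by the query so far (the created_at value).
;; Second argument: the argument parsed out of the query (the pattern string).
(defn created-at->formatted [created-at pattern]
  (tf/unparse (tf/formatter pattern) (tc/to-date-time created-at)))

(created-at->formatted "2008-02-11T04:37:31Z" "YYYY-MM-dd")
;; => "2008-02-11"
```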
These query arguments are also validated and coerced via a schema. The schema for format happens to be a single string. If instead it were an integer, the parse-args function would have noticed that this executor expects an integer and converted the string representation returned by the query parser into an integer – or thrown an exception if it was not convertible. This is done via Prismatic Schema coercion.
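A rough sketch of that coercion step with Prismatic Schema – the shape of parse-args is my guess, but the coercion machinery is straight from the library:

```clojure
(require '[schema.core :as s]
         '[schema.coerce :as coerce])

;; The query parser always hands us strings; the executor's argument schema
;; says what it actually wants.
(def coerce-int-arg
  (coerce/coercer s/Int coerce/string-coercion-matcher))

(coerce-int-arg "10")  ;; => 10
(coerce-int-arg "ten") ;; => an error value describing the failed coercion
```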
For whatever reason these function fields also make me think about internationalization. I mean, I think the entirety of GraphQL would be great for internationalization. It kind of sucks having to duplicate code on the server and client and then you don’t want to send every language file for your entire application to the client. If you take the Relay type approach you can have each component declare which language keys it depends on – even those which might take parameters. You can then query a language key node via GraphQL and get back just the translated text you need in the language you care about – and no more.
One to Many
For one to many I don’t do anything special – I just make a few more nodes.
The only real difference is that the field :edges is marked with a cardinality of :many.
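A guess at how such a field might be declared – the same shape as any other edge, just tagged so the walk knows to map over the results:

```clojure
;; Hypothetical declaration of the :edges field; the connection / edge type
;; names are made up. Same shape as any other edge, plus a cardinality.
{:from        :FriendsConnection
 :to          :FriendEdge
 :cardinality :many
 :executor    (fn [connection _args] (:edges connection))}
```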
Filters are not implemented, e.g. friends.first(10) or friends.after(12345). There is nothing stopping this implementation from doing filters – just add them to the executor function. I just have some outstanding questions I need to think through on how I want to do filters.
Basically I am curious about how special to treat first and after. It seems like making these two special cases would make my life way easier.
Enter Manifold
In order to be cool nowadays you can’t block. The version of expand I showed above is actually not what I use anymore – it was blocking. Each executor that made an HTTP call would block the traversal until the call finished.
Instead I now use the Manifold library, which has a Deferred implementation (think JavaScript promises). I also use Aleph for making non-blocking HTTP requests which return a Manifold Deferred. Both of these are by Zach Tellman – a gentleman and a scholar, I do say.
Since we are traversing a tree we can just call each branch in parallel all the way down. Manifold has this nice let-flow construct which turns callback code into synchronous-looking code (think await or go blocks) … but I didn’t use it because I’m lazy. So instead the part of expand that calls each branch in parallel looks like:
We call expand on each field and it returns a deferred immediately. We then zip them together (like a JavaScript when(promise1, promise2, promise3, ...)) and turn them into a map. This function also returns a deferred.
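Roughly the shape of that step, assuming each child call to expand already returns a deferred (expand-child here stands in for that call):

```clojure
(require '[manifold.deferred :as d])

(defn expand-fields [expand-child fields]
  (let [deferred-results (map expand-child fields)]
    ;; zip the per-field deferreds together, then chain the realized results
    ;; into a map keyed by field name; the whole thing is itself a deferred
    (d/chain (apply d/zip deferred-results)
             (fn [results] (zipmap (map :name fields) results)))))
```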
Deferreds all the way down.
Middleware
One other notable thing in the non-blocking version is that we have two new parameters – executer and path. Since we have a single place where we call each edge’s execute method, it is easy to wrap this in middleware. That is what the ‘executer’ function is – middleware passed through the graph traversal. Path is just something which tracks how deep in the tree traversal you are – it is purely for logging / diagnostic purposes.
This allows the creation of a middleware function which profiles each execute method along with the path.
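A guess at what that middleware might look like – the executer signature here (path, edge, value, args) is my assumption, not the actual one; the point is that one wrapping function sees every execution:

```clojure
(require '[manifold.deferred :as d])

(defn profiling-executer [profile-atom executer]
  (fn [path edge value args]
    (let [start (System/nanoTime)]
      ;; record the elapsed time for this path once the deferred resolves
      (d/chain (executer path edge value args)
               (fn [result]
                 (swap! profile-atom conj
                        {:path path
                         :ms   (/ (- (System/nanoTime) start) 1e6)})
                 result)))))
```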
When we run a graph we get back the profiled execution.
As you can see, each organization along with its count is fetched in parallel. They all serially wait on the root fetch of modocache because they depend on the user response.
Putting it Together
Another huge inspiration for all of this was Robert Mosolgo’s work on GraphQL. He has an amazing implementation up and running where you can type in queries and explore the schema using the __type__ field. It is so awesome.
I wanted to try something like that so I made this guy. It isn’t as cool but what can you do! Very few fields and roots work, so I wouldn’t expect anything beyond the initial demo query to return anything. Just make sure you scroll down and see the profile graph – it is my favorite part.
The implementation right now is basically my thoughts as they came to me mashed into emacs. There are a lot of names I want to change and protocols I want to add to clean up the execution.
When I first started writing Clojure I felt like I couldn’t really just sit down and type code – I had to think up front about what I wanted to do. I justified this as forcing good design and Hammocks and stuff … but this past week has basically been me at a 1000-line Clojure file with a REPL, exploring ideas with horrible, horrible code. I don’t know if this signifies progress or regression in my Clojure abilities …
Exploring the schema with the __type__ construct is so powerful. I immediately started using it on Robert’s demo to see what type of queries I could write. I need to add that.
I had previously wondered how Facebook handled Node being their root call for damn near everything. Then while writing this post I noticed the fields on their GraphQL definition is a function – not static. I wonder if that is how – they can’t statically tell which fields a Node might have in their graph, but as they expand / execute the graph they can build it out using the fields function. That wouldn’t work for me – the fields and types are defined up front. Food for thought.
So yah, another week, another stab at GraphQL. Good times.