Modeling Queries in a GraphQL-Like Way
Just to be clear, if someone reads this from the future … this is not GraphQL. This is me making stuff up based on a 30-minute conference talk and a gist.
Also, hello future person, thanks for stopping by.
I previously gave my thoughts on GraphQL, as well as the start of a simple server-side query implementation (which is but a small, small fraction of GraphQL). I’d like to share the current status of that implementation, which includes schemas, validations, and one-to-manys.
Schemas, Schemas, Schemas
My week consisted of going to work, being heavily distracted thinking about GraphQL, feeling I had solved the problem I was facing the night before, then getting home and realizing that it wouldn’t work at all.
For my initial attempt I was using Datomic as the datastore, but this ended up being a mistake – Datomic is too easy. Want to walk relationships on your data? Just walk it like a normal Clojure data structure – Datomic will lazily do the rest. But I want the same code to be able to walk an HTTP API, query a database, etc. I needed something generic.
But even Datomic had issues – when the value of an attribute is a sequence, is it something I want to turn into a collection object (e.g. friends.first(10)) or just a component or primitive list that we include directly? Basically I was relying on the shape of the data to create the shape of the response.
Then I saw a tweet – a beautiful, beautiful tweet – from Stuart Halloway:
s/schema/power, ergo schemaless = powerless
It hit me. I needed schemas. Lots of them!
This was actually from December 11th, 2014 which Stuart Sierra seemingly randomly retweeted last week. Timely.
No need to guess on the shape of the data, just make a schema.
So the first schema is the root schema. This is the shape of your viewer or node(id) root call. In my case I decided to use the Github API because it has a lot of public data and is a pretty good representation of an HTTP API. The one root I’m supporting is organization(name).
Oh, I didn’t mention, I’m using Prismatic Schema for the schemas.
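A rough sketch of what the resource schemas might look like – the field names here are my guesses at a useful subset of the Github API, not the actual definitions:

```clojure
(require '[schema.core :as s])

;; the four resources, just a handful of fields each
(s/defschema Organization
  {:login       s/Str
   :name        s/Str
   :description s/Str})

(s/defschema Repository
  {:id          s/Int
   :name        s/Str
   :description s/Str
   :language    s/Str})

(s/defschema Author
  {:login    s/Str
   :name     s/Str
   :email    s/Str
   :location s/Str})

(s/defschema Commit
  {:sha     s/Str
   :message s/Str})
```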
For now we only represent four resources and just a small subset of the fields on those resources. These resources are also lacking relationships. Let’s add them now.
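Adding them looks roughly like this sketch, where relationships get assoc’d onto the base schemas per root call, and a collection-object helper (a guess at it is a bit further down) handles the one-to-many:

```clojure
;; relationships are added per root call; for the organization root the
;; author is a leaf (no commits hanging off of it)
(def org-commit     (assoc Commit :author Author))
(def org-repository (assoc Repository :latest-commit org-commit))

(s/defschema OrganizationRoot
  (assoc Organization
         :repositories (collection-object {:first s/Int :after s/Str} ; filters
                                          s/Str                       ; cursor
                                          org-repository)))           ; node
```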
So this is not the prettiest thing … but it works thanks to Prismatic schemas just being data. There are two reasons we add relationships at the end. The first is that while the schemas can handle recursive definitions, I have no desire to deal with that. The second, and more important, reason is that I actually want the ability to only include a subset of the relationships per root call. For instance an author(:name) root call might have a list of commits, but for the organization(:name) root call the author is going to be a leaf (otherwise you could have an infinite graph).
There’s also a call to collection-object for repositories. This is a helper function which takes a map of filters, a cursor schema, and a node schema, and produces a collection object schema (GraphQL represents one-to-manys via collection objects).
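A guess at what that helper boils down to:

```clojure
;; filters map, cursor schema, and node schema in; a one-to-many
;; "collection object" schema out
(defn collection-object [filters cursor-schema node-schema]
  {:filters filters
   :count   s/Int
   :edges   [{:cursor cursor-schema
              :node   node-schema}]})
```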
We are left with this schema:
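Expanded out, it is roughly this (field names still being guesses):

```clojure
;; the assembled organization root
{:login        s/Str
 :name         s/Str
 :description  s/Str
 :repositories {:filters {:first s/Int
                          :after s/Str}
                :count   s/Int
                :edges   [{:cursor s/Str
                           :node   {:id            s/Int
                                    :name          s/Str
                                    :description   s/Str
                                    :language      s/Str
                                    :latest-commit {:sha     s/Str
                                                    :message s/Str
                                                    :author  {:login    s/Str
                                                              :name     s/Str
                                                              :email    s/Str
                                                              :location s/Str}}}}]}}
```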
This schema represents all the possible fields and relationships that are exposed by the organization root call. This is nice because we can publish the schema for potential users as well as validate incoming queries.
Speaking of queries – they are also modeled as a clojure data structure.
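Something in the spirit of this hypothetical structure, with each entry being a field name, its arguments, and its children:

```clojure
;; hypothetical shape, not the exact structure:
;; each entry is [field arguments children], nil when a field takes no arguments
(def query
  [:organization {:name "facebook"}
   [[:name nil]
    [:repositories {:first 10}
     [[:count nil]
      [:edges nil
       [[:node nil
         [[:name nil]
          [:description nil]]]]]]]]]])
```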
This is pretty ugly compared to the string query (those nils …) but we can validate and transform the query into the actual shape of our data using the organization root schema defined above.
graph/build takes the root schema, OrganizationRoot, and the query structure, query, and produces an output schema or throws an exception on an invalid query.
We’re now left with a query as a schema which we want to execute. Handy!
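A sketch of the call:

```clojure
;; an output schema describing the shape of the response for this query;
;; throws if the query asks for something OrganizationRoot doesn't expose
(def output-schema
  (graph/build OrganizationRoot query))
```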
Executors
In the Unofficial Relay FAQ it is mentioned that GraphQL ‘traverses the nodes evaluating an executor which uses the definitions from the schema to retrieve objects’. So we have the concept of an executor. Let’s steal that.
An executor takes a schema and returns data which satisfies that schema. A simple example would be for a leaf node in our graph – the author of a commit. In order to understand the task at hand we need to look at the shape of a commit and author from the Github API.
I’ve limited the json to just the fields we care about.
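Roughly, and trimmed to the interesting bits, the two responses look like this once the JSON is parsed into Clojure maps (the keys are approximate, from memory of the Github API):

```clojure
;; GET /repos/{owner}/{repo}/commits/{sha} -- a commit, with an author summary
{:sha    "..."
 :commit {:message "..."
          :author  {:name "..." :email "..." :date "..."}}
 :author {:login "..." :avatar_url "..."}}

;; GET /users/{login} -- the full author (user) resource
{:login    "..."
 :name     "..."
 :email    "..."
 :location "..."}
```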
As you can see, the commit contains much of the information about the author. Github calls these summaries, and if you’ve made an HTTP API you’ve probably done the same thing. For example, a use case comes up for showing the commit author’s name, but we don’t want to make a request for every commit – so let’s just throw the author’s name in the commit. Oh, email too. And login, and … you get the idea.
This is what we do and this is what GraphQL can eliminate.
Instead we have our own public schema.
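For the author, that ends up being something like this sketch, where which Github resource a field actually lives on is nothing the client has to care about (the field set is my guess):

```clojure
(s/defschema Author
  {:login    s/Str   ; available on the commit summary
   :name     s/Str   ; available on the commit summary
   :email    s/Str   ; available on the commit summary
   :location s/Str}) ; only on the full user resource
```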
It’s really nice having this schema separated out. The client can model the data without needing to worry about the shape our internal data was forced into due to technical requirements. So instead of having to know which author fields are on a commit and which are on an author (… which I really should have called user, but I’m an idiot), we expose them all on author and hide on the server backend whether or not a separate HTTP request was required.
This is the job of the executors.
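Roughly, they look like this sketch. The protocol and executor names are the ones I talk about below; the HTTP plumbing (clj-http here) and the exact field handling are stand-ins:

```clojure
(ns graph.executors
  (:require [clj-http.client :as http]   ; assumption: clj-http for requests
            [clojure.string :as str]
            [schema.core :as s]))

(defprotocol IExecutor
  (execute [this schema] "Return data satisfying schema."))

;; Resolves a commit's author. If the summary data copied onto the commit
;; already satisfies the requested schema, no HTTP request is made;
;; otherwise the full user resource is fetched.
(defrecord CommitAuthorExecutor [summary]
  IExecutor
  (execute [_ schema]
    (if (nil? (s/check schema summary))
      summary
      (:body (http/get (str "https://api.github.com/users/" (:login summary))
                       {:as :json})))))

;; Fetches the commit list for a repository, keeps the most recent commit,
;; and hangs a CommitAuthorExecutor off of it under :author.
(defrecord LatestCommitExecutor [url]
  IExecutor
  (execute [_ schema]
    (let [commits (:body (http/get (str/replace url #"\{/sha\}" "")
                                   {:as :json}))
          commit  (first commits)
          summary (merge (get-in commit [:commit :author]) ; :name :email :date
                         (:author commit))]                ; :login :avatar_url ...
      (assoc commit :author (->CommitAuthorExecutor summary)))))
```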
So this code is a bit rough – work in progress I say!
Let’s start with the second executor: LatestCommitExecutor. This executor requires a url (in this case “https://api.github.com/repos/facebook/react/commits{/sha}”) and, when executed, fetches that url and pulls the first commit from the result. It then associates a second executor, CommitAuthorExecutor, onto the result under the key :author.
I don’t want to get too much into the details, but when walking the graph if an ‘IExecutor’ is encountered as the value of a field it is executed with the expected schema.
So in our above case, if the schema for the commit doesn’t include any information about the author, we’ll never execute the CommitAuthorExecutor.
The CommitAuthorExecutor is neat in that it may or may not make an HTTP request. It checks if the data it has from the commit is enough to satisfy the query schema. If it is, no HTTP request is made. If it is not, we fetch the full author resource from Github. In our case the only field which forces an HTTP request is location.
This pattern is how I model the whole server side query execution. We start with a root executor, in this case an OrganizationRootExecutor, and just walk the desired output schema, calling any executors we meet along the way. Only as much work is done as is minimally required.
One-to-Many
One last example of an executor is handling a one-to-many. In the current graph we only have one of those – Organizations to many Repositories.
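A sketch of that executor – lazy-resources and apply-filters are shown a bit further down, and the commits_url handed to LatestCommitExecutor is the templated url that Github puts on each repository resource:

```clojure
;; One-to-many: an organization's repositories as a collection object.
(defrecord RepositoriesExecutor [url filters]
  IExecutor
  (execute [_ schema]
    (let [repos (apply-filters (lazy-resources url) filters [:after :first])
          edges (map (fn [repo]
                       {:cursor (:id repo)  ; dumb cursor: just the id
                        :node   (assoc repo
                                       :latest-commit
                                       (->LatestCommitExecutor (:commits_url repo)))})
                     repos)]
      {:count (count repos)  ; note: counting realizes the filtered seq
       :edges edges})))
```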
This code does three main things:
- produce the shape of a collection object – count, edges, cursors, and nodes
- filter the collection with the supplied filters (left out above, but shown below)
- setup each node with an executor
We are making use of the laziness of sequences here. All of the work to create the edges isn’t done until the first item is taken from the sequence. And then any work to create the individual repositories isn’t done until the node is walked.
lazy-resources is kind of cool. You give it a url and it will page through the results on demand – but again, not until you take the first item from the sequence.
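A sketch, assuming clj-http parses Github’s Link header into :links on the response:

```clojure
(defn lazy-resources
  "Lazily page through a Github collection endpoint. Nothing is fetched
   until the first item is taken; later pages are fetched on demand."
  [url]
  (lazy-seq
    (let [{:keys [body links]} (http/get url {:as :json})
          next-url             (get-in links [:next :href])]
      (concat body
              (when next-url
                (lazy-resources next-url))))))
```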
This makes implementing the filters trivial. The first filter is a simple take and after is a drop-while.
This does have limits – the cursor is very dumb, just an id. A smarter cursor could include a page number at which point we wouldn’t want to start at the first page.
The full filtering code is just a reduce over the filters.
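Something like this sketch, with the call order passed in explicitly:

```clojure
(defn apply-filters
  "Apply the query's filters to a (lazy) seq of resources, in the
   order the executor specifies, e.g. [:after :first]."
  [resources filters order]
  (reduce (fn [rs k]
            (if-let [arg (get filters k)]
              (case k
                :first (take arg rs)
                :after (rest (drop-while #(not= arg (:id %)) rs))
                rs)
              rs))
          resources
          order))
```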
One drawback of the way I model filters in the query is that they are a map – we lose the call order. Instead the call order is specified in the executor (e.g. [:after :first]). I’m not sure if that is going to end up limiting things.
Another point is that each collection specifies which filters it supports. An example from Nick Schrock is a birthday_in_range filter on the friends one-to-many. Clearly not every one-to-many will support birthday_in_range, so the schema needs to be able to specify which filters are valid.
Our Repositories schema expands to:
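Roughly this, using the collection-object sketch from earlier:

```clojure
;; the :repositories entry of OrganizationRoot
{:filters {:first s/Int    ; take the first n repositories
           :after s/Str}   ; resume after this cursor (just an id for now)
 :count   s/Int
 :edges   [{:cursor s/Str
            :node   org-repository}]}
```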
Right there in the schema we say which filters are supported (first and after) as well as the arguments expected. If we want to add a new filter, we can.
Sample Request
For now I have a hard-coded query – I need to write the client side of the story in order to generate queries on demand, plus I don’t want to piss off Github since it isn’t hard to make a query that would trigger hundreds of HTTP requests. The result can be viewed in this gist or by hitting a Heroku endpoint, which I don’t know how long I’ll keep up. The code is on Github, but don’t expect much – my understanding of GraphQL changes every day and this is the first non-trivial thing I’ve done with Clojure.
Looking Forward
So this executor thing works, but I could see writing them getting tedious. I mean, I already hate it and I’ve written like five. They are also opaque – function calls which might call other function calls which might result in HTTP requests, but none of that is visible. What I’d like to do is represent executors as edges between schemas.
The first thing to realize is that the Github HTTP response for a commit is not the Commit schema of your API. Instead you want to model it as its own personal schema – CommitResponse. You might also want to massage the JSON into a more fitting shape, which could be a new schema – CommitResource. So you can end up with an edge between CommitResponse and CommitResource which is the result of applying a transform function (executor?).
An incoming query has its own schema which is strictly a subset of the full Commit schema (the public one of your API). I want to be able to model the steps required to fulfill the data for that schema.
We end up with a graph.
Our starting point will be a query schema – which is a subset of Commit – and a commit root call Commit("sha") which has a schema CommitRoot. An edge has been defined between CommitRoot and CommitResource. In this case executing that edge is going to make an HTTP request. There is then another edge between CommitResource and Commit which purely transforms the data between representations.
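Purely speculative, but those edges could be written down as plain data – the symbols below are placeholders, nothing here exists yet:

```clojure
;; hypothetical: each edge names its endpoint schemas and the function that
;; gets you from one to the other
(def edges
  [{:from 'CommitRoot     :to 'CommitResource :run 'fetch-commit-resource} ; HTTP request
   {:from 'CommitResource :to 'Commit         :run 'resource->commit}])    ; pure transform
```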
At this point I’d love to be able to say (plan CommitRoot query) and have something come out – I’m not sure exactly what. This would now make my execution of a query representable as data. That seems much better than having it be an opaque function.
This should sound familiar – it is the same motivation around Prismatic Graph. I need to think harder about whether this can just be thrown into that library.
Another benefit of query as data is it opens up the possibilities of middleware. I’d love to execute a query and then get information about how long each executor took, how many http requests were made, etc. That would be sweet.
So ya, I have another week of being distracted at work to look forward to. Thanks Facebook!