Search APIs with HTTP interfaces

I had a brief exchange with Erik Wilde (@dret) on Twitter about REST and query languages, specifically SPARQL and whether SPARQL DESCRIBE could be made RESTful with GET and PUT.

SPARQL DESCRIBE is a somewhat controversial feature of SPARQL that returns an RDF graph “around” the query, including all the referenced URIs and some sort of domain specific context. It is less well defined than CONSTRUCT, which also extracts a graph.

But there are not many guidelines for how to actually write useful interfaces for search type functionality for REST web service APIs.

A lot of this applies to other collection type interfaces, collection in the Atom sense, which is essentially a predefined filter rather than one you can submit a query to. Resource collections are a fundamental feature of any real world system: you need to be able to take a set of resources and manipulate it as one. Query languages are generally an efficient way of defining interesting collections.

I do not specifically refer to SPARQL here. Like many of the W3C recommendations, it is XML document oriented in its specification, not HTTP oriented, and so does not try to address the types of issues here. I have some issues with the W3C approach of sidestepping implementation issues; indeed the whole idea of RDF as Prolog wrapped in XML does not seem very productive, and I think a more pragmatic approach drawn from practice will work better, as with some of the other W3C initiatives. Items like DESCRIBE imply much more domain knowledge, and it is unclear what interoperability would mean.

What's the problem with query languages and REST?

The main issue is that a SQL style SELECT should map to an HTTP GET, but instead we are passing a query with another verb in it. One method of avoiding this would be to work with predefined refinement “queries”, where you could walk through paths of filters from a resource that represents everything. Imagine a faceted navigation system drilling down to construct what becomes a filter query through a selection of refine links (where you might end up with a hierarchical URL rather than a query document). This might work better in a system with a finite vocabulary, although it could potentially still be navigated via forms. However this starts to work less well with complex nested queries with sorting, limits and subterms, as the number of options is large, and it would be complex to make a comprehensive set of link types to make them understandable (you might just end up tokenizing SQL in links, which does not seem elegant).
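To make the refine link idea concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the /items root resource, the colour and size facets, and the convention that each refine link appends a facet/value pair to the path. It only covers flat equality filters, which is exactly where the approach is comfortable.

```python
# Hypothetical finite vocabulary of facets and their allowed values.
FACETS = {"colour": ["red", "green"], "size": ["small", "large"]}


def path_to_filter(path, root="/items"):
    """Turn a hierarchical URL path such as /items/colour/red into a filter dict."""
    segments = [s for s in path[len(root):].split("/") if s]
    # Segments alternate facet name, facet value.
    return dict(zip(segments[0::2], segments[1::2]))


def refine_links(path, facets=FACETS, root="/items"):
    """List the refine links a client could follow from the resource at `path`."""
    chosen = path_to_filter(path, root)
    return [
        f"{path.rstrip('/')}/{facet}/{value}"
        for facet, values in facets.items()
        if facet not in chosen  # only offer facets not yet narrowed
        for value in values
    ]


links_from_root = refine_links("/items")
links_from_red = refine_links("/items/colour/red")
final_filter = path_to_filter("/items/colour/red/size/small")
```

Starting from /items, the resource that represents everything, a client only ever issues GETs on links it was given. Sorting, limits and nested subterms have no obvious home in this scheme without inventing many more link types.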

REST search and GET queries

First off, for standard read only GET queries, there is one model that is really useful and too rarely used. For some reason most search queries in “REST” frameworks involve submitting a query as a POST request, because the query string would be too long for a GET request. This however generally does not create a resource; it just returns a response, or a temporary response resource. CMIS for example works along these lines. The sane model is to submit a query document with POST (or PUT), have it persist until DELETEd, and be able to requery as needed. It should also be possible to create query resources with unbound variables, which a system can use like prepared queries in SQL databases; in particular, creating them gives the system a chance to work out whether it needs additional indexes based on the queries that have been created. These can then be evaluated by using query strings to bind the parameters. For some reason you rarely if ever see a query specification for a REST API that has unbound variables in the query, thus missing out on the ability to use this pattern.

Note that the query itself is a resource, so we can for example DELETE the prepared query, e.g. http://example.com/query. There are also resources corresponding to the result sets, e.g. http://example.com/query?q=red.
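The prepared query pattern can be sketched with an in-memory stand-in for the server. This is only an illustration under assumptions of mine: the QueryStore class, the /query/1 style paths, and a toy query document in which a value starting with "?" is an unbound variable. None of this comes from an existing API or query language.

```python
from urllib.parse import parse_qs, urlsplit


class QueryStore:
    """In-memory stand-in for a server that exposes prepared queries as resources."""

    def __init__(self, items):
        self.items = items   # the data being searched
        self.queries = {}    # path -> stored query document
        self._next_id = 1

    def post(self, query_doc):
        """POST a query document; returns the path of the new query resource."""
        path = f"/query/{self._next_id}"
        self._next_id += 1
        self.queries[path] = dict(query_doc)
        return path

    def get(self, url):
        """GET a result set, binding unbound variables from the query string."""
        parts = urlsplit(url)
        query_doc = self.queries[parts.path]  # a KeyError plays the role of a 404
        params = {k: v[0] for k, v in parse_qs(parts.query).items()}
        bound = {}
        for field, value in query_doc.items():
            if isinstance(value, str) and value.startswith("?"):
                bound[field] = params[value[1:]]  # bind ?q from ...?q=red
            else:
                bound[field] = value
        return [
            item for item in self.items
            if all(item.get(f) == v for f, v in bound.items())
        ]

    def delete(self, path):
        """DELETE the prepared query itself (not the data it matches)."""
        del self.queries[path]


store = QueryStore([
    {"name": "apple", "colour": "red"},
    {"name": "lime", "colour": "green"},
    {"name": "cherry", "colour": "red"},
])
path = store.post({"colour": "?q"})    # create the prepared query once
red = store.get(path + "?q=red")       # requery as often as needed
green = store.get(path + "?q=green")
```

The point of the design is that the long query document travels once, in the POST, and every later evaluation is a plain cacheable GET with a short query string. A real server could also inspect the stored documents when deciding which indexes to build.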

As well as unbound variables for constants, another feature missing from most query languages (probably because they were not designed for HTTP transport) is the ability to specify a URL that corresponds to a subexpression. This can be considered as another extension of unbound variables; you can also see it as extending the language to support SQL style views that can themselves be queried. Like a view, you could either use the definition or the current materialized version for performing further queries.
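As a rough sketch of what a URL for a subexpression might look like, the toy query documents below carry an optional "from" field naming another query resource, much like selecting from a view. The document format, the "from" and "where" keys and the /query/... paths are all hypothetical, and this version always uses the definition rather than a materialized result.

```python
def evaluate(url, queries, items):
    """Evaluate the query stored at `url`, resolving a "from" URL recursively."""
    doc = queries[url]
    # A "from" URL names another query whose results are the input, like a view.
    source = evaluate(doc["from"], queries, items) if "from" in doc else items
    return [
        item for item in source
        if all(item.get(k) == v for k, v in doc["where"].items())
    ]


queries = {
    "/query/red": {"where": {"colour": "red"}},
    # The subexpression is referenced by URL instead of being repeated inline.
    "/query/small-red": {"from": "/query/red", "where": {"size": "small"}},
}
items = [
    {"name": "apple", "colour": "red", "size": "large"},
    {"name": "cherry", "colour": "red", "size": "small"},
    {"name": "pea", "colour": "green", "size": "small"},
]
small_red = evaluate("/query/small-red", queries, items)
```

A server that kept materialized results for /query/red could substitute them for the recursive call, which is the same choice a database makes between a plain view and a materialized one.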

REST and update queries

Although it rarely seems to be done, there is no particular reason that once you have a query resource like the ones defined above, you should not be able to DELETE http://example.com/query?q=red to delete all red objects, or PUT a new set of red objects to replace the existing ones. Most collection oriented protocols, such as Atom, do not allow PUT and DELETE methods on collections, only on the member resources. POST is allowed in Atom, to define collection membership; this only makes sense where collection membership cannot be defined by visible properties of a resource; the instruction “create this item with whatever properties are needed to put this item in this query result” is not necessarily something that works for any domain. Certainly one can think of models in which allowing PUT and DELETE on search result resources would be useful, although there might be issues with paging and large document sizes on some types of resource. PATCH would be a useful addition for doing the equivalent of SQL UPDATE, in order to reduce the document sizes, rather than having to PUT back everything.
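A sketch of what writable result sets could mean, again with an in-memory stand-in rather than real HTTP. The ResultSets class, the ?q=<colour> convention and the rule that a PUT body may only contain items matching the query are assumptions of mine, chosen to keep the example small.

```python
from urllib.parse import parse_qs, urlsplit


class ResultSets:
    """In-memory stand-in for a server whose query results are writable."""

    def __init__(self, items):
        self.items = list(items)

    def _colour(self, url):
        # Hypothetical convention: ?q=<colour> selects items by colour.
        return parse_qs(urlsplit(url).query)["q"][0]

    def get(self, url):
        colour = self._colour(url)
        return [i for i in self.items if i["colour"] == colour]

    def delete(self, url):
        """DELETE every item in the result set, like SQL DELETE ... WHERE."""
        colour = self._colour(url)
        self.items = [i for i in self.items if i["colour"] != colour]

    def put(self, url, new_items):
        """PUT replaces the whole result set with the supplied items."""
        colour = self._colour(url)
        if any(i["colour"] != colour for i in new_items):
            raise ValueError("every item PUT here must match the query")
        self.delete(url)
        self.items.extend(new_items)

    def patch(self, url, changes):
        """PATCH applies the same changes to each match, like SQL UPDATE."""
        for item in self.get(url):  # select first, then modify
            item.update(changes)


server = ResultSets([
    {"name": "apple", "colour": "red"},
    {"name": "lime", "colour": "green"},
    {"name": "cherry", "colour": "red"},
])
server.patch("/query?q=red", {"ripe": True})
ripe_after_patch = [i["name"] for i in server.items if i.get("ripe")]
server.put("/query?q=red", [{"name": "tomato", "colour": "red"}])
names_after_put = [i["name"] for i in server.items]
server.delete("/query?q=red")
names_after_delete = [i["name"] for i in server.items]
```

The PATCH body here is a small set of changes instead of the full result set, which is the saving in document size mentioned above. Paging and consistency are ignored entirely in this sketch.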

There may of course be other domain issues, and in some systems consistency issues with mass deletion and insertion that mean actions on individual items rather than collections are more efficient, but extending the uniform HTTP interface to collections should certainly not be ruled out for many domains.


One Comment

  1. “Most collection oriented protocols, such as Atom, do not allow PUT and DELETE methods on collections, only on the member resources.”

    AtomPub just does not define the behavior. The server is free to use DELETE and PUT for collection resources.

    Posted November 28, 2009 at 23:31
