Unusually, there has been a significant change to the HTTP protocol this week. The PATCH method was approved by the IETF.
This is a big change because one of the defining parts of the HTTP model is its small “uniform interface”: there are very few things you can do to web resources. GET, the most common, retrieves a representation of a resource. PUT updates a resource and DELETE deletes it. Then there is POST, which tends to cover everything else you might want to do. The problem with that is that the interface behind a POST is hard to discover, and so is exactly what it will do. (There are a few other methods too, such as HEAD and OPTIONS.)
PATCH is much more straightforward. PUT replaces an entire resource with a new version, while PATCH just makes an amendment to it. For some types of resource the full representation may be large, so sending just the differences saves bandwidth. Sending the full resource may also force changes to be serialized unnecessarily, for example with append operations where the order of the operations does not matter. One example given is a log file that many processes add entries to. If each process had to retrieve the whole log, append its entry and write it back, there would be a lot of extra traffic, and either updates would be lost or processes would have to retry whenever the resource was modified in the meantime. Clearly a PATCH operation that does an append makes sense here. I am not sure that is actually a very good example, though, as you would almost certainly create a resource for each log entry rather than one for the whole lot, but other similar patterns clearly exist.
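The lost-update problem can be shown without any networking. This is a minimal sketch, assuming a hypothetical in-memory `LogResource` class that stands in for a web resource and an assumed "append" meaning for PATCH; the two clients' operations are interleaved by hand rather than run concurrently.

```python
class LogResource:
    """Hypothetical in-memory stand-in for a log resource (not a real server)."""

    def __init__(self):
        self.entries = []

    def get(self):
        # GET returns a copy of the full representation.
        return list(self.entries)

    def put(self, new_entries):
        # PUT replaces the entire resource state.
        self.entries = list(new_entries)

    def patch_append(self, entry):
        # PATCH with an assumed "append" meaning: only the change is sent.
        self.entries.append(entry)


def interleaved_put(log):
    # Both clients read the same version, append locally, then PUT.
    a = log.get()
    b = log.get()
    a.append("client A")
    b.append("client B")
    log.put(a)
    log.put(b)  # overwrites A's update: the lost update


def interleaved_patch(log):
    # Each client sends only its own entry; their relative order is irrelevant.
    log.patch_append("client A")
    log.patch_append("client B")


put_log = LogResource()
interleaved_put(put_log)
print(put_log.get())  # ['client B']

patch_log = LogResource()
interleaved_patch(patch_log)
print(patch_log.get())  # ['client A', 'client B']
```

With PUT, one client's entry silently disappears unless the server detects the conflict and forces a retry; with the append-style PATCH, both entries survive.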
HTTP is not a filesystem
When it was new, people tended to treat HTTP like a filesystem. After all, that was the common model for storage, and web servers generally stored web pages as files, so they tended to be treated much like files, with filename extensions and index files, and WebDAV was created to try to make the web usable as a filesystem protocol. This model does not work very well, however, as it does not capture what you can and can't do with HTTP. The methods are one example: because PUT updates an entire resource at once, resources tend to be small units rather than, say, log files. Filesystems generally struggle to store millions of tiny files without wasting a lot of space and slowing down. A web resource does not have to support full Unix filesystem semantics (a topic that, oddly, Wikipedia seems to be missing an entry on! I may have to rectify that), and it supports a much simpler update model.
Maybe the easiest way of thinking about HTTP is to see every URL as a small Model View Controller (MVC) system. The model is an abstract resource, which we can see through GET, which retrieves views. There can be multiple views, as a request can ask for different media types and encodings of a single resource. The controllers are the media types supported by PUT, which are usually the same as those for GET but need not be. Because both the view and the controller need to represent the whole resource state, they tend to be quite complex representations, such as XML documents. PATCH, however, is also a controller, and in many ways a more interesting one, as it can send just state changes to the model, which is how many MVC systems tend to work.
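The MVC reading can be sketched in a few lines. This assumes a hypothetical `Resource` class holding a dict as the model, two example media types as views, and a merge-style PATCH body in which each key replaces a field and `null` removes it; none of this is a real framework API.

```python
import json


class Resource:
    """Hypothetical URL-as-MVC: one model, several views, two controllers."""

    def __init__(self, state):
        self.model = dict(state)

    def get(self, media_type):
        # Views: the same model rendered in different media types.
        if media_type == "application/json":
            return json.dumps(self.model, sort_keys=True)
        if media_type == "text/plain":
            return "\n".join(f"{k}: {v}" for k, v in sorted(self.model.items()))
        raise ValueError("406 Not Acceptable")

    def put(self, body):
        # Controller 1: PUT must carry the whole state.
        self.model = json.loads(body)

    def patch(self, body):
        # Controller 2: PATCH carries only the changed fields (assumed format).
        for key, value in json.loads(body).items():
            if value is None:
                self.model.pop(key, None)  # null removes a field
            else:
                self.model[key] = value


r = Resource({"title": "Draft", "status": "open"})
r.patch('{"status": "closed"}')
print(r.get("application/json"))  # {"status": "closed", "title": "Draft"}
```

The PATCH body only mentions `status`; the rest of the model is left alone, whereas a PUT would have to resend `title` too.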
Another thing PATCH enables is resources that hide some of their state. A resource could support PATCH but not PUT, so that the only way to modify its state is to send changes. If the state returned by GET is not the complete state, the resource can keep parts of the model hidden. An example could be a voting resource that accepts votes as PATCH requests and records who has voted without revealing it, returning only the totals.
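A minimal sketch of such a voting resource, assuming a hypothetical `VoteResource` class with HTTP status codes returned as plain integers; the rule that a second vote from the same voter replaces the first is an illustrative choice, not something specified here.

```python
from collections import Counter


class VoteResource:
    """Hypothetical voting resource: accepts PATCH, hides who voted."""

    def __init__(self):
        self._ballots = {}  # hidden part of the model: voter -> choice

    def patch(self, voter, choice):
        # Each PATCH records one vote; a repeat vote replaces the earlier one.
        self._ballots[voter] = choice
        return 204  # No Content

    def put(self, _body):
        # Replacing the whole state is not allowed.
        return 405  # Method Not Allowed

    def get(self):
        # GET exposes only totals, never the voter -> choice mapping.
        return dict(Counter(self._ballots.values()))


votes = VoteResource()
votes.patch("alice", "teama")
votes.patch("bob", "teamb")
votes.patch("carol", "teamb")
print(votes.get())  # {'teama': 1, 'teamb': 2}
```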
Server side scripting
One PATCH format that makes a lot of sense is executable code, rather than, say, diff files. There is no reason why you should not send the server a PATCH request containing some Javascript that modifies the DOM of an HTML resource and is executed on the server side, or an XSL transform that modifies an XML document. Sending code is an efficient way of making changes to a resource, and it can be executed in a sandbox like the browser sandbox. This will be another driving factor for Javascript on the server side, as it is well suited to this kind of embedding and already has a DOM model for transforms.
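Python's standard library has no sandboxed Javascript or XSLT engine, so this sketch swaps in a much weaker form of the idea: the PATCH body is a tiny script in a hypothetical JSON format of `[operation, path, argument]` triples, and the server only runs operations from a whitelist. The whitelist plays the role of the sandbox; running arbitrary code would need far stronger isolation.

```python
import json
import xml.etree.ElementTree as ET


# Each whitelisted operation takes (root, path, argument); paths use
# ElementTree's limited XPath subset, relative to the document root.
def op_set_text(root, path, text):
    for el in root.findall(path):
        el.text = text


def op_append(root, path, tag):
    for el in root.findall(path):
        ET.SubElement(el, tag)


ALLOWED = {"set_text": op_set_text, "append": op_append}


def apply_patch_script(xml_text, script_json):
    """Apply a hypothetical JSON patch script to an XML document."""
    root = ET.fromstring(xml_text)
    for op, path, arg in json.loads(script_json):
        if op not in ALLOWED:
            # The "sandbox": anything not on the whitelist is refused.
            raise PermissionError(f"operation {op!r} not allowed")
        ALLOWED[op](root, path, arg)
    return ET.tostring(root, encoding="unicode")


doc = "<page><title>Old</title><body/></page>"
script = '[["set_text", "title", "New"], ["append", "body", "p"]]'
print(apply_patch_script(doc, script))
```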
All these changes take us further away from the filesystem model. Web resources will increasingly combine some storage with some computation, including the ability to execute code in a controlled way. Smart resources will become more common than dumb, storage-only resources.
In other news
Next in line for HTTP, hopefully, is the Link header, which lets a server attach links to documents in legacy formats that have no native hyperlinking capability. This will allow relationships between these documents, such as a link to metadata or to other related resources, to be included when the resource is retrieved. The HTTP replacement for the filesystem model is getting serious.
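The header carries URI references in angle brackets followed by parameters such as `rel`, roughly the syntax later standardized as Web Linking in RFC 5988. This parser is a simplified sketch, assuming that quoted parameter values contain no commas or semicolons; the example URLs are placeholders.

```python
import re

# One link-value: <URI> followed by zero or more ";param" segments.
LINK_RE = re.compile(r'<([^>]*)>\s*((?:;\s*[^;,]*)*)')


def parse_link_header(value):
    """Simplified Link header parser returning (target, params) pairs."""
    links = []
    for match in LINK_RE.finditer(value):
        target, params = match.group(1), match.group(2)
        attrs = {}
        for param in params.split(";")[1:]:
            key, _, val = param.strip().partition("=")
            attrs[key.strip().lower()] = val.strip().strip('"')
        links.append((target, attrs))
    return links


header = ('<http://example.com/doc.meta>; rel="describedby", '
          '<http://example.com/doc2.pdf>; rel="next"')
for target, attrs in parse_link_header(header):
    print(attrs["rel"], target)
```

A client fetching a PDF or image could use this to find the metadata resource without the document format itself knowing anything about links.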
3 Trackbacks
You can leave a trackback using this URL: http://blog.technologyofcontent.com/2009/12/smart-resources-or-why-you-should-care-about-http-patch/trackback/
[...] This post was mentioned on Twitter by Richard Hulse, Justin Cormack. Justin Cormack said: Smart resources, or why you should care about HTTP PATCH http://bit.ly/5X644P [...]
Social comments and analytics for this post…
This post was mentioned on Twitter by justincormack: Smart resources, or why you should care about HTTP PATCH http://bit.ly/5X644P...
[...] work if your controller for the resource being mapped isn't named using standard conventions? …Smart resources, or why you should care about HTTP PATCH …The controllers are the media types supported by PUT, which are usually the same as those for GET, [...]
4 Comments
Heya! Just read this, and to begin with I'd just like to say that I completely agree: PATCH is needed as an HTTP method. Here are my thoughts on REST and HTTP.
To start with, your example is pretty poor! (I think you know that.) You would, and should, use POST (which is “append”, “echo >> /resource”). I can't see why you would keep the resource in two places, but if you really must, then you can use the If-Modified-Since header when doing an update (GET) =].
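Conditional GET is what makes that suggestion cheap: the client sends the `Last-Modified` date it already has in `If-Modified-Since` and gets `304 Not Modified` with no body if nothing has changed. A minimal server-side sketch, where `conditional_get` is a hypothetical helper rather than any framework's API.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


def conditional_get(body, last_modified, if_modified_since=None):
    """Return (status, headers, body) for a GET, honouring If-Modified-Since."""
    headers = {"Last-Modified": format_datetime(last_modified, usegmt=True)}
    if if_modified_since is not None:
        try:
            since = parsedate_to_datetime(if_modified_since)
        except (TypeError, ValueError):
            since = None  # an unparseable date is ignored
        # HTTP dates have one-second resolution, so drop microseconds.
        if since is not None and last_modified.replace(microsecond=0) <= since:
            return 304, headers, b""
    return 200, headers, body


last_modified = datetime(2009, 12, 1, 12, 0, tzinfo=timezone.utc)
status, headers, _ = conditional_get(b"log entries", last_modified)
print(status, headers["Last-Modified"])  # 200 Tue, 01 Dec 2009 12:00:00 GMT
status, _, _ = conditional_get(b"log entries", last_modified, headers["Last-Modified"])
print(status)  # 304
```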
Again, you're right =] Many people seem to see resources and collections as “files” and “folders”, which is a terrible thing! Looking back at older operating systems (RISC OS being my favourite), there was very little difference between the two. After all, both are simply blocks of data, just handled differently. An example: consider an XML file. In a filesystem we can navigate to it at /dir/subdir/xmlfile.xml, but what if this data is structured? It could then be seen as a “folder”, and you could in fact navigate even deeper: /dir/subdir/xmlfile.xml/users/mikeyb/firstname.
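The “XML file as folder” idea can be sketched as a resolver that walks the path segments after the document's own URL into the element tree. The function `resolve` and the sample document are hypothetical, and only plain element names are supported as path steps.

```python
import xml.etree.ElementTree as ET


def resolve(document_path, xml_text, url_path):
    """Hypothetical resolver: treat an XML document as a folder of elements."""
    if url_path == document_path:
        return xml_text  # the whole document
    prefix = document_path.rstrip("/") + "/"
    if not url_path.startswith(prefix):
        raise KeyError(url_path)
    node = ET.fromstring(xml_text)
    for step in url_path[len(prefix):].split("/"):
        node = node.find(step)
        if node is None:
            raise KeyError(url_path)  # would map to 404 Not Found
    # Leaf elements return their text; inner elements return a sub-document.
    return node.text if len(node) == 0 else ET.tostring(node, encoding="unicode")


xml_text = ("<xmlfile><users><mikeyb><firstname>example-name</firstname>"
            "</mikeyb></users></xmlfile>")
doc_path = "/dir/subdir/xmlfile.xml"
print(resolve(doc_path, xml_text, doc_path + "/users/mikeyb/firstname"))
```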
And again! MVC is a great way to look at REST and HTTP! However, I personally believe it's not the Model, the View, or even the Controller elements. It is in fact the LINKAGE between them! HTTP (and REST) has 3 roles: 1 is to transport the request (proxies, caches etc.), 2 is to control “viewables”, i.e. how the message body is passed back and forth (such as compression, auth, encryption), and 3 is to manipulate or interact with a resource (the method: GET, PUT, POST etc.).
To give another example, consider your “voting” system. You have a resource, “/votes” (Model). If you did a GET on /votes it would say access denied; it's private data. However, if you did a GET on “/votes/total” (Model logic) you might receive something like
10 50 40
This is simply the totals added up, and it is still the model. The View would be the output when this is run through “piechart.xsl”, which would make a nice pretty chart for you =]. The Controller is to do with the View; the Model has no knowledge of the Controller in MVC. So if clicking on a segment of the pie chart makes a beep sound, this would be your controller (most likely Javascript).
Again, votes are appended, so a POST to “/votes” with an ID is what I'd do =]
I'm not sure about the server-side stuff. But as you can probably tell, it's pretty hard to find a good reason to have PATCH, and after all, this topic has been going on for years. Here's what I think:
Consider all the types of “movement” data has (in an operating system). It can be moved via:
1) Pipe (First In First Out)
2) Stack (First In Last Out)
3) Queue (the order is determined by the application)
These are all fairly common, and easily done with POST and GET. But there is one other that exists:
4) Shared Memory.
THIS is where I believe PATCH belongs. It's the whole aspect that the data is being manipulated by two or more applications. Neither can fully know the true state of the data, and therefore neither can change it wholesale without conflict. Essentially, PATCH extends POST to let the server decide on the correct action, or to notify the client of a conflict. That's the true beauty of PATCH.
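The “server decides, or reports a conflict” behaviour can be sketched with a shared text resource. This assumes a hypothetical `SharedDocument` whose PATCH body is a single find-and-replace edit plus the ETag of the version the client started from. It deliberately does not use `If-Match`, because a failed `If-Match` must be answered with 412; instead a stale edit is still applied when its target text is unambiguous, and 409 Conflict is returned otherwise.

```python
import hashlib


class SharedDocument:
    """Hypothetical shared resource edited by several clients at once."""

    def __init__(self, text):
        self.text = text

    @property
    def etag(self):
        # A version tag derived from the current content.
        return '"' + hashlib.sha256(self.text.encode()).hexdigest()[:16] + '"'

    def get(self):
        return self.text, self.etag

    def patch(self, base_etag, old, new):
        stale = base_etag != self.etag
        # The server decides: a stale edit is fine if its target is unambiguous.
        if stale and self.text.count(old) != 1:
            return 409, self.etag  # Conflict: the client must re-sync
        if old not in self.text:
            return 409, self.etag
        self.text = self.text.replace(old, new, 1)
        return 200, self.etag


shared = SharedDocument("hello world")
_, base = shared.get()
print(shared.patch(base, "hello", "goodbye")[0])  # 200
print(shared.patch(base, "world", "there")[0])    # 200 (stale but unambiguous)
print(shared.patch(base, "hello", "hi")[0])       # 409 ("hello" is gone)
print(shared.get()[0])                            # goodbye there
```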
There are, however, not many real-world examples of shared memory on the web, and it isn't used much in operating systems either (application-wise, anyway), which is most likely why it has taken so long to be approved. But I can offer you one! It's called MobWrite =], which is a real-time collaborative text editor. It currently uses its own protocol, though, but I am (as we speak) updating it to use the HTTP methods.
MikeyB
The example had its tags stripped; it should be:
<?xml-stylesheet href="piechart.xsl" type="text/xsl"?>
<votes>
  <party name="teama">10</party>
  <party name="teamb">50</party>
  <party name="teamc">40</party>
</votes>
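Since `piechart.xsl` itself isn't given, this sketch does the equivalent view computation in Python instead of XSLT: it reads the vote totals and turns them into pie-slice angles. The function name `pie_slices` is hypothetical, and the `xml-stylesheet` instruction is left out of the sample.

```python
import xml.etree.ElementTree as ET

VOTES_XML = """<votes>
  <party name="teama">10</party>
  <party name="teamb">50</party>
  <party name="teamc">40</party>
</votes>"""


def pie_slices(xml_text):
    """A view over the totals model: each party's share in degrees."""
    root = ET.fromstring(xml_text)
    counts = {p.get("name"): int(p.text) for p in root.findall("party")}
    total = sum(counts.values())
    return {name: 360 * n / total for name, n in counts.items()}


print(pie_slices(VOTES_XML))  # {'teama': 36.0, 'teamb': 180.0, 'teamc': 144.0}
```

The model (the totals) stays unchanged; only this rendering step knows about circles and angles, which is the separation the comment describes.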
Yes, you are right, my example is not very good.
I think there is a moral here: you should consider whether your resource should be broken up into smaller subresources before using PATCH.
A collaborative text editor is a good example, but I was looking for something simpler. I will come up with another one…
A better example, perhaps, is a resource that maintains a search index of a bunch of sites. I would want my crawler to send changes to the index with PATCH when sites are updated, not the whole index. Also, I would never (normally) want to retrieve (or write) the whole index, just retrieve query results from it using query strings.
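A minimal sketch of that index, assuming a hypothetical `SearchIndex` class: `patch_site` plays the role of a PATCH carrying one site's new text, and `get` stands in for `GET /index?q=...`. It is an in-memory inverted index, not a real crawler or search API.

```python
from collections import defaultdict


class SearchIndex:
    """Hypothetical index resource: PATCH per site, GET with a query string."""

    def __init__(self):
        self.postings = defaultdict(set)  # word -> set of site URLs
        self.words_by_site = {}           # site URL -> words last indexed

    def patch_site(self, site, text):
        # Only this site's entries change; the rest of the index is untouched.
        for word in self.words_by_site.get(site, set()):
            self.postings[word].discard(site)
        words = set(text.lower().split())
        for word in words:
            self.postings[word].add(site)
        self.words_by_site[site] = words

    def get(self, q):
        # Like GET /index?q=..., returns sites containing every query word.
        terms = q.lower().split()
        if not terms:
            return []
        hits = set.intersection(*(self.postings.get(t, set()) for t in terms))
        return sorted(hits)


idx = SearchIndex()
idx.patch_site("http://a.example", "HTTP patch method")
idx.patch_site("http://b.example", "HTTP put method")
print(idx.get("http method"))  # ['http://a.example', 'http://b.example']
idx.patch_site("http://a.example", "restful design")
print(idx.get("patch"))        # []
```

The crawler never uploads or downloads the whole index; each update and each query touches only the entries involved.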