“Spent a week in a dusty library waiting for some words to jump at me” Camera Obscura.
Kas Thomas, in his contribution to Julian Wraith’s popular thread on the future of content management, really managed to make me disagree; his is the first of the posts that has!
The theme of this blog (yes, it has got one) is that content management is changing as the web way of working starts to infiltrate the enterprise. And the web way of metadata is not what the old document-oriented way was.
Kas says “Keeping knowledge about a file separate from the file itself is a hugely important concept.”
Taking that point first: look at any newish document format, from PDF/A to EXIF via the HTML meta tag, even a Word document, and you will find embedded metadata. Remember that documents are emailed around and generally get lost. Metadata is the dog collar with a name and phone number on, saying version me and send me home. This trend is not going away; documents are becoming self-contained.
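To make the dog collar concrete, here is a minimal sketch of reading the metadata that travels inside an HTML document, using only Python's standard library. The sample document, the `author` and `version` field names, and the `MetaTagCollector` and `embedded_metadata` names are all made up for illustration; real documents will carry whatever fields their authoring tools chose to write.

```python
from html.parser import HTMLParser


class MetaTagCollector(HTMLParser):
    """Collect name/content pairs from <meta> tags embedded in a document."""

    def __init__(self):
        super().__init__()
        self.metadata = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name")
        content = attributes.get("content")
        # Only keep tags that carry both a name and a value.
        if name and content is not None:
            self.metadata[name] = content


def embedded_metadata(html_text):
    """Return the metadata the document carries about itself."""
    collector = MetaTagCollector()
    collector.feed(html_text)
    return collector.metadata


# Hypothetical document: the metadata goes wherever the file goes.
SAMPLE = """<html><head>
<title>Quarterly report</title>
<meta name="author" content="A. Writer">
<meta name="version" content="3">
</head><body><p>Body text.</p></body></html>"""

print(embedded_metadata(SAMPLE))
```

Nothing outside the file is consulted here: if the document is emailed around and loses touch with its repository, the author and version are still recoverable from the file itself.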
Kas then continues: “A file’s metadata becomes its interface to the outside world. It’s like a service descriptor.” That is simply not the way the web works. Resources use self-describing formats like HTML for core data, and all the other important metadata is linking information. It used to be that you had a blob file type and a program that could understand it; that was the basis for the desktop architecture. That is not the case any more: even on the desktop you have a choice of applications that can understand a given file type, and your choice depends on what you can do with them. The web architecture goes much further, and resources become fully self-describing. A browser can understand all of the web because every resource carries its own description, and bundled code to help interpret it. A web page is its own service descriptor, and defines application state through hyperlinks. The web architecture has never had service descriptions.
There is one vital part of metadata that is not kept with a file: the link. Kas says “content, on the whole, is becoming richer, less structured”, missing completely the big picture that content is being structured by the imposition of links onto it, by its transformation into a hypertext, which creates a much richer structure than the individual documents have in themselves. Documents contain metadata about other documents in the form of links. The semantic web project is an attempt to add further richness to this structure. “What does the trend toward richer, less structured content mean for management of content?” Well, it means that content management is going to be about managing those links and relations between items a lot more than it is now, having come from a background of just managing documents, each an isolated item.
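As a rough illustration of links being metadata that lives outside the document it describes, the sketch below builds an index of inbound links over a tiny collection of pages. The three pages, their paths, and the `LinkCollector`, `outbound_links` and `inbound_index` names are hypothetical; a real system would also have to resolve relative URLs and crawl content it does not own.

```python
from collections import defaultdict
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag in a document."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def outbound_links(html_text):
    """Links a document makes: metadata it holds about other documents."""
    collector = LinkCollector()
    collector.feed(html_text)
    return collector.links


def inbound_index(documents):
    """Map each link target to the sorted list of documents linking to it."""
    inbound = defaultdict(set)
    for source, html_text in documents.items():
        for target in outbound_links(html_text):
            inbound[target].add(source)
    return {target: sorted(sources) for target, sources in inbound.items()}


# Hypothetical collection of three pages keyed by path.
DOCUMENTS = {
    "/post-a": '<p>See <a href="/post-b">B</a> and <a href="/spec">the spec</a>.</p>',
    "/post-b": '<p>A reply to <a href="/post-a">A</a>, citing <a href="/spec">the spec</a>.</p>',
    "/spec": "<p>No outbound links.</p>",
}

print(inbound_index(DOCUMENTS))
```

The entry for `/spec` is information about that page which the page itself does not contain. It exists only in the other documents, which is the sense in which the link structure is something a content management system has to manage across items rather than per file.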
In a way that does come back to “Keeping knowledge about a file separate from the file itself” but not at all in the way Kas was trying to argue. Now time to link this to his argument to create a structured discussion…

3 Trackbacks
You can leave a trackback using this URL: http://blog.technologyofcontent.com/2009/08/metadata-is-not-what-it-used-to-be/trackback/
[...] Kas Thomas and the response to this post from Justin Cormack [...]
[...] on the role of metadata in content management: is metadata the future of content management, an integral part of the content, or are we making an artificial [...]
[...] metadata representations that have sane serializations or embeddings into all formats. Metadata now lives within documents; it used to get lost before that. So the RDF model has won, and microformats have lost. Oh, and the [...]
5 Comments
Hi Justin,
First off, Kas didn’t actually say to keep the metadata in a separate repository or file, just to keep it separated, logically. I think whether it’s better to keep the metadata in the same file or to have it separated out “physically” is an interesting discussion, and a case can be made for both. But it’s rather beside the point he’s making.
At any rate, your post moves on about halfway through, and I’m having difficulty understanding what you’re saying there. You must be taking a very technical viewpoint, because, yes, everything on the web is linked, and that’s the web’s inherent coherence; and yes, resources on the web describe themselves. And certainly, those links imply important metadata, which needs to be managed.
But if that’s your point, the problem with it is exactly what both Kas and I were talking about in our blog posts today. This kind of metadata is very technical: it’s only there to allow a browser to render the pages. It’s largely devoid of meaning, per se, and not at all sufficient to effectively manage the content itself. For that, you need a lot more meaningful metadata than the 1/20th shutter time gleaned from EXIF, or the meta tag that’s in the HTML of this blog.
As the amount of content explodes, and becomes ever more complex (tell me, what kind of metadata will a browser be able to extract from a video file?), we will need a lot more, and much better metadata to do anything meaningful with it. Being able to track the link from one URI to another URI won’t be enough. And where the metadata is, is of lesser concern. As long as it can be read along with what it describes.
But like I said, I may have completely misunderstood what you were trying to say here.
Adriaan, I should try not to write these blog posts quite as quickly.
I think there is a lot of common ground between us. Your post was very similar to my experiences in similar projects.
You are wrong about how much a browser will be able to find in a video file; see for example this spec. Because the browser is extensible with JavaScript, it will be able to use this metadata. A DAM system should be adding information to the EXIF about the workflow, not just about the shutter speed.
You are right that we do need to know more about the content, that the really basic stuff of author and creation date is not sufficient. Google actually shows how much information there is just in simple links though, but adding more structure to them will help a lot more.
And I’m just going to bleat the same bleat I bleated last time. There is no difference between data and metadata in my world. One man’s data is another man’s meta-data. The usage of the data probably defines whether it is meta-data or not. I think all resources should be self-describing.
Hi Justin,
Well, I can understand the view of how correctly self-describing content could (one day) become content that’s self-managing (in the way that content in mashups is, or on the semantic web.) But I don’t think I’d want a browser to take over the role of the delivery tier of the content I want to push out in a very specific, targeted manner (which is usually the goal of a web CMS.)
It’s actually quite ironic I’d have to defend that to a content management vendor!
There is still plenty of delivery work to do, you still have to turn the descriptions into interfaces…