<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Technology of Content &#187; jcr</title>
	<atom:link href="http://blog.technologyofcontent.com/tag/jcr/feed/" rel="self" type="application/rss+xml" />
	<link>http://blog.technologyofcontent.com</link>
	<description>Ramblings on the technology of content management</description>
	<lastBuildDate>Sun, 29 Jan 2012 16:38:05 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
		<item>
		<title>Towards a comparison of content repositories</title>
		<link>http://blog.technologyofcontent.com/2010/09/towards-a-comparison-of-content-repositories/</link>
		<comments>http://blog.technologyofcontent.com/2010/09/towards-a-comparison-of-content-repositories/#comments</comments>
		<pubDate>Sun, 19 Sep 2010 11:57:07 +0000</pubDate>
		<dc:creator>justin</dc:creator>
				<category><![CDATA[CMS]]></category>
		<category><![CDATA[content]]></category>
		<category><![CDATA[data modelling]]></category>
		<category><![CDATA[jcr]]></category>
		<category><![CDATA[modelling]]></category>
		<category><![CDATA[properties]]></category>
		<category><![CDATA[repositories]]></category>

		<guid isPermaLink="false">http://blog.technologyofcontent.com/?p=237</guid>
		<description><![CDATA[I am a bit behind on my blog at the moment, with a lot of unfinished posts. While I was writing about Lily CMS, I got distracted with an issue that I have been working on in the background for a long time. There was a comment saying &#8220;The Lily content model has been academically [...]]]></description>
			<content:encoded><![CDATA[<p>I am a bit behind on my blog at the moment, with a lot of unfinished posts. While I was writing about <a href="http://www.lilycms.org/">Lily CMS</a>, I got distracted with an issue that I have been working on in the background for a long time. There was a comment saying <a href="http://outerthought.org/blog/426-ot.html">&#8220;The Lily content model has been academically validated and accommodates data mapped from various domains, such as rich hypermedia, HTML5, NewsML, MXF, CMIS, RDF and many more&#8221;</a>, which reminded me of the work I have been doing on classification of content models. After all, how can you validate a content model without a metamodel? And no one seems to have described the space of possible models, or the scope of choices. So here is my attempt.</p>

<p>The basic model is that we have resources, which may have some content and metadata attached. We are mainly interested here in the properties that can be attached to a resource, and the fact that some of those properties are relations to other resources. We are less concerned about what goes inside the structured body of a resource, although there are some issues about how many &#8220;bodies&#8221; a resource can have. So we have a model of resources, each of which has key-value pairs; some of the values may be links to other resources, which presumably have referential integrity support.</p>

<h2>Properties of properties</h2>

<ul>
<li><p><strong>STRING</strong>. Resources can have string valued properties.</p>

<p>This is a basic starting point; I don&#8217;t think I know of a CMS that does not support this. You can store any other type in a string if necessary, though binary values are more efficient in some cases.</p></li>
<li><p><strong>VALID</strong>. Property values can be validated.</p>

<p>Many core repositories do not validate property values at all; it is just a validation proxy layer that does. As a repository principle this is therefore fairly rare, although a small number of types (numbers, dates) might have native validated representations.</p></li>
<li><p><strong>TYPE</strong>. Property names can be validated.</p>

<p>Many systems have a typing facility that restricts the set of property names a resource can have. Others are unstructured, and any property can be added to any resource. There may be type composition mechanisms, such as mixins or type inheritance. Unlike VALID, this is more often tied to the core repository model than to a proxy layer, typically when the repository is typed for internal performance or indexing reasons, which goes with dense rather than sparse storage.</p></li>
<li><p><strong>BINARY</strong>. Resources can have at most one binary property.</p>

<p>I have split this property as some content management systems can only have one binary property (such as an image file) on a particular resource, and multiple ones have to be constructed from multiple linked resources. This is not generally a huge limitation in an otherwise flexible system, but in a weaker system could be annoying.</p></li>
<li><p><strong>N-BINARY</strong>. Resources can have any number of binary properties.</p>

<p>This is the fully flexible version; one may still be a distinguished value in some way, but you can store all the sizes of an image (say) as properties of one resource, which makes managing them easier, although it may actually make things more difficult if it is not easy to iterate over properties (STRUCTPROP), and using multiple resources could be easier.</p></li>
<li><p><strong>STRUCTPROP</strong>. Properties can be structured.</p>

<p>Some systems have structured properties, for example some systems have a JSON representation for properties, rather than the flat key-value namespace of other systems. JSON supports arrays that can be iterated over, and structures that can be repeated. To make this sort of structure with only key-value properties you may need to use more resources. Structured properties though add a lot more complexity, and perfectly useful, but different, systems can be made with or without this model type. Structured properties often have partial update interfaces, which adds complexity, so that one subproperty can be modified at a time. Note while technically JCR does not have structured properties, you can use the distinguished tree below any resource as a tree of properties, so it is rather similar to this model. Note also that property naming can informally add structure, such as in the way slashes denote URI hierarchy, they can denote property hierarchy in a technically flat namespace.</p></li>
<li><p><strong>MULPROP</strong>. Properties can have multiple values.</p>

<p>Structured properties can usually have multiple values (JSON array for example), but not all systems with key-value type properties allow the same key to be set multiple times with different values. This is the model with say HTML metadata, where each property (key) can be set multiple times; however some key value systems only allow a key to hold a single value, and so the user would have to make a structured value to hold the multiple data items instead, by some encoding scheme without the system providing support directly. Having multiple properties complicates the simple set and get interfaces that single valued properties have.</p></li>
<li><p><strong>TREE</strong>. Resources can be in exactly one tree structure.</p>

<p>Another split one. Many systems have one distinguished tree structure that content items must be in, and that tree has special operations, like fast access to parents and children; other trees might be constructed by other means, like using a general relation, but the operations on them might be difficult. Children in a tree are almost always ordered and can be reordered, although some systems might not have this property.</p></li>
<li><p><strong>N-TREE</strong>. Resources can be in any number of trees.</p>

<p>The distinguished tree is very common (although Lily for example does not have one); but I do not think I know of any system with multiple named trees that share a common tree interface (like a parent function). You can make a tree with general relations, but you will not get help in making it acyclic for example. So while this is a possible design, it is complex to implement, although arguably useful as a modeling tool. Generally you will have to manage general relations yourself to do this.</p></li>
<li><p><strong>CLONE</strong>. A resource can be cloned.</p>

<p>This means that the same item can appear as more than one resource at the same time, each of which will update in the same way. This is similar to, say, a Unix hard link. This is the usual way of turning a TREE into a <a href="http://en.wikipedia.org/wiki/Directed_acyclic_graph">DAG</a>, which adds some flexibility. Different tree locations of the cloned resources may affect properties such as permissions or inheritance in some systems, so this property can add a fair amount of modeling flexibility; conversely, without these it is of less use.</p></li>
<li><p><strong>RENAME</strong>. A resource can be renamed or moved.</p>

<p>Many content management systems provide this operation in their model, but it is not a native operation in others at the repository level, as the end-user-visible name may just be a property, for example. HTTP does not provide a rename operation, but WebDAV does.</p></li>
<li><p><strong>REL</strong>. General relations between resources can be created.</p>

<p>This is the basic relation (named by the key name) between two items. It corresponds to the HTML &lt;link&gt; metadata element, or an RDF triple. It turns the resources into a directed graph with named edges. It is certainly essential for any content management system; I will talk more about how you want to be able to use and query it later.</p></li>
<li><p><strong>RELNS</strong>. Relations have a different namespace from properties.</p>

<p>This is a distinguishing feature between XML, which syntactically distinguishes attributes from child elements (relations), and JSON, which does not. Non-child relations in XML are still attributes though. This generally seems a pointless distinction, and using a single namespace is simpler.</p></li>
<li><p><strong>RELPROP</strong>. Relations can have properties.</p>

<p>This is an interesting one. Adding a value to the relationship triple to make it a quad means that a number (or other value) can be assigned to a relation, making each relation a weighted directed graph (or you can view the system as a matrix). The general model of the <a href="http://arxiv.org/abs/1006.2361">property graph</a> has properties for edges. The RDF model, for example, does not, and edge properties are often what blank nodes are used to model there, although of course blank nodes can have relations as well as properties. You can make up for a lack of properties on edges/relations by adding extra nodes like this, but they may proliferate and need managing, so allowing properties may help. It is also worth noting that a system without MULPROP can use naming of properties to implement RELPROP, as a relation could have a naming convention for its properties; similarly STRUCTPROP generally allows storing the extra information in the property structure.</p></li>
<li><p><strong>REFINT</strong>. Referential integrity is preserved for relations.</p>

<p>Preserving referential integrity at the repository layer is a fair amount of work. Relational databases can do this, but not all content repositories do, for example over delete operations.</p></li>
<li><p><strong>ORDREL</strong>. Properties are ordered.</p>

<p>True key-value models do not tend to have an ordering for properties. As with many of these things, ordering adds interface complexity. Structured properties may however be ordered, and if the model supports a distinguished tree (TREE) this almost certainly has ordered children. If you have to build an ordered tree simply from basic relations it is quite complex. A sort order on a relation is another relation property that, like weights, seems to be rarely supported for general relations.</p></li>
<li><p><strong>REIFY</strong>. Properties can have properties.</p>

<p>RDF in principle lets properties themselves be resources (reification), so that they can in turn have properties. This allows me to add information about the properties, such as where they came from. This rarely seems to be useful in common models. Giving different properties different permissions might be a more useful side effect.</p></li>
<li><p><strong>EXTREL</strong>. Relations can be defined externally to their subject.</p>

<p>HTML originally had a rev relation, which defined a relation backwards from object to subject, and RDF triples can be stored in any document, divorced from subject and object referents. This causes all sorts of issues with updates and managing (even finding) relations, while adding no descriptive ability except potentially REIFY.</p></li>
<li><p><strong>INHERIT</strong>. Resources can inherit properties.</p>

<p>The inheritance tree might be set from other properties, or from a distinguished tree, but one model is that properties not explicitly set can be inherited from another resource, or a prototype. This often makes models simpler, as rather than explicitly walking a tree, you can implicitly do it through inheritance. It seems surprisingly uncommon in content repositories.</p></li>
<li><p><strong>MINHERIT</strong>. Multiple inheritance.</p>

<p>Allow inheritance from multiple resources, not just for example based on the primary distinguished tree. More complex.</p></li>
<li><p><strong>NAMESPACE</strong>. Namespacing on properties.</p>

<p>Some systems have a type of namespacing on properties, often used for multiple language variants for example, so that a property may differ across these namespaces. This can also be implemented with multiple resources, structured properties or inheritance. Usually not all properties are namespaced at once; some may not vary, which makes the set and get interface more complex.</p></li>
<li><p><strong>ATOMIC</strong>. All the properties of a resource must be updated together.</p>

<p>A resource and all its properties are all updated at once to a complete new state (this will also be the versioning state). Other than for versioning, this also affects how concurrent access works. The alternative is that each property can be updated independently. Atomic updates are the HTTP resource update model.</p></li>
<li><p><strong>RVERSION</strong>. Resources may be versioned.</p>

<p>An entire resource may be versioned; this is a similar namespacing operation, where a namespace is available to retrieve the old values of a resource.</p></li>
<li><p><strong>PVERSION</strong>. Properties may be individually versioned.</p>

<p>Some systems (such as Lily) allow versioning to be turned on or off on a per property basis. This property generally implies that ATOMIC is not true.</p></li>
<li><p><strong>DELVERSION</strong>. Versions are deleted when a resource is deleted.</p>

<p>This is surprisingly common: if versions are namespaced properties, then they are often deleted along with the resource when it is deleted. The better solution is not to delete versioned resources, just give them a tombstone (whiteout) marker.</p></li>
<li><p><strong>SNAPSHOT</strong>. The versioning namespace is whole system state not resource state based.</p>

<p>Although versioning of the total state of a system is now common in source code control systems, many content management systems only let individual resources be versioned (hence creating issues such as DELVERSION). The main issue here is that you cannot apply or undo a set of changes together, only individually. Apart from the difficulty in making easy user interfaces, whole system versioning is superior in every way to versioning of individual resources or properties and no one should be designing a system that does not behave like this.</p></li>
<li><p><strong>TREEVERSION</strong>. Versioning supports branching and merging.</p>

<p>A full versioning model like Git or Subversion, rather than just a linear series of checkpoints, is another model. It generally simplifies concurrent updates (i.e. it can avoid both CAS and LOCK in theory). Although this provides the richest model of versioning, it is the hardest to present to the non-technical user. Note also that this is one clear area where the content model for delivery can differ from the one for authoring; for authoring there are much more complex operations that are useful, while for delivery performance is key, and versioning may not be required at all, depending on how updates are applied.</p></li>
<li><p><strong>CAS</strong>. A <a href="http://en.wikipedia.org/wiki/Compare-and-swap">CAS</a>-type operation is supported on resource updates.</p>

<p>Some type of atomic update-if-unchanged-since-this-version operation is supported, for lockless updates. HTTP ETags are the canonical example. This is the simplest choice for API access, and simple for users too. The unit of atomicity is usually the whole resource here, making it the unit of transactions; atomic update only of individual properties does not let two properties be updated in a single transaction so is not so useful.</p></li>
<li><p><strong>LOCK</strong>. A locking operation is supported.</p>

<p>The traditional alternative to CAS is a locking operation that disables write operations by others while the resource is locked. Some administrative or time-based unlock operations are required as well. It is less suited than other methods to automated APIs, due to issues like deadlock. As multiple locks can potentially be obtained, cross-resource transactions are possible, although this could impact concurrency.</p></li>
<li><p><strong>TRANSACTION</strong>. Transactions across multiple resources are supported.</p>

<p>Generally individual resources are the unit of transaction, or possibly individual properties. Some systems however allow a transaction in which multiple resources are updated together. JCR is probably the main example of these. A system with snapshots may also have this property if moving between versions is atomic. HTTP deliberately does not have this sort of transaction, as it does not work well if the resources are distributed, and system design for HTTP should ensure that resources model the right things so that transactions across resources are not needed.</p></li>
</ul>
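
<p>To make a few of these labels concrete, here is a minimal sketch of a toy in-memory repository in Python. It is not modelled on any real system; the names <code>Repository</code>, <code>relate</code> and <code>IntegrityError</code> are made up for illustration. It assumes flat single-valued properties (no MULPROP), named relations between resources (REL), optional properties on each relation (RELPROP), and a delete that refuses to leave dangling links (REFINT).</p>

```python
class IntegrityError(Exception):
    """Raised when an operation would leave a dangling relation."""


class Repository:
    """Toy repository: resources with flat properties and named relations."""

    def __init__(self):
        self.properties = {}  # resource id -> {key: value}
        self.relations = {}   # (subject, name, object) -> {edge property: value}

    def put(self, rid, **props):
        # Flat key-value properties, one value per key (no MULPROP).
        self.properties.setdefault(rid, {}).update(props)

    def relate(self, subject, name, obj, **edge_props):
        # REL: a named, directed edge. REFINT: both ends must exist.
        for rid in (subject, obj):
            if rid not in self.properties:
                raise IntegrityError(f"unknown resource: {rid}")
        # RELPROP: the edge carries its own properties, e.g. a weight.
        self.relations[(subject, name, obj)] = dict(edge_props)

    def delete(self, rid):
        # REFINT: refuse to delete a resource that is still a link target.
        for (subject, name, obj) in self.relations:
            if obj == rid and subject != rid:
                raise IntegrityError(f"{subject} still links to {rid} via {name}")
        # Outgoing relations go away with their subject.
        self.relations = {
            key: value for key, value in self.relations.items() if key[0] != rid
        }
        del self.properties[rid]


repo = Repository()
repo.put("article-1", title="Hello")
repo.put("image-1", mime="image/png")
repo.relate("article-1", "illustration", "image-1", weight=0.8)
```

<p>In this sketch, deleting <code>image-1</code> raises <code>IntegrityError</code> while <code>article-1</code> still links to it; deleting the article first removes the relation along with it. A real repository would also have to decide what happens to versions, trees and clones at this point.</p>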

<h2>Queries</h2>

<p>With the property model above, you can retrieve resources, and read and modify their properties. There may also be some additional, perhaps slightly different, properties (the ones that TREE might expose, for example parent and child relations). We can traverse between resources by following their links. However, we generally also want to make more complex queries, either global questions or more complex traversals based on properties. There are a lot of query models we should really explore; in particular we need to focus on how properties are indexed. I suspect that the analysis below is just a starting point.</p>

<ul>
<li><p><strong>PINDEX</strong>. Property values are indexed.</p>

<p>This is not necessarily essential, as for most interesting properties one would create a node rather than a value, and use relations, though you need reverse relation indexes anyway.</p></li>
<li><p><strong>REV</strong>. Relations have a reverse index.</p>

<p>This lets me find the opposite direction of a relation. This is a key property, as relations are directional, and important properties are in the other direction, such as finding all the resources tagged with a particular tag value.</p></li>
</ul>
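
<p>As a rough illustration of REV, here is a small Python sketch that keeps a forward and a reverse index for named relations. The class and method names are my own invention, and a real repository would persist these indexes rather than hold them in memory.</p>

```python
from collections import defaultdict


class RelationIndex:
    """Stores named relations with both a forward and a reverse index."""

    def __init__(self):
        self.forward = defaultdict(set)  # (subject, name) -> {objects}
        self.reverse = defaultdict(set)  # (name, object) -> {subjects}

    def add(self, subject, name, obj):
        # Every write updates both indexes so they stay consistent.
        self.forward[(subject, name)].add(obj)
        self.reverse[(name, obj)].add(subject)

    def objects(self, subject, name):
        # Forward traversal: follow the relation from its subject.
        return set(self.forward.get((subject, name), ()))

    def subjects(self, name, obj):
        # REV: find everything that points at obj through this relation.
        return set(self.reverse.get((name, obj), ()))


index = RelationIndex()
index.add("post-1", "tag", "jcr")
index.add("post-2", "tag", "jcr")
index.add("post-2", "tag", "rest")
```

<p>Here <code>index.subjects("tag", "jcr")</code> answers the &#8220;everything tagged with this value&#8221; question without scanning every resource, at the cost of a second index to maintain on each write.</p>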

<h2>Interfaces to properties</h2>

<p>I mentioned above that some of the property models have differing interface complexities, and I think it probably helps to show what some of the interfaces look like.</p>

<p>The canonical interface in web content management is the one exposed by HTTP. Resources and all their properties have to be updated atomically (ATOMIC) &#8211; <a href="http://blog.technologyofcontent.com/2009/12/smart-resources-or-why-you-should-care-about-http-patch/">PATCH</a> is just an optimization. CAS is available (ETags or last update). No versioning is specially supported (although the system could create resources for old versions and add properties to access them). Other property behaviours depend on the document types, so HTML for example supports a meta and link flat property namespace, but other schemes are possible. A resource TREE is very loosely defined by &#8216;/&#8217; in URLs, but does not provide any properties, so it is barely a tree.</p>
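
<p>The ATOMIC plus CAS combination can be sketched as follows. This is a toy in-memory store in Python, not an HTTP client or server; <code>Store</code>, <code>PreconditionFailed</code> and the <code>if_match</code> parameter are hypothetical names chosen to echo the HTTP <code>If-Match</code> header and the 412 status code. The token is assumed to be a hash of the full resource state.</p>

```python
import hashlib
import json


class PreconditionFailed(Exception):
    """The resource changed since the caller last read it (cf. HTTP 412)."""


class Store:
    """Toy store where each resource is replaced whole (ATOMIC) under CAS."""

    def __init__(self):
        self.resources = {}  # resource id -> dict of properties

    @staticmethod
    def etag(state):
        # Derive the token from the full state, so any change alters it.
        encoded = json.dumps(state, sort_keys=True).encode("utf-8")
        return hashlib.sha256(encoded).hexdigest()

    def get(self, rid):
        state = self.resources[rid]
        return dict(state), self.etag(state)

    def put(self, rid, new_state, if_match=None):
        current = self.resources.get(rid)
        # CAS: an existing resource is only replaced if the token matches.
        if current is not None and if_match != self.etag(current):
            raise PreconditionFailed(rid)
        self.resources[rid] = dict(new_state)
        return self.etag(new_state)


store = Store()
store.put("page", {"title": "Draft"})
state, tag = store.get("page")
store.put("page", {"title": "Edited by A"}, if_match=tag)
```

<p>A second writer still holding the old <code>tag</code> now gets <code>PreconditionFailed</code> and has to re-read before retrying. In a real concurrent system the compare and the write would of course have to happen as one atomic step; this sketch ignores that.</p>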

<p><a href="http://blog.technologyofcontent.com/2009/12/the-bottom-10-things-of-2009/">WebDAV</a> changed the HTTP data model to push it much closer to one traditional content model, supporting LOCK, RENAME, and TREE on top of HTTP, and an explicit property model independent of the resources in question. The property model is flat, with no STRUCTPROP, although extending it is mentioned in the <a href="http://www.webdav.org/specs/rfc2518.html">RFC</a>, and no MULPROP. CLONE is allowed, as resources can have more than one URI. Updates to properties are not ATOMIC, as the PROPPATCH method can update some properties without others. The main HTTP resource is the body, which allows storage of one BINARY property, as the other properties are XML strings. This is pretty much the standard document management style set of properties.</p>

<p>Lily CMS, which I was looking at recently, is pretty different. There is no TREE; you have to construct it yourself. There is TYPE, no STRUCTPROP, and there are NAMESPACE and PVERSION. There is no CAS or LOCK; conflicts are resolved by time of modification alone. It is an interestingly different model which I will look at in more detail in another post, as it chooses complexity in some areas and simplicity in others.</p>

<p>One of the common themes of the NoSQL movement is saying things along the lines of: if you only give up this one feature, we can make the storage layer that much simpler and faster, and push some more work up to higher layers to resolve, such as conflict resolution, referential integrity, or tree structures. This is the path of a simpler core repository which does not implement all common usage patterns for a CMS application, with some layering and conventions on top to build the next level. This is not so different from the relational model, with a low level relational algebra and a set of database management tools, then the application. What is still unclear is exactly where to make that split, but certainly large monolithic repository models that try to do everything listed above end up very large and complex to use.</p>

<p>There is definitely a case for moving some of these properties out of the repository into the authoring tools. Referential integrity at the repository level is actually quite hard to work with, as you cannot refer to something you are about to create, for example, but the authoring layer can provide tooling to help the user here.</p>
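
<p>One possible shape for such authoring-layer tooling, purely as a sketch: collect pending creates and links in a change set, and only check for dangling links at commit time, so that a link may point at something created later in the same session. The <code>ChangeSet</code> class below is hypothetical and not taken from any of the systems discussed.</p>

```python
class ChangeSet:
    """Collects pending creates and links; checks links only at commit."""

    def __init__(self, existing_ids):
        self.existing = set(existing_ids)
        self.created = set()
        self.links = []  # (subject, name, object)

    def create(self, rid):
        self.created.add(rid)

    def link(self, subject, name, obj):
        # No check yet: the target may be created later in this change set.
        self.links.append((subject, name, obj))

    def dangling(self):
        # Links whose subject or object is neither existing nor pending.
        known = self.existing | self.created
        return [link for link in self.links
                if link[0] not in known or link[2] not in known]

    def commit(self):
        problems = self.dangling()
        if problems:
            raise ValueError(f"dangling links: {problems}")
        return self.existing | self.created


changes = ChangeSet(existing_ids={"home"})
changes.link("home", "featured", "new-article")  # target not created yet
pending = changes.dangling()
changes.create("new-article")
ids = changes.commit()
```

<p>The <code>dangling()</code> list is what an authoring tool could show the user as &#8220;still to be created&#8221;, while the repository underneath only ever sees the consistent result of <code>commit()</code>.</p>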

<p>I will post some follow-ups about some of the other issues arising from this, and what I think the best set of constraints to work in is, and more on the CMIS and JCR models.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.technologyofcontent.com/2010/09/towards-a-comparison-of-content-repositories/feed/</wfw:commentRss>
		<slash:comments>11</slash:comments>
		</item>
		<item>
		<title>RESTful daydream #4</title>
		<link>http://blog.technologyofcontent.com/2009/10/restful-daydream-4/</link>
		<comments>http://blog.technologyofcontent.com/2009/10/restful-daydream-4/#comments</comments>
		<pubDate>Sun, 04 Oct 2009 15:34:41 +0000</pubDate>
		<dc:creator>justin</dc:creator>
				<category><![CDATA[CMS]]></category>
		<category><![CDATA[REST]]></category>
		<category><![CDATA[CMIS]]></category>
		<category><![CDATA[jcr]]></category>

		<guid isPermaLink="false">http://blog.technologyofcontent.com/?p=110</guid>
		<description><![CDATA[In favour of a REST architecture for a web content repository]]></description>
			<content:encoded><![CDATA[<p>This blog post has gone through far too many iterations, and taken far too long to write! It got much shorter in the process though.</p>

<p>It started with an idea I had, in an innocent sort of way. I thought if I looked at the JCR specs for a bit I might find some kind of way of building a non Java interface with them. You know, maybe there might be a nice REST architecture waiting to get out. But of course there is no such thing. It is an application definition. There are not even that many ways of implementing it, other than choosing your object persistence method to be a database, file, or something else.</p>

<p>The REST architecture is notionally provided by another layer, such as Apache Sling, but Sling is in no way a REST layer, it is a URL dispatcher and scripting and application layer with which some REST style applications can be developed. With that you end up with a pretty heavyweight development framework, indeed together you have much of Day&#8217;s CMS offering in effect, rather than a lightweight REST repository solution.</p>

<p>I had a look at CMIS again. Fielding once <a href="http://roy.gbiv.com/untangled/2008/no-rest-in-cmis">laid into CMIS for not being REST</a> and you can see why, although some improvements have been made since then. Although resources are discoverable through hypertext, there is a fair amount of semantics that needs to be known to understand what a type or a checkout means, and the search queries are obviously just RPC wrappers. It is not too bad, but unfortunately the data model does not map well onto web content management right now, for obvious historical document management reasons. Fixable? I think it serves a particular purpose well and should probably not be forced into anything else, as we need it to succeed in its field.</p>

<p>Day claims that JCR is <a href="http://dev.day.com/microsling/content/blogs/main/fudbusting2.html">not a Java standard</a> in an odd way: that you can implement the API in another language. That&#8217;s a strange argument to make, especially as the types are defined as Java types, and standards without interoperability are pretty vague. Without some sort of wire format or ABI this is meaningless outside the JVM world. People are making <a href="http://www.simpcore.org/">JCR-like repositories in PHP</a> but outside any standards process, so in the end this just becomes a PHP repository project; Typo3 seems to be building another, also closely aligned to JCR.</p>

<p>The problem with these efforts is that they do not help with the balkanization of web CMS, which is already fragmented by language and API; that is ridiculous in an industry that is about the web. The web has an architecture (REST) and an API (HTTP). Building web content management on Java APIs or PHP APIs or .NET is a legacy way of thinking; it is acceptable for document management given its role in existing enterprise architectures, but it is not going to work if we want to get widespread acceptance in web development. In the short term it is the easy path, it is what people are used to, but a forward thinking industry needs to look at defragmenting the landscape and building future proof tools.</p>

<p>The odd thing is that a web content repository alone surely lends itself to a simple REST architecture. Content is after all lots of small resources with relations. Hypertext. In presentation it is pretty much a fairly dumb web application, although with a fair amount going on behind the scenes. It takes content, relates it to other content, and serves it back, with authentication and versioning. Everything else is in other system layers, transforming it and so on. Not simple, but well defined; lower level than JCR + Sling, say.</p>

<p>So we need to work on a web content repository model, as a community. Process wise, it makes sense for this to sit in an organization like AIIM, as a content management based industry body. It may well be that what ends up coming out of this is more standardized architectures and semantics and open source implementations rather than the tighter prescriptions of JCR and CMIS; I have some ideas along these lines that I need to code up. I have had some discussions and there is a degree of interest in some sort of solution; who is interested? Or is infrastructure dead, and everyone just wants interfaces?</p>

<p><a href="http://dilbert.com/strips/comic/2009-09-02/" title="Dilbert.com"><img src="http://dilbert.com/dyn/str_strip/000000000/00000000/0000000/000000/60000/6000/400/66480/66480.strip.gif" width="480" alt="Dilbert.com" /></a></p>
]]></content:encoded>
			<wfw:commentRss>http://blog.technologyofcontent.com/2009/10/restful-daydream-4/feed/</wfw:commentRss>
		<slash:comments>8</slash:comments>
		</item>
	</channel>
</rss>

