<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Technology of Content &#187; CMS</title>
	<atom:link href="http://blog.technologyofcontent.com/tag/cms/feed/" rel="self" type="application/rss+xml" />
	<link>http://blog.technologyofcontent.com</link>
	<description>Ramblings on the technology of content management</description>
	<lastBuildDate>Sun, 29 Jan 2012 16:38:05 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
		<item>
		<title>Towards a comparison of content repositories</title>
		<link>http://blog.technologyofcontent.com/2010/09/towards-a-comparison-of-content-repositories/</link>
		<comments>http://blog.technologyofcontent.com/2010/09/towards-a-comparison-of-content-repositories/#comments</comments>
		<pubDate>Sun, 19 Sep 2010 11:57:07 +0000</pubDate>
		<dc:creator>justin</dc:creator>
				<category><![CDATA[CMS]]></category>
		<category><![CDATA[content]]></category>
		<category><![CDATA[data modelling]]></category>
		<category><![CDATA[jcr]]></category>
		<category><![CDATA[modelling]]></category>
		<category><![CDATA[properties]]></category>
		<category><![CDATA[repositories]]></category>

		<guid isPermaLink="false">http://blog.technologyofcontent.com/?p=237</guid>
		<description><![CDATA[I am a bit behind on my blog at the moment, with a lot of unfinished posts. While I was writing about Lily CMS, I got distracted with an issue that I have been working on in the background for a long time. There was a comment saying &#8220;The Lily content model has been academically [...]]]></description>
			<content:encoded><![CDATA[<p>I am a bit behind on my blog at the moment, with a lot of unfinished posts. While I was writing about <a href="http://www.lilycms.org/">Lily CMS</a>, I got distracted with an issue that I have been working on in the background for a long time. There was a comment saying <a href="http://outerthought.org/blog/426-ot.html">&#8220;The Lily content model has been academically validated and accommodates data mapped from various domains, such as rich hypermedia, HTML5, NewsML, MXF, CMIS, RDF and many more&#8221;</a> which reminded me of the work I have been doing on classification of content models; after all, how can you validate a content model without a metamodel? And no one seems to have described the space of possible models, or the scope of choices. So here is my attempt.</p>

<p>The basic model is that we have resources, which may have some content and metadata attached. We are mainly interested here in the properties that can be attached to a resource, and the fact that some of those properties are relations to other resources. We are less concerned about what goes inside the structured body of a resource, although there are some issues about how many &#8220;bodies&#8221; a resource can have. So we have a model of resources, each of which has key-value pairs; some of the values may be links to other resources, which presumably have referential integrity support.</p>
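
<p>As a rough illustration of this basic model, here is a minimal Python sketch. The names Resource, Repository, put and follow are made up for this example and do not come from any particular CMS; the check on link targets is just one simple way referential integrity might be enforced.</p>

```python
from dataclasses import dataclass, field


@dataclass
class Resource:
    """A resource: an identifier, an optional body, and key-value properties."""
    rid: str
    body: bytes = b""
    # Plain string-valued properties.
    properties: dict = field(default_factory=dict)
    # Named relations: key -> id of the target resource.
    links: dict = field(default_factory=dict)


class Repository:
    """Hypothetical in-memory store that refuses dangling links on write."""

    def __init__(self):
        self.resources = {}

    def put(self, resource):
        # Referential integrity check: every link target must already exist.
        for key, target in resource.links.items():
            if target not in self.resources:
                raise KeyError(f"{key} points at missing resource {target}")
        self.resources[resource.rid] = resource

    def follow(self, rid, key):
        # Traverse a named relation from one resource to another.
        return self.resources[self.resources[rid].links[key]]


repo = Repository()
repo.put(Resource("author/justin", properties={"name": "justin"}))
repo.put(Resource("post/237",
                  properties={"title": "Towards a comparison"},
                  links={"author": "author/justin"}))
print(repo.follow("post/237", "author").properties["name"])
```

<p>Note that this strict check means a resource cannot link to something that has not been stored yet, which is one reason integrity at this level can be awkward to work with.</p>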

<h2>Properties of properties</h2>

<ul>
<li><p><strong>STRING</strong>. Resources can have string valued properties.</p>

<p>This is a basic starting point; I don&#8217;t think I know of a CMS that does not support this. You can store any other type in a string if necessary, though binary values are more efficient in some cases.</p></li>
<li><p><strong>VALID</strong>. Property values can be validated.</p>

<p>Many core repositories do not validate property values at all; a validation proxy layer in front of them does it instead. As a repository principle this is therefore fairly rare, although a small number of types (numbers, dates) might have native validated representations.</p></li>
<li><p><strong>TYPE</strong>. Property names can be validated.</p>

<p>Many systems have a typing facility that restricts the set of property names a resource can have. Others are unstructured, and any property can be added to any resource. There may be type composition mechanisms, such as mixins or type inheritance. Unlike VALID, this is more often tied to the core repository model than to a proxy layer, typically because the repository is typed for internal performance or indexing reasons, which goes with dense rather than sparse storage.</p></li>
<li><p><strong>BINARY</strong>. Resources can have at most one binary property.</p>

<p>I have split this property as some content management systems can only have one binary property (such as an image file) on a particular resource, and multiple ones have to be constructed from multiple linked resources. This is not generally a huge limitation in an otherwise flexible system, but in a weaker system could be annoying.</p></li>
<li><p><strong>N-BINARY</strong>. Resources can have any number of binary properties.</p>

<p>This is the fully flexible version; one may still be a distinguished value in some way, but you can store all the sizes of an image (say) as properties of one resource, which makes managing them easier, although it may actually make things more difficult if it is not easy to iterate over properties (STRUCTPROP), and using multiple resources could be easier.</p></li>
<li><p><strong>STRUCTPROP</strong>. Properties can be structured.</p>

<p>Some systems have structured properties, for example a JSON representation for properties, rather than the flat key-value namespace of other systems. JSON supports arrays that can be iterated over, and structures that can be repeated. To make this sort of structure with only key-value properties you may need to use more resources. Structured properties do add a lot more complexity though, and perfectly useful, but different, systems can be made with or without this model type. Structured properties often have partial update interfaces, so that one subproperty can be modified at a time, which adds further complexity. Note that while technically JCR does not have structured properties, you can use the distinguished tree below any resource as a tree of properties, so it is rather similar to this model. Note also that property naming can informally add structure: just as slashes denote URI hierarchy, they can denote property hierarchy in a technically flat namespace.</p></li>
<li><p><strong>MULPROP</strong>. Properties can have multiple values.</p>

<p>Structured properties can usually have multiple values (a JSON array for example), but not all systems with key-value type properties allow the same key to be set multiple times with different values. This is the model with, say, HTML metadata, where each property (key) can be set multiple times; however some key-value systems only allow a key to hold a single value, and so the user would have to make a structured value to hold the multiple data items instead, by some encoding scheme without the system providing support directly. Having multiple values complicates the simple set and get interfaces that single valued properties have.</p></li>
<li><p><strong>TREE</strong>. Resources can be in exactly one tree structure.</p>

<p>Another split one. Many systems have one distinguished tree structure that content items must be in, and that tree has special operations, like fast access to parents and children; other trees might be constructed by other means, like using a general relation, but the operations on them might be difficult. Children in a tree are almost always ordered and can be reordered, although some systems might not have this property.</p></li>
<li><p><strong>N-TREE</strong>. Resources can be in any number of trees.</p>

<p>The distinguished tree is very common (although Lily for example does not have one); but I do not think I know of any system with multiple named trees that share a common tree interface (like a parent function). You can make a tree with general relations, but you will not get help in making it acyclic for example. So while this is a possible design, it is complex to implement, although arguably useful as a modeling tool. Generally you will have to manage general relations yourself to do this.</p></li>
<li><p><strong>CLONE</strong>. A resource can be cloned.</p>

<p>This means that the same item can appear as more than one resource at the same time, each of which will update in the same way. This is similar to, say, a Unix hard link. This is the usual way of turning a TREE into a <a href="http://en.wikipedia.org/wiki/Directed_acyclic_graph">DAG</a>, which adds some flexibility. In some systems the different tree locations of the cloned resources may affect properties such as permissions or inheritance, so this property can add a fair amount of modeling flexibility; conversely, without these it is of less use.</p></li>
<li><p><strong>RENAME</strong>. A resource can be renamed or moved.</p>

<p>Many content management systems provide this operation in their model, but it is not a native operation in others at the repository level, as the end user visible name may just be a property, for example. HTTP does not provide a rename operation, but WebDAV does.</p></li>
<li><p><strong>REL</strong>. General relations between resources can be created.</p>

<p>This is the basic relation (named by the key name) between two items. It corresponds to the HTML &lt;link&gt; metadata element, or an RDF triple. It turns the resources into a directed graph with named edges. It is certainly essential for any content management system; I will talk more about how you want to be able to use and query it later.</p></li>
<li><p><strong>RELNS</strong>. Relations have a different namespace from properties.</p>

<p>This is a distinguishing feature between XML, which has attributes and child elements (relations) syntactically distinguished, and JSON, which does not. Non-child relations in XML are still attributes though. This generally seems a pointless distinction, and using a single namespace is simpler.</p></li>
<li><p><strong>RELPROP</strong>. Relations can have properties.</p>

<p>This is an interesting one. Adding a value to the relationship triple to make it a quad means that a number (or other value) can be assigned to a relation, making each relation a weighted directed graph (or you can view the system as a matrix). The general model of the <a href="http://arxiv.org/abs/1006.2361">property graph</a> has properties for edges, but the RDF model, for example, does not. Edge properties are often what RDF blank nodes are used to model, although of course blank nodes can have relations as well as properties. You can make up for a lack of properties on edges/relations by adding extra nodes like this, but they may proliferate and need managing, so allowing properties may help. It is also worth noting that a system without MULPROP can use naming of properties to implement RELPROP, as a relation could have a naming convention for its properties; similarly STRUCTPROP generally allows storing the extra information in the property structure.</p></li>
<li><p><strong>REFINT</strong>. Referential integrity is preserved for relations.</p>

<p>Preserving referential integrity at the repository layer is a fair amount of work; relational databases can do this, but not all content repositories do, for example over delete operations.</p></li>
<li><p><strong>ORDREL</strong>. Properties are ordered.</p>

<p>True key-value models do not tend to have an ordering for properties. As with many of these things, ordering adds interface complexity. Structured properties may however be ordered, and if the model supports a distinguished tree (TREE) this almost certainly has ordered children. If you have to build an ordered tree simply from basic relations it is quite complex. A sort order on a relation is another relation property that, like weights, seems to be rarely supported for general relations.</p></li>
<li><p><strong>REIFY</strong>. Properties can have properties.</p>

<p>RDF in principle lets properties themselves be resources (reification), so that they can in turn have properties. This allows me to add information about the properties, such as where they came from. This rarely seems to be useful in common models. Giving different properties different permissions might be a more useful side effect.</p></li>
<li><p><strong>EXTREL</strong>. Relations can be defined externally to their subject.</p>

<p>HTML originally had a rev relation, which defined a relation backwards from object to subject, and RDF triples can be stored in any document, divorced from subject and object referents. This causes all sorts of issues with updates and managing (even finding) relations, while adding no descriptive ability except potentially REIFY.</p></li>
<li><p><strong>INHERIT</strong>. Resources can inherit properties.</p>

<p>The inheritance tree might be set from other properties, or from a distinguished tree, but one model is that properties not explicitly set can be inherited from another resource, or a prototype. This often makes models simpler, as rather than explicitly walking a tree, you can implicitly do it through inheritance. This seems surprisingly uncommon in content repositories.</p></li>
<li><p><strong>MINHERIT</strong>. Multiple inheritance.</p>

<p>Allow inheritance from multiple resources, not just for example based on the primary distinguished tree. More complex.</p></li>
<li><p><strong>NAMESPACE</strong>. Namespacing on properties.</p>

<p>Some systems have a type of namespacing on properties, often used for multiple language variants for example, so that a property may differ across these namespaces. This can also be implemented with multiple resources, structured properties or inheritance. Usually not all properties are namespaced at once; some may not vary, which makes the set and get interface more complex.</p></li>
<li><p><strong>ATOMIC</strong>. All the properties of a resource must be updated together.</p>

<p>A resource and all its properties are all updated at once to a complete new state (this will also be the versioning state). Other than for versioning, this also affects how concurrent access works. The alternative is that each property can be updated independently. Atomic updates are the HTTP resource update model.</p></li>
<li><p><strong>RVERSION</strong>. Resources may be versioned.</p>

<p>An entire resource may be versioned; this is a similar namespacing operation, where a namespace is available to retrieve the old values of a resource.</p></li>
<li><p><strong>PVERSION</strong>. Properties may be individually versioned.</p>

<p>Some systems (such as Lily) allow versioning to be turned on or off on a per property basis. This property generally implies that ATOMIC is not true.</p></li>
<li><p><strong>DELVERSION</strong>. Versions are deleted when a resource is deleted.</p>

<p>This is surprisingly common: if versions are namespaced properties, then they are often deleted along with the resource when it is deleted. The better solution is not to delete versioned resources, but just to give them a tombstone (whiteout) marker.</p></li>
<li><p><strong>SNAPSHOT</strong>. The versioning namespace is whole system state not resource state based.</p>

<p>Although versioning of the total state of a system is now common in source code control systems, many content management systems only let individual resources be versioned (hence creating issues such as DELVERSION). The main issue here is that you cannot apply or undo a set of changes together, only individually. Apart from the difficulty in making easy user interfaces, whole system versioning is superior in every way to versioning of individual resources or properties and no one should be designing a system that does not behave like this.</p></li>
<li><p><strong>TREEVERSION</strong>. Versioning supports branching and merging.</p>

<p>A full versioning model like Git or Subversion, rather than just a linear series of checkpoints, is another model. It generally simplifies concurrent updates (i.e. it can avoid both CAS and LOCK in theory). Although this provides the richest model of versioning, it is the hardest to present to the non technical user. Note also that this is one clear area where the content model for delivery can differ from the one for authoring; for authoring there are much more complex operations that are useful, while for delivery performance is key, and versioning may not be required at all, depending on how updates are applied.</p></li>
<li><p><strong>CAS</strong>. A <a href="http://en.wikipedia.org/wiki/Compare-and-swap">CAS</a>-type operation is supported on resource updates.</p>

<p>Some type of atomic update-if-unchanged-since-this-version operation is supported, for lockless updates. HTTP ETags are the canonical example. This is the simplest choice for API access, and simple for users too. The unit of atomicity is usually the whole resource here, making it the unit of transactions; atomic update of individual properties only does not let two properties be updated in a single transaction, so it is not so useful.</p></li>
<li><p><strong>LOCK</strong>. A locking operation is supported.</p>

<p>The traditional alternative to CAS is a locking operation that disables write operations while the resource is locked. Some administrative or time based unlock operations are required as well. Locking is less suited than other methods to automated APIs, due to issues like deadlock. As multiple locks can potentially be obtained, cross resource transactions are possible, although this could impact concurrency.</p></li>
<li><p><strong>TRANSACTION</strong>. Transactions across multiple resources are supported.</p>

<p>Generally individual resources are the unit of transaction, or possibly individual properties. Some systems however allow a transaction in which multiple resources are updated together. JCR is probably the main example of these. A system with snapshots may also have this property if moving between versions is atomic. HTTP deliberately does not have this sort of transaction, as it does not work well if the resources are distributed, and system design for HTTP should ensure that resources model the right things so that transactions across resources are not needed.</p></li>
</ul>
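
<p>To make the INHERIT entry above a little more concrete, here is a small Python sketch in which properties that are not set locally are looked up along the distinguished tree (TREE). The Node class and its get method are hypothetical, and a real repository might well inherit from a prototype or another relation rather than from the parent.</p>

```python
class Node:
    """Hypothetical resource with a parent pointer (TREE) and local properties."""

    def __init__(self, name, parent=None, **properties):
        self.name = name
        self.parent = parent
        self.properties = dict(properties)

    def get(self, key, default=None):
        # Walk up the distinguished tree until some ancestor sets the key.
        node = self
        while node is not None:
            if key in node.properties:
                return node.properties[key]
            node = node.parent
        return default


site = Node("site", language="en", template="default")
section = Node("blog", parent=site, template="blog-post")
page = Node("post-237", parent=section, title="Towards a comparison")

print(page.get("template"))  # set on the nearest ancestor, the blog section
print(page.get("language"))  # only set on the root
```

<p>The caller never walks the tree explicitly, which is the simplification that inheritance buys; the cost is that a get is no longer a single lookup.</p>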

<h2>Queries</h2>

<p>With the property model above, you can retrieve resources, and read and modify their properties. There may also be some additional, maybe slightly different, properties (the ones that TREE might expose for example, parent and child relations). We can traverse between resources by following their links. However we do generally want to make more complex queries, either about global questions, or more complex traversals based on properties. There are a lot of query models we should really explore; in particular we need to focus on how properties are indexed. I suspect that the analysis below is just a starting point.</p>

<ul>
<li><p><strong>PINDEX</strong>. Property values are indexed.</p>

<p>This is not necessarily essential, as for most interesting properties one would create a node rather than a value, and use relations, though you need reverse relation indexes anyway.</p></li>
<li><p><strong>REV</strong>. Relations have a reverse index.</p>

<p>This lets me find the opposite direction of a relation. This is a key property, as relations are directional, and important properties are in the other direction, such as finding all the resources tagged with a particular tag value.</p></li>
</ul>
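
<p>The REV property above can be sketched in a few lines of Python. This is only an illustration: the RelationIndex class and its method names are invented here, and a real repository would persist the index rather than hold it in memory.</p>

```python
from collections import defaultdict


class RelationIndex:
    """Hypothetical store of (subject, relation, object) triples with a reverse index."""

    def __init__(self):
        self.forward = defaultdict(set)  # (subject, relation) -> objects
        self.reverse = defaultdict(set)  # (relation, object) -> subjects

    def add(self, subject, relation, obj):
        # Maintain both directions on every write so either lookup is one step.
        self.forward[(subject, relation)].add(obj)
        self.reverse[(relation, obj)].add(subject)

    def remove(self, subject, relation, obj):
        self.forward[(subject, relation)].discard(obj)
        self.reverse[(relation, obj)].discard(subject)

    def objects(self, subject, relation):
        return sorted(self.forward.get((subject, relation), ()))

    def subjects(self, relation, obj):
        return sorted(self.reverse.get((relation, obj), ()))


index = RelationIndex()
index.add("post/237", "tag", "tag/cms")
index.add("post/237", "tag", "tag/jcr")
index.add("post/226", "tag", "tag/cms")

# Everything tagged with a particular tag: the reverse direction of "tag".
print(index.subjects("tag", "tag/cms"))
```

<p>Without the reverse map, answering the tag question would mean scanning every resource, which is why this index matters so much in practice.</p>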

<h2>Interfaces to properties</h2>

<p>I mentioned above that some of the property models have differing interface complexities, and I think it probably helps to show what some of the interfaces look like.</p>

<p>The canonical interface in web content management is the one exposed by HTTP. Resources and all their properties have to be updated atomically (ATOMIC) &#8211; <a href="http://blog.technologyofcontent.com/2009/12/smart-resources-or-why-you-should-care-about-http-patch/">PATCH</a> is just an optimization. CAS is available (via ETags or last-modified times). No versioning is specially supported (although the system could create resources for old versions and add properties to access them). Other property behaviours depend on the document types, so HTML for example supports a meta and link flat property namespace, but other schemes are possible. A resource TREE is very loosely defined by &#8216;/&#8217; in URLs, but does not provide any properties, so it is barely a tree.</p>
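
<p>The combination of ATOMIC and CAS in the HTTP model can be imitated in memory. The sketch below is an assumption-laden toy rather than an HTTP implementation: Store, PreconditionFailed and the if_match parameter are invented names that loosely mirror an ETag and the If-Match header, and the code is single-threaded, so it shows the check but not real concurrency control.</p>

```python
import hashlib


class PreconditionFailed(Exception):
    """Raised when the caller's tag no longer matches (HTTP would answer 412)."""


class Store:
    """Hypothetical in-memory store with whole-resource compare-and-swap updates."""

    def __init__(self):
        self._state = {}

    @staticmethod
    def _etag(state):
        # Derive the tag from the complete state, so any property change alters it.
        canonical = repr(sorted(state.items())).encode("utf-8")
        return hashlib.sha256(canonical).hexdigest()[:16]

    def get(self, rid):
        state = self._state[rid]
        return dict(state), self._etag(state)

    def put(self, rid, state, if_match=None):
        current = self._state.get(rid)
        if current is not None and self._etag(current) != if_match:
            raise PreconditionFailed(rid)
        # ATOMIC: the whole property set is replaced in one step.
        self._state[rid] = dict(state)
        return self._etag(state)


store = Store()
store.put("post/237", {"title": "Draft"})
state, tag = store.get("post/237")
state["title"] = "Towards a comparison"
store.put("post/237", state, if_match=tag)  # tag still current, so this succeeds

try:
    store.put("post/237", {"title": "Stale"}, if_match=tag)  # tag is now out of date
except PreconditionFailed:
    print("conflict detected")
```

<p>Because the tag covers the whole resource, two writers changing different properties still conflict, which is exactly the whole-resource unit of atomicity described above.</p>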

<p><a href="http://blog.technologyofcontent.com/2009/12/the-bottom-10-things-of-2009/">WebDAV</a> changed the HTTP data model to push it much closer to a traditional content model, supporting LOCK, RENAME, and TREE on top of HTTP, and an explicit property model independent of the resources in question. The property model is flat, with no STRUCTPROP (although extending it is mentioned in the <a href="http://www.webdav.org/specs/rfc2518.html">RFC</a>) and no MULPROP. CLONE is allowed, as resources can have more than one URI. Updates to properties are not ATOMIC, as the PROPPATCH method can update some properties without others. The main HTTP resource is the body, which allows storage of one BINARY property, as the other properties are XML strings. This is pretty much the standard document management style set of properties.</p>
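
<p>For a feel of what the flat WebDAV property model looks like on the wire, here is a Python sketch that builds the XML body of a PROPPATCH request with the standard library. The propertyupdate, set, remove and prop elements in the DAV: namespace come from the WebDAV RFC; the helper name proppatch_body, the example.com namespace and the property names are assumptions made up for the example, and actually sending the request to a server is not shown.</p>

```python
import xml.etree.ElementTree as ET

DAV = "DAV:"
APP = "http://example.com/ns/"  # hypothetical application namespace


def proppatch_body(set_props=None, remove_props=None):
    """Build a PROPPATCH request body that sets and removes flat properties."""
    root = ET.Element(f"{{{DAV}}}propertyupdate")
    if set_props:
        prop = ET.SubElement(ET.SubElement(root, f"{{{DAV}}}set"), f"{{{DAV}}}prop")
        for name, value in set_props.items():
            # Each property is one element holding one string value (no MULPROP).
            ET.SubElement(prop, f"{{{APP}}}{name}").text = value
    if remove_props:
        prop = ET.SubElement(ET.SubElement(root, f"{{{DAV}}}remove"), f"{{{DAV}}}prop")
        for name in remove_props:
            ET.SubElement(prop, f"{{{APP}}}{name}")
    return ET.tostring(root, encoding="unicode")


# Touch two properties and leave every other property of the resource alone.
body = proppatch_body(set_props={"author": "justin"}, remove_props=["draft"])
print(body)
```

<p>The request names only the properties it changes, which is why property updates in WebDAV are not ATOMIC in the sense used in this post.</p>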

<p>Lily CMS, which I was looking at recently, is pretty different. There is no TREE; you have to construct it yourself. There is TYPE, no STRUCTPROP, and there are NAMESPACE and PVERSION. There is no CAS or LOCK; conflicts are resolved by time of modification alone. It is an interestingly different model which I will look at in more detail in another post, as it chooses complexity in some areas and simplicity in others.</p>

<p>One of the common themes of the NoSQL movement is saying something along the lines of: if you only give up this one feature, we can make the storage layer that much simpler and faster, and push some more work up to higher layers to resolve, such as conflict resolution, referential integrity, or tree structures. This is the path of a simpler core repository which does not implement all common usage patterns for a CMS application, with some layering and conventions on top to build the next level. This is not so different from the relational model, with a low level relational algebra and a set of database management tools, then the application. What is still unclear is exactly where to make that split, but certainly large monolithic repository models that try to do everything listed above end up very large and complex to use.</p>

<p>There is definitely a case for moving some of these properties out of the repository into the authoring tools. Referential integrity at the repository level is actually quite hard to work with, as you cannot refer to something you are about to create, for example, but the authoring layer can provide tooling to help the user here.</p>

<p>I will post some follow-ups about some of the other issues arising from this, and what I think the best set of constraints to work in is, and more on the CMIS and JCR models.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.technologyofcontent.com/2010/09/towards-a-comparison-of-content-repositories/feed/</wfw:commentRss>
		<slash:comments>11</slash:comments>
		</item>
		<item>
		<title>Trends in content management 2010</title>
		<link>http://blog.technologyofcontent.com/2010/03/trends-in-content-management-2010/</link>
		<comments>http://blog.technologyofcontent.com/2010/03/trends-in-content-management-2010/#comments</comments>
		<pubDate>Mon, 08 Mar 2010 19:05:08 +0000</pubDate>
		<dc:creator>justin</dc:creator>
				<category><![CDATA[CMS]]></category>
		<category><![CDATA[trends]]></category>

		<guid isPermaLink="false">http://blog.technologyofcontent.com/?p=226</guid>
		<description><![CDATA[This is an overview of the medium term trends in content management, from a mostly technology point of view. Standards repository feeds CMIS JCR terminology, ways of thinking, industry model Standardization has really started to affect the content management industry. The industry was very immature, a bit of a landgrab, and not very customer [...]]]></description>
			<content:encoded><![CDATA[<p>This is an overview of the medium term trends in content management, from a mostly technology point of view.</p>

<h2>Standards</h2>

<ul>
<li>repository</li>
<li>feeds</li>
<li>CMIS</li>
<li>JCR</li>
<li>terminology, ways of thinking, industry model</li>
</ul>

<p>Standardization has really started to affect the content management industry. The industry was very immature, a bit of a landgrab, and not very customer focussed. This has changed rapidly, with the wide adoption of the JCR standards, but particularly with the process around CMIS. What is being set now is the model of the industry for the next five years, what the customers expect and what the products will deliver. Setting the agenda matters, and now is the opportunity to participate.</p>

<h2>CMS as a platform</h2>

<ul>
<li>build applications on a content platform</li>
<li>API driven development</li>
<li>SOA</li>
<li>embed code everywhere in domain level scripting languages</li>
</ul>

<p>A content management system is at last becoming less of a product that lets you do some stuff and more of a platform for working with content and building content centered applications in a service oriented world. A pervasive invasion of scripting languages such as JavaScript into this is coming. The web programming model of pervasive agile scripting and rich REST APIs is going to be the norm, not large scale Java programming or application specific templating languages.</p>

<h2>Co-opetition and community</h2>

<ul>
<li>collaboration on standards, infrastructure</li>
<li>open source as community</li>
<li>twitter, blogs, enterprise 2.0</li>
<li>end of NIH</li>
<li>customers are community too</li>
</ul>

<p>In the last year especially the landscape of content management as a community has changed. First through the standards processes, particularly CMIS and JCR, and then through social media, particularly Twitter, as well as via events and blogs, there is now a growing cross vendor technical content management community, particularly with the open source players, and joint projects, for example with CMIS. This is in addition to the developer communities that are strongest around the open source products, although the .NET products are trying hard to build around the Microsoft developer relations model. And of course there is the community of customers, who are becoming more vocal.</p>

<h2>Rich content</h2>

<ul>
<li>richer xhtml and xml</li>
<li>enhanced metadata; richer metadata in other formats</li>
<li>constraints not just validation</li>
<li>RDF and semantic web, linked data</li>
<li>relations and IA expressed in metadata</li>
<li>enhancement via deeply integrated search</li>
<li>document management, DAM and WCM converge</li>
<li>richer presentation layers, richer APIs</li>
<li>Flash is dead, plugins are dead, HTML5 is winning faster than anyone thought</li>
</ul>

<p>As we have moved from document management, where the focus was on whole documents, to web content management, which is more component and assembly based, there has been a gradual push to do more with the documents. Standardized rich document semantics are after all one of the main advantages of web documents. It is taking a while, but making use of the potential here is beginning to happen now that we have <a href="http://googlewebmastercentral.blogspot.com/2009/05/introducing-rich-snippets.html">Google indexing rich snippets</a> and even <a href="http://rdfa.info/2010/01/20/uk-retail-chain-tesco-adopts-rdfa/">Tesco using RDFa</a>. There is a lot more standardization work to do here.</p>

<p>In the front end the aim of this backend information enhancement is to build richer interfaces more easily, and to enhance findability, search and navigation, as well as to enable repurposing, richer APIs, and linked data. Authoring is the biggest challenge, as the majority of users need to be given interfaces that are independent of the IA, simple to use, but support generation and modification of complex data structures.</p>

<h2>SAAS and the service business</h2>

<ul>
<li>cloud</li>
<li>internal delivery in a SAAS way</li>
<li>devops</li>
<li>APIs and standardization forced by SAAS</li>
<li>changes to customer service model</li>
</ul>

<p>Software as a service models are winning because no one wants to buy software as a product any more. I will cover more of this in another article I have been working on for a bit, but the main point is that enterprise software as a paid big ticket product is dead. The replacements are open source software and SAAS. These are not alternatives though, as people want the open source software delivered as a service, albeit maybe a more commoditized one if there are multiple providers, and many of the SAAS products delivered will be largely built of open source components by companies that run a mixed model. Microsoft is <a href="http://www.theregister.co.uk/2010/03/04/ballmer_on_azure/">going headlong into cloud</a> in a way that redefines what the operating system is. Even purchased software will be delivered in internal clouds.</p>

<p>This changes both how code is written and how it is administered, with <a href="http://lethargy.org/~jesus/writes/a-job,-a-mission,-a-career-all-without-a-path-or-a-name.">web operations</a> joining up into rolling delivery and creating the emerging field of <a href="http://www.devopsdays.org/">devops</a>. Developers need to understand operations and how to build code for this environment.</p>

<p>The service business as a business is different from the product business. Open source companies have got that better than product based vendors, but the less lock-in there is, the more key these changes become. The <a href="http://www.interwest.com/software-as-a-service/on-demand/vp-of-customer-success-critical-to-the-saas-business-model/">success of the customer using the services becomes the key business driver</a>.</p>

<h2>Performance and scaling, real time</h2>

<ul>
<li>cloud has pushed scale up out of picture</li>
<li>scale out transparently</li>
<li>new technologies beyond RDBMS that fit CMS </li>
<li>dynamic generation becoming the norm; Google pushing the performance thing; the industry norm of 100ms will fall</li>
<li>real time becomes more important &#8211; dynamic updates, forget crawling, Google is going push</li>
<li>backend: queuing (0MQ, AMQP)</li>
<li>frontend: websockets, XMPP, long polling </li>
</ul>

<p>Just buying big hardware for scale up is really becoming difficult; the web vibe has always been to scale horizontally on commodity hardware. There is a lot of development around scale out <a href="http://blog.technologyofcontent.com/2010/02/nosql-and-content-management/">technologies such as NoSQL</a> which fit into the WCM data models, which are those of the web after all.</p>

<p>As well as scaling for volume, latency and real time are becoming key. Google&#8217;s time to crawl has been falling rapidly, to a few or less, but is <a href="http://www.readwriteweb.com/archives/google_developing_real_time_index.php">moving to real time</a> with push updates. Twitter has really pushed the boundaries of expectation for real time. Behind the scenes there are a lot of technologies for efficiently pushing around notifications and events both at the backend and on the frontend. Real time is going to become increasingly pervasive.</p>

<p>Page generation times will need to fall; the standard industry benchmark of 100ms per component will probably need to be halved; overall total times under 1s will become the norm.</p>

<h2>Security</h2>

<ul>
<li>web increasingly hostile</li>
<li>every bug is a potential security issue</li>
<li>security focussed on fewer areas, push into the OS not out to applications</li>
</ul>

<p>I read the excellent <a href="http://lwn.net">Linux Weekly News</a> every week, and every week there are <a href="http://cwe.mitre.org/top25/">security exploits</a> for many pieces of software; one that really struck me recently was the <a href="http://www.h-online.com/security/news/item/Possible-backdoor-in-the-e107-CMS-913588.html">major exploit against the CMS e107</a>. What happened here was that a group of crackers found a serious security flaw in the CMS, which they began attacking systematically. When the patch was released, however, they already had control of the developer&#8217;s website via the flaw, so they replaced the patched version of the code with a version with a backdoor. Hacked websites are a vital part of the underground <a href="http://www.securitytube.net/Phishing-%28Evil-on-the-Internet%29-FOSDEM-Talk-video.aspx">online crime scene</a>, and a content management system is a high value target. Expect much more of this, and be prepared.</p>

<p>The response is to narrow security into fewer points of vulnerability: sandbox, and use every available facet of the operating system&#8217;s security layers. Make the most of processes, permissions, everything that you get there; I <a href="http://blog.technologyofcontent.com/2010/01/scaling-security-and-architecture-in-2010/">wrote more about this in an earlier post on emerging trends</a>. File format parsing is another common area of vulnerability.</p>

<p>It is war out there on the internet, and many people underestimate or ignore the issues, and too many programmers do not code defensively by habit.</p>

<h2>Summary</h2>

<p>It is an exciting time in web content management right now; the industry is growing up beyond its beginnings as a way of getting web sites up, towards being the core of the broader content management industry. The choices made now will shape the industry; the next generation of products will be a big step forward for the industry.</p>

<p><a href="http://dilbert.com/strips/comic/2009-07-26/" title="Dilbert.com"><img src="http://dilbert.com/dyn/str_strip/000000000/00000000/0000000/000000/60000/1000/700/61747/61747.strip.sunday.gif" border="0" alt="Dilbert.com" width="450"/></a></p>
]]></content:encoded>
			<wfw:commentRss>http://blog.technologyofcontent.com/2010/03/trends-in-content-management-2010/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>NoSQL and content management</title>
		<link>http://blog.technologyofcontent.com/2010/02/nosql-and-content-management/</link>
		<comments>http://blog.technologyofcontent.com/2010/02/nosql-and-content-management/#comments</comments>
		<pubDate>Sun, 14 Feb 2010 23:34:15 +0000</pubDate>
		<dc:creator>justin</dc:creator>
				<category><![CDATA[CMS]]></category>
		<category><![CDATA[data modelling]]></category>
		<category><![CDATA[nosql]]></category>
		<category><![CDATA[sql]]></category>

		<guid isPermaLink="false">http://blog.technologyofcontent.com/?p=216</guid>
		<description><![CDATA[I went to many of the first ever NoSQL devroom talks at FOSDEM this year. For anyone who hasn&#8217;t been, FOSDEM is a great place, and the NoSQL room was well organized and full of interest. The term NoSQL is not even a year old; I first came across CouchDB around a year ago from [...]]]></description>
			<content:encoded><![CDATA[<p>I went to many of the first ever <a href="http://nosql.mypopescu.com/post/385372130/your-chance-to-review-the-fosdem-nosql-event">NoSQL devroom</a> talks at <a href="http://fosdem.org">FOSDEM</a> this year. For anyone who hasn&#8217;t been, FOSDEM is a great place, and the NoSQL room was well organized and full of interest. The term NoSQL is not even a year old; I first came across CouchDB around a year ago from memory; Tim Anglade gave an excellent introduction where he reminded people of the historical roots, both before relational databases and since then; so not new but there is a renewed focus now. Why is that? I am going to look here at the field of content management and why you might be interested in different data models if that is your problem space, based loosely on some of the ideas from the talks at FOSDEM. There was a talk about <a href="http://outerthought.org/blog/blog/353-OTC.html">content management specifically and the Lily CMS by Evert Arckens</a> although I missed it, but I have added some comments after watching the video.</p>

<p><a href="http://www.flickr.com/photos/justincormack/4375594326/" title="FOSDEM by Justin Cormack, on Flickr"><img src="http://farm5.static.flickr.com/4029/4375594326_7ebdafd796.jpg" width="450"  alt="FOSDEM" /></a></p>

<h2>The data model for content management</h2>

<p>I have another draft post on this subject in more detail, which I am working on as part of my REST modelling in content management work, but I will outline some of the types of data relations that are important. I will be quite abstract here; if you want more concrete examples you will have to wait for the other post. Database models like the ones we are talking about here are more easily understood in the abstract, I think.</p>

<p>First we need our unit of modeling. This in itself is the first issue. Content management tends to deal with, at the conceptual level, something that looks like a document. It may be a fragment, in the sense that it is say a page component (asset if you use that terminology) rather than a whole item, but the unit for the user to edit and which is usually versioned is a structured object itself. The processing model tends to treat it as almost a binary blob, except that certain properties can be extracted, such as metadata, links in HTML and so forth, but it is stored as an item rather than decomposed further.</p>

<p>OK, so we have a piece of content and some attributes extracted from it as one basic model. This corresponds pretty much to the JCR data model for example. There are variations; sometimes people do not store metadata in the file formats, as historically many file formats had poor support for arbitrary structured metadata, although that is largely obsolete now, and the advantages of actually storing metadata and relations substantially within documents are high. External storage does not change the model much, just complicates processing and storage. Another variant, often seen in document management systems, is to be able to have multiple &#8216;streams&#8217;, i.e. several document variants rolled into one, for example a video and a still from it. You can however from the modelling point of view regard these as another compound document format kept together because conceptually they are a bundle of content; you might distribute them as a zip file if you haven&#8217;t got any other suitable container format.</p>

<p>So now we have a storage model where we have a blob, with rich media operations on it, and extracted structural and metadata information. There is also versioning to consider, but let us ignore that and treat it either as part of the blob, or as a new document with some relation to the old ones, those being the two core versioning models; this does not really affect anything else.</p>

<p>There are two kinds of metadata, properties and relations, although they are more similar than they appear. Properties are the standard attributes (this picture depicts sheep), while relations join two items in the repository (this is a cropped version of this other picture). Although this distinction seems clear, in the end richer information architectures demand that everything becomes a relation, so I can browse a sheep node and find all the sheep items, turning every attribute value of any significance into a node with relations instead. Pure attribute values are only left for the less interesting properties (this PDF file is 176k in size).</p>
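
<p>To make the property to relation move concrete, here is a minimal sketch in Python. The <code>Repository</code> class, its method names and the sample items are all hypothetical; it only shows the shape of the idea, not any particular repository API.</p>

```python
from collections import defaultdict


class Repository:
    """Hypothetical toy repository: plain properties plus named relations."""

    def __init__(self):
        self.properties = {}                # item id -> {property name: value}
        self.relations = defaultdict(set)   # (relation name, target node) -> item ids

    def add_item(self, item_id, **properties):
        self.properties[item_id] = dict(properties)

    def relate(self, item_id, relation, target):
        self.relations[(relation, target)].add(item_id)

    def promote(self, name):
        """Turn every value of property `name` into a node reached by a relation."""
        for item_id, props in self.properties.items():
            if name in props:
                self.relate(item_id, name, props.pop(name))

    def browse(self, relation, target):
        """All items related to the target node, e.g. everything that depicts sheep."""
        return sorted(self.relations.get((relation, target), set()))
```

<p>After promoting <code>depicts</code>, the sheep value is a node you can browse from, while something like the file size stays behind as a plain property.</p>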

<p>They are also less interesting from a relational versus non relational storage point of view, although there is one important point, which is the dense versus sparse question, so let us take a look at this. Most real world attributes are sparse, that is, most attributes are not set on most items. In the relational model we have a row for our item, and columns for all the attributes, so we are saying most are NULL. (I was brought up on matrix algorithms and still think in terms of sparse versus dense matrices as this is exactly the same problem, and matrices represent graphs anyway.) Storing huge mainly null tables is not very efficient, so there are two common practices in relational mapping of attributes in content management systems. The first is to define a type based system, where a particular type of content item is defined to have certain attributes, and each type can therefore have its own table, which is assumed to have fewer NULL values. Mixins, sets of properties that live across types, can potentially be added to this model, as can inheritance schemes, but the basic idea is one table per type. This gives a nice simple direct database programming model, and causes a complete nightmare if you ever want to change the schema, for example add an attribute, as for any large database most DBMSs will effectively shut down the system while a schema change takes place, as schema changes require pretty much all locks. <a href="http://www.silverstripe.com">Silverstripe</a> is one example of a content management system built like this; there are many others.</p>
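
<p>A minimal sketch of the table per type approach, using Python&#8217;s built in <code>sqlite3</code> module. The table and column names are made up for illustration, and an in-memory SQLite database only shows the shape of the model; it says nothing about the locking cost of schema changes on a large production database.</p>

```python
import sqlite3


def build_typed_schema() -> sqlite3.Connection:
    """One table per content type, so each row carries only its own attributes."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE image (id INTEGER PRIMARY KEY, title TEXT, width INTEGER, height INTEGER);
        CREATE TABLE article (id INTEGER PRIMARY KEY, title TEXT, body TEXT);
        INSERT INTO image (title, width, height) VALUES ('Sheep', 640, 480);
        INSERT INTO article (title, body) VALUES ('NoSQL', 'Some text');
        """
    )
    return conn


def add_attribute(conn: sqlite3.Connection, table: str, column: str, sql_type: str) -> None:
    """Adding an attribute to a type is a schema change on that type's whole table."""
    # Identifiers cannot be bound as parameters; only pass trusted names here.
    conn.execute(f"ALTER TABLE {table} ADD COLUMN {column} {sql_type}")


def columns(conn: sqlite3.Connection, table: str) -> list:
    """Column names of a table, in declaration order."""
    return [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]
```

<p>Calling <code>add_attribute(conn, "image", "photographer", "TEXT")</code> changes the schema for every image at once, and existing rows read back NULL for the new column; the article table is untouched.</p>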

<p>The alternative is the <a href="http://en.wikipedia.org/wiki/Entity-attribute-value_model">entity attribute value</a> (EAV) model (terrible Wikipedia article, please fix), where rather than a direct mapping of the attributes to relations, you indirectly map, creating a table that joins entities, attributes and values; this table of course looks just like RDF triples. Doing this though loses everything that makes a relational database useful: constraints, typing, query optimization. It adds an extra layer of logical schema above the physical schema which the database layer does not understand. This is a pretty common relational mapping for content management systems, as it allows full flexibility in defining and redefining attributes. To implement it well needs a large mid layer to manage the constraints, provide an API layer and generate efficient queries; effectively, to manage the map from the logical layer to the physical layer. The <a href="http://drupal.org/node/82661">Drupal CCK</a> is an example of this model.</p>
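
<p>A minimal EAV sketch, again with Python&#8217;s <code>sqlite3</code>. The table layout, function names and sample data are assumptions for illustration and are not how Drupal CCK or any other product actually stores things. Note that every value is stored as text and the query has to be generated by code, which is a small taste of the mid layer described above.</p>

```python
import sqlite3


def build_eav() -> sqlite3.Connection:
    """A single entity, attribute, value table with a few sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE eav (entity TEXT, attribute TEXT, value TEXT, "
        "PRIMARY KEY (entity, attribute))"
    )
    conn.executemany(
        "INSERT INTO eav VALUES (?, ?, ?)",
        [
            ("photo1", "depicts", "sheep"),
            ("photo1", "size_kb", "176"),  # typing is lost: every value is text
            ("photo2", "depicts", "sheep"),
            ("photo2", "photographer", "justin"),
        ],
    )
    return conn


def get_entity(conn, entity):
    """Reassemble one logical item from its attribute rows."""
    rows = conn.execute("SELECT attribute, value FROM eav WHERE entity = ?", (entity,))
    return dict(rows.fetchall())


def find_entities(conn, **criteria):
    """Entities matching every attribute=value pair; the SQL is built per query."""
    if not criteria:
        return []
    clauses = " OR ".join("(attribute = ? AND value = ?)" for _ in criteria)
    params = [part for pair in criteria.items() for part in pair]
    sql = (
        "SELECT entity FROM eav WHERE " + clauses
        + " GROUP BY entity HAVING COUNT(*) = ? ORDER BY entity"
    )
    return [row[0] for row in conn.execute(sql, params + [len(criteria)])]
```

<p>New attributes need no schema change at all, but the database no longer knows that <code>size_kb</code> is a number or that <code>depicts</code> is required; that knowledge has to live in the layer above.</p>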

<p>Of course this is not to say that the two relational models do not work. The direct mapping works well with simple, unchanging content types in small websites, for example, or in models where attributes are not very sparse, or the sparseness is worth the overhead, and changing the schema is rare. EAV works well too, if managed carefully; it helps if the types of queries required on the model are not too complex.</p>

<p>Once you add relations as well as attributes, the already difficult mapping layer gets harder; you add another set of operations (recursion to handle tree structures) that the relational model does not handle well, so you may need to add more into the mapping layer. The promise of NoSQL is that you can bypass this for these types of applications, and program directly to a database model that handles sparse attributes and relations natively. But how much do the NoSQL databases get you? You can argue that if you are already looking at EAV, then you are already not getting much from a relational database, and you are building a modeling layer on top of it, so dropping that and going for something that maps the logical data layer directly does make sense from a development point of view. Whether that really helps performance is less clear; much of the original work for NoSQL has come out of huge scaling, big problems, not actually providing efficient solutions to the types of data mapping problem we are seeing here on a medium scale; of course for huge sites there may be benefits.</p>
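
<p>As an illustration of recursion ending up in the mapping layer, here is a sketch that walks a parent and child tree with one query per level. The <code>node</code> table and the site structure are hypothetical. Some databases do offer recursive queries in SQL itself, but this is the sort of code that often ends up above the database.</p>

```python
import sqlite3


def build_tree() -> sqlite3.Connection:
    """A tiny site tree stored as an adjacency list (each row points at its parent)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE node (id TEXT PRIMARY KEY, parent_id TEXT)")
    conn.executemany(
        "INSERT INTO node VALUES (?, ?)",
        [
            ("site", None),
            ("news", "site"),
            ("about", "site"),
            ("2010", "news"),
            ("february", "2010"),
        ],
    )
    return conn


def descendants(conn, root):
    """Breadth first walk done in application code: one query per tree level."""
    found, frontier = [], [root]
    while frontier:
        marks = ",".join("?" for _ in frontier)
        rows = conn.execute(
            "SELECT id FROM node WHERE parent_id IN (" + marks + ") ORDER BY id",
            frontier,
        ).fetchall()
        frontier = [row[0] for row in rows]
        found.extend(frontier)
    return found
```

<p>The number of round trips grows with the depth of the tree, which is one reason deep hierarchies are awkward to serve from this kind of mapping.</p>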

<p>The types of NoSQL database vary in their level of support for attributes and relations as they are used in content management. Document oriented databases do not give you much more than retrieval of content items; associative ones give key value type attribute lookups; graph databases should let you query relations directly, expressing the types of queries that are needed for information architecture problems directly, in principle. Examples I am thinking of are things like tag clouds, which are simple to express as a graph problem, as a tag cloud is simply a count of the number of edges from a set of nodes. Indeed most information architecture problems look like graph problems, and also like <a href="http://en.wikipedia.org/wiki/OLAP_cube">OLAP processing operations</a> which also do not work well on relational databases. And of course one of the things that NoSQL has shared with OLAP is the use of denormalization; you can use simpler models if you denormalize data to match the queries you will be using, rather than assuming that the types of query you will use can necessarily be optimized and made efficient by a general purpose system.</p>
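
<p>The tag cloud case is small enough to sketch directly. Here the graph is just a list of (item, tag) edges and the cloud is a count of edges leaving a chosen set of item nodes; the data and the function name are made up for illustration, and a real graph database would express this in its own query language.</p>

```python
from collections import Counter

# Each edge joins a content item to a tag node (hypothetical sample data).
EDGES = [
    ("post1", "nosql"),
    ("post1", "cms"),
    ("post2", "cms"),
    ("post3", "cms"),
    ("post3", "jcr"),
]


def tag_cloud(edges, items):
    """Count the edges from the given set of item nodes to each tag node."""
    wanted = set(items)
    return Counter(tag for item, tag in edges if item in wanted)
```

<p>Restricting the set of items (one author, one section, one month) gives a different cloud from the same edges, with no change to the stored data.</p>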

<p>Denormalization is not without its difficulties, although arguably it could become a tool embedded in databases like indexes are now. One of the issues with NoSQL is most of the database systems leave denormalization to the user: you need to use it because joins are not available, but you have to manage that yourself. Building an infrastructure to explicitly manage denormalization as a first class database item akin to an index might be interesting. So that gives us a first issue, as in any NoSQL system except a graph database we will either need to denormalize or compose queries to get the results we want.</p>
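
<p>One way to picture denormalization managed like an index is a store that keeps the denormalized copy up to date on every write, so the caller never maintains it by hand. This is only a sketch of the idea; the <code>TaggedStore</code> class is hypothetical and is not taken from any existing database.</p>

```python
from collections import Counter


class TaggedStore:
    """Hypothetical store that maintains a denormalized tag count on every write."""

    def __init__(self):
        self._tags = {}               # normalized data: item id -> set of tags
        self._tag_counts = Counter()  # denormalized copy, shaped for the tag cloud query

    def put(self, item_id, tags):
        """Write an item and adjust the denormalized counts by the difference."""
        old = self._tags.get(item_id, set())
        new = set(tags)
        self._tag_counts.subtract(old - new)  # tags removed from this item
        self._tag_counts.update(new - old)    # tags added to this item
        self._tags[item_id] = new

    def tag_counts(self):
        """Read the precomputed answer; no scan of the items is needed."""
        return {tag: count for tag, count in self._tag_counts.items() if count}
```

<p>The read side becomes trivial, and the cost moves to writes, which is the same trade an index makes.</p>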

<p>So I think there are four realistic models for content management backends going forward:</p>

<ol>
<li>The direct relational model for small systems with simple data models, rare attribute changes, little or no use of relations.</li>
<li>EAV models wrapped in a content modeling layer; JCR is an example of this, hiding the underlying SQL layer very well, and indeed allowing it to be replaced with another underlying storage model potentially; I am sure someone is testing a Neo4J backend somewhere. This is where most production solutions are at now.</li>
<li>Direct, nondenormalized graph database backends, with the raw content stored in a document store. Cuts out a special purpose middle level by mapping the domain more directly. As <a href="http://blogs.neotechnology.com/emil/2009/11/nosql-scaling-to-size-and-scaling-to-complexity.html">Emil Neo</a> says, it may not scale right up as far as the other NoSQL technologies, but it cuts complexity of implementation; there are also issues about whether all the kinds of queries required are available efficiently. I think this will be the sweet spot in a few years once the products mature and we see more open source activity in the field. Of course RDF based solutions, for example using SPARQL, fall into this category too, and the maturity of products around these technologies will help drive this category as well as the NoSQL models.</li>
<li>Big, denormalized systems, probably with software support for managing the denormalization, and using underlying simple but scalable technologies like key-value stores. These already exist in large scale web applications, but may remain niche if the development effort remains high. If frameworks for modelling more easily on these turn up they may trickle down for performance reasons even on smaller datasets; a key value store runs fine on a relational database backend, although the types of processing required probably mean a specialized backend is useful.</li>
</ol>
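
<p>On the remark in the fourth option that a key value store runs fine on a relational backend, a minimal sketch with <code>sqlite3</code> looks like this. The <code>SqlKeyValueStore</code> class and the <code>kv</code> table are hypothetical names chosen for the example.</p>

```python
import sqlite3


class SqlKeyValueStore:
    """Hypothetical key value interface over a single two column table."""

    def __init__(self, conn: sqlite3.Connection):
        self._conn = conn
        conn.execute("CREATE TABLE IF NOT EXISTS kv (key TEXT PRIMARY KEY, value TEXT)")

    def put(self, key: str, value: str) -> None:
        # INSERT OR REPLACE overwrites any existing row with the same key.
        self._conn.execute(
            "INSERT OR REPLACE INTO kv (key, value) VALUES (?, ?)", (key, value)
        )

    def get(self, key: str, default=None):
        row = self._conn.execute(
            "SELECT value FROM kv WHERE key = ?", (key,)
        ).fetchone()
        return default if row is None else row[0]
```

<p>Nothing relational is being used here beyond a primary key lookup, which is why a specialized backend can take over the same interface later.</p>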

<p>Note that the <a href="http://lilycms.org/">Lily CMS</a> which there was a talk about fits very much into the fourth option above; this is where the NoSQL technologies have perhaps seen most use, but I think there will be a lot of work in order to build a CMS like this now, in particular in terms of tools to support denormalization strategies that are needed. The outlined approach sounded much like the outlines I have been thinking about for this type of model, although I would focus more on tooling for denormalized queries and less on scaling other parts like full text search right now. It will be interesting to follow the progress of this project.</p>

<p>We are at an interesting juncture, where it looks like there are some options that will let us do domain modelling in a way that corresponds more directly to the domain, but there are a lot of interesting challenges on the way.</p>

<p><a href="http://dilbert.com/strips/comic/2008-02-12/" title="Dilbert.com"><img src="http://dilbert.com/dyn/str_strip/000000000/00000000/0000000/000000/00000/1000/800/1869/1869.strip.gif" border="0" alt="Dilbert.com" width="440"/></a></p>
]]></content:encoded>
			<wfw:commentRss>http://blog.technologyofcontent.com/2010/02/nosql-and-content-management/feed/</wfw:commentRss>
		<slash:comments>11</slash:comments>
		</item>
		<item>
		<title>Content microarchitecture: How I learned to love HTML part 2</title>
		<link>http://blog.technologyofcontent.com/2009/11/content-microarchitecture-how-i-learned-to-love-html-part-2/</link>
		<comments>http://blog.technologyofcontent.com/2009/11/content-microarchitecture-how-i-learned-to-love-html-part-2/#comments</comments>
		<pubDate>Sun, 15 Nov 2009 16:34:24 +0000</pubDate>
		<dc:creator>justin</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[CMS]]></category>
		<category><![CDATA[html]]></category>

		<guid isPermaLink="false">http://blog.technologyofcontent.com/?p=145</guid>
		<description><![CDATA[I posted recently on an unfinished series by Daniel Jacobson, which was perhaps slightly unfair, so I thought I should write a followup to his final part. My argument was mainly that storing flat, unstructured data was not enough for most content projects, and the difficult questions of structure needed to be addressed. Daniel&#8217;s third [...]]]></description>
			<content:encoded><![CDATA[<p>I <a href="http://blog.technologyofcontent.com/2009/11/id-love-to-stay-here-and-be-normal-but-its-just-so-overrated-or-how-i-learned-to-stop-worrying-and-love-html/">posted recently</a> on an unfinished series by Daniel Jacobson, which was perhaps slightly unfair, so I thought I should write a followup to <a href="http://blog.programmableweb.com/2009/11/11/content-portability-building-an-api-is-not-enough/">his final part</a>.</p>

<p>My argument was mainly that storing flat, unstructured data was not enough for most content projects, and the difficult questions of structure needed to be addressed. Daniel&#8217;s third part addresses how they do this. Actually when I first looked through the NPR content after reading the first two articles I could not find any content that had inline HTML; clearly that was an accident and I should have read more, as it is used.</p>

<p>The actual NPR process is interesting. In particular it shows the amount of care and attention in curating content that is needed to keep it reusable, repurposable and valuable. Semantic content requires you to understand the range of meanings that it encodes and how to work with them, transform them, and quality control them. And that means you need to know every tag and attribute that is going into the system and what that means for every output you are or may be using.</p>

<p>One of the outputs that many people forget is plain text, and NPR is very clear on the writing style that is necessary for writing for HTML and plain text. Everything must make sense in the text form; links should be additional information that adds to the text not necessary to understand it. And of course no &#8220;click here&#8221;. Text output for other devices may vary between the expressiveness of HTML and that of plain text. Text that reads sanely also helps screen readers and other assistive technologies make the content understandable.</p>
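
<p>A plain text output is easy to sketch with the HTML parser in the Python standard library. This is an illustration only, not NPR&#8217;s actual pipeline; the class and function names are made up. It drops all tags, keeps the text inside links, and collapses whitespace.</p>

```python
from html.parser import HTMLParser


class PlainText(HTMLParser):
    """Collect only the text nodes of a fragment of markup."""

    def __init__(self):
        super().__init__(convert_charrefs=True)  # entities arrive already decoded
        self._chunks = []

    def handle_data(self, data):
        self._chunks.append(data)

    def text(self):
        # Join the pieces, then collapse runs of whitespace into single spaces.
        return " ".join("".join(self._chunks).split())


def to_plain_text(markup: str) -> str:
    parser = PlainText()
    parser.feed(markup)
    parser.close()
    return parser.text()
```

<p>A sentence such as &lt;p&gt;See the &lt;a href="/report"&gt;annual report&lt;/a&gt; for details.&lt;/p&gt; still reads correctly once the link is gone, whereas &#8220;click here&#8221; would leave the reader with nothing to go on.</p>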

<p>The key points here are that content markup must be</p>

<ol>
<li>Valid. Processing is likely to be inaccurate without valid content, and tools will be more limited in how they process it, or will fail unexpectedly. Best fix this at the beginning of the pipeline.</li>
<li>Meaningful. You need the markup to mean what the author intended, so look at interface usability and training.</li>
<li>Accepted. You do not have to accept all valid XHTML, say. For a start, XML is an extensible language! You can choose a whole range of markup for a story, from the very minimal, to marking up each person and place involved, or more.</li>
<li>Stored. The marked up text must be stored; the NPR decomposition is plain text plus normalized markup which may work for some systems; storing marked up HTML without output transforms may work out better for others.</li>
<li>Processed. You need to handle each kind of markup for all output mechanisms, so they need to be introduced in a controlled way, although this should not be difficult. Changing markup is something that may need to happen.</li>
</ol>
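
<p>The valid and accepted checks in the list above can be sketched as a small audit at the start of the pipeline. The accepted vocabulary below is a hypothetical house list, not NPR&#8217;s, and the nesting check is deliberately crude: it assumes XHTML style markup where every element is explicitly closed.</p>

```python
from html.parser import HTMLParser

# Hypothetical house vocabulary; a real system would choose its own.
ACCEPTED_TAGS = frozenset({"p", "a", "em", "strong", "ul", "ol", "li"})


class MarkupAudit(HTMLParser):
    """Record tags outside the accepted vocabulary and crude nesting problems."""

    def __init__(self, accepted):
        super().__init__()
        self.accepted = accepted
        self.rejected = []    # tags that are not in the accepted vocabulary
        self.open_tags = []   # stack of elements opened and not yet closed
        self.mismatched = []  # end tags that did not match the innermost open tag

    def handle_starttag(self, tag, attrs):
        if tag not in self.accepted:
            self.rejected.append(tag)
        self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if self.open_tags and self.open_tags[-1] == tag:
            self.open_tags.pop()
        else:
            self.mismatched.append(tag)


def audit(markup, accepted=ACCEPTED_TAGS):
    """Return what an editor or importer would need to fix before storage."""
    parser = MarkupAudit(accepted)
    parser.feed(markup)
    parser.close()
    return {
        "rejected": parser.rejected,
        "unclosed": parser.open_tags,
        "mismatched": parser.mismatched,
    }
```

<p>Extending the accepted set is then a deliberate decision that can be made alongside adding the matching handling to every output, which is the controlled introduction the last point asks for.</p>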

<p>I think the third part is the interesting one. Information architecture with websites often stops at the content level, missing out on this information microarchitecture of the textual content itself, leaving this to authors without enough guidance to build a consistent structure to maximise the long term content value.</p>

<p><a href="http://www.teara.govt.nz/en/fossils/7/1"><img src="http://www.teara.govt.nz/files/p9047niwa.jpg" alt="Fossil foraminifera" width="100%"></a></p>
]]></content:encoded>
			<wfw:commentRss>http://blog.technologyofcontent.com/2009/11/content-microarchitecture-how-i-learned-to-love-html-part-2/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Wave experiment: Things We Hate About Content Management</title>
		<link>http://blog.technologyofcontent.com/2009/10/wave-experiment-things-we-hate-about-content-management/</link>
		<comments>http://blog.technologyofcontent.com/2009/10/wave-experiment-things-we-hate-about-content-management/#comments</comments>
		<pubDate>Sat, 24 Oct 2009 13:33:49 +0000</pubDate>
		<dc:creator>justin</dc:creator>
				<category><![CDATA[CMS]]></category>
		<category><![CDATA[content]]></category>
		<category><![CDATA[Wave]]></category>

		<guid isPermaLink="false">http://blog.technologyofcontent.com/?p=135</guid>
		<description><![CDATA[Experiment with writing in the wave.]]></description>
			<content:encoded><![CDATA[<p>Well, that was the story, six people in content management writing a blog about stuff using Google Wave. Mostly for the first time I think; something to do with those fresh invites.</p>

<p>Other links are here: <a href="http://jonontech.com/2009/10/23/a-collaborative-google-wave-blog-post/">Jon Marks</a>, <a href="http://irinaguseva.wordpress.com/2009/10/23/things-we-hate-about-content-management/">Irina Guseva</a>, <a href="http://www.persuasivecontent.com/i-predict-a-cms-riot-1-hour-6-people-1-wave">Ian Truscott</a>; other participants Adriaan Bloem, Andrew Liles, <a href="http://contentedmanagement.net/blog/bove-the-contentious-waves-he-kept/">Philippe Parker</a> (first use of Wave over GotoMeeting?)</p>

<p>Well it was fun. Technical difficulties, lost sync and crashed a few browsers, some people lost whole machines though. Safari coped better than Firefox. It took a while to realize what was happening here, hey but this is in beta!</p>

<p>As a brainstorming tool it worked pretty well. I thought it scaled pretty well. The named cursors indicate who is in the bit you are in, but for brainstorming you can look, write another point, move, continue, not edit much. After half an hour of getting to bulleted lists and a bit of moving around, the heavy writing started (after a discussion at the top in our proxy process section; we should have split the thing up a bit).</p>

<p>There is a great tendency to write temporary notes about the discussion and then just delete them. Which feels odd, data and metadata together of course. The editing process was odd, you would find orphaned bits, move things, try to join stuff up to make it flow, while it was all changing around you. Pretty chaotic. Bits that no one expanded into prose got junked (quite a good edit method, as they couldn&#8217;t stand up themselves).</p>

<p>Here is the &#8220;finished&#8221; article&#8230; which cannot be attributed to anyone individually of course&#8230; the subject was chosen about 10 minutes in, just as something that people could easily contribute to in this situation; there are some good points in there though!</p>

<p><strong>Things We Hate About Content Management</strong></p>

<p><em>- By The Motley Crew</em></p>

<p>It was a lovely Friday morning/afternoon, and we were Waving. The experiment initiated by McBoof (yes, that one) brought together 6 CMS folks from around the world. The event gathered together analysts, journalists, vendors, system integrators to Wave on a topic that was decided at that very moment. We had one hour (in between conference calls and other job thingys) to pick a topic and Wave it.</p>

<p>A little collab on what exactly to Wave about later, we decided to do &#8220;a mindmap of things we find annoying in CMSs.&#8221; To up the ante, we also decided to take the original bullet points (deemed &#8220;too easy&#8221;) and convert the whole thing to prose. Was the tool given really up to the task? Were our minds flexible enough to wrap around this kind of realtime collaboration?</p>

<p>In the beginning &#8212; we blame the tool <img src='http://blog.edge3.org/wp-includes/images/smilies/icon_wink.gif' alt=';)' class='wp-smiley' />  &#8212; we were Drowning, not Waving. We (almost) didn&#8217;t fight about edits. We almost didn&#8217;t step on each other&#8217;s toes. All in all, it turned out to be a fun and productive collaborative exercise. Read on to see for yourself.</p>

<p><strong>Cosmetic Issues</strong></p>

<p>There really should be a CMS UI fashion police. As there should be a Magic Quadrant for shoes and handbags. Why? Well, there&#8217;s a couple of issues.</p>

<p>For instance, sloppy, non-designed design. You know the kind of thing that has not been thought about and reworked and made to feel right. The sort of thing coders do if you don&#8217;t force them. But at the same time, over-designed interfaces can be just as bad: the designers and developers really need to be on speaking terms.</p>

<p>When building a system that works, you can&#8217;t have the development team in the basement on a sustenance of Jolt coding away into the night, and the designers in the penthouse in turtleneck sweaters sipping espressos. Too many CMS designs end up being programmer vs. end-user friendly. And this is not the best way to charm away those marketing and web content folks.</p>

<p>Developers and designers need to talk to each other and essentially, both should talk to users &#8211; not just eat your own dogfood &#8211; but listen to what dogs like to eat. Developers and UI designers are not content editors, marketers or knowledge and information workers.</p>

<p>Some vendors say that the agonizingly and depressingly black UI backgrounds are hip and modern. Well, they are not, really. Who told you that? Especially if you add a Star Trek theme to it and sprinkle in some stars and cosmic swirls, because if Apple does it, it must be cool right? Not pointing any fingers, but I would quit if I were a content manager having to spend my 9-5 staring into the &#8220;black hole&#8221; of some of the CMS UIs that are out there on the market.</p>

<p>Even pop-ups seem less annoying when compared to dark UIs. Which brings us onto&#8230;</p>

<p><strong>Interface Issues</strong></p>

<p>Interfaces need a comfortable lived in feel. Content management is something people work with every day, it is their interface to their job. You meet people who hate the interface, and that makes their work a heap of pain. I have seen people who describe the 44 clicks it takes to insert an image. You have a responsibility to these people, to make them love the content and make the tool disappear.</p>

<p>We all hate it when the interface does something on its own that ruins your context. E.g. a page refresh, or in Wave the jumping around of the scrolled window in some cases <img src='http://blog.edge3.org/wp-includes/images/smilies/icon_wink.gif' alt=';-)' class='wp-smiley' />  Or the lack of an easy way to bookmark, so you can reference someone to the content. Remember people will be collaborating and need to send links around. Make sure the UI is a proper web application with URLs. And why do tasks that are easy to describe and often repeated in exactly the same way still take more than a few clicks? (Or maybe even dozens of clicks.) With bonus points for forcing users to use dialogs or tabs to enter mandatory information. Remember people do not have all the information in the right order.</p>

<p>Also, we need sane conflict merges. Check in and check out is too extreme for most uses. But people want to edit offline still. Of course Wave doesn&#8217;t have an offline: Google thinks this problem is going away, it&#8217;s real time so there are never conflicts (that&#8217;s defined in the XML protocol; it&#8217;s quite interesting if you are that way geeky). Does Google have the right answer here? Well, the Motley Crew is struggling here, and some browsers lost sync during this experiment.</p>

<p>&#8220;Power users&#8221; (those who use it all day long) of CMSs need to have a &#8220;Desktop&#8221; experience. What does Desktop Experience mean? Well, it doesn&#8217;t really have to be on the desktop &#8212; these days it is perfectly possible to get very close to a hitherto Desktop experience in a browser or similar. These are the qualities: very low latency from action to response, no page refreshes, modal and modal-less dialog boxes as appropriate, &#8220;push&#8221; notification.</p>

<p><strong>Architectural Issues</strong></p>

<p>Architectural issues of the wave overtook any architectural issues of Content Management Systems. The fact that we authored this entire article in a single blip didn&#8217;t help, and slowed everything down enormously. McBoof learned the hard way that he really needs a new laptop and spent most of the session giving his machine CPR. Next time we&#8217;ll do each paragraph in its own blip to stop Firefox going down like a Led Zeppelin.</p>

<p>Monolithic systems. Build it out of pieces that the client can not use all of. Obviously your pieces may work together better, but there should be components. Do not try to reinvent all kinds of wheel. &#8220;Best of breed,&#8221; though, is just another weasel marketing idea, as if systems are pinnacles not about meeting requirements.</p>

<p>Marketeers are adroit at using the term Best Practice to position Their Way as the only way that a particular matter can be solved. (Many of us live in that netherland of having to peddle that point of view, but it is a falsehood that the careful buyer should try to see through.) I think this devalues genuine best practice; vendors should cite references.</p>

<p>Most often a marketeer&#8217;s Best Practice view is the only one they subscribe to as their product development has paddled up the wrong stream and cannot or won&#8217;t reverse their architectural design (probably because of the cost of doing so). This intransigence most often causes a product to doom itself. (Think of IBM and The Mainframe Is The Only Way To Do Serious Business).</p>

<p>Who really still believes that there is a place in this world for Flash or Java Applet based Rich Text Editors? TinyMCE, FCKeditor and others are filling the gap left by Ektron when they bit the hand that feeds and entered the CMS market. Ephox is trying to spread, but I find it difficult to come up with an excuse to use an Applet over HTML with JavaScript these days. Stick with the standard.</p>

<p><strong>Business Issues</strong></p>

<p>Where you are buying into something that you may very well need to change or integrate with, there is a strong benefit in considering Open Source. Open Source used to frighten commercial software companies, but we have come a long way on that road to understanding that commercial organisations can operate in an Open Source world and benefit. This does not necessarily mean that their prized system needs to be fully opened up, but it does mean taking the spirit of it: being completely open to people seeing your code and learning from it how the system operates.</p>

<p>Exactly what you need to see opened up varies. In a CMS there may be a subsystem that stores the content or one that allows a Rich Text Editor. These arguably don&#8217;t need to be opened up. But when a CMS ships with modules such as an RSS feed widget, a calendaring tool or prebuilt webforms, users who want a variation on one of these modules can benefit from seeing how the &#8220;pros&#8221; did it, and can then use it as a starting point for their own implementation.</p>

<p>We really don&#8217;t need vendors that pay lip service to the buzzwords. When they think the new CMS buzzword &#8220;engagement&#8221; is just a screenshot of Google Analytics. Or when they add an image picker and call it DAM. And a cross-over between WCM and ECM? Don&#8217;t think WCM is like ECM and it&#8217;s about organizing content, not about effectively communicating with the audience. And don&#8217;t think that if you organize the content, you can automatically communicate effectively.</p>

<p>Completely different, but equally frustrating, is procurement (and the procedures that go with it.) Procurement folk don&#8217;t recognise the importance of user adoption to the success of the project &#8212; of the black background and all the UI issues pointed out previously. If a CMS is procured according to procedure, the selection is a success to them. But those same rules are often a recipe for ignoring what the users really need.</p>

<p>At the same time, budgets that aren&#8217;t transparent are an issue &#8211; customer and vendor should be able to have a sensible grown-up conversation. As a customer, of course you want good value, but how cheap are you? But to vendors: many licensing models don&#8217;t make any sense, and force you to do stupid things. People are scared to have that conversation &#8211; the best architectural fit first, I say; let&#8217;s figure out an appropriate license around that.</p>

<p><strong>Conclusion</strong></p>

<p>So much hatred rolled up into a tight little ball of anti-CMS rage. Who would have expected it from such a respected bunch of CMS folk? We hate the designs, the interfaces, the architectures and the business. Time for a beer/wine? Wave goodbye!</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.technologyofcontent.com/2009/10/wave-experiment-things-we-hate-about-content-management/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>RESTful daydream #4</title>
		<link>http://blog.technologyofcontent.com/2009/10/restful-daydream-4/</link>
		<comments>http://blog.technologyofcontent.com/2009/10/restful-daydream-4/#comments</comments>
		<pubDate>Sun, 04 Oct 2009 15:34:41 +0000</pubDate>
		<dc:creator>justin</dc:creator>
				<category><![CDATA[CMS]]></category>
		<category><![CDATA[REST]]></category>
		<category><![CDATA[CMIS]]></category>
		<category><![CDATA[jcr]]></category>

		<guid isPermaLink="false">http://blog.technologyofcontent.com/?p=110</guid>
		<description><![CDATA[In favour of a REST architecture for a web content repository]]></description>
			<content:encoded><![CDATA[<p>This blog post has gone through far too many iterations, and taken far too long to write! It got much shorter in the process though.</p>

<p>It started with an idea I had, in an innocent sort of way. I thought if I looked at the JCR specs for a bit I might find some kind of way of building a non Java interface with them. You know, maybe there might be a nice REST architecture waiting to get out. But of course there is no such thing. It is an application definition. There are not even that many ways of implementing it, other than choosing your object persistence method to be a database, file, or something else.</p>

<p>The REST architecture is notionally provided by another layer, such as Apache Sling, but Sling is in no way a REST layer, it is a URL dispatcher and scripting and application layer with which some REST style applications can be developed. With that you end up with a pretty heavyweight development framework, indeed together you have much of Day&#8217;s CMS offering in effect, rather than a lightweight REST repository solution.</p>

<p>I had a look at CMIS again. Fielding once <a href="http://roy.gbiv.com/untangled/2008/no-rest-in-cmis">laid into CMIS for not being REST</a> and you can see why, although some improvements have been made since then. Although resources are discoverable through hypertext, there is a fair amount of semantics that needs to be known to understand what a type or a checkout means, and the search queries are obviously just RPC wrappers. It is not too bad, but unfortunately the data model does not map well onto web content management right now, for obvious historical document management reasons. Fixable? I think it serves a particular purpose well and should probably not be forced into anything else, as we need it to succeed in its field.</p>

<p>Day claims, in an odd way, that JCR is <a href="http://dev.day.com/microsling/content/blogs/main/fudbusting2.html">not a Java standard</a>, on the grounds that you can implement the API in another language. That&#8217;s a strange argument to make, especially as the types are defined as Java types, and standards without interoperability are pretty vague. Without some sort of wire format or ABI this is meaningless outside the JVM world. People are making <a href="http://www.simpcore.org/">JCR-like repositories in PHP</a> but outside any standards process, so in the end this just becomes a PHP repository project; Typo3 seems to be building another, also closely aligned to JCR.</p>

<p>The problem with these efforts is that they do nothing to reduce the balkanization of web CMS, which is already fragmented by language and API; that is ridiculous in an industry that is about the web. The web has an architecture (REST) and an API (HTTP). Building web content management on Java APIs or PHP APIs or .NET is a legacy way of thinking. It is acceptable for document management given its role in existing enterprise architectures, but it is not going to work if we want to get widespread acceptance in web development. In the short term it is the easy path, it is what people are used to, but a forward-thinking industry needs to look at defragmenting the landscape and building future-proof tools.</p>

<p>The odd thing is that a web content repository alone surely lends itself to a simple REST architecture. Content is after all lots of small resources with relations. Hypertext. In presentation it is pretty much a fairly dumb web application, although with a fair amount going on behind the scenes. It takes content, relates it to other content, and serves it back, with authentication and versioning. Everything else is in other system layers, transforming it and so on. Not simple, but well defined; lower level than, say, JCR + Sling.</p>
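
<p>To give a feel for how small that core is, here is a rough sketch in Python. Everything in it is an assumption made for illustration: the <code>ContentRepository</code> class, its <code>put</code>, <code>link</code> and <code>get</code> methods and the example URIs are hypothetical, the store is in memory, and authentication and the HTTP layer are left out entirely. It is not a proposed standard, just the shape of the thing.</p>

<pre>
```python
class ContentRepository:
    """Toy in-memory content store: versioned resources plus typed links."""

    def __init__(self):
        self._versions = {}   # uri: list of bodies, oldest first
        self._links = {}      # uri: list of (relation, target uri)

    def put(self, uri, body):
        """Store a new version of the resource and return its version number."""
        self._versions.setdefault(uri, []).append(body)
        return len(self._versions[uri])

    def link(self, source, relation, target):
        """Relate one resource to another, hypertext style."""
        self._links.setdefault(source, []).append((relation, target))

    def get(self, uri, version=None):
        """Serve back a representation: body, version number, outgoing links."""
        history = self._versions[uri]
        number = len(history) if version is None else version
        return {
            "uri": uri,
            "version": number,
            "body": history[number - 1],
            "links": list(self._links.get(uri, [])),
        }


# Hypothetical content, purely for illustration.
repo = ContentRepository()
repo.put("/articles/rest", "First draft")
repo.put("/articles/rest", "Second draft")
repo.put("/authors/justin", "Justin")
repo.link("/articles/rest", "author", "/authors/justin")

latest = repo.get("/articles/rest")
print(latest["version"], latest["body"], latest["links"])
```
</pre>

<p>A real repository would put this behind GET and PUT on the same URIs, and would return the links as hypertext in the representation rather than as a Python list.</p>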

<p>So we need to work on a web content repository model, as a community. Process-wise, it makes sense for this to sit in an organization like AIIM, as a content management based industry body. It may well be that what ends up coming out of this is more standardized architectures and semantics and open source implementations rather than the tighter prescriptions of JCR and CMIS; I have some ideas along these lines that I need to code up. I have had some discussions and there is a degree of interest in some sort of solution; who is interested? Or is infrastructure dead, and everyone just wants interfaces?</p>

<p><a href="http://dilbert.com/strips/comic/2009-09-02/" title="Dilbert.com"><img src="http://dilbert.com/dyn/str_strip/000000000/00000000/0000000/000000/60000/6000/400/66480/66480.strip.gif" width="480" alt="Dilbert.com" /></a></p>
]]></content:encoded>
			<wfw:commentRss>http://blog.technologyofcontent.com/2009/10/restful-daydream-4/feed/</wfw:commentRss>
		<slash:comments>8</slash:comments>
		</item>
		<item>
		<title>Metadata is not what it used to be</title>
		<link>http://blog.technologyofcontent.com/2009/08/metadata-is-not-what-it-used-to-be/</link>
		<comments>http://blog.technologyofcontent.com/2009/08/metadata-is-not-what-it-used-to-be/#comments</comments>
		<pubDate>Mon, 31 Aug 2009 14:54:05 +0000</pubDate>
		<dc:creator>justin</dc:creator>
				<category><![CDATA[CMS]]></category>
		<category><![CDATA[metadata]]></category>

		<guid isPermaLink="false">http://blog.technologyofcontent.com/?p=95</guid>
		<description><![CDATA[&#8220;Spent a week in a dusty library waiting for some words to jump at me&#8221; Camera Obscura. Kas Thomas in his contribution to Julian Wraith&#8217;s popular thread on the future of content management really managed to make me disagree, the first of the posts that has! The theme of this blog (yes it has got [...]]]></description>
			<content:encoded><![CDATA[<blockquote>
  <p>&#8220;Spent a week in a dusty library waiting for some words to jump at me&#8221; Camera Obscura.</p>
</blockquote>

<p><a href="http://www.cmswatch.com/Trends/1679-Future-CMS-Metadata">Kas Thomas</a> in his contribution to <a href="http://www.julianwraith.com/?p=328">Julian Wraith&#8217;s popular thread on the future of content management</a> really managed to make me disagree, the first of the posts that has!</p>

<p>The theme of this blog (yes it has got one) is that content management is changing as the web way of working starts to infiltrate the enterprise. And the web way of metadata is not what the old document oriented way was.</p>

<p>Kas says &#8220;Keeping knowledge about a file separate from the file itself is a hugely important concept.&#8221;</p>

<p>Looking at that point first. Look at any newish document format, from <a href="http://en.wikipedia.org/wiki/PDF/A">PDF/A</a> to <a href="http://en.wikipedia.org/wiki/EXIF">EXIF</a> via the HTML <code>meta</code> tag, even a Word document and you will find embedded metadata. Remember that documents are emailed around, generally get lost. Metadata is the dog collar with a name and phone number on, saying version me and send me home. This trend is not going away, documents are becoming self contained.</p>

<p>Kas then continues &#8220;A file&#8217;s metadata becomes its interface to the outside world. It&#8217;s like a service descriptor.&#8221; That is simply not the way the web works. Resources use self describing formats like HTML for core data, and then all the other important metadata is linking information. It used to be that you had a blob file type and a program that could understand it, that was the basis for the desktop architecture, but that is not the case any more, even on the desktop you have a choice of applications that can understand a given file type, and your choice depends on what you can do with them. The web architecture goes much further, and resources become fully self-describing, a browser can understand all the web as every resource carries its own description, and bundled code to help you interpret it. A web page is its own service descriptor, and defines application state through hyperlinks. The web architecture has never had service descriptions.</p>

<p>There is one vital part of metadata that is not kept with a file, that is the link. Kas says &#8220;content, on the whole, is becoming richer, less structured&#8221;, missing out completely on the big picture that content is being structured by the imposition of links onto it, by its transformation into a hypertext, that creates a much richer structure than the individual documents have in themselves. Documents contain metadata about other documents in the form of links. The semantic web project is an attempt to add further richness to this structure. &#8220;What does the trend toward richer, less structured content mean for management of content?&#8221; well that means that content management is going to be about managing those links and relations between items, a lot more than it is now, when it came from a background of just managing documents, each an isolated item.</p>

<p>In a way that does come back to &#8220;Keeping knowledge about a file separate from the file itself&#8221; but not at all in the way Kas was trying to argue. Now time to link this to his argument to create a structured discussion&#8230;</p>

<p><a href="http://www.threadless.com/product/1053/Now_That_s_Dope"><img src="http://www.threadless.com//product/1053/zoom.gif" width="450px"/></a></p>

]]></content:encoded>
			<wfw:commentRss>http://blog.technologyofcontent.com/2009/08/metadata-is-not-what-it-used-to-be/feed/</wfw:commentRss>
		<slash:comments>8</slash:comments>
		</item>
		<item>
		<title>Content Enabled Vertical Applications and taking the CMS apart</title>
		<link>http://blog.technologyofcontent.com/2009/08/content-enabled-vertical-applications-and-taking-the-cms-apart/</link>
		<comments>http://blog.technologyofcontent.com/2009/08/content-enabled-vertical-applications-and-taking-the-cms-apart/#comments</comments>
		<pubDate>Wed, 26 Aug 2009 23:00:25 +0000</pubDate>
		<dc:creator>justin</dc:creator>
				<category><![CDATA[CMS]]></category>
		<category><![CDATA[CEVA]]></category>

		<guid isPermaLink="false">http://blog.technologyofcontent.com/?p=70</guid>
		<description><![CDATA[Part 2 of my response to Julian Wraith's future of content management meme. This part looks at CMS architecture, rather than the technology choices of the first part, talking about application development and content repositories.]]></description>
			<content:encoded><![CDATA[<p><em>This is a second part of my response to Julian Wraith&#8217;s <a href="http://www.julianwraith.com/?p=313">future of content management thread</a>; the <a href="http://blog.technologyofcontent.com/2009/08/cms-technology-choices/">first part</a> was more about the technical decisions, this one more about the architecture, and responding to some of the other issues. I have a new post which is more of a general view on <a href="http://blog.technologyofcontent.com/2009/10/content-applications-briefing/">content enabled vertical applications</a></em></p>

<p>Stéphane Croisier in <a href="http://stephanecroisier.jahia.com/new-blog-post-what-is-the-future-of-content-m">his post in the thread</a> says</p>

<blockquote>
  <p>There is currently an unclear separation between applications frameworks and content infrastructure. But at the end of the day everything is content and every application has first to deal with content items rather than with processes, states, UI components or other application oriented paradigms.</p>
</blockquote>

<p>In my general work in content management I think this is one of the things that has become very clear, and that &#8220;unclear separation&#8221; is very apparent. First content; it is a very good start, and every project needs to be grounded in content, and in the structure of content, the architecture of content and the user IA. However, the processes, states, UI and other parts of the web application are beginning to dominate projects. In the Gartner terminology we are building CEVAs (Content Enabled Vertical Applications that is), as integration, process, e-commerce, CRM parts of the project start to dominate the requirements over the purely content based parts.</p>

<p>It seems that most of the contributors to Julian Wraith&#8217;s future of content management thread who mention it see content management moving to a clear split between repositories (Common Content Information Infrastructure as Stéphane calls them) and applications and content management systems and CEVAs implemented on top of these.</p>

<p>I don&#8217;t think we can yet see what the successful content infrastructure stack will be; as I said in my earlier post there are technical decisions that have to be made that there is not yet agreement on (except between me and <a href="http://blogs.alfresco.com/wp/pmonks/2009/08/07/the-future-of-cms-technologies/#comment-148">Peter Monks</a>!) and that the existing putative standards (CMIS and JCR) do not extend far enough to take a position on. But we can see that this is the way things are going. Quite clearly the standards for the infrastructure will be open, and most implementations will be open source. There will be some vendors who do not embrace standards, but they will need to be the few large ones or they will lose out. Infrastructure environments, remember (think Linux, Apache), are mainly open source, although there is scope for proprietary layers at the very high end (think Amazon, Google).</p>

<p>At the application layer, as Stéphane says, everything is a mashup, content from different systems, content from other APIs; this is the web application layer. It needs to be content aware, very much so, but it needs to be an application development environment. This is where most people will see the value added in the content management business, although in fact the value here is in implementation, design and integration services, not the technology itself. Application development environments no longer make a lot of money, and again they are dominated by open source (think Java, Eclipse, JBoss, Django).</p>

<p>Once you take out content infrastructure and application development, and the other tools like search and workflow, there is a core of tools for working with content, to support reuse, refactoring, cleaning, import and export, that one might call a Content Workbench. There is a lot of potential value here if these types of tools become the value-added end of the business, as they can differentiate vendors. Interfaces for merging changes and so on would be part of this type of toolkit. This is the stuff where good UX means timesaving for content workers, but it is difficult to build on a customized per-project basis, so this still offers value from a particular vendor.</p>

<p>Overall then we see a picture where the monolithic CMS starts to break apart into infrastructure, application and toolkit layers, that can perhaps gradually be mixed and matched together to build content applications. We are just seeing the beginnings of this now.</p>

]]></content:encoded>
			<wfw:commentRss>http://blog.technologyofcontent.com/2009/08/content-enabled-vertical-applications-and-taking-the-cms-apart/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>The Resource Oriented Enterprise</title>
		<link>http://blog.technologyofcontent.com/2009/08/the-resource-oriented-enterprise/</link>
		<comments>http://blog.technologyofcontent.com/2009/08/the-resource-oriented-enterprise/#comments</comments>
		<pubDate>Sun, 23 Aug 2009 20:40:58 +0000</pubDate>
		<dc:creator>justin</dc:creator>
				<category><![CDATA[REST]]></category>
		<category><![CDATA[architecture]]></category>
		<category><![CDATA[CMS]]></category>
		<category><![CDATA[enterprise]]></category>
		<category><![CDATA[ROA]]></category>
		<category><![CDATA[ROE]]></category>

		<guid isPermaLink="false">http://blog.technologyofcontent.com/?p=49</guid>
		<description><![CDATA[This is the start of getting a bunch of ideas about how web technologies are going to change the way business works and enterprise software is built. I have been meaning to start writing this up for a while. Here is an initial overview, somewhere to start... follow up post soon.]]></description>
			<content:encoded><![CDATA[<p><em>This is the start of getting a bunch of ideas about how web technologies are going to change the way business works and enterprise software is built. I have been meaning to start writing this up for a while. Here is an initial overview, somewhere to start&#8230; Some of these ideas are floating around <a href="http://www.restfulness.info/">elsewhere</a> it seems, from web practitioners working in the enterprise, but there is surprisingly little written up so far.</em></p>

<p>I went to a meeting a while back with a company starting to move to a web service based, internal API based architecture, and there was a minute where the CTO said (more or less) &#8220;does anyone know of any let or hindrance to this being a SOAP API?&#8221;. Like those moments at weddings, no one spoke out. But I do object. SOAP is not the backbone of a happy architecture, but we do not have the strength to say no right now.</p>

<p>So here is the beginning of the shout out. The enterprise architecture needs to be a resource oriented architecture (ROA) not a service oriented architecture (SOA). The enterprise needs to move from SOAP to REST.</p>

<p>Web developers know this. When given the choice, <a href="http://www.oreillynet.com/pub/wlg/3005">9 out of 20 cats prefer to use REST</a>. It is more productive. It does not involve programs to generate huge chunks of useless code. It involves hypertext (HATEOAS), not opaque documents that are mappings of database schemas.</p>

<p>What are the resources in the enterprise? Start with customers. A CRM system is a clear example of something that needs to be modeled as resources. You need API calls that can find products a customer has bought, support tickets, all the core data that you would need to build customer service portals, internal support applications. What is your customer API?</p>

<p>Building this framework can be incremental. You need either tools that already provide REST APIs, which is becoming easier, or a web application development framework. This is the convergence point between application frameworks and web content management. The big issue is the domination of legacy applications with bolt-on APIs that may have moved from SQL to CORBA to SOAP over the years. You can however build a REST API over at least parts of these applications to open them up into the enterprise.</p>

<p>OK, that&#8217;s the beginning. Apart from cheaper application development, where is the real value?</p>

<p>Resources are the first part of the REST architecture, and resources are much easier to work with than services, as they share uniform semantics and they are addressable, two keys to making application design simple. The big bit though is the (harder to understand) HATEOAS, Hypermedia as the Engine of Application State. What this involves is moving the vague and very expensive field of business logic out of code (often code that is not even owned or understood by the enterprise, because it is buried inside systems from suppliers) and into hypertext documents.</p>

<p>A concrete example might help here, based on another recent consulting example. A professional body has a set of different membership levels, exams, rules, CPD requirements and so on. These are resources, described states, and hypertext links of state transitions. This is the core of the business logic of the organization, embodied in documents. It can be used to build the membership application, as it describes for example the actions open to a member at the present time, as well as providing potentially a browsable structure, as well as a formal structure that can be used to ask questions about membership (what are the routes to becoming X, for example).</p>
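
<p>As a rough illustration of the membership example, the sketch below models each level as a resource whose links are the permitted state transitions. The level names, link relations and URIs are all made up for the purpose (they are not the real body&#8217;s rules), and a plain Python dictionary stands in for the hypertext documents.</p>

<pre>
```python
from collections import deque

# Hypothetical membership levels; each resource lists its outgoing
# state-transition links as (relation, target URI) pairs.
MEMBERSHIP = {
    "/levels/student": [("pass-exam-1", "/levels/associate")],
    "/levels/associate": [
        ("pass-exam-2", "/levels/member"),
        ("lapse", "/levels/student"),
    ],
    "/levels/member": [("complete-cpd", "/levels/fellow")],
    "/levels/fellow": [],
}


def actions_for(level):
    """Actions open to a member right now are just the links on their level."""
    return [relation for relation, _target in MEMBERSHIP[level]]


def route_to(start, goal):
    """Breadth-first walk over the links: the shortest route to reach goal."""
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        level, path = queue.popleft()
        if level == goal:
            return path
        for relation, target in MEMBERSHIP[level]:
            if target not in seen:
                seen.add(target)
                queue.append((target, path + [relation]))
    return None  # no route exists


print(actions_for("/levels/associate"))
print(route_to("/levels/student", "/levels/fellow"))
```
</pre>

<p>The same structure answers both kinds of question: what a member can do now (read the links) and what the routes to becoming X are (walk the links). No rule lives in application code; change the documents and both answers change with them.</p>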

<p>You can view this as a move of business logic to a declarative rather than imperative form. Things do not have to be stored in documents in the underlying storage, although the interface is document and hypertext based. Hypertext makes things human browsable, declarative makes them computer browsable, and HATEOAS makes them discoverable. Business logic becomes content rather than data: rather than there being tables of parameters that the business logic black box feeds off, the states themselves become resources that can be discovered, addressed, and reasoned with.</p>

<p>As states become resources in a REST model, because of the statelessness of web applications, both states and state transitions become first class objects. Constructing ad-hoc queries about state changes should become easy (what percentage of customers renew their contracts, say). The first class objects need to be the ones that are meaningful for the business.</p>
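
<p>A small sketch of what such an ad-hoc query could look like, assuming each state change has been recorded as its own addressable resource. The records, field names and URIs below are invented for illustration, not taken from any real system.</p>

<pre>
```python
# Hypothetical transition records: each state change is itself a resource.
TRANSITIONS = [
    {"uri": "/transitions/1", "customer": "/customers/a",
     "from": "active", "to": "renewed"},
    {"uri": "/transitions/2", "customer": "/customers/b",
     "from": "active", "to": "lapsed"},
    {"uri": "/transitions/3", "customer": "/customers/c",
     "from": "active", "to": "renewed"},
    {"uri": "/transitions/4", "customer": "/customers/d",
     "from": "trial", "to": "active"},
]


def renewal_rate(transitions):
    """Percentage of transitions out of an active contract that were renewals."""
    ended = [t for t in transitions if t["from"] == "active"]
    if not ended:
        return 0.0
    renewed = [t for t in ended if t["to"] == "renewed"]
    return 100.0 * len(renewed) / len(ended)


print(round(renewal_rate(TRANSITIONS), 1))
```
</pre>

<p>Because the transitions are ordinary resources, the query is a filter over a collection rather than a report that someone has to dig out of a supplier&#8217;s black box.</p>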

<p><em>There is more to this, it needs expanding. Perhaps we need a manifesto. The software architecture of the web is going to make huge changes to the architecture of other realms, more than most people realize it seems. It is inevitable as things get re-architected around the web that more areas will be affected; also there are the benefits of scalability and reliability that are being created for the web. If anyone has any other business case references for REST please post in comments.</em></p>

<p><a href="http://www.dmst.aueb.gr/dds/etech/arch/rom.png"><img src="http://www.dmst.aueb.gr/dds/etech/arch/rom.png" width="450px"/></a></p>

<p>(Image from <a href="http://www.dmst.aueb.gr/dds/etech/arch/indexw.htm">this software architecture overview</a>).</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.technologyofcontent.com/2009/08/the-resource-oriented-enterprise/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
		<item>
		<title>CMS technology choices</title>
		<link>http://blog.technologyofcontent.com/2009/08/cms-technology-choices/</link>
		<comments>http://blog.technologyofcontent.com/2009/08/cms-technology-choices/#comments</comments>
		<pubDate>Sun, 02 Aug 2009 20:40:08 +0000</pubDate>
		<dc:creator>justin</dc:creator>
				<category><![CDATA[CMS]]></category>
		<category><![CDATA[programming]]></category>
		<category><![CDATA[scalability]]></category>

		<guid isPermaLink="false">http://blog.technologyofcontent.com/?p=33</guid>
		<description><![CDATA[Response to Julian Wraith's "The future of Content Management…" post covering some of my arguments with Jon and some technical decisions that the content management community will have to make to get to that future…]]></description>
			<content:encoded><![CDATA[<p>The future of Content Management is what we make of it right now, it has not been decided or built yet. Remarkably for a market with so many people in it there are no hard and fast rules and nothing definitive. However we are coming to the end of the experimental phase and the hard decisions are going to be made now, and the future for a fairly long period will be determined pretty soon now.</p>

<p>Although the vast majority (but not all) of open source content management systems are continually trying to reinvent the blog, we are talking about internet infrastructure here, and the future of content is going to be open source, like the rest of internet development. I also believe that long term the web project will overwhelm the legacy areas of document management, although it may take some time. Hypertext, the web architecture, XML, HTML, and all those standards are here to stay and to dominate long term. Content management will also become pervasive long term, as the blogging projects show, as the right tools make content management a natural part of workflow. Content management succeeds when it replaces the file and folder paradigm with a content-led paradigm.</p>

<p>In my conversation the other day with <a href="http://jonontech.com/2009/08/01/i-have-a-dream-of-the-cms-future" title="Jon's post on the subject">Jon</a> I was arguing that although we agree on many of the technical issues there are real decisions that need to be made about what needs to be built to get to the content management future. Below are some of my lists of differences. Generally, I think the future of content management is going for the left hand one of the pairs, although some are not clear yet. I have probably missed a lot of the things to determine, but it is a start.</p>

<h2>Architecture &ndash; API differences</h2>

<p>These may cause API and other more significant differences, though some may not matter (eg git can read svn repos, but not vice versa).</p>

<ul>
<li>REST vs SOAP</li>
<li>REST vs Java native interfaces</li>
<li>distributed version control (git) vs centralized (SVN)</li>
<li>compositional vs monolithic</li>
<li>structured content vs files</li>
<li>relations vs metadata</li>
<li>web (hypertext) content vs documents</li>
<li>URIs vs referential integrity</li>
<li>web applications with content management vs content management systems</li>
</ul>

<h2>Architecture &ndash; performance differences</h2>

<p>These could have different implementations with different performance characteristics potentially. These are basically IA differences to a large extent, so they do depend on the type of problem being modelled and the modelling process. Models and performance are linked though, and the best we can do is to make parts of this pluggable so that a range of performance characteristics can be used.</p>

<ul>
<li>unstructured vs structured</li>
<li>sparse vs dense</li>
<li>untyped vs typed</li>
<li>NoSQL vs RDBMS</li>
<li>permission hierarchy vs permission graph</li>
<li>scalable vs local</li>
</ul>

<h2>Development process</h2>

<p>This is key to getting the product to where you want it to be.</p>

<ul>
<li>open source vs proprietary</li>
<li>API driven vs UX driven</li>
<li>ubiquitous content management vs isolated systems</li>
<li>agile vs monolithic</li>
</ul>

<h2>Architecture &ndash; usage differences</h2>

<p>These could potentially just come down to the ways or tools with which components are joined together, maybe they do not affect architecture per se.</p>

<ul>
<li>social media vs controlled content</li>
<li>programming languages (JavaScript, XSLT) vs templating systems</li>
</ul>

<p><a href="http://browsertoolkit.com/fault-tolerance.png"><img src="http://browsertoolkit.com/fault-tolerance.png" alt="fault tolerance" width="500px"/></a></p>

]]></content:encoded>
			<wfw:commentRss>http://blog.technologyofcontent.com/2009/08/cms-technology-choices/feed/</wfw:commentRss>
		<slash:comments>6</slash:comments>
		</item>
	</channel>
</rss>

