<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Technology of Content &#187; CMS</title>
	<atom:link href="http://blog.technologyofcontent.com/category/cms/feed/" rel="self" type="application/rss+xml" />
	<link>http://blog.technologyofcontent.com</link>
	<description>Ramblings on the technology of content management</description>
	<lastBuildDate>Sun, 29 Jan 2012 16:38:05 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
		<item>
		<title>Search, SQL, NoSQL, Persistence</title>
		<link>http://blog.technologyofcontent.com/2011/04/search-sql-nosql-persistence/</link>
		<comments>http://blog.technologyofcontent.com/2011/04/search-sql-nosql-persistence/#comments</comments>
		<pubDate>Fri, 15 Apr 2011 14:01:20 +0000</pubDate>
		<dc:creator>justin</dc:creator>
				<category><![CDATA[cloud]]></category>
		<category><![CDATA[CMS]]></category>
		<category><![CDATA[scalability]]></category>
		<category><![CDATA[search]]></category>
		<category><![CDATA[nosql]]></category>

		<guid isPermaLink="false">http://blog.technologyofcontent.com/?p=330</guid>
		<description><![CDATA[I highly recommend the Enterprise Search London meetup, there are lots of interesting talks, thanks to our intrepid organizer Tyler Tate. Last meetup, H. Stefan Olafsson from Twigkit gave a short talk about the relation between relational databases and search engines, and whether you need a relational database if you have a search engine. Craigslist [...]]]></description>
			<content:encoded><![CDATA[<p>I highly recommend the <a href="http://www.meetup.com/es-london/events/17010043/">Enterprise Search London</a> meetup; there are lots of interesting talks, thanks to our intrepid organizer <a href="http://twitter.com/tylertate">Tyler Tate</a>. At the last meetup, <a href="http://twitter.com/mrolafsson">H. Stefan Olafsson</a> from <a href="http://www.twigkit.com/">Twigkit</a> gave a short talk about the relationship between relational databases and search engines, and whether you need a relational database at all if you have a search engine.</p>

<figure>
<a href="http://xkcd.com/886/"><img src="http://imgs.xkcd.com/comics/craigslist_apartments.png" alt="Craigslist apartments" width="400"/></a>
<figcaption>Craigslist Apartments, by XKCD</figcaption>
</figure>

<p>Now, this is something I have been thinking about recently, and some people are moving big parts of their systems to be built on search alone, such as the <a href="http://www.guardian.co.uk/open-platform">Guardian API</a>, which is <a href="http://www.guardian.co.uk/open-platform/blog/what-is-powering-the-content-api">served from Apache Solr</a>. In this case, though, the search engine is still not the system of record for the core data; that is still the Oracle-based CMS, which did not scale up enough to serve the API. There was some discussion at the talk about search engines that do support durable persistence (the D, for durability, in ACID), something Lucene used to have a bad reputation for. My view, though, is that while actually making <code>fsync</code> work properly is a good thing, and you should not buy software that cannot recover from crashes, persistence now involves a lot more than this: replication, audit, versioning and access control. Building all of that directly into search products is a mistake. Another issue is that search engines are denormalized, whereas data stores of record should really be normalized to a large extent, to minimise the amount of data to be replicated.</p>

<p>There are two approaches that should work instead, however.</p>

<p>The first is more or less the current approach: use the search engine as an index to a persistent store. I really like this approach if we follow it to its logical conclusion, which is that in this type of application architecture the persistent store should not be a relational database but a document store, that is, a file, an HTTP resource, a document in a NoSQL document database, or an object in a replicated cloud storage system like S3. Modularize the database application, and split the persistence function from the index function. The persistence function provides durability, versioning, audit and access control, along with replication and backup. It can then update the search index, and potentially any other type of index, such as a graph database for querying relationships, or even a relational database if that is the best way of querying some aspects of the data.</p>
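<p>A minimal Python sketch of this split, assuming an in-memory store and a naive whitespace tokenizer. The class names <code>DocumentStore</code> and <code>SearchIndex</code> are hypothetical, and replication, backup and access control are left out. The store is the system of record and keeps every version; indexes just subscribe to its updates.</p>

```python
from collections import defaultdict


class DocumentStore:
    """System of record: keeps every version and notifies subscribed indexes."""

    def __init__(self):
        self.versions = {}    # doc_id -> list of bodies, oldest first
        self.listeners = []   # callables taking (doc_id, body)

    def subscribe(self, listener):
        self.listeners.append(listener)

    def put(self, doc_id, body):
        history = self.versions.setdefault(doc_id, [])
        history.append(body)          # append-only: old versions are kept
        for listener in self.listeners:
            listener(doc_id, body)    # push the change to each derived index
        return len(history)           # version number, starting at 1

    def get(self, doc_id, version=None):
        history = self.versions[doc_id]
        return history[-1] if version is None else history[version - 1]


class SearchIndex:
    """Derived full-text index (term -> doc ids); rebuildable from the store."""

    def __init__(self):
        self.postings = defaultdict(set)
        self.terms_by_doc = {}

    def update(self, doc_id, body):
        # Remove postings for the previous version before adding the new ones.
        for term in self.terms_by_doc.get(doc_id, set()):
            self.postings[term].discard(doc_id)
        terms = set(body.lower().split())
        for term in terms:
            self.postings[term].add(doc_id)
        self.terms_by_doc[doc_id] = terms

    def search(self, term):
        return sorted(self.postings.get(term.lower(), set()))
```

<p>Because the index only ever receives changes from the store, it could be thrown away and rebuilt by replaying the latest versions, which is what keeps the store, not the search engine, as the system of record.</p>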

<p>Obviously there is a potential consistency issue if updates from the document store propagate slowly, so you may well end up with an eventual consistency model. Historically search was a bad offender here, as dynamic updates were not the norm and everything was batched into nightly updates, but that is going away and dynamic updates are becoming normal for search indexes. In principle you can have stronger consistency, especially in an architecture with fixed releases that can be indexed consistently, rather than distributed rolling updates; you choose your architecture and make your trade-off. In a lot of applications small consistency lags rarely matter.</p>
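<p>To make the lag concrete, here is a hypothetical Python sketch in which writes to the store succeed immediately while index updates sit in a queue until an indexer drains it. The <code>AsyncIndexer</code> name and the in-memory queue are assumptions standing in for a real change feed; until the queue is drained, searches can return stale results.</p>

```python
from collections import deque


class AsyncIndexer:
    """Store writes are immediate; index updates are applied later (eventual consistency)."""

    def __init__(self):
        self.store = {}          # system of record: doc_id -> body
        self.pending = deque()   # changes not yet applied to the index
        self.index = {}          # derived: term -> set of doc ids

    def write(self, doc_id, body):
        self.store[doc_id] = body      # the durable write succeeds now
        self.pending.append(doc_id)    # the index update is deferred

    def drain(self, max_items=None):
        """Apply queued changes; a real system would run this in the background."""
        applied = 0
        while self.pending and (max_items is None or applied < max_items):
            doc_id = self.pending.popleft()
            # Drop old postings (a linear scan, fine for a sketch) and re-read
            # the current body, so repeated updates converge on the latest one.
            for ids in self.index.values():
                ids.discard(doc_id)
            for term in self.store[doc_id].lower().split():
                self.index.setdefault(term, set()).add(doc_id)
            applied += 1
        return applied

    def search(self, term):
        return sorted(self.index.get(term.lower(), set()))

    def lag(self):
        """How many changes the index is behind the store."""
        return len(self.pending)
```

<p>Measuring <code>lag()</code> is one simple way to decide whether the delay actually matters for a given application.</p>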

<p>So you end up with an architecture with a well-defined persistence layer that is not a relational database, and a set of indexes appropriate to the application, almost certainly including a full-text search engine, but perhaps a graph engine too. Maybe you <a href="http://highscalability.com/blog/2011/4/6/netflix-run-consistency-checkers-all-the-time-to-fixup-trans.html">run consistency checks</a> on your indexes for peace of mind.</p>
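<p>A consistency check can be as simple as comparing what the store says against what the index holds. This Python sketch assumes the store is a plain <code>doc_id -&gt; body</code> dict and the index a <code>term -&gt; set of doc ids</code> dict; the function name is hypothetical, and a real checker would then repair what it finds.</p>

```python
def check_consistency(store, index):
    """Return (missing, stale).

    missing: doc ids with at least one term that the index does not list them under.
    stale:   (term, doc_id) postings that the current store no longer supports.
    """
    missing, stale = set(), set()
    for doc_id, body in store.items():
        for term in set(body.lower().split()):
            if doc_id not in index.get(term, set()):
                missing.add(doc_id)
    for term, ids in index.items():
        for doc_id in ids:
            body = store.get(doc_id)
            if body is None or term not in body.lower().split():
                stale.add((term, doc_id))
    return sorted(missing), sorted(stale)
```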

<p>The second approach is to recognise that search engines were some of the original NoSQL data stores, building custom storage and indexing engines because they had such difficult problems. Indeed Google&#8217;s BigTable, and with it the ancestry of a lot of NoSQL products, came from search. However, today&#8217;s search engines have not yet refactored themselves on top of the NoSQL engines that have emerged from this work, although this is starting: <a href="http://blog.sematext.com/2010/02/09/lucandra-a-cassandra-based-lucene-backend/">Lucandra</a>, which is Lucene persisted in Cassandra, looks promising, offering seamless replication and distribution, and there is <a href="https://github.com/akkumar/hbasene">HBasene</a>, an HBase backend for Lucene. These make a huge amount of sense to me: if you are developing sophisticated search algorithms, not having to build the whole index and persistence layer as well is a big advantage, as is the scale-out potential. Of course this approach does not conflict with the first one; in fact you could choose a NoSQL backend that is aimed more at read performance than persistence, and at storing small index values fast. The hard part is that search engines have customised their data storage specifically for their particular use cases, and reworking this onto a more general backend has few apparent advantages for them; as you can see from the examples above, most of these changes have come from people already using the backends in question who want a single database to manage all their data requirements, particularly once they are working with high availability and replication. Software modularity really is not at the right level yet, is it? I blame object-oriented programming for this lack of reusability.</p>
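<p>Roughly speaking, the idea behind backends like Lucandra is to store posting lists as rows in a general sorted key-value store and answer term queries with prefix scans. The Python sketch below illustrates that idea only; it is not how Lucandra or HBasene actually lay out their data. <code>OrderedKV</code> is a hypothetical in-memory stand-in for the backend, the key layout <code>field\0term\0doc_id</code> is an assumption, and deletes are omitted.</p>

```python
import bisect
from collections import Counter

SEP = "\x00"  # separator assumed never to appear in field names or terms


class OrderedKV:
    """In-memory stand-in for a generic sorted key-value backend (hypothetical)."""

    def __init__(self):
        self._keys = []     # kept sorted so a prefix scan is a range read
        self._values = {}

    def put(self, key, value):
        if key not in self._values:
            bisect.insort(self._keys, key)
        self._values[key] = value

    def scan_prefix(self, prefix):
        i = bisect.bisect_left(self._keys, prefix)
        while i < len(self._keys) and self._keys[i].startswith(prefix):
            yield self._keys[i], self._values[self._keys[i]]
            i += 1


def index_document(kv, doc_id, field, text):
    """Store one posting per (field, term, doc) with the term frequency as value."""
    for term, freq in Counter(text.lower().split()).items():
        kv.put(SEP.join((field, term, doc_id)), freq)


def search(kv, field, term):
    """Return (doc_id, term frequency) pairs, highest frequency first."""
    prefix = SEP.join((field, term.lower(), ""))   # trailing SEP: exact term only
    hits = [(key[len(prefix):], freq) for key, freq in kv.scan_prefix(prefix)]
    return sorted(hits, key=lambda hit: (-hit[1], hit[0]))
```

<p>The search code only needs <code>put</code> and <code>scan_prefix</code>, which is what makes it plausible to swap in a replicated store underneath.</p>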

<p>Anyway, back to the main point. For applications like content management, the architecture to aim for is a content store that deals with persistence, versioning, access control and replication, plus a set of indexes based on search engine techniques, graph databases, and anything else your application needs. Ideally the indexes are all built on a common set of low-level primitives, so that the backend can be swapped out or shared between the search store and other application-specific indexing requirements, giving a single low-level indexing infrastructure that can be offered as a common scalable service, with different implementations available. This type of architecture is quite buildable now, and is certainly used in quite a few applications. I think it will become much more widespread, particularly in the cloud where it seems more natural, and certainly for the many types of application that fit a document-style model, such as content-based applications.</p>
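<p>As an illustration of "a common set of low-level primitives", here is a hypothetical Python sketch in which a term index and a graph index share one backend interface: ordered keys plus prefix scans. The <code>IndexBackend</code> protocol, the key prefixes and both index classes are assumptions made up for this example. Any backend implementing the two methods could be swapped in.</p>

```python
from typing import Iterator, Protocol, Tuple


class IndexBackend(Protocol):
    """The shared primitive assumed here: keyed writes and ordered prefix scans."""

    def put(self, key: str, value: str) -> None: ...
    def scan_prefix(self, prefix: str) -> Iterator[Tuple[str, str]]: ...


class DictBackend:
    """Simplest possible implementation; a replicated store could replace it."""

    def __init__(self) -> None:
        self.data: dict = {}

    def put(self, key: str, value: str) -> None:
        self.data[key] = value

    def scan_prefix(self, prefix: str) -> Iterator[Tuple[str, str]]:
        for key in sorted(self.data):
            if key.startswith(prefix):
                yield key, self.data[key]


class TermIndex:
    """Full-text lookups stored as term\0term-text\0doc keys."""

    def __init__(self, backend: IndexBackend) -> None:
        self.backend = backend

    def add(self, doc_id: str, text: str) -> None:
        for term in set(text.lower().split()):
            self.backend.put(f"term\x00{term}\x00{doc_id}", "")

    def search(self, term: str) -> list:
        prefix = f"term\x00{term.lower()}\x00"
        return [key[len(prefix):] for key, _ in self.backend.scan_prefix(prefix)]


class GraphIndex:
    """Relationship lookups stored as edge\0src\0rel\0dst keys on the same backend."""

    def __init__(self, backend: IndexBackend) -> None:
        self.backend = backend

    def link(self, src: str, rel: str, dst: str) -> None:
        self.backend.put(f"edge\x00{src}\x00{rel}\x00{dst}", "")

    def neighbours(self, src: str, rel: str) -> list:
        prefix = f"edge\x00{src}\x00{rel}\x00"
        return [key[len(prefix):] for key, _ in self.backend.scan_prefix(prefix)]
```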
]]></content:encoded>
			<wfw:commentRss>http://blog.technologyofcontent.com/2011/04/search-sql-nosql-persistence/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>CMS or framework and a challenge</title>
		<link>http://blog.technologyofcontent.com/2011/02/cms-or-framework-and-a-challenge/</link>
		<comments>http://blog.technologyofcontent.com/2011/02/cms-or-framework-and-a-challenge/#comments</comments>
		<pubDate>Mon, 28 Feb 2011 15:18:10 +0000</pubDate>
		<dc:creator>justin</dc:creator>
				<category><![CDATA[CMS]]></category>

		<guid isPermaLink="false">http://blog.technologyofcontent.com/?p=296</guid>
		<description><![CDATA[I came across an interesting blog post the other day, Drupal as an Application Framework: Unofficially competing in the BostonPHP Framework Bakeoff, where Ben Buckman tried to enter a PHP web framework bakeoff using Drupal, rather than a &#8220;real&#8221; framework; he was not allowed, and in the end CakePHP, Symfony, Zend framework and CodeIgniter officially [...]]]></description>
			<content:encoded><![CDATA[<p>I came across an interesting blog post the other day, <a href="http://benbuckman.net/tech/11/02/drupal-application-framework-bostonphp-competition">Drupal as an Application Framework: Unofficially competing in the BostonPHP Framework Bakeoff</a>, where <a href="http://twitter.com/thebuckst0p" title="No twitter I won't use your hashbang URLs">Ben Buckman</a> tried to enter a <a href="http://www.meetup.com/bostonphp/events/16011906/">PHP web framework bakeoff</a> using Drupal rather than a &#8220;real&#8221; framework. He was not allowed, and in the end <a href="http://cakephp.org">CakePHP</a>, <a href="http://www.symfony-project.org/">Symfony</a>, <a href="http://framework.zend.com/">Zend framework</a> and <a href="http://codeigniter.com/">CodeIgniter</a> officially competed. However, Ben cunningly sat at the back and built the test system anyway. He also recorded his screen for the 38 minutes he spent working on it, with commentary added, which makes it really interesting to watch. The <a href="http://phpbakeoff.newleafdigital.com/">built site is here</a>.</p>

<iframe src="http://player.vimeo.com/video/20286577" width="400" height="225" frameborder="0"></iframe>

<p><a href="http://vimeo.com/20286577">Drupal (unofficially) competing in the BostonPHP Framework Bakeoff</a> from <a href="http://vimeo.com/newleafdigital">New Leaf Digital</a> on <a href="http://vimeo.com">Vimeo</a>.</p>

<h2>The challenge</h2>

<p>So first, here is the challenge to other CMS vendors and system integrators: spend three hours (or less) repeating this with your system. You get an hour or so for planning, based on <a href="https://bostonphp.mybalsamiq.com/projects/bostonphpjobboard/grid">the wireframes</a>. Note that these are incomplete in places (typical client!), for example what the tag pages look like, so improvise; there are also some explicit open decision points (&#8220;require some sort of password&#8221;), so use whatever is easiest. You get up to an hour for the build (points given for less!). The aim is not a polished, shippable final product, but a first alpha or beta. Then you are allowed another hour for annotating the screencast, putting the site online for us to look at and so on, if you need it. If you are pushed for time, you can annotate as you build and cut the planning, and you should be able to get the whole thing done in an hour, so there is no excuse for anyone not to have a go.</p>

<p>Basically the site is a simple job board, in a circa 2005 Ruby on Rails style, so there is nothing that a decent web content management system should not be able to do out of the box. If there are bits your system finds technically hard, it is probably worth fixing them (the Drupal video reveals some small things that are apparently fixed in version 7, for example). But get it close enough, as the real aim is to give people an idea of how usable your system is for this sort of build task, the approach you have to take, and how quick it is! If your data model requires something a little different, by all means explain and make something close. There is no requirement, for example, to adopt a REST architectural style with an API that supports PUT and DELETE on jobs, but if that comes out of the box or trivially with the CMS you use, by all means show it off and put a bit more 2010 into the solution!</p>

<p>And no trying to get out of this, or I will write up your CMS as &#8220;pretty useless compared to Drupal&#8221; or your web CMS integrator as &#8220;rubbish, use New Leaf Digital, they have balls&#8221;. If you want to do it live as a webcast then feel free to invite me along (but record it too).</p>

<h2>The bit about frameworks and CMSs</h2>

<p>It is an interesting question, though: what is the real difference between web development frameworks and CMS systems? Indeed, if you think that web frameworks are nicer in every way, feel free to enter the challenge above using your favourite framework.</p>

<p>My current thinking is that a CMS is best defined as a web development framework plus an IDE that persists configuration to a content repository or database. By all means disagree with me!</p>

<p>By web development framework, I mean something that gives you a full web server, front end and backend integration tools, authentication and access control, templating and so on. A CMS adds an IDE: a web frontend that lets you configure it and define datatypes, forms, layouts and so on. But generally, unlike, say, visual code editors, which persist to editable files, the results of these changes are persisted in the backend to the same content repository as the actual content, either as database entries or schemas, or as tree nodes (as in JCR based systems). Hence these changes do not generally end up in a source code control system. They may be versioned in the same way as content, or they may not: Drupal CCK schema changes, for example, change the underlying database directly, whereas with a standalone framework you might have to script schema changes for upgrades in the traditional way.</p>
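<p>The distinction can be sketched in a few lines of Python. In this hypothetical model (the <code>Repository</code> class and the <code>/config/types/...</code> path convention are assumptions, not any particular CMS), defining a content type through the admin UI writes a node into the same store as the content it then validates, rather than a file that would go into source control.</p>

```python
class Repository:
    """Hypothetical single store holding both configuration and content."""

    def __init__(self):
        self.nodes = {}  # path -> dict of properties

    def define_type(self, name, fields):
        # What the CMS "IDE" does: persist the schema as just another node.
        self.nodes[f"/config/types/{name}"] = {"fields": list(fields)}

    def create(self, path, type_name, **props):
        # Content is validated against a schema read back from the same store.
        schema = self.nodes.get(f"/config/types/{type_name}")
        if schema is None:
            raise KeyError(f"unknown type {type_name!r}")
        unknown = set(props) - set(schema["fields"])
        if unknown:
            raise ValueError(f"fields not in type {type_name!r}: {sorted(unknown)}")
        self.nodes[path] = {"type": type_name, **props}
```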

<p>The deployment problems that CMS products have because they use a single repository for content and configuration are also clear, even if many systems do try to make the two separable by various means. The Drupal example shows this to some extent: all the frameworks have their <a href="https://github.com/bostonphp/Framework-Bakeoff-2011/">code available on GitHub</a>, but to set up your own instance of the Drupal example you would need some code plus a database dump, from which you would then have to filter out the content of the example instance.</p>
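<p>Here is a hypothetical Python sketch of that filtering step. It assumes, optimistically, that configuration lives under a single path prefix in the dump. Even then, configuration that points at specific content (a featured item on the front page, say) cannot be shipped on its own, so the function also reports those dangling references.</p>

```python
def split_dump(dump, config_prefix="/config/"):
    """Split a mixed repository dump into configuration and content.

    Assumes (hypothetically) that configuration lives under one path prefix;
    many real systems do not separate the two this cleanly.
    """
    config = {p: props for p, props in dump.items() if p.startswith(config_prefix)}
    content = {p: props for p, props in dump.items() if not p.startswith(config_prefix)}
    # Configuration that refers to specific content nodes would leave
    # dangling references if deployed to a fresh site without that content.
    dangling = sorted({
        path
        for path, props in config.items()
        for value in props.values()
        if isinstance(value, str) and value in content
    })
    return config, content, dangling
```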

<p>This also makes it clear why CMS systems are usually tightly tied to one framework, as the IDE has to embody a lot of implicit knowledge of the data model and the way the framework wants to do things, as well as how to persist changes. Designing a more decoupled approach is much more difficult, although perhaps possible (that&#8217;s another blog post to come, the decoupled CMS). But the streamlined coupling makes solving problems that fit the data model easy and quick, as the challenge above demonstrates. Of course, if the problem does not match the model that the CMS applies, then it comes down to trying to force a teapot into a round hole.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.technologyofcontent.com/2011/02/cms-or-framework-and-a-challenge/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>wysiwyg editors in web content management</title>
		<link>http://blog.technologyofcontent.com/2011/01/wysiwyg-editors-in-web-content-management/</link>
		<comments>http://blog.technologyofcontent.com/2011/01/wysiwyg-editors-in-web-content-management/#comments</comments>
		<pubDate>Tue, 04 Jan 2011 12:57:22 +0000</pubDate>
		<dc:creator>justin</dc:creator>
				<category><![CDATA[CMS]]></category>
		<category><![CDATA[wysiwyg]]></category>

		<guid isPermaLink="false">http://blog.technologyofcontent.com/?p=258</guid>
		<description><![CDATA[I am slowly making up a set of blog posts about the components of content management systems, starting with the earlier one on content repositories. Coming up next will probably be templating systems. In the beginning the web was editable in the browser. Tim Berners-Lee made it so. But this did not last for [...]]]></description>
			<content:encoded><![CDATA[<p>I am slowly putting together a set of blog posts about the components of content management systems, starting with the earlier one on <a href="http://blog.technologyofcontent.com/2010/09/towards-a-comparison-of-content-repositories/">content repositories</a>. Coming up next will probably be templating systems.</p>

<p>In the beginning the web was editable in the browser. Tim Berners-Lee made it so. But this did not last for long. Eventually Microsoft restored this, to some extent, with the <code>contentEditable</code> properties and related Javascript interfaces. These have a lot of foibles, and the current HTML5 standardization has not done a huge amount of work here yet, certainly not adding new features or changing the basic interface substantially, so a lot of behaviour is still underdefined, which makes cross-browser compatibility more difficult. While the main desktop browsers all implement the interface, mobile browsers in general do not yet, even on form factors such as the iPad, which could very well be an effective content editing device.</p>

<h2>Should you use one at all?</h2>

<p>Almost every CMS supports wysiwyg HTML editing, but it is not entirely clear how much it is used. To be quite blunt, these editors have always been among the worst editing environments ever devised, overlaid with a poor HTML model and a tendency towards presentational markup. Most users do their editing elsewhere, at least for originating content. I think, though, that we are at the point where this can be improved.</p>

<p>The first thing is to kill the presentational markup and instead support customised semantic use of classes. At a basic level this is easy to implement; supporting more advanced HTML templates easily is probably more work, for example marking up recipes or other specialist content in consistent semantic ways through this type of interface. Pluggable schema guidance might be needed for this.</p>
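<p>At the basic level, one way to kill presentational markup is to clean editor output against a whitelist: keep structural tags, drop <code>style</code> attributes and tags like <code>&lt;font&gt;</code> (keeping their text), and keep only classes from a site-defined semantic vocabulary. This is a server-side Python sketch using the standard library <code>html.parser</code>. The tag list and the <code>ALLOWED_CLASSES</code> vocabulary are hypothetical, and it is not a complete security sanitizer.</p>

```python
from html import escape
from html.parser import HTMLParser

ALLOWED_TAGS = {"p", "h2", "h3", "em", "strong", "a", "ul", "ol", "li", "blockquote", "span"}
ALLOWED_CLASSES = {"note", "aside", "ingredient"}  # hypothetical semantic vocabulary


class SemanticCleaner(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []

    def handle_starttag(self, tag, attrs):
        if tag not in ALLOWED_TAGS:
            return  # e.g. <font>, <center>: drop the tag but keep its text
        kept = []
        for name, value in attrs:  # style, face, color etc. are simply dropped
            if name == "href" and tag == "a" and value:
                kept.append(f' href="{escape(value)}"')
            elif name == "class" and value:
                classes = [c for c in value.split() if c in ALLOWED_CLASSES]
                if classes:
                    joined = " ".join(classes)
                    kept.append(f' class="{joined}"')
        attr_text = "".join(kept)
        self.out.append(f"<{tag}{attr_text}>")

    def handle_endtag(self, tag):
        if tag in ALLOWED_TAGS:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(escape(data))


def clean(html):
    parser = SemanticCleaner()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)
```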

<p>The second is to make the editors themselves sane. The first step is, of course, local offline HTML5 storage, so that drafts are still there if you wander off, your computer runs out of battery or whatever. Then just make them nicer to use: keyboard friendly and so on.</p>

<p>Then there is the concurrency question: you may want to show concurrent editing, and the versioning model that applies. This does not have to be Google Wave-style simultaneous editing, although that is one option. People may be editing content for release in different versions.</p>
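<p>One of the simpler versioning models an editor could sit on is optimistic concurrency: every save says which version it was based on, and a save based on an out-of-date version is rejected rather than silently overwriting someone else's work. The Python sketch below shows that swapped-in, widely used technique. <code>DraftStore</code> and <code>ConflictError</code> are hypothetical names, and this is not Wave-style real-time merging.</p>

```python
class ConflictError(Exception):
    """Raised when a save is based on a version that is no longer current."""


class DraftStore:
    def __init__(self):
        self.docs = {}  # doc_id -> (version, body); version 0 means "never saved"

    def load(self, doc_id):
        return self.docs.get(doc_id, (0, ""))

    def save(self, doc_id, base_version, body):
        current_version, _ = self.load(doc_id)
        if base_version != current_version:
            # Someone else saved in the meantime: make the user merge or retry.
            raise ConflictError(
                f"{doc_id}: edited from v{base_version}, now at v{current_version}"
            )
        self.docs[doc_id] = (current_version + 1, body)
        return current_version + 1
```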

<p>Import from other editors is still important, and it should be as painless as possible for the common tools.</p>

<p>Also, remember that it is an editor, and people like a choice of editors, so couple it loosely: make it standalone and easy to change.</p>

<h2>What not to use</h2>

<p>Old-school implementations, with nothing to recommend them now that I can see. They often seem to need server-side components too, which should not be required now that pure Javascript solutions are sufficient.</p>

<ol>
<li><p><a href="http://www.openwebware.com/">Open Wysiwyg</a></p>

<p>Not off to a good start here, as the <a href="http://www.openwebware.com/wysiwyg/demo.shtml">demo</a> does not work in Chrome, although it works in Opera and Firefox. I guess it is not really in development then. It does not create paragraphs by default, just uses <code>&lt;br&gt;</code>. It still believes in web-safe colours and fonts. A classic example of the old school, but not kept up to date. Open source.</p></li>
<li><p><a href="http://ckeditor.com/">CKEditor</a></p>

<p>A much more promising <a href="http://ckeditor.com/demo">demo</a>. Paragraphs are real (it uses a <code>&amp;nbsp;</code> for blank ones, which I suppose is OK), and there is a paragraph style dropdown that manages paragraphs, headings and divs. The style dropdown is a bit disappointing, as it sets styles, not classes, and you can still set fonts and so on. Works in Chrome, Firefox and Opera. The old school done right. Open source.</p></li>
<li><p><a href="http://tinymce.moxiecode.com/">TinyMCE</a></p>

<p>Ah yes, the classic one. Its feature set is similar to CKEditor&#8217;s, in that it treats paragraphs the same way and has a dropdown of predefined styles that set inline styles, not classes. It seems a bit clunky and flaky in comparison, and has odd terminology: since when has a div been a layer? Both have a smiley insertion feature, which suggests that people have been wasting time on pointless stuff rather than reconsidering how these programs should work. Open source.</p></li>
<li><p><a href="http://xstandard.com/">XStandard</a></p>

<p>Good blurb about supporting CSS properly and the importance of separating markup and presentation, but in a bout of massive fail there is no demo on the website. Your reviewer has no intention of installing it to test it, especially as, in a second fail, it does not support Linux as a platform. Commercial.</p></li>
<li><p><a href="http://www.innovastudio.com/editor.aspx">InnovaStudio</a></p>

<p>No demo either, apart from animated screenshots. Why are these commercial companies so full of fail? Web 2.0 products with no web trial? Judging from the features list, it seems obsessed with dimensions and fonts, not with separation of concerns.</p></li>
<li><p><a href="http://code-samples.cybervillage.com/activedit/">ActiveEdit</a></p>

<p>ActiveX DHTML with Java fallback for other platforms. Need I say more? Commercial.</p></li>
<li><p><a href="http://contenteditable.com/">contentEditable</a></p>

<p>Well, at least here the <a href="http://contenteditable.com/">demo is the homepage</a>. Everything is inline, with a simple-ish mechanism to mark blocks as editable. Stupidly, though, while it appears to be non-modal, all the menus are modal popups, which completely destroys the usability.</p></li>
<li><p><a href="http://www.themaninblue.com/experiment/widgEditor/">widgEditor</a></p>

<p>Very simple editor. Adds divs not paragraphs around everything. Open source.</p></li>
</ol>

<h2>Good implementations</h2>

<p>Largely based on <code>contentEditable</code>, fixing up its inconsistencies, although Dijit is based on the less flexible <code>designMode</code>, which means it has to run in an iframe.</p>

<ol>
<li><p><a href="http://developer.yahoo.com/yui/3/editor/">YUI editor</a></p>

<p>Note there is also a <a href="http://developer.yahoo.com/yui/editor/">YUI 2 version</a>. In many ways the <a href="http://developer.yahoo.com/yui/3/examples/editor/editor-instance.html">demo</a> is not much use, as this is part of the YUI framework and needs some Javascript infrastructure to customise it effectively. There is also a <a href="http://developer.yahoo.com/yui/examples/editor/toolbar_editor.html">YUI2 version demo</a>, and linked demos of plugins. Much better than anything above. Open source. Iframe-based for compatibility with the grade A browsers.</p></li>
<li><p><a href="http://dojotoolkit.org/reference-guide/dijit/Editor.html">Dijit</a></p>

<p>This is the editor that comes with the Dojo widget set. Still iframe based, not a full contentEditable implementation.</p></li>
<li><p><a href="http://www.aloha-editor.com/">Aloha</a></p>

<p>Some nice <a href="http://www.aloha-editor.com/demos.php">demos available</a>, indeed probably the best demo site of any editor. It aims to fix up the issues and cross-browser quirks of contentEditable, and looks nice too. Open source.</p></li>
</ol>

<h2>Other ways 1: Canvas</h2>

<p>A survey of the field would not be complete without mentioning <a href="https://mozillalabs.com/skywriter/">Skywriter</a>, which is a (code) editor written entirely in canvas. As far as I can see this is an utterly pointless approach in the long run. It is an especially bad idea for HTML, where fitting in with the CSS of the page matters, so let&#8217;s ignore it.</p>

<h2>Other ways 2: pure Javascript</h2>

<p>You can in principle implement the whole of a <code>contentEditable</code>-like editor in pure Javascript, using a little div as a cursor and everything, with no help at all from the browser. That is a lot of work, of course. I know of two implementations. One is the <a href="http://googledocs.blogspot.com/2010/05/whats-different-about-new-google-docs.html">2010 release of Google Docs</a>; the linked post explains that they did it in order to provide the best cross-browser support. The other is the one in <a href="http://cms.squizsuite.net/">Squiz CMS</a>, which has no online demo, is not open source and is tied to one product. It is pretty functional, although it had some browser compatibility issues last time I used it, and their website still says it does not support Webkit.</p>

<h2>Other ways 3: Markdown and Wiki markup</h2>

<p>I have to say I use Markdown a lot; this blog is generally written in it. I see two issues. One is that it is limited, and while you can add native HTML, this often means everything has to be redone in HTML (trying to add anchors to a list, for example). There are extensions with more features, but then you get into compatibility issues. The other is that, because Markdown offers notational choices, there is no sane reverse transformation that regenerates the original source from the output, so it is generally best to keep the content in Markdown all the way through. Wiki markup has much the same issues; it is extensible too, which makes its constructs harder to remember and weakens compatibility. The difficulty is that HTML is not a very good markup for humans to write, but once you get past the very restricted domain that, say, Markdown handles, producing something that works well is hard, maybe impossible.</p>
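<p>The reversibility problem shows up even in a tiny subset of Markdown. In this Python sketch (deliberately minimal emphasis handling, not a real Markdown implementation; both function names are hypothetical) <code>*x*</code> and <code>_x_</code> render to the same HTML, so any HTML-to-Markdown conversion has to pick one notation and the author&#8217;s original choice is lost.</p>

```python
import re


def render_inline(md):
    """Tiny subset of Markdown emphasis; not a full parser (snake_case would break it)."""
    md = re.sub(r"(\*\*|__)(.+?)\1", r"<strong>\2</strong>", md)
    md = re.sub(r"(\*|_)(.+?)\1", r"<em>\2</em>", md)
    return md


def to_markdown(html):
    """Naive reverse mapping: it must choose one notation for each element."""
    html = re.sub(r"<strong>(.+?)</strong>", r"**\1**", html)
    html = re.sub(r"<em>(.+?)</em>", r"*\1*", html)
    return html


original = "an _important_ point"
round_tripped = to_markdown(render_inline(original))
print(round_tripped)  # an *important* point
```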

<h2>Other ways 4: Don&#8217;t use HTML</h2>

<p>A number of people I have worked with, and some content management systems, go with a very minimalist use of HTML: text is pretty much just paragraphs, and everything else is structured fields. A slightly enhanced version of this, effectively a tiny markup offering even less than Markdown, is <a href="http://blog.programmableweb.com/2009/11/11/content-portability-building-an-api-is-not-enough/">the COPE system</a>.</p>

<p>Personally I believe we need to expand the use of rich text. For many use cases &#8220;just the words&#8221; is not enough, and we need equations, images, notes, asides, sidebars, tables, captions and all the richness of rich text, so I don&#8217;t think this is viable for much serious writing, and we need to move beyond the dumbed-down web that this enforces.</p>

<h2>Concluding</h2>

<p>While the wysiwyg editor as it has existed in content management systems is pretty awful, I think we need to expand the principle and try to do better: go back to the really editable web, and support well-structured semantic HTML in rich ways, not forms and simple text boxes. There is still a lot to do to get this working, though.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.technologyofcontent.com/2011/01/wysiwyg-editors-in-web-content-management/feed/</wfw:commentRss>
		<slash:comments>8</slash:comments>
		</item>
		<item>
		<title>Towards a comparison of content repositories</title>
		<link>http://blog.technologyofcontent.com/2010/09/towards-a-comparison-of-content-repositories/</link>
		<comments>http://blog.technologyofcontent.com/2010/09/towards-a-comparison-of-content-repositories/#comments</comments>
		<pubDate>Sun, 19 Sep 2010 11:57:07 +0000</pubDate>
		<dc:creator>justin</dc:creator>
				<category><![CDATA[CMS]]></category>
		<category><![CDATA[content]]></category>
		<category><![CDATA[data modelling]]></category>
		<category><![CDATA[jcr]]></category>
		<category><![CDATA[modelling]]></category>
		<category><![CDATA[properties]]></category>
		<category><![CDATA[repositories]]></category>

		<guid isPermaLink="false">http://blog.technologyofcontent.com/?p=237</guid>
		<description><![CDATA[I am a bit behind on my blog at the moment, with a lot of unfinished posts. While I was writing about Lily CMS, I got distracted with an issue that I have been working on in the background for a long time. There was a comment saying &#8220;The Lily content model has been academically [...]]]></description>
			<content:encoded><![CDATA[<p>I am a bit behind on my blog at the moment, with a lot of unfinished posts. While I was writing about <a href="http://www.lilycms.org/">Lily CMS</a>, I got distracted by an issue that I have been working on in the background for a long time. There was a comment saying <a href="http://outerthought.org/blog/426-ot.html">&#8220;The Lily content model has been academically validated and accommodates data mapped from various domains, such as rich hypermedia, HTML5, NewsML, MXF, CMIS, RDF and many more&#8221;</a>, which reminded me of the work I have been doing on classifying content models; after all, how can you validate a content model without a metamodel? And no one seems to have described the space of possible models, or the scope of choices. So here is my attempt.</p>

<p>The basic model is that we have resources, which may have some content and metadata attached. We are mainly interested here in the properties that can be attached to a resource, and the fact that some of those properties are relations to other resources. We are less concerned about what goes inside the structured body of a resource, although there are some issues about how many &#8220;bodies&#8221; a resource can have. So we have a model of resources, each of which has key-value pairs; some of the values may be links to other resources, presumably with referential integrity support.</p>
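<p>Here is a minimal Python sketch of this basic model, assuming string, binary and link values only. The <code>Link</code>, <code>Resource</code> and <code>Repository</code> names are hypothetical. Referential integrity is enforced in the simplest way: links must point at existing resources, and a resource that is still linked to cannot be deleted.</p>

```python
from dataclasses import dataclass, field
from typing import Dict, Union


@dataclass(frozen=True)
class Link:
    """A property value that refers to another resource by id."""
    target: str


Value = Union[str, bytes, Link]


@dataclass
class Resource:
    id: str
    props: Dict[str, Value] = field(default_factory=dict)


class Repository:
    def __init__(self):
        self.resources: Dict[str, Resource] = {}

    def put(self, resource):
        # Every Link must point at a resource that already exists.
        for name, value in resource.props.items():
            if isinstance(value, Link) and value.target not in self.resources:
                raise ValueError(f"{resource.id}.{name} -> missing {value.target}")
        self.resources[resource.id] = resource

    def delete(self, resource_id):
        # Refuse to delete a resource that other resources still link to.
        for other in self.resources.values():
            for value in other.props.values():
                if isinstance(value, Link) and value.target == resource_id:
                    raise ValueError(f"{resource_id} is referenced by {other.id}")
        del self.resources[resource_id]
```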

<h2>Properties of properties</h2>

<ul>
<li><p><strong>STRING</strong>. Resources can have string valued properties.</p>

<p>This is a basic starting point; I don&#8217;t think I know of a CMS that does not support this. You can store any other type in a string if necessary, though binary values are more efficient in some cases.</p></li>
<li><p><strong>VALID</strong>. Property values can be validated.</p>

<p>Many core repositories do not validate property values at all; only a validation proxy layer does, so as a repository principle this is fairly rare, although a small number of types (numbers, dates) might have native validated representations.</p></li>
<li><p><strong>TYPE</strong>. Property names can be validated.</p>

<p>Many systems have a typing facility that restricts the set of property names a resource can have. Others are unstructured, and any property can be added to any resource. There may be type composition mechanisms, such as mixins or type inheritance. Unlike VALID, this is more often tied to the core repository model rather than to a proxy layer, where the repository is typed for internal performance or indexing reasons, that is, where it is tied to dense rather than sparse storage.</p></li>
<li><p><strong>BINARY</strong>. Resources can have at most one binary property.</p>

<p>I have split this property because some content management systems can only have one binary property (such as an image file) on a particular resource, and multiple ones have to be constructed from multiple linked resources. This is not generally a huge limitation in an otherwise flexible system, but in a weaker system it could be annoying.</p></li>
<li><p><strong>N-BINARY</strong>. Resources can have any number of binary properties.</p>

<p>This is the fully flexible version; one binary may still be distinguished in some way, but you can store all the sizes of an image (say) as properties of one resource, which makes managing them easier. It may actually make things more difficult if it is not easy to iterate over properties (STRUCTPROP), in which case using multiple resources could be easier.</p></li>
<li><p><strong>STRUCTPROP</strong>. Properties can be structured.</p>

<p>Some systems have structured properties; for example, some have a JSON representation for properties, rather than the flat key-value namespace of other systems. JSON supports arrays that can be iterated over, and structures that can be repeated. To make this sort of structure with only key-value properties you may need to use more resources. Structured properties do add a lot of complexity, though, and perfectly useful but different systems can be built with or without them. Structured properties often have partial update interfaces, so that one subproperty can be modified at a time, which adds further complexity. Note that while technically JCR does not have structured properties, you can use the distinguished tree below any resource as a tree of properties, so it is rather similar to this model. Note also that property naming can informally add structure: just as slashes denote URI hierarchy, they can denote property hierarchy in a technically flat namespace.</p></li>
<li><p><strong>MULPROP</strong>. Properties can have multiple values.</p>

<p>Structured properties can usually have multiple values (a JSON array, for example), but not all systems with key-value properties allow the same key to be set multiple times with different values. This is the model of, say, HTML metadata, where each property (key) can be set multiple times; however, some key-value systems only allow a key to hold a single value, so the user would instead have to make a structured value to hold the multiple data items, using some encoding scheme without direct support from the system. Multi-valued properties complicate the simple set and get interfaces that single-valued properties have.</p></li>
<li><p><strong>TREE</strong>. Resources can be in exactly one tree structure.</p>

<p>Another split property. Many systems have one distinguished tree structure that content items must be in, and that tree has special operations, like fast access to parents and children; other trees might be constructed by other means, such as a general relation, but operations on them might be difficult. Children in a tree are almost always ordered and can be reordered, although some systems might not have this property.</p></li>
<li><p><strong>N-TREE</strong>. Resources can be in any number of trees.</p>

<p>The distinguished tree is very common (although Lily, for example, does not have one), but I do not know of any system with multiple named trees that share a common tree interface (like a parent function). You can make a tree with general relations, but you will not get help in keeping it acyclic, for example. So while this is a possible design, it is complex to implement, although arguably useful as a modeling tool. Generally you will have to manage general relations yourself to do this.</p></li>
<li><p><strong>CLONE</strong>. A resource can be cloned.</p>

<p>This means that the same item can appear as more than one resource at the same time, each of which will update in the same way, similar to a Unix hard link. This is the usual way of turning a TREE into a <a href="http://en.wikipedia.org/wiki/Directed_acyclic_graph">DAG</a>, which adds some flexibility. In some systems the different tree locations of a cloned resource may affect properties such as permissions or inheritance, so this property can add a fair amount of modeling flexibility; conversely, without such location-dependent behaviour it is of less use.</p></li>
<li><p><strong>RENAME</strong>. A resource can be renamed or moved.</p>

<p>Many content management systems provide this operation in their model, but in others it is not a native operation at the repository level, as the name visible to the end user may just be a property, for example. HTTP does not provide a rename operation, but WebDAV does (MOVE).</p></li>
<li><p><strong>REL</strong>. General relations between resources can be created.</p>

<p>This is the basic relation (named by the key name) between two items. It corresponds to the HTML &lt;link&gt; metadata element, or an RDF triple. It turns the resources into a directed graph with named edges. It is certainly essential for any content management system; I will talk more about how you want to be able to use and query it later.</p></li>
<li><p><strong>RELNS</strong>. Relations have a different namespace from properties.</p></li>
</ul>

<p>This is a distinguishing feature between XML, where attributes and child elements (relations) are syntactically distinguished, and JSON, where they are not. Non-child relations in XML are still expressed as attributes, though. It generally seems a pointless distinction, and using a single namespace is simpler.</p>
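<p>To make the first group of properties a little more concrete, here is a minimal Python sketch of a resource with multi-valued string properties (STRING, MULPROP), links to other resources (REL) and a simple check on property names (TYPE). The <code>Resource</code> and <code>Link</code> classes and the <code>TYPES</code> registry are hypothetical, invented for illustration rather than taken from any particular repository.</p>

```python
from dataclasses import dataclass, field


@dataclass
class Link:
    """A property value that points at another resource (REL)."""
    target: str


@dataclass
class Resource:
    """A resource whose properties are multi-valued key-value pairs (STRING, MULPROP)."""
    path: str
    rtype: str
    props: dict = field(default_factory=dict)

    def add(self, key, value):
        # MULPROP: the same key may be set several times, so each key holds a list.
        self.props.setdefault(key, []).append(value)

    def get(self, key):
        # Returns every value for the key (possibly none), never a single scalar.
        return list(self.props.get(key, []))


# TYPE: a hypothetical registry restricting which property names each type allows.
TYPES = {"article": {"title", "tag", "author"}}


def check_type(res):
    unknown = set(res.props) - TYPES[res.rtype]
    if unknown:
        raise ValueError(f"{res.path}: not allowed for type {res.rtype}: {sorted(unknown)}")


article = Resource("/news/1", "article")
article.add("title", "Towards a comparison of content repositories")
article.add("tag", Link("/tags/cms"))
article.add("tag", Link("/tags/nosql"))
check_type(article)
print([link.target for link in article.get("tag")])  # ['/tags/cms', '/tags/nosql']
```

<p>In this sketch relations simply share the single property namespace rather than having one of their own (no RELNS), and the <code>get</code> interface always returns a list, which is the interface cost of MULPROP mentioned above.</p>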

<ul>
<li><p><strong>RELPROP</strong>. Relations can have properties.</p>

<p>This is an interesting one. Adding a value to the relationship triple to make it a quad means that a number (or other value) can be assigned to a relation, making the set of relations a weighted directed graph (or you can view the system as a matrix). The general model of the <a href="http://arxiv.org/abs/1006.2361">property graph</a> has properties on edges, but the RDF model, for example, does not; such properties are often what blank nodes are used to model, although of course blank nodes can have relations as well as properties. You can make up for a lack of properties on edges/relations by adding extra nodes like this, but they may proliferate and need managing, so allowing properties may help. It is also worth noting that a system without MULPROP can use naming of properties to implement RELPROP: as each key then holds a single relation, a naming convention can attach properties to it. Similarly, STRUCTPROP generally allows storing the extra information in the property structure.</p></li>
<li><p><strong>REFINT</strong>. Referential integrity is preserved for relations.</p>

<p>Preserving referential integrity at the repository layer is a fair amount of work. Relational databases can do this, but not all content repositories do, for example across delete operations.</p></li>
<li><p><strong>ORDREL</strong>. Properties are ordered.</p>

<p>True key-value models do not tend to have an ordering for properties. As with many of these things, ordering adds interface complexity. Structured properties may however be ordered, and if the model supports a distinguished tree (TREE) this almost certainly has ordered children. Building an ordered tree simply from basic relations is quite complex. A sort order on a relation is another relation property, like weights, that seems to be rarely supported for general relations.</p></li>
<li><p><strong>REIFY</strong>. Properties can have properties.</p>

<p>RDF in principle lets properties themselves be resources (reification), so that they can in turn have properties. This allows me to add information about the properties, such as where they came from. This rarely seems to be useful in common models. Giving different properties different permissions might be a more useful side effect.</p></li>
<li><p><strong>EXTREL</strong>. Relations can be defined externally to their subject.</p>

<p>HTML originally had a rev relation, which defined a relation backwards from object to subject, and RDF triples can be stored in any document, divorced from their subject and object referents. This causes all sorts of issues with updating and managing (even finding) relations, while adding no descriptive ability except potentially REIFY.</p></li>
<li><p><strong>INHERIT</strong>. Resources can inherit properties.</p>

<p>The inheritance tree might be set from other properties, or from a distinguished tree, but one model is that properties not explicitly set can be inherited from another resource, or a prototype. This often makes models simpler, as rather than explicitly walking a tree, you can do it implicitly through inheritance. It seems surprisingly uncommon in content repositories.</p></li>
<li><p><strong>MINHERIT</strong>. Multiple inheritance.</p>

<p>Resources can inherit from multiple resources, not just, for example, from their parent in the primary distinguished tree. This is more complex.</p></li>
<li><p><strong>NAMESPACE</strong>. Namespacing on properties.</p>

<p>Some systems have a type of namespacing on properties, often used for multiple language variants for example, so that a property may differ across these namespaces. This can also be implemented with multiple resources, structured properties or inheritance. Usually not all properties are namespaced at once; some may not vary, which makes the set and get interface more complex.</p></li>
<li><p><strong>ATOMIC</strong>. All the properties of a resource must be updated together.</p>

<p>A resource and all its properties are updated at once to a complete new state (this will also be the versioning state). As well as versioning, this affects how concurrent access works. The alternative is that each property can be updated independently. Atomic updates are the HTTP resource update model.</p></li>
<li><p><strong>RVERSION</strong>. Resources may be versioned.</p>

<p>An entire resource may be versioned; this is a similar namespacing operation, where a namespace is available to retrieve the old values of a resource.</p></li>
<li><p><strong>PVERSION</strong>. Properties may be individually versioned.</p>

<p>Some systems (such as Lily) allow versioning to be turned on or off on a per property basis. This property generally implies that ATOMIC is not true.</p></li>
<li><p><strong>DELVERSION</strong>. Versions are deleted when a resource is deleted.</p>

<p>This is surprisingly common: if versions are namespaced properties, they are often deleted along with the resource. The better solution is not to delete versioned resources, but to give them a tombstone (whiteout) marker.</p></li>
<li><p><strong>SNAPSHOT</strong>. The versioning namespace is whole system state not resource state based.</p>

<p>Although versioning of the total state of a system is now common in source code control systems, many content management systems only let individual resources be versioned (hence creating issues such as DELVERSION). The main issue here is that you cannot apply or undo a set of changes together, only individually. Apart from the difficulty of making easy user interfaces, whole-system versioning is superior in every way to versioning of individual resources or properties, and no one should be designing a system that does not behave like this.</p></li>
<li><p><strong>TREEVERSION</strong>. Versioning supports branching and merging.</p>

<p>A full versioning model like git or Subversion, rather than just a linear series of checkpoints, is another model. It generally simplifies concurrent updates (in theory it can avoid both CAS and LOCK). Although this provides the richest model of versioning, it is the hardest to present to the non-technical user. Note also that this is one clear area where the content model for delivery can differ from the one for authoring: for authoring, much more complex operations are useful, while for delivery performance is key, and versioning may not be required at all, depending on how updates are applied.</p></li>
<li><p><strong>CAS</strong>. A <a href="http://en.wikipedia.org/wiki/Compare-and-swap">CAS</a>-type operation is supported on resource updates.</p>

<p>Some form of atomic &#8220;update if unchanged since this version&#8221; operation is supported, allowing lockless updates. HTTP ETags (used with If-Match) are the canonical example. This is the simplest choice for API access, and simple for users too. The unit of atomicity is usually the whole resource here, making it the unit of transactions; atomic update of individual properties alone does not let two properties be updated in a single transaction, so it is not so useful.</p></li>
<li><p><strong>LOCK</strong>. A locking operation is supported.</p>

<p>The traditional alternative to CAS is a locking operation, which disables write operations by others while the resource is locked. Some administrative or time-based unlock operations are required as well. It is less suited than other methods to automated APIs, due to issues like deadlock. As multiple locks can potentially be obtained, cross-resource transactions are possible, although this could impact concurrency.</p></li>
<li><p><strong>TRANSACTION</strong>. Transactions across multiple resources are supported.</p>

<p>Generally individual resources are the unit of transaction, or possibly individual properties. Some systems however allow a transaction in which multiple resources are updated together. JCR is probably the main example of these. A system with snapshots may also have this property if moving between versions is atomic. HTTP deliberately does not have this sort of transaction, as it does not work well if the resources are distributed, and system design for HTTP should ensure that resources model the right things so that transactions across resources are not needed.</p></li>
</ul>
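<p>Several of the update and versioning properties can be seen together in a small in-memory sketch, assuming integer version numbers per resource: each write replaces the whole property set (ATOMIC), must name the version it was based on (CAS), and appends to a history that stays readable (RVERSION), with deletes recorded as tombstones rather than discarding versions (avoiding DELVERSION). <code>VersionedStore</code> and <code>ConflictError</code> are hypothetical names, and note that this versions individual resources, not whole-system SNAPSHOTs.</p>

```python
class ConflictError(Exception):
    """Raised when a compare-and-swap write is based on a stale version."""


class VersionedStore:
    """In-memory sketch of ATOMIC + CAS + RVERSION, keeping history across deletes."""

    def __init__(self):
        # path -> list of complete property snapshots; None is a tombstone for a delete.
        self.history = {}

    def _append(self, path, snapshot, expected_version):
        versions = self.history.setdefault(path, [])
        # CAS: only succeed if nothing has been written since expected_version.
        if len(versions) != expected_version:
            raise ConflictError(
                f"{path}: expected version {expected_version}, found {len(versions)}")
        versions.append(snapshot)
        return len(versions)  # the new version number

    def put(self, path, props, expected_version):
        # ATOMIC: the whole property set is replaced in one step, as one new version.
        return self._append(path, dict(props), expected_version)

    def delete(self, path, expected_version):
        # Record a tombstone instead of discarding the history (avoids DELVERSION).
        return self._append(path, None, expected_version)

    def get(self, path):
        versions = self.history.get(path)
        if not versions or versions[-1] is None:
            raise KeyError(path)
        return len(versions), dict(versions[-1])

    def get_version(self, path, version):
        # RVERSION: earlier states stay readable by their 1-based version number.
        versions = self.history.get(path, [])
        if version not in range(1, len(versions) + 1):
            raise KeyError((path, version))
        snapshot = versions[version - 1]
        return None if snapshot is None else dict(snapshot)


store = VersionedStore()
store.put("/news/1", {"title": "Draft"}, expected_version=0)
version, props = store.get("/news/1")  # (1, {'title': 'Draft'})
store.put("/news/1", {"title": "Final"}, expected_version=version)
try:
    # A second writer that also read version 1 is rejected rather than overwriting.
    store.put("/news/1", {"title": "Stale edit"}, expected_version=version)
except ConflictError as err:
    print("conflict:", err)
```

<p>The writer that gets a conflict has to re-read and retry, which is why CAS suits automated APIs better than LOCK: there is nothing to unlock if a client goes away.</p>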

<h2>Queries</h2>

<p>With the property model above, you can retrieve resources, and read and modify their properties. There may also be some additional, slightly different properties (for example the parent and child relations that TREE might expose). We can traverse between resources by following their links. However, we generally want to make more complex queries, either about global questions or as more complex traversals based on properties. There are a lot of query models we should really explore; in particular we need to focus on how properties are indexed. I suspect that the analysis below is just a starting point.</p>

<ul>
<li><p><strong>PINDEX</strong>. Property values are indexed.</p>

<p>This is not necessarily essential, as for most interesting properties one would create a node rather than a value and use relations instead, though you then need reverse relation indexes anyway.</p></li>
<li><p><strong>REV</strong>. Relations have a reverse index.</p>

<p>This lets me find the opposite direction of a relation. This is a key property, as relations are directional, and important properties are in the other direction, such as finding all the resources tagged with a particular tag value.</p></li>
</ul>
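<p>A reverse index (REV) is essentially a map from the target of a relation back to its sources. The sketch below assumes relations are held as hypothetical <code>(subject, relation, object)</code> triples and answers the &#8220;everything tagged with this tag&#8221; question from the text; it is an illustration, not any product&#8217;s indexing scheme.</p>

```python
from collections import defaultdict

# Hypothetical (subject, relation, object) triples, as stored by REL.
relations = [
    ("/news/1", "tag", "/tags/cms"),
    ("/news/2", "tag", "/tags/cms"),
    ("/news/2", "tag", "/tags/nosql"),
    ("/news/2", "author", "/people/justin"),
]


def build_reverse_index(triples):
    """REV: map (relation, object) to the set of subjects that point at that object."""
    rev = defaultdict(set)
    for subject, rel, obj in triples:
        rev[(rel, obj)].add(subject)
    return rev


def subjects_with(rev, rel, obj):
    # Use .get so that a lookup for an unknown key does not insert an empty entry.
    return sorted(rev.get((rel, obj), set()))


rev = build_reverse_index(relations)
print(subjects_with(rev, "tag", "/tags/cms"))  # ['/news/1', '/news/2']
```

<p>The same structure keyed on <code>(property, value)</code> pairs gives a simple PINDEX; in a real repository either index has to be updated on every write, which is part of why it is a property worth listing.</p>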

<h2>Interfaces to properties</h2>

<p>I mentioned above that some of the property models have differing interface complexities, and I think it probably helps to show what some of the interfaces look like.</p>

<p>The canonical interface in web content management is the one exposed by HTTP. Resources and all their properties have to be updated atomically (ATOMIC) &#8211; <a href="http://blog.technologyofcontent.com/2009/12/smart-resources-or-why-you-should-care-about-http-patch/">PATCH</a> is just an optimization. CAS is available (ETags or last-modified times). No versioning is specially supported (although the system could create resources for old versions and add properties to access them). Other property behaviours depend on the document types, so HTML for example supports a flat meta and link property namespace, but other schemes are possible. A resource TREE is very loosely defined by &#8216;/&#8217; in URLs, but it does not provide any properties, so it is barely a tree.</p>
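<p>The server side of HTTP&#8217;s ATOMIC-plus-CAS model can be sketched in a few lines. This assumes a single ETag in <code>If-Match</code> (real clients may send a list, or weak validators, which are ignored here), and <code>HttpResourceStore</code> and <code>make_etag</code> are hypothetical names; 412 Precondition Failed is the standard response when the precondition does not hold.</p>

```python
import hashlib


def make_etag(body):
    # A strong ETag derived from the representation bytes; any opaque quoted string would do.
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


class HttpResourceStore:
    """Server-side sketch of HTTP's whole-representation PUT guarded by If-Match."""

    def __init__(self):
        self.bodies = {}

    def get(self, path):
        if path not in self.bodies:
            return 404, None, None
        body = self.bodies[path]
        return 200, body, make_etag(body)

    def put(self, path, body, if_match=None):
        current = self.bodies.get(path)
        if if_match is not None:
            # "*" matches any existing representation; otherwise the ETag must be identical.
            if current is None or (if_match != "*" and if_match != make_etag(current)):
                return 412, None  # Precondition Failed: the resource changed since it was read
        self.bodies[path] = body  # the whole representation is replaced at once (ATOMIC)
        return (201 if current is None else 204), make_etag(body)


server = HttpResourceStore()
server.put("/news/1", b"draft")
status, body, etag = server.get("/news/1")
print(server.put("/news/1", b"final", if_match=etag)[0])  # 204
print(server.put("/news/1", b"stale", if_match=etag)[0])  # 412
```

<p>On the client side this is just a normal PUT with an extra header, for example <code>http.client.HTTPConnection.request("PUT", path, body, {"If-Match": etag})</code>, which is a large part of why CAS is the easy choice for APIs.</p>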

<p><a href="http://blog.technologyofcontent.com/2009/12/the-bottom-10-things-of-2009/">WebDAV</a> changed the HTTP data model to push it much closer to a traditional content model, supporting LOCK, RENAME, and TREE on top of HTTP, and an explicit property model independent of the resources in question. The property model is flat, with no STRUCTPROP (although extending it is mentioned in the <a href="http://www.webdav.org/specs/rfc2518.html">RFC</a>) and no MULPROP. CLONE is allowed, as resources can have more than one URI. Updates to properties are not ATOMIC, as the PROPPATCH method can update some properties without others. The main HTTP resource is the body, which allows storage of one BINARY property, as the other properties are XML strings. This is pretty much the standard document management style set of properties.</p>
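<p>To show what &#8220;update some properties without others&#8221; looks like on the wire, here is a sketch that builds a PROPPATCH request body with the standard library&#8217;s ElementTree. The <code>DAV:</code> namespace and the <code>propertyupdate</code>/<code>set</code>/<code>remove</code>/<code>prop</code> elements come from the WebDAV specification; the custom property namespace and the <code>proppatch_body</code> helper are hypothetical.</p>

```python
import xml.etree.ElementTree as ET

DAV = "DAV:"
EX = "http://example.com/ns"  # hypothetical namespace for custom ("dead") properties
ET.register_namespace("D", DAV)
ET.register_namespace("Z", EX)


def proppatch_body(set_props, remove_props):
    """Build a WebDAV PROPPATCH request body that sets and removes named properties."""
    update = ET.Element(f"{{{DAV}}}propertyupdate")
    if set_props:
        prop = ET.SubElement(ET.SubElement(update, f"{{{DAV}}}set"), f"{{{DAV}}}prop")
        for name, value in set_props.items():
            ET.SubElement(prop, f"{{{EX}}}{name}").text = value
    if remove_props:
        prop = ET.SubElement(ET.SubElement(update, f"{{{DAV}}}remove"), f"{{{DAV}}}prop")
        for name in remove_props:
            ET.SubElement(prop, f"{{{EX}}}{name}")
    return ET.tostring(update, encoding="utf-8", xml_declaration=True)


body = proppatch_body({"author": "justin"}, ["reviewer"])
print(body.decode("utf-8"))
```

<p>The instructions inside one PROPPATCH are meant to succeed or fail together, but the body and any properties not mentioned are left untouched, so the resource as a whole is not updated atomically in the HTTP PUT sense.</p>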

<p>Lily CMS, which I was looking at recently, is pretty different. There is no TREE; you have to construct it yourself. There is TYPE but no STRUCTPROP, and there is NAMESPACE and PVERSION. There is no CAS or LOCK; conflicts are resolved by time of modification alone. It is an interestingly different model, which I will look at in more detail in another post, as it chooses complexity in some areas and simplicity in others.</p>
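<p>Resolving conflicts &#8220;by time of modification alone&#8221; is usually called last-write-wins. The sketch below is a generic per-property illustration, not Lily&#8217;s implementation: it assumes each property value carries a timestamp and that writes are not ATOMIC, so two concurrent edits are merged property by property. The timestamps and the <code>lww_merge</code> name are made up.</p>

```python
def lww_merge(a, b):
    """Merge two {property: (timestamp, value)} maps, keeping the later write per property.

    Ties keep the value from `a`; a real system needs a deterministic tie-breaker
    (for example a node id) so that every replica converges on the same result.
    """
    merged = dict(a)
    for key, (ts, value) in b.items():
        if key not in merged or ts > merged[key][0]:
            merged[key] = (ts, value)
    return merged


# Two hypothetical concurrent edits to the same resource, with made-up timestamps.
edit_a = {"title": (100, "Towards a comparison"), "tag": (105, "cms")}
edit_b = {"title": (103, "Comparing repositories"), "tag": (101, "nosql")}
print(lww_merge(edit_a, edit_b))
# {'title': (103, 'Comparing repositories'), 'tag': (105, 'cms')}
```

<p>No writer is ever told about a conflict, and the result mixes the two edits into a state neither author chose; that is the price of dropping CAS and LOCK from the repository.</p>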

<p>One of the common themes of the NoSQL movement is the argument that if you give up just this one feature, the storage layer can be made that much simpler and faster, with some work pushed up to higher layers to resolve: issues like conflict resolution, say, or referential integrity, or tree structures. This is the path of a simpler core repository that does not implement all common usage patterns for a CMS application, with some layering and conventions on top to build the next level. This is not so different from the relational model, with a low level relational algebra and a set of database management tools, then the application. What is still unclear is exactly where to make that split, but certainly large monolithic repository models that try to do everything listed above end up very large and complex to use.</p>

<p>There is definitely a case for moving some of these properties out of the repository into the authoring tools. Referential integrity at the repository level is actually quite hard to work with, as you cannot refer to something you are about to create, for example, but the authoring layer can provide tooling to help the user here.</p>
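<p>One way an authoring layer might help is by letting the user build a batch of new items that refer to each other through placeholders, then resolving and checking every link before anything is written. This is a hypothetical sketch: the <code>new:</code> placeholder convention and <code>commit_batch</code> are invented for illustration.</p>

```python
def commit_batch(existing, batch):
    """Resolve placeholder ids in a batch of new resources and reject dangling links.

    existing: set of paths already in the repository.
    batch: placeholder id -> {"path": new path, "links": [paths or placeholder ids]}.
    Returns {new path: resolved links}; raises ValueError (so nothing should be
    written) if any link would dangle.
    """
    assigned = {pid: item["path"] for pid, item in batch.items()}
    will_exist = set(existing) | set(assigned.values())
    result = {}
    for item in batch.values():
        # Rewrite references to not-yet-created resources into their real paths.
        links = [assigned.get(target, target) for target in item["links"]]
        dangling = [target for target in links if target not in will_exist]
        if dangling:
            raise ValueError(f"{item['path']}: dangling links {dangling}")
        result[item["path"]] = links
    return result


existing = {"/tags/cms"}
batch = {
    "new:photo": {"path": "/media/fosdem.jpg", "links": []},
    "new:page": {"path": "/news/3", "links": ["new:photo", "/tags/cms"]},
}
print(commit_batch(existing, batch))
# {'/media/fosdem.jpg': [], '/news/3': ['/media/fosdem.jpg', '/tags/cms']}
```

<p>The repository still only ever sees complete, valid links; the awkward &#8220;refer to something that does not exist yet&#8221; step happens entirely in the tool.</p>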

<p>I will post some follow-ups about some of the other issues arising from this, and what I think the best set of constraints to work in is, and more on the CMIS and JCR models.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.technologyofcontent.com/2010/09/towards-a-comparison-of-content-repositories/feed/</wfw:commentRss>
		<slash:comments>11</slash:comments>
		</item>
		<item>
		<title>Trends in content management 2010</title>
		<link>http://blog.technologyofcontent.com/2010/03/trends-in-content-management-2010/</link>
		<comments>http://blog.technologyofcontent.com/2010/03/trends-in-content-management-2010/#comments</comments>
		<pubDate>Mon, 08 Mar 2010 19:05:08 +0000</pubDate>
		<dc:creator>justin</dc:creator>
				<category><![CDATA[CMS]]></category>
		<category><![CDATA[trends]]></category>

		<guid isPermaLink="false">http://blog.technologyofcontent.com/?p=226</guid>
		<description><![CDATA[This is an overview of the medium term trends in content management, from a mostly technology point of view. Standards repository feeds CMIS JCR terminology, ways of thinking, industry model Standardization has really started to affect the content management industry. The industry was very immature, a bit of a landgrab, and not very customer [...]]]></description>
			<content:encoded><![CDATA[<p>This is an overview of the medium term trends in content management, from a mostly technology point of view.</p>

<h2>Standards</h2>

<ul>
<li>repository</li>
<li>feeds</li>
<li>CMIS</li>
<li>JCR</li>
<li>terminology, ways of thinking, industry model</li>
</ul>

<p>Standardization has really started to affect the content management industry. The industry was very immature, a bit of a landgrab, and not very customer focussed. This has changed rapidly, with the wide adoption of the JCR standards, but particularly with the process around CMIS. What is being set now is the model of the industry for the next five years: what customers expect and what products will deliver. Setting the agenda matters, and now is the opportunity to participate.</p>

<h2>CMS as a platform</h2>

<ul>
<li>build applications on a content platform</li>
<li>API driven development</li>
<li>SOA</li>
<li>embed code everywhere in domain level scripting languages</li>
</ul>

<p>A content management system is at last becoming less of a product that lets you do some stuff and more of a platform for working with content and building content-centred applications in a service-oriented world. A pervasive invasion of scripting languages such as JavaScript is coming. The web programming model of pervasive agile scripting and rich REST APIs is going to be the norm, not large-scale Java programming or application-specific templating languages.</p>

<h2>Co-opetition and community</h2>

<ul>
<li>collaboration on standards, infrastructure</li>
<li>open source as community</li>
<li>twitter, blogs, enterprise 2.0</li>
<li>end of NIH</li>
<li>customers are community too</li>
</ul>

<p>In the last year especially, the landscape of content management as a community has changed. First through the standards processes, particularly CMIS and JCR, and then through social media, particularly Twitter, as well as via events and blogs, a cross-vendor technical content management community is growing, particularly among the open source players, and with joint projects, for example around CMIS. This is in addition to the developer communities that are strongest around the open source products, although the .NET products are trying hard to build community around the Microsoft developer relations model. And of course there is the community of customers, who are becoming more vocal.</p>

<h2>Rich content</h2>

<ul>
<li>richer xhtml and xml</li>
<li>enhanced metadata; richer metadata in other formats</li>
<li>constraints not just validation</li>
<li>RDF and semantic web, linked data</li>
<li>relations and IA expressed in metadata</li>
<li>enhancement via deeply integrated search</li>
<li>document management, DAM and WCM converge</li>
<li>richer presentation layers, richer APIs</li>
<li>Flash is dead, plugins are dead, HTML5 is winning faster than anyone thought</li>
</ul>

<p>As we have moved from document management, where the focus was on whole documents, to web content management, which is more component and assembly based, there has been a gradual push to do more with the documents. Standardized rich document semantics are, after all, one of the main advantages of web documents. It is taking a while, but the potential here is beginning to be used, now that we have <a href="http://googlewebmastercentral.blogspot.com/2009/05/introducing-rich-snippets.html">Google indexing rich snippets</a> and even <a href="http://rdfa.info/2010/01/20/uk-retail-chain-tesco-adopts-rdfa/">Tesco using RDFa</a>. There is a lot more standardization work to do here.</p>

<p>In the front end the aim of this backend information enhancement is to build richer interfaces more easily, and to enhance findability, search and navigation, as well as to enable repurposing, richer APIs, and linked data. Authoring is the biggest challenge, as the majority of users need to be given interfaces that are independent of the IA, simple to use, but support generation and modification of complex data structures.</p>

<h2>SAAS and the service business</h2>

<ul>
<li>cloud</li>
<li>internal delivery in a SAAS way</li>
<li>devops</li>
<li>APIs and standardization forced by SAAS</li>
<li>changes to customer service model</li>
</ul>

<p>Software as a service models are winning because no one wants to buy software as a product any more. I will cover more of this in another article I have been working on for a while, but the main point is that enterprise software as a paid, big-ticket product is dead. The replacements are open source software and SAAS. These are not alternatives though, as people want open source software delivered as a service, albeit perhaps a more commoditized one if there are multiple providers, and many of the SAAS products delivered will be largely built from open source components by companies that run a mixed model. Microsoft is <a href="http://www.theregister.co.uk/2010/03/04/ballmer_on_azure/">going headlong into cloud</a> in a way that redefines what the operating system is. Even purchased software will be delivered in internal clouds.</p>

<p>This changes both how code is written and how it is administered, with <a href="http://lethargy.org/~jesus/writes/a-job,-a-mission,-a-career-all-without-a-path-or-a-name.">web operations</a> joining up with development into rolling delivery and creating the emerging field of <a href="http://www.devopsdays.org/">devops</a>. Developers need to understand operations and how to build code for this environment.</p>

<p>The service business as a business is different from the product business. Open source companies have understood that better than product-based vendors, but the less lock-in there is, the more important these changes become. The <a href="http://www.interwest.com/software-as-a-service/on-demand/vp-of-customer-success-critical-to-the-saas-business-model/">success of the customer using the services becomes the key business driver</a>.</p>

<h2>Performance and scaling, real time</h2>

<ul>
<li>cloud has pushed scale up out of the picture</li>
<li>scale out transparently</li>
<li>new technologies beyond RDBMS that fit CMS </li>
<li>dynamic generation becoming the norm; Google pushing the performance thing; the industry norm of 100ms will fall</li>
<li>real time becomes more important &#8211; dynamic updates, forget crawling, Google is going push</li>
<li>backend: queuing (0MQ, AMQP)</li>
<li>frontend: websockets, XMPP, long polling </li>
</ul>

<p>Just buying big hardware for scale up is really becoming difficult; the web vibe has always been to scale horizontally on commodity hardware. There is a lot of development around scale out <a href="http://blog.technologyofcontent.com/2010/02/nosql-and-content-management/">technologies such as NoSQL</a> which fit into the WCM data models, which are those of the web after all.</p>

<p>As well as scaling for volume, latency and real time are becoming key. Google&#8217;s time to crawl has been falling rapidly, to a few or less, but is <a href="http://www.readwriteweb.com/archives/google_developing_real_time_index.php">moving to real time</a> with push updates. Twitter has really pushed the boundaries of expectation for real time. Behind the scenes there are a lot of technologies for efficiently pushing around notifications and events both at the backend and on the frontend. Real time is going to become increasingly pervasive.</p>

<p>Page generation times will need to fall; the standard industry benchmark of 100ms per component will probably need to be halved; overall total times under 1s will become the norm.</p>

<h2>Security</h2>

<ul>
<li>web increasingly hostile</li>
<li>every bug is a potential security issue</li>
<li>security focussed on fewer areas, push into the OS not out to applications</li>
</ul>

<p>I read the excellent <a href="http://lwn.net">Linux Weekly News</a> every week, and every week there are <a href="http://cwe.mitre.org/top25/">security exploits</a> for many pieces of software; one that really struck me recently was the <a href="http://www.h-online.com/security/news/item/Possible-backdoor-in-the-e107-CMS-913588.html">major exploit against the CMS e107</a>. What happened here was that a group of crackers found a serious security flaw in the CMS, which they began exploiting systematically. When the patch was released, however, they already had control of the developer&#8217;s website via the flaw, so they replaced the patched version of the code with a version with a backdoor. Hacked websites are a vital part of the underground <a href="http://www.securitytube.net/Phishing-%28Evil-on-the-Internet%29-FOSDEM-Talk-video.aspx">online crime scene</a>, and a content management system is a high value target. Expect much more of this, and be prepared.</p>

<p>The defence is to narrow security down to fewer points of vulnerability, use sandboxing, and use every available facet of the operating system&#8217;s security layers: make the most of processes, permissions, everything that you get there. I <a href="http://blog.technologyofcontent.com/2010/01/scaling-security-and-architecture-in-2010/">wrote more about this in an earlier post on emerging trends</a>. File format parsing is another common area of vulnerability.</p>

<p>It is war out there on the internet, and many people underestimate or ignore the issues, and too many programmers do not code defensively by habit.</p>

<h2>Summary</h2>

<p>It is an exciting time in web content management right now; the industry is growing up beyond its beginnings as a way of getting web sites up, towards being the core of the broader content management industry. The choices made now will shape the industry; the next generation of products will be a big step forward for the industry.</p>

<p><a href="http://dilbert.com/strips/comic/2009-07-26/" title="Dilbert.com"><img src="http://dilbert.com/dyn/str_strip/000000000/00000000/0000000/000000/60000/1000/700/61747/61747.strip.sunday.gif" border="0" alt="Dilbert.com" width="450"/></a></p>
]]></content:encoded>
			<wfw:commentRss>http://blog.technologyofcontent.com/2010/03/trends-in-content-management-2010/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>NoSQL and content management</title>
		<link>http://blog.technologyofcontent.com/2010/02/nosql-and-content-management/</link>
		<comments>http://blog.technologyofcontent.com/2010/02/nosql-and-content-management/#comments</comments>
		<pubDate>Sun, 14 Feb 2010 23:34:15 +0000</pubDate>
		<dc:creator>justin</dc:creator>
				<category><![CDATA[CMS]]></category>
		<category><![CDATA[data modelling]]></category>
		<category><![CDATA[nosql]]></category>
		<category><![CDATA[sql]]></category>

		<guid isPermaLink="false">http://blog.technologyofcontent.com/?p=216</guid>
		<description><![CDATA[I went to many of the first ever NoSQL devroom talks at FOSDEM this year. For anyone who hasn&#8217;t been, FOSDEM is a great place, and the NoSQL room was well organized and full of interest. The term NoSQL is not even a year old; I first came across CouchDB around a year ago from [...]]]></description>
			<content:encoded><![CDATA[<p>I went to many of the first ever <a href="http://nosql.mypopescu.com/post/385372130/your-chance-to-review-the-fosdem-nosql-event">NoSQL devroom</a> talks at <a href="http://fosdem.org">FOSDEM</a> this year. For anyone who hasn&#8217;t been, FOSDEM is a great place, and the NoSQL room was well organized and full of interest. The term NoSQL is not even a year old; I first came across CouchDB around a year ago, from memory. Tim Anglade gave an excellent introduction in which he reminded people of the historical roots, both before relational databases and since then; so it is not new, but there is a renewed focus now. Why is that? I am going to look here at the field of content management and why you might be interested in different data models if that is your problem space, based loosely on some of the ideas from the talks at FOSDEM. There was a talk about <a href="http://outerthought.org/blog/blog/353-OTC.html">content management specifically and the Lily CMS by Evert Arckens</a>, which I missed, but I have added some comments after watching the video.</p>

<p><a href="http://www.flickr.com/photos/justincormack/4375594326/" title="FOSDEM by Justin Cormack, on Flickr"><img src="http://farm5.static.flickr.com/4029/4375594326_7ebdafd796.jpg" width="450"  alt="FOSDEM" /></a></p>

<h2>The data model for content management</h2>

<p>I have another draft post on this subject in more detail, which I am working on as part of my REST modelling in content management work, but here I will outline some of the types of data relation that are important. I will be quite abstract; if you want more concrete examples you will have to wait for the other post. Database models like the ones we are talking about here are, I think, more easily understood in the abstract.</p>

<p>First there is our unit of modelling, and this in itself is the first issue. At the conceptual level, content management tends to deal with something that looks like a document. It may be a fragment, in the sense that it is, say, a page component (an asset, if you use that terminology) rather than a whole item, but the unit that the user edits, and which is usually versioned, is itself a structured object. The processing model tends to treat it almost as a binary blob, except that certain properties can be extracted, such as metadata, links in HTML and so forth; it is stored as an item rather than decomposed further.</p>

<p>OK, so one basic model is a piece of content plus some attributes extracted from it. This corresponds pretty much to the JCR data model, for example. There are variations. Sometimes people do not store metadata in the file formats themselves, as historically many file formats had poor support for arbitrary structured metadata; that limitation is largely obsolete now, and the advantages of storing metadata and relations within the documents themselves are high. Storing metadata externally does not change the model much; it just complicates processing and storage. Another variant, often seen in document management systems, is the ability to have multiple &#8216;streams&#8217;, i.e. several document variants rolled into one, for example a video and a still from it. From the modelling point of view, however, you can regard these as just another compound document format, kept together because conceptually they are a bundle of content; you might distribute them as a zip file if you haven&#8217;t got any other suitable container format.</p>
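
<p>As a rough sketch of this basic model, in Python and with hypothetical names such as <code>ContentItem</code> and <code>LinkExtractor</code> that do not come from JCR or any particular CMS, an item can be kept as an opaque blob while a few properties and its outgoing links are extracted alongside it using the standard library&#8217;s <code>html.parser</code>:</p>

```python
from dataclasses import dataclass, field
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collects href values from anchor tags; the HTML is not otherwise interpreted."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


@dataclass
class ContentItem:
    path: str
    blob: bytes  # stored (and versioned) as one opaque unit
    mime_type: str
    properties: dict = field(default_factory=dict)  # extracted metadata
    links: list = field(default_factory=list)  # extracted relations to other items

    @classmethod
    def from_html(cls, path, html, properties=None):
        parser = LinkExtractor()
        parser.feed(html)
        return cls(path=path, blob=html.encode("utf-8"), mime_type="text/html",
                   properties=dict(properties or {}), links=parser.links)


item = ContentItem.from_html(
    "/news/sheep",
    '<p>See <a href="/images/sheep.jpg">the flock</a>.</p>',
    {"title": "Sheep"},
)
print(item.links, item.properties["title"])
```

<p>In this sketch the blob remains the unit of editing; only the extracted fields would be indexed or queried.</p>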

<p>So now we have a storage model consisting of a blob, with rich media operations on it, plus extracted structural and metadata information. There is also versioning to consider, but let us set it aside: the two core versioning models treat a version either as part of the blob or as a new document with some relation to the old ones, and neither really affects anything else here.</p>

<p>There are two kinds of metadata, properties and relations, although they are more similar than they appear. Properties are the standard attributes (this picture depicts sheep), while relations join two items in the repository (this is a cropped version of that other picture). Although this distinction seems clear, in the end richer information architectures demand that everything becomes a relation: if I want to browse a sheep node and find all the sheep items, every attribute value of any significance has to turn into a node with relations instead. Pure attribute values are left only for the less interesting properties (this PDF file is 176k in size).</p>
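
<p>The difference can be sketched in a few lines of Python, assuming a toy in-memory repository (the item ids, the <code>topic:</code> prefix and the <code>relate</code> helper are all hypothetical): with plain properties, browsing &#8220;sheep&#8221; means scanning every item, whereas promoting the value to a node turns it into a neighbour lookup.</p>

```python
from collections import defaultdict

# Property style: each item carries plain attribute values.
items = {
    "photo1": {"depicts": "sheep", "size_kb": 176},
    "photo2": {"depicts": "sheep", "size_kb": 90},
    "photo3": {"depicts": "cows", "size_kb": 120},
}

# Browsing "sheep" with plain properties means scanning every item.
sheep_by_scan = sorted(i for i, props in items.items() if props.get("depicts") == "sheep")

# Relation style: a significant value becomes a node, joined to items by edges.
edges = defaultdict(set)


def relate(a, b, kind):
    """Record an edge of the given kind, stored in both directions."""
    edges[a].add((kind, b))
    edges[b].add((kind, a))


for item_id, props in items.items():
    relate(item_id, "topic:" + props["depicts"], "depicts")

# Browsing the sheep node is now a direct neighbour lookup.
sheep_by_node = sorted(other for kind, other in edges["topic:sheep"] if kind == "depicts")

# Uninteresting values such as size_kb simply stay as plain properties.
print(sheep_by_scan, sheep_by_node)
```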

<p>Plain attributes are also less interesting from a relational versus non-relational storage point of view, although there is one important point, the dense versus sparse question, so let us take a look at this. Most real world attributes are sparse, that is, most attributes are not set on most items. In the relational model we have a row for our item and columns for all the attributes, so we are saying that most of them are NULL. (I was brought up on matrix algorithms and still think in terms of sparse versus dense matrices, as this is exactly the same problem, and matrices represent graphs anyway.) Storing huge, mainly NULL tables is not very efficient, so there are two common practices in the relational mapping of attributes in content management systems. The first is to define a type based system, where a particular type of content item is defined to have certain attributes (or at least fewer NULLs!), and each type therefore gets its own table, which is assumed to have fewer NULL values. Mixins (sets of properties that cut across types) can potentially be added to this model, as can inheritance schemes, but the basic idea is one table per type. This gives a nice, simple, direct database programming model, but causes a complete nightmare if you ever want to change the schema, for example to add an attribute: on any large database most DBMSs will effectively shut the system down while a schema change takes place, as schema changes require pretty much all the locks. <a href="http://www.silverstripe.com">Silverstripe</a> is one example of a content management system built like this; there are many others.</p>
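
<p>A minimal illustration of the one-table-per-type mapping, assuming SQLite only because it ships with Python (the <code>article</code> and <code>image</code> types are made up): unset attributes become NULLs, and adding an attribute is a DDL change rather than a data change. SQLite makes <code>ADD COLUMN</code> comparatively cheap, so this shows the shape of the schema change, not the locking cost described above.</p>

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# One table per content type: the columns are the attributes defined for that type.
conn.execute("CREATE TABLE article (id INTEGER PRIMARY KEY, title TEXT, author TEXT, summary TEXT)")
conn.execute("CREATE TABLE image (id INTEGER PRIMARY KEY, title TEXT, width INTEGER, height INTEGER)")
conn.execute("INSERT INTO article (title, author) VALUES ('Sheep', 'justin')")
conn.execute("INSERT INTO image (title, width, height) VALUES ('Flock', 640, 480)")

# Attributes not set on an item are stored as NULL in its dense row.
null_summaries = conn.execute(
    "SELECT COUNT(*) FROM article WHERE summary IS NULL"
).fetchone()[0]

# Adding an attribute to a type is a schema change (DDL), not a data change.
conn.execute("ALTER TABLE article ADD COLUMN depicts TEXT")
columns = [row[1] for row in conn.execute("PRAGMA table_info(article)")]

# The sparse alternative keeps only the attributes that are actually set.
sparse_article = {"title": "Sheep", "author": "justin"}
print(null_summaries, columns, sparse_article)
```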

<p>The alternative is the <a href="http://en.wikipedia.org/wiki/Entity-attribute-value_model">entity attribute value</a> (EAV) model (terrible Wikipedia article, please fix), where rather than mapping the attributes directly to relations, you map them indirectly, creating a table that joins entities, attributes and values; this table of course looks just like RDF triples. Doing this, though, loses everything that makes a relational database useful: constraints, typing, query optimization. It adds an extra layer of logical schema above the physical schema, which the database layer does not understand. This is a pretty common relational mapping for content management systems, as it allows full flexibility in defining and redefining attributes. To be implemented well it needs a large middle layer to manage the constraints, provide an API and generate efficient queries: in effect, to manage the mapping from the logical layer to the physical layer. The <a href="http://drupal.org/node/82661">Drupal CCK</a> is an example of this model.</p>
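
<p>The same idea in EAV form, again sketched with Python&#8217;s built-in <code>sqlite3</code> and invented data: one generic table holds everything, so adding an attribute needs no schema change, but a query over two attributes needs a self-join and an explicit cast, because the database only sees untyped text.</p>

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# One generic table holds every attribute of every item: entity, attribute, value.
conn.execute("CREATE TABLE eav (entity TEXT, attribute TEXT, value TEXT)")
conn.executemany(
    "INSERT INTO eav VALUES (?, ?, ?)",
    [
        ("photo1", "depicts", "sheep"),
        ("photo1", "size_kb", "176"),
        ("photo2", "depicts", "sheep"),
        ("photo2", "size_kb", "90"),
        ("photo3", "depicts", "cows"),
    ],
)
# A brand new attribute is just another row: no ALTER TABLE needed.
conn.execute("INSERT INTO eav VALUES ('photo3', 'photographer', 'justin')")

# "Sheep pictures over 100k" needs one self-join per extra attribute, plus a
# cast, because the database only sees untyped text values.
rows = conn.execute(
    """
    SELECT d.entity
    FROM eav AS d
    JOIN eav AS s ON s.entity = d.entity AND s.attribute = 'size_kb'
    WHERE d.attribute = 'depicts' AND d.value = 'sheep'
      AND CAST(s.value AS INTEGER) > 100
    ORDER BY d.entity
    """
).fetchall()
print(rows)
```

<p>Every extra condition adds another join, which is the kind of query generation the mapping layer has to take on.</p>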

<p>Of course this is not to say that the two relational models do not work. The direct mapping works well with simple, unchanging content types in small websites, for example, or in models where attributes are not very sparse (or the sparseness is worth the overhead) and schema changes are rare. EAV works well too, if managed carefully; it helps if the queries required on the model are not too complex.</p>

<p>Once you add relations as well as attributes, the already difficult mapping layer gets harder: you add another set of operations (recursion to handle tree structures) that the relational model does not handle well, so you may need to put more into the mapping layer. The promise of NoSQL is that you can bypass this for these types of application, and program directly against a database model that handles sparse attributes and relations natively. But how much do the NoSQL databases actually get you? You can argue that if you are already looking at EAV, then you are already not getting much from a relational database, and you are building a modelling layer on top of it, so dropping that and going for something that maps the logical data layer directly does make sense from a development point of view. Whether it really helps performance is less clear: much of the original work on NoSQL came out of huge scaling problems, not out of providing efficient solutions to the kind of medium-scale data mapping problem we are looking at here, although for huge sites there may of course be benefits.</p>
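
<p>To make the recursion point concrete, here is a hedged sketch (Python and <code>sqlite3</code>, with a hypothetical <code>child_of</code> table) of what a mapping layer can end up doing to fetch a subtree from a plain parent/child table: one query per level of the tree. It assumes the relation really is a tree, with no cycles.</p>

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Parent/child relations stored as rows; the tree shape is invisible to plain SQL.
conn.execute("CREATE TABLE child_of (child TEXT, parent TEXT)")
conn.executemany(
    "INSERT INTO child_of VALUES (?, ?)",
    [("news", "home"), ("about", "home"), ("2010", "news"), ("fosdem", "2010")],
)


def descendants(conn, root):
    """Collect every node below root, issuing one query per tree level.

    Assumes the parent/child relation is acyclic; a cycle would loop forever.
    """
    found, frontier = [], [root]
    while frontier:
        placeholders = ",".join("?" for _ in frontier)
        rows = conn.execute(
            f"SELECT child FROM child_of WHERE parent IN ({placeholders}) ORDER BY child",
            frontier,
        ).fetchall()
        frontier = [child for (child,) in rows]
        found.extend(frontier)
    return found


print(descendants(conn, "home"))
```

<p>Some databases, SQLite and PostgreSQL among them, support recursive common table expressions (<code>WITH RECURSIVE</code>), which move this loop into a single query; without them the loop lives in the mapping layer.</p>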

<p>The types of NoSQL database vary in their level of support for attributes and relations as they are used in content management. Document oriented databases do not give you much more than retrieval of content items; associative ones give key-value attribute lookups; graph databases should, in principle, let you query relations directly, expressing the kinds of query that information architecture problems need. Examples I am thinking of are things like tag clouds, which are simple to express as a graph problem, as a tag cloud is simply a count of the number of edges from a set of nodes. Indeed most information architecture problems look like graph problems, and also like <a href="http://en.wikipedia.org/wiki/OLAP_cube">OLAP processing operations</a>, which also do not work well on relational databases. And of course one of the things NoSQL shares with OLAP is the use of denormalization: you can use simpler models if you denormalize data to match the queries you will be running, rather than assuming that whatever queries you use can be optimized and made efficient by a general purpose system.</p>
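
<p>Treating the tag cloud as a graph count is easy to sketch in Python, assuming the item-to-tag edges are already available as an adjacency map (the data and the <code>tag_cloud</code> function are illustrative only):</p>

```python
from collections import Counter

# Edges from content item nodes to tag nodes: item -> tags it is tagged with.
tagged = {
    "post1": {"nosql", "cms"},
    "post2": {"cms", "open source"},
    "post3": {"cms", "wave"},
    "post4": {"nosql"},
}


def tag_cloud(graph, items):
    """Count the edges from the given set of item nodes to each tag node."""
    return Counter(tag for item in items for tag in graph.get(item, ()))


cloud = tag_cloud(tagged, ["post1", "post2", "post3"])
print(cloud.most_common(1))
```

<p>Turning the counts into font sizes is then only a presentation step; the query itself is just an edge count.</p>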

<p>Denormalization is not without its difficulties, although arguably it could become a tool embedded in databases, as indexes are now. One of the issues with NoSQL is that most of the database systems leave denormalization to the user: you need to use it because joins are not available, but you have to manage it yourself. Building infrastructure to manage denormalization explicitly, as a first class database object akin to an index, might be interesting. So that gives us a first issue: in any NoSQL system except a graph database we will need either to denormalize or to compose queries to get the results we want.</p>
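
<p>What &#8220;denormalization managed like an index&#8221; might look like can be sketched as a hypothetical store class (not the API of any existing database) that updates a denormalized tag count on every write, so application code never has to maintain it by hand:</p>

```python
from collections import Counter


class DenormalizedStore:
    """Hypothetical store that keeps a denormalized view current on every write,
    as a database maintains an index, instead of leaving it to application code."""

    def __init__(self):
        self.items = {}  # primary data: item id -> set of tags
        self.tag_counts = Counter()  # denormalized view: tag -> number of items

    def put(self, item_id, tags):
        old = self.items.get(item_id, set())
        self.tag_counts.subtract(old)  # retract the old version's contribution
        self.tag_counts.update(tags)  # add the new version's contribution
        self.tag_counts += Counter()  # in-place add drops zero and negative counts
        self.items[item_id] = set(tags)

    def delete(self, item_id):
        self.tag_counts.subtract(self.items.pop(item_id, set()))
        self.tag_counts += Counter()


store = DenormalizedStore()
store.put("post1", {"nosql", "cms"})
store.put("post2", {"cms"})
store.put("post1", {"cms"})  # retagging: "nosql" drops to zero and disappears
print(store.tag_counts)
```

<p>The point of the design is that reads of <code>tag_counts</code> need no join or scan; the cost moves to write time, which is the same trade an index makes.</p>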

<p>So I think there are four realistic models for content management backends going forward:</p>

<ol>
<li>The direct relational model for small systems with simple data models, rare attribute changes, little or no use of relations.</li>
<li>EAV models wrapped in a content modeling layer; JCR is an example of this, hiding the underlying SQL layer very well, and indeed potentially allowing it to be replaced with another storage model; I am sure someone is testing a Neo4j backend somewhere. This is where most production solutions are now.</li>
<li>Direct, non-denormalized graph database backends, with the raw content stored in a document store. This cuts out a special purpose middle layer by mapping the domain more directly. As <a href="http://blogs.neotechnology.com/emil/2009/11/nosql-scaling-to-size-and-scaling-to-complexity.html">Emil Eifrem of Neo Technology</a> says, it may not scale as far as the other NoSQL technologies, but it cuts the complexity of implementation; there are also questions about whether all the kinds of query required can be run efficiently. I think this will be the sweet spot in a few years, once the products mature and we see more open source activity in the field. RDF based solutions, for example using SPARQL, of course fall into this category too, and the maturity of products around these technologies will help drive this category as well as the NoSQL models.</li>
<li>Big, denormalized systems, probably with software support for managing the denormalization, using simple but scalable underlying technologies like key-value stores. These already exist in large scale web applications, but may remain niche if the development effort remains high. If frameworks that make modelling on these systems easier turn up, they may trickle down for performance reasons even to smaller datasets; a key-value store runs fine on a relational database backend, although the types of processing required probably mean a specialized backend is useful.</li>
</ol>
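
<p>The remark that a key-value store runs fine on a relational backend can be illustrated with a minimal sketch on SQLite (the <code>SqlKeyValueStore</code> class and the <code>tagpage:</code> key scheme are hypothetical), storing a denormalized, query-shaped record as JSON under a single key:</p>

```python
import json
import sqlite3


class SqlKeyValueStore:
    """Minimal key-value store on top of one relational table (a sketch, not a product)."""

    def __init__(self, path=":memory:"):
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS kv (key TEXT PRIMARY KEY, value TEXT NOT NULL)"
        )

    def put(self, key, value):
        # INSERT OR REPLACE is SQLite's shorthand for an upsert; values are JSON text.
        with self.conn:  # commits on success, rolls back on error
            self.conn.execute(
                "INSERT OR REPLACE INTO kv (key, value) VALUES (?, ?)",
                (key, json.dumps(value)),
            )

    def get(self, key, default=None):
        row = self.conn.execute("SELECT value FROM kv WHERE key = ?", (key,)).fetchone()
        return default if row is None else json.loads(row[0])


store = SqlKeyValueStore()
# A denormalized record shaped for one page: everything it needs in a single read.
store.put("tagpage:cms", {"count": 3, "items": ["post1", "post2", "post3"]})
print(store.get("tagpage:cms")["count"])
print(store.get("tagpage:missing", {}))
```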

<p>Note that the <a href="http://lilycms.org/">Lily CMS</a>, the subject of the FOSDEM talk mentioned earlier, fits very much into the fourth option above; this is where NoSQL technologies have perhaps seen most use, but I think building a CMS like this now will take a lot of work, in particular on tools to support the denormalization strategies that are needed. The approach outlined sounded much like what I have been thinking about for this type of model, although right now I would focus more on tooling for denormalized queries and less on scaling other parts, such as full text search. It will be interesting to follow the progress of this project.</p>

<p>We are at an interesting juncture, where it looks like there are some options that will let us do domain modelling in a way that corresponds more directly to the domain, but there are a lot of interesting challenges on the way.</p>

<p><a href="http://dilbert.com/strips/comic/2008-02-12/" title="Dilbert.com"><img src="http://dilbert.com/dyn/str_strip/000000000/00000000/0000000/000000/00000/1000/800/1869/1869.strip.gif" border="0" alt="Dilbert.com" width="440"/></a></p>
]]></content:encoded>
			<wfw:commentRss>http://blog.technologyofcontent.com/2010/02/nosql-and-content-management/feed/</wfw:commentRss>
		<slash:comments>11</slash:comments>
		</item>
		<item>
		<title>Standards Diagram for Content Management</title>
		<link>http://blog.technologyofcontent.com/2010/01/standards-diagram/</link>
		<comments>http://blog.technologyofcontent.com/2010/01/standards-diagram/#comments</comments>
		<pubDate>Tue, 12 Jan 2010 23:17:33 +0000</pubDate>
		<dc:creator>justin</dc:creator>
				<category><![CDATA[CMS]]></category>

		<guid isPermaLink="false">http://blog.technologyofcontent.com/?p=191</guid>
		<description><![CDATA[This is attempt number 1 of a diagram which I promised Jon Marks after his post. No it still does not have OSGI in! As Jon&#8217;s presentation used Prezi I thought I would give it a go. It takes a while to get the hang of it but it is fun. I can&#8217;t work out [...]]]></description>
			<content:encoded><![CDATA[<p>This is my first attempt at a diagram which I promised <a href="http://jonontech.com/2010/01/10/an-incomplete-directory-of-open-standards/">Jon Marks</a> after his post. No, it still does not have OSGi in it! As Jon&#8217;s presentation used <a href="http://prezi.com/">Prezi</a> I thought I would give it a go. It takes a while to get the hang of it, but it is fun. I can&#8217;t work out how to get an overview at the end though&#8230;</p>

<p>Just press the play button to move around.</p>

<iframe height="300" src="http://prezi.com/nifkatyvrk02/view" width="450"></iframe>
]]></content:encoded>
			<wfw:commentRss>http://blog.technologyofcontent.com/2010/01/standards-diagram/feed/</wfw:commentRss>
		<slash:comments>6</slash:comments>
		</item>
		<item>
		<title>Open source and Content Management (for Janus Boye)</title>
		<link>http://blog.technologyofcontent.com/2010/01/open-source-and-content-management-for-janus-boye/</link>
		<comments>http://blog.technologyofcontent.com/2010/01/open-source-and-content-management-for-janus-boye/#comments</comments>
		<pubDate>Sun, 10 Jan 2010 14:45:31 +0000</pubDate>
		<dc:creator>justin</dc:creator>
				<category><![CDATA[CMS]]></category>
		<category><![CDATA[open source]]></category>

		<guid isPermaLink="false">http://blog.technologyofcontent.com/?p=186</guid>
		<description><![CDATA[Janus Boye said the other day after the BCS open source seminar in London @McBoof I left London dazed and confused when it comes to open source. Somebody pls. help me explain what open source really means #idiot Now I only spoke to him very briefly before he had to rush to the airport, but [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://twitter.com/janusboye">Janus Boye</a> said the other day, after the BCS open source seminar in London:</p>

<blockquote>
  <p>@<a href="http://twitter.com/McBoof">McBoof</a> I left London dazed and confused when it comes to open source. Somebody pls. help me explain what open source really means #idiot</p>
</blockquote>

<p>Now, I only spoke to him very briefly before he had to rush to the airport, but hopefully the following will be helpful: first an overview of the important things about open source in general, and then how they are affecting, and will affect, content management in particular.</p>

<h2>Open Source</h2>

<p>I think it is easiest for software developers to understand open source. It came from that community, and it addresses our needs. For a long time no one outside that community was really concerned with it. I think the first time I noticed someone who was not a developer showing an interest was in 1999, ten years ago now, when an American stopped me at a tram stop in Vienna because I was wearing a Red Hat T-shirt (this was just after their float) and asked who the next open source IPO was going to be. I suppose that IPO was a big event in spreading awareness of open source, although it perhaps did not spread much information about what open source was really about.</p>

<p>I think the best way to start trying to understand open source is with three things. I tend to take a bit of a historical approach to things&#8230;</p>

<p>The first is Richard Stallman. I recommend <a href="http://www.fsf.org/events/rms-speeches.html">him in person</a> rather than in writing, actually. That reminds me: the first time I ever saw him I was sitting in the <a href="http://www.foundry.tv/">Foundry in Old Street</a>, and he walked in and proceeded to autograph a woman&#8217;s breasts. Anyone who wants to understand open source should hear him explain the roots of the open source movement. I will not really try to explain all that here, but openness is what created the scientific method, and a key part of the story is that software got to the point where you could no longer make it do what you wanted because you did not have access to the source code: you could not build on it any more, or fix it, and control of your tools was taken away. Some people have tried to sanitize Richard out of things (the open source vs free software mess), but that is a mistake.</p>

<p>Second is Eric Raymond&#8217;s essay <a href="http://catb.org/~esr/writings/homesteading/">The Cathedral and the Bazaar</a>, which was very influential at the birth of commercial open source. It is strictly about software development methodologies, and much of the discussion of the cathedral methods applies to open source software too. It is about the huge changes that the internet brought to open source development: the birth of a development method that no longer copied the methods of closed source development, but used openness to create true large scale community development in a way that was not possible before, and which closed source cannot replicate. Linux is of course the classic early example of this.</p>

<p>Which brings us to the third thing, community. Open source is first of all participatory, not just for consumption, which perhaps goes a bit against the grain of late twentieth century culture. Actually I am an optimist, <a href="http://www.herecomeseverybody.org/2008/04/looking-for-the-mouse.html">with Clay Shirky and against the sitcom</a>, and think culture is swinging this way, but we shall see. So for open source, start by using it, then participate. No, you do not have to code (although you can learn); there are other ways, such as bug reports, documentation, all sorts. If you just want to see what the community looks like, I can&#8217;t recommend anything better than going to a good conference, like <a href="http://fosdem.org/">FOSDEM next month in Brussels</a>.</p>

<h2>Open source in content management</h2>

<p>Open source has not affected content management much yet. Almost all content management by volume takes place on open source products (by volume WordPress, Joomla! and Drupal far outweigh anything else). By value it is less clear; open source always has an issue with by-value calculations, as the revenue models are different. Linux is not the leading server operating system by value, but it is by installed base, and probably also by the value of the services running on it.</p>

<p>But arguably open source content management software has not affected the industry yet, looking now at the larger installations, the area that Janus is interested in, as indeed am I. The industry has grown up in a mess as far as standards, ideas and infrastructure are concerned, but the <a href="http://en.wikipedia.org/wiki/Reality_Checkpoint">reality checkpoint</a> has been reached. Two standards have so far started to change the technology landscape of content management, JCR and CMIS; almost all the implementations of these are open source, and most are cross-vendor projects. This change will grow as more standardization and commoditization sweeps the industry, and as the industry adopts a web infrastructure in place of the pre-web legacies inherited from the document management history of the business. Everything that this business deals with will be served through the web; almost all web infrastructure is open source software; content management will be no different.</p>

<p>In this field it is all just beginning. As with open source in general, as I said above, it started with developers and with more efficient ways of building, architecting and delivering software; its influence on end users is still small. But things are turning as people in the industry become aware of open source, although they clearly still need some help understanding it. I hope this has helped.</p>

<p><a href="http://dilbert.com/strips/comic/2007-08-03/" title="Dilbert.com"><img src="http://dilbert.com/dyn/str_strip/000000000/00000000/0000000/000000/00000/1000/600/1676/1676.strip.gif" border="0" alt="Dilbert.com" width="480"/></a></p>
]]></content:encoded>
			<wfw:commentRss>http://blog.technologyofcontent.com/2010/01/open-source-and-content-management-for-janus-boye/feed/</wfw:commentRss>
		<slash:comments>6</slash:comments>
		</item>
		<item>
		<title>Wave experiment: Things We Hate About Content Management</title>
		<link>http://blog.technologyofcontent.com/2009/10/wave-experiment-things-we-hate-about-content-management/</link>
		<comments>http://blog.technologyofcontent.com/2009/10/wave-experiment-things-we-hate-about-content-management/#comments</comments>
		<pubDate>Sat, 24 Oct 2009 13:33:49 +0000</pubDate>
		<dc:creator>justin</dc:creator>
				<category><![CDATA[CMS]]></category>
		<category><![CDATA[content]]></category>
		<category><![CDATA[Wave]]></category>

		<guid isPermaLink="false">http://blog.technologyofcontent.com/?p=135</guid>
		<description><![CDATA[Experiment with writing in the wave.]]></description>
			<content:encoded><![CDATA[<p>Well, that was the story: six people in content management writing a blog post together using Google Wave, mostly for the first time, I think; something to do with those fresh invites.</p>

<p>Other links are here: <a href="http://jonontech.com/2009/10/23/a-collaborative-google-wave-blog-post/">Jon Marks</a>, <a href="http://irinaguseva.wordpress.com/2009/10/23/things-we-hate-about-content-management/">Irina Guseva</a>, <a href="http://www.persuasivecontent.com/i-predict-a-cms-riot-1-hour-6-people-1-wave">Ian Truscott</a>; the other participants were Adriaan Bloem, Andrew Liles and <a href="http://contentedmanagement.net/blog/bove-the-contentious-waves-he-kept/">Philippe Parker</a> (first use of Wave over GoToMeeting?)</p>

<p>Well, it was fun. There were technical difficulties: we lost sync and crashed a few browsers, and some people lost whole machines. Safari coped better than Firefox. It took a while to realize what was happening, but hey, this is a beta!</p>

<p>As a brainstorming tool it worked pretty well, and I thought it scaled well too. The named cursors show who is in the part you are in, but for brainstorming you can look, write another point, move on and continue without much editing. After half an hour of building up bulleted lists and a bit of moving around, the heavy writing started (after a discussion at the top, in our proxy process section; we should have split the thing up a bit).</p>

<p>There is a great tendency to write temporary notes about the discussion and then just delete them, which feels odd: data and metadata together, of course. The editing process was odd too: you would find orphaned bits, move things and try to join stuff up to make it flow, while it was all changing around you. Pretty chaotic. Bits that no one expanded into prose got junked (quite a good editing method, as they couldn&#8217;t stand up by themselves).</p>

<p>Here is the &#8220;finished&#8221; article&#8230; which cannot be attributed to anyone individually, of course&#8230; The subject was chosen about 10 minutes in, simply as something everyone could easily contribute to in this situation; there are some good points in there though!</p>

<p><strong>Things We Hate About Content Management</strong></p>

<p><em>- By The Motley Crew</em></p>

<p>It was a lovely Friday morning/afternoon, and we were Waving. The experiment, initiated by McBoof (yes, that one), brought together 6 CMS folks from around the world: analysts, journalists, vendors and system integrators, Waving on a topic that was decided at that very moment. We had one hour (in between conference calls and other job thingys) to pick a topic and Wave it.</p>

<p>After a little collab on what exactly to Wave about, we decided to do &#8220;a mindmap of things we find annoying in CMSs.&#8221; To up the ante, we also decided to take the original bullet points (deemed &#8220;too easy&#8221;) and convert the whole thing to prose. Was the tool really up to the task? Were our minds flexible enough to wrap around this kind of realtime collaboration?</p>

<p>In the beginning &#8212; we blame the tool <img src='http://blog.edge3.org/wp-includes/images/smilies/icon_wink.gif' alt=';)' class='wp-smiley' />  &#8212; we were Drowning, not Waving. We (almost) didn&#8217;t fight about edits. We almost didn&#8217;t step on each other&#8217;s toes. All in all, it turned out to be a fun and productive collaborative exercise. Read on to see for yourself.</p>

<p><strong>Cosmetic Issues</strong></p>

<p>There really should be a CMS UI fashion police, just as there should be a Magic Quadrant for shoes and handbags. Why? Well, there are a couple of issues.</p>

<p>For instance, sloppy, non-designed design. You know the kind of thing that has not been thought about and reworked and made to feel right. The sort of thing coders do if you don&#8217;t force them. But at the same time, over-designed interfaces can be just as bad: the designers and developers really need to be on speaking terms.</p>

<p>When building a system that works, you can&#8217;t have the development team in the basement on a sustenance of Jolt coding away into the night, and the designers in the penthouse in turtleneck sweaters sipping espressos. Too many CMS designs end up programmer-friendly rather than end-user friendly, and this is not the best way to charm those marketing and web content folks.</p>

<p>Developers and designers need to talk to each other, and essentially both should talk to users &#8211; not just eat your own dogfood &#8211; but listen to what dogs like to eat. Developers and UI designers are not content editors, marketers or knowledge and information workers.</p>

<p>Some vendors say that the agonizingly and depressingly black UI backgrounds are hip and modern. Well, they are not, really. Who told you that? Especially if you add a Star Trek theme to it and sprinkle in some stars and cosmic swirls, because if Apple does it, it must be cool right? Not pointing any fingers, but I would quit if I were a content manager having to spend my 9-5 staring into the &#8220;black hole&#8221; of some of the CMS UIs that are out there on the market.</p>

<p>Even pop-ups seem less annoying when compared to dark UIs. Which brings us onto&#8230;</p>

<p><strong>Interface Issues</strong></p>

<p>Interfaces need a comfortable, lived-in feel. Content management is something people work with every day; it is their interface to their job. You meet people who hate the interface, and that makes their work a heap of pain. I have heard people describe the 44 clicks it takes to insert an image. You have a responsibility to these people: to make them love the content and make the tool disappear.</p>

<p>We all hate it when the interface does something on its own that ruins your context, e.g. a page refresh, or in Wave the jumping around of the scrolled window in some cases <img src='http://blog.edge3.org/wp-includes/images/smilies/icon_wink.gif' alt=';-)' class='wp-smiley' />  Or the lack of an easy way to bookmark, so you can point someone to the content. Remember people will be collaborating and need to send links around: make sure the UI is a proper web application with URLs. And why do tasks that are easy to describe and often repeated in exactly the same way still take more than a few clicks? (Or maybe even dozens of clicks.) With bonus points for forcing users to use dialogs or tabs to enter mandatory information. Remember people do not have all the information in the right order.</p>

<p>Also, we need sane conflict merges. Check in and check out is too extreme for most uses, but people still want to edit offline. Of course Wave doesn&#8217;t have an offline mode: Google thinks this problem is going away, as it&#8217;s real time so there are never conflicts (that&#8217;s defined in the XML protocol; it&#8217;s quite interesting if you are that way geeky). Does Google have the right answer here? Well, the Motley Crew is struggling here, and some browsers lost sync during this experiment.</p>

<p>&#8220;Power users&#8221; (those who use it all day long) of CMSs need a &#8220;Desktop&#8221; experience. What does Desktop experience mean? Well, it doesn&#8217;t really have to be on the desktop: these days it is perfectly possible to get very close to a traditional desktop experience in a browser or similar. Its qualities are very low latency from action to response, no page refreshes, modal and modeless dialog boxes as appropriate, and &#8220;push&#8221; notification.</p>

<p><strong>Architectural Issues</strong></p>

<p>Architectural issues of the Wave overtook any architectural issues of Content Management Systems. The fact that we authored this entire article in a single blip didn&#8217;t help, and slowed everything down enormously. McBoof learned the hard way that he really needs a new laptop, and spent most of the session giving his machine CPR. Next time we&#8217;ll do each paragraph in its own blip to stop Firefox going down like a Led Zeppelin.</p>

<p>Monolithic systems. Build them out of pieces, since the client may not use all of them. Obviously your pieces may work together better, but there should be components. Do not try to reinvent every kind of wheel. &#8220;Best of breed,&#8221; though, is just another weasel marketing idea, as if systems were pinnacles rather than being about meeting requirements.</p>

<p>Marketeers are adroit at using the term Best Practice to position Their Way as the only way that a particular matter can be solved. (Many of us live in that netherland of having to peddle that point of view, but it is a falsehood that the careful buyer should try to see through.) I think this devalues genuine best practice; vendors should cite references.</p>

<p>Most often a marketeer&#8217;s Best Practice view is the only one they subscribe to as their product development has paddled up the wrong stream and cannot or won&#8217;t reverse their architectural design (probably because of the cost of doing so). This intransigence most often causes a product to doom itself. (Think of IBM and The Mainframe Is The Only Way To Do Serious Business).</p>

<p>Who really still believes that there is a place in this world for Flash or Java Applet based Rich Text Editors? TinyMCE, FCKeditor and others are filling the gap left by Ektron when they bit the hand that fed them and entered the CMS market. Ephox is trying to spread, but I find it difficult to come up with an excuse to use an Applet over HTML with JavaScript these days. Stick with the standard.</p>

<p><strong>Business Issues</strong></p>

<p>Where you are buying into something that you may very well need to change or integrate with, there is a strong benefit in considering Open Source. Open Source used to frighten commercial software companies, but we have come a long way down that road and now understand that commercial organisations can operate in an Open Source world and benefit. This does not necessarily mean that their prized system needs to be fully opened up; rather, it means taking on the spirit of it: being completely open to people seeing how your code operates and learning from it.</p>

<p>Exactly what you need to see opened up varies. In a CMS there may be a subsystem that stores the content, or one that provides a Rich Text Editor. These arguably don&#8217;t need to be opened up, but when a CMS ships with modules such as an RSS feed widget, a calendaring tool or prebuilt webforms, users who want a variation on a module benefit from seeing how the &#8220;pros&#8221; did it, and can use it as a starting point for their own implementation.</p>

<p>We really don&#8217;t need vendors that pay lip service to the buzzwords: vendors who think the new CMS buzzword &#8220;engagement&#8221; is just a screenshot of Google Analytics, or who add an image picker and call it DAM. And a cross-over between WCM and ECM? Don&#8217;t think that WCM is like ECM, that it is about organizing content rather than about communicating effectively with the audience. And don&#8217;t think that if you organize the content, you will automatically communicate effectively.</p>

<p>Completely different, but equally frustrating, is procurement (and the procedures that go with it). Procurement folk don&#8217;t recognise how much user adoption matters to the success of the project, or how much the black background and all the other UI issues pointed out previously affect that adoption. If a CMS is procured according to procedure, the selection is a success to them. But those same rules are often a recipe for ignoring what the users really need.</p>

<p>At the same time, budgets that aren&#8217;t transparent are an issue: customer and vendor should be able to have a sensible, grown-up conversation. As a customer, of course you want good value, but how cheap are you being? And to vendors: many licensing models don&#8217;t make any sense, and force you to do stupid things. People are scared to have that conversation. Find the best architectural fit first, I say, and then let&#8217;s figure out an appropriate license around that.</p>

<p><strong>Conclusion</strong></p>

<p>So much hatred rolled up into a tight little ball of anti-CMS rage. Who would have expected it from such a respected bunch of CMS folk? We hate the designs, the interfaces, the architectures and the business. Time for a beer/wine? Wave goodbye!</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.technologyofcontent.com/2009/10/wave-experiment-things-we-hate-about-content-management/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Content enabled vertical applications (composite content applications) &#8211; executive briefing</title>
		<link>http://blog.technologyofcontent.com/2009/10/content-applications-briefing/</link>
		<comments>http://blog.technologyofcontent.com/2009/10/content-applications-briefing/#comments</comments>
		<pubDate>Sun, 11 Oct 2009 21:39:29 +0000</pubDate>
		<dc:creator>justin</dc:creator>
				<category><![CDATA[CMS]]></category>
		<category><![CDATA[applications]]></category>
		<category><![CDATA[CEVA]]></category>
		<category><![CDATA[content]]></category>

		<guid isPermaLink="false">http://blog.technologyofcontent.com/?p=124</guid>
		<description><![CDATA[Content enabled vertical applications (composite content applications) - executive briefing]]></description>
			<content:encoded><![CDATA[<p>I noticed that content enabled vertical application recently became the top search entry point to my blog. Now as I have only written one article about this, and it rapidly rose up the Google rankings, I reckon there must be a dearth of content on this subject. I guess that may not be too surprising, as it was initially a Gartner attempt to describe a path people wanted to follow rather than a generally used description. Gartner have also decided that the content enabled vertical application (CEVA) is now called the composite content application (CCA), possible to confuse everyone even further. But they do matter to your content strategy.</p>

<p>So what are these, why do you want them, and what do they mean for your content strategy?</p>

<p>Toby Bell at Gartner says:</p>

<blockquote>
  <p>Smart companies have begun linking more of their content to industry-specific, human-centric
  processes, such as insurance claims handling, or supporting research on new drug development.
  This approach usually means building or modifying the content-enabled vertical applications
  (CEVAs) on top of ECM environments. CEVAs typically help to automate complex processes that
  previously required workers to manually sort through paper documents and other forms of content
  (in effect, a way to manage down costs of exception handling) and optimize the remainder of the
  work.</p>
</blockquote>

<p>This is not a great summary, though. Look at it from another angle. Your enterprise content was disorganized, living on network shares, random websites and legacy systems, all over the place. Your ECM strategy was first to consolidate, reducing the costs of running multiple systems and improving findability, and so reach a baseline enterprise content management position. But what next? Where is the next value?</p>

<p>Content strategy starts here. There are many parts to this, covering creation, lifecycle and reuse, audit, consolidation, quality and so on. The CEVA part is about delivering, and interacting with, content within the areas of the business that are not content focused. Content management often seems to be about specific content-focused parts of a business, such as, historically, technical documentation and, more recently, online marketing material, plus a load of unstructured stuff like emails and generic office documents. Areas such as technical documentation have often had high value for legal and regulatory reasons, so structured processes were created for them early; these effectively created the content management industry. Web content historically had different solutions because it turned up, and became important, when the general-purpose tools (Word!) could not usefully author it.</p>

<p>But all the stuff classified as &#8220;other&#8221; does have underlying processes: content processes. Some are formalized in systems, such as the original paperless office systems, which are the roots of the document management industry in forms and scanned paperwork processing. This stuff generally sits more on the &#8220;data&#8221; side than the &#8220;content&#8221; side of business processes. Long term this distinction is not a very useful one, and data and content resources will merge into a single enterprise resource architecture. The majority of processes involving content, though, take place through informal channels, particularly email with Microsoft Office documents. These are the document types you tried to take control of through ECM.</p>

<p>So ECM took the documents that were behind many processes, organized them and made them findable. But at the base level a content repository is just that, a repository. It deals with basic issues such as versioning and permissions, search and findability, and some organization, but it does not really deal with process and processing.</p>

<p>Process and processing are the valuable parts in the lifetime of most content. Imagine the lifetime of, say, an insurance contract, with payments and claims and disputes, or an employee&#8217;s personnel file, or a technical manual over the lifetime of a product. A CEVA or CCA is an application to support these lifetime processes.</p>

<p>It is also an application that connects a document&#8217;s lifetime processes to other systems. Your CRM system may need to know about insurance claims, your sales department may need to know about expiry, and your website may need to know about new documentation releases: content changes do not happen in a vacuum.</p>

<p>One class of CCA that is common but rarely perceived as such is the software application with embedded content. Once that meant just embedded &#8220;help screens&#8221; with content tools to manage them; then came internationalization, with a different set of tools. But these desktop applications are rapidly being replaced by web applications. Web applications are much more content driven: they may live in an SEO-facing world, they may live in a customer-facing world that needs to consider usability, they may be multilingual, and they are not driven by the developer-centric ideas of help screens and manuals. Content and application can live together, but this requires new ways of using, reusing and versioning content, and pulling content out of application release cycles so that it can reflect non-application changes, such as the marketing environment, usability improvements, corrections and enhancements. These applications were historically development led, but as they mature the content aspects become key business drivers, needing content management integration.</p>

<p>So what do you need to build this type of application, and what should your decision criteria on platforms be?</p>

<p><a href="http://stephanecroisier.jahia.com/from-content-composite-to-content-solutions">Stéphane Croisier says in a good survey</a> &#8220;So rapid raw composite assembly, fast integration and ease of use are the three new pillars of next generation content solutions.&#8221;</p>

<p>The first thing to bear in mind is that you need more than a repository; you are looking for an application platform too. The ease of use issue is important. Long term you need to be looking for something that staff can build simple tools from, even if you are hiring specialists for the complex projects. Ease of use is a two-way thing, as you need an easy-to-use platform that lets you build easy-to-use applications. Ease of modification and maintenance is equally important, as these applications may need to be fluid. You are likely to need external support to build more complex applications on the same platform, so the availability of this support is important too. Ignore the jargon of portlets, widgets and mashups: none of these so-called standardisations has much traction. We are talking application development, so use what you have, or what you can hire developers for. Ask the vendors what their platform strategy is.</p>

<p>Stéphane identifies a trend towards solutions: ready-to-go solutions for common problems. These may be useful, but I would not choose a development platform on the basis of the availability of particular solutions, or you may end up buying a platform for every solution. A longer-term view of the platform&#8217;s viability for other solutions is necessary too.</p>

<p>Long term, remember that content application strategy is part of content strategy, and comes after it. You need to know what your content applications are and will be, and have built an underlying repository, authoring and reuse strategy first. Applications are where developers interact with that foundation to achieve the long-term goals.</p>

<p><a href="http://xkcd.com/388/"><img src="http://imgs.xkcd.com/comics/fuck_grapefruit.png" alt="fruit magic quadrant" width="450"></a></p>
]]></content:encoded>
			<wfw:commentRss>http://blog.technologyofcontent.com/2009/10/content-applications-briefing/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
	</channel>
</rss>

