<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Technology of Content &#187; CMS</title>
	<atom:link href="http://blog.technologyofcontent.com/category/cms/feed/" rel="self" type="application/rss+xml" />
	<link>http://blog.technologyofcontent.com</link>
	<description>Ramblings on the technology of content management</description>
	<lastBuildDate>Sun, 25 Apr 2010 21:45:47 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9.2</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Trends in content management 2010</title>
		<link>http://blog.technologyofcontent.com/2010/03/trends-in-content-management-2010/</link>
		<comments>http://blog.technologyofcontent.com/2010/03/trends-in-content-management-2010/#comments</comments>
		<pubDate>Mon, 08 Mar 2010 19:05:08 +0000</pubDate>
		<dc:creator>justin</dc:creator>
				<category><![CDATA[CMS]]></category>
		<category><![CDATA[trends]]></category>

		<guid isPermaLink="false">http://blog.technologyofcontent.com/?p=226</guid>
		<description><![CDATA[This is an overview of the medium term trends in content management, from a mostly technology point of view.

Standards


repository
feeds
CMIS
JCR
terminology, ways of thinking, industry model


Standardization has really started to affect the content management industry. The industry was very immature, a bit of a landgrab, and not very customer focussed. This has changed rapidly, with the [...]]]></description>
			<content:encoded><![CDATA[<p>This is an overview of the medium term trends in content management, from a mostly technology point of view.</p>

<h2>Standards</h2>

<ul>
<li>repository</li>
<li>feeds</li>
<li>CMIS</li>
<li>JCR</li>
<li>terminology, ways of thinking, industry model</li>
</ul>

<p>Standardization has really started to affect the content management industry. The industry was very immature, a bit of a landgrab, and not very customer focussed. This has changed rapidly, with the wide adoption of the JCR standards, and particularly with the process around CMIS. What is being set now is the model of the industry for the next five years: what the customers expect and what the products will deliver. Setting the agenda matters, and now is the opportunity to participate.</p>

<h2>CMS as a platform</h2>

<ul>
<li>build applications on a content platform</li>
<li>API driven development</li>
<li>SOA</li>
<li>embed code everywhere in domain level scripting languages</li>
</ul>

<p>A content management system is at last becoming less of a product that lets you do some stuff and more of a platform for working with content and building content centered applications in a service oriented world. A pervasive invasion of scripting languages such as JavaScript into this space is coming. The web programming model of pervasive agile scripting and rich REST APIs is going to be the norm, not large scale Java programming or application specific templating languages.</p>

<h2>Co-opetition and community</h2>

<ul>
<li>collaboration on standards, infrastructure</li>
<li>open source as community</li>
<li>twitter, blogs, enterprise 2.0</li>
<li>end of NIH</li>
<li>customers are community too</li>
</ul>

<p>In the last year especially the landscape of content management as a community has changed. First through the standards processes, particularly CMIS and JCR, and then through social media, particularly twitter, as well as via events and blogs, there is now a growing cross vendor technical content management community, particularly with the open source players, and joint projects, for example with CMIS. This is in addition to the developer communities that are strongest around the open source products, although the .net products are trying hard to build around the Microsoft developer relations model. And of course the community of customers, who are becoming more vocal.</p>

<h2>Rich content</h2>

<ul>
<li>richer xhtml and xml</li>
<li>enhanced metadata; richer metadata in other formats</li>
<li>constraints not just validation</li>
<li>RDF and semantic web, linked data</li>
<li>relations and IA expressed in metadata</li>
<li>enhancement via deeply integrated search</li>
<li>document management, DAM and WCM converge</li>
<li>richer presentation layers, richer APIs</li>
<li>Flash is dead, plugins are dead, HTML5 is winning faster than anyone thought</li>
</ul>

<p>As we have moved from document management, where the focus was on whole documents, to web content management, which is more component and assembly based, there has been a gradual push to do more with the documents. Standardized rich document semantics are after all one of the main advantages of web documents. It is taking a while but making use of the potential here is beginning to happen, now we have <a href="http://googlewebmastercentral.blogspot.com/2009/05/introducing-rich-snippets.html">Google indexing rich snippets</a> and even <a href="http://rdfa.info/2010/01/20/uk-retail-chain-tesco-adopts-rdfa/">Tesco using RDFa</a>. There is a lot more standardization work to do here.</p>

<p>In the front end the aim of this backend information enhancement is to build richer interfaces more easily, and to enhance findability, search and navigation, as well as to enable repurposing, richer APIs, and linked data. Authoring is the biggest challenge, as the majority of users need to be given interfaces that are independent of the IA, simple to use, but support generation and modification of complex data structures.</p>

<h2>SAAS and the service business</h2>

<ul>
<li>cloud</li>
<li>internal delivery in a SAAS way</li>
<li>devops</li>
<li>APIs and standardization forced by SAAS</li>
<li>changes to customer service model</li>
</ul>

<p>Software as a service models are winning because no one wants to buy software as a product any more. I will cover more of this in another article I have been working on for a bit, but the main point is that enterprise software as a paid big ticket product is dead. The replacements are open source software and SAAS. These are not alternatives though, as people want the open source software delivered as a service, albeit maybe a more commoditized one if there are multiple providers, and many of the SAAS products delivered will be largely built of open source components by companies that run a mixed model. Microsoft is <a href="http://www.theregister.co.uk/2010/03/04/ballmer_on_azure/">going headlong into cloud</a> in a way that redefines what the operating system is. Even purchased software will be delivered in internal clouds.</p>

<p>This changes how code is both written and administered, with <a href="http://lethargy.org/~jesus/writes/a-job,-a-mission,-a-career-all-without-a-path-or-a-name.">web operations</a> joining up into rolling delivery and creating the emerging field of <a href="http://www.devopsdays.org/">devops</a>. Developers need to understand operations and how to build code for this environment.</p>

<p>The service business as a business is different from the product business. Open source companies have got that better than product based vendors, but the less there is lockin the more key these changes become. The <a href="http://www.interwest.com/software-as-a-service/on-demand/vp-of-customer-success-critical-to-the-saas-business-model/">success of the customer using the services becomes the key business driver</a>.</p>

<h2>Performance and scaling, real time</h2>

<ul>
<li>cloud has pushed scale up out of picture</li>
<li>scale out transparently</li>
<li>new technologies beyond RDBMS that fit CMS </li>
<li>dynamic generation becoming the norm; Google pushing the performance thing; the industry norm of 100ms will fall</li>
<li>real time becomes more important &#8211; dynamic updates, forget crawling, Google is going push</li>
<li>backend: queuing (0MQ, AMQP)</li>
<li>frontend: websockets, XMPP, long polling </li>
</ul>

<p>Just buying big hardware for scale up is really becoming difficult; the web vibe has always been to scale horizontally on commodity hardware. There is a lot of development around scale out <a href="http://blog.technologyofcontent.com/2010/02/nosql-and-content-management/">technologies such as NoSQL</a> which fit into the WCM data models, which are those of the web after all.</p>

<p>As well as scaling for volume, latency and real time are becoming key. Google&#8217;s time to crawl has been falling rapidly, to a few or less, but is <a href="http://www.readwriteweb.com/archives/google_developing_real_time_index.php">moving to real time</a> with push updates. Twitter has really pushed the boundaries of expectation for real time. Behind the scenes there are a lot of technologies for efficiently pushing around notifications and events both at the backend and on the frontend. Real time is going to become increasingly pervasive.</p>

<p>Page generation times will need to fall; the standard industry benchmark of 100ms per component will probably need to be halved; overall total times under 1s will become the norm.</p>

<h2>Security</h2>

<ul>
<li>web increasingly hostile</li>
<li>every bug is a potential security issue</li>
<li>security focussed on fewer areas, push into the OS not out to applications</li>
</ul>

<p>I read the excellent <a href="http://lwn.net">Linux Weekly News</a> every week, and every week there are <a href="http://cwe.mitre.org/top25/">security exploits</a> for many pieces of software; one that really struck me recently was the <a href="http://www.h-online.com/security/news/item/Possible-backdoor-in-the-e107-CMS-913588.html">major exploit against the CMS e107</a>. What happened here was that a group of crackers found a serious security flaw in the CMS, which they began attacking systematically. When the patch was released, however, they already had control of the developer&#8217;s website via the flaw, so they replaced the patched version of the code with a version with a backdoor. Hacked websites are a vital part of the underground <a href="http://www.securitytube.net/Phishing-%28Evil-on-the-Internet%29-FOSDEM-Talk-video.aspx">online crime scene</a>, and a content management system is a high value target. Expect much more of this, and be prepared.</p>

<p>Narrowing the security into fewer points of vulnerability, sandboxing, using every available facet of the operating system&#8217;s security layers; make the most of processes, permissions, everything that you get there; I <a href="http://blog.technologyofcontent.com/2010/01/scaling-security-and-architecture-in-2010/">wrote more about this in an earlier post on emerging trends</a>. File format parsing is another area of vulnerability that is common.</p>

<p>It is war out there on the internet, and many people underestimate or ignore the issues, and too many programmers do not code defensively by habit.</p>

<h2>Summary</h2>

<p>It is an exciting time in web content management right now; the industry is growing up beyond its beginnings as a way of getting web sites up, towards being the core of the broader content management industry. The choices made now will shape the industry; the next generation of products will be a big step forward for the industry.</p>

<p><a href="http://dilbert.com/strips/comic/2009-07-26/" title="Dilbert.com"><img src="http://dilbert.com/dyn/str_strip/000000000/00000000/0000000/000000/60000/1000/700/61747/61747.strip.sunday.gif" border="0" alt="Dilbert.com" width="450"/></a></p>
]]></content:encoded>
			<wfw:commentRss>http://blog.technologyofcontent.com/2010/03/trends-in-content-management-2010/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>NoSQL and content management</title>
		<link>http://blog.technologyofcontent.com/2010/02/nosql-and-content-management/</link>
		<comments>http://blog.technologyofcontent.com/2010/02/nosql-and-content-management/#comments</comments>
		<pubDate>Sun, 14 Feb 2010 23:34:15 +0000</pubDate>
		<dc:creator>justin</dc:creator>
				<category><![CDATA[CMS]]></category>
		<category><![CDATA[data modelling]]></category>
		<category><![CDATA[nosql]]></category>
		<category><![CDATA[sql]]></category>

		<guid isPermaLink="false">http://blog.technologyofcontent.com/?p=216</guid>
		<description><![CDATA[I went to many of the first ever NoSQL devroom talks at FOSDEM this year. For anyone who hasn&#8217;t been, FOSDEM is a great place, and the NoSQL room was well organized and full of interest. The term NoSQL is not even a year old; I first came across CouchDB around a year ago from [...]]]></description>
			<content:encoded><![CDATA[<p>I went to many of the first ever <a href="http://nosql.mypopescu.com/post/385372130/your-chance-to-review-the-fosdem-nosql-event">NoSQL devroom</a> talks at <a href="http://fosdem.org">FOSDEM</a> this year. For anyone who hasn&#8217;t been, FOSDEM is a great place, and the NoSQL room was well organized and full of interest. The term NoSQL is not even a year old; I first came across CouchDB around a year ago from memory; Tim Anglade gave an excellent introduction where he reminded people of the historical roots, both before relational databases and since then; so not new but there is a renewed focus now. Why is that? I am going to look here at the field of content management and why you might be interested in different data models if that is your problem space, based loosely on some of the ideas from the talks at FOSDEM. There was a talk about <a href="http://outerthought.org/blog/blog/353-OTC.html">content management specifically and the Lily CMS by Evert Arckens</a> although I missed it, but I have added some comments after watching the video.</p>

<p><a href="http://www.flickr.com/photos/justincormack/4375594326/" title="FOSDEM by Justin Cormack, on Flickr"><img src="http://farm5.static.flickr.com/4029/4375594326_7ebdafd796.jpg" width="450"  alt="FOSDEM" /></a></p>

<h2>The data model for content management</h2>

<p>I have another draft post on this subject in more detail, which I am working on as part of my REST modelling in content management work, but I will outline some of the types of data relations that are important. I will be quite abstract here; if you want more concrete examples you will have to wait for the other post. Database models like the ones we are talking about here are more easily understood in the abstract, I think.</p>

<p>First we need our unit of modeling. This in itself is the first issue. Content management tends to deal with, at the conceptual level, something that looks like a document. It may be a fragment, in the sense that it is, say, a page component (asset if you use that terminology) rather than a whole item, but the unit for the user to edit and which is usually versioned is a structured object itself. The processing model tends to treat it almost as a binary blob, except that certain properties can be extracted, such as metadata, links in HTML and so forth, but it is stored as an item rather than decomposed further.</p>

<p>OK, so we have a piece of content and some attributes extracted from it as one basic model. This corresponds pretty much to the JCR data model for example. There are variations; sometimes people do not store metadata in the file formats, as historically many file formats had poor support for arbitrary structured metadata, although that is largely obsolete now, and the advantages of actually storing metadata and relations substantially within documents are high. External storage does not change the model much, just complicates processing and storage. Another variant, often seen in document management systems, is to be able to have multiple &#8217;streams&#8217;, ie several document variants rolled into one, for example a video and a still from it. You can however from the modelling point of view regard these as another compound document format kept together because conceptually they are a bundle of content; you might distribute them as a zip file if you haven&#8217;t got any other suitable container format.</p>

<p>So now we have a storage model where we have a blob, with rich media operations on it, and extracted structural and metadata information. There is also versioning to consider, but let us ignore that and treat it either as part of the blob, or as a new document with some relation to the old ones, those being the two core versioning models; this does not really affect anything else.</p>

<p>There are two kinds of metadata, although they are more similar than they appear, properties and relations. Properties are the standard attributes (this picture depicts sheep), while relations join two items in the repository (this is a cropped version of this other picture). Although this distinction seems clear, in the end richer information architectures demand that everything becomes a relation, so I can browse a sheep node and find all the sheep items, turning every attribute value of any significance into a node with relations instead. Pure attribute values are only left for the less interesting properties (this PDF file is 176k in size).</p>

<p>They are also less interesting from a relational versus non relational storage point of view, although there is one important point, which is the dense versus sparse question, so let us take a look at this. Most real world attributes are sparse, that is, most attributes are not set on most items. In the relational model we have a row for our item, and columns for all the attributes, so we are saying most are NULL. (I was brought up on matrix algorithms and still think in terms of sparse versus dense matrices as this is exactly the same problem, and matrices represent graphs anyway). Storing huge mainly null tables is not very efficient, so there are two common practices in relational mapping of attributes in content management systems. First is to define a type based system, where a particular type of content item is defined to have certain attributes, and each type therefore can have its own table which is assumed to have fewer NULL values. Mixins, sets of properties that live across types, can potentially be added to this model, as can inheritance schemes, but the basic idea is one table per type. This gives a nice simple direct database programming model, but causes a complete nightmare if you ever want to change the schema, for example to add an attribute, as for any large database most DBMSs will effectively shut down the system while a schema change takes place, as schema changes require pretty much all locks. <a href="http://www.silverstripe.com">Silverstripe</a> is one example of a content management system built like this; there are many others.</p>
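
<p>As a rough sketch of the one table per type idea, here is a minimal example using SQLite from Python. The content types (<code>article</code>, <code>image</code>) and their columns are made up for illustration and are not taken from any particular CMS. SQLite is used only because it ships with Python; adding a column is cheap there, so the locking cost on a large server DBMS described above is not reproduced, only the shape of the model and of the schema change.</p>

```python
import sqlite3

# Hypothetical content types and attribute names, chosen only for illustration.
conn = sqlite3.connect(":memory:")

# One table per content type: each type only carries the columns it needs,
# so there are far fewer NULLs than in one wide table for everything.
conn.execute("CREATE TABLE article (id INTEGER PRIMARY KEY, title TEXT, body TEXT)")
conn.execute(
    "CREATE TABLE image (id INTEGER PRIMARY KEY, title TEXT, width INTEGER, height INTEGER)"
)

conn.execute("INSERT INTO article (title, body) VALUES (?, ?)", ("Trends", "Some text"))
conn.execute(
    "INSERT INTO image (title, width, height) VALUES (?, ?, ?)", ("Sheep", 450, 300)
)

# Adding an attribute later means a schema change on the whole table.
conn.execute("ALTER TABLE image ADD COLUMN caption TEXT")


def columns(table):
    # PRAGMA table_info returns one row per column; index 1 is the column name.
    return [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]


image_columns = columns("image")
# Rows that existed before the change get NULL (None) for the new column.
sheep = conn.execute("SELECT title, caption FROM image").fetchone()
```

<p>The programming model is as direct as it gets, a plain SELECT per type, which is the attraction; the price is that every new attribute is an ALTER TABLE.</p>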

<p>The alternative is the <a href="http://en.wikipedia.org/wiki/Entity-attribute-value_model">entity attribute value</a> (EAV) model (terrible Wikipedia article, please fix), where rather than a direct mapping of the attributes to relations, you indirectly map, creating a table that joins entities, attributes and values; this table of course looks just like RDF triples. Doing this though loses everything that makes a relational database useful: constraints, typing, query optimization. It adds an extra layer of logical schema above the physical schema which the database layer does not understand. This is a pretty common relational mapping for content management systems, as it allows full flexibility in defining and redefining attributes. To implement well it needs a large mid layer to manage the constraints, provide an API layer, generate efficient queries, effectively to manage the logical layer to physical layer map. The <a href="http://drupal.org/node/82661">Drupal CCK</a> is an example of this model.</p>
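
<p>A minimal sketch of an EAV table, again with SQLite from Python. The table name <code>eav</code>, the entity ids and the attribute names are all hypothetical; real systems such as the CCK add much more around this, and this is not how any of them is actually implemented.</p>

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# A single table holds every attribute of every item as a row,
# much like a subject/predicate/object triple.
conn.execute("CREATE TABLE eav (entity TEXT, attribute TEXT, value TEXT)")

rows = [
    ("photo1", "depicts", "sheep"),
    ("photo1", "size_kb", "176"),
    ("photo2", "depicts", "sheep"),
    ("report1", "author", "justin"),
]
conn.executemany("INSERT INTO eav VALUES (?, ?, ?)", rows)

# A new attribute needs no schema change, only a new row.
conn.execute("INSERT INTO eav VALUES (?, ?, ?)", ("photo2", "caption", "Two sheep"))


def find(attribute, value):
    # Every lookup goes through the generic table; the database cannot
    # check types or constraints per attribute, because values are all TEXT.
    cur = conn.execute(
        "SELECT entity FROM eav WHERE attribute = ? AND value = ? ORDER BY entity",
        (attribute, value),
    )
    return [row[0] for row in cur]


def attributes(entity):
    # Reassemble one logical item from its attribute rows.
    cur = conn.execute("SELECT attribute, value FROM eav WHERE entity = ?", (entity,))
    return dict(cur.fetchall())


sheep_items = find("depicts", "sheep")
photo2 = attributes("photo2")
```

<p>Note that the size is stored as the string "176": typing and constraints have moved out of the database and into whatever layer wraps functions like <code>find</code> and <code>attributes</code>.</p>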

<p>Of course this is not to say that the two relational models do not work. The direct mapping works well with simple, unchanging content types in small websites, for example, or in models where attributes are not very sparse, or the sparseness is worth the overhead, and changing the schema is rare. EAV works well too, if managed carefully; it helps if the types of query required on the model are not too complex.</p>

<p>Once you add relations as well as attributes, the already difficult mapping layer gets harder; you add another set of operations (recursion to handle tree structures) that the relational model does not handle well, so you may need to add more into the mapping layer. The promise of NoSQL is that you can bypass this for these types of applications, and program directly to a database model that handles sparse attributes and relations natively. But how much do the NoSQL databases get you? You can argue that if you are already looking at EAV, then you are already not getting much from a relational database, and you are building a modeling layer on top of it, so dropping that and going for something that maps the logical data layer directly does make sense from a development point of view. Whether that really helps performance is less clear; much of the original work for NoSQL has come out of huge scaling, big problems, not actually providing efficient solutions to the types of data mapping problem we are seeing here on a medium scale; of course for huge sites there may be benefits.</p>

<p>The types of NoSQL database vary in their level of support for attributes and relations as they are used in content management. Document oriented databases do not give you much more than retrieval of content items; associative ones give key value type attribute lookups; graph databases should let you query relations directly, expressing the types of queries that are needed for information architecture problems directly, in principle. Examples I am thinking of are things like tag clouds, which are simple to express as a graph problem, as a tag cloud is simply a count of the number of edges from a set of nodes. Indeed most information architecture problems look like graph problems, and also like <a href="http://en.wikipedia.org/wiki/OLAP_cube">OLAP processing operations</a> which also do not work well on relational databases. And of course one of the things that NoSQL has shared with OLAP is the use of denormalization; you can use simpler models if you denormalize data to match the queries you will be using, rather than assuming that the types of query you will use can necessarily be optimized and made efficient by a general purpose system.</p>
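
<p>To make the tag cloud example concrete, here is a small sketch in plain Python rather than in any real graph database. The edge list and the item and tag names are invented for illustration.</p>

```python
from collections import Counter

# Hypothetical graph: each edge joins a content item to a tag node.
edges = [
    ("photo1", "sheep"),
    ("photo2", "sheep"),
    ("photo2", "field"),
    ("article1", "sheep"),
    ("article1", "farming"),
]


def tag_cloud(edges):
    # The weight of a tag is the number of edges arriving at its node.
    return Counter(tag for _item, tag in edges)


cloud = tag_cloud(edges)
```

<p>In a graph database the same thing would be a degree count on the tag nodes; the point is only that the query is a statement about edges, not about rows and joins.</p>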

<p>Denormalization is not without its difficulties, although arguably it could become a tool embedded in databases like indexes are now. One of the issues with NoSQL is most of the database systems leave denormalization to the user: you need to use it because joins are not available, but you have to manage that yourself. Building an infrastructure to explicitly manage denormalization as a first class database item akin to an index might be interesting. So that gives us a first issue, as in any NoSQL system except a graph database we will either need to denormalize or compose queries to get the results we want.</p>
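
<p>A sketch of what managing denormalization yourself looks like, with a Python dict standing in for a key value store that has no joins. The key naming scheme (<code>item:</code> and <code>tag:</code> prefixes) and the function names are assumptions for illustration, not the API of any real store.</p>

```python
# A plain dict stands in for a key-value store with no joins.
store = {}


def add_item(item_id, title, tags):
    # Primary record for the content item.
    store[f"item:{item_id}"] = {"title": title, "tags": list(tags)}
    # Denormalized copy: a per-tag list of item ids, kept up to date by hand
    # so that "all items with tag X" is a single key lookup.
    for tag in tags:
        store.setdefault(f"tag:{tag}", []).append(item_id)


def remove_item(item_id):
    item = store.pop(f"item:{item_id}")
    # The application, not the store, has to undo the denormalized copies.
    for tag in item["tags"]:
        store[f"tag:{tag}"].remove(item_id)


def items_with_tag(tag):
    return store.get(f"tag:{tag}", [])


add_item("photo1", "Sheep", ["sheep", "field"])
add_item("photo2", "More sheep", ["sheep"])
remove_item("photo1")
```

<p>Every write path has to know about every denormalized copy, which is exactly the bookkeeping a relational database does for you with an index, and what a first class denormalization facility would have to take over.</p>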

<p>So I think there are four realistic models for content management backends going forward:</p>

<ol>
<li>The direct relational model for small systems with simple data models, rare attribute changes, little or no use of relations.</li>
<li>EAV models wrapped in a content modeling layer; JCR is an example of this, hiding the underlying SQL layer very well, and indeed allowing it to be replaced with another underlying storage model potentially; I am sure someone is testing a Neo4J backend somewhere. This is where most production solutions are at now.</li>
<li>Direct, nondenormalized graph database backends, with the raw content stored in a document store. Cuts out a special purpose middle level by mapping the domain more directly. As <a href="http://blogs.neotechnology.com/emil/2009/11/nosql-scaling-to-size-and-scaling-to-complexity.html">Emil Neo</a> says, it may not scale right up as far as the other NoSQL technologies, but it cuts complexity of implementation; there are also issues about whether all the kinds of queries required are available efficiently. I think this will be the sweet spot in a few years once the products mature and we see more open source activity in the field. Of course RDF based solutions, for example using SPARQL, fall into this category too, and the maturity of products around these technologies will help drive this category as well as the NoSQL models.</li>
<li>Big, denormalized systems, probably with software support for managing the denormalization, and using underlying simple but scalable technologies like key-value stores. These already exist in large scale web applications, but may remain niche if the development effort remains high. If frameworks for modelling more easily on these turn up they may trickle down for performance reasons even on smaller datasets; a key value store runs fine on a relational database backend, although the types of processing required probably means a specialized backend is useful.</li>
</ol>

<p>Note that the <a href="http://lilycms.org/">Lily CMS</a> which there was a talk about fits very much into the fourth option above; this is where the NoSQL technologies have perhaps seen most use, but I think there will be a lot of work in order to build a CMS like this now, in particular in terms of tools to support denormalization strategies that are needed. The outlined approach sounded much like the outlines I have been thinking about for this type of model, although I would focus more on tooling for denormalized queries and less on scaling other parts like full text search right now. It will be interesting to follow the progress of this project.</p>

<p>We are at an interesting juncture, where it looks like there are some options that will let us do domain modelling in a way that corresponds more directly to the domain, but there are a lot of interesting challenges on the way.</p>

<p><a href="http://dilbert.com/strips/comic/2008-02-12/" title="Dilbert.com"><img src="http://dilbert.com/dyn/str_strip/000000000/00000000/0000000/000000/00000/1000/800/1869/1869.strip.gif" border="0" alt="Dilbert.com" width="440"/></a></p>
]]></content:encoded>
			<wfw:commentRss>http://blog.technologyofcontent.com/2010/02/nosql-and-content-management/feed/</wfw:commentRss>
		<slash:comments>20</slash:comments>
		</item>
		<item>
		<title>Standards Diagram for Content Management</title>
		<link>http://blog.technologyofcontent.com/2010/01/standards-diagram/</link>
		<comments>http://blog.technologyofcontent.com/2010/01/standards-diagram/#comments</comments>
		<pubDate>Tue, 12 Jan 2010 23:17:33 +0000</pubDate>
		<dc:creator>justin</dc:creator>
				<category><![CDATA[CMS]]></category>

		<guid isPermaLink="false">http://blog.technologyofcontent.com/?p=191</guid>
		<description><![CDATA[This is attempt number 1 of a diagram which I promised Jon Marks after his post. No it still does not have OSGI in! As Jon&#8217;s presentation used Prezi I thought I would give it a go. It takes a while to get the hang of it but it is fun. I can&#8217;t work out [...]]]></description>
			<content:encoded><![CDATA[<p>This is attempt number 1 of a diagram which I promised <a href="http://jonontech.com/2010/01/10/an-incomplete-directory-of-open-standards/">Jon Marks</a> after his post. No it still does not have OSGI in! As Jon&#8217;s presentation used <a href="http://prezi.com/">Prezi</a> I thought I would give it a go. It takes a while to get the hang of it but it is fun. I can&#8217;t work out how to get an overview at the end though&#8230;</p>

<p>Just press the play button to move around.</p>

<iframe height="300" src="http://prezi.com/nifkatyvrk02/view" width="450"></iframe>
]]></content:encoded>
			<wfw:commentRss>http://blog.technologyofcontent.com/2010/01/standards-diagram/feed/</wfw:commentRss>
		<slash:comments>6</slash:comments>
		</item>
		<item>
		<title>Open source and Content Management (for Janus Boye)</title>
		<link>http://blog.technologyofcontent.com/2010/01/open-source-and-content-management-for-janus-boye/</link>
		<comments>http://blog.technologyofcontent.com/2010/01/open-source-and-content-management-for-janus-boye/#comments</comments>
		<pubDate>Sun, 10 Jan 2010 14:45:31 +0000</pubDate>
		<dc:creator>justin</dc:creator>
				<category><![CDATA[CMS]]></category>
		<category><![CDATA[open source]]></category>

		<guid isPermaLink="false">http://blog.technologyofcontent.com/?p=186</guid>
		<description><![CDATA[Janus Boye said the other day after the BCS open source seminar in London


  @McBoof I left London dazed and confused when it comes to open source. Somebody pls. help me explain what open source really means #idiot


Now I only spoke to him very briefly before he had to rush to the airport, but [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://twitter.com/janusboye">Janus Boye</a> said the other day after the BCS open source seminar in London</p>

<blockquote>
  <p>@<a href="http://twitter.com/McBoof">McBoof</a> I left London dazed and confused when it comes to open source. Somebody pls. help me explain what open source really means #idiot</p>
</blockquote>

<p>Now I only spoke to him very briefly before he had to rush to the airport, but hopefully the following will be helpful: first an overview of the important things about open source in general, and then how they are and will affect content management in particular.</p>

<h2>Open Source</h2>

<p>I think it is easiest for software developers to understand open source. It came from that community, and it addresses our needs. For a long time no one outside that community was really concerned with it. I think the first time I noticed someone who was not a developer showing an interest was when I was stopped at a tram stop in Vienna by an American, as I was wearing a Red Hat T-shirt just after their float, and was asked who the next open source IPO was going to be; that was 1999, ten years ago now. I suppose that IPO was a big event in the spreading awareness of open source, although it did not perhaps spread much information about what it was really about.</p>

<p>I think the best place to start with trying to understand open source is with three things. I tend to have a bit of a historical approach to things&#8230;</p>

<p>The first is Richard Stallman. I recommend <a href="http://www.fsf.org/events/rms-speeches.html">him in person</a> rather than in writing actually. Actually that reminds me the first time I ever saw him I was sitting in the <a href="http://www.foundry.tv/">Foundry in Old Street</a> and he walked in and proceeded to autograph a woman&#8217;s breasts. Anyone who wants to understand open source should hear him explain the roots of the open source movement. I will not really try to explain all that here, but openness is what created the scientific method, and the idea that software got to the point where it was no longer possible to make it do what you wanted because you did not have access to source code, the point where you could not build on stuff any more or fix it, where control of your tools is taken away is a key part of it. Some people have tried to sanitize Richard out of things (the open source vs free software mess) but that is a mistake.</p>

<p>Second is Eric Raymond&#8217;s essay <a href="http://catb.org/~esr/writings/homesteading/">The Cathedral and the Bazaar</a> which was very influential at the birth of commercial open source. It is strictly about software development methodologies, and much of the discussion about the cathedral methods is applicable to open source software too. It is about the huge changes that the internet brought in open source development, the birth of a development method that no longer copied the methods of closed source development but utilised the openness to create true large scale community development in a way that was not possible before, and which closed source cannot replicate. Linux is of course the classic early example of this.</p>

<p>Which brings us to the third thing, community. Open source is first of all participatory, not just for consumption, perhaps a bit against the grain of late twentieth century culture. Actually I am an optimist, <a href="http://www.herecomeseverybody.org/2008/04/looking-for-the-mouse.html">with Clay Shirky and against the sitcom</a>, and think culture is swinging this way but we shall see. So for open source, start by using it, then participate. No, you do not have to code, although you can learn; there are other ways: bugs, documentation, all sorts. If you just want to see what the community looks like, I can&#8217;t recommend anything better than going to a good conference, like <a href="http://fosdem.org/">FOSDEM next month in Brussels</a>.</p>

<h2>Open source in content management</h2>

<p>Open source has not affected content management much yet. Almost all content management by volume takes place on open source products (by volume WordPress, Joomla! and Drupal far outweigh anything else). By value it is less clear. Open source always has an issue with by-value calculations, as the revenue models are different: Linux is not the leading server operating system by value, but it is by installed base, and probably also by the value of the services running on it.</p>

<p>But arguably open source content management software has not affected the industry yet, looking now at the larger installations, and the areas that Janus is interested in, indeed that I am. The industry has grown up in a mess as far as standards, ideas, infrastructure are concerned, but the <a href="http://en.wikipedia.org/wiki/Reality_Checkpoint">reality checkpoint</a> has been reached. Two standards have so far started to change the technology landscape of content management, JCR and CMIS, and almost all the implementations of these are open source, and most are cross-vendor projects. This change will grow as more standardization and commoditization sweeps the industry, as the industry adopts a web infrastructure rather than the pre-web legacies inherited from the document management history of the business. Everything that this business deals with will be served through the web; almost all web infrastructure is open source software; content management will be no different.</p>

<p>In this field this is all just beginning. Like open source itself, as I said above, it started with developers and with more efficient ways of building, architecting and delivering software; in terms of influence on the end users it is still small. Things are turning as people in the industry become aware of open source, but they clearly still need some help understanding it. I hope this has helped.</p>

<p><a href="http://dilbert.com/strips/comic/2007-08-03/" title="Dilbert.com"><img src="http://dilbert.com/dyn/str_strip/000000000/00000000/0000000/000000/00000/1000/600/1676/1676.strip.gif" border="0" alt="Dilbert.com" width="480"/></a></p>
]]></content:encoded>
			<wfw:commentRss>http://blog.technologyofcontent.com/2010/01/open-source-and-content-management-for-janus-boye/feed/</wfw:commentRss>
		<slash:comments>8</slash:comments>
		</item>
		<item>
		<title>Wave experiment: Things We Hate About Content Management</title>
		<link>http://blog.technologyofcontent.com/2009/10/wave-experiment-things-we-hate-about-content-management/</link>
		<comments>http://blog.technologyofcontent.com/2009/10/wave-experiment-things-we-hate-about-content-management/#comments</comments>
		<pubDate>Sat, 24 Oct 2009 13:33:49 +0000</pubDate>
		<dc:creator>justin</dc:creator>
				<category><![CDATA[CMS]]></category>
		<category><![CDATA[content]]></category>
		<category><![CDATA[Wave]]></category>

		<guid isPermaLink="false">http://blog.technologyofcontent.com/?p=135</guid>
		<description><![CDATA[Experiment with writing in the wave.]]></description>
			<content:encoded><![CDATA[<p>Well, that was the story, six people in content management writing a blog about stuff using Google Wave. Mostly for the first time I think; something to do with those fresh invites.</p>

<p>Other links are here: <a href="http://jonontech.com/2009/10/23/a-collaborative-google-wave-blog-post/">Jon Marks</a>, <a href="http://irinaguseva.wordpress.com/2009/10/23/things-we-hate-about-content-management/">Irina Guseva</a>, <a href="http://www.persuasivecontent.com/i-predict-a-cms-riot-1-hour-6-people-1-wave">Ian Truscott</a>; other participants Adriaan Bloem, Andrew Liles, <a href="http://contentedmanagement.net/blog/bove-the-contentious-waves-he-kept/">Philippe Parker</a> (first use of Wave over GotoMeeting?)</p>

<p>Well it was fun. Technical difficulties, lost sync and crashed a few browsers, some people lost whole machines though. Safari coped better than Firefox. It took a while to realize what was happening here, hey but this is in beta!</p>

<p>As a brainstorming tool it worked pretty well, and I thought it scaled pretty well. The named cursors indicate who is in the bit you are in, but for brainstorming you can look, write another point, move, continue, not edit much. After half an hour of getting to bulleted lists and a bit of moving around, the heavy writing started (after a discussion at the top in our proxy process section; we should have split the thing up a bit).</p>

<p>There is a great tendency to write temporary notes about the discussion and then just delete them, which feels odd: data and metadata together, of course. The editing process was odd; you would find orphaned bits, move things, try to join stuff up to make it flow, while it was all changing around you. Pretty chaotic. Bits that no one expanded into prose got junked (quite a good edit method, as they couldn&#8217;t stand up by themselves).</p>

<p>Here is the &#8220;finished&#8221; article&#8230; which cannot be attributed to anyone individually of course&#8230; the subject was chosen about 10 minutes in, just as something everyone could easily contribute to in this situation; there are some good points in there though!</p>

<p><strong>Things We Hate About Content Management</strong></p>

<p><em>- By The Motley Crew</em></p>

<p>It was a lovely Friday morning/afternoon, and we were Waving. The experiment initiated by McBoof (yes, that one) brought together 6 CMS folks from around the world. The event gathered together analysts, journalists, vendors, system integrators to Wave on a topic that was decided at that very moment. We had one hour (in between conference calls and other job thingys) to pick a topic and Wave it.</p>

<p>A little collab on what exactly to Wave about later, we decided to do &#8220;a mindmap of things we find annoying in CMSs.&#8221; To up the ante, we also decided to take the original bullet points (deemed &#8220;too easy&#8221;) and convert the whole thing to prose. Was the tool given really up to the task? Were our minds flexible enough to wrap around this kind of realtime collaboration?</p>

<p>In the beginning &#8212; we blame the tool <img src='http://blog.technologyofcontent.com/wp-includes/images/smilies/icon_wink.gif' alt=';)' class='wp-smiley' />  &#8212; we were Drowning, not Waving. We (almost) didn&#8217;t fight about edits. We almost didn&#8217;t step on each other&#8217;s toes. All in all, it turned out to be a fun and productive collaborative exercise. Read on to see for yourself.</p>

<p><strong>Cosmetic Issues</strong></p>

<p>There really should be a CMS UI fashion police. As there should be a Magic Quadrant for shoes and handbags. Why? Well, there&#8217;s a couple of issues.</p>

<p>For instance, sloppy, non-designed design. You know the kind of thing that has not been thought about and reworked and made to feel right. The sort of thing coders do if you don&#8217;t force them. But at the same time, over-designed interfaces can be just as bad: the designers and developers really need to be on speaking terms.</p>

<p>When building a system that works, you can&#8217;t have the development team in the basement on a sustenance of Jolt coding away into the night, and the designers in the penthouse in turtleneck sweaters sipping espressos. Too many CMS designs end up being programmer vs. end-user friendly. And this is not the best way to charm away those marketing and web content folks.</p>

<p>Developers and designers need to talk to each other and, essentially, both should talk to users &#8211; not just eat your own dogfood &#8211; but listen to what dogs like to eat. Developers and UI designers are not content editors, marketers or knowledge and information workers.</p>

<p>Some vendors say that the agonizingly and depressingly black UI backgrounds are hip and modern. Well, they are not, really. Who told you that? Especially if you add a Star Trek theme to it and sprinkle in some stars and cosmic swirls, because if Apple does it, it must be cool right? Not pointing any fingers, but I would quit if I were a content manager having to spend my 9-5 staring into the &#8220;black hole&#8221; of some of the CMS UIs that are out there on the market.</p>

<p>Even pop-ups seem less annoying when compared to dark UIs. Which brings us onto&#8230;</p>

<p><strong>Interface Issues</strong></p>

<p>Interfaces need a comfortable lived-in feel. Content management is something people work with every day; it is their interface to their job. You meet people who hate the interface, and that makes their work a heap of pain. I have met people who describe the 44 clicks it takes to insert an image. You have a responsibility to these people, to make them love the content and make the tool disappear.</p>

<p>We all hate it when the interface does something on its own that ruins your context. E.g. a page refresh, or in Wave the jumping around of the scrolled window in some cases <img src='http://blog.technologyofcontent.com/wp-includes/images/smilies/icon_wink.gif' alt=';-)' class='wp-smiley' />  Or the lack of an easy way to bookmark, so you can refer someone to the content. Remember people will be collaborating and need to send links around. Make sure the UI is a proper web application with URLs. And why do tasks that are easy to describe and often repeated in exactly the same way still take more than a few clicks? (Or maybe even dozens of clicks.) With bonus points for forcing users to use dialogs or tabs to enter mandatory information. Remember people do not have all the information in the right order.</p>

<p>Also, we need sane conflict merges. Check in and check out is too extreme for most uses. But people want to edit offline still. Of course Wave doesn&#8217;t have an offline: Google thinks this problem is going away, it&#8217;s real time so there are never conflicts (that&#8217;s defined in the XML protocol; it&#8217;s quite interesting if you are that way geeky). Does Google have the right answer here? Well, the Motley Crew is struggling here, and some browsers lost sync during this experiment.</p>

<p>&#8220;Power users&#8221; (those who use it all day long) of CMSs need to have a &#8220;Desktop&#8221; experience. What does Desktop Experience mean? Well, it doesn&#8217;t really have to be on the desktop &#8212; these days it is perfectly possible to get very close to a hitherto Desktop experience in a browser or similar. These are the qualities: very low latency from action to response, no page refreshes, modal and modeless dialog boxes as appropriate, &#8220;push&#8221; notification.</p>

<p><strong>Architectural Issues</strong></p>

<p>Architectural issues of the wave overtook any architectural issues of Content Management Systems. The fact that we authored this entire article in a single blip didn&#8217;t help, and slowed everything down enormously. McBoof learned the hard way that he really needs a new laptop and spent most of the session giving his machine CPR. Next time we&#8217;ll do each paragraph in its own blip to stop Firefox going down like a Led Zeppelin.</p>

<p>Monolithic systems. Build it out of pieces, so that the client does not have to use all of them. Obviously your pieces may work together better, but there should be components. Do not try to reinvent every kind of wheel. &#8220;Best of breed,&#8221; though, is just another weasel marketing idea, as if systems were pinnacles rather than about meeting requirements.</p>

<p>Marketeers are adroit at using the term Best Practice to position Their Way as the only way that a particular matter can be solved. (Many of us live in that netherworld of having to peddle that point of view, but it is a falsehood that the careful buyer should try to see through.) I think this devalues genuine best practice; vendors should cite references.</p>

<p>Most often a marketeer&#8217;s Best Practice view is the only one they subscribe to because their product development has paddled up the wrong stream and they cannot or won&#8217;t reverse their architectural design (probably because of the cost of doing so). This intransigence most often causes a product to doom itself. (Think of IBM and The Mainframe Is The Only Way To Do Serious Business.)</p>

<p>Who really still believes that there is a place in this world for Flash or Java Applet based Rich Text Editors? TinyMCE, FCKeditor and others are filling the gap left by Ektron when they bit the hand that feeds and entered the CMS market. Ephox is trying to spread, but I find it difficult to come up with an excuse to use an Applet over HTML with JavaScript these days. Stick with the standard.</p>

<p><strong>Business Issues</strong></p>

<p>Where you are buying into something that you may very well need to change or integrate with, there is strong benefit in considering Open Source. Open Source used to frighten commercial software companies, but we have come a long way on that road to understanding that commercial organisations can operate in an Open Source world and benefit. This does not necessarily mean that their prized system needs to be fully opened up; it means taking the spirit of it, being open to people seeing your code and learning from it how the system operates.</p>

<p>Exactly what you need to see opened up varies. In a CMS there may be a subsystem that stores the content or one that allows a Rich Text Editor. These arguably don&#8217;t need to be opened up. But when a CMS ships with modules for, for example, an RSS feed widget, calendaring tool or prebuilt webforms, users who want a variation on such a module can benefit from seeing how the &#8220;pros&#8221; did it; they can then use it as a starting point for their own different implementation.</p>

<p>We really don&#8217;t need vendors that pay lip service to the buzzwords. When they think the new CMS buzzword &#8220;engagement&#8221; is just a screenshot of Google Analytics. Or when they add an image picker and call it DAM. And a cross-over between WCM and ECM? Don&#8217;t think WCM is like ECM and it&#8217;s about organizing content, not about effectively communicating with the audience. And don&#8217;t think that if you organize the content, you can automatically communicate effectively.</p>

<p>Completely different, but equally frustrating, is procurement (and the procedures that go with it.) Procurement folk don&#8217;t recognise the importance of user adoption to the success of the project &#8212; of the black background and all the UI issues pointed out previously. If a CMS is procured according to procedure, the selection is a success to them. But those same rules are often a recipe for ignoring what the users really need.</p>

<p>At the same time, budgets that aren&#8217;t transparent are an issue &#8211; customer and vendor should be able to have a sensible grown-up conversation. As a customer, of course you want good value, but how cheap are you? But to vendors: many licensing models don&#8217;t make any sense, and force you to do stupid things. People are scared to have that conversation &#8211; the best architectural fit first, I say; let&#8217;s figure out an appropriate license around that.</p>

<p><strong>Conclusion</strong></p>

<p>So much hatred rolled up into a tight little ball of anti-CMS rage. Who would have expected it from such a respected bunch of CMS folk? We hate the designs, the interfaces, the architectures and the business. Time for a beer/wine? Wave good bye!</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.technologyofcontent.com/2009/10/wave-experiment-things-we-hate-about-content-management/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Content enabled vertical applications (composite content applications) &#8211; executive briefing</title>
		<link>http://blog.technologyofcontent.com/2009/10/content-applications-briefing/</link>
		<comments>http://blog.technologyofcontent.com/2009/10/content-applications-briefing/#comments</comments>
		<pubDate>Sun, 11 Oct 2009 21:39:29 +0000</pubDate>
		<dc:creator>justin</dc:creator>
				<category><![CDATA[CMS]]></category>
		<category><![CDATA[applications]]></category>
		<category><![CDATA[CEVA]]></category>
		<category><![CDATA[content]]></category>

		<guid isPermaLink="false">http://blog.technologyofcontent.com/?p=124</guid>
		<description><![CDATA[Content enabled vertical applications (composite content applications) - executive briefing]]></description>
			<content:encoded><![CDATA[<p>I noticed that content enabled vertical application recently became the top search entry point to my blog. Now as I have only written one article about this, and it rapidly rose up the Google rankings, I reckon there must be a dearth of content on this subject. I guess that may not be too surprising, as it was initially a Gartner attempt to describe a path people wanted to follow rather than a generally used description. Gartner have also decided that the content enabled vertical application (CEVA) is now called the composite content application (CCA), possibly to confuse everyone even further. But they do matter to your content strategy.</p>

<p>So what are these, why do you want them, and what do they mean for your content strategy?</p>

<p>Toby Bell at Gartner says</p>

<blockquote>
  <p>Smart companies have begun linking more of their content to industry-specific, human-centric
  processes, such as insurance claims handling, or supporting research on new drug development.
  This approach usually means building or modifying the content-enabled vertical applications
  (CEVAs) on top of ECM environments. CEVAs typically help to automate complex processes that
  previously required workers to manually sort through paper documents and other forms of content
  (in effect, a way to manage down costs of exception handling) and optimize the remainder of the
  work.</p>
</blockquote>

<p>This seems to be not a great summary though. Look at it from another view. Your enterprise content was disorganized, living on network shares, random websites, legacy systems, all over the place. Your ECM strategy was first to consolidate, to reduce the costs of multiple systems and to improve findability, and so get to a base enterprise content management position. But what next? Where is the next value?</p>

<p>Content strategy starts here. There are many parts to this, covering creation, lifecycle and reuse, audit, consolidation, quality and so on. The CEVA part is about the delivery and interaction with content within other non content focussed areas of the business. Content management often seems to be about specific content focussed parts of a business, such as, historically, technical documentation and, more recently, online marketing material. Plus a load of unstructured stuff like emails and generic office documents. The areas such as technical documentation have had high value often for legal and regulatory reasons, so structured processes were created early; these effectively created the content management industry initially. Web content historically had different solutions because it turned up and became important when the general purpose tools (Word!) could not usefully author it.</p>

<p>But all the stuff classified as &#8220;other&#8221; does have underlying processes, content processes. Some are formalized in systems, the original paperless office systems, the roots of the document management industry in forms and scanned paperwork processing. This stuff generally sits more on the &#8220;data&#8221; not &#8220;content&#8221; side of business processes. Long term this distinction is not such a useful one, and data and content resources will merge together into a single enterprise resource architecture. The majority of processes with content though take place through informal channels, particularly email with Microsoft Office documents. These are the document types you tried to take control of through ECM.</p>

<p>So ECM took the documents that were behind many processes and made them findable and organized them. But at the base level a content repository is just that, a repository. It deals with basic issues such as versioning and permissions, search and findability, and some organization, but it does not really deal with process and processing.</p>

<p>Process and processing are the valuable parts in the lifetime of most content. Imagine the lifetime of an insurance contract, say, with payments and claims and disputes, or an employee&#8217;s personnel file, or a technical manual over the lifetime of a product. A CEVA or CCA is an application to support these lifetime processes.</p>

<p>It is also an application to support the relation of a document&#8217;s lifetime processes to other systems. Your CRM system may need to know about insurance claims, your sales department may need to know about expiry, your website may need to know about new documentation releases; content changes do not happen in a vacuum.</p>

<p>One class of CCA that is common but is rarely perceived as that is a software application with embedded content. Once that was just embedded &#8220;help screens&#8221; with content tools to manage them, then came internationalization, with a different set of tools. But these desktop applications are being rapidly replaced by web applications. Web applications are much more content driven, they may live in an SEO facing world, they may live in a customer facing world that may consider usability, they may be multilingual, and they are not driven by the developer-centric ideas of help screens and manuals. Content and application can live together, but this requires new ways of using, reusing and versioning content, and pulling content out of application release cycles so it can reflect non application changes, such as the marketing environment, usability improvements, corrections and enhancements. These applications were historically development led but as they mature the content aspects become key business drivers, needing content management integration.</p>

<p>So what do you need to build this type of application, and what should your decision criteria on platforms be?</p>

<p><a href="http://stephanecroisier.jahia.com/from-content-composite-to-content-solutions">Stéphane Croisier says in a good survey</a> &#8220;So rapid raw composite assembly, fast integration and ease of use are the three new pillars of next generation content solutions.&#8221;</p>

<p>The first thing to bear in mind is that you need more than a repository; you are looking for an application platform too. The ease of use issue is important. Long term you need to be looking for something that staff can build simple tools from, even if you are hiring specialists for the complex projects. Ease of use is a two way thing, as you need an easy to use platform that lets you build easy to use applications. And ease of modification and maintenance is equally important as these applications may need to be fluid. You are likely to need external support to build more complex applications on the same platform, so availability of this is important too. Ignore the jargon of portlets, widgets and mashups: none of these so-called standardisations have much traction; we are talking application development, so use what you have, or what you can hire developers to do. Ask the vendors what their platform strategy is.</p>

<p>Stéphane identifies a trend towards solutions, ready to go solutions for common problems; these may be useful but I would not choose a development platform on the basis of the availability of particular solutions or you may end up buying a platform for every solution. A longer term view of the viability of a platform for other solutions is necessary too.</p>

<p>Long term, remember that content application strategy is part of content strategy, and comes after that. You need to know what your content applications are and will be, and have built an underlying repository, authoring and reuse strategy first. Applications are where developers need to interact with this to achieve the long term goals.</p>

<p><a href="http://xkcd.com/388/"><img src="http://imgs.xkcd.com/comics/fuck_grapefruit.png" alt="fruit magic quadrant" width="450"></a></p>
]]></content:encoded>
			<wfw:commentRss>http://blog.technologyofcontent.com/2009/10/content-applications-briefing/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
		<item>
		<title>RESTful daydream #4</title>
		<link>http://blog.technologyofcontent.com/2009/10/restful-daydream-4/</link>
		<comments>http://blog.technologyofcontent.com/2009/10/restful-daydream-4/#comments</comments>
		<pubDate>Sun, 04 Oct 2009 15:34:41 +0000</pubDate>
		<dc:creator>justin</dc:creator>
				<category><![CDATA[CMS]]></category>
		<category><![CDATA[REST]]></category>
		<category><![CDATA[CMIS]]></category>
		<category><![CDATA[jcr]]></category>

		<guid isPermaLink="false">http://blog.technologyofcontent.com/?p=110</guid>
		<description><![CDATA[In favour of a REST architecture for a web content repository]]></description>
			<content:encoded><![CDATA[<p>This blog post has gone through far too many iterations, and taken far too long to write! It got much shorter in the process though.</p>

<p>It started with an idea I had, in an innocent sort of way. I thought if I looked at the JCR specs for a bit I might find some kind of way of building a non Java interface with them. You know, maybe there might be a nice REST architecture waiting to get out. But of course there is no such thing. It is an application definition. There are not even that many ways of implementing it, other than choosing your object persistence method to be a database, file, or something else.</p>

<p>The REST architecture is notionally provided by another layer, such as Apache Sling, but Sling is in no way a REST layer, it is a URL dispatcher and scripting and application layer with which some REST style applications can be developed. With that you end up with a pretty heavyweight development framework, indeed together you have much of Day&#8217;s CMS offering in effect, rather than a lightweight REST repository solution.</p>

<p>I had a look at CMIS again. Fielding once <a href="http://roy.gbiv.com/untangled/2008/no-rest-in-cmis">laid into CMIS for not being REST</a> and you can see why, although some improvements have been made since then. Although resources are discoverable through hypertext, there is a fair amount of semantics that needs to be known to understand what a type or a checkout means, and the search queries are obviously just RPC wrappers. It is not too bad, but unfortunately the data model does not map well onto web content management right now for obvious historical document management reasons. Fixable? I think it serves a particular purpose well and should probably not be forced into anything else, as we need it to succeed in its field.</p>

<p>Day claims, in an odd way, that JCR is <a href="http://dev.day.com/microsling/content/blogs/main/fudbusting2.html">not a Java standard</a>, because you can implement the API in another language. That&#8217;s a strange argument to make, especially as the types are defined as Java types, and standards without interoperability are pretty vague. Without some sort of wire format or ABI this is meaningless outside the JVM world. People are making <a href="http://www.simpcore.org/">JCR-like repositories in PHP</a> but outside any standards process, so in the end this just becomes a PHP repository project; TYPO3 seems to be building another, also closely aligned to JCR.</p>

<p>The problem with these efforts is that they do not help with the balkanization of web CMS, which is already fragmented by language and API; that is ridiculous in an industry that is about the web. The web has an architecture (REST) and an API (HTTP). Building web content management on Java APIs or PHP APIs or .NET is a legacy way of thinking; it is acceptable for document management given its role in existing enterprise architectures, but it is not going to work if we want to get widespread acceptance in web development. In the short term it is the easy path, it is what people are used to, but a forward thinking industry needs to look at defragmenting the landscape and building future proof tools.</p>

<p>The odd thing is that a web content repository alone surely lends itself to a simple REST architecture. Content is after all lots of small resources with relations. Hypertext. In presentation it is pretty much a fairly dumb web application, although with a fair amount going on behind the scenes. It takes content, relates it to other content, and serves it back, with authentication and versioning. Everything else is in other system layers, transforming it and so on. Not simple, but well defined; lower level than, say, JCR + Sling.</p>
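
<p>To make that concrete, here is a rough Python sketch of the core of such a repository: versioned resources addressed by path, each carrying named links to other resources. Everything in it (the <code>ContentRepository</code> class, the paths, the field names) is made up for illustration and is not taken from JCR, CMIS or Sling; authentication and the HTTP layer itself are left out.</p>

```python
# Hypothetical sketch: the class, paths and field names are illustrative
# assumptions, not part of JCR, CMIS or Sling. No HTTP server, no authentication.
from dataclasses import dataclass, field


@dataclass
class Version:
    body: str
    links: dict  # relation name -> target path, e.g. {"author": "/people/justin"}


@dataclass
class ContentRepository:
    # path -> list of versions, oldest first
    resources: dict = field(default_factory=dict)

    def put(self, path, body, links=None):
        """Store a new version of the resource at path (the role of HTTP PUT)."""
        versions = self.resources.setdefault(path, [])
        versions.append(Version(body, dict(links or {})))
        return len(versions)  # version number, starting at 1

    def get(self, path, version=None):
        """Return a representation of the resource (the role of HTTP GET)."""
        versions = self.resources.get(path)
        if not versions:
            return {"status": 404}
        number = len(versions) if version is None else version
        if number not in range(1, len(versions) + 1):
            return {"status": 404}
        current = versions[number - 1]
        return {
            "status": 200,
            "version": number,
            "body": current.body,
            "links": current.links,
        }


repo = ContentRepository()
repo.put("/people/justin", "Justin")
repo.put("/posts/restful-daydream", "Draft", {"author": "/people/justin"})
repo.put("/posts/restful-daydream", "Final text", {"author": "/people/justin"})

latest = repo.get("/posts/restful-daydream")            # newest version
first = repo.get("/posts/restful-daydream", version=1)  # an older version
author = repo.get(latest["links"]["author"])            # follow a link
```

<p>The point of the sketch is that a client only needs paths and links: it fetches a resource, then follows the <code>author</code> relation to another resource, without knowing anything about the storage behind it.</p>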

<p>So we need to work on a web content repository model, as a community. Process wise, it makes sense for this to sit in an organization like AIIM, as a content management based industry body. It may well be that what ends up coming out of this is more standardized architectures and semantics and open source implementations rather than the tighter prescriptions of JCR and CMIS; I have some ideas along these lines that I need to code up. I have had some discussions and there is a degree of interest in some sort of solution; who is interested? Or is infrastructure dead, and everyone just wants interfaces?</p>

<p><a href="http://dilbert.com/strips/comic/2009-09-02/" title="Dilbert.com"><img src="http://dilbert.com/dyn/str_strip/000000000/00000000/0000000/000000/60000/6000/400/66480/66480.strip.gif" width="480" alt="Dilbert.com" /></a></p>
]]></content:encoded>
			<wfw:commentRss>http://blog.technologyofcontent.com/2009/10/restful-daydream-4/feed/</wfw:commentRss>
		<slash:comments>8</slash:comments>
		</item>
		<item>
		<title>Metadata is not what it used to be</title>
		<link>http://blog.technologyofcontent.com/2009/08/metadata-is-not-what-it-used-to-be/</link>
		<comments>http://blog.technologyofcontent.com/2009/08/metadata-is-not-what-it-used-to-be/#comments</comments>
		<pubDate>Mon, 31 Aug 2009 14:54:05 +0000</pubDate>
		<dc:creator>justin</dc:creator>
				<category><![CDATA[CMS]]></category>
		<category><![CDATA[metadata]]></category>

		<guid isPermaLink="false">http://blog.technologyofcontent.com/?p=95</guid>
		<description><![CDATA[
  &#8220;Spent a week in a dusty library waiting for some words to jump at me&#8221; Camera Obscura.


Kas Thomas in his contribution to Julian Wraith&#8217;s popular thread on the future of content management really managed to make me disagree, the first of the posts that has!

The theme of this blog (yes it has got [...]]]></description>
			<content:encoded><![CDATA[<blockquote>
  <p>&#8220;Spent a week in a dusty library waiting for some words to jump at me&#8221; Camera Obscura.</p>
</blockquote>

<p><a href="http://www.cmswatch.com/Trends/1679-Future-CMS-Metadata">Kas Thomas</a> in his contribution to <a href="http://www.julianwraith.com/?p=328">Julian Wraith&#8217;s popular thread on the future of content management</a> really managed to make me disagree, the first of the posts that has!</p>

<p>The theme of this blog (yes it has got one) is that content management is changing as the web way of working starts to infiltrate the enterprise. And the web way of metadata is not what the old document oriented way was.</p>

<p>Kas says &#8220;Keeping knowledge about a file separate from the file itself is a hugely important concept.&#8221;</p>

<p>Taking that point first: look at any newish document format, from <a href="http://en.wikipedia.org/wiki/PDF/A">PDF/A</a> to <a href="http://en.wikipedia.org/wiki/EXIF">EXIF</a> via the HTML <code>meta</code> tag, or even a Word document, and you will find embedded metadata. Remember that documents are emailed around and generally get lost. Metadata is the dog collar with a name and phone number on it, saying version me and send me home. This trend is not going away; documents are becoming self-contained.</p>

<p>Kas then continues &#8220;A file&#8217;s metadata becomes its interface to the outside world. It&#8217;s like a service descriptor.&#8221; That is simply not the way the web works. Resources use self-describing formats like HTML for core data, and then all the other important metadata is linking information. It used to be that you had a blob file type and a program that could understand it; that was the basis for the desktop architecture. That is not the case any more: even on the desktop you have a choice of applications that can understand a given file type, and your choice depends on what you can do with them. The web architecture goes much further, and resources become fully self-describing: a browser can understand all of the web because every resource carries its own description, and bundled code to help you interpret it. A web page is its own service descriptor, and defines application state through hyperlinks. The web architecture has never had service descriptions.</p>

<p>There is one vital part of metadata that is not kept with a file: the link. Kas says &#8220;content, on the whole, is becoming richer, less structured&#8221;, missing out completely on the big picture that content is being structured by the imposition of links onto it, by its transformation into a hypertext that creates a much richer structure than the individual documents have in themselves. Documents contain metadata about other documents in the form of links. The semantic web project is an attempt to add further richness to this structure. &#8220;What does the trend toward richer, less structured content mean for management of content?&#8221; It means that content management is going to be about managing those links and relations between items much more than it is now, having come from a background of just managing documents, each an isolated item.</p>
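
<p>To make the point about links concrete, here is a minimal sketch in Python, not taken from any particular CMS. The document URIs and the <code>build_backlinks</code> helper are hypothetical, invented purely for illustration. It inverts each document's outbound links into an index of inbound links, which is one simple example of the kind of link structure a content management system would have to manage.</p>

```python
from collections import defaultdict

# Hypothetical documents, invented for this example:
# each URI maps to the list of URIs that document links to.
DOCUMENTS = {
    "/reports/2009-q3": ["/reports/2009-q2", "/glossary"],
    "/reports/2009-q2": ["/glossary"],
    "/glossary": [],
}


def build_backlinks(documents):
    """Invert outbound links into an index of inbound links per URI."""
    backlinks = defaultdict(list)
    for source, targets in documents.items():
        for target in targets:
            backlinks[target].append(source)
    # Return a plain dict so unknown URIs raise KeyError instead of
    # silently creating empty entries.
    return dict(backlinks)


backlinks = build_backlinks(DOCUMENTS)
print(backlinks["/glossary"])
```

<p>The inbound-link index is knowledge about a document that cannot live inside that document, because it is derived from every other document that points at it.</p>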

<p>In a way that does come back to &#8220;Keeping knowledge about a file separate from the file itself&#8221; but not at all in the way Kas was trying to argue. Now time to link this to his argument to create a structured discussion&#8230;</p>

<p><a href="http://www.threadless.com/product/1053/Now_That_s_Dope"><img src="http://www.threadless.com//product/1053/zoom.gif" width="450px"/></a></p>

]]></content:encoded>
			<wfw:commentRss>http://blog.technologyofcontent.com/2009/08/metadata-is-not-what-it-used-to-be/feed/</wfw:commentRss>
		<slash:comments>8</slash:comments>
		</item>
		<item>
		<title>Content Enabled Vertical Applications and taking the CMS apart</title>
		<link>http://blog.technologyofcontent.com/2009/08/content-enabled-vertical-applications-and-taking-the-cms-apart/</link>
		<comments>http://blog.technologyofcontent.com/2009/08/content-enabled-vertical-applications-and-taking-the-cms-apart/#comments</comments>
		<pubDate>Wed, 26 Aug 2009 23:00:25 +0000</pubDate>
		<dc:creator>justin</dc:creator>
				<category><![CDATA[CMS]]></category>
		<category><![CDATA[CEVA]]></category>

		<guid isPermaLink="false">http://blog.technologyofcontent.com/?p=70</guid>
		<description><![CDATA[Part 2 of my response to Julian Wraith's future of content management meme. This part looks at CMS architecture, rather than the technology choices of the first part, talking about application development and content repositories.]]></description>
			<content:encoded><![CDATA[<p><em>This is the second part of my response to Julian Wraith&#8217;s <a href="http://www.julianwraith.com/?p=313">future of content management thread</a>; the <a href="http://blog.technologyofcontent.com/2009/08/cms-technology-choices/">first part</a> was more about the technical decisions, this one is more about the architecture, and responds to some of the other issues. I have a new post which is more of a general view on <a href="http://blog.technologyofcontent.com/2009/10/content-applications-briefing/">content enabled vertical applications</a>.</em></p>

<p>Stéphane Croisier in <a href="http://stephanecroisier.jahia.com/new-blog-post-what-is-the-future-of-content-m">his post in the thread</a> says</p>

<blockquote>
  <p>There is currently an unclear separation between applications frameworks and content infrastructure. But at the end of the day everything is content and every application has first to deal with content items rather than with processes, states, UI components or other application oriented paradigms.</p>
</blockquote>

<p>In my general work in content management I think this is one of the things that has become very clear, and that &#8220;unclear separation&#8221; is very apparent. Content first: it is a very good start, and every project needs to be grounded in content, in the structure of content, the architecture of content and the user IA. However, the processes, states, UI and other parts of the web application are beginning to dominate projects. In the Gartner terminology we are building CEVAs (Content Enabled Vertical Applications), as the integration, process, e-commerce and CRM parts of the project start to dominate the requirements over the purely content based parts.</p>

<p>It seems that most of the contributors to Julian Wraith&#8217;s future of content management thread who mention it see content management moving to a clear split between repositories (Common Content Information Infrastructure as Stéphane calls them) on the one hand, and the applications, content management systems and CEVAs implemented on top of them on the other.</p>

<p>I don&#8217;t think we can yet see what the successful content infrastructure stack will be; as I said in my earlier post, there are technical decisions that have to be made on which there is not yet agreement (except between me and <a href="http://blogs.alfresco.com/wp/pmonks/2009/08/07/the-future-of-cms-technologies/#comment-148">Peter Monks</a>!) and which the existing putative standards (CMIS and JCR) do not extend far enough to take a position on. But we can see that this is the way things are going. Quite clearly the standards for the infrastructure will be open, and most implementations will be open source. There will be some vendors who do not embrace standards, but they will need to be the few large ones or they will lose out. Remember that infrastructure environments (think Linux, Apache) are mainly open source, although there is scope for proprietary layers at the very high end (think Amazon, Google).</p>

<p>At the application layer, as Stéphane says, everything is a mashup: content from different systems, content from other APIs. This is the web application layer. It needs to be content aware, very much so, but it needs to be an application development environment. This is where most people will see the value added in the content management business, although in fact the value here is in implementation, design and integration services, not the technology itself. Application development environments no longer make a lot of money, and again they are dominated by open source (think Java, Eclipse, JBoss, Django).</p>

<p>Once you take out content infrastructure and application development, and the other tools like search and workflow, there is a core of tools for working with content, to support reuse, refactoring, cleaning, import and export, that one might call a Content Workbench. There is a lot of potential value if these types of tools are the value added end of the business, as they can differentiate vendors. Interfaces for merging changes and so on would be part of this type of toolkit. This is the stuff where good UX means timesaving for content workers, but it is difficult to build on a customized per-project basis, so this still offers value from a particular vendor.</p>

<p>Overall then we see a picture where the monolithic CMS starts to break apart into infrastructure, application and toolkit layers, that can perhaps gradually be mixed and matched together to build content applications. We are just seeing the beginnings of this now.</p>

]]></content:encoded>
			<wfw:commentRss>http://blog.technologyofcontent.com/2009/08/content-enabled-vertical-applications-and-taking-the-cms-apart/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>CMS technology choices</title>
		<link>http://blog.technologyofcontent.com/2009/08/cms-technology-choices/</link>
		<comments>http://blog.technologyofcontent.com/2009/08/cms-technology-choices/#comments</comments>
		<pubDate>Sun, 02 Aug 2009 20:40:08 +0000</pubDate>
		<dc:creator>justin</dc:creator>
				<category><![CDATA[CMS]]></category>
		<category><![CDATA[programming]]></category>
		<category><![CDATA[scalability]]></category>

		<guid isPermaLink="false">http://blog.technologyofcontent.com/?p=33</guid>
		<description><![CDATA[Response to Julian Wraith's "The future of Content Management…" post covering some of my arguments with Jon and some technical decisions that the content management community will have to make to get to that future…]]></description>
			<content:encoded><![CDATA[<p>The future of Content Management is what we make of it right now, it has not been decided or built yet. Remarkably for a market with so many people in it there are no hard and fast rules and nothing definitive. However we are coming to the end of the experimental phase and the hard decisions are going to be made now, and the future for a fairly long period will be determined pretty soon now.</p>

<p>Although the vast majority (but not all) of open source content management systems are continually trying to reinvent the blog, we are talking about internet infrastructure here, and the future of content is going to be open source, like the rest of internet development. I also believe that long term the web project will overwhelm the legacy areas of document management, although it may take some time. Hypertext, the web architecture, XML, HTML, and all those standards are here to stay and to dominate long term. Content management will also become pervasive long term, as the blogging projects show, as the right tools make content management a natural part of workflow. Content management succeeds when it replaces the file and folder paradigm with a content-led paradigm.</p>

<p>In my conversation the other day with <a href="http://jonontech.com/2009/08/01/i-have-a-dream-of-the-cms-future" title="Jon's post on the subject">Jon</a> I was arguing that although we agree on many of the technical issues there are real decisions that need to be made about what needs to be built to get to the content management future. Below are some lists of the differences. Generally, I think the future of content management is going for the left-hand option in each pair, although some are not clear yet. I have probably missed a lot of the things to determine, but it is a start.</p>

<h2>Architecture &ndash; API differences</h2>

<p>These may cause API and other more significant differences, though some may not matter (eg git can read svn repos, but not vice versa).</p>

<ul>
<li>REST vs SOAP</li>
<li>REST vs Java native interfaces</li>
<li>distributed version control (git) vs file based (SVN)</li>
<li>compositional vs monolithic</li>
<li>structured content vs files</li>
<li>relations vs metadata</li>
<li>web (hypertext) content vs documents</li>
<li>URIs vs referential integrity</li>
<li>web applications with content management vs content management systems</li>
</ul>

<h2>Architecture &ndash; performance differences</h2>

<p>These could potentially have different implementations with different performance characteristics. They are basically IA differences to a large extent, so they do depend on the type of problem being modelled and the modelling process. Models and performance are linked though, and the best we can do is to make parts of this pluggable so that a range of performance characteristics can be used.</p>

<ul>
<li>unstructured vs structured</li>
<li>sparse vs dense</li>
<li>untyped vs typed</li>
<li>NoSQL vs RDBMS</li>
<li>permission hierarchy vs permission graph</li>
<li>scalable vs local</li>
</ul>

<h2>Development process</h2>

<p>This is key to getting the product to where you want it to be.</p>

<ul>
<li>open source vs proprietary</li>
<li>API driven vs UX driven</li>
<li>ubiquitous content management vs isolated systems</li>
<li>agile vs monolithic</li>
</ul>

<h2>Architecture &ndash; usage differences</h2>

<p>These could potentially just come down to the ways or tools with which components are joined together; they may not affect architecture per se.</p>

<ul>
<li>social media vs controlled content</li>
<li>programming languages (JavaScript, XSLT) vs templating systems</li>
</ul>

<p><a href="http://browsertoolkit.com/fault-tolerance.png"><img src="http://browsertoolkit.com/fault-tolerance.png" alt="fault tolerance" width="500px"/></a></p>

]]></content:encoded>
			<wfw:commentRss>http://blog.technologyofcontent.com/2009/08/cms-technology-choices/feed/</wfw:commentRss>
		<slash:comments>6</slash:comments>
		</item>
	</channel>
</rss>
