<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Technology of Content &#187; CMS</title>
	<atom:link href="http://blog.technologyofcontent.com/tag/cms/feed/" rel="self" type="application/rss+xml" />
	<link>http://blog.technologyofcontent.com</link>
	<description>Ramblings on the technology of content management</description>
	<lastBuildDate>Sun, 25 Apr 2010 21:45:47 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9.2</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Trends in content management 2010</title>
		<link>http://blog.technologyofcontent.com/2010/03/trends-in-content-management-2010/</link>
		<comments>http://blog.technologyofcontent.com/2010/03/trends-in-content-management-2010/#comments</comments>
		<pubDate>Mon, 08 Mar 2010 19:05:08 +0000</pubDate>
		<dc:creator>justin</dc:creator>
				<category><![CDATA[CMS]]></category>
		<category><![CDATA[trends]]></category>

		<guid isPermaLink="false">http://blog.technologyofcontent.com/?p=226</guid>
		<description><![CDATA[This is an overview of the medium term trends in content management, from a mostly technology point of view.

Standards


repository
feeds
CMIS
JCR
terminology, ways of thinking, industry model


Standardization has really started to affect the content management industry. The industry was very immature, a bit of a landgrab, and not very customer focussed. This has changed rapidly, with the [...]]]></description>
			<content:encoded><![CDATA[<p>This is an overview of the medium term trends in content management, from a mostly technology point of view.</p>

<h2>Standards</h2>

<ul>
<li>repository</li>
<li>feeds</li>
<li>CMIS</li>
<li>JCR</li>
<li>terminology, ways of thinking, industry model</li>
</ul>

<p>Standardization has really started to affect the content management industry. The industry was very immature, a bit of a landgrab, and not very customer focussed. This has changed rapidly, with the wide adoption of the JCR standards, but particularly with the process around CMIS. What is being set now is the model of the industry for the next five years: what the customers expect and what the products will deliver. Setting the agenda matters, and now is the opportunity to participate.</p>

<h2>CMS as a platform</h2>

<ul>
<li>build applications on a content platform</li>
<li>API driven development</li>
<li>SOA</li>
<li>embed code everywhere in domain level scripting languages</li>
</ul>

<p>A content management system is at last becoming less of a product that lets you do some stuff and more of a platform for working with content and building content centered applications in a service oriented world. Scripting languages such as Javascript are spreading pervasively into this space. The web programming model of pervasive agile scripting and rich REST APIs is going to be the norm, not large scale Java programming or application specific templating languages.</p>

<h2>Co-opetition and community</h2>

<ul>
<li>collaboration on standards, infrastructure</li>
<li>open source as community</li>
<li>twitter, blogs, enterprise 2.0</li>
<li>end of NIH</li>
<li>customers are community too</li>
</ul>

<p>In the last year especially the landscape of content management as a community has changed. First through the standards processes, particularly CMIS and JCR, and then through social media, particularly twitter, as well as via events and blogs, there is now a growing cross vendor technical content management community, particularly with the open source players, and joint projects, for example with CMIS. This is in addition to the developer communities that are strongest around the open source products, although the .net products are trying hard to build around the Microsoft developer relations model. And of course the community of customers, who are becoming more vocal.</p>

<h2>Rich content</h2>

<ul>
<li>richer xhtml and xml</li>
<li>enhanced metadata; richer metadata in other formats</li>
<li>constraints not just validation</li>
<li>RDF and semantic web, linked data</li>
<li>relations and IA expressed in metadata</li>
<li>enhancement via deeply integrated search</li>
<li>document management, DAM and WCM converge</li>
<li>richer presentation layers, richer APIs</li>
<li>Flash is dead, plugins are dead, HTML5 is winning faster than anyone thought</li>
</ul>

<p>As we have moved from document management, where the focus was on whole documents, to web content management, which is more component and assembly based, there has been a gradual push to do more with the documents. Standardized rich document semantics are after all one of the main advantages of web documents. It is taking a while, but making use of the potential here is beginning to happen: now we have <a href="http://googlewebmastercentral.blogspot.com/2009/05/introducing-rich-snippets.html">Google indexing rich snippets</a> and even <a href="http://rdfa.info/2010/01/20/uk-retail-chain-tesco-adopts-rdfa/">Tesco using RDFa</a>. There is a lot more standardization work to do here.</p>

<p>In the front end the aim of this backend information enhancement is to build richer interfaces more easily, and to enhance findability, search and navigation, as well as to enable repurposing, richer APIs, and linked data. Authoring is the biggest challenge, as the majority of users need to be given interfaces that are independent of the IA, simple to use, but support generation and modification of complex data structures.</p>

<h2>SAAS and the service business</h2>

<ul>
<li>cloud</li>
<li>internal delivery in a SAAS way</li>
<li>devops</li>
<li>APIs and standardization forced by SAAS</li>
<li>changes to customer service model</li>
</ul>

<p>Software as a service models are winning because no one wants to buy software as a product any more. I will cover more of this in another article I have been working on for a bit, but the main point is that enterprise software as a paid big ticket product is dead. The replacements are open source software and SAAS. These are not alternatives though, as people want the open source software delivered as a service, albeit maybe a more commoditized one if there are multiple providers, and many of the SAAS products delivered will be largely built of open source components by companies that run a mixed model. Microsoft is <a href="http://www.theregister.co.uk/2010/03/04/ballmer_on_azure/">going headlong into cloud</a> in a way that redefines what the operating system is. Even purchased software will be delivered in internal clouds.</p>

<p>This changes both how code is written and how it is administered, with <a href="http://lethargy.org/~jesus/writes/a-job,-a-mission,-a-career-all-without-a-path-or-a-name.">web operations</a> joining up into rolling delivery and creating the emerging field of <a href="http://www.devopsdays.org/">devops</a>. Developers need to understand operations and how to build code for this environment.</p>

<p>The service business as a business is different from the product business. Open source companies have got that better than product based vendors, but the less there is lockin the more key these changes become. The <a href="http://www.interwest.com/software-as-a-service/on-demand/vp-of-customer-success-critical-to-the-saas-business-model/">success of the customer using the services becomes the key business driver</a>.</p>

<h2>Performance and scaling, real time</h2>

<ul>
<li>cloud has pushed scale up out of picture</li>
<li>scale out transparently</li>
<li>new technologies beyond RDBMS that fit CMS </li>
<li>dynamic generation becoming the norm; Google pushing the performance thing; the industry norm of 100ms will fall</li>
<li>real time becomes more important &#8211; dynamic updates, forget crawling, Google is going push</li>
<li>backend: queuing (0MQ, AMQP)</li>
<li>frontend: websockets, XMPP, long polling </li>
</ul>

<p>Just buying big hardware for scale up is really becoming difficult; the web vibe has always been to scale horizontally on commodity hardware. There is a lot of development around scale out <a href="http://blog.technologyofcontent.com/2010/02/nosql-and-content-management/">technologies such as NoSQL</a> which fit into the WCM data models, which are those of the web after all.</p>

<p>As well as scaling for volume, latency and real time are becoming key. Google&#8217;s time to crawl has been falling rapidly, to a few or less, but is <a href="http://www.readwriteweb.com/archives/google_developing_real_time_index.php">moving to real time</a> with push updates. Twitter has really pushed the boundaries of expectation for real time. Behind the scenes there are a lot of technologies for efficiently pushing around notifications and events both at the backend and on the frontend. Real time is going to become increasingly pervasive.</p>

<p>Page generation times will need to fall; the standard industry benchmark of 100ms per component will probably need to be halved; overall total times under 1s will become the norm.</p>

<h2>Security</h2>

<ul>
<li>web increasingly hostile</li>
<li>every bug is a potential security issue</li>
<li>security focussed on fewer areas, push into the OS not out to applications</li>
</ul>

<p>I read the excellent <a href="http://lwn.net">Linux Weekly News</a> every week, and every week there are <a href="http://cwe.mitre.org/top25/">security exploits</a> for many pieces of software; one that really struck me recently was the <a href="http://www.h-online.com/security/news/item/Possible-backdoor-in-the-e107-CMS-913588.html">major exploit against the CMS e107</a>. What happened here was that a group of crackers found a serious security flaw in the CMS, which they began attacking systematically. When the patch was released, however, they already had control of the developer&#8217;s website via the flaw, so they replaced the patched version of the code with a version with a backdoor. Hacked websites are a vital part of the underground <a href="http://www.securitytube.net/Phishing-%28Evil-on-the-Internet%29-FOSDEM-Talk-video.aspx">online crime scene</a>, and a content management system is a high value target. Expect much more of this, and be prepared.</p>

<p>The answer is narrowing security into fewer points of vulnerability, sandboxing, and using every available facet of the operating system&#8217;s security layers: make the most of processes, permissions, everything that you get there. I <a href="http://blog.technologyofcontent.com/2010/01/scaling-security-and-architecture-in-2010/">wrote more about this in an earlier post on emerging trends</a>. File format parsing is another common area of vulnerability.</p>

<p>It is war out there on the internet, and many people underestimate or ignore the issues, and too many programmers do not code defensively by habit.</p>

<h2>Summary</h2>

<p>It is an exciting time in web content management right now; the industry is growing up beyond its beginnings as a way of getting web sites up, towards being the core of the broader content management industry. The choices made now will shape the industry; the next generation of products will be a big step forward for the industry.</p>

<p><a href="http://dilbert.com/strips/comic/2009-07-26/" title="Dilbert.com"><img src="http://dilbert.com/dyn/str_strip/000000000/00000000/0000000/000000/60000/1000/700/61747/61747.strip.sunday.gif" border="0" alt="Dilbert.com" width="450"/></a></p>
]]></content:encoded>
			<wfw:commentRss>http://blog.technologyofcontent.com/2010/03/trends-in-content-management-2010/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>NoSQL and content management</title>
		<link>http://blog.technologyofcontent.com/2010/02/nosql-and-content-management/</link>
		<comments>http://blog.technologyofcontent.com/2010/02/nosql-and-content-management/#comments</comments>
		<pubDate>Sun, 14 Feb 2010 23:34:15 +0000</pubDate>
		<dc:creator>justin</dc:creator>
				<category><![CDATA[CMS]]></category>
		<category><![CDATA[data modelling]]></category>
		<category><![CDATA[nosql]]></category>
		<category><![CDATA[sql]]></category>

		<guid isPermaLink="false">http://blog.technologyofcontent.com/?p=216</guid>
		<description><![CDATA[I went to many of the first ever NoSQL devroom talks at FOSDEM this year. For anyone who hasn&#8217;t been, FOSDEM is a great place, and the NoSQL room was well organized and full of interest. The term NoSQL is not even a year old; I first came across CouchDB around a year ago from [...]]]></description>
			<content:encoded><![CDATA[<p>I went to many of the first ever <a href="http://nosql.mypopescu.com/post/385372130/your-chance-to-review-the-fosdem-nosql-event">NoSQL devroom</a> talks at <a href="http://fosdem.org">FOSDEM</a> this year. For anyone who hasn&#8217;t been, FOSDEM is a great place, and the NoSQL room was well organized and full of interest. The term NoSQL is not even a year old; I first came across CouchDB around a year ago from memory; Tim Anglade gave an excellent introduction where he reminded people of the historical roots, both before relational databases and since then; so not new but there is a renewed focus now. Why is that? I am going to look here at the field of content management and why you might be interested in different data models if that is your problem space, based loosely on some of the ideas from the talks at FOSDEM. There was a talk about <a href="http://outerthought.org/blog/blog/353-OTC.html">content management specifically and the Lily CMS by Evert Arckens</a> although I missed it, but I have added some comments after watching the video.</p>

<p><a href="http://www.flickr.com/photos/justincormack/4375594326/" title="FOSDEM by Justin Cormack, on Flickr"><img src="http://farm5.static.flickr.com/4029/4375594326_7ebdafd796.jpg" width="450"  alt="FOSDEM" /></a></p>

<h2>The data model for content management</h2>

<p>I have another draft post on this subject in more detail, which I am working on as part of my REST modelling in content management work, but I will outline some of the types of data relations that are important. I will be quite abstract here; if you want more concrete examples you will have to wait for the other post. Database models like the ones we are talking about here are more easily understood in the abstract, I think.</p>

<p>First we need our unit of modeling. This in itself is the first issue. Content management tends to deal with, at the conceptual level, something that looks like a document. It may be a fragment, in the sense that it is say a page component (asset if you use that terminology) rather than a whole item, but the unit for the user to edit and which is usually versioned is a structured object itself. The processing model tends to treat it almost as a binary blob, except that certain properties can be extracted, such as metadata, links in HTML and so forth, but it is stored as an item rather than decomposed further.</p>

<p>OK, so we have a piece of content and some attributes extracted from it as one basic model. This corresponds pretty much to the JCR data model for example. There are variations; sometimes people do not store metadata in the file formats, as historically many file formats had poor support for arbitrary structured metadata, although that is largely obsolete now, and the advantages of actually storing metadata and relations substantially within documents are high. External storage does not change the model much, just complicates processing and storage. Another variant, often seen in document management systems, is to be able to have multiple &#8216;streams&#8217;, i.e. several document variants rolled into one, for example a video and a still from it. You can however from the modelling point of view regard these as another compound document format kept together because conceptually they are a bundle of content; you might distribute them as a zip file if you haven&#8217;t got any other suitable container format.</p>

<p>So now we have a storage model where we have a blob, with rich media operations on it, and extracted structural and metadata information. There is also versioning to consider, but let us ignore that and treat it either as part of the blob, or as a new document with some relation to the old ones, those being the two core versioning models; this does not really affect anything else.</p>

<p>There are two kinds of metadata, although they are more similar than they appear, properties and relations. Properties are the standard attributes (this picture depicts sheep), while relations join two items in the repository (this is a cropped version of this other picture). Although this distinction seems clear, in the end richer information architectures demand that everything becomes a relation, so I can browse a sheep node and find all the sheep items, turning every attribute value of any significance into a node with relations instead. Pure attribute values are only left for the less interesting properties (this PDF file is 176k in size).</p>

<p>They are also less interesting from a relational versus non relational storage point of view, although there is one important point, which is the dense versus sparse question, so let us take a look at this. Most real world attributes are sparse, that is most attributes are not set on most items. In the relational model we have a row for our item, and columns for all the attributes, so we are saying most are NULL. (I was brought up on matrix algorithms and still think in terms of sparse versus dense matrices as this is exactly the same problem, and matrices represent graphs anyway). Storing huge mainly null tables is not very efficient, so there are two common practices in relational mapping of attributes in content management systems. The first is to define a type based system, where a particular type of content item is defined to have certain attributes, and each set of that type therefore can have its own table which is assumed to have fewer NULL values. Mixins, sets of properties that live across types, can potentially be added to this model, as can inheritance schemes, but the basic idea is one table per type. This gives a nice simple direct database programming model, and causes a complete nightmare if you ever want to change the schema, for example add an attribute, as for any large database most DBMSs will effectively shut down the system while a schema change takes place, as schema changes require pretty much all locks. <a href="http://www.silverstripe.com">Silverstripe</a> is one example of a content management system built like this; there are many others.</p>

<p>The alternative is the <a href="http://en.wikipedia.org/wiki/Entity-attribute-value_model">entity attribute value</a> (EAV) model (terrible Wikipedia article, please fix), where rather than a direct mapping of the attributes to relations, you indirectly map, creating a table that joins entities, attributes and values; this table of course looks just like RDF triples. Doing this though loses everything that makes a relational database useful: constraints, typing, query optimization. It adds an extra layer of logical schema above the physical schema which the database layer does not understand. This is a pretty common relational mapping for content management systems, as it allows full flexibility in defining and redefining attributes. To implement well it needs a large mid layer to manage the constraints, provide an API layer and generate efficient queries, effectively managing the map from the logical layer to the physical layer. The <a href="http://drupal.org/node/82661">Drupal CCK</a> is an example of this model.</p>
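<p>To make the EAV shape concrete, here is a minimal sketch in Python using an in-memory SQLite database. The table name <code>eav</code>, the helper names and the sample data are all made up for illustration; this is not how Drupal or any other product actually does it. Note how every value ends up as untyped text and how each extra attribute in a query costs another self-join, which is the logical schema leaking into the mid layer.</p>

```python
import sqlite3


def build_eav(rows):
    """Load (entity, attribute, value) triples into an in-memory EAV table."""
    db = sqlite3.connect(":memory:")
    # One generic table: the database knows nothing about the logical schema.
    db.execute("CREATE TABLE eav (entity TEXT, attribute TEXT, value TEXT)")
    db.executemany("INSERT INTO eav VALUES (?, ?, ?)", rows)
    return db


def load_item(db, entity):
    """Reassemble one item as a dict; unset attributes simply have no row."""
    cur = db.execute(
        "SELECT attribute, value FROM eav WHERE entity = ?", (entity,)
    )
    return dict(cur.fetchall())


def find_items(db, wanted):
    """Return entities matching every attribute/value pair in `wanted`.

    Each additional attribute adds one self-join on the same table.
    """
    if not wanted:
        raise ValueError("need at least one attribute to match")
    sql = "SELECT DISTINCT e0.entity FROM eav AS e0"
    conditions = []
    params = []
    for i, (attribute, value) in enumerate(wanted.items()):
        if i:
            sql += f" JOIN eav AS e{i} ON e{i}.entity = e0.entity"
        conditions.append(f"e{i}.attribute = ? AND e{i}.value = ?")
        params += [attribute, value]
    sql += " WHERE " + " AND ".join(conditions) + " ORDER BY e0.entity"
    return [row[0] for row in db.execute(sql, params)]
```

<p>With a handful of triples loaded, <code>find_items(db, {"subject": "sheep", "format": "png"})</code> joins the table to itself once; a query on five attributes would join it four times, and the query planner has no idea that these rows are really columns.</p>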

<p>Of course this is not to say that the two relational models do not work. The direct mapping works well with simple, unchanging content types in small websites, for example, or in models where attributes are not very sparse, or the sparseness is worth the overhead, and changing the schema is rare. EAV works well too, if managed carefully; it helps if the types of query required on the model are not too complex.</p>

<p>Once you add relations as well as attributes, the already difficult mapping layer gets harder; you add another set of operations (recursion to handle tree structures) that the relational model does not handle well, so you may need to add more into the mapping layer. The promise of NoSQL is that you can bypass this for these types of applications, and program directly to a database model that handles sparse attributes and relations natively. But how much do the NoSQL databases get you? You can argue that if you are already looking at EAV, then you are already not getting much from a relational database, and you are building a modeling layer on top of it, so dropping that and going for something that maps the logical data layer directly does make sense from a development point of view. Whether that really helps performance is less clear; much of the original work for NoSQL has come out of huge scaling, big problems, not actually providing efficient solutions to the types of data mapping problem we are seeing here on a medium scale; of course for huge sites there may be benefits.</p>

<p>The types of NoSQL database vary in their level of support for attributes and relations as they are used in content management. Document oriented databases do not give you much more than retrieval of content items; associative ones give key value type attribute lookups; graph databases should let you query relations directly, expressing the types of queries that are needed for information architecture problems directly, in principle. Examples I am thinking of are things like tag clouds, which are simple to express as a graph problem, as a tag cloud is simply a count of the number of edges from a set of nodes. Indeed most information architecture problems look like graph problems, and also like <a href="http://en.wikipedia.org/wiki/OLAP_cube">OLAP processing operations</a> which also do not work well on relational databases. And of course one of the things that NoSQL has shared with OLAP is the use of denormalization; you can use simpler models if you denormalize data to match the queries you will be using, rather than assuming that the types of query you will use can necessarily be optimized and made efficient by a general purpose system.</p>
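<p>The tag cloud case can be sketched in a few lines of Python. This is only an illustration with an edge list standing in for a graph database; the function name <code>tag_cloud</code> and the (item, tag) edge representation are my assumptions, not any particular product&#8217;s API.</p>

```python
from collections import Counter


def tag_cloud(edges, items):
    """Count the edges running from the selected item nodes to each tag node.

    `edges` is an iterable of (item, tag) pairs; `items` is the set of
    item nodes we are building the cloud for.
    """
    selected = set(items)
    return Counter(tag for item, tag in edges if item in selected)
```

<p>For example, with edges from two posts to the tag <code>cms</code> and one to <code>nosql</code>, the cloud for those two posts weights <code>cms</code> at 2 and <code>nosql</code> at 1; there is no join or mapping layer involved, just edge counting.</p>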

<p>Denormalization is not without its difficulties, although arguably it could become a tool embedded in databases like indexes are now. One of the issues with NoSQL is most of the database systems leave denormalization to the user: you need to use it because joins are not available, but you have to manage that yourself. Building an infrastructure to explicitly manage denormalization as a first class database item akin to an index might be interesting. So that gives us a first issue, as in any NoSQL system except a graph database we will either need to denormalize or compose queries to get the results we want.</p>
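<p>As a rough illustration of what managing denormalization yourself looks like, here is a small Python sketch of a key value style store that keeps a denormalized tag count in step with the primary data on every write, much as a database keeps an index in step. The class and attribute names are hypothetical, and a real system would also have to deal with persistence, concurrency and failure part way through an update.</p>

```python
from collections import Counter


class TaggedStore:
    """Key-value store of items that maintains a denormalized tag count."""

    def __init__(self):
        self.items = {}              # item id to set of tags (primary data)
        self.tag_counts = Counter()  # denormalized copy, updated on write

    def put(self, item_id, tags):
        """Store the tags for an item and update the derived counts."""
        # Remove the item's old contribution before adding the new one.
        self.tag_counts.subtract(self.items.get(item_id, set()))
        self.items[item_id] = set(tags)
        self.tag_counts.update(self.items[item_id])
        # Unary plus drops tags whose count has fallen to zero.
        self.tag_counts = +self.tag_counts
```

<p>Reading the tag counts is then a plain lookup with no join, but every write path has to remember to update the copy, which is exactly the bookkeeping that could be pushed down into the database.</p>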

<p>So I think there are four realistic models for content management backends going forward:</p>

<ol>
<li>The direct relational model for small systems with simple data models, rare attribute changes, little or no use of relations.</li>
<li>EAV models wrapped in a content modeling layer; JCR is an example of this, hiding the underlying SQL layer very well, and indeed allowing it to be replaced with another underlying storage model potentially; I am sure someone is testing a Neo4J backend somewhere. This is where most production solutions are at now.</li>
<li>Direct, nondenormalized graph database backends, with the raw content stored in a document store. Cuts out a special purpose middle level by mapping the domain more directly. As <a href="http://blogs.neotechnology.com/emil/2009/11/nosql-scaling-to-size-and-scaling-to-complexity.html">Emil Neo</a> says, it may not scale right up as far as the other NoSQL technologies, but it cuts complexity of implementation; there are also issues about whether all the kinds of queries required are available efficiently. I think this will be the sweet spot in a few years once the products mature and we see more open source activity in the field. Of course RDF based solutions, for example using SPARQL, fall into this category too, and the maturity of products around these technologies will help drive this category as well as the NoSQL models.</li>
<li>Big, denormalized systems, probably with software support for managing the denormalization, and using underlying simple but scalable technologies like key-value stores. These already exist in large scale web applications, but may remain niche if the development effort remains high. If frameworks for modelling more easily on these turn up they may trickle down for performance reasons even on smaller datasets; a key value store runs fine on a relational database backend, although the types of processing required probably means a specialized backend is useful.</li>
</ol>

<p>Note that the <a href="http://lilycms.org/">Lily CMS</a> which there was a talk about fits very much into the fourth option above; this is where the NoSQL technologies have perhaps seen most use, but I think there will be a lot of work in order to build a CMS like this now, in particular in terms of tools to support denormalization strategies that are needed. The outlined approach sounded much like the outlines I have been thinking about for this type of model, although I would focus more on tooling for denormalized queries and less on scaling other parts like full text search right now. It will be interesting to follow the progress of this project.</p>

<p>We are at an interesting juncture, where it looks like there are some options that will let us do domain modelling in a way that corresponds more directly to the domain, but there are a lot of interesting challenges on the way.</p>

<p><a href="http://dilbert.com/strips/comic/2008-02-12/" title="Dilbert.com"><img src="http://dilbert.com/dyn/str_strip/000000000/00000000/0000000/000000/00000/1000/800/1869/1869.strip.gif" border="0" alt="Dilbert.com" width="440"/></a></p>
]]></content:encoded>
			<wfw:commentRss>http://blog.technologyofcontent.com/2010/02/nosql-and-content-management/feed/</wfw:commentRss>
		<slash:comments>20</slash:comments>
		</item>
		<item>
		<title>Content microarchitecture: How I learned to love HTML part 2</title>
		<link>http://blog.technologyofcontent.com/2009/11/content-microarchitecture-how-i-learned-to-love-html-part-2/</link>
		<comments>http://blog.technologyofcontent.com/2009/11/content-microarchitecture-how-i-learned-to-love-html-part-2/#comments</comments>
		<pubDate>Sun, 15 Nov 2009 16:34:24 +0000</pubDate>
		<dc:creator>justin</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[CMS]]></category>
		<category><![CDATA[html]]></category>

		<guid isPermaLink="false">http://blog.technologyofcontent.com/?p=145</guid>
		<description><![CDATA[I posted recently on an unfinished series by Daniel Jacobson, which was perhaps slightly unfair, so I thought I should write a followup to his final part.

My argument was mainly that storing flat, unstructured data was not enough for most content projects, and the difficult questions of structure needed to be addressed. Daniel&#8217;s third part [...]]]></description>
			<content:encoded><![CDATA[<p>I <a href="http://blog.technologyofcontent.com/2009/11/id-love-to-stay-here-and-be-normal-but-its-just-so-overrated-or-how-i-learned-to-stop-worrying-and-love-html/">posted recently</a> on an unfinished series by Daniel Jacobson, which was perhaps slightly unfair, so I thought I should write a followup to <a href="http://blog.programmableweb.com/2009/11/11/content-portability-building-an-api-is-not-enough/">his final part</a>.</p>

<p>My argument was mainly that storing flat, unstructured data was not enough for most content projects, and the difficult questions of structure needed to be addressed. Daniel&#8217;s third part addresses how they do this. Actually when I first looked through the NPR content after reading the first two articles I could not find any content that had inline HTML; clearly that was an accident and I should have read more, as it is used.</p>

<p>The actual NPR process is interesting. In particular it shows the amount of care and attention in curating content that is needed to keep it reusable, repurposable and valuable. Semantic content requires you understand the range of meanings that it encodes and how to work with them, and transform them, and quality control them. And that means you need to know every tag and attribute that is going into the system and what that means for every output you are or may be using.</p>

<p>One of the outputs that many people forget is plain text, and NPR is very clear on the writing style that is necessary for writing for HTML and plain text. Everything must make sense in the text form; links should be additional information that adds to the text not necessary to understand it. And of course no &#8220;click here&#8221;. Text output for other devices may vary between the expressiveness of HTML and that of plain text. Text that reads sanely also helps screen readers and other assistive technologies make the content understandable.</p>

<p>The key points here are that content markup must be</p>

<ol>
<li>Valid. Processing is likely to be inaccurate without valid content, and tools will be more limited in how they process it, or will fail unexpectedly. Best fix this at the beginning of the pipeline.</li>
<li>Meaningful. You need the markup to mean what the author intended, so look at interface usability and training.</li>
<li>Accepted. You do not have to accept all valid XHTML, say. For a start, XML is an extensible language! You can choose a whole range of markup for a story, from the very minimal, to marking up each person and place involved, or more.</li>
<li>Stored. The marked up text must be stored; the NPR decomposition is plain text plus normalized markup, which may work for some systems; storing marked up HTML without output transforms may work out better for others.</li>
<li>Processed. You need to handle each kind of markup for all output mechanisms, so they need to be introduced in a controlled way, although this should not be difficult. Changing markup is something that may need to happen.</li>
</ol>

<p>I think the third part is the interesting one. Information architecture with websites often stops at the content level, missing out on this information microarchitecture of the textual content itself, leaving this to authors without enough guidance to build a consistent structure to maximise the long term content value.</p>

<p><a href="http://www.teara.govt.nz/en/fossils/7/1"><img src="http://www.teara.govt.nz/files/p9047niwa.jpg" alt="Fossil foraminifera" width="100%"></a></p>
]]></content:encoded>
			<wfw:commentRss>http://blog.technologyofcontent.com/2009/11/content-microarchitecture-how-i-learned-to-love-html-part-2/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Wave experiment: Things We Hate About Content Management</title>
		<link>http://blog.technologyofcontent.com/2009/10/wave-experiment-things-we-hate-about-content-management/</link>
		<comments>http://blog.technologyofcontent.com/2009/10/wave-experiment-things-we-hate-about-content-management/#comments</comments>
		<pubDate>Sat, 24 Oct 2009 13:33:49 +0000</pubDate>
		<dc:creator>justin</dc:creator>
				<category><![CDATA[CMS]]></category>
		<category><![CDATA[content]]></category>
		<category><![CDATA[Wave]]></category>

		<guid isPermaLink="false">http://blog.technologyofcontent.com/?p=135</guid>
		<description><![CDATA[Experiment with writing in the wave.]]></description>
			<content:encoded><![CDATA[<p>Well, that was the story, six people in content management writing a blog about stuff using Google Wave. Mostly for the first time I think; something to do with those fresh invites.</p>

<p>Other links are here: <a href="http://jonontech.com/2009/10/23/a-collaborative-google-wave-blog-post/">Jon Marks</a>, <a href="http://irinaguseva.wordpress.com/2009/10/23/things-we-hate-about-content-management/">Irina Guseva</a>, <a href="http://www.persuasivecontent.com/i-predict-a-cms-riot-1-hour-6-people-1-wave">Ian Truscott</a>; other participants Adriaan Bloem, Andrew Liles, <a href="http://contentedmanagement.net/blog/bove-the-contentious-waves-he-kept/">Philippe Parker</a> (first use of Wave over GotoMeeting?)</p>

<p>Well it was fun. Technical difficulties, lost sync and crashed a few browsers, some people lost whole machines though. Safari coped better than Firefox. It took a while to realize what was happening here, hey but this is in beta!</p>

<p>As a brainstorming tool it worked pretty well, and I thought it scaled pretty well too. The named cursors indicate who is in the bit you are in, but for brainstorming you can look, write another point, move, continue, not edit much. After half an hour of getting to bulleted lists and a bit of moving around, the heavy writing started (after a discussion at the top in our proxy process section; we should have split the thing up a bit).</p>

<p>There is a great tendency to write temporary notes about the discussion and then just delete them, which feels odd: data and metadata together, of course. The editing process was odd too; you would find orphaned bits, move things, try to join stuff up to make it flow, while it was all changing around you. Pretty chaotic. Bits that no one expanded into prose got junked (quite a good edit method, as they couldn&#8217;t stand up by themselves).</p>

<p>Here is the &#8220;finished&#8221; article&#8230; which cannot be attributed to anyone individually of course&#8230; the subject was chosen about 10 minutes in, as something everyone could easily contribute to in this situation; there are some good points in there though!</p>

<p><strong>Things We Hate About Content Management</strong></p>

<p><em>- By The Motley Crew</em></p>

<p>It was a lovely Friday morning/afternoon, and we were Waving. The experiment initiated by McBoof (yes, that one) brought together 6 CMS folks from around the world. The event gathered together analysts, journalists, vendors, system integrators to Wave on a topic that was decided at that very moment. We had one hour (in between conference calls and other job thingys) to pick a topic and Wave it.</p>

<p>A little collab on what exactly to Wave about later, we decided to do &#8220;a mindmap of things we find annoying in CMSs.&#8221; To up the ante, we also decided to take the original bullet points (deemed &#8220;too easy&#8221;) and convert the whole thing to prose. Was the tool given really up to the task? Were our minds flexible enough to wrap around this kind of realtime collaboration?</p>

<p>In the beginning &#8212; we blame the tool <img src='http://blog.technologyofcontent.com/wp-includes/images/smilies/icon_wink.gif' alt=';)' class='wp-smiley' />  &#8212; we were Drowning, not Waving. We (almost) didn&#8217;t fight about edits. We almost didn&#8217;t step on each other&#8217;s toes. All in all, it turned out to be a fun and productive collaborative exercise. Read on to see for yourself.</p>

<p><strong>Cosmetic Issues</strong></p>

<p>There really should be a CMS UI fashion police. As there should be a Magic Quadrant for shoes and handbags. Why? Well, there&#8217;s a couple of issues.</p>

<p>For instance, sloppy, non-designed design. You know the kind of thing that has not been thought about and reworked and made to feel right. The sort of thing coders do if you don&#8217;t force them. But at the same time, over-designed interfaces can be just as bad: the designers and developers really need to be on speaking terms.</p>

<p>When building a system that works, you can&#8217;t have the development team in the basement on a sustenance of Jolt coding away into the night, and the designers in the penthouse in turtleneck sweaters sipping espressos. Too many CMS designs end up being programmer vs. end-user friendly. And this is not the best way to charm away those marketing and web content folks.</p>

<p>Developers and designers need to talk to each other and, essentially, both should talk to users &#8211; not just eat your own dogfood &#8211; but listen to what dogs like to eat. Developers and UI designers are not content editors, marketers or knowledge and information workers.</p>

<p>Some vendors say that the agonizingly and depressingly black UI backgrounds are hip and modern. Well, they are not, really. Who told you that? Especially if you add a Star Trek theme to it and sprinkle in some stars and cosmic swirls, because if Apple does it, it must be cool right? Not pointing any fingers, but I would quit if I were a content manager having to spend my 9-5 staring into the &#8220;black hole&#8221; of some of the CMS UIs that are out there on the market.</p>

<p>Even pop-ups seem less annoying when compared to dark UIs. Which brings us onto&#8230;</p>

<p><strong>Interface Issues</strong></p>

<p>Interfaces need a comfortable lived in feel. Content management is something people work with every day, it is their interface to their job. You meet people who hate the interface, and that makes their work a heap of pain. I have seen people who describe the 44 clicks it takes to insert an image. You have a responsibility to these people, to make them love the content and make the tool disappear.</p>

<p>We all hate it when the interface does something on its own that ruins your context. E.g. a page refresh, or in Wave the jumping around of the scrolled window in some cases <img src='http://blog.technologyofcontent.com/wp-includes/images/smilies/icon_wink.gif' alt=';-)' class='wp-smiley' />  Or the lack of an easy way to bookmark, so you can reference someone to the content. Remember people will be collaborating and need to send links around. Make sure the UI is a proper web application with URLs. And why do tasks that are easy to describe and often repeated in exactly the same way still take more than a few clicks? (Or maybe even dozens of clicks.) With bonus points for forcing users to use dialogs or tabs to enter mandatory information. Remember people do not have all the information in the right order.</p>

<p>Also, we need sane conflict merges. Check in and check out is too extreme for most uses. But people want to edit offline still. Of course Wave doesn&#8217;t have an offline: Google thinks this problem is going away, it&#8217;s real time so there are never conflicts (that&#8217;s defined in the XML protocol; it&#8217;s quite interesting if you are that way geeky). Does Google have the right answer here? Well, the Motley Crew is struggling here, and some browsers lost sync during this experiment.</p>

<p>&#8220;Power users&#8221; (those who use it all day long) of CMSs need to have a &#8220;Desktop&#8221; experience. What does Desktop Experience mean? Well, it doesn&#8217;t really have to be on the desktop &#8212; these days it is perfectly possible to get very close to a hitherto Desktop experience in a browser or similar. These are the qualities: very low latency from action to response, no page refreshes, modal and modeless dialog boxes as appropriate, &#8220;push&#8221; notification.</p>

<p><strong>Architectural Issues</strong></p>

<p>Architectural issues of the wave overtook any architectural issues of Content Management Systems. The fact that we authored this entire article in a single blip didn&#8217;t help, and slowed everything down enormously. McBoof learned the hard way that he really needs a new laptop and spent most of the session giving his machine CPR. Next time we&#8217;ll do each paragraph in its own blip to stop Firefox going down like a Led Zeppelin.</p>

<p>Monolithic systems. Build it out of pieces, so that the client does not have to use all of them. Obviously your pieces may work together better, but there should be components. Do not try to reinvent all kinds of wheel. &#8220;Best of breed,&#8221; though, is just another weasel marketing idea, as if systems are pinnacles rather than about meeting requirements.</p>

<p>Marketeers are adroit at using the term Best Practice to position Their Way as the only way that a particular matter can be solved. (Many of us live in that netherworld of having to peddle that point of view, but it is a falsehood that the careful buyer should try to see through.) I think this devalues genuine best practice; vendors should cite references.</p>

<p>Most often a marketeer&#8217;s Best Practice view is the only one they subscribe to as their product development has paddled up the wrong stream and cannot or won&#8217;t reverse their architectural design (probably because of the cost of doing so). This intransigence most often causes a product to doom itself. (Think of IBM and The Mainframe Is The Only Way To Do Serious Business).</p>

<p>Who really still believes that there is a place in this world for Flash or Java Applet based Rich Text Editors? TinyMCE, FCKeditor and others are filling the gap left by Ektron when they bit the hand that feeds and entered the CMS market. Ephox is trying to spread, but I find it difficult to come up with an excuse to use an Applet over HTML with JavaScript these days. Stick with the standard.</p>

<p><strong>Business Issues</strong></p>

<p>Where you are buying into something that you may very well need to change or integrate with there is strong benefit in considering Open Source. Open Source used to frighten commercial software companies but we have come a long way on that road to understand that commercial organisations can operate in an Open Source world and benefit. This does not necessarily mean that their prized system needs to be fully opened up, but taking the spirit of it to mean that you are completely open to people seeing and learning from your code how it operates.</p>

<p>Exactly what you need to see opened up varies. In a CMS there may be a subsystem that stores the content or one that allows a Rich Text Editor. These arguably don&#8217;t need to be opened up, but when a CMS ships with modules for, for example, an RSS feed widget, calendaring tool, prebuilt webforms, users who then want a variation on this module can benefit from seeing how the &#8220;pros&#8221; did it, they can then use it as a starting point for their own different implementation.</p>

<p>We really don&#8217;t need vendors that pay lip service to the buzzwords. When they think the new CMS buzzword &#8220;engagement&#8221; is just a screenshot of Google Analytics. Or when they add an image picker and call it DAM. And a cross-over between WCM and ECM? Don&#8217;t think WCM is like ECM and it&#8217;s about organizing content, not about effectively communicating with the audience. And don&#8217;t think that if you organize the content, you can automatically communicate effectively.</p>

<p>Completely different, but equally frustrating, is procurement (and the procedures that go with it.) Procurement folk don&#8217;t recognise the importance of user adoption to the success of the project &#8212; of the black background and all the UI issues pointed out previously. If a CMS is procured according to procedure, the selection is a success to them. But those same rules are often a recipe for ignoring what the users really need.</p>

<p>At the same time, budgets that aren&#8217;t transparent are an issue &#8211; customer and vendor should be able to have a sensible grown up conversation. As a customer, of course you want good value, but how cheap are you? But to vendors: many licensing models don&#8217;t make any sense, and force you to do stupid things. People are scared to have that conversation &#8211; the best architectural fit first I say, let&#8217;s figure out an appropriate license around that.</p>

<p><strong>Conclusion</strong></p>

<p>So much hatred rolled up into a tight little ball of anti-CMS rage. Who would have expected it from such a respected bunch of CMS folk? We hate the designs, the interfaces, the architectures and the business. Time for a beer/wine? Wave good bye!</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.technologyofcontent.com/2009/10/wave-experiment-things-we-hate-about-content-management/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>RESTful daydream #4</title>
		<link>http://blog.technologyofcontent.com/2009/10/restful-daydream-4/</link>
		<comments>http://blog.technologyofcontent.com/2009/10/restful-daydream-4/#comments</comments>
		<pubDate>Sun, 04 Oct 2009 15:34:41 +0000</pubDate>
		<dc:creator>justin</dc:creator>
				<category><![CDATA[CMS]]></category>
		<category><![CDATA[REST]]></category>
		<category><![CDATA[CMIS]]></category>
		<category><![CDATA[jcr]]></category>

		<guid isPermaLink="false">http://blog.technologyofcontent.com/?p=110</guid>
		<description><![CDATA[In favour of a REST architecture for a web content repository]]></description>
			<content:encoded><![CDATA[<p>This blog post has gone through far too many iterations, and taken far too long to write! It got much shorter in the process though.</p>

<p>It started with an idea I had, in an innocent sort of way. I thought if I looked at the JCR specs for a bit I might find some kind of way of building a non Java interface with them. You know, maybe there might be a nice REST architecture waiting to get out. But of course there is no such thing. It is an application definition. There are not even that many ways of implementing it, other than choosing your object persistence method to be a database, file, or something else.</p>

<p>The REST architecture is notionally provided by another layer, such as Apache Sling, but Sling is in no way a REST layer, it is a URL dispatcher and scripting and application layer with which some REST style applications can be developed. With that you end up with a pretty heavyweight development framework, indeed together you have much of Day&#8217;s CMS offering in effect, rather than a lightweight REST repository solution.</p>

<p>I had a look at CMIS again. Fielding once <a href="http://roy.gbiv.com/untangled/2008/no-rest-in-cmis">laid into CMIS for not being REST</a> and you can see why, although some improvements have been made since then. Although resources are discoverable through hypertext, there is a fair amount of semantics that needs to be known to understand what a type or a checkout means, and the search queries are obviously just RPC wrappers. It is not too bad though, but unfortunately the data model does not map well onto web content management right now for obvious historical document management reasons. Fixable? I think it serves a particular purpose well and should probably not be forced into anything else, as we need it to succeed in its field.</p>

<p>Day claims that JCR is <a href="http://dev.day.com/microsling/content/blogs/main/fudbusting2.html">not a Java standard</a> in an odd way, that you can implement the API in another language. That&#8217;s a strange argument to make, especially as the types are defined as Java types, and standards without interoperability are pretty vague. Without some sort of wire format or ABI this is meaningless outside the JVM world. People are making <a href="http://www.simpcore.org/">JCR-like repositories in PHP</a> but outside any standards process, so in the end this just becomes a PHP repository project; Typo3 seems to be building another, also closely aligned to JCR.</p>

<p>The problem with these efforts is that they do nothing to reduce the balkanization of web CMS, which is already fragmented by language and API, which is ridiculous in an industry that is about the web. The web has an architecture (REST) and an API (HTTP). Building web content management on Java APIs or PHP APIs or .NET is a legacy way of thinking; it is acceptable for document management given its role in existing enterprise architectures, but it is not going to work if we want to get widespread acceptance in web development; in the short term it is the easy path, it is what people are used to, but a forward thinking industry needs to look at defragmenting the landscape and building future proof tools.</p>

<p>The odd thing is that a web content repository alone surely lends itself to a simple REST architecture. Content is after all lots of small resources with relations. Hypertext. In presentation it is pretty much a fairly dumb web application, although with a fair amount going on behind the scenes. It takes content, relates it to other content, and serves it back, with authentication and versioning. Everything else is in other system layers, transforming it and so on. Not simple, but well defined; lower level than JCR + Sling, say.</p>
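
<p>To make &#8220;takes content, relates it to other content, and serves it back, with versioning&#8221; a little more concrete, here is a minimal in-memory sketch in Python. It is an illustration only, not a proposed standard: the <code>Repository</code> and <code>Resource</code> names, the method names and the paths are all hypothetical, authentication is left out entirely, and a real repository would sit behind HTTP rather than method calls.</p>

```python
from dataclasses import dataclass, field


@dataclass
class Resource:
    """One addressable piece of content, with its full version history."""
    path: str
    versions: list = field(default_factory=list)  # bodies, oldest first
    links: dict = field(default_factory=dict)     # relation name: target path


class Repository:
    """Hypothetical in-memory store; the methods loosely mirror HTTP verbs."""

    def __init__(self):
        self._resources = {}

    def put(self, path, body):
        # PUT creates the resource or appends a new version,
        # and returns the new version number (starting at 1).
        resource = self._resources.setdefault(path, Resource(path))
        resource.versions.append(body)
        return len(resource.versions)

    def link(self, path, rel, target):
        # Relate an existing resource to another one by path.
        self._resources[path].links[rel] = target

    def get(self, path, version=None):
        # GET returns a representation: the body plus links to follow.
        # With no version given, the latest one is served.
        resource = self._resources[path]
        number = version or len(resource.versions)
        return {
            "path": path,
            "version": number,
            "body": resource.versions[number - 1],
            "links": dict(resource.links),
        }
```

<p>The point of the sketch is how little surface there is: everything a client learns about related content comes back in the <code>links</code> of a representation, not from a language-specific API.</p>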

<p>So we need to work on a web content repository model, as a community. Process wise, it makes sense for this to sit in an organization like AIIM, as a content management based industry body. It may well be that what ends up coming out of this is more standardized architectures and semantics and open source implementations rather than the tighter prescriptions of JCR and CMIS; I have some ideas along these lines that I need to code up. I have had some discussions and there is a degree of interest in some sort of solution; who is interested? Or is infrastructure dead, and everyone just wants interfaces?</p>

<p><a href="http://dilbert.com/strips/comic/2009-09-02/" title="Dilbert.com"><img src="http://dilbert.com/dyn/str_strip/000000000/00000000/0000000/000000/60000/6000/400/66480/66480.strip.gif" width="480" alt="Dilbert.com" /></a></p>
]]></content:encoded>
			<wfw:commentRss>http://blog.technologyofcontent.com/2009/10/restful-daydream-4/feed/</wfw:commentRss>
		<slash:comments>8</slash:comments>
		</item>
		<item>
		<title>Metadata is not what it used to be</title>
		<link>http://blog.technologyofcontent.com/2009/08/metadata-is-not-what-it-used-to-be/</link>
		<comments>http://blog.technologyofcontent.com/2009/08/metadata-is-not-what-it-used-to-be/#comments</comments>
		<pubDate>Mon, 31 Aug 2009 14:54:05 +0000</pubDate>
		<dc:creator>justin</dc:creator>
				<category><![CDATA[CMS]]></category>
		<category><![CDATA[metadata]]></category>

		<guid isPermaLink="false">http://blog.technologyofcontent.com/?p=95</guid>
		<description><![CDATA[
  &#8220;Spent a week in a dusty library waiting for some words to jump at me&#8221; Camera Obscura.


Kas Thomas in his contribution to Julian Wraith&#8217;s popular thread on the future of content management really managed to make me disagree, the first of the posts that has!

The theme of this blog (yes it has got [...]]]></description>
			<content:encoded><![CDATA[<blockquote>
  <p>&#8220;Spent a week in a dusty library waiting for some words to jump at me&#8221; Camera Obscura.</p>
</blockquote>

<p><a href="http://www.cmswatch.com/Trends/1679-Future-CMS-Metadata">Kas Thomas</a> in his contribution to <a href="http://www.julianwraith.com/?p=328">Julian Wraith&#8217;s popular thread on the future of content management</a> really managed to make me disagree, the first of the posts that has!</p>

<p>The theme of this blog (yes it has got one) is that content management is changing as the web way of working starts to infiltrate the enterprise. And the web way of metadata is not what the old document oriented way was.</p>

<p>Kas says &#8220;Keeping knowledge about a file separate from the file itself is a hugely important concept.&#8221;</p>

<p>Looking at that point first. Look at any newish document format, from <a href="http://en.wikipedia.org/wiki/PDF/A">PDF/A</a> to <a href="http://en.wikipedia.org/wiki/EXIF">EXIF</a> via the HTML <code>meta</code> tag, even a Word document and you will find embedded metadata. Remember that documents are emailed around, generally get lost. Metadata is the dog collar with a name and phone number on, saying version me and send me home. This trend is not going away, documents are becoming self contained.</p>

<p>Kas then continues &#8220;A file&#8217;s metadata becomes its interface to the outside world. It&#8217;s like a service descriptor.&#8221; That is simply not the way the web works. Resources use self describing formats like HTML for core data, and then all the other important metadata is linking information. It used to be that you had a blob file type and a program that could understand it, that was the basis for the desktop architecture, but that is not the case any more, even on the desktop you have a choice of applications that can understand a given file type, and your choice depends on what you can do with them. The web architecture goes much further, and resources become fully self-describing, a browser can understand all the web as every resource carries its own description, and bundled code to help you interpret it. A web page is its own service descriptor, and defines application state through hyperlinks. The web architecture has never had service descriptions.</p>

<p>There is one vital part of metadata that is not kept with a file, that is the link. Kas says &#8220;content, on the whole, is becoming richer, less structured&#8221;, missing out completely on the big picture that content is being structured by the imposition of links onto it, by its transformation into a hypertext, that creates a much richer structure than the individual documents have in themselves. Documents contain metadata about other documents in the form of links. The semantic web project is an attempt to add further richness to this structure. &#8220;What does the trend toward richer, less structured content mean for management of content?&#8221; well that means that content management is going to be about managing those links and relations between items, a lot more than it is now, when it came from a background of just managing documents, each an isolated item.</p>

<p>In a way that does come back to &#8220;Keeping knowledge about a file separate from the file itself&#8221; but not at all in the way Kas was trying to argue. Now time to link this to his argument to create a structured discussion&#8230;</p>

<p><a href="http://www.threadless.com/product/1053/Now_That_s_Dope"><img src="http://www.threadless.com//product/1053/zoom.gif" width="450px"/></a></p>

]]></content:encoded>
			<wfw:commentRss>http://blog.technologyofcontent.com/2009/08/metadata-is-not-what-it-used-to-be/feed/</wfw:commentRss>
		<slash:comments>8</slash:comments>
		</item>
		<item>
		<title>Content Enabled Vertical Applications and taking the CMS apart</title>
		<link>http://blog.technologyofcontent.com/2009/08/content-enabled-vertical-applications-and-taking-the-cms-apart/</link>
		<comments>http://blog.technologyofcontent.com/2009/08/content-enabled-vertical-applications-and-taking-the-cms-apart/#comments</comments>
		<pubDate>Wed, 26 Aug 2009 23:00:25 +0000</pubDate>
		<dc:creator>justin</dc:creator>
				<category><![CDATA[CMS]]></category>
		<category><![CDATA[CEVA]]></category>

		<guid isPermaLink="false">http://blog.technologyofcontent.com/?p=70</guid>
		<description><![CDATA[Part 2 of my response to Julian Wraith's future of content management meme. This part looks at CMS architecture, rather than the technology choices of the first part, talking about application development and content repositories.]]></description>
			<content:encoded><![CDATA[<p><em>This is a second part of my response to Julian Wraith&#8217;s <a href="http://www.julianwraith.com/?p=313">future of content management thread</a>; the <a href="http://blog.technologyofcontent.com/2009/08/cms-technology-choices/">first part</a> was more about the technical decisions, this one more about the architecture, and responding to some of the other issues. I have a new post which is more of a general view on <a href="http://blog.technologyofcontent.com/2009/10/content-applications-briefing/">content enabled vertical applications</a></em></p>

<p>Stéphane Croisier in <a href="http://stephanecroisier.jahia.com/new-blog-post-what-is-the-future-of-content-m">his post in the thread</a> says</p>

<blockquote>
  <p>There is currently an unclear separation between applications frameworks and content infrastructure. But at the end of the day everything is content and every application has first to deal with content items rather than with processes, states, UI components or other application oriented paradigms.</p>
</blockquote>

<p>In my general work in content management I think this is one of the things that has become very clear, and that &#8220;unclear separation&#8221; is very apparent. First content; it is a very good start, and every project needs to be grounded in content, and in the structure of content, the architecture of content and the user IA. However, the processes, states, UI and other parts of the web application are beginning to dominate projects. In the Gartner terminology we are building CEVAs (Content Enabled Vertical Applications that is), as integration, process, e-commerce, CRM parts of the project start to dominate the requirements over the purely content based parts.</p>

<p>It seems that most of the contributors to Julian Wraith&#8217;s future of content management thread who mention it see content management moving to a clear split between repositories (Common Content Information Infrastructure as Stéphane calls them) and applications and content management systems and CEVAs implemented on top of these.</p>

<p>I don&#8217;t think we can yet see what the successful content infrastructure stack will be; as I said in my earlier post there are technical decisions that have to be made that there is not yet agreement on (except between me and <a href="http://blogs.alfresco.com/wp/pmonks/2009/08/07/the-future-of-cms-technologies/#comment-148">Peter Monks</a>!) and the existing putative standards (CMIS and JCR) do not extend far enough to take a position on. But we can see that this is the way things are going. Quite clearly the standards for the infrastructure will be open, and most implementations will be open source. There will be some vendors who do not embrace standards, but they will need to be the few large ones or they will lose out. Infrastructure environments remember (think Linux, Apache) are mainly open source, although there is scope for proprietary layers at the very high end (think Amazon, Google).</p>

<p>At the application layer, as Stéphane says, everything is a mashup, content from different systems, content from other APIs; this is the web application layer. It needs to be content aware, very much so, but it needs to be an application development environment. This is where most people will see the value added in the content management business, although in fact the value here is in implementation, design and integration services, not the technology itself. Application development environments no longer make a lot of money, and again they are dominated by open source (think Java, Eclipse, JBoss, Django).</p>

<p>Once you take out content infrastructure and application development, and the other tools like search, workflow, there is a core of tools for working with content, to support reuse, refactoring, cleaning, import and export, that one might call a Content Workbench. There is a lot of potential value if these types of tools are the value added end of the business, as they can differentiate vendors and add value. Interfaces for merging changes and so on would be part of this type of toolkit. This is the stuff where good UX means timesaving for content workers, but it is difficult to build on a customized per-project basis, so this still offers value from a particular vendor.</p>

<p>Overall then we see a picture where the monolithic CMS starts to break apart into infrastructure, application and toolkit layers, that can perhaps gradually be mixed and matched together to build content applications. We are just seeing the beginnings of this now.</p>

]]></content:encoded>
			<wfw:commentRss>http://blog.technologyofcontent.com/2009/08/content-enabled-vertical-applications-and-taking-the-cms-apart/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>The Resource Oriented Enterprise</title>
		<link>http://blog.technologyofcontent.com/2009/08/the-resource-oriented-enterprise/</link>
		<comments>http://blog.technologyofcontent.com/2009/08/the-resource-oriented-enterprise/#comments</comments>
		<pubDate>Sun, 23 Aug 2009 20:40:58 +0000</pubDate>
		<dc:creator>justin</dc:creator>
				<category><![CDATA[REST]]></category>
		<category><![CDATA[architecture]]></category>
		<category><![CDATA[CMS]]></category>
		<category><![CDATA[enterprise]]></category>
		<category><![CDATA[ROA]]></category>
		<category><![CDATA[ROE]]></category>

		<guid isPermaLink="false">http://blog.technologyofcontent.com/?p=49</guid>
		<description><![CDATA[This is the start of getting a bunch of ideas about how web technologies are going to change the way business works and enterprise software is built. I have been meaning to start writing this up for a while. Here is an initial overview, somewhere to start... follow up post soon.]]></description>
			<content:encoded><![CDATA[<p><em>This is the start of getting a bunch of ideas about how web technologies are going to change the way business works and enterprise software is built. I have been meaning to start writing this up for a while. Here is an initial overview, somewhere to start&#8230; Some of these ideas are floating around <a href="http://www.restfulness.info/">elsewhere</a> it seems, from web practitioners working in the enterprise, but there is surprisingly little written up so far.</em></p>

<p>I went to a meeting a while back with a company starting to move to a web service based, internal API based architecture, and there was a minute where the CTO said (more or less) &#8220;does anyone know of any let or hindrance to this being a SOAP API?&#8221;. Like those moments at weddings, no one spoke out. But I do object. SOAP is not the backbone of a happy architecture, but we do not have the strength to say no right now.</p>

<p>So here is the beginning of the shout out. The enterprise architecture needs to be a resource oriented architecture (ROA) not a service oriented architecture (SOA). The enterprise needs to move from SOAP to REST.</p>

<p>Web developers know this. When given the choice, <a href="http://www.oreillynet.com/pub/wlg/3005">9 out of 20 cats prefer to use REST</a>. It is more productive. It does not involve programs to generate huge chunks of useless code. It involves hypertext (HATEOAS) not opaque documents that are mappings of database schemas.</p>

<p>What are the resources in the enterprise? Start with customers. A CRM system is a clear example of something that needs to be modeled as resources. You need API calls that can find products a customer has bought, support tickets, all the core data that you would need to build customer service portals, internal support applications. What is your customer API?</p>
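
<p>As a rough sketch of what a customer resource might look like, the Python below builds the kind of document a GET on a customer URL could return. Everything here is assumed for illustration: the URL layout, the field names and the sample data are hypothetical, and the dictionaries stand in for whatever CRM system actually holds the records.</p>

```python
import json

# Hypothetical in-memory data standing in for a CRM system.
CUSTOMERS = {"42": {"name": "Example Ltd"}}
ORDERS = {"42": ["1001", "1002"]}
TICKETS = {"42": ["7"]}


def customer_representation(customer_id):
    """Build the document a GET on /customers/{id} might return."""
    customer = CUSTOMERS[customer_id]
    base = "/customers/" + customer_id
    return {
        "name": customer["name"],
        # Related resources are exposed as links, not as extra API calls
        # that the client has to know about in advance.
        "links": {
            "self": base,
            "orders": [base + "/orders/" + o for o in ORDERS.get(customer_id, [])],
            "tickets": [base + "/tickets/" + t for t in TICKETS.get(customer_id, [])],
        },
    }


if __name__ == "__main__":
    print(json.dumps(customer_representation("42"), indent=2))
```

<p>A support application or customer portal would start from the customer document and follow the links, which is what keeps the API uniform across very different back end systems.</p>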

<p>Building this framework can be incremental. You need either tools that already provide REST APIs, which is becoming easier, or a web application development framework. This is the convergence point between application frameworks and web content management. The big issue is the domination of legacy applications with bolt-on APIs that may have moved from SQL to CORBA to SOAP over the years. You can however build a REST API over at least parts of these applications to open them up into the enterprise.</p>

<p>OK, that&#8217;s the beginning. Apart from cheaper application development, where is the real value?</p>

<p>Resources are the first part of the REST architecture, and resources are much easier to work with than services, as they share uniform semantics and they are addressable, two keys to making application design simple. The big bit though is the (harder to understand) HATEOAS, Hypertext as the Engine of Application State. What this involves is moving the vague and very expensive field of business logic out of code (often code that is not even owned or understood by the enterprise, as it is buried in systems bought from suppliers) and into hypertext documents.</p>

<p>A concrete example might help here, based on another recent consulting example. A professional body has a set of different membership levels, exams, rules, CPD requirements and so on. These are resources, described states, and hypertext links of state transitions. This is the core of the business logic of the organization, embodied in documents. It can be used to build the membership application, as it describes for example the actions open to a member at the present time, as well as providing potentially a browsable structure, as well as a formal structure that can be used to ask questions about membership (what are the routes to becoming X, for example).</p>
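
<p>The membership example can be sketched roughly as follows. The state names, URLs and link relations are invented for illustration (the real professional body's rules are not reproduced here), and a plain dictionary stands in for the hypertext documents.</p>

```python
from collections import deque

# Hypothetical membership states: names, URLs and link relations are assumptions.
# Each state is a document whose links are the transitions open from that state.
STATES = {
    "/membership/student": {"links": {"pass-exam": "/membership/associate"}},
    "/membership/associate": {
        "links": {
            "complete-cpd": "/membership/full",
            "lapse": "/membership/lapsed",
        }
    },
    "/membership/full": {"links": {"lapse": "/membership/lapsed"}},
    "/membership/lapsed": {"links": {"rejoin": "/membership/associate"}},
}


def actions(state_url):
    """Actions open to a member in this state: simply the links in its document."""
    return sorted(STATES[state_url]["links"])


def routes(start, goal):
    """All cycle-free sequences of actions leading from start to goal."""
    found = []
    queue = deque([(start, [], {start})])
    while queue:
        url, path, seen = queue.popleft()
        if url == goal:
            found.append(path)
            continue
        for action, target in STATES[url]["links"].items():
            if target not in seen:
                queue.append((target, path + [action], seen | {target}))
    return found


print(actions("/membership/associate"))
print(routes("/membership/student", "/membership/full"))
```

<p>Neither function knows anything about membership rules. Both just read the links, which is the sense in which the business logic lives in the documents rather than in the code.</p>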

<p>You can view this as a move of business logic to a declarative rather than imperative form. Things do not have to be stored in documents in the underlying storage, although the interface is document and hypertext based. Hypertext makes things human browsable, declarative makes them computer browsable, and HATEOAS makes them discoverable. Business logic becomes content rather than data: instead of tables of parameters that the business logic black box feeds off, the states themselves become resources that can be discovered, addressed, and reasoned with.</p>

<p>As states become resources in a REST model, because of the statelessness of web applications, both states and state transitions become first class objects. Constructing ad-hoc queries about state changes should become easy (what percentage of customers renew their contracts, say). The first class objects need to be the ones that are meaningful for the business.</p>
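
<p>To make the renewal question concrete, here is a small sketch that treats each state transition as a record in its own right. The transition log, its field names and the state names are assumptions for illustration only.</p>

```python
# Hypothetical transition log: field names and values are assumptions.
TRANSITIONS = [
    {"customer": "/customers/1", "from": "active", "to": "renewed"},
    {"customer": "/customers/2", "from": "active", "to": "lapsed"},
    {"customer": "/customers/3", "from": "active", "to": "renewed"},
    {"customer": "/customers/4", "from": "trial", "to": "active"},
]


def percentage(transitions, source, target):
    """Share of transitions leaving `source` that end in `target`, as a percentage."""
    leaving = [t for t in transitions if t["from"] == source]
    if not leaving:
        return 0.0
    hits = sum(1 for t in leaving if t["to"] == target)
    return 100.0 * hits / len(leaving)


print(round(percentage(TRANSITIONS, "active", "renewed"), 1))
```

<p>Because the transitions are addressable records rather than side effects buried in application code, a question like this becomes a simple filter and count.</p>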

<p><em>There is more to this, it needs expanding. Perhaps we need a manifesto. The software architecture of the web is going to make huge changes to the architecture of other realms, more than most people realize it seems. It is inevitable as things get re-architected around the web that more areas will be affected; also there are the benefits of scalability and reliability that are being created for the web. If anyone has any other business case references for REST please post in comments.</em></p>

<p><a href="http://www.dmst.aueb.gr/dds/etech/arch/rom.png"><img src="http://www.dmst.aueb.gr/dds/etech/arch/rom.png" width="450px"/></a></p>

<p>(Image from <a href="http://www.dmst.aueb.gr/dds/etech/arch/indexw.htm">this software architecture overview</a>).</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.technologyofcontent.com/2009/08/the-resource-oriented-enterprise/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
		<item>
		<title>CMS technology choices</title>
		<link>http://blog.technologyofcontent.com/2009/08/cms-technology-choices/</link>
		<comments>http://blog.technologyofcontent.com/2009/08/cms-technology-choices/#comments</comments>
		<pubDate>Sun, 02 Aug 2009 20:40:08 +0000</pubDate>
		<dc:creator>justin</dc:creator>
				<category><![CDATA[CMS]]></category>
		<category><![CDATA[programming]]></category>
		<category><![CDATA[scalability]]></category>

		<guid isPermaLink="false">http://blog.technologyofcontent.com/?p=33</guid>
		<description><![CDATA[Response to Julian Wraith's "The future of Content Management…" post covering some of my arguments with Jon and some technical decisions that the content management community will have to make to get to that future…]]></description>
			<content:encoded><![CDATA[<p>The future of Content Management is what we make of it right now; it has not been decided or built yet. Remarkably for a market with so many people in it, there are no hard and fast rules and nothing definitive. However, we are coming to the end of the experimental phase, the hard decisions are going to be made now, and the future for a fairly long period will be determined pretty soon.</p>

<p>Although the vast majority (but not all) of open source content management systems are continually trying to reinvent the blog, we are talking about internet infrastructure here, and the future of content is going to be open source, like the rest of internet development. I also believe that long term the web project will overwhelm the legacy areas of document management, although it may take some time. Hypertext, the web architecture, XML, HTML, and all those standards are here to stay and to dominate long term. Content management will also become pervasive long term, as the blogging projects show, as the right tools make content management a natural part of workflow. Content management succeeds when it replaces the file and folder paradigm with a content-led paradigm.</p>

<p>In my conversation the other day with <a href="http://jonontech.com/2009/08/01/i-have-a-dream-of-the-cms-future" title="Jon's post on the subject">Jon</a> I was arguing that although we agree on many of the technical issues, there are real decisions that need to be made about what needs to be built to get to the content management future. Below are some of my lists of differences. Generally, I think the future of content management is going for the left hand one of the pairs, although some are not clear yet. I have probably missed a lot of the things to determine, but it is a start.</p>

<h2>Architecture &ndash; API differences</h2>

<p>These may cause API and other more significant differences, though some may not matter (eg git can read svn repos, but not vice versa).</p>

<ul>
<li>REST vs SOAP</li>
<li>REST vs Java native interfaces</li>
<li>distributed version control (git) vs file based (SVN)</li>
<li>compositional vs monolithic</li>
<li>structured content vs files</li>
<li>relations vs metadata</li>
<li>web (hypertext) content vs documents</li>
<li>URIs vs referential integrity</li>
<li>web applications with content management vs content management systems</li>
</ul>

<h2>Architecture &ndash; performance differences</h2>

<p>These could have different implementations with different performance characteristics potentially. These are basically IA differences to a large extent, so they do depend on the type of problem being modelled and the modelling process. Models and performance are linked though, and the best we can do is to make parts of this pluggable so that a range of performance characteristics can be used.</p>

<ul>
<li>unstructured vs structured</li>
<li>sparse vs dense</li>
<li>untyped vs typed</li>
<li>NoSQL vs RDBMS</li>
<li>permission hierarchy vs permission graph</li>
<li>scalable vs local</li>
</ul>

<h2>Development process</h2>

<p>This is key to getting the product to where you want it to be.</p>

<ul>
<li>open source vs proprietary</li>
<li>API driven vs UX driven</li>
<li>ubiquitous content management vs isolated systems</li>
<li>agile vs monolithic</li>
</ul>

<h2>Architecture &ndash; usage differences</h2>

<p>These could potentially just come down to the ways or tools with which components are joined together, maybe they do not affect architecture per se.</p>

<ul>
<li>social media vs controlled content</li>
<li>programming languages (Javascript, XSLT) vs templating systems</li>
</ul>

<p><a href="http://browsertoolkit.com/fault-tolerance.png"><img src="http://browsertoolkit.com/fault-tolerance.png" alt="fault tolerance" width="500px"/></a></p>

]]></content:encoded>
			<wfw:commentRss>http://blog.technologyofcontent.com/2009/08/cms-technology-choices/feed/</wfw:commentRss>
		<slash:comments>6</slash:comments>
		</item>
		<item>
		<title>Standards in Content Management</title>
		<link>http://blog.technologyofcontent.com/2009/04/standards-in-content-management/</link>
		<comments>http://blog.technologyofcontent.com/2009/04/standards-in-content-management/#comments</comments>
		<pubDate>Mon, 13 Apr 2009 09:51:01 +0000</pubDate>
		<dc:creator>justin</dc:creator>
				<category><![CDATA[CMS]]></category>
		<category><![CDATA[CMIS]]></category>
		<category><![CDATA[JSR]]></category>
		<category><![CDATA[standards]]></category>

		<guid isPermaLink="false">http://blog.technologyofcontent.com/?p=24</guid>
		<description><![CDATA[CMIS, JSR, first post on how these are affecting content management, and the choice between them. More followups later...]]></description>
			<content:encoded><![CDATA[<p>The great thing about standards is there are so many to choose from.</p>

<p>In content management the old saying has not really been relevant. There have not really been many standards. Then one day JSR-170 turned up, a Java content management standard.</p>

<p>Then CMIS.</p>

<p>Then the other day, JSR-283, ostensibly the simple successor to JSR-170, <a href="http://dev.day.com/microsling/content/blogs/main/jsr283proposedfinaldraft.html">suddenly split into two</a>: one part the data model and the other part the Java API, clearly leaving room for a non Java API track. Not being on the standards group I do not know when this happened, but it does smell like a bit of a reaction to CMIS. And indeed Roy Fielding from Day did react to CMIS recently <a href="http://roy.gbiv.com/untangled/2008/no-rest-in-cmis">in a not happy way</a>.</p>

<p>Previously most of us outside the Java CMS world looked on the Java standard as being a pure Java move. Many standards attempts now are about strategic commercial interests. Actually though within the Java world it seemed to be quite well accepted, and Magnolia and Day could live in the same world. Deeper down though, is it an attempt to define ranges of functionality within product ranges? Although the specification does not correspond to the implementation necessarily, both constrain each other, especially in some parts of these specifications.</p>

<p>CMIS was a different matter. A different set of vendors, a similar model, a different set of APIs (a non RESTful REST that Fielding could lay into!). But the biggest criticism of JSR-170 was simply the J; anything could look credible by being more inclusive. Hence I think the current changes in JSR-283.</p>

<p>The CMIS draft states</p>

<blockquote>
  <p>The JSR standard requires a particular type of implementation for ECM repositories: Whereas CMIS 
  restricts itself to specifying only generic/universal concepts for ECM constructs like Documents and Object 
  Types that could be layered on most existing ECM implementation, the JSR standard requires a highly-specific
  &amp; feature-completion implementation of a repository. This structure may not be appropriate for 
  many types of applications, or efficiently layered on existing ECM repositories.</p>
</blockquote>

<p>This is core to Fielding&#8217;s substantive criticism &#8211; it is just trying to model folders and files, and a WebDAV interface does that fine. JSR defines abstract item and property trees that the CMIS players feel don&#8217;t fit with their content models. The CMIS draft mentions WebDAV, but says it misses out on types and queries, locking, and not http interfaces(!).</p>

<p><a href="http://jonontech.com/2009/04/09/cmis-is-xpath-just-a-bit-too-tricksy/">Jon Marks</a> points to the XPath/SQL distinction. XPath too is a bit modern for the CMIS vendors. The XPath expressions refer to the XML representation of the property tree, which, if it does not actually correspond at all to your internal models or implementation method, is quite a lot of work to implement.</p>

<p>There is some truth in the implementation specific parts of the criticism of JSR, in that I am not aware of any implementation of the JSR standards that does not use a JSR native content repository as the base implementation (eg Jackrabbit). Maybe I have missed one. Partly that is because it is a strong model for a modern CMS, with excellent open and closed source implementations. Partly also it is because not many vendors have yet explored related but dissimilar frameworks (it is not clear that an RDF model would fit well for example as properties are first class; though a mapping may be possible the semantics of an object may end up differing). The other area in which differences will probably become apparent is the versioning models, though these will not necessarily be exposed through interfaces if there is a big mismatch.</p>

<p>In an abstract sense the differences in the two models seem small. In CMIS documents have a (single) body and then additional properties. In JSR objects simply have properties, the &#8220;content&#8221; is simply another property. Although that sounds subtle, and it sounds easy to switch models &#8211; just add a type property and a content property to documents, not to folders &#8211; there are more and more model differences the further you go. Locking for example. And the rules about folders not being versioned while files are, and the primacy of the document folder containment relationship. Another difference in CMIS is the large list of optional facilities, such as the query types available, and whether checked out copies or versioned copies of documents are accessible through the query mechanisms.</p>
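
<p>The &#8220;easy&#8221; part of that switch can be sketched in a few lines. These structures are deliberately simplified stand-ins made up for this example; they are not the real CMIS or JSR data models or APIs, and the function and field names are assumptions.</p>

```python
# Simplified illustrations only: not the real CMIS or JSR data models or APIs.
def cmis_like_document(name, body, properties):
    """A document with one distinguished body plus additional properties."""
    return {"name": name, "body": body, "properties": dict(properties)}


def to_jsr_like_node(document):
    """Fold the body into the property set, so content is just another property."""
    props = dict(document["properties"])
    props["type"] = "document"
    props["content"] = document["body"]
    return {"name": document["name"], "properties": props}


doc = cmis_like_document("report.txt", "Quarterly figures", {"author": "justin"})
node = to_jsr_like_node(doc)
print(sorted(node["properties"]))
```

<p>What the sketch leaves out is exactly where the models diverge: locking, versioning rules for folders versus files, and the containment relationship have no such one-line mapping.</p>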

<p>Overall there is a difference in what the &#8220;generic/universal concepts for ECM constructs&#8221; are. Having two formalized models is actually a useful start to help classify the CMS models that exist. A web services interface to JSR will help that model expand outside the Java only area. Islands of interoperation or at least data transfer can only help the industry as a whole.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.technologyofcontent.com/2009/04/standards-in-content-management/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
	</channel>
</rss>
