<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Technology of Content &#187; Uncategorized</title>
	<atom:link href="http://blog.technologyofcontent.com/category/uncategorized/feed/" rel="self" type="application/rss+xml" />
	<link>http://blog.technologyofcontent.com</link>
	<description>Ramblings on the technology of content management</description>
	<lastBuildDate>Sun, 25 Apr 2010 21:45:47 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9.2</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>iPad review</title>
		<link>http://blog.technologyofcontent.com/2010/04/ipad-review/</link>
		<comments>http://blog.technologyofcontent.com/2010/04/ipad-review/#comments</comments>
		<pubDate>Sun, 25 Apr 2010 21:45:47 +0000</pubDate>
		<dc:creator>justin</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[ipad]]></category>

		<guid isPermaLink="false">http://blog.technologyofcontent.com/2010/04/ipad-review/</guid>
		<description><![CDATA[We got an iPad at work the other week and are sharing it in rotation so here are my thoughts after the first week. Note that this review is written on the iPad as a full test!

Out of the box

The out-of-the-box experience is pretty terrible: you sit in the airline lounge and [...]]]></description>
			<content:encoded><![CDATA[<p>We got an iPad at work the other week and are sharing it in rotation so here are my thoughts after the first week. Note that this review is written on the iPad as a full test!</p>

<h2>Out of the box</h2>

<p>The out-of-the-box experience is pretty terrible: you sit in the airline lounge, open it up and turn it on, and it just insistently asks for the mother&#8217;s nipple, its iTunes parent, for imprinting. No playing until that&#8217;s done.</p>

<p>It gets better after that though. As you would expect, it is nice to look at and to use, like most Apple products. Pictures in particular look lovely, and it is fast, smooth and responsive. Web pages vary in how quick they are, but somehow slow pages seem more annoying.</p>

<p>I haven&#8217;t installed many apps, for various reasons. The UK app store does not have all the apps in it; for example, none of the Apple ones. The pad won&#8217;t even download apps over the air, saying &#8220;the app store is not supported in your country&#8221;, but it will sync them. However, the work Mac that it was mothered to died, leaving it orphaned at a young age.</p>

<h2>Comparison to a netbook</h2>

<p>I mostly write my blog posts on my Eee PC running Ubuntu Netbook Remix. The iPad seems noticeably faster. Part of this is clearly the screen rendering, which on the Eee is handled by a terrible integrated Intel graphics chipset. Mind you, it was much cheaper too. The bigger screen size and higher resolution of the iPad make it nicer to use too. But there is a limit to what you can do, as there are very few applications. Personally I like to have a command line and a programming language locally rather than on the web, although I guess one could rig something up in the browser with local storage; there seem to be <a href="http://robrohan.com/projects/9ne/">a few editors around now</a>.
I will think about that model, although it needs a lot of infrastructure to work.</p>

<h2>Web</h2>

<p>The web works well. In portrait view, which I have mostly been using, you can see a long way down a page and generally read the text too. That&#8217;s quite a nice page view. The big issue though is browser detection, which is a terrible thing. People are detecting browsers, not capabilities. For example the BBC iPlayer thinks the browser wants Flash even though they have a QuickTime iPhone version (the videos may be iPhone resolution only I suppose, but that would be better than trying to show Flash). Other sites show the mobile version, a choice which should mainly be about screen resolution detection, not the browser identifier. Some sites just don&#8217;t work because they expect hover states. Other than that you just want to manipulate things rather than press buttons, as a mouse interface just feels unnatural. It is going to take a while before many web sites have gestural interfaces.</p>

<h2>Writing</h2>

<p>Google Docs turned out to be a bit confused and I was unable to create a new doc on the website for this review; I think it was only showing the mobile version even when I selected desktop. I couldn&#8217;t bear to use Notes with its hideous use of the Marker Felt font. Fortunately I had installed the Evernote app earlier, so that&#8217;s what I am using. Typing is not great: you are pressing on glass, and you have to hold the pad up with the other hand as typing on the lap doesn&#8217;t feel right. Doable but not ideal. Also I can&#8217;t work out how to turn the clickiness off other than just turning the volume down.</p>

<h2>Gestures</h2>

<p>After a short time gestural interfaces become very natural. There are a few issues with standards and ways of doing things that are not yet well defined, but the basic movements are simple and other operations are easily learned. The big screen makes things much easier than on the iPhone, and multi-finger operations make sense, like bunching and spreading out photo albums in the picture viewer. Our hands are good at learning these sorts of operations and being precise about them. Touch is far more natural than, say, speaking to a computer. It is interesting that Apple and others have gradually been introducing elements of touch, such as two-finger scrolling on touch pads, which I now find hard to manage without. Who wants to move a mouse to a picture of an elevator when you can just stroke the screen?</p>

<h2>Walking around</h2>

<p>The iPad feels like the right sort of thing to use in meetings for looking at reference material (the Basecamp overview page works well for example), looking at diagrams, the web, taking notes and so on. It is almost as easy to walk around with as a pad of paper and generally as useful, although I don&#8217;t find diagram drawing very intuitive yet. A camera would be useful for capturing whiteboard pictures and so on; having a device without a way to get rough pictures easily is a bit annoying. Oddly though I don&#8217;t feel the same way about the netbook, which has a front facing camera for video calls that I don&#8217;t use. If the phone could talk to the iPad easily that would help, but it doesn&#8217;t. It is one of the Apple annoyances that they want to sell iPad 3G contracts rather than make the iPhone and iPad work together as a unit.</p>

<h2>Future of portable devices?</h2>

<p>I think the gestural touch interface is going to win over the mouse mediated interface. The keyboard will last, but maybe as an accessory, as with the iPad, rather than joined. However vertical screens don&#8217;t work with touch as your arms get tired. The pad is perhaps the right model, reflecting how we use paper most of the time. There are issues about how to hold and use it that will need to be ironed out, and there are issues with Apple&#8217;s idea that it should be a simplified computer, as they have perhaps gone too far. Indeed I would be very happy if it had a gestural version of Ubuntu on it like my netbook; I think that might be perfect.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.technologyofcontent.com/2010/04/ipad-review/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
		<item>
		<title>JSON vs XML</title>
		<link>http://blog.technologyofcontent.com/2010/01/json-vs-xml/</link>
		<comments>http://blog.technologyofcontent.com/2010/01/json-vs-xml/#comments</comments>
		<pubDate>Wed, 27 Jan 2010 21:16:09 +0000</pubDate>
		<dc:creator>justin</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://blog.technologyofcontent.com/?p=206</guid>
		<description><![CDATA[A lot of web developers you meet hate XML with a passion, and JSON has taken its place as the format of choice for a lot of API work. There are some advantages to JSON, but some disadvantages, and XML does have some problems, but the arguments are not as simple as generally made out.

I [...]]]></description>
			<content:encoded><![CDATA[<p>A lot of web developers you meet hate XML with a passion, and JSON has taken its place as the format of choice for a lot of API work. There are some advantages to JSON, but some disadvantages, and XML does have some problems, but the arguments are not as simple as generally made out.</p>

<p>I have been looking at the issue of writing filters for formats that basically change them as little as possible. This is a slightly difficult field in many ways. You have to store a fair amount of extra information in order to do this, but if you are say changing some metadata items for a user they may not want their CDATA removed and replaced with a semantically identical but syntactically different form. So I am looking at the formats partly from this point of view. Sometimes this brings up the conflicts between the human readable and computer manipulation aspects of these formats, the logical and physical structures. I have also been looking at making simple tools to allow modification of document formats, which has also raised some issues.</p>

<h2>Pro JSON</h2>

<p>JSON is simple and now well defined. It used not to be clear in a few places, such as how to encode characters that do not fit in a single two-byte UTF-16 code unit using the \u notation. Many people do manage to generate invalid JSON, it seems (no quotes around keys, use of single quotes, use of a BOM at the start), which is a problem. I mean if people cannot even get possibly the simplest format ever invented right, what hope is there for civilization? One would hope this gets better as standard libraries that do the right thing come out! Introducing lax JSON parsers (in the style of HTML 5) seems to be unnecessary for a simple format that is normally generated by a computer. A strict JSON parser is not a lot of code.</p>
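
<p>As a rough illustration of both points, here is a minimal sketch using Python&#8217;s standard json module. The helper name is_strict_json is made up for this example, and the standard library parser is only mostly strict (it still accepts NaN and Infinity by default), so treat it as a stand-in for a strict parser rather than a reference implementation.</p>

```python
import json

# A character outside the Basic Multilingual Plane (here U+1F600) has to be
# written as a UTF-16 surrogate pair when the \u notation is used.
text = json.loads('"\\ud83d\\ude00"')
assert text == "\U0001F600"
assert len(text) == 1  # a single code point once decoded


def is_strict_json(source):
    """Return True if the standard library parser accepts the text.

    Hypothetical helper for this sketch, not part of any library.
    """
    try:
        json.loads(source)
        return True
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return False


# The common mistakes mentioned above are all rejected.
assert is_strict_json('{"key": 1}')
assert not is_strict_json("{key: 1}")          # unquoted key
assert not is_strict_json("{'key': 1}")        # single quotes
assert not is_strict_json('\ufeff{"key": 1}')  # BOM at the start
```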

<p>JSON has a simple way of showing which of the allowable encodings it is in, based on the zero bytes at the start, as the first two characters must be ASCII. Allowing a BOM and then allowing Unicode whitespace might be more standard, but the whitespace has no function except for use in text editors.</p>
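
<p>The zero-byte check can be sketched in a few lines. This follows the byte patterns given in RFC 4627, section 3; the function name detect_json_encoding and the choice to fall back to UTF-8 for inputs shorter than four bytes are assumptions of this example.</p>

```python
def detect_json_encoding(data):
    """Guess the Unicode encoding of a JSON text from its first four bytes.

    Because the first two characters must be ASCII, the positions of the
    zero bytes identify the encoding. Hypothetical helper for illustration.
    """
    if len(data) >= 4:
        # True where the byte is zero, e.g. (False, True, False, True).
        zeros = tuple(byte == 0 for byte in data[:4])
        patterns = {
            (True, True, True, False): "utf-32-be",   # 00 00 00 xx
            (True, False, True, False): "utf-16-be",  # 00 xx 00 xx
            (False, True, True, True): "utf-32-le",   # xx 00 00 00
            (False, True, False, True): "utf-16-le",  # xx 00 xx 00
        }
        return patterns.get(zeros, "utf-8")
    # Too short to be anything but UTF-8 (for example the two bytes of {}).
    return "utf-8"
```

<p>A caller would then decode before parsing, along the lines of json.loads(raw.decode(detect_json_encoding(raw))).</p>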

<p>Despite the attempts by, for example, E4X to add a simple XML native format and processing model to JavaScript, JSON remains much easier to process in most languages, as it is built around structures that most languages just have natively, while XML is not. Some languages have issues with a mismatch to JSON, but most are fine. E4X on the other hand has security issues client side, and is not seeing adoption except in some server side applications.</p>

<h2>Against JSON</h2>

<p>JSON does not have a native hyperlink type. This is unacceptable in a web format in my opinion. For example a REST interface (a real one with HATEOAS, not the clean URLs many people who use JSON think are REST) requires native links and link types. A native link format such as {&#8220;next&#8221;: <a href="http://example.com/next">http://example.com/next</a>} would solve a lot of issues, and would be compatible. There is JSON-Schema trying to add schemas that can extend the type system, but having to have a schema to understand links just seems overkill to me.</p>

<p>JSON really ought to specify that Unicode strings are normalized. I guess most of them are, but as it stands you should normalize before doing comparisons on keys, for example.</p>
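
<p>A small sketch of why this matters, using Python&#8217;s unicodedata module. The key used here and the choice of NFC as the normalization form are assumptions for the example; the JSON specification itself does not mandate any form.</p>

```python
import json
import unicodedata


def normalized_key(key):
    """Normalize a key to NFC so that equivalent spellings compare equal."""
    return unicodedata.normalize("NFC", key)


# The same visible key, once with a precomposed e-acute and once with a
# plain e followed by a combining acute accent.
composed = json.loads('{"caf\\u00e9": 1}')
decomposed = json.loads('{"cafe\\u0301": 1}')

key_a = next(iter(composed))
key_b = next(iter(decomposed))

assert key_a != key_b                                   # raw comparison fails
assert normalized_key(key_a) == normalized_key(key_b)   # normalized one works
```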

<p>There are some syntactically different representations: arbitrary white space, although this does not include the full Unicode white space definition, and backslash escaped characters, which can mostly be represented directly in the Unicode encoding of the document. Whitespace clearly needs to be preserved for readability and use of line oriented editors and tools. It is unclear how inconvenient it would be if \u codes were normalized to Unicode, which is the sane default. I think there were some tools that did not support Unicode, although it is mandated by the standard; it is odd perhaps that an ASCII encoding is not an option, but it seems unlikely to be important. In fact though, preserving the exact syntax used is not difficult in many applications as, unlike with some cases in XML, this does not involve much additional state.</p>

<p>Although JSON was designed to serialize data structures, its small set of types is limiting. We are not going to get around this for now though, as computer languages still have such different ideas of types. Most uses of JSON have an implicit schema, which is both a strength and a weakness. Most implementations were tightly coupled, for example in AJAX. Now we are seeing more APIs exposed to the world using JSON; these have more need for a schema. I tend to prefer the idea of HATEOAS, the REST idea of hypertext as the interface design constraint, rather than published schemas in the SOAP WSDL style, and JSON seems to be more inclined to move to the latter, especially if people use JSON-RPC. I thought people had given up on the RPC style on the web, but it appears not.</p>

<h2>Data model</h2>

<p>The JSON data model is simpler than XML. This is less clear as a differentiator. XML nodes have attributes and children; JSON ones have attributes or children, if you consider the object to model an attribute set and the array or list type to model an ordered list of children. This difference is not hard to work around, and what the requirements are is very domain specific.</p>

<p><a href="http://twitter.com/dehora">Bill de hOra</a> pointed out &#8220;you should add field cardinality to the distinctions &#8211; json needs to change structure [], xml needs just another element&#8221; which is a very good point.</p>
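
<p>To make the cardinality point concrete, here is a small sketch of what a JSON consumer ends up doing when a field that used to hold one value starts holding several. The field name author and the helper as_list are invented for the example.</p>

```python
import json


def as_list(value):
    """Treat a missing, single or repeated field uniformly as a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


# One value is naturally a string; a second value forces a change of
# structure to an array, which every consumer then has to cope with.
one = json.loads('{"author": "justin"}')
many = json.loads('{"author": ["justin", "bill"]}')
none = json.loads('{}')

assert as_list(one.get("author")) == ["justin"]
assert as_list(many.get("author")) == ["justin", "bill"]
assert as_list(none.get("author")) == []
```

<p>In XML the equivalent change is just a second element with the same name, so code that already iterates over matching child elements keeps working unchanged.</p>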

<h2>Schemas</h2>

<p>Schemas for validating are great. Validation is an important activity. It is complicated in general though, with rules such as &#8220;this must be filled in if that is not&#8221; and so on. Essentially a validation schema might need to be very complicated, but many are very simple. Having a choice of languages to express these constraints in seems to me to be a good thing. The XML DTD is too weak, and should not have been included in the language, as discussed below. Some constraints are computationally complex and need very expressive languages.</p>

<p>The second function of a schema is interpretation; this may relate to validation in that a field must be readable as a number say, and we are also going to read it as a number. This is a different requirement, as in many cases it is about object modelling and code generation, when a validated structure is then mapped to a native language object. These are conceptually separate processes, as a number may be constrained to be between 3 and 5 for domain reasons, but the representation in say Java may be an integer, but it need not be. Of course the validation stage here is essential for security reasons, to stop overflows and type errors; however these are conceptually different activities and may have different schemas.</p>

<h2>Against both</h2>

<p>Binary data is a big problem. We will need a lot of other formats for anything that has binary data, they are just so much more efficient, even after compression. So ideas of a universal format are not going to happen.</p>

<h2>Against XML</h2>

<p>XML is weak on unordered items. Most of the structure in an XML document is the child relations and these are ordered. This is used as a criticism, but I am not sure it is that reasonable, as attributes are unordered and as said elsewhere there is an equivalence with the two structures provided by JSON, named and unordered items, and unnamed ordered ones which seems natural.</p>

<h2>Pro XML</h2>

<p>It was pointed out <a href="http://twitter.com/dret">by Erik Wilde</a> that I had missed out the pro XML section. This was an accident. I am actually very pro XML in many ways. First it has enough structure that we can build rich data structures; and to add to that it has some standard forms (such as XHTML) with rich sets of attributes and elements which can be reused in a variety of domains, and standard link relations. The other big thing is the set of extraction and transformation tools, which are generally quite well designed and fairly complete. There are stream and DOM parsers widely available.</p>

<h2>Against XML DTDs</h2>

<p>The DTD, which XML inherited from SGML, is an anomaly in many ways. First it has a non XML syntax, so we need another set of parsers and tools to work with it. It has several functions that really need to be separated. The first function is as a schema for validating documents against. Unfortunately it is not a very good schema language, as the constraints it can apply to documents are limited. Now we have for example XML Schema and RELAX NG, which are better schema languages, but the DTD has a special position in the specification that is difficult to drop.</p>

<p>In addition to being a schema, the DTD can also define default values for attributes that the application should see just as if they were in the document. This is the kind of thing that makes preserving the textual form difficult, as there is a syntactic but not semantic difference between certain attributes. I also do not think that this is used much, as real defaults would be implied by the processing model not the document. Clearly it is easy to remove this feature from documents simply by adding in all the implied defaults explicitly.</p>

<p>There are security issues due to the parsing issues with entities, which means that <a href="http://msdn.microsoft.com/en-us/library/ms756016%28VS.85%29.aspx">some parsers disable DTD parsing for security reasons</a>. SOAP for example does not support DTDs. This is of course non conforming, but clearly a good idea in many situations.</p>

<p>DTDs are not namespace aware, which makes them unusable in many cases with documents with namespaces. Another reason to deprecate them.</p>

<h2>Against XML entities</h2>

<p>Then there are entities. My reading of the initial spec is that entities were designed to save typing for people, but I do not think that they are used for anything except for memorable encodings of characters outside the ASCII set. The thing about this use case is that it is perfectly alright to substitute the values for them, as they never change, whereas if I create my own arbitrary entity in a DTD for the name of something it may be because I wish to use this like a search and replace function to substitute whatever I want in. This is in my opinion not really appropriate at the document format level; this is an application level tool, and the application should use regular XML tags for this type of user level structure.</p>

<p>XML entities can also be used as an inclusion mechanism; again the DTD is not the place to define this. XInclude seems much better if this facility is needed.</p>

<p>Entities can contain other entities, markup and so on. Recursion, and unbalanced markup are not allowed. This whole thing adds enormously to parsing complexity, when the use case is entirely as character data.</p>

<h2>Against XML namespaces</h2>

<p>I am not against XML namespaces per se, but there are <a href="http://lists.xml.org/archives/xml-dev/200204/msg00170.html">pathological cases</a> which make them very hard to process sanely. In particular, you can redefine the same namespace name to refer to multiple URIs in the same document, and you can refer to the same URI with different names. This effectively means that all processing needs to refer to both the short name and the full name. As this is exactly what the spec was trying to avoid, it is pretty bad. The amount of state you need to keep a namespaced document textually the same after processing is very large; the nasty mess one tends to get from parsers to let you cope with namespaces is one measure; another is the complexities of XPath on namespaced documents, especially ones with any of the pathological cases in.</p>

<p>The simple solutions seem to involve not allowing redefinition of namespaces to a different URI in the same document, or the converse; declaring all the namespaces that will be used in the root element is also an option. This means processing can be more or less namespace unaware, as xsd:type will mean the same thing regardless of the context. This falls in with the standard usage, where a fairly small set of namespaces are used and they have abbreviations by convention that remain constant across large sets of documents. This means that very little namespace awareness complexity is needed.</p>

<h2>Other issues</h2>

<p>Mixed content, the role of CDATA, the significance of whitespace, these are all extremely complex issues that could be simplified.</p>

<h2>Minimal XML proposals</h2>

<p>XML, quite hard but worth it? For the applications I am interested in, I think simplification is needed. The first issue is that security and simplicity are related. Anything web facing will get hostile documents thrown at it, and having more constraint helps, in a way that the document processing industry does not see so much as an issue.</p>

<p>There was a time ten years or so ago, when minimal XML proposals were fashionable. XML itself was of course an attempt at a minimal SGML proposal, but not enough was cut or changed, and much compatibility was kept. <a href="http://simonstl.com/articles/cxmlspec.txt">Common XML</a> seems the most reasonable to me, and addresses many of the issues. XML tools do not work in the way that was perhaps envisaged, and making things simpler and easier, evolving them, will make them more robust. JSON shows that the demands for simplicity are there, and XML will suffer if it does not answer these.</p>

<p>The first thing is to drop the DTD. It serves no real function now we have alternative schema languages for XML. Radically, I think we can drop entities too, other than the necessary ones for escaping (amp, quot etc), and numeric ones which are again syntactic. The only possibility for requiring named entities is XHTML, but it barely exists now, and those entities could be special cased there without difficulty, as their values will never change and they do not contain markup or other things that cause parsing issues. Arguably these named entities could be added to the XML spec anyway for all documents, changed to a purely syntactic thing. I am not aware of any other XML usage of entities; there may be a few I suppose.</p>

<p>For namespaces, there needs to be a solution that maps syntax to semantics, so that an attribute or element syntactic name has the same semantics throughout the document. Renaming in different scopes makes global transformations, comparisons, and simple processing too hard. It breaks simple search and replace; even that needs to be namespace aware.</p>

<h2>Data versus applications</h2>

<p>Part of the conflict is due to whether XML is an application protocol, or a data format. Some of the bits that have issues, like entities, are really part of an application data format, for a class of applications that work according to the model in the mind of the XML designers, which in turn was based on real SGML applications. But data formats are winning really. We want to attach additional semantics to data now through standard mechanisms, such as relations, RDF and so on, not by expanding the storage format. Simplicity is winning here: complexity in a data format does not add to the richness that can be expressed; simple uniform mechanisms can do this. And simplicity is going to win; linked data over Microsoft Word style application data formats.</p>

<h2>What will happen?</h2>

<p>I actually think these changes are, informally, happening. DTDs and entities are not used in many cases now. They may be in some publishing applications, especially those based on SGML, but the web document architecture does not use them significantly. Namespaces are used in a particular way, usually. HTML5 has shown what the logic of human readability and writeability implies, which is a non XML language. The great advantage of XML is the variety of ways in which it can be processed, but issues such as security against hostile documents, parsing complexity, performance, and ease of processing really matter a lot, and despite many weaknesses JSON is showing the way of radical simplicity. But a simplified XML would be no more complex than JSON I think, and have the advantages of richer tool support and widespread use. Most of the XML in the wild and in APIs is very simple; the sorts of XML that are embedded in other documents as metadata are simple too. Security is limiting processing, and the traditional publishing applications that historically used more of the functionality could change too, although more slowly. Will simplicity win, and will JSON replace XML? I think not, because so much XML is in use, but I think a specification of an XML subset is needed to stabilise the situation.</p>

<p><a href="http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags"><img src="http://blog.technologyofcontent.com/wp-content/uploads/2010/01/parse.png" width="440" alt="you cannot parse XML with regular expressions"></a></p>
]]></content:encoded>
			<wfw:commentRss>http://blog.technologyofcontent.com/2010/01/json-vs-xml/feed/</wfw:commentRss>
		<slash:comments>9</slash:comments>
		</item>
		<item>
		<title>The bottom 10 things of 2009</title>
		<link>http://blog.technologyofcontent.com/2009/12/the-bottom-10-things-of-2009/</link>
		<comments>http://blog.technologyofcontent.com/2009/12/the-bottom-10-things-of-2009/#comments</comments>
		<pubDate>Tue, 22 Dec 2009 23:40:04 +0000</pubDate>
		<dc:creator>justin</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://blog.technologyofcontent.com/?p=180</guid>
		<description><![CDATA[Ok so I agreed to write a bottom 10 list for 2009, in a twitter agreement with @pmonks. Unfortunately I have just had another bout of winter flu so it has got a bit late so I may not make it to 10, unless there is another last minute entry (any suggestions?). Actually checking the [...]]]></description>
			<content:encoded><![CDATA[<p>Ok so I agreed to write a bottom 10 list for 2009, in a twitter agreement with <a href="http://twitter.com/pmonks">@pmonks</a>. Unfortunately I have just had another bout of winter flu so it has got a bit late so I may not make it to 10, unless there is another last minute entry (any suggestions?). Actually checking the tweet, it said bottom for 2010, but it is traditional to do that in the new year. So here goes, here is what kept me awake in the night in 2009.</p>

<h2>10. The Pirate Bay saga</h2>

<p>Yet another mess in the ongoing spectacle of the entertainment industry preferring legal to creative solutions was the Pirate Bay trial. All this really showed us was that the laws are just not well framed, so anyone could win, and it may all change on appeal. This time it was not suing your customers directly, but legal action is not going to make anyone change their mind. Obviously the next step is going to be to influence the passing of bad laws, not the creation of business value. It seems uncoincidental that Spotify is Swedish. In times of change, business model engineering and service engineering are as important as product engineering. Legal action in the way the entertainment business is conducting it creates nothing long term.</p>

<h2>9. <a href="http://en.wikipedia.org/wiki/Internet_censorship_in_Australia">Australian internet censorship</a></h2>

<p>Get your act together Australians and stop this. Many other governments are looking at ways to start doing this, so it is an important example.</p>

<h2>8. The EU MySQL Oracle Sun delay</h2>

<p>You cannot make industrial policy on this sort of timeline. If the EU were to turn down the deal now Sun would be destroyed. Oddly MySQL was at a transition point anyway. I am very much in favour of the <a href="http://en.wikipedia.org/wiki/Drizzle_%28database_server%29">Drizzle idea of the future of MySQL</a>; who knows where it will end up but it may well be outside Oracle anyway.</p>

<h2>7. SPARQL is a query language without a resource model</h2>

<p>It looks like this has a chance of being fixed in 2010 at last; I have temporarily mislaid the references, but check for the newer references to named graphs. The idea that you could launch a query language for the web without a resource model was yet another of the dumb W3C ideas. The model appeared to be to build XML in Prolog. That sucks. Unfortunately the fixes are quite substantial (quads not triples for example).</p>

<h2>6. WebDAV</h2>

<p>Although not exactly something from this year, remarkably it has kind of held on, since people still specifically mention it as an alternative to CMIS. Indeed it is kind of useful sometimes, in strange situations, and it does work in a limited way, but it is not a modern HTTP interface. You have to remember how early it is, as work started in 1996, when it was not clear how the web would develop, or indeed how HTTP would develop (HTTP 1.1 was out but not much used and it was shipped in a mostly 1.0 environment). Even at the time some of the mistakes were clear, but the great thing is they are all documented in Yaron Goland&#8217;s <a href="http://lists.w3.org/Archives/Public/w3c-dist-auth/1998OctDec/0308.html">The WebDAV Book of Why</a>. These include the issues of hierarchy that make mixing WebDAV with normal HTTP impossible, and the <a href="http://lists.w3.org/Archives/Public/w3c-dist-auth/1998OctDec/0303.html">depth header disaster</a>. There are also some comments about it in the <a href="http://jonudell.net/udell/2006-08-25-a-conversation-with-roy-fielding-about-http-rest-webdav-jsr-170-and-waka.html">Roy Fielding podcast</a> in which Roy tries to avoid talking about JSR. The best thing about WebDAV is how well documented the mistakes are; this should be compulsory for all standards.</p>

<h2>5. <a href="http://en.wikipedia.org/wiki/GeoCities">Geocities</a></h2>

<p>The embarrassing kiddie years of the internet dead and buried. Mostly not the worst of 2009, but the idea you could still nurture your city. Obviously an anti archival moment for future historians to curse about. Still, a hubris reminder too: this was once the third most popular site on the internet, and sold for $3.57 billion; look on their works ye mighty and really despair.</p>

<h2>4. Microformats</h2>

<p>Never going to work. We really need generic metadata representations that have sane serializations or embeddings into all formats. Metadata <a href="http://blog.technologyofcontent.com/2009/08/metadata-is-not-what-it-used-to-be/">now lives within documents</a>; it used to get lost before that. So the RDF model has won, and microformats have lost. Oh, and the standards process sucked.</p>

<h2>3. The XHTML2 débâcle</h2>

<p>Had to happen, but why did it take so long for the W3C to fall behind HTML5 rather than XHTML2? This was a huge diversion of resource. The W3C churns out stuff and some of it gets adopted, some is implementable, some of it is not implementable realistically. The organization needs to change or it will be irrelevant.</p>

<h2>2. The Go programming language</h2>

<p>I am an aficionado of programming languages. I have programmed in many of them, C, Haskell, you name it. Lua and Erlang are my new ones for the year, though it&#8217;s getting a bit late and I have barely started. I know my combinators from my closures. What is the point of <a href="http://en.wikipedia.org/wiki/Go_%28programming_language%29">Go</a>? It does not really offer anything for the currently interesting problems, and I do not think it is going to make it anywhere. I would be surprised if it ever gets onto the allowed Google programming language list, which is <a href="http://steve-yegge.blogspot.com/2007/06/rhino-on-rails.html">C++, Python, Java, JavaScript</a> since you ask. Google is doing some cool performance work on Python though, under the name <a href="http://code.google.com/p/unladen-swallow/wiki/ProjectPlan">unladen swallow</a>.</p>

<h2>1. I4I&#8217;s patent win over Microsoft</h2>

<p>A last minute entry here. I4I has an <a href="http://www.theregister.co.uk/2009/12/22/microsoft_loses_word_patent_appeal/">injunction against Microsoft selling Word</a> without the generic XML editing functionality removed. Obviously it will be removed, and it is not a feature that a lot of people used. However <a href="http://broadcast.oreilly.com/2009/08/microsoft-and-the-two-xml-pate.html">analysis of the patent</a> indicates that it clearly has prior art, is unclearly applicable, and could affect many other XML applications. The affected part of Word is designed to be a fairly general XML processor, with similar capabilities to <a href="http://en.wikipedia.org/wiki/XForms">XForms</a>. We need to support Microsoft in getting the judgement reversed.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.technologyofcontent.com/2009/12/the-bottom-10-things-of-2009/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>Social {space&#124;media&#124;policy} @Starbucks</title>
		<link>http://blog.technologyofcontent.com/2009/12/social-space-media-policy-starbucks/</link>
		<comments>http://blog.technologyofcontent.com/2009/12/social-space-media-policy-starbucks/#comments</comments>
		<pubDate>Sun, 06 Dec 2009 13:52:50 +0000</pubDate>
		<dc:creator>justin</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[flickr]]></category>
		<category><![CDATA[social media]]></category>
		<category><![CDATA[starbucks]]></category>

		<guid isPermaLink="false">http://blog.technologyofcontent.com/?p=164</guid>
		<description><![CDATA[Companies still don&#8217;t get social media. Starbucks may have 5,151,861 fans on Facebook but they don&#8217;t get the way social media actually involves engagement and changing the way you work.

I happen to like pictures of people unposed, and I sometimes take them when I am in the right mood. The decisive moment of course has [...]]]></description>
			<content:encoded><![CDATA[<p>Companies still don&#8217;t get social media. Starbucks may have 5,151,861 <a href="http://www.facebook.com/Starbucks">fans on Facebook</a> but they don&#8217;t get the way social media actually involves engagement and changing the way you work.</p>

<p>I happen to like pictures of people unposed, and I sometimes take them when I am in the <a href="http://www.flickr.com/photos/justincormack/2632480082/">right mood</a>. The <a href="http://en.wikipedia.org/wiki/Henri_Cartier-Bresson">decisive moment</a> of course has an important role to play in the history of photography. When I was on holiday I happened to take a picture of a man sitting on the street outside a Starbucks, and when I <a href="http://www.flickr.com/photos/justincormack/">posted it to Flickr</a> I thought I would find a Starbucks group to post it to. And that&#8217;s where I <a href="http://www.flickr.com/groups/starbuckscoffeecompany/discuss/72157622351418443/">found this hilarious thread</a>, which is a warning to people about what happens if you jump into social media at the deep end.</p>

<p>At the end of September, when the <a href="http://www.flickr.com/groups/starbuckscoffeecompany/">official Starbucks group</a> was started, in the social media frenzy of 2009, this <a href="http://www.flickr.com/groups/starbuckscoffeecompany/discuss/72157622351418443/">thread was started</a>, pointing out that many people had been asked not to photograph in Starbucks stores, or had been thrown out for taking photographs.</p>

<p>The official responses from the official moderator <a href="http://www.flickr.com/photos/42346097@N02/">analisamarie</a> started off fairly optimistically</p>

<blockquote>
  <p>Our formal policy is that all press-related photo inquiries need to contact press@starbucks.com prior to taking pictures in a Starbucks store. However, we have no formal policy around customers taking non-press related pictures in-store so if you hear otherwise, it might just be because your barista is camera-shy <img src='http://blog.technologyofcontent.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </p>
  
  <p>Hmmm- good discussion! Sounds like there is a bit of confusion out there &#8211; let me take this back to my team and see what we can do to help. Thanks for bringing this up&#8230;more to come!</p>
</blockquote>

<p>Then got bogged down in legal</p>

<blockquote>
  <p>I am making great headway here and hope to have some detailed information for you all shortly. To give you an idea of what I&#8217;m up to, I am researching if some of our international markets have policies around photography in stores. Since international laws and regulations vary country by country, this is quite the task <img src='http://blog.technologyofcontent.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' />  I&#8217;m also working to see where the confusion is stemming from in some US stores. Again, stay tuned. I&#8217;m working on it!</p>
  
  <p>I have been meeting with various teams in the building and learning a lot about the world of policies <img src='http://blog.technologyofcontent.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' />  I hope to have something more concrete to share with you soon &#8211; thanks for your patience while I work through the details.</p>
  
  <p>I am getting closer to a final ruling each day. I have a big meeting on Wednesday and after that, I will post here with an update.</p>
  
  <p>I did have a very productive meeting on Wednesday of last week. We read through each of your comments and now the legal team is reviewing some of your feedback around public and private property. More meetings this week&#8230;more to come!</p>
</blockquote>

<p>The social networker starts networking internally</p>

<blockquote>
  <p>Still here and haven&#8217;t forgotten about you. I&#8217;m writing a blog this weekend/next week about this discussion and hope to post by the end of the week. I&#8217;ll keep you in the know. Have a good weekend!</p>
  
  <p>Just wrote a blog response that my legal team is currently reviewing&#8230;once I have final approval I&#8217;ll post it and let you know. I know it&#8217;s taken a while and I know I&#8217;ve said it before but I appreciate your patience. This has been quite an interesting project to work on and has involved many meetings with all sorts of teams throughout the building. SO glad you guys brought this to our attention so that we could sort it out for you!</p>
</blockquote>

<p>Hints on something more negative</p>

<blockquote>
  <p>We want to do this in the best way possible. There are many perspectives to take into consideration as part of this discussion. That means considering our baristas&#8217; daily work and their privacy, our customers&#8217; experience in our stores as well as your photographic expression of that experience. We have a lot of things to consider when making decisions that affect what happens in our stores. It has to be the right thing for our partners (employees) and customers, and it has to work well for stores around the world. Please continue to be patient while we work on a solution. In the meantime, I do ask that you continue to be respectful of customers and partners in our stores. If a barista asks you not to take pictures, please respect their request. More to come &#8211; Anali</p>
</blockquote>

<p>And more strongly hinting that the answer is going to be no. As several of the people in the thread point out, they might as well close the sponsorship agreement with Flickr if they are going to say no to photography.</p>

<blockquote>
  <p>I have to add that this group isn&#8217;t explicitly here for the purpose of taking pictures inside Starbucks stores. That is one part of the Starbucks Experience but pictures of your experience out-of-store are welcome in this group as well.</p>
</blockquote>

<p>This is currently her last post, a few days ago, so the story may still unfold. Possibly not as dramatically as when <a href="http://www.schneier.com/blog/archives/2009/02/man_arrested_by.html">Amtrak police arrested someone for taking pictures for their photo competition</a> but it has slightly broader issues than that.</p>

<p>First there is the timetable thing. Do not bother with social media if you can&#8217;t make decisions quickly as an organization. Period. When issues are raised by social media, you have to respond fast, because things have a habit of going viral. Two months is a joke. Fix the response times before you do anything, or when something blows up you will bodge the response.</p>

<p>Second, be a bit lateral. I mean, surely someone would have thought about this issue, maybe the &#8220;camera-shy baristas&#8221;, if the social media plans had been discussed internally, but obviously this social media plan comes from head office.</p>

<p>Third, social media is not about head office marketing, it is about running an open, transparent business. Flickr is not Facebook, and does not have quite the same vibe (and far fewer photos actually). It is mainly a subscription platform, and many of the people there are generally quite articulate. The issue is not that they are going to complain about your coffee, which has carefully planned responses; they are complaining about the way you treat them as a community. Starbucks walked right into it, unprepared.</p>

<p>Engagement has to be seen as a two way street; if you are not prepared to change in the social engagement and you treat it like an advertising campaign you may come unstuck.</p>

<p>The stupid thing is of course that Starbucks is a social space. The coffee shop: it&#8217;s Central Perk in Friends, it&#8217;s the pub in the British community, it&#8217;s the office when travelling. Casual photography has been entwined with the social space since the <a href="http://en.wikipedia.org/wiki/Brownie_%28camera%29">Kodak Brownie</a>; it has been reckoned that most of the data the human race has ever produced is in the form of photos (lacking a citation for that right now; would welcome one). Every mobile phone has a camera now. It is not actually hard to work out what the answer to the question should be; it even seems that that is already the policy, though no one can actually tell, as Starbucks policies appear to be secrets.</p>

<p>If your organization doesn&#8217;t grok social media, don&#8217;t copycat and try it anyway. Maybe try a dose of Enterprise 2.0 first, and I don&#8217;t mean writing blogs for lawyers to read. This online stuff is going to change the way things work; until you understand that, you will get it wrong.</p>

<p>Will be interesting to watch and see if Starbucks manage to sort this out.</p>

<p><em>Update</em> We finally have a complete cop-out policy: &#8220;Here&#8217;s the answer that you&#8217;ve been waiting for &#8230;Photos are allowed in our stores for the purpose of sharing them in our Flickr group.&#8221;</p>

<p><a href="http://www.flickr.com/photos/justincormack/4158745476/" title="Man at Starbucks by Justin Cormack, on Flickr"><img src="http://farm3.static.flickr.com/2486/4158745476_786894d534.jpg" width="500" height="399" alt="Man at Starbucks" /></a></p>

<p>(Note picture taken outside Starbucks in a public space, without purchase of Starbucks beverage; however I cannot post it to the Starbucks pool as I don&#8217;t have a model release).</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.technologyofcontent.com/2009/12/social-space-media-policy-starbucks/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>&#8220;Web content is, for the most part, crap&#8221;</title>
		<link>http://blog.technologyofcontent.com/2009/11/web-content-is-for-the-most-part-crap/</link>
		<comments>http://blog.technologyofcontent.com/2009/11/web-content-is-for-the-most-part-crap/#comments</comments>
		<pubDate>Wed, 25 Nov 2009 01:29:58 +0000</pubDate>
		<dc:creator>justin</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://blog.technologyofcontent.com/?p=151</guid>
		<description><![CDATA[Book review: Content Strategy for the web, by Kristina Halvorson

I like books. I read a lot of them. The office is full of books, almost all of them mine. Apparently technical people don&#8217;t read books much. I seem to remember a figure (from Joel Spolsky? I am on a plane so I can&#8217;t check right [...]]]></description>
			<content:encoded><![CDATA[<p><em>Book review: Content Strategy for the web, by Kristina Halvorson</em></p>

<p>I like books. I read a lot of them. The office is full of books, almost all of them mine. Apparently technical people don&#8217;t read books much. I seem to remember a figure (from Joel Spolsky? I am on a plane so I can&#8217;t check right now) saying that the average programmer had never read any books on programming, and never would. What&#8217;s that about? If you don&#8217;t think about what you do in that sort of introspective way that makes you want to read books about other people thinking about it that way, then you are not really very engaged in what you are doing.</p>

<p>This blog was meant to have more book reviews than it does; in fact this is the first one. It&#8217;s not that I am not reading the books, but there are other things I need to write about. Expect some more. Why this book? Well I had a plane trip, and it was on the shelf at <a href="http://www.foyles.co.uk/">Foyles</a> and it sounded interesting. I follow <a href="http://twitter.com/halvorson">@halvorson</a> on Twitter, and she sounds interesting, though I have not heard her speak. It&#8217;s a short book, so I won&#8217;t make the review too long; after all, you can just read the book next time you are on a plane.</p>

<p>I am not a content strategist, but I do like writing. And reading, as I mentioned. Now it is therefore possible that there is a big self-selection issue here. It could well be that everyone who reads this book will agree with its central premise about the importance of being strategic and serious about writing on the web. It could be that actually only people who read books ever read anything on the web either. Maybe everyone else just looks at the pictures and laughs at the <a href="http://www.comparethemeerkat.com">funny meerkat</a>. Aleksandr the meerkat is the most successful advertising campaign this year, and has turned a failing web property into a great success with no words at all, and not much money either.</p>

<p>Fortunately I don&#8217;t have any figures on the importance of written content, and no internet access on the plane to find any. So I am going to say, I read content on the internet. Almost all of what I read is good content, I don&#8217;t read the other. If it is boring I skip it. If it is too short I skip it. I don&#8217;t watch videos, due to lack of time and headphones. Reading is faster. If you want me in your target audience, you need to write, and write well. Or catch me in the pub over a pint of bitter. (Actually I like good visuals too, underused on the web).</p>

<p>So what are the key things I took away from the book? Think like a publisher. Remember there used to be a whole business managing content; it was called publishing. They planned it and commissioned it and had whole branded collections of it called &#8220;magazines&#8221;, &#8220;books&#8221; and &#8220;newspapers&#8221;. People paid for these, they were so good. There is a whole industry to steal ideas and methods from (and people for that matter).</p>

<p>If you don&#8217;t have a content strategy, you are just hoping it will all work out. You might get lucky, especially if you have a good writer, who will perhaps unconsciously create you a strategy, or at least stuff people want to read. But help yourself: think strategically about content like you do in other areas of the business. Plan, execute, measure, regroup.</p>

<p>&#8220;Page tables&#8221;: a content wireframe. Don&#8217;t like the term, but they are needed for a web project and I can&#8217;t think of a better name.</p>

<p>Content is there to support the aims of the organization. It is not just marketing any more, it is branding, it is product design, it is sales, and it is the conversations you are having through social media that position your company in the marketplace. &#8220;Recognize content as a valuable business asset&#8221;: is it in your investment programme? Is it on your balance sheet?</p>

<p>Web content is long lasting, unlike print, which cannot be changed and often has a short shelf life. Kristina has a chilling example of a corporate YouTube channel that no one has logged into for a year. Many corporate blogs just die from lack of care and feeding. A blog is for life, not just for Christmas.</p>

<p>&#8220;Push &#8216;user experience design&#8217; off the pedestal&#8221;. UX without content is like a fish without a bicycle: looks pretty and maybe it&#8217;s tasty, but it is not getting you out to the next village. Or something. Actually some agencies do cover content quite well, but many do not, sticking with the pretty pictures and the ooh shiny widgets.</p>

<p>So there you are. Content, words, unsexy, black like coal, and almost as much effort to get out of the ground. An anthropomorphic pun might work for some people, but not you and me, those of us who reached the end of these words.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.technologyofcontent.com/2009/11/web-content-is-for-the-most-part-crap/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Content microarchitecture: How I learned to love HTML part 2</title>
		<link>http://blog.technologyofcontent.com/2009/11/content-microarchitecture-how-i-learned-to-love-html-part-2/</link>
		<comments>http://blog.technologyofcontent.com/2009/11/content-microarchitecture-how-i-learned-to-love-html-part-2/#comments</comments>
		<pubDate>Sun, 15 Nov 2009 16:34:24 +0000</pubDate>
		<dc:creator>justin</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[CMS]]></category>
		<category><![CDATA[html]]></category>

		<guid isPermaLink="false">http://blog.technologyofcontent.com/?p=145</guid>
		<description><![CDATA[I posted recently on an unfinished series by Daniel Jacobson, which was perhaps slightly unfair, so I thought I should write a followup to his final part.

My argument was mainly that storing flat, unstructured data was not enough for most content projects, and the difficult questions of structure needed to be addressed. Daniel&#8217;s third part [...]]]></description>
			<content:encoded><![CDATA[<p>I <a href="http://blog.technologyofcontent.com/2009/11/id-love-to-stay-here-and-be-normal-but-its-just-so-overrated-or-how-i-learned-to-stop-worrying-and-love-html/">posted recently</a> on an unfinished series by Daniel Jacobson, which was perhaps slightly unfair, so I thought I should write a followup to <a href="http://blog.programmableweb.com/2009/11/11/content-portability-building-an-api-is-not-enough/">his final part</a>.</p>

<p>My argument was mainly that storing flat, unstructured data was not enough for most content projects, and the difficult questions of structure needed to be addressed. Daniel&#8217;s third part addresses how they do this. Actually when I first looked through the NPR content after reading the first two articles I could not find any content that had inline HTML; clearly that was an accident and I should have read more, as it is used.</p>

<p>The actual NPR process is interesting. In particular it shows the amount of care and attention in curating content that is needed to keep it reusable, repurposable and valuable. Semantic content requires that you understand the range of meanings that it encodes and how to work with them, transform them, and quality control them. And that means you need to know every tag and attribute that is going into the system and what that means for every output you are or may be using.</p>

<p>One of the outputs that many people forget is plain text, and NPR is very clear on the writing style that is necessary for writing for HTML and plain text. Everything must make sense in the text form; links should be additional information that adds to the text not necessary to understand it. And of course no &#8220;click here&#8221;. Text output for other devices may vary between the expressiveness of HTML and that of plain text. Text that reads sanely also helps screen readers and other assistive technologies make the content understandable.</p>

<p>The key points here are that content markup must be</p>

<ol>
<li>Valid. Processing is likely to be inaccurate without valid content, and tools will be more limited in how they process it, or will fail unexpectedly. Best fix this at the beginning of the pipeline.</li>
<li>Meaningful. You need the markup to mean what the author intended, so look at interface usability and training.</li>
<li>Accepted. You do not have to accept all valid XHTML, say. For a start, XML is an extensible language! You can choose a whole range of markup for a story, from the very minimal, to marking up each person and place involved, or more.</li>
<li>Stored. The marked up text must be stored; the NPR decomposition is plain text plus normalized markup, which may work for some systems; storing marked up HTML without output transforms may work out better for others.</li>
<li>Processed. You need to handle each kind of markup for all output mechanisms, so they need to be introduced in a controlled way, although this should not be difficult. Changing markup is something that may need to happen.</li>
</ol>
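
<p>To make the &#8220;Accepted&#8221; point concrete, here is a minimal sketch of checking incoming markup against an agreed tag list at the start of the pipeline. The tag set and the function names are my own assumptions for illustration, not anything NPR has published, and a real system would check attributes and nesting too.</p>

```python
from html.parser import HTMLParser

# Hypothetical house policy: the only tags this content store accepts.
ACCEPTED_TAGS = {"p", "a", "em", "strong", "blockquote", "ol", "ul", "li"}


class MarkupAudit(HTMLParser):
    """Collects every tag that falls outside the accepted set."""

    def __init__(self):
        super().__init__()
        self.rejected = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names before calling this handler.
        if tag not in ACCEPTED_TAGS:
            self.rejected.append(tag)


def rejected_tags(fragment):
    """Return the tags in `fragment` that the policy does not accept."""
    audit = MarkupAudit()
    audit.feed(fragment)
    audit.close()
    return audit.rejected


if __name__ == "__main__":
    print(rejected_tags("<p>Hello <em>world</em></p>"))
    print(rejected_tags('<p>Hi <font color="red">there</font></p>'))
```

<p>Content with a non-empty result would go back to the author rather than into the store, so every output processor only ever sees markup it has a rule for.</p>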

<p>I think the third part is the interesting one. Information architecture with websites often stops at the content level, missing out on this information microarchitecture of the textual content itself, leaving this to authors without enough guidance to build a consistent structure to maximise the long term content value.</p>

<p><a href="http://www.teara.govt.nz/en/fossils/7/1"><img src="http://www.teara.govt.nz/files/p9047niwa.jpg" alt="Fossil foraminifera" width="100%"></a></p>
]]></content:encoded>
			<wfw:commentRss>http://blog.technologyofcontent.com/2009/11/content-microarchitecture-how-i-learned-to-love-html-part-2/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>I&#8217;d love to stay here and be normal But it&#8217;s just so overrated Or How I Learned to Stop Worrying and love HTML</title>
		<link>http://blog.technologyofcontent.com/2009/11/id-love-to-stay-here-and-be-normal-but-its-just-so-overrated-or-how-i-learned-to-stop-worrying-and-love-html/</link>
		<comments>http://blog.technologyofcontent.com/2009/11/id-love-to-stay-here-and-be-normal-but-its-just-so-overrated-or-how-i-learned-to-stop-worrying-and-love-html/#comments</comments>
		<pubDate>Thu, 05 Nov 2009 08:50:10 +0000</pubDate>
		<dc:creator>justin</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://blog.technologyofcontent.com/?p=141</guid>
		<description><![CDATA[Last Sunday Peter Monks and I were discussing Jon Marks&#8217; upcoming J. Boye presentation when Peter linked to a pair of articles he had been reading by Daniel Jacobson: Create once and publish anywhere
and Content modularity: more than just data normalization. As it happened I had been thinking a lot about normalization that day.

Now Daniel [...]]]></description>
			<content:encoded><![CDATA[<p>Last Sunday <a href="http://twitter.com/pmomks">Peter Monks</a> and I were discussing <a href="http://jonontech.com/2009/11/04/my-jboye09-fix-wcm-presentation/">Jon Marks&#8217; upcoming J. Boye presentation</a> when Peter linked to a pair of articles he had been reading by <a href="http://www.twitter.com/daniel_jacobson">Daniel Jacobson</a>: <a href="http://blog.programmableweb.com/2009/10/13/cope-create-once-publish-everywhere/">Create once and publish anywhere</a>
and <a href="http://blog.programmableweb.com/2009/10/21/content-modularity-more-than-just-data-normalization/">Content modularity: more than just data normalization</a>. As it happened I had been thinking a lot about normalization that day.</p>

<p>Now Daniel has clearly done an excellent job of analyzing and modelling the NPR content, a job which is vital to do well, and has chosen the simplest solution that works for it. The simplicity has a lot of benefits, because simplicity cascades in a lovely way. My argument today though is that for most people this type of model is too simple, and the general problem domain means that people need to embrace a more complicated solution for their content modelling, and that means learning to love HTML.</p>

<p>One way to start looking at the problem is to take your content and consider the case of inline hypertext links. No hypertext is the simple case: you can avoid the problem by forbidding inline linking in body text, which is what NPR chose, and which I will discuss below. With hypertext the options are to either</p>

<ol>
<li>Rewrite links to what will be the output format&#8217;s reference</li>
<li>Rewrite links to a canonical output medium</li>
<li>Strip links</li>
</ol>

<p>The choice may depend on the medium &#8211; your mobile site wants internal links to itself, but output to email might have links to the main site. This violates the clean layering diagram, as the output is not a pure content object. The first option is easier in some ways, as it is not much different from handling the non-inline links, which a model usually has to do anyway, although NPR actually stores all the link URLs in the content store. The difficulties come in this case when you want to output a subset of a site to mobile, say, and have to deal with potential dangling links. The second option involves asking another output processor what its URL would be for a content item, which again violates the clean layering diagram. Some media require the third option, print for example. Making this work is a question of writing style, making sure that the hyperlinks just flow in with the text and are not required for sense &#8211; yes, that means not allowing &#8220;to see this story click here&#8221;, but that&#8217;s how writing for the web should be done, for reading with and without the hyperlinks.</p>
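
<p>The three options can be sketched as one small output filter. This is only an illustration under my own assumptions: internal links are stored with a made-up <code>content:ID</code> scheme, the URL patterns and the names <code>render</code>, <code>to_mobile</code> and <code>to_canonical</code> are hypothetical, and comments and doctype declarations are simply dropped.</p>

```python
from html import escape
from html.parser import HTMLParser


def render_attrs(attrs):
    """Serialize (name, value) pairs back into attribute text."""
    parts = []
    for name, value in attrs:
        if value is None:
            parts.append(f" {name}")
        else:
            parts.append(f' {name}="{escape(value, quote=True)}"')
    return "".join(parts)


class LinkRewriter(HTMLParser):
    """Re-serializes a fragment, rewriting or stripping <a> elements."""

    def __init__(self, rewrite_href=None):
        super().__init__(convert_charrefs=True)
        self.rewrite_href = rewrite_href  # None means "strip links"
        self.out = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            if self.rewrite_href is None:
                return  # option 3: drop the tag but keep the link text
            attrs = [
                (k, self.rewrite_href(v) if k == "href" and v else v)
                for k, v in attrs
            ]
        self.out.append(f"<{tag}{render_attrs(attrs)}>")

    def handle_startendtag(self, tag, attrs):
        self.out.append(f"<{tag}{render_attrs(attrs)} />")

    def handle_endtag(self, tag):
        if tag == "a" and self.rewrite_href is None:
            return
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(escape(data, quote=False))


def render(fragment, rewrite_href=None):
    """Render a stored fragment for one output medium."""
    parser = LinkRewriter(rewrite_href)
    parser.feed(fragment)
    parser.close()
    return "".join(parser.out)


def to_mobile(href):
    # Option 1: internal references become the mobile site's own URLs.
    if href.startswith("content:"):
        return "/m/story/" + href[len("content:"):]
    return href


def to_canonical(href):
    # Option 2: internal references point at one canonical site (e.g. for email).
    if href.startswith("content:"):
        return "http://www.example.com/story/" + href[len("content:"):]
    return href


if __name__ == "__main__":
    stored = '<p>See the <a href="content:42">earlier story</a> for more.</p>'
    print(render(stored, to_mobile))
    print(render(stored, to_canonical))
    print(render(stored))  # print output: links stripped
```

<p>Dangling links are not handled here; a real <code>to_mobile</code> would have to ask whether story 42 is actually published to the mobile subset, and fall back to the canonical URL or to stripping if not.</p>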

<p>This messiness is important though, and needs to be done well to not impinge on the API layer more than necessary, and to keep the caching working well. Adding this to the cleaner model allows the use of free hypertext, the foundation of the web.</p>

<p>Daniel goes on to say</p>

<blockquote>
  <p>But to truly separate content from display, the content repository needs to also avoid storing “dirty” content. Dirty content is content that contains any presentation layer information embedded in it, including HTML, XML, character encodings, microformats, and any other markup or rich formatting information.</p>
</blockquote>

<p>Now I half agree with this. You cannot avoid character encodings, although mixing them is not usually advisable. But, like the issue of hyperlinks, he seems to be recommending content without any inline structure, just text. Anything else has to be normalized into a separate structure.</p>

<p>Now you can normalize and flatten everything into structures where everything is stored in low level structures, not say XML or HTML documents, which can be reformed into XML for output. But one of the points of the web project is that well written semantic (X)HTML or XML is a neutral semantic data format that is transformable into other forms if necessary, and does make a good base format. Otherwise you are just normalizing into a database structure instead.</p>

<p>The NPR data model has the following structure</p>

<blockquote>
  <p>We then attach “resources” to the story, each of which is its own object in the database (examples of resources include full text with each paragraph stored as distinct records, audio, video, images, related links, and a range of other object types)</p>
</blockquote>

<p>So no structure within a paragraph. No microformats, as we cannot mark up a name or place. No hyperlinks in running text, no equations or formulas. No rich text. Just boxouts, diagrams between paragraphs. It works, it can be very good, it is easier to implement, but it&#8217;s not as rich or excellent as truly structured, hierarchical, denormalized content can be.</p>

<p>Fundamentally the structure of the story in the NPR model is flat. There is hierarchy above the story, but it is flat within it. For some types of application, perhaps like NPR&#8217;s journalistic style, this can work. I have worked with clients who do not want hypertext, just boxed lists of links, often as a stylistic issue. It has a certain formality, and the unbroken text has a traditional look, more Britannica than Wikipedia. The NPR model is extreme in that, as far as I have seen, it does not even allow a word to be emphasised; emphasis, however, is not presentation layer information, it is semantic information that can be conveyed in pretty much every form of human communication.</p>

<p>Maybe it is because even for print I have worked often with richly substructured texts, or maybe just because I appreciate the browsability of well structured hypertext, flat text is not my preference.</p>

<p>Once you give in to structure within the paragraph level, the normalization question becomes more interesting. You can still normalize items below a paragraph level but it becomes less satisfactory, as the objects start to lose context and stop being reusable on their own, divided into increasingly small units. There are still important normalization issues, as just because the document format is structured does not mean that everything should be embedded into it; items that are reusable within a document should be nested by reference so they can be reused elsewhere. So you end up with a tree of components for a document rather than the flat NPR structure of a list of components: just a few extra levels of structure, as normal articles are not very heavily tree structured, and just a bit more processing to turn these into items for the API and content output layers.</p>

<p>Daniel raises more issues</p>

<blockquote>
  <p>First, as an example, the image references within the block of text will contain HTML and possibly other markup, making the text block dirty. Any distribution to other platforms could then require special treatment to prepare the content for that destination. More importantly, however, is the fact that these same images are very difficult to repurpose because they are embedded in text. So, it would be quite a challenge to make a feed of images, to identify only those posts that contain images, to resize some or all images in the system, or to consistently restrict distribution of images that do not have the rights cleared.</p>
</blockquote>

<p>Now some of these points are important. Journalists were traditionally encouraged to write as if there were no pictures, which always struck me as odd as a child, describing in 1000 words what is in a picture right next to the story, but content has always been repurposed, or reprinted, or read out. But the sizing issues are easily dealt with by linking to a class of images, not a particular size, and letting the output process choose the size. And the system can understand the content types being used, such as HTML or XML, and, say, strip out an image feed, or answer questions about the images. HTML (and SGML and XML) have been designed as one of the first presentation independent, structured, semantic content description languages to solve many problems that simpler, flatter representations did not solve, initially being developed for structured document applications.</p>
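
<p>As a rough illustration of stripping an image feed out of stored HTML, here is a short sketch. The <code>image:ID</code> reference scheme (a link to a class of images rather than a sized file) and the names <code>ImageCollector</code> and <code>image_feed</code> are assumptions of mine, not part of the NPR system.</p>

```python
from html.parser import HTMLParser


class ImageCollector(HTMLParser):
    """Records the src and alt of every <img> in a fragment."""

    def __init__(self):
        super().__init__()
        self.images = []

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing <img ... /> via handle_startendtag.
        if tag == "img":
            attributes = dict(attrs)
            self.images.append(
                {"src": attributes.get("src"), "alt": attributes.get("alt", "")}
            )


def image_feed(stories):
    """Given {story_id: html_body}, list (story_id, image) for every image."""
    feed = []
    for story_id, body in stories.items():
        collector = ImageCollector()
        collector.feed(body)
        collector.close()
        feed.extend((story_id, image) for image in collector.images)
    return feed


if __name__ == "__main__":
    stories = {
        "a": '<p>Text <img src="image:7" alt="Fossil foraminifera"> more text</p>',
        "b": "<p>No pictures here</p>",
    }
    for story_id, image in image_feed(stories):
        print(story_id, image["src"], image["alt"])
```

<p>The same walk answers the other questions in the quote: the stories that contain images are the ids that appear in the feed, and a rights check or resize is a lookup on each collected reference.</p>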

<p>Of course not any old HTML, say, is a good choice. It needs to be valid, and there must be agreed restrictions. These could, but need not, be a minimal NPR-style set (only paragraphs, block quotes, lists, UTF-8, no HTML entities), or they could allow more structure but still not allow presentational elements. This contract of what is acceptable, because it is not presentation layer and it has transform rules to the output layers, must be strongly enforced. Authoring tools still need work; tools for structured, denormalized content especially need more support.</p>

<p>Overall then, web content management and structured content management need to embrace the document types that the web has adopted. Native HTML and XML storage is not the wrong approach. Yes, there are issues and complexities that have to be addressed, and addressed <em>really well</em>, because if done wrong they can hugely hinder flexible reuse and repurposing, but if done right they can enable a rich, expressive, hypertext, denormalized content world.</p>

<p><a href="http://blog.programmableweb.com/wp-content/npr_architecture_diagram.jpg"><img src="http://blog.programmableweb.com/wp-content/npr_architecture_diagram.jpg" width="480"></a></p>
]]></content:encoded>
			<wfw:commentRss>http://blog.technologyofcontent.com/2009/11/id-love-to-stay-here-and-be-normal-but-its-just-so-overrated-or-how-i-learned-to-stop-worrying-and-love-html/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
		<item>
		<title>My history in content management</title>
		<link>http://blog.technologyofcontent.com/2009/09/my-history-in-content-management/</link>
		<comments>http://blog.technologyofcontent.com/2009/09/my-history-in-content-management/#comments</comments>
		<pubDate>Sun, 13 Sep 2009 13:19:34 +0000</pubDate>
		<dc:creator>justin</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[#cmsorigins history craft]]></category>

		<guid isPermaLink="false">http://blog.technologyofcontent.com/?p=104</guid>
		<description><![CDATA[How I got into content management, way back in the 1980s...]]></description>
			<content:encoded><![CDATA[<p><a href="http://wordofpie.com/2009/09/08/my-first-content-management-application/">Laurence Hart started a thread about how people got into content management</a> which has been <a href="http://search.twitter.com/search?q=%23cmsorigins">picked up quite a bit</a>. It is always interesting to see how people got to what they do now. Most of them have a fair amount of randomness, content management being one of those new fields that people might not know is interesting.</p>

<p>Mine started way back around 1986 with a fairly accidental part-time job working on <a href="http://openlibrary.org/b/OL722527M/new_Langwill_index">the New Langwill Index</a>, a historical encyclopaedia of wind instrument makers. We had a shiny new IBM XT with a hard drive, a copy of Ventura Publisher running under GEM, WordStar, and a first edition Apple LaserWriter, in an office at the back of an antique shop.</p>

<p>The tools were not ideal for the job, it turned out. This type of book consists of a lot of short entries, and some longer ones, with a lot of linking between entries, and a lot of indexes, by place, type and so on; I don&#8217;t have my copy to hand but I think about a third of the book was indexes. I wrote a fair amount of code to try to content manage it, as well as for working on markup and indexes, aside from learning to proofread and to program PostScript.</p>

<p>The problems stuck with me: the basic content management issue of managing updates and their implications as entries were merged or split as <a href="http://www.guardian.co.uk/news/2007/nov/09/guardianobituaries.obituaries">William Waterhouse</a> did more research. There were endless issues about place names: during the period covered, Eastern Europe changed between Germanic names and local ones, such as Breslau and Wrocław, or towns merged, like Buda and Pest, and borders moved around, and all of the shifts had to be reflected in the indexing. And it was all very much hypertext, with complex linkages of dynasty and business and references and sources.</p>

<p>Over the next few years, through and after university, I kept this interest, the idea that there needed to be better ways and tools of working on this kind of problem. I discovered SGML in the early 1990s &ndash; my copy of <a href="http://www.amazon.com/SGML-Handbook-Charles-F-Goldfarb/dp/0198537379/ref=sr_1_4?ie=UTF8&amp;s=books&amp;qid=1252846101&amp;sr=1-4">Goldfarb</a> is sitting in the office, and I have an assortment of early 1990s books about hypertext. I nearly started a business around SGML then too, but ended up accidentally going in other directions, via financial reporting applications, and did not get back into content management until much later.</p>

<p>The other accidental discovery which lingered in my memory was the world of musical instrument making, which was an interesting mixture of craft and small scale manufacturing, like many industries in nineteenth century Europe. Much of it was in Bohemia, and other areas of central Europe, across to Switzerland. Communism and war destroyed most of this, with a move to mass production. But these craft methods, mass customization, and particularly the ways of training people in skills that have deep artistic and technical needs are very relevant to the knowledge economy. Last year <a href="http://www.amazon.com/Craftsman-Prof-Richard-Sennett/dp/0300119097">Richard Sennett&#8217;s excellent book The Craftsman</a> finally related coding on Linux to this tradition, in a compelling book that I recommend to anyone who manages coders or has an interest in history.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.technologyofcontent.com/2009/09/my-history-in-content-management/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
