<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Technology of Content &#187; justin</title>
	<atom:link href="http://blog.technologyofcontent.com/author/justin/feed/" rel="self" type="application/rss+xml" />
	<link>http://blog.technologyofcontent.com</link>
	<description>Ramblings on the technology of content management</description>
	<lastBuildDate>Sun, 29 Jan 2012 16:38:05 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
		<item>
		<title>Data-driven documents talk</title>
		<link>http://blog.technologyofcontent.com/2012/01/data-driven-documents-talk/</link>
		<comments>http://blog.technologyofcontent.com/2012/01/data-driven-documents-talk/#comments</comments>
		<pubDate>Sun, 29 Jan 2012 16:37:49 +0000</pubDate>
		<dc:creator>justin</dc:creator>
				<category><![CDATA[programming]]></category>
		<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[d3]]></category>
		<category><![CDATA[javascript]]></category>

		<guid isPermaLink="false">http://blog.technologyofcontent.com/?p=393</guid>
		<description><![CDATA[A few days back I gave a talk about D3.js at London Web Standards. The video of the talk is available as are the talk slides. The slides are freely available to re-use, and you can view the source to see what is going on, I have tried to keep everything inline in the slides [...]]]></description>
			<content:encoded><![CDATA[<p>A few days back I gave a talk about <a href="http://mbostock.github.com/d3/">D3.js</a> at <a href="http://www.londonwebstandards.org/">London Web Standards</a>. The <a href="http://vimeo.com/35580586">video of the talk is available</a>, as are the <a href="http://lws.node3.org">talk slides</a>. The slides are freely available to re-use, and you can view the source to see what is going on; I have tried to keep everything inline in the slides to make it easy to understand. I used the <a href="http://slides.html5rocks.com/">HTML5 rocks</a> slides as a basis, which worked very well. You need to click on some of the slides to make stuff happen&#8230;</p>

<p>As I said in the talk, D3 is really great to use, as it is about code not configuration, and lets you really do what you want. It is purely based on working with data and the DOM, without other abstraction layers. I highly recommend it!</p>

<p>One thing I mentioned in the talk was using Google spreadsheets for storing arbitrary data that doesn&#8217;t already have somewhere to live. I have found this worked out quite well with clients who want an easy &#8220;CMS&#8221; for data that they can update and feed straight through to graphs and maps etc on the website. A few people asked about this, so I thought I would give a quick guide.</p>

<p>I have made a public Google spreadsheet <a href="https://docs.google.com/spreadsheet/ccc?key=0Agr9u3lfdOGHdHZoc1dVYXhDdl9GSk5SbVRBSGZYRFE">for these examples</a>. It will probably get messed up, as it is public!</p>

<p>The easy way to use this is just to use CSV, as once you make the spreadsheet public, the interface will show you the CSV URL (of the first sheet only, regardless of the setting). In this case <a href="https://docs.google.com/spreadsheet/pub?hl=en_US&amp;hl=en_US&amp;key=0Agr9u3lfdOGHdHZoc1dVYXhDdl9GSk5SbVRBSGZYRFE&amp;single=true&amp;gid=0&amp;output=csv">the link is https://docs.google.com/spreadsheet/pub?hl=en_US&amp;hl=en_US&amp;key=0Agr9u3lfdOGHdHZoc1dVYXhDdl9GSk5SbVRBSGZYRFE&amp;single=true&amp;gid=0&amp;output=csv</a> which gave me the following:</p>

<p><pre>
date,colour,value,genre,french
2011-01-01,red,2,rock,roches
2011-01-02,blue,1,grunge,grunge
2011-01-03,orange,66,pop,pop
2011-01-04,green,4,metal,métallique
2011-01-05,pink,6,jazz,le jazz
2011-01-06,brown,5,punk,Punk
2011-01-07,purple,33,rap,Rap
2011-01-08,black,12,reggae,reggae
2011-01-09,white,11,alternative,alternatives
2011-01-10,gray,5,dance,danse
2011-01-11,violet,14,country,pays
2011-01-12,yellow,9,blues,le blues
2011-01-13,indigo,10,funk,funk
</pre></p>

<p>This is clearly usable in D3, but it does have cross domain issues in most browsers, so you would normally have to proxy it back to your domain if you are calling via Ajax.</p>

<p>The options in the &#8220;publish to the web&#8221; menu for giving you URLs are HTML, CSV, text, PDF, XLS or ODS, none of which is what we really want, which is JSONP. There is a JSONP option, however; it just has a very different URL, which is designed to be obtained through the API. The problem with the API is that all the calls are authenticated, so it is a real pain just to extract a (public!) URL. After hacking around with the Python client example code, I found a (messy) workaround.</p>

<p>Every spreadsheet on Google has an ID, and each sheet within it also has an ID. These do not correspond to the keys in the public access links. You can get the spreadsheet ID from within the Javascript API available for spreadsheets, as <code>SpreadsheetApp.getActiveSpreadsheet().getId()</code>. You cannot get the sheet ID, but the first sheet you create always has the ID <code>od6</code>, the next <code>od7</code>, and then some other values that do not seem to change. So to write a little function that gives you the spreadsheet JSONP URL, do the following:</p>

<ol>
<li>Go to Tools / Script Editor from your spreadsheet</li>
<li>Add the following function:
<pre>
function getURL() {
  // od6 is the ID of the first sheet; append your callback name to the result
  return "http://spreadsheets.google.com/feeds/list/" + SpreadsheetApp.getActiveSpreadsheet().getId() + "/od6/public/values?alt=json-in-script&amp;callback=";
}
</pre>
</li>
<li>Save the script</li>
<li>You can now use <code>=getURL()</code> as a formula in the spreadsheet, which will return the URL, in this case <code>http://spreadsheets.google.com/feeds/list/tvhsWUaxCv_FJNRmTAHfXDQ/od6/public/values?alt=json-in-script&amp;callback=</code>. You just need to add your callback name.</li>
</ol>

<p>If we open that URL with a callback of <code>test</code> we get back:
<pre>
// API callback
test(
{ "encoding" : "UTF-8",
  "feed" : { "author" : [ { "email" : { "$t" : "justin@specialbusservice.com" },
            "name" : { "$t" : "justin" }
          } ],
      "category" : [ { "scheme" : "http://schemas.google.com/spreadsheets/2006",
            "term" : "http://schemas.google.com/spreadsheets/2006#list"
          } ],
      "entry" : [ { "category" : [ { "scheme" : "http://schemas.google.com/spreadsheets/2006",
                  "term" : "http://schemas.google.com/spreadsheets/2006#list"
                } ],
            "content" : { "$t" : "colour: red, value: 2, genre: rock, french: roches",
                "type" : "text"
              },
            "gsx$colour" : { "$t" : "red" },
            "gsx$date" : { "$t" : "1/1/2011" },
            "gsx$french" : { "$t" : "roches" },
            "gsx$genre" : { "$t" : "rock" },
            "gsx$value" : { "$t" : "2" },
            "id" : { "$t" : "https://spreadsheets.google.com/feeds/list/tvhsWUaxCv_FJNRmTAHfXDQ/od6/public/values/cokwr" },
            "link" : [ { "href" : "http://spreadsheets.google.com/feeds/list/tvhsWUaxCv_FJNRmTAHfXDQ/od6/public/values/cokwr",
                  "rel" : "self",
                  "type" : "application/atom+xml"
                } ],
            "title" : { "$t" : "1/1/2011",
                "type" : "text"
              },
            "updated" : { "$t" : "2012-01-29T15:22:57.368Z" }
          },
          { "category" : [ { "scheme" : "http://schemas.google.com/spreadsheets/2006",
                  "term" : "http://schemas.google.com/spreadsheets/2006#list"
                } ],
            "content" : { "$t" : "colour: blue, value: 1, genre: grunge, french: grunge",
                "type" : "text"
              },
            "gsx$colour" : { "$t" : "blue" },
            "gsx$date" : { "$t" : "1/2/2011" },
            "gsx$french" : { "$t" : "grunge" },
            "gsx$genre" : { "$t" : "grunge" },
            "gsx$value" : { "$t" : "1" },
            "id" : { "$t" : "https://spreadsheets.google.com/feeds/list/tvhsWUaxCv_FJNRmTAHfXDQ/od6/public/values/cpzh4" },
            "link" : [ { "href" : "http://spreadsheets.google.com/feeds/list/tvhsWUaxCv_FJNRmTAHfXDQ/od6/public/values/cpzh4",
                  "rel" : "self",
                  "type" : "application/atom+xml"
                } ],
            "title" : { "$t" : "1/2/2011",
                "type" : "text"
              },
            "updated" : { "$t" : "2012-01-29T15:22:57.368Z" }
          },
          { "category" : [ { "scheme" : "http://schemas.google.com/spreadsheets/2006",
                  "term" : "http://schemas.google.com/spreadsheets/2006#list"
                } ],
            "content" : { "$t" : "colour: orange, value: 66, genre: pop, french: pop",
                "type" : "text"
              },
            "gsx$colour" : { "$t" : "orange" },
            "gsx$date" : { "$t" : "1/3/2011" },
            "gsx$french" : { "$t" : "pop" },
            "gsx$genre" : { "$t" : "pop" },
            "gsx$value" : { "$t" : "66" },
            "id" : { "$t" : "https://spreadsheets.google.com/feeds/list/tvhsWUaxCv_FJNRmTAHfXDQ/od6/public/values/cre1l" },
            "link" : [ { "href" : "http://spreadsheets.google.com/feeds/list/tvhsWUaxCv_FJNRmTAHfXDQ/od6/public/values/cre1l",
                  "rel" : "self",
                  "type" : "application/atom+xml"
                } ],
            "title" : { "$t" : "1/3/2011",
                "type" : "text"
              },
            "updated" : { "$t" : "2012-01-29T15:22:57.368Z" }
          },
          { "category" : [ { "scheme" : "http://schemas.google.com/spreadsheets/2006",
                  "term" : "http://schemas.google.com/spreadsheets/2006#list"
                } ],
            "content" : { "$t" : "colour: green, value: 4, genre: metal, french: métallique",
                "type" : "text"
              },
            "gsx$colour" : { "$t" : "green" },
            "gsx$date" : { "$t" : "1/4/2011" },
            "gsx$french" : { "$t" : "métallique" },
            "gsx$genre" : { "$t" : "metal" },
            "gsx$value" : { "$t" : "4" },
            "id" : { "$t" : "https://spreadsheets.google.com/feeds/list/tvhsWUaxCv_FJNRmTAHfXDQ/od6/public/values/chk2m" },
            "link" : [ { "href" : "http://spreadsheets.google.com/feeds/list/tvhsWUaxCv_FJNRmTAHfXDQ/od6/public/values/chk2m",
                  "rel" : "self",
                  "type" : "application/atom+xml"
                } ],
            "title" : { "$t" : "1/4/2011",
                "type" : "text"
              },
            "updated" : { "$t" : "2012-01-29T15:22:57.368Z" }
          },
          { "category" : [ { "scheme" : "http://schemas.google.com/spreadsheets/2006",
                  "term" : "http://schemas.google.com/spreadsheets/2006#list"
                } ],
            "content" : { "$t" : "colour: pink, value: 6, genre: jazz, french: le jazz",
                "type" : "text"
              },
            "gsx$colour" : { "$t" : "pink" },
            "gsx$date" : { "$t" : "1/5/2011" },
            "gsx$french" : { "$t" : "le jazz" },
            "gsx$genre" : { "$t" : "jazz" },
            "gsx$value" : { "$t" : "6" },
            "id" : { "$t" : "https://spreadsheets.google.com/feeds/list/tvhsWUaxCv_FJNRmTAHfXDQ/od6/public/values/ciyn3" },
            "link" : [ { "href" : "http://spreadsheets.google.com/feeds/list/tvhsWUaxCv_FJNRmTAHfXDQ/od6/public/values/ciyn3",
                  "rel" : "self",
                  "type" : "application/atom+xml"
                } ],
            "title" : { "$t" : "1/5/2011",
                "type" : "text"
              },
            "updated" : { "$t" : "2012-01-29T15:22:57.368Z" }
          },
          { "category" : [ { "scheme" : "http://schemas.google.com/spreadsheets/2006",
                  "term" : "http://schemas.google.com/spreadsheets/2006#list"
                } ],
            "content" : { "$t" : "colour: brown, value: 5, genre: punk, french: Punk",
                "type" : "text"
              },
            "gsx$colour" : { "$t" : "brown" },
            "gsx$date" : { "$t" : "1/6/2011" },
            "gsx$french" : { "$t" : "Punk" },
            "gsx$genre" : { "$t" : "punk" },
            "gsx$value" : { "$t" : "5" },
            "id" : { "$t" : "https://spreadsheets.google.com/feeds/list/tvhsWUaxCv_FJNRmTAHfXDQ/od6/public/values/ckd7g" },
            "link" : [ { "href" : "http://spreadsheets.google.com/feeds/list/tvhsWUaxCv_FJNRmTAHfXDQ/od6/public/values/ckd7g",
                  "rel" : "self",
                  "type" : "application/atom+xml"
                } ],
            "title" : { "$t" : "1/6/2011",
                "type" : "text"
              },
            "updated" : { "$t" : "2012-01-29T15:22:57.368Z" }
          },
          { "category" : [ { "scheme" : "http://schemas.google.com/spreadsheets/2006",
                  "term" : "http://schemas.google.com/spreadsheets/2006#list"
                } ],
            "content" : { "$t" : "colour: purple, value: 33, genre: rap, french: Rap",
                "type" : "text"
              },
            "gsx$colour" : { "$t" : "purple" },
            "gsx$date" : { "$t" : "1/7/2011" },
            "gsx$french" : { "$t" : "Rap" },
            "gsx$genre" : { "$t" : "rap" },
            "gsx$value" : { "$t" : "33" },
            "id" : { "$t" : "https://spreadsheets.google.com/feeds/list/tvhsWUaxCv_FJNRmTAHfXDQ/od6/public/values/clrrx" },
            "link" : [ { "href" : "http://spreadsheets.google.com/feeds/list/tvhsWUaxCv_FJNRmTAHfXDQ/od6/public/values/clrrx",
                  "rel" : "self",
                  "type" : "application/atom+xml"
                } ],
            "title" : { "$t" : "1/7/2011",
                "type" : "text"
              },
            "updated" : { "$t" : "2012-01-29T15:22:57.368Z" }
          },
          { "category" : [ { "scheme" : "http://schemas.google.com/spreadsheets/2006",
                  "term" : "http://schemas.google.com/spreadsheets/2006#list"
                } ],
            "content" : { "$t" : "colour: black, value: 12, genre: reggae, french: reggae",
                "type" : "text"
              },
            "gsx$colour" : { "$t" : "black" },
            "gsx$date" : { "$t" : "1/8/2011" },
            "gsx$french" : { "$t" : "reggae" },
            "gsx$genre" : { "$t" : "reggae" },
            "gsx$value" : { "$t" : "12" },
            "id" : { "$t" : "https://spreadsheets.google.com/feeds/list/tvhsWUaxCv_FJNRmTAHfXDQ/od6/public/values/cyevm" },
            "link" : [ { "href" : "http://spreadsheets.google.com/feeds/list/tvhsWUaxCv_FJNRmTAHfXDQ/od6/public/values/cyevm",
                  "rel" : "self",
                  "type" : "application/atom+xml"
                } ],
            "title" : { "$t" : "1/8/2011",
                "type" : "text"
              },
            "updated" : { "$t" : "2012-01-29T15:22:57.368Z" }
          },
          { "category" : [ { "scheme" : "http://schemas.google.com/spreadsheets/2006",
                  "term" : "http://schemas.google.com/spreadsheets/2006#list"
                } ],
            "content" : { "$t" : "colour: white, value: 11, genre: alternative, french: alternatives",
                "type" : "text"
              },
            "gsx$colour" : { "$t" : "white" },
            "gsx$date" : { "$t" : "1/9/2011" },
            "gsx$french" : { "$t" : "alternatives" },
            "gsx$genre" : { "$t" : "alternative" },
            "gsx$value" : { "$t" : "11" },
            "id" : { "$t" : "https://spreadsheets.google.com/feeds/list/tvhsWUaxCv_FJNRmTAHfXDQ/od6/public/values/cztg3" },
            "link" : [ { "href" : "http://spreadsheets.google.com/feeds/list/tvhsWUaxCv_FJNRmTAHfXDQ/od6/public/values/cztg3",
                  "rel" : "self",
                  "type" : "application/atom+xml"
                } ],
            "title" : { "$t" : "1/9/2011",
                "type" : "text"
              },
            "updated" : { "$t" : "2012-01-29T15:22:57.368Z" }
          },
          { "category" : [ { "scheme" : "http://schemas.google.com/spreadsheets/2006",
                  "term" : "http://schemas.google.com/spreadsheets/2006#list"
                } ],
            "content" : { "$t" : "colour: gray, value: 5, genre: dance, french: danse",
                "type" : "text"
              },
            "gsx$colour" : { "$t" : "gray" },
            "gsx$date" : { "$t" : "1/10/2011" },
            "gsx$french" : { "$t" : "danse" },
            "gsx$genre" : { "$t" : "dance" },
            "gsx$value" : { "$t" : "5" },
            "id" : { "$t" : "https://spreadsheets.google.com/feeds/list/tvhsWUaxCv_FJNRmTAHfXDQ/od6/public/values/d180g" },
            "link" : [ { "href" : "http://spreadsheets.google.com/feeds/list/tvhsWUaxCv_FJNRmTAHfXDQ/od6/public/values/d180g",
                  "rel" : "self",
                  "type" : "application/atom+xml"
                } ],
            "title" : { "$t" : "1/10/2011",
                "type" : "text"
              },
            "updated" : { "$t" : "2012-01-29T15:22:57.368Z" }
          },
          { "category" : [ { "scheme" : "http://schemas.google.com/spreadsheets/2006",
                  "term" : "http://schemas.google.com/spreadsheets/2006#list"
                } ],
            "content" : { "$t" : "colour: violet, value: 14, genre: country, french: pays",
                "type" : "text"
              },
            "gsx$colour" : { "$t" : "violet" },
            "gsx$date" : { "$t" : "1/11/2011" },
            "gsx$french" : { "$t" : "pays" },
            "gsx$genre" : { "$t" : "country" },
            "gsx$value" : { "$t" : "14" },
            "id" : { "$t" : "https://spreadsheets.google.com/feeds/list/tvhsWUaxCv_FJNRmTAHfXDQ/od6/public/values/d2mkx" },
            "link" : [ { "href" : "http://spreadsheets.google.com/feeds/list/tvhsWUaxCv_FJNRmTAHfXDQ/od6/public/values/d2mkx",
                  "rel" : "self",
                  "type" : "application/atom+xml"
                } ],
            "title" : { "$t" : "1/11/2011",
                "type" : "text"
              },
            "updated" : { "$t" : "2012-01-29T15:22:57.368Z" }
          },
          { "category" : [ { "scheme" : "http://schemas.google.com/spreadsheets/2006",
                  "term" : "http://schemas.google.com/spreadsheets/2006#list"
                } ],
            "content" : { "$t" : "colour: yellow, value: 9, genre: blues, french: le blues",
                "type" : "text"
              },
            "gsx$colour" : { "$t" : "yellow" },
            "gsx$date" : { "$t" : "1/12/2011" },
            "gsx$french" : { "$t" : "le blues" },
            "gsx$genre" : { "$t" : "blues" },
            "gsx$value" : { "$t" : "9" },
            "id" : { "$t" : "https://spreadsheets.google.com/feeds/list/tvhsWUaxCv_FJNRmTAHfXDQ/od6/public/values/cssly" },
            "link" : [ { "href" : "http://spreadsheets.google.com/feeds/list/tvhsWUaxCv_FJNRmTAHfXDQ/od6/public/values/cssly",
                  "rel" : "self",
                  "type" : "application/atom+xml"
                } ],
            "title" : { "$t" : "1/12/2011",
                "type" : "text"
              },
            "updated" : { "$t" : "2012-01-29T15:22:57.368Z" }
          },
          { "category" : [ { "scheme" : "http://schemas.google.com/spreadsheets/2006",
                  "term" : "http://schemas.google.com/spreadsheets/2006#list"
                } ],
            "content" : { "$t" : "colour: indigo, value: 10, genre: funk, french: funk",
                "type" : "text"
              },
            "gsx$colour" : { "$t" : "indigo" },
            "gsx$date" : { "$t" : "1/13/2011" },
            "gsx$french" : { "$t" : "funk" },
            "gsx$genre" : { "$t" : "funk" },
            "gsx$value" : { "$t" : "10" },
            "id" : { "$t" : "https://spreadsheets.google.com/feeds/list/tvhsWUaxCv_FJNRmTAHfXDQ/od6/public/values/cu76f" },
            "link" : [ { "href" : "http://spreadsheets.google.com/feeds/list/tvhsWUaxCv_FJNRmTAHfXDQ/od6/public/values/cu76f",
                  "rel" : "self",
                  "type" : "application/atom+xml"
                } ],
            "title" : { "$t" : "1/13/2011",
                "type" : "text"
              },
            "updated" : { "$t" : "2012-01-29T15:22:57.368Z" }
          }
        ],
      "id" : { "$t" : "https://spreadsheets.google.com/feeds/list/tvhsWUaxCv_FJNRmTAHfXDQ/od6/public/values" },
      "link" : [ { "href" : "https://spreadsheets.google.com/pub?key=tvhsWUaxCv_FJNRmTAHfXDQ",
            "rel" : "alternate",
            "type" : "text/html"
          },
          { "href" : "http://spreadsheets.google.com/feeds/list/tvhsWUaxCv_FJNRmTAHfXDQ/od6/public/values",
            "rel" : "http://schemas.google.com/g/2005#feed",
            "type" : "application/atom+xml"
          },
          { "href" : "http://spreadsheets.google.com/feeds/list/tvhsWUaxCv_FJNRmTAHfXDQ/od6/public/values?alt=json-in-script",
            "rel" : "self",
            "type" : "application/atom+xml"
          }
        ],
      "openSearch$startIndex" : { "$t" : "1" },
      "openSearch$totalResults" : { "$t" : "13" },
      "title" : { "$t" : "data",
          "type" : "text"
        },
      "updated" : { "$t" : "2012-01-29T15:22:57.368Z" },
      "xmlns" : "http://www.w3.org/2005/Atom",
      "xmlns$gsx" : "http://schemas.google.com/spreadsheets/2006/extended",
      "xmlns$openSearch" : "http://a9.com/-/spec/opensearchrss/1.0/"
    },
  "version" : "1.0"
}
);
</pre></p>

<p>Now you may think that is odd JSON, but it is a literal JSON conversion of Atom, which is why it has a lot of apparent junk. The bit we want is the array of objects in <code>feed.entry</code>, each of which contains, among other entries:
<pre>
[         { ...
            "gsx$colour" : { "$t" : "red" },
            "gsx$date" : { "$t" : "1/1/2011" },
            "gsx$french" : { "$t" : "roches" },
            "gsx$genre" : { "$t" : "rock" },
            "gsx$value" : { "$t" : "2" },...
          }, ...
]
</pre>
Each column heading will be preceded by <code>gsx$</code>, and have a <code>$t</code> value, so you just need your JSONP callback function to iterate over <code>feed.entry</code> and parse <code>gsx$date.$t</code> and so on, which is pretty simple. This then solves the cross domain issues and you can easily call this direct from the page to render a graph in a callback without any server side code at all.</p>
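
<p>A minimal sketch of what such a callback could do with the data, assuming the response shape shown above. The helper names <code>parseSheetDate</code> and <code>rowsFromFeed</code> are made up for illustration, and the date parsing assumes the month/day/year strings seen in this sample output; other spreadsheet locales may format dates differently.</p>

```javascript
// Hypothetical helper: turn "1/13/2011" (month/day/year, as in the sample feed) into a Date.
function parseSheetDate(text) {
  var parts = text.split("/").map(Number);
  // JavaScript months are zero-based, so subtract one from the month.
  return new Date(parts[2], parts[0] - 1, parts[1]);
}

// Hypothetical helper: flatten feed.entry into plain row objects,
// keeping only the gsx$ columns and stripping the prefix.
function rowsFromFeed(data) {
  return data.feed.entry.map(function (entry) {
    var row = {};
    Object.keys(entry).forEach(function (key) {
      if (key.indexOf("gsx$") === 0) {
        row[key.slice(4)] = entry[key].$t;
      }
    });
    return row;
  });
}

// Trimmed-down sample in the same shape as the response above.
var sample = { feed: { entry: [
  { "gsx$colour": { "$t": "red" }, "gsx$date": { "$t": "1/1/2011" }, "gsx$value": { "$t": "2" }, "title": { "$t": "1/1/2011" } },
  { "gsx$colour": { "$t": "blue" }, "gsx$date": { "$t": "1/2/2011" }, "gsx$value": { "$t": "1" }, "title": { "$t": "1/2/2011" } }
] } };

// Convert the string values into the types a graph would want.
var rows = rowsFromFeed(sample).map(function (row) {
  return { colour: row.colour, date: parseSheetDate(row.date), value: Number(row.value) };
});
console.log(rows);
```

<p>In the page, the same two helpers would be called from inside the JSONP callback (the <code>test</code> function in the example URL above) before handing the rows to D3.</p>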

<p>Bonkers, you may think. But it does work, and what you get is a reliable, easy-to-use, HTTP-addressable tabular data source. Really, the whole of Google docs is like this, half genius half biscuit. It could be a great service, but almost seems to be wrapped up in legacy already. It could be so much better, but improvement seems slow; the work seems to have gone into the actual editing side not the backend, which is also saddled with Excel compatibility goals. A lot of scope for competition in this space.</p>

<p>Anyway, I hope that helps. I really do think for many applications it is a great backend system, if you hide the complexity from the end users!</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.technologyofcontent.com/2012/01/data-driven-documents-talk/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Re-architecting for the ‘green’ cloud and lower costs</title>
		<link>http://blog.technologyofcontent.com/2011/07/re-architecting-for-the-%e2%80%98green%e2%80%99-cloud-and-lower-costs/</link>
		<comments>http://blog.technologyofcontent.com/2011/07/re-architecting-for-the-%e2%80%98green%e2%80%99-cloud-and-lower-costs/#comments</comments>
		<pubDate>Wed, 06 Jul 2011 17:56:41 +0000</pubDate>
		<dc:creator>justin</dc:creator>
				<category><![CDATA[cloud]]></category>

		<guid isPermaLink="false">http://blog.technologyofcontent.com/?p=381</guid>
		<description><![CDATA[These are my slides from my talk at London Cloudcamp, 6 July 2011. I will write up some more on this later.]]></description>
			<content:encoded><![CDATA[<p>These are my slides from my talk at <a href="http://www.cloudcamp.org/london/2011-07-06">London Cloudcamp</a>, 6 July 2011. I will write up some more on this later.</p>

<p><img alt="" src="http://public.edge3.org/slides-0.png" title="Slide 0" class="aligncenter" width="480" />
<img alt="" src="http://public.edge3.org/slides-1.png" title="Slide 1" class="aligncenter" width="480" />
<img alt="" src="http://public.edge3.org/slides-2.png" title="Slide 2" class="aligncenter" width="480" />
<img alt="" src="http://public.edge3.org/slides-3.png" title="Slide 3" class="aligncenter" width="480" />
<img alt="" src="http://public.edge3.org/slides-4.png" title="Slide 4" class="aligncenter" width="480" />
<img alt="" src="http://public.edge3.org/slides-5.png" title="Slide 5" class="aligncenter" width="480" />
<img alt="" src="http://public.edge3.org/slides-6.png" title="Slide 6" class="aligncenter" width="480" />
<img alt="" src="http://public.edge3.org/slides-7.png" title="Slide 7" class="aligncenter" width="480" />
<img alt="" src="http://public.edge3.org/slides-8.png" title="Slide 8" class="aligncenter" width="480" />
<img alt="" src="http://public.edge3.org/slides-9.png" title="Slide 9" class="aligncenter" width="480" /></p>
]]></content:encoded>
			<wfw:commentRss>http://blog.technologyofcontent.com/2011/07/re-architecting-for-the-%e2%80%98green%e2%80%99-cloud-and-lower-costs/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>This blog is now ipv6 enabled</title>
		<link>http://blog.technologyofcontent.com/2011/05/this-blog-is-now-ipv6-enabled/</link>
		<comments>http://blog.technologyofcontent.com/2011/05/this-blog-is-now-ipv6-enabled/#comments</comments>
		<pubDate>Fri, 27 May 2011 14:28:30 +0000</pubDate>
		<dc:creator>justin</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[ipv6]]></category>

		<guid isPermaLink="false">http://blog.technologyofcontent.com/?p=353</guid>
		<description><![CDATA[I remembered this morning that World IPv6 day is coming up soon, on June 8, so I thought I might get ready in advance, and let any of you retro old tech IPv4 people know how you need to get up to speed. I actually had native IPv6 ADSL many years ago, with the now [...]]]></description>
			<content:encoded><![CDATA[<p>I remembered this morning that <a href="http://isoc.org/wp/worldipv6day/">World IPv6 day</a> is coming up soon, on June 8, so I thought I might get ready in advance, and let any of you retro old tech IPv4 people know how you need to get up to speed.</p>

<p>I actually had native IPv6 ADSL many years ago, with the now defunct <a href="http://www.mythic-beasts.com/bcn.html">Black Cat Networks</a>, although at some point IPv6 routing just stopped working. Apart from being able to ping6 my friend who had the same setup it was not terribly useful, although I did test all my code and make sure it was all IPv6 ready. But now there are actually compelling reasons for IPv6, like the fact it is getting harder to get IPv4 addresses. In particular I have been spending some time on Linux containers (lxc), and if you want to run network isolated containers they need addresses. On many cloud services you cannot get extra IPv4 addresses, so either you use internal NAT, yuk, or you use IPv6. Also the first steps towards Amazon web services <a href="http://aws.amazon.com/about-aws/whats-new/2011/05/24/elb-ipv6-zoneapex-securitygroups/">supporting IPv6</a> came out the other day, with ELB supporting IPv6 termination. There is not yet support for IPv6 addresses on EC2 instances directly, but you can add tunnels; see <a href="http://www.quora.com/Does-Amazon-Web-Services-support-IPv6">my answer on Quora for the current status</a>.</p>

<p>So how do you go about getting yourself IPv6 enabled? First, this blog. It is currently hosted on <a href="http://www.slicehost.com/">Slicehost</a>, which does not yet have native IPv6 support, although they will have later in the year, as part of their full merger into Rackspace. I keep wondering about moving to Linode as it is cheaper, although the way they <a href="http://www.linode.com/IPv6/">have set up IPv6 seems odd to me</a>.</p>

<p>Stage one is signing up with someone who will give you an IPv6 tunnel to an IPv4 address. I would recommend Hurricane Electric, as they have global tunnel endpoints and a good service. <a href="http://tunnelbroker.net/">Sign up here</a>, and you can have 5 free tunnels. It is very simple: just provide the IPv4 endpoint and your details, choose an endpoint near your server, and you will be given an IPv6 /64 block and some helpful instructions on how to configure it for your OS. They had an endpoint in Dallas, where my Slicehost is, which has a ping time of 1ms, which is nice. I used the modern style <code>ip</code> setup, which I just added to <code>/etc/rc.local</code> so it is recreated on reboot:</p>

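<pre><code># Load the IPv6 kernel module
modprobe ipv6
# Create a 6in4 (sit) tunnel: remote is the tunnel broker endpoint, local is this server's IPv4 address
ip tunnel add he-ipv6 mode sit remote 216.218.224.42 local 67.23.6.148 ttl 255
ip link set he-ipv6 up
# Give our end of the tunnel its IPv6 address
ip addr add 2001:470:1f0e:6e6::2/64 dev he-ipv6
# Send all IPv6 traffic through the tunnel
ip route add ::/0 dev he-ipv6
</code></pre>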

<p>Very easy, no authentication or anything, as it is all fixed by IP address. My IP is <code>2001:470:1f0e:6e6::2</code>. What&#8217;s the /64, you ask? That&#8217;s a block of 18,446,744,073,709,551,616 IP addresses, a handful of which are reserved. That is generally the smallest globally routable allocation, and should be fine for an organization to allocate to their users and devices, including virtual devices and so on. It also makes it easy to autoconfigure devices, as they can use their MAC addresses or random numbers to pick an address within the allocated block, so configuration like DHCP is not generally needed.</p>
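
<p>The size of a /64 block follows from the address length: an IPv6 address is 128 bits, and the prefix fixes the first 64 of them. A quick sketch of the arithmetic, with variable names chosen just for illustration (it needs a JavaScript engine with BigInt support, as an ordinary number cannot hold the result exactly):</p>

```javascript
// An IPv6 address is 128 bits; a /64 prefix fixes the first 64,
// leaving 128 - 64 = 64 bits free for addresses within the block.
var prefixLength = 64;
var hostBits = 128n - BigInt(prefixLength);

// Each free bit doubles the number of addresses, so the block holds 2^64.
var blockSize = 2n ** hostBits;
console.log(blockSize.toString()); // 18446744073709551616
```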

<p>Next we need to configure DNS records for IPv6, that is AAAA records in addition to the A records that give IPv4 addresses. If your DNS provider does not support AAAA records it is probably time to get a new one! I use <a href="http://www.zerigo.com">Zerigo</a> for my Blog domain, and it was very easy to add records pointing at <code>2001:470:1f0e:6e6::2</code>.</p>

<p>After that you need to make sure your services are listening on IPv6. <code>netstat</code> lists IPv4 only listening sockets as <code>*:*</code>, while IPv6 sockets, which can listen on IPv4 addresses too as there is a legacy mapping, are listed as <code>[::]:*</code> instead. In my case, I had to change Nginx to listen on IPv6 too, with the config being <code>listen [::]:80;</code> instead of <code>listen :80;</code>. Restart the web server, and then you can use the <a href="http://ipv6-test.com/validate.php">IPv6 test site validator</a> to check for you that everything is up and ready, and get your badge:</p>

<p><!-- IPv6-test.com button BEGIN -->
<a href='http://ipv6-test.com/validate.php?url=referer'><img src='http://ipv6-test.com/button-ipv6-small.png' alt='ipv6 ready' title='ipv6 ready' border='0' /></a>
<!-- IPv6-test.com button END --></p>

<p>All very well, but you really want to test it yourself, don&#8217;t you? And of course you have other software you need to test on IPv6, so you want to set up your desktop machine with IPv6 too. Time to make another tunnel. Now this is one of those cases where maybe you do want to try this at home. Or at least tell your sysadmin before setting it up in the office. With IPv6, machines are directly addressable on the internet, and the tunnel probably bypasses any firewalling. With all the IPv6 addresses, portscanning is really difficult, unlike IPv4, but you should treat machines as connected and keep them secure, which of course you do at home, but judging from the amount of IE6 around does not happen at work so much.</p>

<p>Now my home connection is ADSL with a not very good modem, that I have been meaning to replace with <a href="http://www.draytek.co.uk/products/vigor120.html">a proper one</a>. The main issue is I don&#8217;t really know how stable my IP is, so just going to have to experiment. If I had a UK server I could make my own tunnel I suppose, but will see how I manage with a Hurricane Electric tunnel which I will have to recreate if the IPv4 address changes, possibly with a new IPv6 address too. My guess is the IPv4 address will be pretty stable, it was the same over a box reset just now. Also Hurricane Electric have an <a href="http://ipv4.tunnelbroker.net/ipv4_end.php">API call</a> to change your tunnel address! They also have an endpoint in London, 20ms away from me.</p>

<p>First issue was that Hurricane Electric complained it could not ping my IP address. I looked through the config options on the O2 ADSL box (really a Speedtouch WL780) and there was nothing, so the internet found the answer. <code>telnet</code> to the box, login as <code>SuperUser</code> with password <code>O2Br0ad64nd</code> and type <code>service system ifadd name=PING_RESPONDER group=wan</code>, then <code>saveall</code> to write the config to flash and all will be well. There is a <a href="http://www.elion.ee/docs/abi_info/kasiraamat/speedtouch_st706wl-780wl_cli.pdf">manual for the router online</a>.</p>

<p>Next realization though was that I need to get the ADSL NAT to forward the encapsulated IPv6 traffic to my Linux box, as the router has no ability to do IPv6 itself. At some point I will move the termination to my nice ultra low power dual core 1GHz ARM <a href="http://trimslice.com/web/">Trimslice</a> box. So back to the <code>telnet</code> and some helpful guides, <a href="http://www.hetlab.tk/kostunrix/ipv6-tunnel-werkt">all written in Dutch</a>, thanks Chrome for the translations. Make sure you don&#8217;t translate the router commands! Replace YOURNATIP with the NAT address of your IPv6 endpoint.</p>

<pre><code>expr add name=IPv6to4_prot_41 type=serv proto=41
firewall rule add chain=forward_host_service index=10 name=map_41 serv=IPv6to4_prot_41 log=enabled state=enabled action=accept
nat tmpladd group=wan type=nat outside_addr=0.0.0.1 inside_addr=YOURNATIP protocol=6to4 weight=50
saveall
</code></pre>

<p>Then set up the Hurricane Electric tunnel on the endpoint, using the NAT address:</p>

<pre><code>modprobe ipv6
ip tunnel add he-ipv6 mode sit remote 216.66.80.26 local 192.168.1.77 ttl 255
ip link set he-ipv6 up
ip addr add 2001:470:1f08:1962::2/64 dev he-ipv6
ip route add ::/0 dev he-ipv6
</code></pre>
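
<p>Those commands do not survive a reboot. As a sketch, assuming a Debian-style system that uses <code>/etc/network/interfaces</code> (an assumption on my part, other distributions differ), the same tunnel can be declared as below. The addresses are the ones from the commands above, and the gateway is the far end of the tunnel.</p>

<pre><code># /etc/network/interfaces fragment: persistent form of the ip commands above
auto he-ipv6
iface he-ipv6 inet6 v4tunnel
    address 2001:470:1f08:1962::2
    netmask 64
    endpoint 216.66.80.26
    local 192.168.1.77
    ttl 255
    gateway 2001:470:1f08:1962::1
</code></pre>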

<p>Now you should be able to <code>ping6</code> your IPv6 address both inbound and outbound, and reach <code>ipv6.google.com</code> for example. Going to an IPv6 test site such as <a href="http://test-ipv6.com/">http://test-ipv6.com/</a> finds the IPv4 and IPv6 addresses. It does point out that the name server I am using does not have any IPv6 support though, which could cause issues with IPv6 only sites with IPv6 only name servers. Not going to fix that for the moment though. So we can browse the web on IPv6, at least, although it is quite hard to tell if you actually are. You can check your web server logs: IPv4 addresses are listed in IPv6 notation as <code>::ffff:188.220.243.64</code>, while the real IPv6 addresses we are browsing from are given in full, so we can see that Firefox for example will use IPv6 addresses by default when available.</p>

<p>One more thing is the rest of the machines on the network at home (or in the office!). We have our /64 allocation, so let&#8217;s use it. All you need to do is run a server that will reply to the stateless autoconfiguration requests, which is <code>radvd</code> on Linux:</p>

<pre><code>apt-get install radvd
sysctl -w net.ipv6.conf.all.forwarding=1
</code></pre>

<p>Set the <code>sysctl</code> setting in <code>/etc/sysctl.conf</code> too so it is persistent. The forwarding setting stops you receiving IPv6 router advertisements and lets you forward IPv6. Then create an <code>/etc/radvd.conf</code> for the interface you want to broadcast on; yes, mine is <code>br0</code> as I have a bridge set up for my VMs.</p>

<pre><code> interface br0
{
  AdvSendAdvert on;
  MaxRtrAdvInterval 4;
  prefix 2001:470:1f09:1962::/64
  {
    AdvRouterAddr on;
  };
};   
</code></pre>
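
<p>For the persistent <code>sysctl</code> setting mentioned above, the line in <code>/etc/sysctl.conf</code> simply mirrors the <code>sysctl -w</code> command:</p>

<pre><code># /etc/sysctl.conf: forward IPv6 between interfaces (read at boot, or reload with sysctl -p)
net.ipv6.conf.all.forwarding = 1
</code></pre>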

<p>Now you can start <code>radvd</code> and set up the service to be auto restarted. And then by magic any computers in the vicinity that do not have IPv6 disabled will automagically get an IPv6 address, based on their MAC address and your prefix. Magic! My Linux netbook got one, but I could not ping anything. First issue was that my bridge (or in your case your network interface) did not have an IPv6 address, so I added one with <code>ip addr add 2001:470:1f09:1962::2/64 dev br0</code>. Now I could <code>ping6</code> that, but no external routes worked. It turns out that IPv6 forwarding is all very well, but you need explicit routing tables too. So add <code>ip -6 route add 2000::/3 via 2001:470:1f08:1962::1</code>, where <code>2000::/3</code> is all global routes, and the <code>2001:470:1f08:1962::1</code> is the other end of your IPv6 tunnel. Sorted! Well, kind of: both the machines I am testing with, Linux and Mac, were constantly losing routing and the ability to <code>ping6</code> the gateway. Odd, as the gateway machine was fine. OK, fixed: see <a href="http://www.tunnelbroker.net/forums/index.php?topic=159.0">this forum page</a>. The link address is not the allocated routing address: <code>2001:470:1f08:1962::/64</code> is the network of the point to point link, but my routed network is the one after, that is <code>2001:470:1f09:1962::/64</code>. This is listed when you log in to Hurricane Electric again. I have changed the configs above now. Everything is just working as expected on all the computers.</p>

<p>Note that you should be able to do all this with a Mac OS X or Windows machine as the router or server; the setup will be a little different, but Hurricane Electric should give you the help, along with your OS vendor. As for your routers, you may have to read some Dutch too!</p>

<p>So there you are, a quick guide to getting started with IPv6. Happy protocol upgrades!</p>

<p>Note: I have now moved my blog to <a href="http://www.hetzner.de">Hetzner</a>, which has native IPv6 availability, which makes life much simpler. If you are selecting a new provider, always choose one that supports IPv6! Rackspace/Slicehost, my former provider, seems rather behind the times.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.technologyofcontent.com/2011/05/this-blog-is-now-ipv6-enabled/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Scripting languages grow up</title>
		<link>http://blog.technologyofcontent.com/2011/05/scripting-languages-grow-up/</link>
		<comments>http://blog.technologyofcontent.com/2011/05/scripting-languages-grow-up/#comments</comments>
		<pubDate>Sun, 22 May 2011 20:08:29 +0000</pubDate>
		<dc:creator>justin</dc:creator>
				<category><![CDATA[programming]]></category>
		<category><![CDATA[Lua]]></category>
		<category><![CDATA[scripting]]></category>

		<guid isPermaLink="false">http://blog.technologyofcontent.com/?p=345</guid>
		<description><![CDATA[There is a lot of focus on APIs, often REST APIs, but one aspect of API design, and code design that is sometimes missed is scripting. Actually one of the REST design principles is to use mobile code in the right places, but we usually only see that in the situation of Javascript being sent [...]]]></description>
			<content:encoded><![CDATA[<p>There is a lot of focus on APIs, often REST APIs, but one aspect of API design, and code design that is sometimes missed is scripting. Actually one of the REST design principles is to use mobile code in the right places, but we usually only see that in the situation of Javascript being sent to browser clients.</p>

<p>General purpose scripting languages have been with us for a long time, since <a href="http://en.wikipedia.org/wiki/REXX">Rexx</a> was started in 1979 and Tcl in 1988, followed by Lua in 1993; depending on what you define as a scripting language, as you might include IBM&#8217;s Job Control Language, or shells such as ksh. Indeed there is some confusion in the <a href="http://en.wikipedia.org/wiki/Scripting_language">Wikipedia article</a> about what they really are, and the term has been used for all sorts of languages.</p>

<p>A scripting language is a language that is designed to either control an external program or set of programs, providing an easy way to construct compositions of the components being provided by the library or toolkit, or the opposite construction, to be embedded inside another language to control parts of the application in a dynamic way, like Javascript controls rendering in a browser. These are normally referred to as extending and embedding.</p>

<p>The reasons for the two different ways vary, with extending being for use cases like shell scripting where there is a large library of programs with a uniform interface which can be composed to perform a multitude of tasks. The performance of the shell script language is not particularly important, as it is mostly just building system level compositions, such as pipes, or making simple conditions and loops. Other examples though are more complex, and the successful general purpose scripting languages have a full set of programming language types and constructs, including first class functions, inheritance and so on; languages like shell scripts and Tcl, which only have a single string type, have stayed limited to smaller areas. Essentially these become domain specific languages (DSLs) for working in a particular domain. Structuring your code so that it acts as a set of libraries for the domain, while embedding these in a scripting language that does the plumbing, is a good way of creating a flexible design, and indeed if you cannot refactor your application as independent libraries with a scripting glue then it is probably not very well designed.</p>

<p>The other way round, embedding, is <a href="http://www.twistedmatrix.com/users/glyph/rant/extendit.html">sometimes unfairly seen as a bad idea</a>, but in the right situations it makes a lot of sense, for example for embedding a query language, or for avoiding server round trips by coordinating a set of commands, such as the way databases embed stored procedure languages, or the recent <a href="http://antirez.com/post/scripting-branch-released.html">embedding of Lua in Redis</a>.</p>

<p>Why not do everything in one language? The original reasons were that &#8220;real&#8221; programming languages were statically typed, compiled, and had terrible string handling (yes C, we are looking at you, a language which once had <code>gets</code>), while scripting languages had garbage collection, dynamic everything, interpreted environments with friendly errors, simple string libraries, and were extremely slow, maybe 100 times slower than C. They also used to have fairly poor module structuring, and other facilities for programming in the large. This has not stopped people building large projects using largely scripting languages (Vignette in Tcl being an early example), particularly with the LAMP stack which started off as a simple glue between a web server and a database, but has grown to much larger applications.</p>

<p>What has changed is a gradual convergence, as some of the more friendly features of scripting languages, such as garbage collection and better string libraries, started to move into mainstream languages with Java, and the JIT compiler that really started gaining popularity with the JVM has recently been seriously applied to scripting languages, in particular Javascript, Lua and Python, which are making a bid for serious performance. LuaJIT now performs similarly or better than Java in many benchmarks, while PyPy and Javascript are rapidly getting within a small factor of Java. This does not mean that there are not still many places where static memory allocation and the low level guarantees of C are useful, such as in database design and so on, and of course there are large libraries of existing, well tested software.</p>

<h2>Foreign functions</h2>

<p>Another gradual change is the development of FFI (foreign function interface) libraries. The original open source <a href="http://en.wikipedia.org/wiki/Libffi"><code>libffi</code></a> has been around since 1996, and Python has had <code>ctypes</code> for a long time too, but there have been issues with these bindings. While they are relatively easy to construct compared to writing a full C binding, they are messy in some languages, although the Ruby ones are fairly readable, a binding to <code>puts</code> in libc being defined with:</p>

<pre><code>module Foo
  extend FFI::Library
  ffi_lib FFI::Library::LIBC
  attach_function :puts, [ :string ], :int
end
</code></pre>

<p>The second problem, after syntax, was performance. Most FFI bindings were very slow compared to native C bindings, by a factor that was significant for most use cases. However this has started to change too, first with the <a href="http://blog.segment7.net/2008/01/15/rubinius-foreign-function-interface">Rubinius bindings</a> from a few years back, which have a very small overhead, of just a function call which performs the necessary type conversion, and then more recently with the <a href="http://luajit.org/ext_ffi.html">LuaJIT FFI library</a>, which not only has a usable syntax, as it has most of a C header file parser built in, but is also natively understood by the JIT compiler, so it can generate code that has no overhead at all, actually less than the standard C Lua bindings.</p>

<p>So the following simple program that mainly executes a fast (virtual) system call:</p>

<pre><code>local ffi = require "ffi"

ffi.cdef[[
struct timeval {
  long tv_sec;
  long tv_usec;
};
int gettimeofday(struct timeval *tv, void *tz);
]]

local tv = ffi.new("struct timeval")

for i = 1,100000000 do
  ffi.C.gettimeofday(tv, nil)
  if tv.tv_usec == 0 then print "." end
end
</code></pre>

<p>generates the following assembly for the inner loop, where the <code>call r12</code> is a direct call to the libc syscall wrapper:</p>

<pre><code>-&gt;LOOP:
394cffd0  mov rdi, [rsp+0x8]
394cffd5  xor esi, esi
394cffd7  call r12
394cffda  cmp qword [rbx+0x10], +0x00
394cffdf  jz 0x394c0024 -&gt;5
394cffe5  add ebp, +0x01
394cffe8  cmp ebp, 0x05f5e100
394cffee  jle 0x394cffd0    -&gt;LOOP
394cfff0  jmp 0x394c0028    -&gt;6
</code></pre>

<p>And that runs marginally faster than the C equivalent, showing the advantages of an FFI library that is natively understood by the JIT compiler, as well as of course the analysis that goes behind making sure this is a valid optimisation, such as being able to register allocate the loop variable. It is also nice to see a scripting language generating nice assembler! LuaJIT is currently the fastest dynamic language available by a large margin.</p>

<p>So are there any disadvantages to a well designed FFI interface? Well, having been <a href="https://github.com/justincormack/ljsyscall">using the LuaJIT one for a while</a>, the issues are mostly with how some C code is written. A lot of C code is not well written, well encapsulated code. The ABI may depend on all sorts of conditions, not just the architecture, but also the build options of the program, all wrapped in <code>#ifdef</code> conditionals. Macros are used a lot, sometimes generating code for runtime, which of course then has to be reimplemented in the scripting language. The preprocessor is overused, with people rarely using <code>enum</code> instead, and C enums have odd semantics, as they are always cast ints. Scripting languages rarely if ever use stack allocation, while C libraries assume that it is often the norm, so libraries do not hide their internal structures with <code>void *</code> pointers that they heap allocate, which would often be much easier. Also C libraries have historically been written to support old versions of C, so without variable length arrays, and without the sized integer types such as <code>uint32_t</code> etc. So it can get messy to interface with, requiring a lot of testing and extra code wrappers to make things work well. Best to stick to well designed code if at all possible.</p>

<p><a href="http://www.flickr.com/photos/justincormack/2158917829/" title="After the rain by Justin Cormack, on Flickr"><img src="http://farm3.static.flickr.com/2175/2158917829_fee1a7f319.jpg" width="500" height="279" alt="After the rain"></a></p>

<h2>Structuring scriptable programs</h2>

<p>The easiest way to write code that is friendly to being scripted is if most of the code is structured as a set of libraries, exposing clean operations and with clear semantics for allocation and deallocation; a good way to write code anyway. For maximum portability, C is easier to interface with than C++, due to name mangling and C++ exceptions, as well as the fact that you may be interfacing to a language that does not really do object orientation in the way you might use it in C++. C++ exceptions have marginal support in FFI interfaces, and explicit error handling at the external interfaces is more easily usable. Don&#8217;t use thread local return values like <code>errno</code> either. Callbacks in the same scripting VM may also be a problem in some environments, that is calls from the scripting language to C then back to a script callback. Generally owning the event loop in a library is annoying, and this is a case of that to some extent; it is easier in that case if you just call into user code, as in Node.js. These are generally sensible organisational principles for code anyway, so it should be possible to script any well written code.</p>

<p>Note I didn&#8217;t really mention Java and .Net here. Most people largely only interface within their own runtime. While there are JVM and .Net versions of most programming languages, they are less well supported in general, and especially in the Java case very slow, often similar to the native interpreter, or a little faster but not as fast as native JIT compilers; although there have been recent improvements, the JVM is not currently friendly to dynamic languages. It is of course possible to expose a C API to Java code, through JNI, as it works both ways, to allow Java to be scripted from a non JVM scripting language, although it seems to be less common, and it also has heavier performance costs than the usual C to non-managed scripting language boundary. Of course the big advantage of staying within these frameworks is that calling other languages, say within the JVM, is very easy, as the runtime understands the calling conventions, so interoperability is very simple; many Java programs offer Rhino scripting, say, but it will be slow. Instead there are JVM-native languages more appropriate for DSLs that can be used, such as Scala (statically typed) or Clojure.</p>

<p>For calling into user code, for example as in the Redis scripting API, or the older but similar example of database stored procedure languages, another case where more standard scripting languages are now available such as <a href="http://pllua.projects.postgresql.org/">Lua in Postgres</a>, the aim is to make the exposed operations easy to work with, and to allow use of native language features, such as how iterators work, and to use <a href="http://en.wikipedia.org/wiki/Coroutine">coroutines</a> if that is appropriate, as for example in the <a href="https://github.com/chaoslawful/lua-nginx-module">Lua embedding in Nginx</a> which uses coroutines so that apparently synchronous code can be run asynchronously. The use cases are many. One is to improve APIs so that you minimise round trips by moving what would be client side operations to the server, sometimes to implement atomic operations that could not be specified over an API in a performant way, as with stored procedures in a database. Another use is the Node.js case, to embed a very well known and usable language in some low level code (the asynchronous native library), an environment that would otherwise only be available in C. Many large programs are actually structured largely as scripting language layers, for example <a href="http://lua-users.org/lists/lua-l/2006-01/msg00111.html">over 40% of Adobe Lightroom code is written in Lua</a>, and much of Firefox is written in Javascript.</p>

<h2>Summarising</h2>

<p>There has been a huge effort in making scripting languages perform well. Performing well while interfacing in a really easy way to external code is another big step to making scripting a default part of the design of the majority of large scale code, as well as to help integrate the huge installed and tested codebases already out there. Scripting languages generally do not have an imperative to avoid externally programmed core code, unlike say Java, for compatibility reasons, or Go for simplicity reasons. This makes them excellent glue languages, combined with dynamic typing that tends to allow easy modification. Even if scripting languages were as fast as, say, C, there would still be people who prefer to use statically typed languages with more deterministic compilation and potentially runtime guarantees, and some sorts of libraries are more likely to be available on some languages than others, skewing the choices. But if you are writing complex memory management code in C++, it could be time to switch parts of the code with unclear lifetimes to a scripting language. Mixing languages has never been easier, or more compelling.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.technologyofcontent.com/2011/05/scripting-languages-grow-up/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>Search, SQL, NoSQL, Persistence</title>
		<link>http://blog.technologyofcontent.com/2011/04/search-sql-nosql-persistence/</link>
		<comments>http://blog.technologyofcontent.com/2011/04/search-sql-nosql-persistence/#comments</comments>
		<pubDate>Fri, 15 Apr 2011 14:01:20 +0000</pubDate>
		<dc:creator>justin</dc:creator>
				<category><![CDATA[cloud]]></category>
		<category><![CDATA[CMS]]></category>
		<category><![CDATA[scalability]]></category>
		<category><![CDATA[search]]></category>
		<category><![CDATA[nosql]]></category>

		<guid isPermaLink="false">http://blog.technologyofcontent.com/?p=330</guid>
		<description><![CDATA[I highly recommend the Enterprise Search London meetup, there are lots of interesting talks, thanks to our intrepid organizer Tyler Tate. Last meetup, H. Stefan Olafsson from Twigkit gave a short talk about the relation between relational databases and search engines, and whether you need a relational database if you have a search engine. Craigslist [...]]]></description>
			<content:encoded><![CDATA[<p>I highly recommend the <a href="http://www.meetup.com/es-london/events/17010043/">Enterprise Search London</a> meetup, there are lots of interesting talks, thanks to our intrepid organizer <a href="http://twitter.com/tylertate">Tyler Tate</a>. Last meetup, <a href="http://twitter.com/mrolafsson">H. Stefan Olafsson</a> from <a href="http://www.twigkit.com/">Twigkit</a> gave a short talk about the relation between relational databases and search engines, and whether you need a relational database if you have a search engine.</p>

<p><a href="http://xkcd.com/886/">
<figure>
<img src="http://imgs.xkcd.com/comics/craigslist_apartments.png" alt="Craigslist apartments" width="400"/>
<figcaption>Craigslist Apartments, by XKCD</figcaption>
</figure></a></p>

<p>Now this has been something I have been thinking about recently, and there are people who are moving big parts of their systems to just be built on search, such as the <a href="http://www.guardian.co.uk/open-platform">Guardian API</a> which is <a href="http://www.guardian.co.uk/open-platform/blog/what-is-powering-the-content-api">served from Apache Solr</a>. In this case though, the search engine is still not the system of record for the core data, which is still the Oracle based CMS which did not scale up enough to serve the API. There was some discussion at the talk about search engines that do support persistence (durability, the D in ACID), something Lucene used to have a bad reputation for. My view here though is that, while actually making <code>fsync</code> work properly is a good thing, and you should not buy software that cannot recover from crashes, persistence involves a lot more than this now, such as replication, audit, versioning and access control. Building this directly into search products is a mistake. Another issue is that search engines are denormalized, and data stores of record should really be normalized to a large extent, to minimise the amount of data to be replicated.</p>

<p>There are two approaches that should work instead, however.</p>

<p>The first is more or less the current approach, to use the search engine as an index to a persistent store. I really like this approach if we follow it to its logical conclusion, which is that the persistent store in this type of application architecture should not be a relational database, but it should be a document store, that is a file, an HTTP resource, a document in a NoSQL document database, or an object in a replicated cloud storage system like S3. Modularize the database application, and split the persistence function from the index function. The persistence function provides the durability, versioning and audit and access control, with replication, backup. This can update the search index, and potentially any other types of index, such as a graph database for querying relationships, potentially even a relational database if that is the best way of querying some aspects of the data.</p>

<p>Obviously there is a potential consistency issue, if updates from the document store happen slowly, so potentially there is an eventual consistency model. Historically search was a bad offender here, as dynamic updates were not the norm and everything was batched into nightly updates, but that is going away and dynamic updates are more normal for search indexes. In principle you can have more consistency, especially in an architecture where there are fixed releases that can be consistently indexed, rather than distributed rolling updates, you choose your architecture and take your choice. Small consistency lags rarely matter in a lot of applications.</p>

<p>So you end up with an architecture with a well defined persistence layer that is not a relational database, and a set of indexes appropriate to the application, almost certainly including a full text search engine, but perhaps a graph engine too. Maybe you <a href="http://highscalability.com/blog/2011/4/6/netflix-run-consistency-checkers-all-the-time-to-fixup-trans.html">run consistency checks</a> on your indexes for peace of mind.</p>

<p>The second approach is to see that search engines were some of the original NoSQL data stores, building custom storage and indexing engines, because they had such difficult problems. Indeed Google&#8217;s BigTable, and so the ancestry of a lot of NoSQL products, came from search. However the search engines around now have not yet refactored themselves on top of the NoSQL engines that have emerged from this work, although this is starting with <a href="http://blog.sematext.com/2010/02/09/lucandra-a-cassandra-based-lucene-backend/">Lucandra</a> which is Lucene persisted in Cassandra, which looks promising, offering seamless replication and distribution, and <a href="https://github.com/akkumar/hbasene">HBasene</a>, an HBase Lucene backend. These make a huge amount of sense to me, as if you are developing sophisticated search algorithms, not having to build the whole index and persistence layer as well is a big advantage, as well as the scale out potential. Of course this approach does not conflict with the first one, in fact you could choose a NoSQL backend that is aimed more at read performance than persistence, and at storing small index values fast. The hard bits with this are that the search engines have specifically customised their data storage for the particular use cases, and reworking this onto a more general backend has few apparent advantages; as you can see from the examples above, most of these changes have come from people already using the backends in question and who want a single database to manage all their data requirements, particularly once they are working with high availability and replication. Software modularity really is not at the right level yet, is it? I blame object oriented programming for this lack of reusability.</p>

<p>Anyway, back to the main point. For applications like content management, I would suggest an architecture based on a content store that deals with persistence, versioning, access control and replication, with a set of indexes based on search engine techniques, graph databases, and anything else your application needs. Ideally the indexes are all based on a common set of low level primitives so the backend can be swapped out or shared between the search store and other application specific indexing requirements, so there is a single low level indexing infrastructure that can be available as a common scalable service, with different implementations available. This type of architecture is quite buildable now, and is certainly used in quite a few applications, and I think it will become much more widespread, particularly in the cloud where it seems more natural, certainly for many types of application that fit into a document type model, such as content based applications.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.technologyofcontent.com/2011/04/search-sql-nosql-persistence/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Sandboxing for multi-tenant applications</title>
		<link>http://blog.technologyofcontent.com/2011/04/sandboxing-for-multi-tenant-applications/</link>
		<comments>http://blog.technologyofcontent.com/2011/04/sandboxing-for-multi-tenant-applications/#comments</comments>
		<pubDate>Wed, 06 Apr 2011 11:44:49 +0000</pubDate>
		<dc:creator>justin</dc:creator>
				<category><![CDATA[cloud]]></category>
		<category><![CDATA[programming]]></category>
		<category><![CDATA[security]]></category>
		<category><![CDATA[Linux]]></category>
		<category><![CDATA[multitenancy]]></category>
		<category><![CDATA[multitenant]]></category>
		<category><![CDATA[PAAS]]></category>
		<category><![CDATA[SAAS]]></category>

		<guid isPermaLink="false">http://blog.technologyofcontent.com/?p=313</guid>
		<description><![CDATA[Overview of sandboxing techniques in Linux]]></description>
			<content:encoded><![CDATA[<p>If you are building a SAAS application it naturally supports multiple tenants; if you are building a PAAS platform it may well do too. Multitenancy may even go all the way down, maybe you are building a SAAS application on a PAAS platform on IAAS. Most of the writing on sandboxing is around desktop applications or browser sandboxing, so I thought it would be helpful to write a survey from a more cloud point of view, as most of the cloud writing seems to be about database issues. I also did not find an overview of all the solutions in one place for comparison. Note that I am not a security professional, although I have that kind of devious thought process and I am fairly good at finding security holes in applications, so you should take professional advice. I am also only going to cover Linux systems here; there is enough to cover without going further afield. The solutions are similar on other platforms but of course the differences are important.</p>

<p><a href="http://dilbert.com/strips/comic/2009-11-19/" title="Dilbert.com"><img width="480" src="http://dilbert.com/dyn/str_strip/000000000/00000000/0000000/000000/70000/4000/100/74150/74150.strip.gif" border="0" alt="Dilbert.com" /></a></p>

<p>Data segregation and access control are also important topics, but I am not going to cover them much in this post, as they are orthogonal. If your code is not secure, you can be pretty sure that there is a risk of your access controls being subverted, especially if the application is monolithic. There are interesting issues in <a href="http://waimingmok.wordpress.com/2009/03/29/multi-tenancy-salesforcecom/">how to manage data segregation</a>, and how it is stored.</p>

<p>What are the threats we are trying to mitigate? If you are building a PAAS platform, something like <a href="http://heroku.com/">Heroku</a>, which quite a lot of people seem to be doing after Heroku got bought for $212m by Salesforce, then your entire business model is to run untrusted code from people you don&#8217;t really know. They may have intentional security holes, or accidental ones, which they may not even know about, and they may be looking to attack you. One <a href="http://blog.phpfog.com/2011/03/22/how-we-got-owned-by-a-few-teenagers-and-why-it-will-never-happen-again/">recent example is the case of PHPFog</a> who had their entire infrastructure taken over.</p>

<p>If you are developing SAAS applications you may be less worried; after all, you just have to develop a secure application, surely? It turns out to be a bit more complex than that. First, many more complex applications do allow user code to run, or want to allow this, for customisation; at this point your application starts to become a platform. Aside from that, most applications process some form of external data that could have security issues. There have been widespread security holes in most media processing code, PDF, zlib, images and so on, causing buffer overflows and arbitrary code execution. In addition your application code could itself have bugs that allow remote code execution, or disclosure of tenants&#8217; data. As a service provider your reputation relies on optimal protection of the users&#8217; data.</p>

<p>So the basic idea is of course that you build your system out of components based on their risk, and requirements, on the least privilege principle, where you try to put them in a sandboxed environment where they can do as little as possible. Obviously you do not have to sandbox at all, or you can choose a model with risks, but in an increasingly hostile world it is at least worth knowing what the better options are and how they could be built.</p>

<h2>Virtualization</h2>

<p>Virtualization is a key tool for user isolation, keeping them off real hardware and in a self contained environment which just looks like a computer. Of course you need an appropriate firewall as well, as there will be network access, and you probably don&#8217;t want it to be indiscriminate, just locked down to what is necessary to provide the service. The main issue is if you are running on a virtualized service such as Amazon EC2 that limits the smallest VM you can have, and hence sets a floor on your charges and profitability for small users of the service; this may or may not be a big issue for your application.</p>

<p>Pros: good isolation, as the guest cannot see anything else on the computer. Cons: heavyweight, as each instance needs a kernel and at least a skeleton bootable OS plus a fair memory overhead; not nestable with any performance (you have to use UML), so no use if you already run in a virtual environment; careful network firewalling necessary, as you can&#8217;t just pass sockets for communication, for example.</p>

<h2>Interpreters</h2>

<p>Running code purely under an interpreter seems a very safe option, and indeed it is if you deal with a few risk factors. First, you need to make sure that the language has no libraries that can load or execute unsafe code. Some languages (like <a href="http://stackoverflow.com/questions/966162/best-way-to-omit-lua-standard-libraries">Lua</a>) make this easier than others. Always whitelist features rather than blacklisting them. For real security you want everything running in the interpreter, otherwise you are in the situation where you may be calling, say, a native image library with security issues. Obviously there is a big performance hit, so this works best for smaller pieces of code, places where none of the other isolation methods are appropriate, or as a temporary measure before introducing more sandboxing. Once you introduce, say, a JIT compiler you probably need to isolate the code, as a JIT compiler has to be able to make writeable memory executable, which makes attacks much easier.</p>
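
<p>As a minimal sketch of the whitelisting principle (in Python rather than Lua, and not a complete sandbox), the evaluator below accepts a tiny arithmetic language and rejects every construct it does not explicitly know about. The name <code>safe_eval</code> and the choice of permitted operators are my own assumptions for illustration; exponentiation is deliberately left out so that a short input cannot ask for an enormous computation.</p>

<pre><code>import ast
import operator

# Whitelist: only these binary operators are ever evaluated.
ALLOWED_BINOPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def safe_eval(source):
    """Evaluate simple arithmetic; reject anything not whitelisted."""

    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        # Plain numbers only (bool, str and so on are refused).
        if isinstance(node, ast.Constant) and type(node.value) in (int, float):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in ALLOWED_BINOPS:
            return ALLOWED_BINOPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -walk(node.operand)
        # Default deny: names, calls, attribute access and everything else.
        raise ValueError("construct not allowed: " + type(node).__name__)

    return walk(ast.parse(source, mode="eval"))
</code></pre>

<p>The point of the design is the final <code>raise</code>: anything new that the language grows later is refused by default, which a blacklist cannot promise.</p>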

<p>Pros: very secure in the right situation. Cons: performance; non-interpreted code that is called may have security flaws that need sandboxing.</p>

<h2>Managed code (Java and .Net)</h2>

<p>These bytecode JIT compilers with their own sandboxes are a practical hybrid between interpreters and native code validation. They are however complex systems, and Java in particular has had a number of vulnerabilities (for example <a href="http://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2008-5353">CVE-2008-5353</a> to pick one at random), as has .Net (eg <a href="http://www.sophos.com/support/knowledgebase/article/112313.html">MS10-077</a>), so they cannot be considered to be a complete security solution.</p>

<p>Pros: vendor support, widely tested. Cons: complex environments increase risks; unclear how finegrain access controls are.</p>

<h2>Native code validation</h2>

<p>The Google Native Client browser plugin (NaCl) uses a <a href="http://code.google.com/p/nativeclient/wiki/Papers">code validator to check the binary code</a>. There are some restrictions on the code to make it checkable (code generation, such as a JIT, has a special interface, and there are alignment constraints and other details, necessitating a modified toolchain). This is of course a similar approach to that used in Java bytecode, but with the more difficult problem of native x86 machine code. This sandbox provides an interface layer, somewhat like the operating system&#8217;s system call layer, but more restricted. In addition this whole set of code is wrapped in an outer sandbox as well, using chroot and pid namespaces.</p>

<p>Pros: supports very general code with little slowdown. Cons: needs code to be targeted for it; aimed at computationally intensive code rather than IO based code; not easily portable as the security model depends on architectural features, although it is portable between operating systems; risk of incorrect validation.</p>

<h2>Chroot plus</h2>

<p>The Unix <code>chroot</code> call on its own, which isolates a process into a part of the filesystem, is not very secure: it is easy to get out of it by using <code>ptrace</code> on another process that is outside the chroot. Running each process as a different user helps here, as then the process will not have another process it can attach to with <code>ptrace</code>, because it will not have permission. There are some potential race conditions in setting the new user that may cause an issue here. An example of this type of sandbox done well is <a href="http://plash.beasts.org/environment.html">Plash</a>.</p>
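
<p>The usual order of operations for this kind of jail can be sketched as below. This is an illustration under assumptions, not how Plash does it: the function name <code>enter_jail</code> is hypothetical, the <code>api</code> parameter exists only so that the call order can be exercised without root, and the sketch does nothing about the race conditions mentioned above.</p>

<pre><code>import os


def enter_jail(root, uid, gid, api=os):
    """Confine the current process to root and drop to an unprivileged user.

    Must be called as root. The api parameter is an assumption of this
    sketch: it lets a test substitute a fake for the os module.
    """
    api.chroot(root)    # change the filesystem root
    api.chdir("/")      # do not keep a working directory outside the jail
    api.setgroups([])   # drop supplementary groups while still root
    api.setgid(gid)     # group first: after setuid this would be refused
    api.setuid(uid)     # finally give up root
    # Sanity check: regaining root must now fail.
    try:
        api.setuid(0)
    except OSError:
        return
    raise RuntimeError("privileges were not dropped")
</code></pre>

<p>The ordering matters: once <code>setuid</code> has given up root, the process no longer has permission to change its groups or its root directory, so those steps have to come first.</p>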

<p>Pros: portable across unix versions. Cons: hard to do securely; cannot restrict network access.</p>

<h2>LXC</h2>

<p>Linux containers, unlike the equivalents in BSD and Solaris, are really a set of namespacing tools for different aspects of the system, set when a process calls <code>clone</code>. It can be viewed as a better set of tools to do <code>chroot</code> style isolation as well as a same-kernel virtualisation model; the tools support both running a whole system with startup scripts etc, or just a single process. Because it has been developed incrementally some of the support is new, and if you want to run a whole system I recommend using something very new: Ubuntu 11.04 works nicely and has an <code>lxcguest</code> package, but I had odd issues with 10.10 not being fully isolated, although these might have been configuration. If you use it for single process isolation it is more straightforward as you can have a much more minimal environment. Currently the items that can be namespaced are: process IDs, so your process cannot see or ptrace anything outside the container; the file system, so it has its own mounts; the network, so it just sees a virtual network that can be firewalled as appropriate, or can be passed a physical network adaptor exclusively; UTS, so the process can see its own hostname; and IPC, for the SYSV IPC namespace. Coming soon is the addition of the user namespace, so that new containers can be created by non root users.</p>
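
<p>To make the namespace list concrete, here is a rough sketch of how the flags combine. The flag values are the ones in the Linux header <code>linux/sched.h</code>; the helper names are my own, and I have used the <code>unshare</code> system call through <code>ctypes</code> rather than <code>clone</code>, since it takes the same flags and is easier to call from a script. It needs root, a kernel that supports the requested namespaces, and for the PID namespace it affects the children of the caller rather than the caller itself.</p>

<pre><code>import ctypes
import os

# Flag values from linux/sched.h.
CLONE_FLAGS = {
    "mount": 0x00020000,  # CLONE_NEWNS
    "uts": 0x04000000,    # CLONE_NEWUTS
    "ipc": 0x08000000,    # CLONE_NEWIPC
    "pid": 0x20000000,    # CLONE_NEWPID
    "net": 0x40000000,    # CLONE_NEWNET
}


def namespace_flags(names):
    """Combine the named namespaces into one flags word."""
    flags = 0
    for name in names:
        flags |= CLONE_FLAGS[name]
    return flags


def unshare(names):
    """Detach the calling process from the named namespaces (needs root)."""
    libc = ctypes.CDLL(None, use_errno=True)
    if libc.unshare(namespace_flags(names)) != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))
</code></pre>

<p>For single process isolation you would typically ask for all of them, for example <code>unshare(["mount", "uts", "ipc", "pid", "net"])</code>, and then fork the untrusted process.</p>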

<p>Pros: process isolation done properly; allows controlled network isolation; still allows passing of file descriptors unlike virtualisation. Cons: only supported on newer Linux versions for some of the features.</p>

<h2>Chroot/container hybrids</h2>

<p>The current <a href="http://code.google.com/p/chromium/wiki/LinuxSUIDSandbox">default Chrome Linux sandbox</a> uses a mix of <code>chroot</code> and container calls, for maximum compatibility with common distributions; in fact it will work without the container calls but with reduced security. It uses <code>chroot</code> for filesystem isolation and a PID namespace to isolate processes, and disables ptrace with <code>prctl</code> (which is not a complete mitigation as this is reversible).</p>

<p>Pros: more compatibility, upgrades <code>chroot</code> model towards a container. Cons: network access unrestricted.</p>

<h2>Seccomp</h2>

<p>Seccomp is a very restricted security sandbox that has shipped with the Linux kernel for quite a while, that can be set using the <code>prctl</code> system call. It then only allows four system calls, <code>read</code>, <code>write</code>, <code>_exit</code> and <code>sigreturn</code>. This is very restrictive, as the process cannot even allocate memory, so it is rarely used. It also turned out to have a <a href="http://www.securityfocus.com/bid/33948/info">bug</a> on 64 bit machines that allowed some other system calls. However Google did produce another <a href="http://lwn.net/Articles/346902/">sandbox for Chrome</a> based on it, using a very restricted helper thread to perform memory allocations and other system calls. The thread is quite complex as it runs in the same process as the hostile code, so there are quite a few complexities, and it is not so clear that the particular solution as such works for general purposes, but similar approaches could be suitable for some problems.</p>

<p>Pros: small kernel whitelist with restricted additions. Cons: complex and architecture dependent code running in a difficult environment; may not be suited for all uses.</p>

<h2>Ptrace</h2>

<p>The <code>ptrace</code> system call, used for debugging, can also be used to sandbox a process, as it can intercept system calls. However it is beset with race conditions and other problems, and a hostile process can circumvent it. There do not seem to be any fixes at the moment.</p>

<p>Pros: portable. Cons: not reliable.</p>

<h2>Selinux</h2>

<p>The Selinux mandatory access controls, designed by the NSA, are a complex but very powerful set of access controls for processes. The big advantage from a sandboxing point of view is that the controls are enforced by the kernel, and are very fine grained, covering such things as access to particular ports, files, sockets and system calls. Items such as files can be relabelled as they are processed, so for example you could <a href="http://selinuxproject.org/page/PipelineDemo">not give users access to files before they had been validated or virus checked</a>. Selinux adoption has been slow, with Red Hat the first to really push it into their distribution, gradually being followed by others, but with many users simply disabling it when it caused issues. Most distributions use it in the &#8220;targeted&#8221; policy, which only puts external facing daemons in a controlled state, and lets normal users do everything they could do before, but gradually more types of policy are being added, such as a user sandbox to run untrusted code. There is an extensive <a href="http://oss.tresys.com/projects/refpolicy">reference policy</a> which the distributions base their code on, which is a good reference for detailed customisation. It is also possible to push selinux controls into applications, <a href="http://lwn.net/Articles/242087/">such as Postgres</a>, and to use it to store user validation through an application.</p>

<p>Pros: fine-grained controls, kernel mediated; encourages modular architectures; encourages a security as code model. Cons: not installed everywhere; complex; another system description language; best suited to very modular architectures; needs to be maintained with code, or people may just disable it to make applications work; some performance hit, estimated at 7% but obviously very application dependent.</p>

<h2>Mitigation techniques</h2>

<p>I have included these, although they are not a whole sandbox, because security is layered and they can be used to increase security in a sandbox that has some potential risks. There are a lot of potential techniques here, so I won&#8217;t cover them all. <a href="http://en.wikipedia.org/wiki/Address_space_layout_randomization">Address space layout randomisation</a> (ASLR) is one technique, making it harder for an attacker to know where parts of the executable that they need to call to create an exploit are. This requires <a href="http://en.wikipedia.org/wiki/Position-independent_code">position independent code</a> (PIC), and has some default support in Linux, but more is available for example in the <a href="http://pax.grsecurity.net/docs/pax.txt">PaX project</a>. Another option, also supported by PaX, is to disable the ability to make writeable memory areas executable, which makes it impossible to inject new executable code into a process at runtime; this ability is however required by JIT compilers, something that <a href="http://daringfireball.net/2011/03/nitro_ios_43">caused issues with Javascript</a> when it was introduced recently in iOS. Another area is <a href="http://en.wikipedia.org/wiki/Buffer_overflow_protection">stack buffer overflow prevention</a>, for which there is now <code>gcc</code> support. These policies have rarely been used when compiling entire Linux distributions, with the notable exception of <a href="http://www.gentoo.org/proj/en/hardened/">Hardened Gentoo</a>, although they can also be used for individual applications.</p>
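
<p>As a small aside on the default Linux support, the kernel exposes its ASLR setting in <code>/proc/sys/kernel/randomize_va_space</code>. The sketch below just reads and interprets that value; the function names are my own, and the descriptions paraphrase the kernel documentation for the three documented levels.</p>

<pre><code>ASLR_LEVELS = {
    0: "disabled",
    1: "stack, mmap regions and VDSO randomised",
    2: "as 1, plus the heap (brk area)",
}


def describe_aslr(raw):
    """Translate the text read from the sysctl file into a description."""
    return ASLR_LEVELS.get(int(raw.strip()), "unknown value")


def current_aslr(path="/proc/sys/kernel/randomize_va_space"):
    """Report the ASLR level of the running kernel (Linux only)."""
    with open(path) as handle:
        return describe_aslr(handle.read())
</code></pre>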

<p>Pros: adds more protection at little cost. Cons: only a mitigation, not a sandbox; some binaries need these capabilities for valid reasons.</p>

<h2>Conclusions</h2>

<p>As is probably clear from this brief summary, security is not just a simple compiler flag, it is a complex design process with a lot of work to do. It is an architectural issue to a large extent, as the more self contained your units are, the easier it is to use the least privilege principles, as for operating system controls the process is the unit of privilege (remember the Unix philosophy). Tooling and testing is fairly limited off the shelf, and debugging can be more difficult, so the overall cost of security is not negligible. On the other hand the cost of not implementing security is very high, particularly in the case of SAAS platforms where the industry is being held to a very high standard.</p>

<p>Which methods to choose? For a SAAS or PAAS multitenant platform, where the base OS is entirely under your control, some combination of lxc, selinux and other mitigation techniques seems to be a clear winner. This can be enhanced, starting with either lxc or selinux as a base and then adding more protections and more fine grain separation of processes as things move on.</p>

<p><br/><br/></p>

<p><em>I am currently available for employment opportunities, so if you are looking for someone in architecture, operations, development who is interested in issues like this <a href="http://twitter.com/justincormack">get in touch</a>.</em></p>
]]></content:encoded>
			<wfw:commentRss>http://blog.technologyofcontent.com/2011/04/sandboxing-for-multi-tenant-applications/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Utility Computing, or will the real cloud please get off its *AAS</title>
		<link>http://blog.technologyofcontent.com/2011/03/utility-computing-or-will-the-real-cloud-please-get-off-its-aas/</link>
		<comments>http://blog.technologyofcontent.com/2011/03/utility-computing-or-will-the-real-cloud-please-get-off-its-aas/#comments</comments>
		<pubDate>Mon, 28 Mar 2011 22:04:48 +0000</pubDate>
		<dc:creator>justin</dc:creator>
				<category><![CDATA[cloud]]></category>
		<category><![CDATA[programming]]></category>
		<category><![CDATA[history]]></category>
		<category><![CDATA[REST]]></category>
		<category><![CDATA[utility compputing]]></category>

		<guid isPermaLink="false">http://blog.technologyofcontent.com/?p=306</guid>
		<description><![CDATA[Thanks to Dave Nielsen and Simon Wardley for evangelizing Douglas Parkhill and the history of cloud; this is quite a historical post after digging around books and writing Wikipedia articles. A lot of things are changing with the way we buy, sell, write and use software at the moment. We do live in very interesting [...]]]></description>
			<content:encoded><![CDATA[<p><em>Thanks to <a href="http://twitter.com/davenielsen">Dave Nielsen</a> and <a href="http://twitter.com/swardley">Simon Wardley</a> for evangelizing Douglas Parkhill and the history of cloud; this is quite a historical post after digging around books and writing Wikipedia articles.</em></p>

<p>A lot of things are changing with the way we buy, sell, write and use software at the moment. We do live in very interesting times. The big umbrella movement called cloud is one of the most interesting things; for example there is a huge shift to on demand software and hardware, ideal for the agility it enables, as well as disintermediating expensive sales processes.</p>

<p>Hardware started commoditising a long time back, with the huge economies of scale in chipmaking and computers, but what has really hit is the 2005 shock when <a href="http://www.gotw.ca/publications/concurrency-ddj.htm">clock speeds hit the ceiling</a> and Moore&#8217;s Law just went into more cores. Scale out was the only real option then for performance, although JIT compiler technology has been a hot topic, and if you program in some dynamic languages like Javascript you might be forgiven for thinking computers were still getting significantly faster. Simultaneously, simpler lower power devices have been shifting up the performance curve, reaching a point where some people are starting to <a href="http://www.calxeda.com/">seriously build ARM servers</a>, and mobile devices are reaching useful performance for wide ranging tasks.</p>

<p>Many thought that at this point we would be programming SMP architectures with shared memory models, but while you can now get systems with a very large number of cores in the SMP/NUMA world, these are very expensive systems, and there is little software for them outside specialist areas. The majority of multisocket systems now are probably virtualized to appear as multiple single socket ones. Scale out, message passing, commodity hardware are mostly where things are, with a sideshow of commoditized GPU technology giving us another parallelism to work with, or another competitor to the ageing x86 architecture. China is building its own <a href="http://en.wikipedia.org/wiki/Loongson">MIPS64</a> based CPU too, designed to run Linux, which is more competition to lower prices. Computation, networking, storage are all getting very cheap.</p>

<p>I replaced my last dual socket 2GHz powerpc machine with a new dual core machine that&#8217;s a fair bit faster but uses about 10-20% of the power, and now almost all the ten or so computers around the house are small, low power, and dedicated to one function: netbook, 2 mobile phones, ipad, VOIP phone, router, storage, laptop, a couple of small computers. The highest power consumption is probably the old laptop, a last reminder of the old Intel that used to make chips up to 100W. It makes no sense for me to invest in large amounts of computer power in the home, because the bandwidth is not there: it takes too long to get larger datasets here, so I try to do computation where the data is and where there is bandwidth, so I outsource the computing.</p>

<p>When <a href="http://en.wikipedia.org/wiki/Douglas_Parkhill">Douglas Parkhill</a> wrote The Challenge of the Computer Utility in 1966, while working at the sinisterly capitalized <a href="http://www.mitre.org/">MITRE</a>, the reason for people to be interested in computers as utilities was the expense of actually owning a computer, which cost around $2m at the time, but could be rented at $450 an hour. But the economics of treating computers as a utility, improved utilisation, availability on demand are just as valid now, as well as the ability to shorten development feedback times and share common source code which he also cites. Utilities are largely centralised for reasons of economy of scale, ease of balancing changing demand, and ability to provide large quantities. Obviously some people have their own generators or solar cells and water supplies, and some people and companies sell power into the grid, but the majority comes from large sources, using varying technologies.</p>

<p>Other than PCs, which are increasingly mobile, most businesses don&#8217;t keep a lot of computing power on premise, most of it is in datacentres. And those datacentres are increasingly getting larger and further away, as the cost of power and real estate pushes them to cheaper locations. And they get bigger for cooling efficiency reasons, and start to turn into billion dollar projects, vastly more expensive in real terms than the computers of 1966. On demand computing suddenly starts to make sense for almost all businesses who do not wish to get into this type of expenditure. It also suddenly makes computation costs transparent, in terms of how much computation or accuracy is cost effective for a particular task, even if killing fixed cost budgeting is going to massively disrupt the financial planning of enterprises (another change that will need more computing power to manage).</p>

<h2>What does a computer utility look like?</h2>

<p><figure>
<img src="http://public.edge3.org/Brush_central_power_station_dynamos_New_York_1881.jpg" alt="Brush electric light station in New York, 1880s">
<figcaption><a href="http://en.wikipedia.org/wiki/Charles_F._Brush">Victorian sysadmins</a></figcaption>
</figure></p>

<p>The early days of utilities are dominated by several issues, one being standardization, whether it be <a href="http://en.wikipedia.org/wiki/Mains_electricity#History_of_voltage_and_frequency">AC frequency and voltage</a>, another being technological change, as from hollowed out trees, to clay pipes to plastic in water transmission, or financial boom and bust of the railway industry as the level of demand and supply were balanced out. Competition between new utilities and old ones was also significant, railways vs canals, roads vs railways, gas lighting vs electricity, hydraulic power vs electric power, and regulatory and legal issues from compulsory purchase, nationalisation and especially <a href="http://en.wikipedia.org/wiki/Munn_v._Illinois">price regulation</a>.</p>

<p>In the end though, the centralized delivery model means you end up with standardized services being delivered: you cannot choose the voltage and frequency at which your electricity is delivered (other than wholesale bulk offerings, like three phase), the temperature or pressure of the tap water, the frequency of the trains, or the level of congestion on the roads. Utilities often have slightly odd externalities and natural distribution monopolies that vary from one to another.</p>

<p>Computer software architectures are very non standardized right now, as we are in the craft period of software. There is a lot more standardization than in 1966, when there was less mass production, and LSI was only just starting. Common abstractions have been slowly appearing, virtual memory being an early one in the <a href="http://en.wikipedia.org/wiki/Burroughs_large_systems">B5000 in 1961</a>, hiding the memory hierarchy in a way that simplifies a lot of code, even if other code needs to understand the abstraction for performance; that was around the same time that high level languages were being developed with mutable variables that matched the semantics of virtual memory. The next big abstraction, from the late 1960s, was the <a href="http://en.wikipedia.org/wiki/Unix">Unix</a> (almost) everything is a file model, a universal stream processing model for many types of data and connections with a single software interface, which also allowed compositional software. In 1970 <a href="http://en.wikipedia.org/wiki/Edgar_F._Codd">Codd</a> created the relational data model, an abstraction for databases. It was a productive time, in which <a href="http://en.wikipedia.org/wiki/Paul_Baran">Paul Baran</a> also invented the packet switched network made of unreliable components, another key building block of the cloud architecture. In the 1970s we had the language virtual machine, starting with the <a href="http://en.wikipedia.org/wiki/UCSD_Pascal">UCSD p-System</a> which heavily influenced Java, the beginning of virtualisation of hardware and operating system characteristics that has had some, although still arguably limited, success and standardisation.</p>

<p>The characteristics of a utility architecture are those embodied in the internet: distributed, failure tolerant (up to a limit of course, we still get water and electricity outages, with very weak SLAs), standardised interfaces (plug sockets, HTML), usage agnostic, scalable. In particular the six constraints of the <a href="http://www.ics.uci.edu/~fielding/pubs/dissertation/rest_arch_style.htm">REST architectural style</a>, client-server, stateless, cacheable, layered, optional code on demand, uniform interfaces currently give us our best architectural model of a scaleable, on-demand system, and so our best model of what the abstractions of a truly utility infrastructure will look like. Another characteristic is of course the use of open source software, which represents both the commoditisation of infrastructure software, and the right price for utility and on demand scaleable software.</p>

<p><a href="http://www.flickr.com/photos/justincormack/4158836526/" title="Homage to Bernd and Hiller Becher (blue period) by Justin Cormack, on Flickr"><img src="http://farm3.static.flickr.com/2580/4158836526_ba83849e19_m.jpg" width="171" height="240" alt="Homage to Bernd and Hiller Becher (blue period)" /></a></p>

<h2>Current environments</h2>

<p>If we look at Amazon web services, the closest thing so far to a utility computing service (although with some transitional facilities bolted on), the storage abstraction, S3, looks pretty much like what we would expect from the REST style, resource based, supporting the HTTP concurrency model (etags, conditional updates); note it looks nothing like a POSIX file system, the previous abstraction. In the Amazon model, this is the only really persistent, reliable, scaleable, global storage. Other services like EBS are <a href="http://www.elasticvapor.com/2010/05/failure-as-service.html">not fully reliable</a> but can be snapshotted to S3, while SimpleDB is not really a database in the persistence sense, as it has severe limits on size and is not globally available; it is better seen as an abstract index, fulfilling that part of the database requirements without the persistence. Other storage such as local disks is even more ephemeral. I have another blog post in the works on how best to use these persistence models for real applications.</p>
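
<p>The HTTP concurrency model mentioned here is optimistic: a writer says which version of a resource it last saw, and the write is refused if the resource has changed since. A minimal in-memory sketch of that idea follows. It is not the S3 API; <code>ResourceStore</code>, <code>PreconditionFailed</code> and the <code>if_match</code> parameter are hypothetical names, loosely mirroring the HTTP <code>If-Match</code> header and the 412 status code.</p>

<pre><code>import hashlib


class PreconditionFailed(Exception):
    """Mirrors HTTP 412: the caller's view of the resource is stale."""


class ResourceStore:
    """Toy resource store with ETag based conditional updates."""

    def __init__(self):
        self._objects = {}

    @staticmethod
    def _etag(body):
        # Any stable digest of the content will do as a version tag.
        return hashlib.sha256(body).hexdigest()

    def get(self, key):
        body = self._objects[key]
        return body, self._etag(body)

    def put(self, key, body, if_match=None):
        """Store body; with if_match set, only if the ETag still matches."""
        current = self._objects.get(key)
        if if_match is not None:
            if current is None or self._etag(current) != if_match:
                raise PreconditionFailed(key)
        self._objects[key] = body
        return self._etag(body)
</code></pre>

<p>Two clients that both read version one and both try to write will see one success and one <code>PreconditionFailed</code>; the loser re-reads and retries, and no locks are held between requests.</p>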

<p>The network architectures in Amazon are similarly what you would expect for building REST architectures, with load balancers and an HTTP cache layer for example. In contrast, Amazon&#8217;s EC2 provides a Xen virtualised PC as the base abstraction which is pretty software architecture agnostic, compared to say Google App Engine, which provides an abstracted language runtime. The more general platform allows more experimentation, and crucially more legacy transition. We have a lot of historic usage of operating systems, security models, coding habits and <a href="http://natishalom.typepad.com/nati_shaloms_blog/2011/03/productivity-vs-control-tradeoffs-in-paas.html">productivity issues</a> that make using &#8220;opinionated platforms&#8221; difficult.</p>

<p>In many ways though, whether the API provided on a utility compute resource is the <a href="http://www.oracle.com/technetwork/java/javaee/servlet/index.html">Java servlet standard</a> as in the <a href="http://code.google.com/appengine/docs/whatisgoogleappengine.html">GAE Java environment</a> or virtual network devices as in EC2, that&#8217;s really a middleware question, and a question of the amount of environment you have to wrap around your code. The Amazon approach encourages people to repackage services for different language communities on top of their offering. The important parts of the architecture are the parts that support failure, scale, reliability. The most important one of these, often neglected, is explicit management of state. The <a href="http://blog.worldturner.com/worldturner/entry/stateless_computing">server itself</a> must be <a href="http://www.infoq.com/presentations/Runtime-Changes">stateless if it is to be failure proof</a>. All state must be managed in the persistent storage abstraction, with software upgrades handled by starting up new machines and by upgradeable interface specifications, another REST idea. This is a big cultural leap for people: apart from not being used to actually accounting for every data change the program makes, sysadmins are used to patching systems at runtime, rather than starting up new, tested, instances when code needs to be updated. Unless state is explicit it is not testable or reliable, or known, or recoverable. These things were discovered of course in the <a href="http://www.infoq.com/presentations/Systems-that-Never-Stop-Joe-Armstrong">Erlang community</a>, where backups are a sign of poorly designed systems that have not covered all the failure cases properly. More on this side of things in other posts soon too.</p>
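
<p>A toy illustration of what explicit state buys you, with made up names throughout: the handler below keeps nothing between requests, and a plain dictionary stands in for the persistent storage abstraction. Because the server object holds only a reference to that storage, any instance can be thrown away and replaced by a fresh one without the caller noticing.</p>

<pre><code>def handle_visit(store, user):
    """Stateless handler: all state is read from and written to store."""
    count = store.get(user, 0) + 1
    store[user] = count
    return "visit %d for %s" % (count, user)


class Server:
    """A server instance holds a reference to shared storage, nothing else."""

    def __init__(self, store):
        self.store = store

    def request(self, user):
        return handle_visit(self.store, user)
</code></pre>

<p>In a real system the read-modify-write in <code>handle_visit</code> would need an atomic or conditional update in the store, since two instances may serve the same user at once.</p>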

<p>Once you have realised that there is only scale-out, that all state must be explicitly managed in state abstractions, and that failure must be designed for all over, the container shape makes much less difference, as your software architecture will be very similar. Sure there are points of difference, packets versus streams say, sync vs async, different security architectures, different programming languages, protocols, serializations and libraries. The network is really the only important IO point for the utility computation; the server environment can be much simpler in principle than the complex server environments we run predominantly now, and thus able to scale better on more and more commodity hardware, the highly parallel, low power hardware that we were talking about before.</p>

<h2>Beyond Compute and Store</h2>

<p>I said before that Amazon SimpleDB was best viewed as an index not a database, and lots of the software we use in a web environment is like that: key-value stores, memcache, and some of the NoSQL solutions are providing this type of function, not concentrating mainly on persistence; obvious product areas here are things like full text search, which are not the ultimate data source. Others are trying to provide &#8220;small item persistence&#8221;, although most of these systems are not yet being aimed at utility computing provision, so they have not tended to cleanly separate store and index for example, and persistence is mainly aimed at hard drives. Given the historical popularity of databases, and the current flourishing of the NoSQL movement, I can see an opportunity for something like Cassandra with an S3-type backend as a utility computing key-value store. Why not SQL? The main problem is the lack of scalability of JOINs, which makes it difficult to scale as a true elastic utility service. In addition, I can see a place for a low latency reliable transaction log service built on replicated SSD that shifts over automatically to permanent long term storage (please Amazon), to cut the persistence time, especially for small data items.</p>

<p>Other facilities for utility computing include authorization infrastructure, message queues, payment infrastructure and so on, blending into more specialist services sold as SaaS on the infrastructure.</p>

<h2>Beyond Now</h2>

<p>Our current code model looks a bit like running code inside a web server; indeed that is one way to organize the code, or you can reverse this and run a web library inside the language library, like Node.js does for example. Architectures could change though, perhaps (sometimes) moving the code to the storage for lower latency; the advantage of strongly decoupled code with explicit state management is that it is much easier to move it around. Explicit state management with all lasting state over the network has the same properties as functional programming; indeed, as we said before, it is very much like the Erlang functional process and message passing model. This flexibility in where code runs allows another level of infrastructure virtualisation and efficiency, moving code towards requesters or data based on latency and throughput requirements.</p>
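
<p>A minimal sketch of what explicit state management might look like in code. The names <code>CounterState</code>, <code>HandlerResult</code> and <code>handleRequest</code> are hypothetical, made up for illustration, and the networked state store is left out entirely: the handler just receives the state and returns the new one.</p>

```typescript
// Hypothetical example: the handler's only lasting state is passed in and returned.
interface CounterState {
  hits: number;
}

interface HandlerResult {
  state: CounterState; // new state, to be written back to a networked store
  body: string;        // response sent to the requester
}

// Pure function: the same state and request always give the same result,
// so it does not matter which machine runs it.
function handleRequest(state: CounterState, path: string): HandlerResult {
  const next: CounterState = { hits: state.hits + 1 };
  return { state: next, body: "Visit " + next.hits + " to " + path };
}
```

<p>Because nothing is kept inside the process between calls, the same function could in principle be run next to the requester or next to the storage without changing its behaviour.</p>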

<p>I don&#8217;t think this is going to change how programming works a lot; at least it should be pretty familiar to web programmers, though there might be some adjustment for the enterprise lot of course. What is going to change is sysadmin. One of the issues now with cloud solutions is they require too much sysadmin to really allow us to scale up our usage. The <a href="http://en.wikipedia.org/wiki/Jevon%27s_paradox">Jevons paradox</a> cannot come into effect and let us expand capacity as price falls if there are additional costs such as admin overhead. The infrastructure providers have managed to cut staff to less than 1 per 4000 machines, but the users of services like Amazon&#8217;s still have a lot of software environment management overhead. I think we can cut that too, substantially, using stateless servers and other changes, but that is another blog post.</p>

<p>This is perhaps a bit of a run through history, trying to make sense of the journey towards utility computing that has been going on for over 45 years but is really just starting to become mainstream, and where we still have the opportunity to build a new utility, which is something that does not happen very often. Some more concrete explorations coming soon.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.technologyofcontent.com/2011/03/utility-computing-or-will-the-real-cloud-please-get-off-its-aas/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>CMS or framework and a challenge</title>
		<link>http://blog.technologyofcontent.com/2011/02/cms-or-framework-and-a-challenge/</link>
		<comments>http://blog.technologyofcontent.com/2011/02/cms-or-framework-and-a-challenge/#comments</comments>
		<pubDate>Mon, 28 Feb 2011 15:18:10 +0000</pubDate>
		<dc:creator>justin</dc:creator>
				<category><![CDATA[CMS]]></category>

		<guid isPermaLink="false">http://blog.technologyofcontent.com/?p=296</guid>
		<description><![CDATA[I came across an interesting blog post the other day, Drupal as an Application Framework: Unofficially competing in the BostonPHP Framework Bakeoff, where Ben Buckman tried to enter a PHP web framework bakeoff using Drupal, rather than a &#8220;real&#8221; framework; he was not allowed, and in the end CakePHP, Symfony, Zend Framework and CodeIgniter officially [...]]]></description>
			<content:encoded><![CDATA[<p>I came across an interesting blog post the other day, <a href="http://benbuckman.net/tech/11/02/drupal-application-framework-bostonphp-competition">Drupal as an Application Framework: Unofficially competing in the BostonPHP Framework Bakeoff</a>, where <a href="http://twitter.com/thebuckst0p" title="No twitter I won't use your hashbang URLs">Ben Buckman</a> tried to enter a <a href="http://www.meetup.com/bostonphp/events/16011906/">PHP web framework bakeoff</a> using Drupal, rather than a &#8220;real&#8221; framework; he was not allowed, and in the end <a href="http://cakephp.org">CakePHP</a>, <a href="http://www.symfony-project.org/">Symfony</a>, <a href="http://framework.zend.com/">Zend Framework</a> and <a href="http://codeigniter.com/">CodeIgniter</a> officially competed. However, Ben cunningly sat in the back and built the test system anyway. What he also did was record his screen for the 38 minutes he was working on this, with commentary added, which is really interesting. The <a href="http://phpbakeoff.newleafdigital.com/">built site is here</a>.</p>

<iframe src="http://player.vimeo.com/video/20286577" width="400" height="225" frameborder="0"></iframe>

<p><a href="http://vimeo.com/20286577">Drupal (unofficially) competing in the BostonPHP Framework Bakeoff</a> from <a href="http://vimeo.com/newleafdigital">New Leaf Digital</a> on <a href="http://vimeo.com">Vimeo</a>.</p>

<h2>The challenge</h2>

<p>So first here comes the challenge to other CMS vendors and system integrators: spend three hours (or less) on repeating this using your system. You get an hour or so for planning, based on <a href="https://bostonphp.mybalsamiq.com/projects/bostonphpjobboard/grid">the wireframes</a>. Note there are some gaps in these (typical client!), like what the tag pages look like, so improvise, and some explicit open decision points (&#8220;require some sort of password&#8221;), so use whatever is easiest. You get up to an hour on the build (points given for less!). The aim is not a polished final shippable product, but a first alpha or beta. Then you are allowed another hour for annotating the screencast, putting the site online for us to look at etc, if you need it. If you are pushed for time, you can do the annotation as you build and cut the planning, and you should be able to get the whole thing done in an hour, so no excuses for anyone not to have a go.</p>

<p>Basically the site is a simple job board, in a circa 2005 Ruby on Rails style, so there is nothing that a decent web content management system should not be able to do out of the box. If there are bits your system technically finds hard it&#8217;s probably worth fixing them (the Drupal video reveals some small things that are apparently fixed in version 7 for example). But get it close enough, as really the aim is to give people an idea of how usable your system is for this sort of build task, the approach you have to take, and how quick it is! If your data model requires something a little different, by all means explain and make something close. There is no attempt for example to impose a REST architectural style and support an API that supports PUT and DELETE on jobs, but if that comes out of the box or trivially with the CMS you use, by all means show it off and put a bit more 2010 into the solution!</p>

<p>And no trying to get out of this, or I will write up your CMS as &#8220;pretty useless compared to Drupal&#8221; or your web CMS integrator as &#8220;rubbish, use New Leaf Digital, they have balls&#8221;. If you want to do it live as a webcast then feel free to invite me along (but record it too).</p>

<h2>The bit about frameworks and CMSs</h2>

<p>It is an interesting question though: what&#8217;s the real difference between web development frameworks and CMSs? Indeed, if you think that web frameworks are nicer in every way, feel free to enter the challenge above using your favourite framework.</p>

<p>My current thinking is that a CMS is best defined as a web development framework plus an IDE that persists configuration to a content repository or database. By all means disagree with me!</p>

<p>By web development framework, I mean something that gives you a full web server, front end and backend integration tools, authentication and access control, templating and so on. A CMS has an IDE in addition: a web frontend that lets you configure it, define datatypes, forms, layouts etc. But generally, unlike say visual code editors, which persist to editable files, the results of these changes in the backend are persisted to the same content repository as the actual content, either as database entries or schemas, or as tree nodes (as in JCR based systems). Hence these changes do not generally end up in a source code control system, although they may be versioned in the way content is. Or they may not: Drupal CCK schema changes, for example, change the underlying database directly, while with a standalone framework you might have to script schema changes for upgrades in the traditional way.</p>

<p>The deployment issues with CMS products that come from using a single repository for content and config are also clear, even if many systems do try to make the two separable by various means. The Drupal example shows this to some extent: all the frameworks have their <a href="https://github.com/bostonphp/Framework-Bakeoff-2011/">code available on github</a>, but to set up your own instance of the Drupal example you would need some code plus a database dump, from which you would then have to filter out the content of the example instance.</p>
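
<p>To make the single repository problem concrete, here is a rough sketch of the filtering step. Everything in it is hypothetical: the <code>RepositoryEntry</code> shape, the <code>kind</code> field and the paths are invented for illustration, and real systems rarely label their entries this cleanly, which is rather the point.</p>

```typescript
// Hypothetical model of a CMS repository holding both configuration and content.
type EntryKind = "config" | "content";

interface RepositoryEntry {
  path: string;
  kind: EntryKind;
  data: string;
}

// Keep only what a fresh instance needs: content types, forms, layouts.
function exportConfig(entries: RepositoryEntry[]): RepositoryEntry[] {
  return entries.filter((entry) => entry.kind === "config");
}

// Made-up sample data, loosely modelled on a job board.
const exampleRepository: RepositoryEntry[] = [
  { path: "/types/job", kind: "config", data: "fields: title, company, tags" },
  { path: "/layouts/job-list", kind: "config", data: "template: list" },
  { path: "/jobs/1", kind: "content", data: "PHP developer, Boston" },
];
```

<p>With a standalone framework the equivalent of <code>exportConfig</code> is simply a checkout from source control, since the configuration never lived alongside the content in the first place.</p>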

<p>This also makes it clear why CMS systems are usually very tied into one framework, as the IDE has to embody a lot of implicit knowledge of the data model and the way the framework wants to do things, as well as how to persist changes. Designing a more decoupled approach is much more difficult, although perhaps possible (that&#8217;s another blog post to come, the decoupled CMS). But the streamlined coupling makes solving problems that fit the data model easy and quick, as the challenge above demonstrates. Of course if the type of problem does not match up to the model that the CMS applies, then it comes down to trying to force a teapot into a round hole.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.technologyofcontent.com/2011/02/cms-or-framework-and-a-challenge/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Cherry tart</title>
		<link>http://blog.technologyofcontent.com/2011/02/cherry-tart/</link>
		<comments>http://blog.technologyofcontent.com/2011/02/cherry-tart/#comments</comments>
		<pubDate>Thu, 17 Feb 2011 15:02:42 +0000</pubDate>
		<dc:creator>justin</dc:creator>
				<category><![CDATA[recipes]]></category>
		<category><![CDATA[cherries]]></category>
		<category><![CDATA[microformats]]></category>
		<category><![CDATA[pomiane]]></category>

		<guid isPermaLink="false">http://blog.technologyofcontent.com/?p=272</guid>
		<description><![CDATA[Recipe: Cherry Tart. Summary: Pomiane&#8217;s cherry tart recipe. Ingredients: 8 oz bread dough, 3 oz butter, 1.5 lbs cherries, 5 oz caster sugar. Instructions: Yes, that&#8217;s bread dough, so make that first (you can make bread with the rest of it). Then try to mix the butter into the dough. It won&#8217;t like this, and will try [...]]]></description>
			<content:encoded><![CDATA[<!--script src=”http://www.recip.ly/static/js/jquery-reciply.js” type=”text/javascript”></script-->

<div class="hrecipe"><h2 class="fn">Recipe: Cherry Tart</h2><p class="summary"><strong>Summary</strong>: <em>Pomiane&#8217;s cherry tart recipe</em></p>
<a href="http://www.flickr.com/photos/justincormack/152775984/" title="cherry tart by Justin Cormack, on Flickr"><img src="http://farm1.static.flickr.com/56/152775984_54d9501665_m.jpg" width="240" height="240" alt="cherry tart" class="photo"/></a>
<div class="ingredients"><h4>Ingredients</h4><ul class="ingredients"><li class="ingredient"> 8 oz bread dough
</li><li class="ingredient"> 3 oz butter
</li><li class="ingredient"> 1.5 lbs cherries
</li><li class="ingredient"> 5 oz caster sugar</li></ul></div><div class="instructions"><h4>Instructions</h4><ol class="instructions">
<li>Yes, that&#8217;s bread dough, so make that first (you can make bread with the rest of it).</li>

<li>Then try to mix the butter into the dough. It won&#8217;t like this, and will try not to mix in. Force it. It will get messy.</li>

<li>Roll out the dough, and line a buttered pie dish with it. Now let it rest for a bit, so the yeast in the bread dough rises, making a lighter pastry. Prick with a fork.</li>

<li>Stone the cherries. I have a special cherry stoning device now, which is rather useful for those two weeks a year we have cherries in season here. Otherwise try a pointy object, but it will get messy.</li>

<li>Fill the tart with two layers of cherries, sprinkle with caster sugar, and cook in a hot oven for 20 minutes. Remove and sprinkle with more sugar.</li>

</ol></div><div class="quicknotes"><h4>Quick Notes</h4><p class="quicknotes">From Cooking with <a href="http://en.wikipedia.org/wiki/Edouard_de_Pomiane">Pomiane</a>, an excellent book from a fine chef of a previous generation, one of the first radio chefs.</p></div><p class="duration">Cooking time (duration): <span class="value-title" title="PT1H0M"></span>60 minutes</p><p class="diettype"><span class="hrlabel">Diet type: </span><span class="hritem">Vegetarian</span></p><p class="yield"><span class="hrlabel">Number of servings (yield): </span><span class="hritem">6</span></p><p class="mealtype"><span class="hrlabel">Meal type: </span><span class="hritem">dessert</span></p><p class="tradition"><span class="hrlabel">Culinary tradition: </span><span class="hritem">French</span></p><p class="review hreview-aggregate">My rating:  <span class="rating"><span class="average">5 </span> stars:&nbsp; &#9733;&#9733;&#9733;&#9733;&#9733;<span class="count"> 1</span> review(s)</span></p>

<!--div class="reciply-addtobasket-widget" href="http://blog.technologyofcontent.com/2011/02/cherry-tart/ "></div-->


</div>
]]></content:encoded>
			<wfw:commentRss>http://blog.technologyofcontent.com/2011/02/cherry-tart/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>wysiwyg editors in web content management</title>
		<link>http://blog.technologyofcontent.com/2011/01/wysiwyg-editors-in-web-content-management/</link>
		<comments>http://blog.technologyofcontent.com/2011/01/wysiwyg-editors-in-web-content-management/#comments</comments>
		<pubDate>Tue, 04 Jan 2011 12:57:22 +0000</pubDate>
		<dc:creator>justin</dc:creator>
				<category><![CDATA[CMS]]></category>
		<category><![CDATA[wysiwyg]]></category>

		<guid isPermaLink="false">http://blog.technologyofcontent.com/?p=258</guid>
		<description><![CDATA[I am slowly making up a set of blog posts about the components of content management systems, starting with the earlier one on content repositories. Coming up next will probably be templating systems. In the beginning the web was editable in the browser. Tim Berners-Lee made it so. But this did not last for [...]]]></description>
			<content:encoded><![CDATA[<p>I am slowly making up a set of blog posts about the components of content management systems, starting with the earlier one on <a href="http://blog.technologyofcontent.com/2010/09/towards-a-comparison-of-content-repositories/">content repositories</a>. Coming up next will probably be templating systems.</p>

<p>In the beginning the web was editable in the browser. Tim Berners-Lee made it so. But this did not last for long. Eventually Microsoft restored this, to some extent, with the <code>contentEditable</code> property and related Javascript interfaces. Now these have a lot of foibles, and the current HTML5 standardization has not done a huge amount of work here yet, certainly not adding new features or changing the basic interface substantially, so a lot of behaviour is currently underdefined, which makes cross browser compatibility more difficult. While the main desktop browsers all implement the interface, mobile ones in general do not yet, even on form factors such as the iPad, which could very well be an effective content editing device.</p>

<h2>Should you use one at all?</h2>

<p>Almost every CMS supports wysiwyg HTML editing, but it is not entirely clear how much these editors are used. To be quite blunt, they have always been among the worst editing environments ever devised, overlaid with a poor HTML model and a tendency towards presentational markup. Most users do their editing elsewhere, at least for originating content. I think though that we are at the point where this can be improved.</p>

<p>The first thing is to kill the presentational markup, and instead support customised semantic use of classes. On the basic level this is easy to implement; there is probably more work to easily support more advanced HTML templates, for example to mark up recipes or other specialist content in consistent semantic ways through this type of interface. Pluggable schema guidance might be needed for this.</p>
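
<p>As a rough illustration of the basic level, a style dropdown could be driven by a list of semantic classes rather than fonts and colours. This is only a sketch under my own assumptions: <code>SemanticStyle</code>, <code>recipeStyles</code> and <code>applyStyle</code> are hypothetical names, and the class names are borrowed from the hRecipe microformat.</p>

```typescript
// Hypothetical dropdown entry: a label for the user, and the semantic markup it produces.
interface SemanticStyle {
  label: string;
  tag: string;
  className: string;
}

// Example style list for recipe content, using hRecipe class names.
const recipeStyles: SemanticStyle[] = [
  { label: "Summary", tag: "p", className: "summary" },
  { label: "Ingredient", tag: "li", className: "ingredient" },
];

// Escape text so it cannot introduce markup of its own.
function escapeHtml(text: string): string {
  return text.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

// Wrap text in the element and class for the chosen style; no inline styles, no fonts.
function applyStyle(styles: SemanticStyle[], label: string, text: string): string {
  const match = styles.filter((style) => style.label === label)[0];
  if (match === undefined) {
    throw new Error("Unknown style: " + label);
  }
  return "<" + match.tag + ' class="' + match.className + '">' + escapeHtml(text) + "</" + match.tag + ">";
}
```

<p>A pluggable schema would then amount to swapping in a different style list per content type, leaving the presentation entirely to the site CSS.</p>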

<p>The second is to make the editors themselves sane. First step is of course local offline HTML5 storage, so that drafts are still there if you wander off, your computer runs out of battery or whatever. Then just make them nicer to use, keyboard friendly and so on.</p>
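
<p>The draft storage step could be sketched roughly as below. The <code>DraftStorage</code> interface is my own cut-down subset of the browser storage interface, and <code>createMemoryStorage</code>, <code>saveDraft</code>, <code>loadDraft</code> and <code>discardDraft</code> are hypothetical helpers; the in-memory version exists only so the sketch runs outside a browser.</p>

```typescript
// Subset of the Web Storage interface that the sketch relies on.
interface DraftStorage {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
  removeItem(key: string): void;
}

// Hypothetical in-memory stand-in for browser storage.
function createMemoryStorage(): DraftStorage {
  const data: { [key: string]: string } = Object.create(null);
  return {
    getItem: (key: string): string | null => {
      const value = data[key];
      return value === undefined ? null : value;
    },
    setItem: (key: string, value: string): void => { data[key] = value; },
    removeItem: (key: string): void => { delete data[key]; },
  };
}

interface Draft {
  html: string;
  savedAt: number; // timestamp in milliseconds
}

function draftKey(docId: string): string {
  return "draft:" + docId;
}

// Call on every edit (or on a timer) so the latest text survives a closed tab.
function saveDraft(storage: DraftStorage, docId: string, html: string, now: number): void {
  const draft: Draft = { html: html, savedAt: now };
  storage.setItem(draftKey(docId), JSON.stringify(draft));
}

// Call when the editor opens; returns null if there is nothing to restore.
function loadDraft(storage: DraftStorage, docId: string): Draft | null {
  const raw = storage.getItem(draftKey(docId));
  return raw === null ? null : (JSON.parse(raw) as Draft);
}

// Call once the content has been saved to the server.
function discardDraft(storage: DraftStorage, docId: string): void {
  storage.removeItem(draftKey(docId));
}
```

<p>In a browser, <code>window.localStorage</code> provides these three methods, so it could be passed in where the in-memory storage is used here.</p>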

<p>Then there is the concurrency question: you may want to show concurrent editing, and you need to decide which versioning model applies. This does not have to be the Google Wave style simultaneous editing, although that is one option. People may be editing content for release in different versions.</p>

<p>Import from other editors is of course still important, so make it as painless as possible for the common tools.</p>

<p>Also, remember it is an editor, people like a choice of editors, so loosely couple it, make it standalone and easily changed.</p>

<h2>What not to use</h2>

<p>Old school implementations, with nothing now to recommend them that I can see; often they seem to need server side components too, which should not be required now when pure Javascript solutions should be sufficient.</p>

<ol>
<li><p><a href="http://www.openwebware.com/">Open Wysiwyg</a></p>

<p>Not off to a good start here as the <a href="http://www.openwebware.com/wysiwyg/demo.shtml">demo</a> does not work in Chrome, works in Opera and Firefox. I guess it is not really in development then. Does not create paragraphs by default, just uses <code>&lt;br&gt;</code>. Still believes in web safe colours and fonts. A classic example of the old school, but not up to date. Open source.</p></li>
<li><p><a href="http://ckeditor.com/">CKEditor</a></p>

<p>Much more promising <a href="http://ckeditor.com/demo">demo</a>. Paragraphs are real (uses a <code>&amp;nbsp;</code> for blank ones, which is I suppose ok), there is a paragraph style dropdown that manages paragraphs, headings and divs. The style dropdown is a bit disappointing as it sets styles not classes, and you can still set fonts and so on. Works in Chrome, Firefox and Opera. The old school done right. Open source.</p></li>
<li><p><a href="http://tinymce.moxiecode.com/">TinyMCE</a></p>

<p>Ah yes, the classic one. Feature set similar to CKEditor, in that it treats paragraphs the same, and has a dropdown with predefined styles that are style tags not classes. Seems a bit clunky and flaky in comparison, and has odd terminology: since when has a div been a layer? Both have a smiley insertion feature that suggests that people have been wasting time on pointless stuff rather than reconsidering how these programs should work. Open source.</p></li>
<li><p><a href="http://xstandard.com/">XStandard</a></p>

<p>Good blurb about supporting CSS properly and the importance of markup and presentation separation, but in a bout of massive fail there is no demo on the website. Your reviewer has no intention of installing it to test it, especially as it does not support Linux as a platform in a second fail. Commercial.</p></li>
<li><p><a href="http://www.innovastudio.com/editor.aspx">InnovaStudio</a></p>

<p>No demo either, apart from animated screenshots. Why are these commercial companies so full of fail? Web 2.0 products with no web trial? Seems obsessively to be about dimensions and fonts from the features list, not about separation of concerns.</p></li>
<li><p><a href="http://code-samples.cybervillage.com/activedit/">ActiveEdit</a></p>

<p>ActiveX DHTML with Java fallback for other platforms. Need I say more? Commercial.</p></li>
<li><p><a href="http://contenteditable.com/">contentEditable</a></p>

<p>Well at least here the <a href="http://contenteditable.com/">demo is the homepage</a>. All inline with a simple-ish mechanism to mark blocks as editable. Stupidly though, while it appears to be non modal, all the menus are modal popups, which completely destroys the usability.</p></li>
<li><p><a href="http://www.themaninblue.com/experiment/widgEditor/">widgEditor</a></p>

<p>Very simple editor. Adds divs not paragraphs around everything. Open source.</p></li>
</ol>

<h2>Good implementations</h2>

<p>Largely based on <code>contentEditable</code>, fixing up the inconsistencies, although Dijit is based on the less flexible <code>designMode</code>, which means it has to run in an iframe.</p>

<ol>
<li><p><a href="http://developer.yahoo.com/yui/3/editor/">YUI editor</a></p>

<p>Note there is also a <a href="http://developer.yahoo.com/yui/editor/">YUI 2 version</a>. In many ways the <a href="http://developer.yahoo.com/yui/3/examples/editor/editor-instance.html">demo</a> is not much use, as of course this is part of the YUI framework and needs some Javascript infrastructure to customise it effectively. There is also a <a href="http://developer.yahoo.com/yui/examples/editor/toolbar_editor.html">YUI 2 version demo</a>, and linked demos of plugins. Much better than anything above. Open source. Iframe based for compatibility with the grade A browsers.</p></li>
<li><p><a href="http://dojotoolkit.org/reference-guide/dijit/Editor.html">Dijit</a></p>

<p>This is the editor that comes with the Dojo widget set. Still iframe based, not a full contentEditable implementation.</p></li>
<li><p><a href="http://www.aloha-editor.com/">Aloha</a></p>

<p>Some nice <a href="http://www.aloha-editor.com/demos.php">demos available</a>, indeed probably the best demo site of any editor. Aims to fix up the issues and cross browser quirks of contentEditable, and looks nice too. Open source.</p></li>
</ol>

<h2>Other ways 1: Canvas</h2>

<p>A survey of what is up is not complete without mentioning <a href="https://mozillalabs.com/skywriter/">Skywriter</a>, which is a (code) editor written entirely in canvas. As far as I can see this is an utterly pointless approach in the long run. It is an especially bad idea for HTML, where fitting in with the CSS of the page matters, so let&#8217;s ignore this.</p>

<h2>Other ways 2: pure Javascript</h2>

<p>You can in principle implement the whole of a <code>contentEditable</code>-like editor in pure Javascript, using a little div as a cursor and everything, with no help at all from the browser. That is a lot of work, of course. I know of two implementations, one being the <a href="http://googledocs.blogspot.com/2010/05/whats-different-about-new-google-docs.html">2010 release of Google Docs</a>, which explains why they did that: in order to be able to provide the best cross browser support. The other implementation I know of is the one for <a href="http://cms.squizsuite.net/">Squiz CMS</a>: no online demo, not open source and tied into one product. This is pretty functional, although it had some browser compatibility issues last time I used it, and their website still says it does not support WebKit.</p>

<h2>Other ways 3: Markdown and Wiki markup</h2>

<p>I have to say I use Markdown a lot; this blog is generally written in it. There are two issues I see. One is that it is limited, and while you can add native HTML, this often means everything has to be redone in HTML (trying to add anchors to a list for example). There are extensions with more features, but then you get into compatibility issues. Also there is no sanely reversible transformation back from HTML that will regenerate the same Markdown, as there are notational choices, so it is generally best to keep the content in Markdown all the way through. There are much the same issues with Wiki markup, although it is extensible too, which causes more issues of trying to remember constructs, and compatibility is weakened. The difficulty is that HTML is not a very good markup for humans to write, but once you get past the very restricted domain that say Markdown handles, producing something that works well is hard, maybe impossible.</p>
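
<p>The notational choice problem can be shown with a toy example. The <code>renderEmphasis</code> function below is a hypothetical helper that handles only single emphasis markers and is nowhere near a real Markdown parser; it is just enough to show two different sources collapsing to the same HTML.</p>

```typescript
// Toy converter: handles only *emphasis* and _emphasis_, nothing else.
function renderEmphasis(markdown: string): string {
  return markdown
    .replace(/\*([^*]+)\*/g, "<em>$1</em>")
    .replace(/_([^_]+)_/g, "<em>$1</em>");
}

// Two different ways of writing the same thing in Markdown.
const withAsterisks: string = "a *fine* chef";
const withUnderscores: string = "a _fine_ chef";

// Both produce identical HTML, so the original notation cannot be recovered from it.
const sameOutput: boolean = renderEmphasis(withAsterisks) === renderEmphasis(withUnderscores);
```

<p>Any tool that converts the HTML back to Markdown has to pick one of the notations, so a round trip through an HTML editor can rewrite source the author never touched.</p>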

<h2>Other ways 4: Don&#8217;t use HTML</h2>

<p>A number of people I have worked with, and some content management systems, go with a very minimalist use of HTML: pretty much text is paragraphs, everything else is structured fields. A slightly enhanced version of this, effectively a very minimalist tiny markup, corresponding to even less than Markdown, is <a href="http://blog.programmableweb.com/2009/11/11/content-portability-building-an-api-is-not-enough/">the COPE system</a>.</p>

<p>Personally I believe we need to expand the use of rich text, for many use cases &#8220;just the words&#8221; is not enough, and we need equations, images, notes, asides, sidebars, tables, captions, and all the richness of rich text, so I don&#8217;t think this is viable for much serious writing, and we need to expand from the dumbed down web that this enforces.</p>

<h2>Concluding</h2>

<p>While the wysiwyg editor as it has been in content management systems is pretty awful, I think we need to expand the principle and try to do better, go back to the really editable web, and support well structured semantic HTML in rich ways, not forms and simple text boxes. There is still a lot to do to get this working, though.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.technologyofcontent.com/2011/01/wysiwyg-editors-in-web-content-management/feed/</wfw:commentRss>
		<slash:comments>8</slash:comments>
		</item>
	</channel>
</rss>

