<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Technology of Content &#187; programming</title>
	<atom:link href="http://blog.technologyofcontent.com/category/programming/feed/" rel="self" type="application/rss+xml" />
	<link>http://blog.technologyofcontent.com</link>
	<description>Ramblings on the technology of content management</description>
	<lastBuildDate>Sun, 29 Jan 2012 16:38:05 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
		<item>
		<title>Data-driven documents talk</title>
		<link>http://blog.technologyofcontent.com/2012/01/data-driven-documents-talk/</link>
		<comments>http://blog.technologyofcontent.com/2012/01/data-driven-documents-talk/#comments</comments>
		<pubDate>Sun, 29 Jan 2012 16:37:49 +0000</pubDate>
		<dc:creator>justin</dc:creator>
				<category><![CDATA[programming]]></category>
		<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[d3]]></category>
		<category><![CDATA[javascript]]></category>

		<guid isPermaLink="false">http://blog.technologyofcontent.com/?p=393</guid>
		<description><![CDATA[A few days back I gave a talk about D3.js at London Web Standards. The video of the talk is available as are the talk slides. The slides are freely available to re-use, and you can view the source to see what is going on, I have tried to keep everything inline in the slides [...]]]></description>
			<content:encoded><![CDATA[<p>A few days back I gave a talk about <a href="http://mbostock.github.com/d3/">D3.js</a> at <a href="http://www.londonwebstandards.org/">London Web Standards</a>. The <a href="http://vimeo.com/35580586">video of the talk is available</a>, as are the <a href="http://lws.node3.org">talk slides</a>. The slides are free to re-use, and you can view the source to see what is going on; I have tried to keep everything inline in the slides to make it easy to understand. I used the <a href="http://slides.html5rocks.com/">HTML5 Rocks</a> slides as a basis, which worked very well. You need to click on some of the slides to make stuff happen&#8230;</p>

<p>As I said in the talk, D3 is really great to use, as it is about code not configuration, and lets you really do what you want. It is purely based on working with data and the DOM, without other abstraction layers. I highly recommend it!</p>

<p>One thing I mentioned in the talk was using Google spreadsheets for storing arbitrary data that doesn&#8217;t already have somewhere to live. I have found this worked out quite well with clients who want an easy &#8220;CMS&#8221; for data that they can update and feed straight through to graphs and maps etc on the website. A few people asked about this, so I thought I would give a quick guide.</p>

<p>I have made a public Google spreadsheet <a href="https://docs.google.com/spreadsheet/ccc?key=0Agr9u3lfdOGHdHZoc1dVYXhDdl9GSk5SbVRBSGZYRFE">for these examples</a>. It will probably get messed up, as it is public!</p>

<p>The easy way to use this is just to use CSV, as once you make the spreadsheet public, the interface will show you the CSV URL (of the first sheet only, regardless of the setting). In this case <a href="https://docs.google.com/spreadsheet/pub?hl=en_US&amp;hl=en_US&amp;key=0Agr9u3lfdOGHdHZoc1dVYXhDdl9GSk5SbVRBSGZYRFE&amp;single=true&amp;gid=0&amp;output=csv">the link is https://docs.google.com/spreadsheet/pub?hl=en_US&amp;hl=en_US&amp;key=0Agr9u3lfdOGHdHZoc1dVYXhDdl9GSk5SbVRBSGZYRFE&amp;single=true&amp;gid=0&amp;output=csv</a> which gave me the following:</p>

<p><pre>
date,colour,value,genre,french
2011-01-01,red,2,rock,roches
2011-01-02,blue,1,grunge,grunge
2011-01-03,orange,66,pop,pop
2011-01-04,green,4,metal,métallique
2011-01-05,pink,6,jazz,le jazz
2011-01-06,brown,5,punk,Punk
2011-01-07,purple,33,rap,Rap
2011-01-08,black,12,reggae,reggae
2011-01-09,white,11,alternative,alternatives
2011-01-10,gray,5,dance,danse
2011-01-11,violet,14,country,pays
2011-01-12,yellow,9,blues,le blues
2011-01-13,indigo,10,funk,funk
</pre></p>

<p>This is clearly usable in D3, but it does have cross domain issues in most browsers, so you would normally have to proxy it back to your domain if you are calling via Ajax.</p>
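<p>For a sheet as simple as this one, with a header row and no quoted fields, turning the CSV into typed rows takes only a few lines of JavaScript. The sketch below assumes exactly that simple format (D3&#8217;s own CSV support handles quoting properly), and <code>parseSheetCSV</code> is a hypothetical name; the text still has to reach the page from your own domain because of the cross domain restriction.</p>

<pre><code>// Minimal sketch: turn the simple published CSV above into row objects.
// Assumes a header row and no quoted fields or commas inside values.
function parseSheetCSV(text) {
  const lines = text.trim().split(/\r?\n/);
  const headers = lines[0].split(",");
  return lines.slice(1).map(function (line) {
    const cells = line.split(",");
    const row = {};
    headers.forEach(function (name, i) {
      row[name] = cells[i];
    });
    // CSV has no types: convert the numeric column this sheet happens to have.
    if ("value" in row) {
      row.value = Number(row.value);
    }
    return row;
  });
}

const csvText = "date,colour,value,genre,french\n" +
  "2011-01-01,red,2,rock,roches\n" +
  "2011-01-04,green,4,metal,métallique\n";
const csvRows = parseSheetCSV(csvText);
console.log(csvRows.length, csvRows[1].french, csvRows[1].value); // 2 métallique 4
</code></pre>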

<p>The options in the &#8220;publish to the web&#8221; menu for giving you URLs are HTML, CSV, text, PDF, XLS or ODS, none of which is what we really want, which is JSONP. There is a JSONP option, but it is a very different URL, designed to be accessed through the API. The problem with the API is that all the calls are authenticated, so it is a real pain to just extract a (public!) URL. However, after hacking around with the Python client example code, I found a (messy) workaround.</p>

<p>Every spreadsheet on Google has an ID, and each sheet within it also has an ID. These do not correspond to the keys in the public access links. You can get the spreadsheet ID from within the Javascript API available for spreadsheets, as <code>SpreadsheetApp.getActiveSpreadsheet().getId()</code>. You cannot get the sheet ID, but the first sheet you create always has the ID <code>od6</code>, the next <code>od7</code>, and then some other values that do not seem to change. So to write a little function that gives you the spreadsheet JSONP URL, do the following:</p>

<ol>
<li>Go to Tools / Script Editor from your spreadsheet.</li>
<li>Add the following function:
<pre>
// Returns the public JSONP list-feed URL for the first sheet (od6) of this
// spreadsheet; append your callback function name to the end.
function getURL() {
  return "http://spreadsheets.google.com/feeds/list/" + SpreadsheetApp.getActiveSpreadsheet().getId() + "/od6/public/values?alt=json-in-script&amp;callback=";
}
</pre>
</li>
<li>Save the script.</li>
<li>You can now use <code>=getURL()</code> as a formula in the spreadsheet, which will return the URL, in this case <code>http://spreadsheets.google.com/feeds/list/tvhsWUaxCv_FJNRmTAHfXDQ/od6/public/values?alt=json-in-script&amp;callback=</code>; you just need to append your callback name.</li>
</ol>

<p>If we open that URL with a callback of <code>test</code> we get back:
<pre>
// API callback
test(
{ "encoding" : "UTF-8",
  "feed" : { "author" : [ { "email" : { "$t" : "justin@specialbusservice.com" },
            "name" : { "$t" : "justin" }
          } ],
      "category" : [ { "scheme" : "http://schemas.google.com/spreadsheets/2006",
            "term" : "http://schemas.google.com/spreadsheets/2006#list"
          } ],
      "entry" : [ { "category" : [ { "scheme" : "http://schemas.google.com/spreadsheets/2006",
                  "term" : "http://schemas.google.com/spreadsheets/2006#list"
                } ],
            "content" : { "$t" : "colour: red, value: 2, genre: rock, french: roches",
                "type" : "text"
              },
            "gsx$colour" : { "$t" : "red" },
            "gsx$date" : { "$t" : "1/1/2011" },
            "gsx$french" : { "$t" : "roches" },
            "gsx$genre" : { "$t" : "rock" },
            "gsx$value" : { "$t" : "2" },
            "id" : { "$t" : "https://spreadsheets.google.com/feeds/list/tvhsWUaxCv_FJNRmTAHfXDQ/od6/public/values/cokwr" },
            "link" : [ { "href" : "http://spreadsheets.google.com/feeds/list/tvhsWUaxCv_FJNRmTAHfXDQ/od6/public/values/cokwr",
                  "rel" : "self",
                  "type" : "application/atom+xml"
                } ],
            "title" : { "$t" : "1/1/2011",
                "type" : "text"
              },
            "updated" : { "$t" : "2012-01-29T15:22:57.368Z" }
          },
          { "category" : [ { "scheme" : "http://schemas.google.com/spreadsheets/2006",
                  "term" : "http://schemas.google.com/spreadsheets/2006#list"
                } ],
            "content" : { "$t" : "colour: blue, value: 1, genre: grunge, french: grunge",
                "type" : "text"
              },
            "gsx$colour" : { "$t" : "blue" },
            "gsx$date" : { "$t" : "1/2/2011" },
            "gsx$french" : { "$t" : "grunge" },
            "gsx$genre" : { "$t" : "grunge" },
            "gsx$value" : { "$t" : "1" },
            "id" : { "$t" : "https://spreadsheets.google.com/feeds/list/tvhsWUaxCv_FJNRmTAHfXDQ/od6/public/values/cpzh4" },
            "link" : [ { "href" : "http://spreadsheets.google.com/feeds/list/tvhsWUaxCv_FJNRmTAHfXDQ/od6/public/values/cpzh4",
                  "rel" : "self",
                  "type" : "application/atom+xml"
                } ],
            "title" : { "$t" : "1/2/2011",
                "type" : "text"
              },
            "updated" : { "$t" : "2012-01-29T15:22:57.368Z" }
          },
          { "category" : [ { "scheme" : "http://schemas.google.com/spreadsheets/2006",
                  "term" : "http://schemas.google.com/spreadsheets/2006#list"
                } ],
            "content" : { "$t" : "colour: orange, value: 66, genre: pop, french: pop",
                "type" : "text"
              },
            "gsx$colour" : { "$t" : "orange" },
            "gsx$date" : { "$t" : "1/3/2011" },
            "gsx$french" : { "$t" : "pop" },
            "gsx$genre" : { "$t" : "pop" },
            "gsx$value" : { "$t" : "66" },
            "id" : { "$t" : "https://spreadsheets.google.com/feeds/list/tvhsWUaxCv_FJNRmTAHfXDQ/od6/public/values/cre1l" },
            "link" : [ { "href" : "http://spreadsheets.google.com/feeds/list/tvhsWUaxCv_FJNRmTAHfXDQ/od6/public/values/cre1l",
                  "rel" : "self",
                  "type" : "application/atom+xml"
                } ],
            "title" : { "$t" : "1/3/2011",
                "type" : "text"
              },
            "updated" : { "$t" : "2012-01-29T15:22:57.368Z" }
          },
          { "category" : [ { "scheme" : "http://schemas.google.com/spreadsheets/2006",
                  "term" : "http://schemas.google.com/spreadsheets/2006#list"
                } ],
            "content" : { "$t" : "colour: green, value: 4, genre: metal, french: métallique",
                "type" : "text"
              },
            "gsx$colour" : { "$t" : "green" },
            "gsx$date" : { "$t" : "1/4/2011" },
            "gsx$french" : { "$t" : "métallique" },
            "gsx$genre" : { "$t" : "metal" },
            "gsx$value" : { "$t" : "4" },
            "id" : { "$t" : "https://spreadsheets.google.com/feeds/list/tvhsWUaxCv_FJNRmTAHfXDQ/od6/public/values/chk2m" },
            "link" : [ { "href" : "http://spreadsheets.google.com/feeds/list/tvhsWUaxCv_FJNRmTAHfXDQ/od6/public/values/chk2m",
                  "rel" : "self",
                  "type" : "application/atom+xml"
                } ],
            "title" : { "$t" : "1/4/2011",
                "type" : "text"
              },
            "updated" : { "$t" : "2012-01-29T15:22:57.368Z" }
          },
          { "category" : [ { "scheme" : "http://schemas.google.com/spreadsheets/2006",
                  "term" : "http://schemas.google.com/spreadsheets/2006#list"
                } ],
            "content" : { "$t" : "colour: pink, value: 6, genre: jazz, french: le jazz",
                "type" : "text"
              },
            "gsx$colour" : { "$t" : "pink" },
            "gsx$date" : { "$t" : "1/5/2011" },
            "gsx$french" : { "$t" : "le jazz" },
            "gsx$genre" : { "$t" : "jazz" },
            "gsx$value" : { "$t" : "6" },
            "id" : { "$t" : "https://spreadsheets.google.com/feeds/list/tvhsWUaxCv_FJNRmTAHfXDQ/od6/public/values/ciyn3" },
            "link" : [ { "href" : "http://spreadsheets.google.com/feeds/list/tvhsWUaxCv_FJNRmTAHfXDQ/od6/public/values/ciyn3",
                  "rel" : "self",
                  "type" : "application/atom+xml"
                } ],
            "title" : { "$t" : "1/5/2011",
                "type" : "text"
              },
            "updated" : { "$t" : "2012-01-29T15:22:57.368Z" }
          },
          { "category" : [ { "scheme" : "http://schemas.google.com/spreadsheets/2006",
                  "term" : "http://schemas.google.com/spreadsheets/2006#list"
                } ],
            "content" : { "$t" : "colour: brown, value: 5, genre: punk, french: Punk",
                "type" : "text"
              },
            "gsx$colour" : { "$t" : "brown" },
            "gsx$date" : { "$t" : "1/6/2011" },
            "gsx$french" : { "$t" : "Punk" },
            "gsx$genre" : { "$t" : "punk" },
            "gsx$value" : { "$t" : "5" },
            "id" : { "$t" : "https://spreadsheets.google.com/feeds/list/tvhsWUaxCv_FJNRmTAHfXDQ/od6/public/values/ckd7g" },
            "link" : [ { "href" : "http://spreadsheets.google.com/feeds/list/tvhsWUaxCv_FJNRmTAHfXDQ/od6/public/values/ckd7g",
                  "rel" : "self",
                  "type" : "application/atom+xml"
                } ],
            "title" : { "$t" : "1/6/2011",
                "type" : "text"
              },
            "updated" : { "$t" : "2012-01-29T15:22:57.368Z" }
          },
          { "category" : [ { "scheme" : "http://schemas.google.com/spreadsheets/2006",
                  "term" : "http://schemas.google.com/spreadsheets/2006#list"
                } ],
            "content" : { "$t" : "colour: purple, value: 33, genre: rap, french: Rap",
                "type" : "text"
              },
            "gsx$colour" : { "$t" : "purple" },
            "gsx$date" : { "$t" : "1/7/2011" },
            "gsx$french" : { "$t" : "Rap" },
            "gsx$genre" : { "$t" : "rap" },
            "gsx$value" : { "$t" : "33" },
            "id" : { "$t" : "https://spreadsheets.google.com/feeds/list/tvhsWUaxCv_FJNRmTAHfXDQ/od6/public/values/clrrx" },
            "link" : [ { "href" : "http://spreadsheets.google.com/feeds/list/tvhsWUaxCv_FJNRmTAHfXDQ/od6/public/values/clrrx",
                  "rel" : "self",
                  "type" : "application/atom+xml"
                } ],
            "title" : { "$t" : "1/7/2011",
                "type" : "text"
              },
            "updated" : { "$t" : "2012-01-29T15:22:57.368Z" }
          },
          { "category" : [ { "scheme" : "http://schemas.google.com/spreadsheets/2006",
                  "term" : "http://schemas.google.com/spreadsheets/2006#list"
                } ],
            "content" : { "$t" : "colour: black, value: 12, genre: reggae, french: reggae",
                "type" : "text"
              },
            "gsx$colour" : { "$t" : "black" },
            "gsx$date" : { "$t" : "1/8/2011" },
            "gsx$french" : { "$t" : "reggae" },
            "gsx$genre" : { "$t" : "reggae" },
            "gsx$value" : { "$t" : "12" },
            "id" : { "$t" : "https://spreadsheets.google.com/feeds/list/tvhsWUaxCv_FJNRmTAHfXDQ/od6/public/values/cyevm" },
            "link" : [ { "href" : "http://spreadsheets.google.com/feeds/list/tvhsWUaxCv_FJNRmTAHfXDQ/od6/public/values/cyevm",
                  "rel" : "self",
                  "type" : "application/atom+xml"
                } ],
            "title" : { "$t" : "1/8/2011",
                "type" : "text"
              },
            "updated" : { "$t" : "2012-01-29T15:22:57.368Z" }
          },
          { "category" : [ { "scheme" : "http://schemas.google.com/spreadsheets/2006",
                  "term" : "http://schemas.google.com/spreadsheets/2006#list"
                } ],
            "content" : { "$t" : "colour: white, value: 11, genre: alternative, french: alternatives",
                "type" : "text"
              },
            "gsx$colour" : { "$t" : "white" },
            "gsx$date" : { "$t" : "1/9/2011" },
            "gsx$french" : { "$t" : "alternatives" },
            "gsx$genre" : { "$t" : "alternative" },
            "gsx$value" : { "$t" : "11" },
            "id" : { "$t" : "https://spreadsheets.google.com/feeds/list/tvhsWUaxCv_FJNRmTAHfXDQ/od6/public/values/cztg3" },
            "link" : [ { "href" : "http://spreadsheets.google.com/feeds/list/tvhsWUaxCv_FJNRmTAHfXDQ/od6/public/values/cztg3",
                  "rel" : "self",
                  "type" : "application/atom+xml"
                } ],
            "title" : { "$t" : "1/9/2011",
                "type" : "text"
              },
            "updated" : { "$t" : "2012-01-29T15:22:57.368Z" }
          },
          { "category" : [ { "scheme" : "http://schemas.google.com/spreadsheets/2006",
                  "term" : "http://schemas.google.com/spreadsheets/2006#list"
                } ],
            "content" : { "$t" : "colour: gray, value: 5, genre: dance, french: danse",
                "type" : "text"
              },
            "gsx$colour" : { "$t" : "gray" },
            "gsx$date" : { "$t" : "1/10/2011" },
            "gsx$french" : { "$t" : "danse" },
            "gsx$genre" : { "$t" : "dance" },
            "gsx$value" : { "$t" : "5" },
            "id" : { "$t" : "https://spreadsheets.google.com/feeds/list/tvhsWUaxCv_FJNRmTAHfXDQ/od6/public/values/d180g" },
            "link" : [ { "href" : "http://spreadsheets.google.com/feeds/list/tvhsWUaxCv_FJNRmTAHfXDQ/od6/public/values/d180g",
                  "rel" : "self",
                  "type" : "application/atom+xml"
                } ],
            "title" : { "$t" : "1/10/2011",
                "type" : "text"
              },
            "updated" : { "$t" : "2012-01-29T15:22:57.368Z" }
          },
          { "category" : [ { "scheme" : "http://schemas.google.com/spreadsheets/2006",
                  "term" : "http://schemas.google.com/spreadsheets/2006#list"
                } ],
            "content" : { "$t" : "colour: violet, value: 14, genre: country, french: pays",
                "type" : "text"
              },
            "gsx$colour" : { "$t" : "violet" },
            "gsx$date" : { "$t" : "1/11/2011" },
            "gsx$french" : { "$t" : "pays" },
            "gsx$genre" : { "$t" : "country" },
            "gsx$value" : { "$t" : "14" },
            "id" : { "$t" : "https://spreadsheets.google.com/feeds/list/tvhsWUaxCv_FJNRmTAHfXDQ/od6/public/values/d2mkx" },
            "link" : [ { "href" : "http://spreadsheets.google.com/feeds/list/tvhsWUaxCv_FJNRmTAHfXDQ/od6/public/values/d2mkx",
                  "rel" : "self",
                  "type" : "application/atom+xml"
                } ],
            "title" : { "$t" : "1/11/2011",
                "type" : "text"
              },
            "updated" : { "$t" : "2012-01-29T15:22:57.368Z" }
          },
          { "category" : [ { "scheme" : "http://schemas.google.com/spreadsheets/2006",
                  "term" : "http://schemas.google.com/spreadsheets/2006#list"
                } ],
            "content" : { "$t" : "colour: yellow, value: 9, genre: blues, french: le blues",
                "type" : "text"
              },
            "gsx$colour" : { "$t" : "yellow" },
            "gsx$date" : { "$t" : "1/12/2011" },
            "gsx$french" : { "$t" : "le blues" },
            "gsx$genre" : { "$t" : "blues" },
            "gsx$value" : { "$t" : "9" },
            "id" : { "$t" : "https://spreadsheets.google.com/feeds/list/tvhsWUaxCv_FJNRmTAHfXDQ/od6/public/values/cssly" },
            "link" : [ { "href" : "http://spreadsheets.google.com/feeds/list/tvhsWUaxCv_FJNRmTAHfXDQ/od6/public/values/cssly",
                  "rel" : "self",
                  "type" : "application/atom+xml"
                } ],
            "title" : { "$t" : "1/12/2011",
                "type" : "text"
              },
            "updated" : { "$t" : "2012-01-29T15:22:57.368Z" }
          },
          { "category" : [ { "scheme" : "http://schemas.google.com/spreadsheets/2006",
                  "term" : "http://schemas.google.com/spreadsheets/2006#list"
                } ],
            "content" : { "$t" : "colour: indigo, value: 10, genre: funk, french: funk",
                "type" : "text"
              },
            "gsx$colour" : { "$t" : "indigo" },
            "gsx$date" : { "$t" : "1/13/2011" },
            "gsx$french" : { "$t" : "funk" },
            "gsx$genre" : { "$t" : "funk" },
            "gsx$value" : { "$t" : "10" },
            "id" : { "$t" : "https://spreadsheets.google.com/feeds/list/tvhsWUaxCv_FJNRmTAHfXDQ/od6/public/values/cu76f" },
            "link" : [ { "href" : "http://spreadsheets.google.com/feeds/list/tvhsWUaxCv_FJNRmTAHfXDQ/od6/public/values/cu76f",
                  "rel" : "self",
                  "type" : "application/atom+xml"
                } ],
            "title" : { "$t" : "1/13/2011",
                "type" : "text"
              },
            "updated" : { "$t" : "2012-01-29T15:22:57.368Z" }
          }
        ],
      "id" : { "$t" : "https://spreadsheets.google.com/feeds/list/tvhsWUaxCv_FJNRmTAHfXDQ/od6/public/values" },
      "link" : [ { "href" : "https://spreadsheets.google.com/pub?key=tvhsWUaxCv_FJNRmTAHfXDQ",
            "rel" : "alternate",
            "type" : "text/html"
          },
          { "href" : "http://spreadsheets.google.com/feeds/list/tvhsWUaxCv_FJNRmTAHfXDQ/od6/public/values",
            "rel" : "http://schemas.google.com/g/2005#feed",
            "type" : "application/atom+xml"
          },
          { "href" : "http://spreadsheets.google.com/feeds/list/tvhsWUaxCv_FJNRmTAHfXDQ/od6/public/values?alt=json-in-script",
            "rel" : "self",
            "type" : "application/atom+xml"
          }
        ],
      "openSearch$startIndex" : { "$t" : "1" },
      "openSearch$totalResults" : { "$t" : "13" },
      "title" : { "$t" : "data",
          "type" : "text"
        },
      "updated" : { "$t" : "2012-01-29T15:22:57.368Z" },
      "xmlns" : "http://www.w3.org/2005/Atom",
      "xmlns$gsx" : "http://schemas.google.com/spreadsheets/2006/extended",
      "xmlns$openSearch" : "http://a9.com/-/spec/opensearchrss/1.0/"
    },
  "version" : "1.0"
}
);
</pre></p>

<p>Now you may think that is odd JSON, but it is a literal JSON conversion of Atom, which is why it has a lot of apparent junk. The bit we want is the array of objects in <code>feed.entry</code>, each of which contains, among other fields:
<pre>
[         { ...
            "gsx$colour" : { "$t" : "red" },
            "gsx$date" : { "$t" : "1/1/2011" },
            "gsx$french" : { "$t" : "roches" },
            "gsx$genre" : { "$t" : "rock" },
            "gsx$value" : { "$t" : "2" },...
          }, ...
]
</pre>
Each column heading becomes a key prefixed with <code>gsx$</code>, holding an object whose <code>$t</code> property is the cell text, so you just need your JSONP callback function to iterate over <code>feed.entry</code> and read <code>gsx$date.$t</code> and so on, which is pretty simple. This solves the cross domain issues, as you can call the feed directly from the page and render a graph in the callback without any server side code at all.</p>
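<p>A callback along those lines might look like the sketch below. It assumes the feed layout shown above, with <code>od6</code> as the first sheet and dates returned as <code>M/D/YYYY</code> text as in this spreadsheet; <code>sheetFeedURL</code>, <code>feedToRows</code>, <code>parseUSDate</code> and <code>onSheetData</code> are hypothetical helper names rather than part of any Google API, and <code>URLSearchParams</code> is a modern addition that older browsers lack.</p>

<pre><code>// Minimal sketch of consuming the JSONP list feed shown above.
// Assumptions: the legacy feed URL layout from this post, "od6" as the ID of
// the first sheet, and hypothetical helper names (not a Google API).

// Build the JSONP URL for a spreadsheet ID and the name of a global callback.
function sheetFeedURL(spreadsheetId, callbackName, sheetId) {
  const query = new URLSearchParams({ alt: "json-in-script", callback: callbackName });
  return "http://spreadsheets.google.com/feeds/list/" + spreadsheetId + "/" +
    (sheetId || "od6") + "/public/values?" + query.toString();
}

// Flatten feed.entry: strip the "gsx$" prefix from each column key and
// unwrap the "$t" text value, ignoring the Atom metadata fields.
function feedToRows(data) {
  return (data.feed.entry || []).map(function (entry) {
    const row = {};
    Object.keys(entry).forEach(function (key) {
      if (key.indexOf("gsx$") === 0) {
        row[key.slice(4)] = entry[key].$t;
      }
    });
    return row;
  });
}

// Dates come back as M/D/YYYY text in this sheet (e.g. "1/13/2011").
function parseUSDate(text) {
  const parts = text.split("/").map(Number);
  return new Date(Date.UTC(parts[2], parts[0] - 1, parts[1]));
}

// The global callback named in the URL: convert cell text to typed values.
function onSheetData(data) {
  return feedToRows(data).map(function (r) {
    return { date: parseUSDate(r.date), colour: r.colour, value: Number(r.value) };
  });
}

// In a page, the feed is loaded with a script element, for example:
//   var s = document.createElement("script");
//   s.src = sheetFeedURL("tvhsWUaxCv_FJNRmTAHfXDQ", "onSheetData");
//   document.body.appendChild(s);
// The return value of a JSONP callback is ignored, so a real onSheetData
// would pass its rows on to the D3 drawing code. Offline, with one entry:
const sampleFeed = { feed: { entry: [
  { "gsx$date": { "$t": "1/13/2011" }, "gsx$colour": { "$t": "indigo" },
    "gsx$value": { "$t": "10" } }
] } };
console.log(onSheetData(sampleFeed));
</code></pre>

<p>Stripping the <code>gsx$</code> prefix generically means a new column shows up in the row objects without code changes; only the conversion in the callback needs to know which columns are numbers or dates.</p>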

<p>Bonkers, you may think. But it does work, and what you get is a reliable, easy to use, HTTP-addressable tabular data source. Really, the whole of Google Docs is like this, half genius, half biscuit. It could be a great service, but it almost seems to be wrapped up in legacy already. Improvement seems slow; the work seems to have gone into the editing side rather than the backend, which is also saddled with Excel compatibility goals. There is a lot of scope for competition in this space.</p>

<p>Anyway, I hope that helps. I really do think for many applications it is a great backend system, if you hide the complexity from the end users!</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.technologyofcontent.com/2012/01/data-driven-documents-talk/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Scripting languages grow up</title>
		<link>http://blog.technologyofcontent.com/2011/05/scripting-languages-grow-up/</link>
		<comments>http://blog.technologyofcontent.com/2011/05/scripting-languages-grow-up/#comments</comments>
		<pubDate>Sun, 22 May 2011 20:08:29 +0000</pubDate>
		<dc:creator>justin</dc:creator>
				<category><![CDATA[programming]]></category>
		<category><![CDATA[Lua]]></category>
		<category><![CDATA[scripting]]></category>

		<guid isPermaLink="false">http://blog.technologyofcontent.com/?p=345</guid>
		<description><![CDATA[There is a lot of focus on APIs, often REST APIs, but one aspect of API design, and code design that is sometimes missed is scripting. Actually one of the REST design principles is to use mobile code in the right places, but we usually only see that in the situation of Javascript being sent [...]]]></description>
			<content:encoded><![CDATA[<p>There is a lot of focus on APIs, often REST APIs, but one aspect of API and code design that is sometimes missed is scripting. Actually, one of the REST design principles is to use mobile code (code on demand) in the right places, but we usually only see that applied when Javascript is sent to browser clients.</p>

<p>General purpose scripting languages have been with us for a long time: <a href="http://en.wikipedia.org/wiki/REXX">Rexx</a> was started in 1979, Tcl in 1988 and Lua in 1993, and depending on how you define a scripting language you might also include IBM&#8217;s Job Control Language, or shells such as ksh. Indeed there is some confusion in the <a href="http://en.wikipedia.org/wiki/Scripting_language">Wikipedia article</a> about what they really are, and the term has been used for all sorts of languages.</p>

<p>A scripting language is one designed either to control an external program or set of programs, providing an easy way to compose the components provided by a library or toolkit, or, the other way round, to be embedded inside a program written in another language to control parts of the application dynamically, the way Javascript controls rendering in a browser. These two approaches are normally referred to as extending and embedding.</p>

<p>The reasons for the two approaches differ. Extending suits use cases like shell scripting, where there is a large library of programs with a uniform interface that can be composed to perform a multitude of tasks. The performance of the shell language itself is not particularly important, as it is mostly just building system level compositions, such as pipes, or making simple conditions and loops. Other examples are more complex, and the successful general purpose scripting languages have a full set of programming language types and constructs, including first class functions, inheritance and so on, while languages like shell scripts and Tcl, which have only a single string type, have stayed limited to smaller areas. Essentially these become domain specific languages (DSLs) for working in a particular domain. Structuring your code as a set of libraries for the domain, with a scripting language doing the plumbing between them, is a good way of creating a flexible design; indeed, if you cannot refactor your application as independent libraries with scripting glue, it is probably not very well designed.</p>

<p>The other way round, embedding, is <a href="http://www.twistedmatrix.com/users/glyph/rant/extendit.html">sometimes unfairly seen as a bad idea</a>, but in the right situations it makes a lot of sense, for example for embedding a query language, or for avoiding server round trips by coordinating a set of commands, as databases do with their stored procedure languages, or as in the recent <a href="http://antirez.com/post/scripting-branch-released.html">embedding of Lua in Redis</a>.</p>
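<p>The Redis style of embedding can be sketched in JavaScript, with Node&#8217;s <code>vm</code> module standing in for the embedded interpreter: the server exposes a few commands, and a client-supplied script runs several of them in a single request instead of one round trip per command. The store, the <code>api</code> object and <code>evalScript</code> are hypothetical, and <code>vm</code> is not a security sandbox, so this only illustrates the shape of the idea.</p>

<pre><code>const vm = require("vm");

// Toy key-value "server" (hypothetical): its commands are what an embedded
// script can call, much as a Redis script calls redis.call.
const store = new Map();
const api = {
  get: function (key) { return store.has(key) ? store.get(key) : null; },
  set: function (key, value) { store.set(key, value); return "OK"; }
};

// Run a client-supplied script in a fresh context that can see only the API
// and its arguments; the value of the script's last statement is returned.
// Note: Node's vm module is not a security boundary for untrusted code.
function evalScript(source, args) {
  return vm.runInNewContext(source, { api: api, ARGV: args });
}

// Read, increment and write in one call instead of separate round trips.
const incr = "var n = (api.get(ARGV[0]) || 0) + 1; api.set(ARGV[0], n); n;";
console.log(evalScript(incr, ["counter"])); // 1
console.log(evalScript(incr, ["counter"])); // 2
</code></pre>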

<p>Why not do everything in one language? The original reasons were that &#8220;real&#8221; programming languages were statically typed, compiled, and had terrible string handling (yes C, we are looking at you, a language which once had <code>gets</code>), while scripting languages had garbage collection, dynamic everything, interpreted environments with friendly errors and simple string libraries, and were extremely slow, maybe 100 times slower than C. They also used to have fairly poor module structuring and few other facilities for programming in the large. This has not stopped people building large projects mostly in scripting languages (Vignette, written in Tcl, being an early example), particularly with the LAMP stack, which started off as simple glue between a web server and a database but has grown into much larger applications.</p>

<p>What has changed is a gradual convergence. Some of the friendlier features of scripting languages, such as garbage collection and better string libraries, started to move into mainstream languages with Java, and the JIT compilation that really gained popularity with the JVM has recently been seriously applied to scripting languages, in particular Javascript, Lua and Python, which are now making a bid for serious performance. LuaJIT now performs similarly to or better than Java in many benchmarks, while PyPy and the Javascript engines are rapidly getting within a small factor of Java. This does not mean that static memory allocation and the low level guarantees of C are no longer useful; there are still many places where they matter, such as database design, and of course there are large libraries of existing, well tested C software.</p>

<h2>Foreign functions</h2>

<p>Another gradual change is the development of FFI (foreign function interface) libraries. The original open source <a href="http://en.wikipedia.org/wiki/Libffi"><code>libffi</code></a> has been around since 1996, and Python has had <code>ctypes</code> for a long time too, but these bindings have had issues. While they are relatively easy to construct compared to writing a full C binding, they are messy in some languages; the Ruby ones are fairly readable, though, with a binding to <code>puts</code> in libc defined as:</p>

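<pre><code>require 'ffi'

module Foo
  extend FFI::Library
  # Load the C library and bind int puts(const char *) as Foo.puts
  ffi_lib FFI::Library::LIBC
  attach_function :puts, [ :string ], :int
end

Foo.puts("Hello from libc")
</code></pre>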

<p>The second problem, after syntax, was performance. Most FFI bindings were very slow compared to native C bindings, by a factor that was significant for most use cases. However this has started to change too, first with the <a href="http://blog.segment7.net/2008/01/15/rubinius-foreign-function-interface">Rubinius bindings</a> from a few years back, which have a very small overhead of just one function call that performs the necessary type conversion, and more recently with the <a href="http://luajit.org/ext_ffi.html">LuaJIT FFI library</a>. This not only has a usable syntax, as it has most of a C header file parser built in, but is also natively understood by the JIT compiler, so it can generate code with no call overhead at all, actually less than the standard C Lua bindings.</p>

<p>So the following simple program that mainly executes a fast (virtual) system call:</p>

<pre><code>local ffi = require "ffi"

ffi.cdef[[
struct timeval {
  long tv_sec;
  long tv_usec;
};
int gettimeofday(struct timeval *tv, void *tz);
]]

local tv = ffi.new("struct timeval")

for i = 1,100000000 do
  ffi.C.gettimeofday(tv, nil)
  if tv.tv_usec == 0 then print "." end
end
</code></pre>

<p>generates the following assembly for the inner loop, where the <code>call r12</code> is a direct call to the libc syscall wrapper:</p>

<pre><code>-&gt;LOOP:
394cffd0  mov rdi, [rsp+0x8]
394cffd5  xor esi, esi
394cffd7  call r12
394cffda  cmp qword [rbx+0x10], +0x00
394cffdf  jz 0x394c0024 -&gt;5
394cffe5  add ebp, +0x01
394cffe8  cmp ebp, 0x05f5e100
394cffee  jle 0x394cffd0    -&gt;LOOP
394cfff0  jmp 0x394c0028    -&gt;6
</code></pre>

<p>And that runs marginally faster than the C equivalent, showing the advantages of an FFI library that is natively understood by the JIT compiler, as well as of the analysis that goes into making sure this is a valid optimisation, such as being able to register-allocate the loop variable. It is also nice to see a scripting language generating clean assembly! LuaJIT is currently the fastest dynamic language implementation available, by a large margin.</p>

<p>So are there any disadvantages to a well designed FFI interface? Having been <a href="https://github.com/justincormack/ljsyscall">using the LuaJIT one for a while</a>, I find the issues are mostly with how some C code is written. A lot of C code is not well encapsulated. The ABI may depend on all sorts of conditions, not just the architecture but also the build options of the program, all wrapped in <code>#ifdef</code> conditionals. Macros are used a lot, sometimes expanding to code that is evaluated at runtime, which then has to be reimplemented in the scripting language. The preprocessor is overused, with people rarely using <code>enum</code> instead, and C enums have odd semantics anyway, as they are always just ints. Scripting languages rarely if ever use stack allocation, while C libraries often assume it is the norm, so they expose their internal structures rather than hiding them behind heap-allocated <code>void *</code> handles, which would often be much easier to bind. C libraries have also historically been written to support old versions of C, so they avoid variable length arrays and the sized integer types such as <code>uint32_t</code>. All this can get messy to interface with, requiring a lot of testing and extra wrapper code to make things work well. Best to stick to well designed code if at all possible.</p>
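
<p>The <code>long</code> fields in the <code>timeval</code> example above are a small instance of the ABI problem: the size of <code>long</code> depends on the platform, while the C99 fixed-width types from <code>stdint.h</code> do not. A minimal <code>ctypes</code> sketch of the difference; <code>describe_widths</code> and <code>FixedRecord</code> are hypothetical names.</p>

<pre><code>import ctypes

# "long" follows the platform data model: 8 bytes on 64-bit Linux (LP64),
# 4 bytes on 32-bit Linux and on 64-bit Windows (LLP64).
# A binding that hard-codes one width breaks on the others.
def describe_widths():
    return {
        "long": ctypes.sizeof(ctypes.c_long),
        "int32": ctypes.sizeof(ctypes.c_int32),
        "uint32": ctypes.sizeof(ctypes.c_uint32),
    }

# A struct declared with fixed-width fields has the same size everywhere
# (given the same alignment rules), so the binding needs no per-platform cases.
class FixedRecord(ctypes.Structure):
    _fields_ = [("id", ctypes.c_uint32), ("flags", ctypes.c_uint32)]
</code></pre>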

<p><a href="http://www.flickr.com/photos/justincormack/2158917829/" title="After the rain by Justin Cormack, on Flickr"><img src="http://farm3.static.flickr.com/2175/2158917829_fee1a7f319.jpg" width="500" height="279" alt="After the rain"></a></p>

<h2>Structuring scriptable programs</h2>

<p>The easiest code to script is code structured mostly as a set of libraries, exposing clean operations with clear semantics for allocation and deallocation; a good way to write code anyway. For maximum portability, C is easier to interface with than C++, because of name mangling and C++ exceptions, and because you may be interfacing with a language that does not do object orientation the way you might use it in C++. C++ exceptions have marginal support in FFI interfaces, so explicit error handling at the external interfaces is more usable. Don&#8217;t use thread-local return values like <code>errno</code> either. Callbacks into the same scripting VM, that is calls from the scripting language to C and then back to a script callback, may also be a problem in some environments. Generally a library that owns the event loop is annoying, and this is a case of that to some extent; it is easier if the library just calls into user code, as in Node.js. These are sensible organisational principles anyway, so it should be possible to script any well written code.</p>
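
<p>To illustrate why <code>errno</code> is awkward across a language boundary, here is a sketch using Python&#8217;s <code>ctypes</code>: the C function&#8217;s return value only signals that something failed, and the reason has to be recovered from thread-local state through a dedicated mechanism (<code>use_errno=True</code> and <code>ctypes.get_errno()</code>). A library that returned its error code directly would need none of this. <code>try_chdir</code> is a hypothetical helper, and a Unix libc is assumed.</p>

<pre><code>import ctypes
import ctypes.util
import errno

# use_errno=True makes ctypes save the C errno value right after each call,
# because errno is thread-local state that the interpreter itself may overwrite.
libc = ctypes.CDLL(ctypes.util.find_library("c"), use_errno=True)
libc.chdir.argtypes = [ctypes.c_char_p]
libc.chdir.restype = ctypes.c_int

def try_chdir(path):
    """Return 0 on success, or the errno value on failure."""
    if libc.chdir(path.encode()) == 0:
        return 0
    # The return value (-1) only says "failed"; the reason lives in errno
    return ctypes.get_errno()
</code></pre>

<p>For a path that does not exist, <code>try_chdir</code> returns <code>errno.ENOENT</code>.</p>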

<p>Note I didn&#8217;t really mention Java and .Net here. Most people only interface within their own runtime. While there are JVM and .Net versions of most programming languages, they are less well supported in general, and especially in the Java case slow: often similar to the native interpreter, or a little faster but not as fast as native JIT compilers. There have been recent improvements, but the JVM is not currently friendly to dynamic languages. It is of course possible to expose a C API to Java code through JNI, as it works both ways, to allow Java to be scripted from a non-JVM scripting language, although this seems less common, and it has heavier performance costs than the usual boundary between C and an unmanaged scripting language. The big advantage of staying within these frameworks is that calling other languages within the JVM is very easy, as the runtime understands the calling conventions, so many Java programs offer, say, Rhino scripting, but it will be slow. Alternatively there are languages better suited to DSLs that run well on the JVM, such as the statically typed Scala, or Clojure.</p>

<p>For calling into user code, for example in the Redis scripting API, or the older but similar example of database stored procedure languages (another area where more standard scripting languages are now available, such as <a href="http://pllua.projects.postgresql.org/">Lua in Postgres</a>), the aim is to make the exposed operations easy to work with and to allow use of native language features, such as the language&#8217;s own iterators, and <a href="http://en.wikipedia.org/wiki/Coroutine">coroutines</a> where appropriate. For example, the <a href="https://github.com/chaoslawful/lua-nginx-module">Lua embedding in Nginx</a> uses coroutines so that apparently synchronous code runs asynchronously. The use cases are many. One is to improve APIs by minimising round trips and moving what would be client side operations to the server, sometimes to implement atomic operations that could not be specified over an API in a performant way, as with stored procedures in a database. Another is the Node.js case: embedding a very well known and usable language in low level code (the asynchronous native library), an environment that would otherwise only be available in C. Many large programs are actually structured largely as scripting language layers; for example <a href="http://lua-users.org/lists/lua-l/2006-01/msg00111.html">over 40% of Adobe Lightroom code is written in Lua</a>, and much of Firefox is written in Javascript.</p>
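
<p>The coroutine approach can be sketched in Python, with generators standing in for Lua coroutines: the handler reads as sequential code, but every <code>yield</code> hands a request back to the host loop, which resumes the handler with the result. The names <code>handler</code> and <code>run</code> and the get/set request format are hypothetical, and this toy host completes each request immediately rather than doing real non-blocking I/O.</p>

<pre><code>from collections import deque

# A handler written as straight-line code; each yield hands an I/O request
# to the host and suspends until the host resumes it with the result.
def handler(key):
    value = yield ("get", key)   # looks synchronous, but suspends here
    yield ("set", key, value + 1)
    return value + 1

def run(tasks, store):
    """Toy host loop: resume each coroutine with the result of its last request."""
    queue = deque((task, None) for task in tasks)
    results = []
    while queue:
        task, reply = queue.popleft()
        try:
            request = task.send(reply)
        except StopIteration as done:
            results.append(done.value)
            continue
        # A real host would start non-blocking I/O here and resume later;
        # this sketch completes each request at once and requeues the task.
        if request[0] == "get":
            queue.append((task, store.get(request[1], 0)))
        else:
            store[request[1]] = request[2]
            queue.append((task, None))
    return results
</code></pre>

<p>Running two handlers interleaves them at each <code>yield</code>, even though neither is written with callbacks.</p>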

<h2>Summarising</h2>

<p>There has been a huge effort in making scripting languages perform well. Performing well while interfacing in a really easy way to external code is another big step towards making scripting a default part of the design of most large scale code, and towards integrating the huge installed and tested codebases already out there. Scripting languages generally do not have an imperative to avoid core code written in other languages, unlike, say, Java for compatibility reasons or Go for simplicity reasons. This, combined with dynamic typing that tends to allow easy modification, makes them excellent glue languages. Even if scripting languages were as fast as C, some people would still prefer statically typed languages with more deterministic compilation and potentially runtime guarantees, and some sorts of libraries are more likely to be available in some languages than others, skewing the choices. But if you are writing complex memory management code in C++, it could be time to move the parts of the code with unclear lifetimes to a scripting language. Mixing languages has never been easier, or more compelling.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.technologyofcontent.com/2011/05/scripting-languages-grow-up/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>Sandboxing for multi-tenant applications</title>
		<link>http://blog.technologyofcontent.com/2011/04/sandboxing-for-multi-tenant-applications/</link>
		<comments>http://blog.technologyofcontent.com/2011/04/sandboxing-for-multi-tenant-applications/#comments</comments>
		<pubDate>Wed, 06 Apr 2011 11:44:49 +0000</pubDate>
		<dc:creator>justin</dc:creator>
				<category><![CDATA[cloud]]></category>
		<category><![CDATA[programming]]></category>
		<category><![CDATA[security]]></category>
		<category><![CDATA[Linux]]></category>
		<category><![CDATA[multitenancy]]></category>
		<category><![CDATA[multitenant]]></category>
		<category><![CDATA[PAAS]]></category>
		<category><![CDATA[SAAS]]></category>

		<guid isPermaLink="false">http://blog.technologyofcontent.com/?p=313</guid>
		<description><![CDATA[Overview of sandboxing techniques in Linux]]></description>
			<content:encoded><![CDATA[<p>If you are building a SAAS application it naturally supports multiple tenants; if you are building a PAAS platform it may well do too. Multitenancy may even go all the way down, maybe you are building a SAAS application on a PAAS platform on IAAS. Most of the writing on sandboxing is around desktop applications or browser sandboxing, so I thought it would be helpful to write a survey from a more cloud point of view, as most of the cloud writing seems to be about database issues. I also did not find an overview of all the solutions in one place for comparison. Note that I am not a security professional, although I have that kind of devious thought process and I am fairly good at finding security holes in applications, so you should take professional advice. I am also only going to cover Linux systems here; there is enough to cover without going further afield. The solutions are similar on other platforms but of course the differences are important.</p>

<p><a href="http://dilbert.com/strips/comic/2009-11-19/" title="Dilbert.com"><img width="480" src="http://dilbert.com/dyn/str_strip/000000000/00000000/0000000/000000/70000/4000/100/74150/74150.strip.gif" border="0" alt="Dilbert.com" /></a></p>

<p>Data segregation and access control are also important topics, but I am not going to cover them much in this post, as they are orthogonal. If your code is not secure, you can be pretty sure that there is a risk of your access controls being subverted, especially if the application is monolithic. There are interesting issues in <a href="http://waimingmok.wordpress.com/2009/03/29/multi-tenancy-salesforcecom/">how to manage data segregation</a>, and how it is stored.</p>

<p>What are the threats we are trying to mitigate? If you are building a PAAS platform, something like <a href="http://heroku.com/">Heroku</a>, which quite a lot of people seem to be doing after Heroku got bought for $212m by Salesforce, then your entire business model is to run untrusted code from people you don&#8217;t really know. They may have intentional security holes, or accidental ones, which they may not even know about, and they may be looking to attack you. One <a href="http://blog.phpfog.com/2011/03/22/how-we-got-owned-by-a-few-teenagers-and-why-it-will-never-happen-again/">recent example is the case of PHPFog</a> who had their entire infrastructure taken over.</p>

<p>If you are developing SAAS applications you may be less worried; after all, you just have to develop a secure application, surely? It turns out to be a bit more complex than that. First, many more complex applications allow user code to run, or want to allow it, for customisation; at this point your application starts to become a platform. Aside from that, most applications process some form of external data that could exploit security issues. There have been widespread security holes in most kinds of media processing code (PDF, zlib, images and so on), causing buffer overflows and arbitrary code execution. In addition, your application code could itself have bugs that allow remote code execution, or disclosure of one tenant&#8217;s data to another. As a service provider, your reputation relies on protecting your users&#8217; data as well as possible.</p>

<p>So the basic idea is of course that you build your system out of components according to their risk and requirements, following the least privilege principle: put each one in a sandboxed environment where it can do as little as possible. Obviously you do not have to sandbox at all, or you can choose a model with known risks, but in an increasingly hostile world it is at least worth knowing what the better options are and how they could be built.</p>

<h2>Virtualization</h2>

<p>Virtualization is a key tool for user isolation, keeping them off real hardware and in a self contained environment which just looks like a computer. Of course you need an appropriate firewall as well, as there will be network access, and you probably don&#8217;t want it to be indiscriminate, just locked down to what is necessary to provide the service. The main issue is if you are running on a virtualized service such as Amazon EC2 that limits the smallest VM you can have, and hence sets a floor on your charges and profitability for small users of the service; this may or may not be a big issue for your application.</p>

<p>Pros: good isolation, as the guest cannot see anything else on the computer. Cons: heavyweight, as each instance needs a kernel and at least a skeleton bootable OS, plus a fair memory overhead; not nestable with any performance (you have to use UML), so no use if you already run in a virtual environment; careful network firewalling is necessary, as you can&#8217;t just pass sockets for communication, for example.</p>

<h2>Interpreters</h2>

<p>Running code purely under an interpreter seems a very safe option, and indeed it is if you deal with a few risk factors. First, you need to make sure that the language has no libraries that can load or execute unsafe code. Some languages (like <a href="http://stackoverflow.com/questions/966162/best-way-to-omit-lua-standard-libraries">Lua</a>) make this easier than others. Always whitelist, not blacklist, features. For real security you want everything running in the interpreter; otherwise you may end up calling, say, a native image library with security issues. Obviously there is a big performance hit, so this works best for smaller pieces of code, for places where none of the other isolation methods are appropriate, or as a temporary measure before introducing more sandboxing. Once you introduce, say, a JIT compiler you probably need to isolate the code, as a JIT compiler has to be able to make writeable memory executable, which makes attacks much easier.</p>
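
<p>The whitelist principle can be illustrated with a toy interpreter in Python that parses expressions with the standard <code>ast</code> module and evaluates only the node types it explicitly allows, instead of handing the code to the host language&#8217;s <code>eval</code>. The function name and the allowed operators are illustrative; a real embedded language would also need limits on execution time and memory.</p>

<pre><code>import ast
import operator

# Whitelist: only these operators are interpreted; everything else is rejected.
BINARY_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.FloorDiv: operator.floordiv,
}

def evaluate(source, variables):
    """Interpret a tiny arithmetic language by walking the parsed tree ourselves."""
    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.Constant) and type(node.value) is int:
            return node.value
        if isinstance(node, ast.Name) and node.id in variables:
            return variables[node.id]
        if isinstance(node, ast.BinOp) and type(node.op) in BINARY_OPS:
            return BINARY_OPS[type(node.op)](walk(node.left), walk(node.right))
        # Anything not explicitly allowed (calls, attributes, imports...) fails
        raise ValueError("disallowed syntax: " + type(node).__name__)
    return walk(ast.parse(source, mode="eval"))
</code></pre>

<p><code>evaluate("x * 2 + 1", {"x": 20})</code> gives 41, while anything containing a function call, such as an attempt to reach <code>__import__</code>, is rejected because <code>Call</code> is not on the list.</p>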

<p>Pros: very secure in the right situation. Cons: performance; non-interpreted code that is called may have security flaws that need sandboxing.</p>

<h2>Managed code (Java and .Net)</h2>

<p>These bytecode JIT compilers with their own sandboxes are a practical hybrid between interpreters and native code validation. They are however complex systems, and Java in particular has had a number of vulnerabilities (for example <a href="http://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2008-5353">CVE-2008-5353</a>, to pick one at random), as has .Net (eg <a href="http://www.sophos.com/support/knowledgebase/article/112313.html">MS10-077</a>), so they cannot be considered a complete security solution.</p>

<p>Pros: vendor support, widely tested. Cons: complex environments increase risks; unclear how fine-grained the access controls are.</p>

<h2>Native code validation</h2>

<p>The Google Native Client browser plugin (NaCl) uses a <a href="http://code.google.com/p/nativeclient/wiki/Papers">code validator to check the binary code</a>. There are some restrictions on the code to make it checkable (code generation, such as a JIT, has a special interface, and there are alignment constraints and other details, necessitating a modified toolchain). This is of course a similar approach to that used for Java bytecode, but applied to the more difficult problem of native x86 machine code. The sandbox provides an interface layer, somewhat like the operating system&#8217;s system call layer but more restricted. In addition, this whole set of code is wrapped in an outer sandbox as well, using chroot and PID namespaces.</p>

<p>Pros: supports very general code with little slowdown. Cons: needs code to be targeted for it; aimed at computationally intensive code rather than IO based code; not easily portable, as the security model depends on architectural features, although it is portable between operating systems; risk of incorrect validation.</p>

<h2>Chroot plus</h2>

<p>The Unix <code>chroot</code> call on its own, which isolates a process into a part of the filesystem, is not very secure: it is easy to get out of it by using the <code>ptrace</code> system call on another process that is outside the chroot. Running each process as a different user helps here, as the process then has no other process it has permission to attach to with ptrace. There are some potential race conditions in setting the new user that may cause an issue here. An example of this type of sandbox done well is <a href="http://plash.beasts.org/environment.html">Plash</a>.</p>
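
<p>A minimal sketch of the chroot-plus-separate-user pattern in Python, assuming the process starts as root and that <code>new_root</code>, <code>uid</code> and <code>gid</code> (all hypothetical parameters) have been prepared for it. The order matters: chroot, then change directory into the new root, then give up the groups and finally the user ID, since a process that has already given up root can no longer do the earlier steps.</p>

<pre><code>import os

def enter_chroot_as(new_root, uid, gid):
    """Confine the current process to new_root and drop root privileges.

    Must be called as root; uid/gid should be unique to this sandboxed
    process, so that it owns no other process it could attach to with ptrace.
    """
    os.chroot(new_root)
    os.chdir("/")       # otherwise the old working directory is still reachable
    os.setgroups([])    # drop supplementary groups while still root
    os.setgid(gid)      # group first: after setuid we could no longer change it
    os.setuid(uid)      # as root this sets real, effective and saved uid
    if os.getuid() != uid or os.geteuid() != uid:
        raise RuntimeError("failed to drop privileges")
</code></pre>

<p>Called without root privileges, the first step already fails with <code>PermissionError</code>, which is what you want from a sandbox setup routine.</p>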

<p>Pros: portable across unix versions. Cons: hard to do securely; cannot restrict network access.</p>

<h2>LXC</h2>

<p>Linux containers, unlike the equivalents in BSD and Solaris, are really a set of namespacing tools for different aspects of the system, set when a process calls <code>clone</code>. They can be viewed as a better set of tools for <code>chroot</code> style isolation as well as a same-kernel virtualisation model; the tools support both running a whole system, with startup scripts etc, and running just a single process. Because support has been developed incrementally, some of it is new, and if you want to run a whole system I recommend using something very new: Ubuntu 11.04 works nicely and has an <code>lxcguest</code> package, but I had odd issues with 10.10 not being fully isolated, although these might have been configuration problems. Using it for single process isolation is more straightforward, as you can have a much more minimal environment. Currently the items that can be namespaced are: process IDs, so your process cannot see or ptrace anything outside the container; the file system, so it has its own mounts; the network, so it just sees a virtual network that can be firewalled as appropriate, or can be passed a physical network adaptor exclusively; UTS, so the process can have its own hostname; and IPC, for the SysV IPC namespace. Coming soon is the addition of the user namespace, so that new containers can be created by non-root users.</p>
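
<p>The namespaces are selected by flags passed to <code>clone</code>, or to <code>unshare</code> for the calling process. A sketch in Python using <code>ctypes</code>: the flag values are the kernel&#8217;s <code>CLONE_NEW*</code> constants, while the helper names and the short namespace names are hypothetical, and actually calling <code>unshare</code> generally needs root (the LXC tools wrap all of this for you).</p>

<pre><code>import ctypes
import ctypes.util
import os

# Clone flags from linux/sched.h for the namespaces described above
CLONE_FLAGS = {
    "mount": 0x00020000,  # CLONE_NEWNS: private mount table (file system)
    "uts":   0x04000000,  # CLONE_NEWUTS: own hostname
    "ipc":   0x08000000,  # CLONE_NEWIPC: own SysV IPC objects
    "user":  0x10000000,  # CLONE_NEWUSER: user namespace
    "pid":   0x20000000,  # CLONE_NEWPID: own process ID space
    "net":   0x40000000,  # CLONE_NEWNET: own network stack
}

libc = ctypes.CDLL(ctypes.util.find_library("c"), use_errno=True)

def namespace_mask(names):
    """Combine the flags for the named namespaces into one bit mask."""
    mask = 0
    for name in names:
        mask |= CLONE_FLAGS[name]
    return mask

def unshare(names):
    """Move the calling process into new namespaces.

    Needs CAP_SYS_ADMIN (except for a lone new user namespace on kernels that
    allow unprivileged ones). A new PID namespace applies to children forked
    afterwards, not to the caller itself.
    """
    if libc.unshare(namespace_mask(names)) != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))
</code></pre>

<p>For example, <code>namespace_mask(["pid", "net"])</code> is <code>0x60000000</code>, the same value you would pass to <code>clone</code> for a container with its own process IDs and network.</p>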

<p>Pros: process isolation done properly; allows controlled network isolation; still allows passing of file descriptors unlike virtualisation. Cons: only supported on newer Linux versions for some of the features.</p>

<h2>Chroot/container hybrids</h2>

<p>The current <a href="http://code.google.com/p/chromium/wiki/LinuxSUIDSandbox">default Chrome Linux sandbox</a> uses a mix of <code>chroot</code> and container calls for maximum compatibility with common distributions; in fact it will work without the container calls, but with reduced security. It uses <code>chroot</code> for filesystem isolation and a PID namespace to isolate processes, and disables ptrace with <code>prctl</code> (which is not a complete mitigation, as this is reversible).</p>
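
<p>The ptrace step can be sketched from Python with <code>ctypes</code>, assuming the <code>prctl</code> option involved is <code>PR_SET_DUMPABLE</code>, the usual way to stop other processes running as the same user from attaching; the helper names are hypothetical. It also shows why this is only a partial mitigation: nothing stops the process from turning the flag back on.</p>

<pre><code>import ctypes
import ctypes.util

PR_GET_DUMPABLE = 3   # constants from linux/prctl.h
PR_SET_DUMPABLE = 4

libc = ctypes.CDLL(ctypes.util.find_library("c"), use_errno=True)
libc.prctl.argtypes = [ctypes.c_int, ctypes.c_ulong, ctypes.c_ulong,
                       ctypes.c_ulong, ctypes.c_ulong]
libc.prctl.restype = ctypes.c_int

def set_dumpable(flag):
    """Clearing the flag stops unprivileged processes attaching with ptrace."""
    if libc.prctl(PR_SET_DUMPABLE, flag, 0, 0, 0) != 0:
        raise OSError(ctypes.get_errno(), "prctl(PR_SET_DUMPABLE) failed")

def is_dumpable():
    """Return the current flag (0 = not dumpable)."""
    return libc.prctl(PR_GET_DUMPABLE, 0, 0, 0, 0)
</code></pre>

<p>Calling <code>set_dumpable(0)</code> protects the process, but a later <code>set_dumpable(1)</code> from the same (possibly compromised) process simply undoes it.</p>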

<p>Pros: more compatibility, upgrades <code>chroot</code> model towards a container. Cons: network access unrestricted.</p>

<h2>Seccomp</h2>

<p>Seccomp is a very restricted security sandbox that has shipped with the Linux kernel for quite a while, and is enabled using the <code>prctl</code> system call. Once it is set, only four system calls are allowed: <code>read</code>, <code>write</code>, <code>_exit</code> and <code>sigreturn</code>. This is very restrictive, as the process cannot even allocate memory, so it is rarely used. It also turned out to have a <a href="http://www.securityfocus.com/bid/33948/info">bug</a> on 64 bit machines that allowed some other system calls. However, Google did produce another <a href="http://lwn.net/Articles/346902/">sandbox for Chrome</a> based on it, using a very restricted helper thread to perform memory allocations and other system calls. That helper thread is tricky to get right, as it runs in the same process as the hostile code, and it is not clear that this particular solution works for general purposes, but similar approaches could suit some problems.</p>
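
<p>Strict mode is easy to demonstrate from Python: a forked child enables it with <code>prctl</code> and then makes a system call outside the allowed four, here <code>getpid</code>, and the kernel kills it with <code>SIGKILL</code>. Even a normal exit would be fatal, since libc&#8217;s <code>_exit</code> uses <code>exit_group</code> rather than the plain exit call. The function name is hypothetical and Linux is assumed; strict mode cannot be enabled in a process that already runs under a seccomp filter (common inside containers), which the sketch reports separately.</p>

<pre><code>import ctypes
import ctypes.util
import os
import signal

PR_SET_SECCOMP = 22      # from linux/prctl.h
SECCOMP_MODE_STRICT = 1  # from linux/seccomp.h

def try_strict_seccomp():
    """Fork a child that enters strict mode and then makes a forbidden call."""
    pid = os.fork()
    if pid == 0:
        try:
            libc = ctypes.CDLL(ctypes.util.find_library("c"))
            libc.prctl.argtypes = [ctypes.c_int, ctypes.c_ulong, ctypes.c_ulong,
                                   ctypes.c_ulong, ctypes.c_ulong]
            if libc.prctl(PR_SET_SECCOMP, SECCOMP_MODE_STRICT, 0, 0, 0) != 0:
                os._exit(2)  # refused, e.g. a seccomp filter is already installed
            os.getpid()      # not read/write/exit/sigreturn: the kernel kills us
        finally:
            os._exit(3)      # only exits normally if strict mode was not enforced
    _, status = os.waitpid(pid, 0)
    if os.WIFSIGNALED(status) and os.WTERMSIG(status) == signal.SIGKILL:
        return "killed"
    if os.WIFEXITED(status) and os.WEXITSTATUS(status) == 2:
        return "unavailable"
    return "not enforced"
</code></pre>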

<p>Pros: small kernel whitelist with restricted additions. Cons: complex and architecture dependent code running in a difficult environment; may not be suited for all uses.</p>

<h2>Ptrace</h2>

<p>The <code>ptrace</code> system call, used for debugging, can also be used to sandbox a process, as it can intercept system calls. However it is beset with race conditions and other problems, and a hostile process can circumvent it. There do not seem to be any fixes at the moment.</p>

<p>Pros: portable. Cons: not reliable.</p>

<h2>SELinux</h2>

<p>The SELinux mandatory access controls, designed by the NSA, are a complex but very powerful set of access controls for processes. The big advantage from a sandboxing point of view is that the controls are enforced by the kernel and are very fine grained, covering access to particular ports, files, sockets and system calls. Items such as files can be relabelled as they are processed, so for example you could <a href="http://selinuxproject.org/page/PipelineDemo">not give users access to files before they had been validated or virus checked</a>. SELinux adoption has been slow: Red Hat was the first to really push it into their distribution, gradually followed by others, but many users simply disabled it when it caused issues. Most distributions use the &#8220;targeted&#8221; policy, which only confines external facing daemons and lets normal users do everything they could do before, but gradually more types of policy are being added, such as a user sandbox to run untrusted code. There is an extensive <a href="http://oss.tresys.com/projects/refpolicy">reference policy</a>, on which the distributions base their policies, that is a good starting point for detailed customisation. It is also possible to push SELinux controls into applications, <a href="http://lwn.net/Articles/242087/">such as Postgres</a>, and to use it to track user validation through an application.</p>
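
<p>The validation pipeline idea can be modelled in a few lines. This is a toy Python sketch of type enforcement, not the SELinux policy language or its API, and all the type names are invented: every access needs an explicit allow rule (default deny), and relabelling a file after scanning is itself a permission, which is what makes the file readable by the web application.</p>

<pre><code># Toy model of SELinux-style type enforcement: subjects and objects carry
# type labels, and an access is permitted only if an explicit allow rule
# matches (default deny). Labels and rules here are made up.
ALLOW = {
    ("upload_t", "unscanned_file_t", "file", "write"),
    ("scanner_t", "unscanned_file_t", "file", "read"),
    ("scanner_t", "unscanned_file_t", "file", "relabel"),
    ("webapp_t", "scanned_file_t", "file", "read"),
}

# Where a permitted relabel moves a file
TRANSITIONS = {("scanner_t", "unscanned_file_t"): "scanned_file_t"}

def allowed(subject, target, cls, perm):
    return (subject, target, cls, perm) in ALLOW

def relabel(subject, label):
    """Return the new label, if this subject may relabel files with this label."""
    if not allowed(subject, label, "file", "relabel"):
        raise PermissionError(subject + " may not relabel " + label)
    return TRANSITIONS[(subject, label)]
</code></pre>

<p>The web application cannot read an uploaded file until the scanner has relabelled it, and no other subject can perform that relabel.</p>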

<p>Pros: fine-grained controls, kernel mediated; encourages modular architectures; encourages a security as code model. Cons: not installed everywhere; complex; another system description language; best suited to very modular architectures; needs to be maintained with code, or people may just disable it to make applications work; some performance hit, estimated at 7% but obviously very application dependent.</p>

<h2>Mitigation techniques</h2>

<p>I have included these, although they are not a whole sandbox, because security is layered and they can increase security in a sandbox that has some potential risks. There are a lot of potential techniques here, so I won&#8217;t cover them all. <a href="http://en.wikipedia.org/wiki/Address_space_layout_randomization">Address space layout randomisation</a> (ASLR) is one technique, making it harder for an attacker to know where the parts of the executable they need to call to build an exploit are located. This requires <a href="http://en.wikipedia.org/wiki/Position-independent_code">position independent code</a> (PIC), and has some default support in Linux, but more is available, for example in the <a href="http://pax.grsecurity.net/docs/pax.txt">PaX project</a>. Another option, also supported by PaX, is to disable the ability to make writeable memory areas executable, which makes it impossible to inject new executable code into a process at runtime; this ability is however required by JIT compilers, something that <a href="http://daringfireball.net/2011/03/nitro_ios_43">caused issues with Javascript</a> when a JavaScript JIT was recently introduced in iOS. Another area is <a href="http://en.wikipedia.org/wiki/Buffer_overflow_protection">stack buffer overflow prevention</a>, for which there is now <code>gcc</code> support. These policies have rarely been used when compiling entire Linux distributions, with the notable exception of <a href="http://www.gentoo.org/proj/en/hardened/">Hardened Gentoo</a>, although they can also be used for individual applications.</p>
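
<p>On Linux the amount of ASLR the kernel applies is controlled by the <code>kernel.randomize_va_space</code> sysctl, so a deployment check can simply read it. A small Python sketch; <code>aslr_mode</code> is a hypothetical helper name, and it returns <code>None</code> where the setting is not available.</p>

<pre><code>import os

# Meaning of the Linux kernel.randomize_va_space sysctl
ASLR_MODES = {
    0: "disabled",
    1: "stack, mmap base and vDSO randomised (so shared libraries too)",
    2: "as 1, plus heap (brk) randomised",
}

def aslr_mode(path="/proc/sys/kernel/randomize_va_space"):
    """Return a description of the system ASLR setting, or None if unknown."""
    try:
        with open(path) as f:
            value = int(f.read().strip())
    except (OSError, ValueError):
        return None
    return ASLR_MODES.get(value, "unknown value " + str(value))
</code></pre>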

<p>Pros: adds more protection at little cost. Cons: only a mitigation, not a sandbox; some binaries need these capabilities for valid reasons.</p>

<h2>Conclusions</h2>

<p>As is probably clear from this brief summary, security is not just a simple compiler flag; it is a complex design process with a lot of work to do. It is an architectural issue to a large extent: the more self contained your units are, the easier it is to apply the least privilege principle, as for operating system controls the process is the unit of privilege (remember the Unix philosophy). Tooling and testing are fairly limited off the shelf, and debugging can be more difficult, so the overall cost of security is not negligible. On the other hand, the cost of not implementing security is very high, particularly for SAAS platforms, where the industry is being held to a very high standard.</p>

<p>Which methods to choose? For a SAAS or PAAS multitenant platform, where the base OS is entirely under your control, some combination of LXC, SELinux and other mitigation techniques seems to be a clear winner. You can start with either LXC or SELinux as a base and then add more protections and more fine-grained separation of processes as things move on.</p>

<p><br/><br/></p>

<p><em>I am currently available for employment opportunities, so if you are looking for someone in architecture, operations, development who is interested in issues like this <a href="http://twitter.com/justincormack">get in touch</a>.</em></p>
]]></content:encoded>
			<wfw:commentRss>http://blog.technologyofcontent.com/2011/04/sandboxing-for-multi-tenant-applications/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Utility Computing, or will the real cloud please get off its *AAS</title>
		<link>http://blog.technologyofcontent.com/2011/03/utility-computing-or-will-the-real-cloud-please-get-off-its-aas/</link>
		<comments>http://blog.technologyofcontent.com/2011/03/utility-computing-or-will-the-real-cloud-please-get-off-its-aas/#comments</comments>
		<pubDate>Mon, 28 Mar 2011 22:04:48 +0000</pubDate>
		<dc:creator>justin</dc:creator>
				<category><![CDATA[cloud]]></category>
		<category><![CDATA[programming]]></category>
		<category><![CDATA[history]]></category>
		<category><![CDATA[REST]]></category>
		<category><![CDATA[utility compputing]]></category>

		<guid isPermaLink="false">http://blog.technologyofcontent.com/?p=306</guid>
		<description><![CDATA[Thanks to Dave Nielsen and Simon Wardley for evangelizing Douglas Parkhill and the history of cloud; this is quite a historical post after digging around books and writing Wikipedia articles. A lot of things are changing with the way we buy, sell, write and use software at the moment. We do live in very interesting [...]]]></description>
			<content:encoded><![CDATA[<p><em>Thanks to <a href="http://twitter.com/davenielsen">Dave Nielsen</a> and <a href="http://twitter.com/swardley">Simon Wardley</a> for evangelizing Douglas Parkhill and the history of cloud; this is quite a historical post after digging around books and writing Wikipedia articles.</em></p>

<p>A lot of things are changing with the way we buy, sell, write and use software at the moment. We do live in very interesting times. The big umbrella movement called cloud is one of the most interesting things; for example there is a huge shift to on demand software and hardware, ideal for the agility it enables, as well as disintermediating expensive sales processes.</p>

<p>Hardware started commoditising a long time back, with the huge economies of scale in chipmaking and computers, but the real hit came with the 2005 shock when <a href="http://www.gotw.ca/publications/concurrency-ddj.htm">clock speeds hit the ceiling</a> and Moore&#8217;s Law just went into more cores. Scale out was then the only real option for performance, although JIT compiler technology has been a hot topic, and if you program in some dynamic languages like Javascript you might be forgiven for thinking computers were still getting significantly faster. Simultaneously, simpler lower power devices have been moving up the performance curve, reaching a point where some people are starting to <a href="http://www.calxeda.com/">seriously build ARM servers</a>, and mobile devices are reaching useful performance for a wide range of tasks.</p>

<p>Many thought that at this point we would be programming SMP architectures with shared memory models, but while you can now get systems with very large numbers of cores in the SMP/NUMA world, these are very expensive, and there is little software for them outside specialist areas. The majority of multisocket systems are now probably virtualized to appear as multiple single socket ones. Scale out, message passing and commodity hardware are mostly where things are, with a sideshow of commoditized GPU technology giving us another form of parallelism to work with, or another competitor to the ageing x86 architecture. China is building its own <a href="http://en.wikipedia.org/wiki/Loongson">MIPS64</a> based CPU too, designed to run Linux: more competition to lower prices. Computation, networking and storage are all getting very cheap.</p>

<p>I replaced my last dual socket 2GHz PowerPC machine with a new dual core machine that is a fair bit faster but uses about 10-20% of the power, and now almost all of the ten or so computers around the house are small, low power, and dedicated to one function: netbook, two mobile phones, iPad, VOIP phone, router, storage, laptop, a couple of small computers. The highest power consumption is probably the old laptop, a last reminder of the old Intel that used to make chips of up to 100W. It makes no sense for me to invest in large amounts of computer power at home, because the bandwidth is not there: it takes too long to get larger datasets here, so I try to do computation where the data is and where there is bandwidth, outsourcing the computing.</p>

<p>When <a href="http://en.wikipedia.org/wiki/Douglas_Parkhill">Douglas Parkhill</a> wrote The Challenge of the Computer Utility in 1966, while working at the sinisterly capitalized <a href="http://www.mitre.org/">MITRE</a>, the reason for people to be interested in computers as utilities was the expense of actually owning a computer, which cost around $2m at the time, but could be rented at $450 an hour. But the economics of treating computers as a utility, improved utilisation, availability on demand are just as valid now, as well as the ability to shorten development feedback times and share common source code which he also cites. Utilities are largely centralised for reasons of economy of scale, ease of balancing changing demand, and ability to provide large quantities. Obviously some people have their own generators or solar cells and water supplies, and some people and companies sell power into the grid, but the majority comes from large sources, using varying technologies.</p>

<p>Other than PCs, which are increasingly mobile, most businesses don&#8217;t keep a lot of computing power on premise, most of it is in datacentres. And those datacentres are increasingly getting larger and further away, as the cost of power and real estate pushes them to cheaper locations. And they get bigger for cooling efficiency reasons, and start to turn into billion dollar projects, vastly more expensive in real terms than the computers of 1966. On demand computing suddenly starts to make sense for almost all businesses who do not wish to get into this type of expenditure. It also suddenly makes computation costs transparent, in terms of how much computation or accuracy is cost effective for a particular task, even if killing fixed cost budgeting is going to massively disrupt the financial planning of enterprises (another change that will need more computing power to manage).</p>

<h2>What does a computer utility look like?</h2>

<p><figure>
<img src="http://public.edge3.org/Brush_central_power_station_dynamos_New_York_1881.jpg" alt="Brush electric light station in New York, 1880s">
<figcaption><a href="http://en.wikipedia.org/wiki/Charles_F._Brush">Victorian sysadmins</a></figcaption>
</figure></p>

<p>The early days of utilities are dominated by several issues, one being standardization, whether it be <a href="http://en.wikipedia.org/wiki/Mains_electricity#History_of_voltage_and_frequency">AC frequency and voltage</a>, another being technological change, as from hollowed out trees, to clay pipes to plastic in water transmission, or financial boom and bust of the railway industry as the level of demand and supply were balanced out. Competition between new utilities and old ones was also significant, railways vs canals, roads vs railways, gas lighting vs electricity, hydraulic power vs electric power, and regulatory and legal issues from compulsory purchase, nationalisation and especially <a href="http://en.wikipedia.org/wiki/Munn_v._Illinois">price regulation</a>.</p>

<p>In the end, though, the centralized delivery model means you end up with standardized services. You cannot choose the voltage and frequency of your electricity supply (other than wholesale bulk offerings, like three phase), the temperature or pressure of the tap water, the frequency of the trains, or the level of congestion on the roads. Utilities often have slightly odd externalities and natural distribution monopolies that vary from one to another.</p>

<p>Computer software architectures are far from standardized right now, as we are still in the craft period of software. There is a lot more standardization than in 1966, when there was less mass production and LSI was only just starting. Common abstractions have slowly been appearing. Virtual memory was an early one, in the <a href="http://en.wikipedia.org/wiki/Burroughs_large_systems">B5000 in 1961</a>: it hides the memory hierarchy in a way that simplifies a lot of code, even if other code needs to understand the abstraction for performance. That was around the same time that high-level languages were being developed, with mutable variables that matched the semantics of virtual memory. The next big abstraction, from the late 1960s, was the <a href="http://en.wikipedia.org/wiki/Unix">Unix</a> model in which (almost) everything is a file: a universal stream processing model for many types of data and connections behind a single software interface, which also allowed compositional software. In 1970 <a href="http://en.wikipedia.org/wiki/Edgar_F._Codd">Codd</a> created the relational data model, an abstraction for databases. It was a productive era, in which <a href="http://en.wikipedia.org/wiki/Paul_Baran">Paul Baran</a> also invented the packet switched network built from unreliable components, another key building block of the cloud architecture. In the 1970s we also got the language virtual machine, notably the <a href="http://en.wikipedia.org/wiki/UCSD_Pascal">UCSD p-System</a>, which heavily influenced Java; this was the beginning of virtualising hardware and operating system characteristics, which has had some, although arguably still limited, success and standardisation.</p>
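<p>To make the point about a single stream interface enabling compositional software a little more concrete, here is a minimal Python sketch. It assumes generators as a stand-in for Unix pipes, and the stage names (<code>grep</code>, <code>upper</code>, <code>pipeline</code>) are hypothetical, only mimicking the Unix tools; it illustrates the idea rather than how Unix implements it.</p>

```python
import io
from typing import Iterable, Iterator


def grep(pattern: str, lines: Iterable[str]) -> Iterator[str]:
    """Pass through only the lines containing pattern (like the grep tool)."""
    for line in lines:
        if pattern in line:
            yield line


def upper(lines: Iterable[str]) -> Iterator[str]:
    """Transform every line (like `tr a-z A-Z`)."""
    for line in lines:
        yield line.upper()


def pipeline(source: Iterable[str]) -> list[str]:
    # Any iterable of lines works as the source: an open file, a socket's
    # makefile(), or an in-memory buffer, because they share one interface.
    return list(upper(grep("error", source)))


# Usage: the same pipeline runs over an in-memory "file".
buf = io.StringIO("ok\nerror: disk\nerror: net\n")
print(pipeline(buf))  # ['ERROR: DISK\n', 'ERROR: NET\n']
```

<p>Each stage depends only on the stream interface, so stages can be reordered or reused without knowing where the data came from.</p>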

<p>The characteristics of a utility architecture are those embodied in the internet: distributed, failure tolerant (up to a limit of course; we still get water and electricity outages, with very weak SLAs), standardised interfaces (plug sockets, HTML), usage agnostic, and scalable. In particular, the six constraints of the <a href="http://www.ics.uci.edu/~fielding/pubs/dissertation/rest_arch_style.htm">REST architectural style</a> (client-server, stateless, cacheable, layered, uniform interface, and optional code on demand) currently give us our best architectural model of a scalable, on-demand system, and so our best model of what the abstractions of a true utility infrastructure will look like. Another characteristic is of course the use of open source software, which represents both the commoditisation of infrastructure software and the right price for on-demand, scalable utility software.</p>

<p><a href="http://www.flickr.com/photos/justincormack/4158836526/" title="Homage to Bernd and Hilla Becher (blue period) by Justin Cormack, on Flickr"><img src="http://farm3.static.flickr.com/2580/4158836526_ba83849e19_m.jpg" width="171" height="240" alt="Homage to Bernd and Hilla Becher (blue period)" /></a></p>

<h2>Current environments</h2>

<p>If we look at Amazon Web Services, the closest thing so far to a utility computing service (although with some transitional facilities bolted on), the storage abstraction, S3, looks pretty much like what we would expect from the REST style: resource based, supporting the HTTP concurrency model (ETags, conditional requests). Note that it looks nothing like a POSIX file system, the previous abstraction. In the Amazon model, S3 is the only truly persistent, reliable, scalable, global storage. Other services such as EBS are <a href="http://www.elasticvapor.com/2010/05/failure-as-service.html">not fully reliable</a> but can be snapshotted to S3, while SimpleDB is not really a database in the persistence sense, as it has severe size limits and is not globally available; it is better seen as an abstract index, fulfilling that part of the database requirements without the persistence. Other storage, such as local disks, is even more ephemeral. I have another blog post in the works on how best to use these persistence models for real applications.</p>
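<p>The HTTP concurrency model is optimistic: a client reads a resource along with its ETag, then sends an update carrying <code>If-Match</code>, and the server refuses with 412 Precondition Failed if someone else changed the resource in the meantime. The sketch below is a generic in-memory illustration of that pattern, assuming a hypothetical <code>ResourceStore</code> class and a content hash as the opaque entity tag. It is not S3&#8217;s API and does not claim to reproduce S3&#8217;s exact semantics.</p>

```python
import hashlib
from dataclasses import dataclass, field
from typing import Optional


def etag_of(body: bytes) -> str:
    # A content hash used as an opaque version identifier (an assumption
    # of this sketch; clients should treat ETags as opaque strings).
    return '"' + hashlib.md5(body).hexdigest() + '"'


@dataclass
class ResourceStore:
    """Hypothetical in-memory stand-in for a REST-style object store."""
    objects: dict = field(default_factory=dict)

    def get(self, key: str) -> tuple[int, Optional[bytes], Optional[str]]:
        if key not in self.objects:
            return 404, None, None
        body = self.objects[key]
        return 200, body, etag_of(body)

    def put(self, key: str, body: bytes,
            if_match: Optional[str] = None) -> tuple[int, Optional[str]]:
        current = self.objects.get(key)
        if if_match is not None:
            # Conditional update: only write if the client saw the latest version.
            if current is None or etag_of(current) != if_match:
                return 412, None  # Precondition Failed
        self.objects[key] = body
        return (200 if current is not None else 201), etag_of(body)


store = ResourceStore()
status, tag = store.put("doc", b"v1")              # 201: created
status, _ = store.put("doc", b"v2", if_match=tag)  # 200: tag matched
status, _ = store.put("doc", b"v3", if_match=tag)  # 412: tag is now stale
```

<p>The losing writer is expected to re-read the resource, merge its change, and retry with the fresh ETag, so no server-side lock is held between requests.</p>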

<p>The network architectures in Amazon are similarly what you would expect for building REST architectures, with load balancers and an HTTP cache layer, for example. Amazon&#8217;s EC2, however, provides a Xen-virtualised PC as the base abstraction, which is largely agnostic about software architecture, unlike, say, Google App Engine, which provides an abstracted language runtime. The more general platform allows more experimentation and, crucially, more transition of legacy systems. We have a lot of historic usage of operating systems, security models, coding habits and <a href="http://natishalom.typepad.com/nati_shaloms_blog/2011/03/productivity-vs-control-tradeoffs-in-paas.html">productivity issues</a> that make using &#8220;opinionated platforms&#8221; difficult.</p>

<p>In many ways, though, whether the API provided on a utility compute resource is the <a href="http://www.oracle.com/technetwork/java/javaee/servlet/index.html">Java servlet standard</a>, as in the <a href="http://code.google.com/appengine/docs/whatisgoogleappengine.html">GAE Java environment</a>, or virtual network devices, as in EC2, is really a middleware question: a question of how much environment you have to wrap around your code. The Amazon approach encourages people to repackage services for different language communities on top of its offering. The important parts of the architecture are the ones that support failure handling, scale and reliability. The most important of these, and the most often neglected, is explicit management of state. The <a href="http://blog.worldturner.com/worldturner/entry/stateless_computing">server itself</a> must be <a href="http://www.infoq.com/presentations/Runtime-Changes">stateless if it is to be failure proof</a>. All state must be managed in the persistent storage abstraction, software must be upgraded by starting up new machines, and interface specifications must be upgradeable, another REST idea. This is a big cultural leap for people: programmers are not used to accounting for every data change the program makes, and sysadmins are used to patching systems at runtime rather than starting up new, tested instances when code needs to be updated. Unless state is explicit it is not testable, reliable, known or recoverable. These things were of course discovered in the <a href="http://www.infoq.com/presentations/Systems-that-Never-Stop-Joe-Armstrong">Erlang community</a>, where backups are seen as a sign of a poorly designed system that has not covered all the failure cases properly. More on this side of things in other posts soon too.</p>
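<p>A minimal sketch of a stateless server, assuming a hypothetical <code>StateStore</code> interface standing in for the persistent storage abstraction and an in-memory <code>DictStore</code> used only for illustration. The server object holds nothing but its code version, every change is written back explicitly, and an upgrade is simply a new instance pointed at the same store.</p>

```python
from typing import Protocol


class StateStore(Protocol):
    """Hypothetical interface to the persistent storage abstraction."""
    def load(self, key: str) -> int: ...
    def save(self, key: str, value: int) -> None: ...


class DictStore:
    """In-memory stand-in for durable storage, for this sketch only."""
    def __init__(self) -> None:
        self._data: dict[str, int] = {}

    def load(self, key: str) -> int:
        return self._data.get(key, 0)

    def save(self, key: str, value: int) -> None:
        self._data[key] = value


class CounterServer:
    """A stateless server: it keeps no data between requests, so any
    instance can serve any request and instances can be replaced freely."""
    def __init__(self, store: StateStore, version: str) -> None:
        self.store = store
        self.version = version  # code version only, not application state

    def handle_increment(self, key: str) -> int:
        value = self.store.load(key) + 1  # read state explicitly
        self.store.save(key, value)       # write every change back
        return value


store = DictStore()
old = CounterServer(store, version="1.0")
old.handle_increment("hits")
# "Upgrade" by starting a new instance rather than patching the old one.
new = CounterServer(store, version="1.1")
print(new.handle_increment("hits"))  # 2
```

<p>This deliberately ignores concurrent writers; a real service would need a conditional write (for example an ETag check) between the load and the save.</p>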

<p>Once you have realised that there is only scale-out, that all state must be explicitly managed in state abstractions, and that failure must be designed for everywhere, the container shape makes much less difference, as your software architecture will be very similar. Sure, there are points of difference: packets versus streams, sync vs async, different security architectures, different programming languages, protocols, serializations and libraries. The network is really the only important IO point for utility computation; the server environment can in principle be much simpler than the complex server environments we predominantly run now, and thus able to scale better on the highly parallel, low-power commodity hardware we were talking about before.</p>

<h2>Beyond Compute and Store</h2>

<p>I said before that Amazon SimpleDB is best viewed as an index rather than a database, and lots of the software we use in a web environment is like that: key-value stores, memcache and some of the NoSQL solutions provide this type of function rather than concentrating mainly on persistence. Obvious product areas here are things like full text search, which is not the ultimate data source. Others are trying to provide &#8220;small item persistence&#8221;, although most of these systems are not yet aimed at utility computing provision, so they have not tended to cleanly separate store and index, for example, and their persistence is mainly aimed at hard drives. Given the historical popularity of databases and the current flourishing of the NoSQL movement, I can see an opportunity for something like Cassandra with an S3-type backend as a utility computing key-value store. Why not SQL? The main problem is that JOINs do not scale well, which makes SQL difficult to offer as a truly elastic utility service. In addition, I can see a place for a low latency, reliable transaction log service built on replicated SSD that shifts data automatically to permanent long term storage (please, Amazon), to cut the persistence time, especially for small data items.</p>
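<p>The store/index split can be sketched in a few lines. This assumes a toy word index and hypothetical class names (<code>DurableStore</code>, <code>WordIndex</code>); neither is modelled on SimpleDB or Cassandra. The point is that the index is derived data: fast to query, with no persistence guarantee, because it can always be rebuilt from the store that is the source of truth.</p>

```python
from collections import defaultdict


class DurableStore:
    """Stand-in for the persistent store (the source of truth)."""
    def __init__(self) -> None:
        self.items: dict[str, str] = {}

    def put(self, key: str, text: str) -> None:
        self.items[key] = text


class WordIndex:
    """A derived index: fast lookups, no persistence guarantees.
    If it is lost, it can be rebuilt from the durable store."""
    def __init__(self) -> None:
        self.postings: dict[str, set[str]] = defaultdict(set)

    def add(self, key: str, text: str) -> None:
        for word in text.lower().split():
            self.postings[word].add(key)

    def search(self, word: str) -> set[str]:
        # .get() avoids creating empty entries in the defaultdict.
        return set(self.postings.get(word.lower(), set()))

    @classmethod
    def rebuild(cls, store: DurableStore) -> "WordIndex":
        index = cls()
        for key, text in store.items.items():
            index.add(key, text)
        return index


store = DurableStore()
index = WordIndex()
for key, text in [("a", "utility computing"), ("b", "cloud computing")]:
    store.put(key, text)
    index.add(key, text)

index = None  # the index is lost, e.g. its node went away
index = WordIndex.rebuild(store)
print(sorted(index.search("computing")))  # ['a', 'b']
```

<p>Only writes to the store need to be durable, so the index can live in memory or on cheap ephemeral disks.</p>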

<p>Other facilities for utility computing include authorization infrastructure, message queues, payment infrastructure and so on, blending into more specialist services sold as SAAS on top of the infrastructure.</p>

<h2>Beyond Now</h2>

<p>Our current code model looks a bit like running code inside a web server; indeed that is one way to organize the code, or you can reverse this and run a web server as a library inside the language runtime, as Node.js does, for example. Architectures could change, though, perhaps (sometimes) moving the code to the storage for lower latency. The advantage of strongly decoupled code with explicit state management is that it is much easier to move around: explicit state management, with all lasting state held over the network, has the same properties as functional programming, and, as we said before, is very much like the Erlang model of functional processes and message passing. This flexibility in where code runs allows another level of infrastructure virtualisation and efficiency, moving code towards requesters or data based on latency and throughput requirements.</p>

<p>I don&#8217;t think this is going to change how programming works a lot; at least it should be pretty familiar to web programmers, though there might be some adjustment for the enterprise lot, of course. What is going to change is sysadmin. One of the issues now with cloud solutions is that they require too much sysadmin work to really let us scale up our usage. <a href="http://en.wikipedia.org/wiki/Jevon%27s_paradox">Jevons&#8217; paradox</a> cannot come into effect, letting us expand usage as the price falls, if there are additional costs such as admin overhead. The infrastructure providers have managed to cut staff to fewer than one per 4000 machines, but the users of services like Amazon&#8217;s still have a lot of software environment management overhead. I think we can cut that substantially too, using stateless servers and other changes, but that is another blog post.</p>

<p>This is perhaps a bit of a run through history, trying to make sense of the journey towards utility computing that has been going on for over 45 years but is really just starting to become mainstream, and where we still have the opportunity to build a new utility, which is something that does not happen very often. Some more concrete explorations coming soon.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.technologyofcontent.com/2011/03/utility-computing-or-will-the-real-cloud-please-get-off-its-aas/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Scaling, Security and architecture in 2010</title>
		<link>http://blog.technologyofcontent.com/2010/01/scaling-security-and-architecture-in-2010/</link>
		<comments>http://blog.technologyofcontent.com/2010/01/scaling-security-and-architecture-in-2010/#comments</comments>
		<pubDate>Sun, 17 Jan 2010 18:47:53 +0000</pubDate>
		<dc:creator>justin</dc:creator>
				<category><![CDATA[programming]]></category>
		<category><![CDATA[scalability]]></category>

		<guid isPermaLink="false">http://blog.technologyofcontent.com/?p=200</guid>
		<description><![CDATA[This post is about a bunch of stuff I have noticed recently, things that are affecting software and hardware architectures, and security; it is a bit miscellaneous perhaps. As application architectures on the enterprise move towards emulating web scale architectures these trends will affect software more widely. This concentrates on Linux, the operating system the [...]]]></description>
			<content:encoded><![CDATA[<p>This post is about a bunch of things I have noticed recently that are affecting software and hardware architectures, and security; it is a bit miscellaneous perhaps. As application architectures in the enterprise move towards emulating web-scale architectures, these trends will affect software more widely. It concentrates on Linux, the operating system the internet is now built on, and how Linux is adapting these trends to ways of doing things that may differ from those in other communities. Security continues to become more important as the environment for applications becomes more hostile.</p>

<h2>Virtualization</h2>

<p>Virtualization on commodity servers mainly took off as a way to run multiple services on Windows, due to compatibility issues between applications. This has always been much less of an issue for Linux applications, due to the breadth of supporting libraries packaged by distributions. It is still an issue, though, for security reasons (Apache without suexec for shared hosting still exists, bypassing OS-based multi-tenancy security, a model that should have gone years ago). KVM, which uses Linux itself as the hypervisor together with the hardware virtualization capabilities of newer CPUs, is now in the Linux kernel and supported in Red Hat Enterprise Linux. I suspect it will gradually overtake Xen and VMware in areas where only Linux is of interest, due to the built-in kernel support; however, lighter-weight solutions to the security issues, such as containers, will probably take off instead for many applications where running multiple kernels is unnecessary.</p>

<h2>Containers</h2>

<p>Linux now has a full container model called LXC, similar in principle to BSD jails and Solaris zones. It arrived gradually, as a set of patches to namespace various parts of the system such as the process ID space, so that a container has its own init process with ID 1 and can reuse the same IDs as other containers (this is also needed for process migration). There is also a network namespace, so each container has its own loopback device and independently named network devices (which can, for example, be bridged back to the host). Read-only bind mounts can be used to safely export libraries and binaries to multiple containers, with updates done centrally if required; otherwise a container can be managed as a standalone system that just shares the kernel. This environment provides a level of secure isolation between containers that solutions such as chroot never had. Processes in containers can be seen from the container host, so obviously the host needs to be well secured. Because containers do not need hardware support and are very lightweight, I think they will grow rapidly in popularity; they can also run within a virtual machine guest for process isolation in a virtual environment. Ubuntu 10.04 will have <a href="https://wiki.ubuntu.com/ContainersSpec">full support</a>; earlier versions do work.</p>

<h2>Capabilities</h2>

<p>The old high-risk approach of setuid binaries (with broad permissions) is going at last, replaced by a fine-grained capabilities system. In principle this means you can drop root capabilities completely, making root an unprivileged user. There is a <a href="http://ols.fedoraproject.org/OLS/Reprints-2008/hallyn-reprint.pdf">good summary article on this</a> and <a href="http://www.linuxjournal.com/article/10249">another on trying to remove root access</a>. It seems that we will not see pure capability-based Linux distributions for a while, and general purpose systems will still have setuid binaries, but there is no reason why single application sandboxes should not drop root capabilities in their init process and just use capabilities set in the file system. Fedora seems the furthest ahead in trying this out as a full distribution, and hopefully this work will progress, adding another security layer on top of SELinux.</p>
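<p>Capability sets are bitmasks, one bit per privilege, and on Linux a process&#8217;s effective set appears as a hex mask on the <code>CapEff:</code> line of <code>/proc/&lt;pid&gt;/status</code>. The sketch below only decodes such a mask, using a partial table of capability numbers from <code>linux/capability.h</code>; it does not drop or set capabilities (that needs something like libcap), and the function names are hypothetical.</p>

```python
# A subset of Linux capability numbers (from linux/capability.h).
CAP_NAMES = {
    0: "CAP_CHOWN",
    1: "CAP_DAC_OVERRIDE",
    5: "CAP_KILL",
    6: "CAP_SETGID",
    7: "CAP_SETUID",
    10: "CAP_NET_BIND_SERVICE",
    12: "CAP_NET_ADMIN",
    13: "CAP_NET_RAW",
    21: "CAP_SYS_ADMIN",
}


def decode_caps(hex_mask: str) -> list[str]:
    """Turn a capability bitmask (as printed on the CapEff: line of
    /proc/<pid>/status) into names; bits not in the table are shown
    by number."""
    mask = int(hex_mask, 16)
    names = []
    bit = 0
    while mask >> bit:  # stop once no higher bits remain set
        if (mask >> bit) & 1:
            names.append(CAP_NAMES.get(bit, f"cap_{bit}"))
        bit += 1
    return names


def read_effective_caps(pid: str = "self") -> list[str]:
    """Linux only: read and decode the effective capability set of a process."""
    with open(f"/proc/{pid}/status") as f:
        for line in f:
            if line.startswith("CapEff:"):
                return decode_caps(line.split()[1])
    return []


# A process allowed only to bind low ports, and nothing else root can do:
print(decode_caps("0000000000000400"))  # ['CAP_NET_BIND_SERVICE']
```

<p>A fully privileged root process shows every bit set, whereas a sandboxed service can run with a mask like the one above.</p>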

<h2>Sandboxing</h2>

<p>Privilege separation in network applications has been around for a while, but it is starting to spread, with the best example being the <a href="http://blog.chromium.org/2008/10/new-approach-to-browser-security-google.html">Chrome security model</a>. What has really started to change is that all complex bits of code, such as HTML rendering in Chrome, are treated as potentially hostile, because they are likely to be buggy. There is a lot to do to make good security thinking pervasive in application design, but having some well thought out examples is a good start. Currently Chrome on Linux seems to offer a <a href="http://code.google.com/p/chromium/wiki/LinuxSandboxing">choice of sandboxing methods</a> of varying effectiveness, from a suid helper to using <a href="http://lwn.net/Articles/332974/">seccomp</a>.</p>

<h2>SELinux</h2>

<p>SELinux has been available in Linux for about ten years now, providing a Mandatory Access Control framework, but it has taken that long to get really widespread use, mainly pushed by Red Hat. Gradually it is extending to other applications, such as mod_selinux for Apache, which runs web applications in appropriate security contexts; Postgres SELinux extensions are also available. We are getting to a point where OS security mechanisms can and will be used, as they provide the types of security hooks that modern applications need, after a period in which applications invented their own security mechanisms because the OS did not provide the right ones.</p>

<h2>Physicalization</h2>

<p>There was an interesting new buzzword this year: <a href="http://arstechnica.com/business/news/2009/11/basics-of-physicalization.ars">physicalization</a>. Yes, just when you thought virtualization was an important new trend, along comes the opposite. What is the idea?</p>

<p>A two-socket, 8-core server with 16GB RAM and multiple Ethernet ports, divided into four virtual servers, is actually quite expensive compared to four commodity low-end boxes. For a start, there is a server premium built into the chip manufacturers&#8217; profit model, and there is also a volume issue.</p>

<p>The price arbitrage is fairly compelling, although the other costs (disks, motherboards, networking) add up and reduce the saving. Example systems include <a href="http://www.sgi.com/products/servers/microslice/">SGI&#8217;s Microslice</a> (yes, SGI, that name from the past!). This offers single-socket, dual-core systems, but with ECC memory, at significantly lower price and power consumption than typical two-way servers, and potentially more throughput per dollar for some workloads.</p>

<p>There are even some suggestions that for Linux workloads, non-x86 architectures (e.g. ARM) might be competitive for applications that scale out effectively to multiple machines, although I think the risk of introducing these would be high, and they would need a big buyer.</p>

<h2>Cloud</h2>

<p>The big coming trend as the world comes out of recession is that cloud computing platforms are cheap, very cheap, compared to in-house server provision. Some estimates put cloud costs at 20% of in-house costs now, falling to 10% this year. Part of this is economies of scale in hardware and in administration, and part is standardized components and architectural options. Part of it may be untrue, as there do not appear to be good figures. What is clear is that the SAAS model is compelling for many kinds of product, and fits in with a general movement to treat software as an expense rather than an investment. There is a lot of hype, and a lot of people have seen the cloud idea before under different names, but the web has produced a viable delivery mechanism, and the uniformity of hosting environments like EC2 cuts costs. Costs such as upgrades are also much lower in a SAAS environment, although the software needs to be architected differently to support that.</p>

<h2>Availability</h2>

<p>Over the last year or so, high-availability programming has started to reach wider awareness. The <a href="http://www.infoq.com/presentations/Systems-that-Never-Stop-Joe-Armstrong">Erlang model</a> has become better known, bringing more awareness of the basic elements for building reliable systems, such as process supervision. We are starting to see other implementations, such as <a href="http://akkasource.org/">Akka</a>. This is a great move, as availability needs to move from being a sysadmin and maintenance issue to being a coding issue; for too long, effective handling of failure has been ignored by programmers.</p>
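<p>Process supervision can be illustrated with a small sketch. This is loosely modelled on the idea of an Erlang/OTP supervisor&#8217;s restart intensity (give up if a child fails too often within a time window), but it is a hypothetical single-worker Python class, not OTP&#8217;s API or Akka&#8217;s, and it restarts by calling the worker again rather than spawning a process.</p>

```python
import time
from typing import Callable


class Supervisor:
    """Minimal supervisor sketch: restart a failing worker, but give up
    if it fails more than max_restarts times within `period` seconds."""

    def __init__(self, worker: Callable[[], object], max_restarts: int = 3,
                 period: float = 5.0) -> None:
        self.worker = worker
        self.max_restarts = max_restarts
        self.period = period
        self.restarts: list[float] = []

    def run(self) -> object:
        while True:
            try:
                return self.worker()
            except Exception as exc:  # the worker "crashed"
                now = time.monotonic()
                # Only count restarts inside the sliding time window.
                self.restarts = [t for t in self.restarts if now - t < self.period]
                if len(self.restarts) >= self.max_restarts:
                    # Escalate instead of looping forever on a persistent fault.
                    raise RuntimeError("restart intensity exceeded") from exc
                self.restarts.append(now)


attempts = {"n": 0}


def flaky() -> str:
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise ConnectionError("transient failure")
    return "ok"


print(Supervisor(flaky).run(), attempts["n"])  # ok 3
```

<p>The worker code contains no error handling of its own: transient failures are absorbed by restarts, and persistent ones are escalated, which is the division of labour the Erlang model encourages.</p>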

<h2>Locks</h2>

<p>As applications scale to more threads on multicore CPUs, locking becomes more of an issue. <a href="http://en.wikipedia.org/wiki/Lock-free_and_wait-free_algorithms">Lock-free algorithms</a> are one interesting answer that can work well for some algorithms. Getting past the scaling issues as architectures get more cores needs innovation in many areas such as this. Code that serializes on locks is exactly the kind of sequential section that limits scaling through <a href="http://en.wikipedia.org/wiki/Amdahl%27s_law">Amdahl&#8217;s law</a>.</p>
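<p>Amdahl&#8217;s law makes the cost of serial sections concrete: if a fraction <i>p</i> of the work can run in parallel on <i>n</i> cores, the speedup is S(n) = 1 / ((1 &#8722; p) + p / n). The sketch below assumes a fixed workload and treats time spent holding a lock as the serial fraction, which is a simplification.</p>

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Amdahl's law: S(n) = 1 / ((1 - p) + p / n), where p is the fraction
    of the work that can run in parallel and n is the number of cores."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / cores)


# If 10% of the time is serialised (e.g. on a lock), the speedup approaches
# but never exceeds 1 / 0.1 = 10x, however many cores are added.
for n in (2, 8, 64, 1024):
    print(n, round(amdahl_speedup(0.9, n), 2))
```

<p>This is why shrinking or removing serial sections, for example with lock-free data structures, matters more as core counts grow than adding yet more cores does.</p>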

<h2>Summary</h2>

<p>Software architecture is at an interesting point; the principles of web architecture and the security mindset are gradually feeding into tools and infrastructure and becoming more widespread, and delivery is also changing. Scalable, available and secure systems are the aim.</p>

<p><a href="http://dilbert.com/strips/comic/2009-11-19/" title="Dilbert.com"><img src="http://dilbert.com/dyn/str_strip/000000000/00000000/0000000/000000/70000/4000/100/74150/74150.strip.gif" border="0" alt="Dilbert.com" width="450"/></a></p>
]]></content:encoded>
			<wfw:commentRss>http://blog.technologyofcontent.com/2010/01/scaling-security-and-architecture-in-2010/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>CMS technology choices</title>
		<link>http://blog.technologyofcontent.com/2009/08/cms-technology-choices/</link>
		<comments>http://blog.technologyofcontent.com/2009/08/cms-technology-choices/#comments</comments>
		<pubDate>Sun, 02 Aug 2009 20:40:08 +0000</pubDate>
		<dc:creator>justin</dc:creator>
				<category><![CDATA[CMS]]></category>
		<category><![CDATA[programming]]></category>
		<category><![CDATA[scalability]]></category>

		<guid isPermaLink="false">http://blog.technologyofcontent.com/?p=33</guid>
		<description><![CDATA[Response to Julian Wraith's "The future of Content Management…" post covering some of my arguments with Jon and some technical decisions that the content management community will have to make to get to that future…]]></description>
			<content:encoded><![CDATA[<p>The future of Content Management is what we make of it right now; it has not been decided or built yet. Remarkably, for a market with so many people in it, there are no hard and fast rules and nothing definitive. However, we are coming to the end of the experimental phase, the hard decisions are going to be made now, and the future for a fairly long period will be determined pretty soon.</p>

<p>Although the vast majority (but not all) of open source content management systems are continually trying to reinvent the blog, we are talking about internet infrastructure here, and the future of content is going to be open source, like the rest of internet development. I also believe that in the long term the web project will overwhelm the legacy areas of document management, although it may take some time. Hypertext, the web architecture, XML, HTML and all those standards are here to stay and to dominate long term. Content management will also become pervasive in the long term, as the blogging projects show, because the right tools make content management a natural part of the workflow. Content management succeeds when it replaces the file and folder paradigm with a content-led paradigm.</p>

<p>In my conversation the other day with <a href="http://jonontech.com/2009/08/01/i-have-a-dream-of-the-cms-future" title="Jon's post on the subject">Jon</a> I was arguing that although we agree on many of the technical issues, there are real decisions to be made about what needs to be built to get to the content management future. Below are my lists of differences. Generally, I think the future of content management lies with the left-hand item of each pair, although some are not clear yet. I have probably missed a lot of the things still to be determined, but it is a start.</p>

<h2>Architecture &ndash; API differences</h2>

<p>These may cause API and other more significant differences, though some may not matter (e.g. git can read SVN repositories via git-svn, but not vice versa).</p>

<ul>
<li>REST vs SOAP</li>
<li>REST vs Java native interfaces</li>
<li>distributed version control (git) vs file based (SVN)</li>
<li>compositional vs monolithic</li>
<li>structured content vs files</li>
<li>relations vs metadata</li>
<li>web (hypertext) content vs documents</li>
<li>URIs vs referential integrity</li>
<li>web applications with content management vs content management systems</li>
</ul>

<h2>Architecture &ndash; performance differences</h2>

<p>These choices could lead to implementations with different performance characteristics. They are to a large extent information architecture (IA) differences, so they depend on the type of problem being modelled and the modelling process. Models and performance are linked, though, and the best we can do is to make parts of this pluggable so that a range of performance characteristics can be used.</p>

<ul>
<li>unstructured vs structured</li>
<li>sparse vs dense</li>
<li>untyped vs typed</li>
<li>NoSQL vs RDBMS</li>
<li>permission hierarchy vs permission graph</li>
<li>scalable vs local</li>
</ul>

<h2>Development process</h2>

<p>This is key to getting the product to where you want it to be.</p>

<ul>
<li>open source vs proprietary</li>
<li>API driven vs UX driven</li>
<li>ubiquitous content management vs isolated systems</li>
<li>agile vs monolithic</li>
</ul>

<h2>Architecture &ndash; usage differences</h2>

<p>These could just come down to the ways, or the tools, with which components are joined together; maybe they do not affect architecture per se.</p>

<ul>
<li>social media vs controlled content</li>
<li>programming languages (Javascript, XSLT) vs templating systems</li>
</ul>

<p><a href="http://browsertoolkit.com/fault-tolerance.png"><img src="http://browsertoolkit.com/fault-tolerance.png" alt="fault tolerance" width="500px"/></a></p>

]]></content:encoded>
			<wfw:commentRss>http://blog.technologyofcontent.com/2009/08/cms-technology-choices/feed/</wfw:commentRss>
		<slash:comments>6</slash:comments>
		</item>
		<item>
		<title>Webmachine and web frameworks</title>
		<link>http://blog.technologyofcontent.com/2009/06/webmachine-and-web-frameworks/</link>
		<comments>http://blog.technologyofcontent.com/2009/06/webmachine-and-web-frameworks/#comments</comments>
		<pubDate>Mon, 22 Jun 2009 20:25:28 +0000</pubDate>
		<dc:creator>justin</dc:creator>
				<category><![CDATA[programming]]></category>
		<category><![CDATA[erlang webmachine frameworks http REST]]></category>

		<guid isPermaLink="false">http://blog.technologyofcontent.com/?p=28</guid>
		<description><![CDATA[A short post about webmachine, and why it is an interesting web framework.]]></description>
			<content:encoded><![CDATA[<p>I have just started experimenting with <a href="http://bitbucket.org/justin/webmachine/wiki/Home">Webmachine</a>, an Erlang, well, REST framework for lack of anything better to call it.</p>

<p>I think what it does is best explained with the diagram <a href="http://bytebucket.org/justin/webmachine/wiki/http-headers-status-v3.png"><img src="http://bytebucket.org/justin/webmachine/wiki/http-headers-status-v3.png" width="500" alt="http state diagram" /></a>.</p>

<p>All you do is write functions that fill in the decision points in the diagram where necessary (the defaults are mostly sane), and write a list of dispatch points (in effect, pattern matches against the URL). Compared to lower-level web interfaces such as Rack or WSGI it is much more complete, as it does a lot more of the work you need to do, rather than just providing a way to pass back return codes, headers and bodies as many other libraries do. (To be fair, some of these add other framework functionality.)</p>
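<p>The decision-diagram idea is easy to sketch outside Erlang. Below is a hypothetical Python analogue, not webmachine&#8217;s API: a base class whose callbacks are named after a few of webmachine&#8217;s (<code>service_available</code>, <code>allowed_methods</code>, <code>is_authorized</code>, <code>forbidden</code>, <code>resource_exists</code>) with default answers, a <code>decide</code> function that walks only a tiny subset of the real diagram, and a simplified prefix-based dispatch list. The <code>Greeting</code> resource, the <code>to_body</code> callback and the request dictionary format are assumptions for illustration.</p>

```python
class Resource:
    """Hypothetical resource with callbacks named after some of
    webmachine's; override only what differs from the defaults."""

    def service_available(self, req: dict) -> bool:
        return True

    def allowed_methods(self, req: dict) -> list[str]:
        return ["GET", "HEAD"]

    def is_authorized(self, req: dict) -> bool:
        return True

    def forbidden(self, req: dict) -> bool:
        return False

    def resource_exists(self, req: dict) -> bool:
        return True

    def to_body(self, req: dict) -> str:
        return ""


def decide(resource: Resource, req: dict) -> tuple[int, str]:
    """Walk a tiny subset of the HTTP decision diagram, in order;
    each decision point is one callback mapped to one status code."""
    if not resource.service_available(req):
        return 503, ""
    if req["method"] not in resource.allowed_methods(req):
        return 405, ""
    if not resource.is_authorized(req):
        return 401, ""
    if resource.forbidden(req):
        return 403, ""
    if not resource.resource_exists(req):
        return 404, ""
    return 200, resource.to_body(req)


class Greeting(Resource):
    def __init__(self, names: set[str]) -> None:
        self.names = names

    def _name(self, req: dict) -> str:
        return req["path"].rsplit("/", 1)[-1]

    def resource_exists(self, req: dict) -> bool:
        return self._name(req) in self.names

    def to_body(self, req: dict) -> str:
        return "hello " + self._name(req)


# Dispatch list: URL prefix -> resource (a simplified stand-in for
# webmachine's pattern-matching dispatch rules).
DISPATCH = [("/hello/", Greeting({"world"}))]


def route(req: dict) -> tuple[int, str]:
    for prefix, resource in DISPATCH:
        if req["path"].startswith(prefix):
            return decide(resource, req)
    return 404, ""


print(route({"method": "GET", "path": "/hello/world"}))   # (200, 'hello world')
print(route({"method": "POST", "path": "/hello/world"}))  # (405, '')
print(route({"method": "GET", "path": "/hello/mars"}))    # (404, '')
```

<p>The resource only answers questions about itself; the status codes fall out of the order in which the diagram asks them, which is what makes this style more complete than passing back status codes by hand.</p>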

<p>I really like this &#8211; it certainly seems the right way to go about the process, and to support the application developer, and I think it is worth a look even if you are not particularly interested in programming in Erlang.</p>

<p>Oh and it can draw you trace pictures of the progress of your requests through the state diagram, which is rather cool too.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.technologyofcontent.com/2009/06/webmachine-and-web-frameworks/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

