This is an old archived post; the content may be out of date, and links and layout may be broken.
This post is highly subjective.
My web CMS architecture separates into three circles: Content Management, Deployment and Runtime.
The three should be able to operate entirely independently of one another, though in an ideal world the Runtime and Content Management may co-exist, to allow preview of content in the CMS authoring environment.
Take the first circle – content management. It should perform the following roles:
- Authentication and Authorisation (Login to the CMS, control which editors can edit what)
- Versioning (maintain historical content – allow rollback)
- Content and Data types (what content is to be captured and how it should be structured)
- Content Entry (the user interface that is used to enter the content)
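As a sketch of the "content and data types" role, a content type definition stored as a plain text file might look like this (a hypothetical format invented for illustration – not any particular CMS's real schema):

```xml
<!-- Hypothetical content type definition - aliases and property types are made up -->
<contentType alias="blogPost">
  <property alias="title" type="Textstring" mandatory="true" />
  <property alias="bodyText" type="RichText" />
  <property alias="publishDate" type="DatePicker" />
</contentType>
```

Keeping definitions like this in text files is what makes them versionable and editable with ordinary tooling, as discussed below.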
Obviously, I am simplifying here. Some CMSs also have some kind of workflow component that defines a process for publishing content.
So, with the key point of this post being that the three circles should be able to operate entirely independently of one another, it follows that they must communicate in some way.
The job of the deployment circle is to provide the latest version of all content to the runtime(s).
I’d say the best way to achieve this is to have your CMS output a bunch of files to disk – XML, JSON or whatever – but I’d specify that they should be files and not a database.
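As a sketch, a single content item serialised to a JSON file might look like this (a made-up shape for illustration, not any real CMS's output format):

```json
{
  "path": "/home/blog/my-first-post",
  "contentType": "blogPost",
  "properties": {
    "title": "My first post",
    "bodyText": "<p>Hello world</p>",
    "relatedPost": "/home/blog/another-post"
  }
}
```

Note that `relatedPost` references another item by path rather than by database ID, so the file is meaningful on its own, outside any particular database.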
For two reasons. First, the problem of synchronising files between two places has been solved a million times over, for example by:
- rsync
- Robocopy
- BitTorrent Sync
Synchronising relational databases isn’t so easy.
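The "solved problem" point shows in how little code even a naive one-way file sync takes. Here is a toy sketch in Python (real deployments would use one of the tools above; the function name and return shape are my own invention):

```python
import filecmp
import shutil
from pathlib import Path

def sync(src: Path, dst: Path) -> list[str]:
    """Naive one-way sync: copy files that are new or changed in src
    into dst, mirroring the folder structure. Returns relative paths
    of the files that were copied."""
    copied = []
    for f in src.rglob("*"):
        if f.is_dir():
            continue
        rel = f.relative_to(src)
        target = dst / rel
        # Copy if the file is missing or its content differs.
        if not target.exists() or not filecmp.cmp(f, target, shallow=False):
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(f, target)
            copied.append(str(rel))
    return copied
```

Running it twice against an unchanged source copies nothing the second time – exactly the incremental behaviour you want from a deployment step. There is no equivalent twenty-line answer for merging two relational databases.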
The second reason is that text files are really easy to open up, poke around in, and version with Git, Mercurial, etc.
I have a reference file format if you are interested – but at this point we should be discarding historical versions of content and information about who authored it, because the deployment and runtime circles just don’t care about those things.
I’m not keen on having a relational database anywhere in my ideal CMS architecture. It complicates initial setup: you have to install the database, create a schema, set up a connection string and so on. If you think cloud, it also makes provisioning new instances of your runtime web application more complicated.
It also removes my ability to use the editor/tooling of my choice if my data type definitions are hidden away in a relational database.
The beauty of having a runtime that just receives a bunch of files is that your CMS could be an ASP.NET web application and your runtime could be Java, PHP or whatever you wished.
You may know that I work with Umbraco a lot – and I’ve started to apply some of these concepts to Umbraco.
Upon publish we serialise our Umbraco content to XML. All integer IDs are replaced with (X)Paths, and property values are scanned for IDs, which are again replaced with paths – this allows Umbraco picker data types to function correctly.
Each node serialises to one XML file, and the file is placed in a folder that corresponds to its path within the content hierarchy.
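A minimal sketch of the idea – in plain Python rather than the real Umbraco code, with the node shapes, property names and the `id_to_path` map all invented for illustration:

```python
import re

# Map of integer node IDs to hierarchy paths. In Umbraco this would be
# derived from the content tree; here it is hard-coded for illustration.
id_to_path = {
    1001: "/home",
    1002: "/home/blog",
    1003: "/home/blog/my-first-post",
}

def replace_ids(value: str) -> str:
    """Replace integer IDs in a property value with their paths, so
    picker data types keep working after deployment. (A real
    implementation would only scan picker properties, not free text.)"""
    return re.sub(
        r"\b(\d+)\b",
        lambda m: id_to_path.get(int(m.group(1)), m.group(0)),
        value,
    )

def serialise(node_id: int, properties: dict) -> tuple[str, str]:
    """Return (file_path, xml) for one node; the file path mirrors the
    node's position in the content hierarchy."""
    path = id_to_path[node_id]
    props = "\n".join(
        f"  <{k}>{replace_ids(v)}</{k}>" for k, v in properties.items()
    )
    xml = f'<node path="{path}">\n{props}\n</node>'
    return f"content{path}.xml", xml

file_path, xml = serialise(1003, {"title": "Hello", "relatedNode": "1002"})
print(file_path)  # content/home/blog/my-first-post.xml
```

The output file lands at a path that mirrors the content tree, and the `relatedNode` property now holds `/home/blog` instead of the database ID `1002`.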
This puts us in a state where we can synchronise this output folder to another file system – but remember, that is the job of our deployment circle.
We also discover instances of our ContentConvertor interface at this point, in case we want something other than the DMCF XML we output (Darren’s Made-up Content Format™); in theory we could have our content represented as XML or JSON, generated as static HTML, etc.
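The convertor idea, sketched in Python rather than .NET (the name `ContentConvertor` comes from the post; the method names and implementations are my own assumptions):

```python
from abc import ABC, abstractmethod
import json

class ContentConvertor(ABC):
    """Turns a published node into some output representation.
    Discovered implementations run alongside the default XML output."""

    @abstractmethod
    def convert(self, node: dict) -> str: ...

    @abstractmethod
    def file_extension(self) -> str: ...

class JsonConvertor(ContentConvertor):
    """Emit the node as JSON instead of XML."""
    def convert(self, node: dict) -> str:
        return json.dumps(node, indent=2)

    def file_extension(self) -> str:
        return ".json"

class StaticHtmlConvertor(ContentConvertor):
    """Render the node straight to a static HTML page."""
    def convert(self, node: dict) -> str:
        title = node.get("title", "")
        return f"<html><body><h1>{title}</h1></body></html>"

    def file_extension(self) -> str:
        return ".html"
```

On publish, each discovered convertor gets every node and writes its own file alongside the XML – which is how the same publish step could produce JSON or static HTML.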
Unfortunately this means that we have to write our own runtime “circle”, because the Umbraco presentation APIs are tied to the Umbraco cache and underlying database schema to some extent. Luckily, we can easily add our own runtime into Umbraco for preview, using route hijacking.
What is missing?
In short, loads. By removing the Umbraco runtime from our front-end site we lose:
- A content cache
- Examine Indexes
- And lots more….
So for 90%+ of cases we’ll use Umbraco.
Why do all of this?
Mainly to ease some of the complexity of deploying code and content changes between environments. The ability to easily promote and demote code and content just by moving files around is really powerful. It also enables us to build massively scalable websites that auto-scale easily.
Our runtime instances are agnostic of one another; there could be 1 or there could be 500 – it just doesn’t matter.
Umbraco is a Swiss army knife and sometimes we just need the corkscrew.
Recently a client (a big one) did a (snail) mailshot to 20 million people. They couldn’t auto-scale their call centre, so they put a URL on the mailing instead.
The issue here is that they needed to handle (very) high traffic for approximately 10 days. A month later the site was decommissioned completely.
Inside the client’s network they have a single server running several Umbraco instances, but when they publish, it just writes some files that are sent to the runtime. It works really well and deals with hundreds of thousands of page views per day.
If massively scalable websites are something you want to do, give me a call.