ThreadNeedle and RSS

The problem with a developer being around during the design phase of an application is that the developer tends to pull things back to an implementation viewpoint – we can’t help ourselves.

However, a discussion about ThreadNeedle and RSS is, I feel, important at this time.

Why am I not creating ThreadNeedle as a new module on RSS (Rich Site Summary)? After all, as webloggers we’re familiar with RSS, weblogging tools already generate RSS files, and we’re used to using aggregation tools that process RSS. Why am I not piggybacking ThreadNeedle onto the RSS specification?

RSS started as a way of recording information about channels – sources of information of interest. The adoption of RSS within the weblogging community grew out of Dave Winer’s and Userland’s support of RSS as an XML vocabulary to describe individual weblog postings. With RSS, news aggregators can grab this information and present it for quick perusal.

RSS 1.0 is based on RDF – Resource Description Framework. RDF is, in reality, a meta-language, a way to describe languages so that any vocabulary can be described in RDF. One aspect of RDF is that it can be used to describe XML vocabularies, something we’ve desperately needed since the inception of XML.

In a manner similar to the relational data model being used to describe different business data within commercial database systems, with RDF you can create different vocabularies for different business uses, and the same tools and technology can work with each. So, I can create an RDF vocabulary for a post-content management system, and a vocabulary for ThreadNeedle, and process both with the exact same Java and Perl APIs I use with RSS 1.0. For instance, I’ve processed RDF from all three types of XML documents using Jena (Java API) with absolutely no change to the code I used.
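To make the "same tools, different vocabularies" point concrete, here is a minimal sketch in Python rather than Jena. It is not a real RDF parser: it assumes a tiny subset of RDF/XML (nodes carrying rdf:about, with properties that are either plain text or rdf:resource references), and the thread vocabulary and its http://example.org/thread# namespace are made up for illustration. The point is only that one function, unchanged, pulls statements out of both an RSS 1.0 fragment and a thread fragment.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDF_ABOUT = "{" + RDF_NS + "}about"
RDF_RESOURCE = "{" + RDF_NS + "}resource"


def extract_triples(xml_text):
    """Return (subject, predicate, object) tuples from a small RDF/XML subset.

    Assumption: subjects are elements with rdf:about, and each property is
    either literal text or an rdf:resource reference. Nested structures and
    rdf:type statements are ignored; a real RDF parser handles far more.
    """
    triples = []
    for node in ET.fromstring(xml_text).iter():
        subject = node.get(RDF_ABOUT)
        if subject is None:
            continue
        for prop in node:
            # ElementTree spells qualified names as "{namespace}local";
            # joining the two parts gives the predicate URI.
            tag = prop.tag
            predicate = tag[1:].replace("}", "", 1) if tag.startswith("{") else tag
            obj = prop.get(RDF_RESOURCE) or (prop.text or "").strip()
            if obj:
                triples.append((subject, predicate, obj))
    return triples


# An RSS 1.0 item (the RSS 1.0 namespace is the default namespace here).
RSS_DOC = """\
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns="http://purl.org/rss/1.0/">
  <item rdf:about="http://example.com/post/1">
    <title>First post</title>
    <link>http://example.com/post/1</link>
  </item>
</rdf:RDF>
"""

# A hypothetical thread vocabulary; element names are invented for this sketch.
THREAD_DOC = """\
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:t="http://example.org/thread#">
  <t:Point rdf:about="http://example.com/post/2">
    <t:respondsTo rdf:resource="http://example.com/post/1"/>
  </t:Point>
</rdf:RDF>
"""

if __name__ == "__main__":
    for doc in (RSS_DOC, THREAD_DOC):
        for triple in extract_triples(doc):
            print(triple)
```

Nothing in extract_triples knows about RSS or threads; the vocabulary lives entirely in the documents.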

Very powerful. Very handy. What’s been missing from XML since day one.

Best of all, through the use of “namespaces” – ways of identifying which elements belong to which vocabulary – I can combine different vocabularies in one document, and the namespace designation prevents element collision, where two elements with the same name come from two different vocabularies combined in one document.
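A small sketch of what that collision avoidance looks like in practice, using Python's standard XML library. Both vocabularies below define an element called title; the thread namespace (http://example.org/thread#) and the wrapping entry element are hypothetical, chosen only for the example.

```python
import xml.etree.ElementTree as ET

# Two vocabularies, each with its own "title" element, in one document.
DOC = """\
<entry xmlns:rss="http://purl.org/rss/1.0/"
       xmlns:thread="http://example.org/thread#">
  <rss:title>ThreadNeedle and RSS</rss:title>
  <thread:title>Why not an RSS module?</thread:title>
</entry>
"""

# Prefix-to-namespace map used for lookups; the prefixes are local shorthand,
# the namespace URIs are what actually identify each vocabulary.
NS = {
    "rss": "http://purl.org/rss/1.0/",
    "thread": "http://example.org/thread#",
}

root = ET.fromstring(DOC)

# Same local name, different namespaces, so the two never collide.
post_title = root.find("rss:title", NS).text
thread_title = root.find("thread:title", NS).text

# Internally each element is keyed by "{namespace}localname".
qualified_names = [child.tag for child in root]

if __name__ == "__main__":
    print(post_title)
    print(thread_title)
    print(qualified_names)
```

A tool that only understands the RSS vocabulary can look up rss:title and simply skip anything in a namespace it doesn't recognize.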

Within RSS, namespaces are being used to add “modules” to the RSS specification – new additions to the vocabulary that record information about new types of sites, such as WikiWeb. These modules are, in reality, new vocabularies that can stand alone, but are meant to be used with RSS. With this, the core RSS specification doesn’t need to be modified to meet new business requirements (e.g. aggregating information from WikiWeb sites).

Good stuff.

However, RSS has a specific business purpose – to aggregate information from various sources of information, including weblogs, and to allow subscription to same. The point of focus of RSS is a specific news source – a weblog or a WikiWeb or a web site (technically referred to as “channel” within RSS) – and vocabulary elements become adjectives of same.

ThreadNeedle has a different business purpose. For instance, its main entity of interest is the discussion thread, which transcends any one source or any one point on the dialog thread. In addition, the connectivity between thread points is critical information to capture – again, something that’s not important from a business requirement standpoint for RSS.
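To show the difference in shape, here is a rough sketch of a thread-centric model. Every name in it (ThreadPoint, DiscussionThread, responds_to and so on) is invented for illustration; this is not a ThreadNeedle design, just one way the two requirements above could look as data: the thread, not the weblog, is the top-level entity, and the links between points are stored as first-class information.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ThreadPoint:
    """One post in a discussion (hypothetical model)."""
    url: str     # permalink of the post
    source: str  # the weblog the post lives on


@dataclass
class DiscussionThread:
    """The thread is the main entity; it spans many sources."""
    points: dict = field(default_factory=dict)  # url -> ThreadPoint
    links: list = field(default_factory=list)   # (responder_url, target_url)

    def add_point(self, point, responds_to=None):
        """Record a post and, optionally, the post it responds to."""
        self.points[point.url] = point
        if responds_to is not None:
            self.links.append((point.url, responds_to))

    def sources(self):
        """All weblogs that take part in this thread."""
        return {p.source for p in self.points.values()}

    def responses_to(self, url):
        """URLs of posts that respond directly to the given post."""
        return [responder for responder, target in self.links if target == url]


# Example: one thread crossing three weblogs (all URLs are placeholders).
thread = DiscussionThread()
thread.add_point(ThreadPoint("http://alice.example/1", "alice.example"))
thread.add_point(ThreadPoint("http://bob.example/7", "bob.example"),
                 responds_to="http://alice.example/1")
thread.add_point(ThreadPoint("http://carol.example/3", "carol.example"),
                 responds_to="http://alice.example/1")

if __name__ == "__main__":
    print(sorted(thread.sources()))
    print(thread.responses_to("http://alice.example/1"))
```

In a channel-centric model each of those three posts would sit under its own weblog, and the links list, the part that matters most here, would have no natural home.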

Bottom line: trying to add blogthreading as a module to RSS would be the same as trying to use a banking database for an insurance company application. Yes, both are financial applications, and both support customers and have to meet certain levels of accountability (government, stockholders, and so on). However, at this point the similarity ends – the business models differ.

More information:

RSS 1.0 spec
W3C RDF
RDF Primer