Danny Ayers has started an interesting thread about a talk that Mårten Mickos, CEO of MySQL, gave at the Web 2.0 conference summit. According to Greg Linden, who was at the conference:
The idea is that "structured data should be open sourced", linked, and easily accessible. The idea is to do something like Google does for unstructured data (web documents) for structured data (database records).
This is not a new idea. People usually talk about this as querying heterogeneous distributed databases. The trick is matching up disparate data definitions and smoothing over bad data. And that is quite a trick.
As Danny and others mention, what Mickos is describing is a structured data format for interoperability (can we say 'RDF'?), together with a query method that works across heterogeneous data sources (can we say 'SPARQL'?). Ian Davis wrote:
This is where it became evident that there is a deep disconnect between the traditional database community and the semantic web community. Mårten’s response was rather vague, that this wasn’t as broad as the semantic web and that the semweb includes unstructured data so wasn’t appropriate.
What a shame and what a failure of the semantic web community if the CEO of MySQL AB cannot see how his vision for an interconnected web of data is the same as ours! We must try harder and demonstrate at all levels the value of the semantic web approach to people like Mårten. SWEO and SWIG will help, but the convincing arguments will come from the practical applications of the semantic web being developed to solve real world problems.
A big Amen, Ian.
Danny, in the interests of 'any data is better than no data', is willing to take Mickos' big mother MySQL database for a spin:
Ok, so how would you go about making a distributed RDBMS that might work on a global scale? Well you might want to start with keys that will work in such an environment. No need to invent any GUIDs, just reuse the web's ID field, URIs. What about table (relation) structure? There's obviously going to be a problem trying to create top down schema that could work in such a diverse environment as the world. So you need to break things down into a minimal form, i.e. binary relations, and allow them to be interconnected. How can you enable interlinking on such a scale? Identify the relations with URIs too. Keep going for 5 minutes and you've got RDF. You'd probably want a query language that worked against it too, and maybe even like it to look like SQL. Go on, call it SPARQL. Deploy these on HTTP (which is also based on URIs) and you've got a Web of Data, the Semantic Web.
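Danny's recipe can be sketched in a few lines of Python. This is a toy illustration, not how any real RDF store works internally: every fact is a binary relation (a triple) whose subject and predicate are URIs, and a query is a list of triple patterns in which terms starting with `?` are variables, loosely mimicking a SPARQL basic graph pattern. The `example.org` URIs and the `match` and `query` helpers are hypothetical; a real application would use an RDF library with a genuine SPARQL engine, such as rdflib for Python.

```python
from __future__ import annotations

# Toy sketch: URIs as keys, binary relations (triples) as the only "table",
# and a SPARQL-like pattern match on top. All URIs are hypothetical examples.
Triple = tuple[str, str, str]

EX = "http://example.org/"

triples: set[Triple] = {
    (EX + "people/danny", EX + "vocab/name", "Danny Ayers"),
    (EX + "people/danny", EX + "vocab/knows", EX + "people/ian"),
    (EX + "people/ian", EX + "vocab/name", "Ian Davis"),
}


def is_var(term: str) -> bool:
    """Terms starting with '?' are query variables, as in SPARQL."""
    return term.startswith("?")


def match(pattern: Triple, triple: Triple, binding: dict[str, str]) -> dict[str, str] | None:
    """Try to extend a variable binding so that the pattern matches the triple."""
    result = dict(binding)
    for p, t in zip(pattern, triple):
        if is_var(p):
            if p in result and result[p] != t:
                return None  # variable already bound to something else
            result[p] = t
        elif p != t:
            return None  # constant term does not match
    return result


def query(patterns: list[Triple], store: set[Triple]) -> list[dict[str, str]]:
    """Every pattern must match under one consistent set of variable bindings."""
    bindings: list[dict[str, str]] = [{}]
    for pattern in patterns:
        bindings = [
            extended
            for b in bindings
            for t in store
            if (extended := match(pattern, t, b)) is not None
        ]
    return bindings


# Hypothetical usage: the names of the people Danny knows.
rows = query(
    [
        (EX + "people/danny", EX + "vocab/knows", "?friend"),
        ("?friend", EX + "vocab/name", "?name"),
    ],
    triples,
)
print(rows)  # [{'?friend': 'http://example.org/people/ian', '?name': 'Ian Davis'}]
```

Joining on the shared `?friend` variable plays the role a foreign-key join plays in SQL, except that the "keys" are URIs anyone on the web can mint and link to.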
Hee. Sneaky semantic web people.
Danny only scratched the surface of the issues associated with a global relational data store. One major difference between the relational model and RDF is that the relational model requires agreement on the data definitions (the schema) before any data can be stored and mapped. RDF assumes that agreement will happen eventually, but isn't too worried about it, because any data is welcome and we can use the data we have now while we work things through.
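That difference can be illustrated with another small Python sketch, under stated assumptions: the two data sources and the second vocabulary URI (`http://example.org/vocab/fullName`) are hypothetical, and the mapping is a plain dictionary standing in for what RDF would express with statements such as `owl:equivalentProperty`. The two sources use different predicates for a person's name, yet merging them needs no prior agreement, and the merged data is usable immediately; once a mapping is agreed later, more of it becomes visible to the same query.

```python
from __future__ import annotations

Triple = tuple[str, str, str]

FOAF_NAME = "http://xmlns.com/foaf/0.1/name"
FULL_NAME = "http://example.org/vocab/fullName"  # hypothetical second vocabulary

# Two hypothetical sources that never agreed on how to say "name".
source_a: set[Triple] = {
    ("http://example.org/people/danny", FOAF_NAME, "Danny Ayers"),
}
source_b: set[Triple] = {
    ("http://example.net/staff/ian", FULL_NAME, "Ian Davis"),
}

# No schema negotiation: merging two sets of triples is just set union.
store = source_a | source_b


def names(store: set[Triple]) -> set[str]:
    """Return the object of every foaf:name triple in the store."""
    return {o for (_, p, o) in store if p == FOAF_NAME}


def apply_mapping(store: set[Triple], equivalent: dict[str, str]) -> set[Triple]:
    """Keep all triples and add a copy of each one whose predicate is mapped."""
    mapped = {(s, equivalent[p], o) for (s, p, o) in store if p in equivalent}
    return store | mapped


before = names(store)  # usable right away: only the foaf:name data matches
after = names(apply_mapping(store, {FULL_NAME: FOAF_NAME}))  # after agreement
print(sorted(before))  # ['Danny Ayers']
print(sorted(after))   # ['Danny Ayers', 'Ian Davis']
```

In a relational setting, the second source's rows would have to be transformed to fit the agreed schema before they could be loaded at all; here they sit in the store from the start and simply become more useful as mappings accumulate.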
