March 9th, 2007

A person can't be interested in meta stuff without having to weigh in on the newest Friday darling: Freebase.

What is Freebase? It ain't crack, unless you consider it a virtual form of something to blow up one's nose. According to the founder, Danny Hillis, “We’re trying to create the world’s database, with all of the world’s information…”

Tim O'Reilly is enthusiastic about it:

After all, is there really room for what at first glance appears to be a bastard child of wikipedia and the Open Directory Project, another site that purports to collect and organize all the world's information in one place?

But once you understand a bit about what metaweb is doing, you realize just how remarkable it is. Metaweb has slurped in the contents of several of the web's freely accessible databases, including much of wikipedia, and song tracks from musicbrainz. It then turns its users loose on not just adding more data items but making connections between them by filling out meta tags that categorize or otherwise connect the data items, using a typology that can be extended by users, wiki-style.

Though I won't repeat Nick Carr's bad word, I agree with him on what will be needed:

Of course, relying on a rag-tag band of volunteers, all afflicted with those nasty evolutionary bugs, brings its own problems, particularly in an effort that, unlike Wikipedia, requires a great deal of consistency and precision in terminology. Freebase's ability to attract and manage a human horde will be critical to its success.

How serendipitous, then, that this product release follows right in the footsteps of the recent Wikipedia scandal. One wants to place one's tongue firmly in cheek when hailing yet another centralized information source that relies, yet again, on masses of happy, responsible volunteers–none of whom have personal issues with needing to be seen as an authority, nor an obsessive need to 'own' the process via 16,000 or so edits.

Danny Ayers takes issue with Tim's mis-categorization of existing semantic web technologies:

I do have to take issue with the good Mr. O'Reilly on one point - his contrast between a bottom-up approach in freebase and the opposite in the W3C's approach is almost diametrically out…The Semantic Web languages were designed to support distributed, bottom-up development (although centralised and top-down are also possible where appropriate) - c.f. fractal ontologies. The "messy sprawl of potentially overlapping assertions" is a feature. On the other hand, freebase appears to be essentially a centralised database.

Michael Arrington had a promising title to his post, and then dropped the ball by joining in the vast euphoric gasp of delight:

Freebase looks to be what Google Base is not: open and useful. I imagine there will be more than one forehead self-smacked at Google HQ tomorrow, as they think “We could have done this.”

Actually, I imagine the folks at Google are laughing. It's the people at Yahoo that are wincing right now.

Returning to Tim's post, according to him, what sets Freebase apart from Wikipedia is that the data is structured:

Much as in wikipedia, any entry listing an entity not already known creates a new page. But these entries are structured — if I add a person, they are of type person. If I add a location, they are of type location. If I add a company, they are of type company. And each of these things comes with certain relationships, and that allows other entries to be automatically updated.

As it stands now with Wikipedia, if you muck up a page, you've mucked up the page. It's tedious but simple to fix the problem: you reverse out the edits. What happens, though, if adding a relationship propagates the error (or deliberate falsehood) to other pages? And then people, assuming such to be correct, add other relationships, tying in yet more entities? At what point can an error be backed out without serious consequences to a significant chunk of the system?
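To make the worry concrete, here is a minimal Python sketch. It says nothing about how Freebase actually stores or repairs data; the `Store` class, its method names, and the sample facts are all invented for illustration. It simply records which entries were added on the strength of which other entries, so you can count how much has to come down when one assertion turns out to be wrong.

```python
from collections import defaultdict, deque


class Store:
    """Toy fact store (hypothetical) that remembers which facts rely on which."""

    def __init__(self):
        self.facts = set()
        # fact -> facts that were added on the assumption it was true
        self.dependents = defaultdict(set)

    def add(self, fact, based_on=()):
        """Add a fact, optionally noting the existing facts it relies on."""
        for parent in based_on:
            if parent not in self.facts:
                raise KeyError(f"unknown supporting fact: {parent!r}")
        self.facts.add(fact)
        for parent in based_on:
            self.dependents[parent].add(fact)

    def blast_radius(self, fact):
        """Return every fact that directly or indirectly relies on `fact`."""
        seen = set()
        queue = deque([fact])
        while queue:
            current = queue.popleft()
            for child in self.dependents[current]:
                if child not in seen:
                    seen.add(child)
                    queue.append(child)
        return seen

    def retract(self, fact):
        """Remove a fact and everything built on top of it.

        Simplification: a dependent is removed even if it had other support.
        """
        doomed = self.blast_radius(fact) | {fact}
        self.facts -= doomed
        for gone in doomed:
            self.dependents.pop(gone, None)
        for children in self.dependents.values():
            children.difference_update(doomed)
        return doomed


# Made-up example: one mistaken assertion and what people built on it.
store = Store()
bad = ("Jane Doe", "founded", "Acme Corp")
store.add(bad)
listing = ("Acme Corp", "listed under", "companies founded by Jane Doe")
store.add(listing, based_on=[bad])
store.add(("Jane Doe", "type", "company founder"), based_on=[bad])
store.add(("Acme Corp", "sibling of", "Other Corp"), based_on=[listing])

print(len(store.blast_radius(bad)))  # 3
```

One edit, three other entries to unwind, and that is with the bookkeeping already in place. Without a record of what relied on what, there is nothing to walk, and the cleanup becomes guesswork.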

Going back to Tim's description of the application:

But hopefully, this narrative will give you a sense of what Metaweb is reaching for: a wikipedia like system for building the semantic web. But unlike the W3C approach to the semantic web, which starts with controlled ontologies, Metaweb adopts a folksonomy approach, in which people can add new categories (much like tags), in a messy sprawl of potentially overlapping assertions.

This paragraph has three misunderstandings and one downright contradiction.

The W3C's approach to the semantic web is to provide the functionality for people to define the relationships between data in an ever expanding universe of loosely related knowledge, using whatever vocabulary works. If none works, create your own. The whole point of having a model such as RDF is to provide flexibility, while still allowing a way of pulling all the bits together. There is no controlled ontology; there is only a rigorous model underlying the vocabularies so that the data can eventually be mapped together. (If wanted–trying to tie all the world's data together will reach a point of diminishing returns.)

In other words, MetaWeb could have used RDF to implement its functionality, and none of us would know and most people wouldn't care. There is nothing in what the W3C has proposed that's counter to anything MetaWeb hopes to achieve.
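For anyone who hasn't looked at the model itself, here is a rough Python sketch of the idea. It uses no RDF library and is not real RDF tooling: the two vocabularies, the `example.org` and `example.net` addresses, the `same_as` mapping, and the `objects_for` helper are all invented for illustration. The point is only that when everyone's data has the same subject-predicate-object shape, separately invented vocabularies can be merged first and mapped together later.

```python
# Two sites describe people using their own, made-up vocabularies.
SITE_A = "http://example.org/vocab-a#"
SITE_B = "http://example.net/vocab-b#"

triples_a = {
    ("http://example.org/people/1", SITE_A + "fullName", "Ada Example"),
}
triples_b = {
    ("http://example.net/staff/42", SITE_B + "name", "Bob Example"),
}

# Both data sets are (subject, predicate, object) triples, so merging is a set union.
merged = triples_a | triples_b

# A mapping, which anyone could publish, saying two predicates mean the same thing.
same_as = {SITE_B + "name": SITE_A + "fullName"}


def objects_for(triples, predicate, mapping):
    """Return the objects of every triple whose predicate is, or maps to, `predicate`."""
    return sorted(
        obj
        for _subject, pred, obj in triples
        if mapping.get(pred, pred) == predicate
    )


names = objects_for(merged, SITE_A + "fullName", same_as)
print(names)  # ['Ada Example', 'Bob Example']
```

Neither site had to agree on a vocabulary in advance, and neither had to hand its data to the other. In actual RDF the mapping would itself be published as data, for example with OWL's `equivalentProperty`, rather than tucked away in a Python dict.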

However, I have to assume that Tim misunderstood MetaWeb's plans for Freebase's functionality when he talks about any group of people being able to extend or adjust the ontology behind Freebase. Adding a tag is adding data–that's not the same as dynamically extending an ontology. Adding a new category is adding data, not modifying the structure. I find it highly unlikely that Freebase is as 'loose' as Tim seems to think it can be. Not without that charming 'messy sprawl' becoming an ugly ontology war to rival, and beat the drum on, any form of edit wars that happen with Wikipedia.

In addition, folksonomies exist now, but they do so independent of any one individual or organization. The whole point of Yahoo Pipes is that you can take a tag from Flickr and use it to match tagged items in Technorati, pulling both from categories in an RSS feed: Pipes providing the glue to pull together discrete web services in order to output a single desired result. It's based on understood web protocols, one specific type of vocabulary (syndication feeds), and the concept of tagged data. What MetaWeb is providing isn't anything new 'conceptually'. What is new is the single ownership.
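A minimal Python sketch of that kind of glue, under loud assumptions: this is not the Yahoo Pipes interface, it makes no calls to Flickr or Technorati, and the `match_by_tag` function and both hand-written feeds are stand-ins invented for illustration. All it assumes is that each service hands back items carrying a list of tags, as categories in a syndication feed would.

```python
def match_by_tag(tag, *feeds):
    """Collect items, in feed order, from any number of feeds that carry `tag`."""
    wanted = tag.lower()
    return [
        item
        for feed in feeds
        for item in feed
        if wanted in {t.lower() for t in item["tags"]}
    ]


# Hand-written stand-ins for items parsed out of two separate services' feeds.
photos = [
    {"title": "Gateway Arch at dusk", "tags": ["stlouis", "arch"]},
    {"title": "Squid on ice", "tags": ["squid"]},
]
posts = [
    {"title": "Weekend in St. Louis", "tags": ["StLouis", "travel"]},
    {"title": "RDF for the rest of us", "tags": ["rdf"]},
]

matches = match_by_tag("stlouis", photos, posts)
print([item["title"] for item in matches])
# ['Gateway Arch at dusk', 'Weekend in St. Louis']
```

The data stays where it lives, with whoever published it. The only thing shared is the convention of a feed with tags in it.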

Finally, there's this: …a wikipedia like system for building the semantic web…

You can't have a Wikipedia and a Web and have the same thing. The whole concept behind the web is separate pieces of data scattered all throughout the world. No one entity controls 'the web': its very essence is decentralized. That's the power, too, because no one person or group has a lock-hold on what is 'truth'.

Wikipedia is the opposite–it's a big living blob of data, kept in check, barely, by volunteers. Volunteers who edit. A disproportionately small group of users who edit a disproportionately large amount of Wikipedia. Sixteen thousand of those edits were made by a 24-year-old who lied about having a PhD in theology and used "Catholicism For Dummies" as backup for his decisions.

Now we're going to do the same, except that we're going to increase the number of dimensions on which actions such as these can occur, while we again tell the world that it need not go further than this site in order to find truth.

Frankly, that doesn't seem like the way to build the 'intelligent' web of the future to me.

Comments
1
Lists Etc - 4:47 pm 3/9/2007

Wow, what happened to all the different blogs that you had, Shelley? A Google search led to a URL that just redirected me here. It is strange that when everyone is splitting their one blog into several, based on topic, you seem to be merging them back into one (learnt that from your about page). This design looks simple and pleasant, but if you don't mind my saying it, it looks a little old - with the blocks and all that. I really shouldn't say it since my blog is not a pleasure to behold either :)

Good luck with the ongoing merge.
\me goes back to lurking…

2
Bud Gibson - 5:25 pm 3/9/2007

This may be off topic, but I guess my observation about Google base is that it is at least in part just another interface into their web index. Do a Google base query for reviews, and you'll see that they have effectively scraped "snippets" from a bunch of review web sites and are handing back the results to you.

So oddly, though many decry google base as closed, it's more web like than what's described for freebase.

3
Bessed - 10:32 pm 3/9/2007

Freebase.com…

Ten of the best sites about the Web site Freebase. Know of another site that should be listed here? Leave your suggestion at the bottom of this page.
1. Freebase - Web site aims to build "an open, shared database of the world's knowledge…

4
Anon - 12:11 am 3/10/2007

Can you not assign to innerHTML when serving pages as xml please, throws and blanks the page…

var oldBody = document.body.innerHTML;
document.body.innerHTML = "" + oldBody + "" + "" + "";

5

"What happens, though, if adding a relationship propagates the error (or deliberate falsehood) to other pages? And then people, assuming such to be correct, add other relationships, tying in yet more entities? At what point can an error be backed out without serious consequences to a significant chunk of the system?"

If these are serious questions, here's one serious answer - assumptive truth maintenance.

6
Shelley - 12:08 pm 3/11/2007

Serious questions. Want to expand on your answer? Not all my readers have taken advanced studies in AI. I sure haven't.
