Categories
RDF Semantics

The elephant strikes again

Danny Ayers and Bill de hÓra got into a bit of back and forth on RDF, and Bill’s response reminded me again of the blind wise men and the elephant. Each man describes a different creature when asked to describe the whole from just the part he touched: the one touching the leg assumes the creature is large and stumpy; the one touching the tail, that it is skinny and flexible; and so on.

Bill lists several areas where attention from the SemWeb community could increase overall adoption of RDF. In turn:

 

Better communication, algorithms, tools, etc. to map RDF into relational databases. How can we do so and increase efficiency? This one has actually happened, but perhaps not been as well marketed. I know that in the Java world, the Jena folks have put a great deal of thought and energy into efficient RDF storage optimized for SPARQL access (do we use SPARQL now?). As Danny stated, tests have been run on systems with millions and millions of stored triples, with relatively quick access to them using SPARQL: as quick as a relational query against joined relational tables.
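To make the relational mapping a little more concrete, here is a minimal sketch, assuming the simplest possible layout: one table with subject, predicate and object columns, plus indexes on the predicate. Jena's real storage schemas are more elaborate than this; the table, the `dc:` property strings and the sample posts are hypothetical. The point is only that a SPARQL pattern whose two triple patterns share a variable becomes a self-join on that shared column.

```python
import sqlite3


def make_store():
    """Create an in-memory store holding one row per RDF statement."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE triples (s TEXT, p TEXT, o TEXT)")
    # Indexes so that triple patterns with a bound predicate are cheap to match.
    conn.execute("CREATE INDEX idx_po ON triples (p, o)")
    conn.execute("CREATE INDEX idx_ps ON triples (p, s)")
    return conn


def add(conn, s, p, o):
    """Store a single (subject, predicate, object) statement."""
    conn.execute("INSERT INTO triples VALUES (?, ?, ?)", (s, p, o))


def titles_by_creator(conn, creator):
    """SQL version of the SPARQL pattern
         SELECT ?title WHERE { ?post dc:creator ?who . ?post dc:title ?title }
    with ?who bound to `creator`. The shared ?post becomes a join on column s."""
    rows = conn.execute(
        """SELECT t2.o
             FROM triples AS t1
             JOIN triples AS t2 ON t1.s = t2.s
            WHERE t1.p = 'dc:creator' AND t1.o = ?
              AND t2.p = 'dc:title'
            ORDER BY t2.o""",
        (creator,),
    )
    return [row[0] for row in rows]


if __name__ == "__main__":
    store = make_store()
    add(store, "post:1", "dc:creator", "Shelley")
    add(store, "post:1", "dc:title", "The elephant strikes again")
    add(store, "post:2", "dc:creator", "Danny")
    add(store, "post:2", "dc:title", "Another post")
    print(titles_by_creator(store, "Shelley"))  # ['The elephant strikes again']
```

Every extra triple pattern in a query adds another self-join, which is why indexing and clever table layouts matter so much once a store holds millions of triples.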

I think this is one that has to go back to Bill: exactly what language/API or data store solution did you use that resulted in such slow performance?

Bill’s issues with integration were, at least to me, a bit confusing. I don’t think one can say to companies, “use RDF thus and thus and you’ll save so much money,” because one can’t say the same about the relational model, the network model, or even the old hierarchical model. However, give us a specific task, and we might be able to point out where RDF can be used effectively. Or perhaps we’ll say no, a relational model is better for that purpose. We might even recommend hierarchical: no one model is perfect for every use.

Where Bill seems to be focused is on data interchange: moving data from tool to tool. Now that is one I am very familiar with, and it is one that RDF can manage as effectively as, or more effectively than, a specifically defined XML vocabulary.

Think of a hub-and-spoke system, whereby heterogeneous applications all access data from a common core, which acts as a buffer to prevent hard-coded dependencies between these applications. One can define an XML format to be used between each application and the buffer, but this means that each application has to produce XML in whatever format the buffer requires. Commercial applications won’t do this, so the company has to create a mapping: a layer between the application’s API and the system core, which adds to the overall complexity of the system.

The company’s developers have to do this not because the business models of the core and the application might differ to some degree, but because XML is physically dependent: the mapping has to be just so before the data can even be processed, let alone merged. In other words, there’s a pure tech dependency on the structure of the XML that transcends even the business model.

RDF eliminates this. A statement is a statement is a statement. It removes positional dependencies, much as the relational model removed them when it was adopted over the older network and hierarchical models. If all the spokes ‘spoke’ RDF (sorry, couldn’t resist), then all the hub/core would need to do is map each to the other, and this doesn’t even require any additional coding if the business mapping is defined using OWL. To add a new application that produces an RDF version of its data export or service response, all that would be required is to add rules, in OWL, to the core’s rules database.
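A rough sketch of the hub idea, under heavy simplifying assumptions: each spoke exports plain (subject, predicate, object) statements, and the hub rewrites predicates using a table of equivalences, in the spirit of OWL's `owl:equivalentProperty`. This is not an OWL reasoner (a real one treats equivalence as symmetric and handles classes, inverses and much more), and the `crm:`, `billing:` and `hub:` vocabularies and data are hypothetical.

```python
# Hypothetical equivalence rules: spoke vocabulary -> hub vocabulary.
# Loosely modeled on owl:equivalentProperty, applied in one direction only.
EQUIVALENT_PROPERTY = {
    "crm:customerName": "hub:name",
    "crm:phone": "hub:telephone",
    "billing:acctHolder": "hub:name",
    "billing:contactPhone": "hub:telephone",
}


def to_hub(triples, rules=EQUIVALENT_PROPERTY):
    """Rewrite each statement's predicate into the hub vocabulary.

    Predicates without a rule pass through unchanged. The result is a set,
    so statement order is irrelevant and duplicate statements collapse.
    """
    return {(s, rules.get(p, p), o) for s, p, o in triples}


def merge(*spokes, rules=EQUIVALENT_PROPERTY):
    """Merge the exports of any number of spokes into one hub graph."""
    graph = set()
    for triples in spokes:
        graph |= to_hub(triples, rules)
    return graph


if __name__ == "__main__":
    crm = [("cust:42", "crm:phone", "555-0100"),
           ("cust:42", "crm:customerName", "Acme")]
    billing = [("cust:42", "billing:acctHolder", "Acme")]
    for statement in sorted(merge(crm, billing)):
        print(statement)
```

Running it prints two statements, because the CRM and billing records of the customer's name become the same hub statement. Adding a new spoke means adding entries to the rule table, not writing new parsing code, which is the property the paragraph above is after.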

You can’t get there with relational databases. I know, I worked with EDI (Electronic Data Interchange) for years. You also can’t get there with XML because there is no rules mechanism inherent in XML. Of course not: why over-complicate a nice, clean syntactic format? XML wasn’t meant to be a rules database. Relational wasn’t meant to be a rules database (but does very nicely for pre-defined, isolated business model use). RDF, on the other hand, eats rules for lunch.

Think of Scotty: Use the right tool for the job.

Moving on to Bill’s list, his big items are ORM (Object-Relational Mapping) and widgets, and it’s true, I haven’t worked with an object-relational style mapping to RDF, or with widgets that map forms directly to RDF. As such, I can’t respond to this one, other than to say that if people have created these, I’d like to see them myself.

I think the bigger issue, though, is that Bill demonstrates the wise men and the elephant conundrum: Bill has certain items he considers important, including one that I wouldn’t necessarily consider important. He sees ORM as bigger than syndication feeds, yet for all my tweaking of syndication feed folks, I consider feeds probably one of the most significant advances in technology in years. Why? Because feeds have had a major impact on relatively standard behavior: how we get access to current news.

That’s the deal: Bill is thinking purely as a developer working with specific problems, and wants to see how RDF can help him. If we continue to get caught up in specific tech-to-tech comparisons, RDF will continually be put in a losing position: compared against specific uses of technology for which it really wasn’t designed, and, when it comes up short (which of course it will), discounted in such a way that it’s not considered for use where it would be effective.

What the RDF community needs to do is go beyond just the technology-to-technology mapping and look at the overall picture: where does the semantic web community want to go, and what piece of this best fits RDF?

Sheila Lennon writes about John Markoff’s Times article on common sense and Web 3.0, and gives her interpretation of what she sees as the semantic web:

Real world: When restaurants are online in realtime (yet to happen), my computer could display Providence restaurants serving cordon bleu tonight at what prices, ask me to choose one (around when?), then make a reservation, reserve a portion of chicken cordon bleu for me, and notify the restaurant’s computer if I’m hung up in traffic. It will not think about chicken cordon bleu. Its mouth will not water.

And — being my agent — it will not suggest I’ve had enough calories already today and should have salad instead.

Spot on. Sheila is looking at the elephant, not the trunk, tail, and so on.

My mistake, when I first became interested in RDF and the semantic web, was that I wanted a web of meaning. Perhaps something like this will happen in the future, but I doubt it, because from my earlier attempts at getting others involved I found out something important: what I wanted was not what most people wanted. What most people want is what Sheila is describing: systems that work together seamlessly; that integrate immediately; that help us do something we couldn’t do before.

However, they don’t want these systems to do what we can do well, which is have an opinion or differentiate based on nebulous qualities such as taste and desire. We do this very well ourselves, thank you. What we need from computers is to help us sift through the raw stuff and that’s where the semantic web enters the picture.

Ultimately the semantic web consists of data stores in relational databases, hierarchical data structures, RDF, and even tags embedded into a page: anything that gets us closer to realtime data access (“What’s cooking tonight?”), ‘smart’ agents, and the restaurant’s cooking schedule integrating with our portable GPS devices (given a rule that a location in such-and-such a zone is so many minutes from the restaurant). No one data model will work for all of this, which is why we need to carve out the piece that best belongs to RDF, and stop trying to force it into being a relational model replacement or a microformat killer.

(Woman on camel comes upon man in the desert, crawling on his hands and knees, faintly crying out, “Water, water.” She grabs her bladder filled with pomegranate juice and jumps down, holding the opening to his lips. The man tastes the juice and pushes her away. “I asked for water,” he says.)

Which leads us to the last of Bill’s issues I’m going to touch on: tags and other microformats. I have been one of those who scoffed at microformats, and I still do. I can see the value of tags, but whether you call them ‘keyword’ associations or categories, they’re a way of adding a description to an item, and not necessarily anything more complex. As such, they have value, but only to a point: in many social software tools such as Flickr, people use them as mnemonics more for themselves than for others. Their value to a global whole is directly proportional to the amount of time a person puts into the effort, and most people don’t care that much.

Where tags work well is when a group of people get together and mutually decide on one tag to differentiate the association, such as the tags for a conference. People use the agreed on tag because it’s simple (the tag has already been provided), and they can immediately see the usefulness of such (querying on the tag returns items associated with the event).

Thus the same requirements apply to metadata whether it is expressed in microformats or in RDF: we have to make the effort to make defining it easy, and we have to demonstrate its value.

Now, one can say microformats make things easier, but I have a real problem with shoving everything under the sun into one single HTML attribute: class. The class attribute is defined to hold values for stylesheet settings or ‘whatever other use devised by user agents’. What we’ve done, as end users, is shove both microformat AND Ajax information into this attribute, and frankly, I don’t know how much this affects, and will continue to affect, these ‘user agents’.

For instance, if I have eight different terms within a single class attribute, all properly separated from each other, does Firefox look at each of these, and then spend time reviewing the stylesheet settings to see if any match? If so, and this use of class is scattered all over a biggish page, and the CSS stylesheets are quite large, this strikes me as something that could affect page load times, perhaps significantly.

We are misusing the class attribute, and that’s my biggest pushback against microformats. The same goes for the use of the class attribute with Ajax.

Now, generating an RDF output from metadata associated with a page, and then serving this up when you tack ‘/rdf/’ (or ‘/meta/’) to the end of the page URL strikes me as a superior method of providing metadata to smart agents. The browsers don’t have to deal with microformat creators and RDF/XHTML mappers converting the HTML class attribute into a CLOB, and the resulting metadata doesn’t have to be limited to whatever layout elements are provided through valid uses of HTML. I’ve never had anyone give me a reason why they feel this is wrong.

(Declarative HTML for Ajax developers is another thing, and best left for another post.)

If the pages are dynamically generated, then the metadata page can be dynamically generated. If the pages are static, when they’re created, the metadata page can be created. And if someone comes along and says something about the page being created by hand, I’m going to come back with, “Oh yeah?”
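Here is a minimal sketch of the ‘/rdf/’ approach for a dynamically generated site, assuming each post's metadata lives in a dictionary keyed by the post's path. The RDF and Dublin Core namespace URIs are the real ones; the `posts` data, the base URL and the function names are hypothetical, and a real weblog tool would wire this into its own request handling.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("rdf", RDF_NS)
ET.register_namespace("dc", DC_NS)

RDF_SUFFIX = "rdf/"


def post_metadata_rdf(post_url, meta):
    """Serialize one post's metadata (Dublin Core names -> text) as RDF/XML."""
    root = ET.Element(f"{{{RDF_NS}}}RDF")
    desc = ET.SubElement(root, f"{{{RDF_NS}}}Description",
                         {f"{{{RDF_NS}}}about": post_url})
    for name, value in meta.items():
        ET.SubElement(desc, f"{{{DC_NS}}}{name}").text = value
    return ET.tostring(root, encoding="unicode")


def metadata_response(path, posts, base_url):
    """Return RDF/XML if path is a post path with 'rdf/' appended, else None."""
    if not path.endswith("/" + RDF_SUFFIX):
        return None
    post_path = path[:-len(RDF_SUFFIX)]   # '/archives/x/rdf/' -> '/archives/x/'
    meta = posts.get(post_path)
    if meta is None:
        return None                       # unknown post: caller can send a 404
    return post_metadata_rdf(base_url + post_path, meta)


if __name__ == "__main__":
    posts = {"/archives/elephant/": {"title": "The elephant strikes again",
                                     "creator": "Shelley",
                                     "subject": "RDF"}}
    print(metadata_response("/archives/elephant/rdf/", posts, "http://example.com"))
```

For static pages the same `post_metadata_rdf` call could simply run once at publish time and write the result to an `rdf/` file beside the page.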

One last issue that Bill had in his list was the syntax of RDF/XML. We get hung up on this so frequently that I think it would be a good idea for the W3C to revisit it. After all, if the organization is revisiting HTML in the interest of moving forward, there is that much more reason to revisit RDF/XML.

Categories
RDF

Zoe says read this or Scoble gets it

Zoë, via her goduncle Danny Ayers, sent me an email telling me I should write about an excellent Semantic Web Tutorial by Ivan Herman.

I told her, well I told Danny to tell her, that I had already written the Bad Words (“Semantic Web”) once today, and that I may end up banned from *Scoble’s RSS feed aggregator for this. She, through Danny, said no problem: including a cat picture would make it all okay.

So here I am, pointing you to probably the most in-depth and comprehensive tutorial on RDF I have seen (not to mention a fun use of Ajaxian-like technology in presenting it).

Here, also, is the cat picture so that Scoble won’t ban me from his RSS feed aggregator.

* “I mean it, I really mean it this time. If you don’t provide full feeds I’m going to stop reading you! I know I’ve said this 73 times before, but this time I’m serious! I’m re-a-a-a-ly serious. Here I go…I’m going to unsubscribe you…there you go…you’re gone…no more billions of readers because I stopped reading you! No one knows who you are, now. Who are you? Nobody, because I’m not reading you!”

Categories
Semantics

I am an evil woman

I don’t work for Google, therefore I am exempt from the pledge of “Do no evil” and do evil. Daily if I can.

I don’t smoke, fool around with other people’s husbands (or wives, dogs, or horses for that matter); nor do I do drugs, and I drink in moderation. I don’t run old people down in the streets, nor steal candy from children. I pay taxes, stop at red lights, and rarely go over the speed limit. There isn’t much of a chance for evil doing in my day to day living, so I have to exercise my evil doings online. Luckily, there’s much opportunity for evil doings online.

Take my metadata interest. I’ve been a metadata pusher for years now, even before Google made its noble sounding pronouncement. The only thing is, I’ve been able to quietly go about my evil doing because no one knew it for what it is. Yesterday, though, Greg Yardley recognized what I, and others, have been doing and has sounded a clarion call of warning:

“Profiting off user-generated content is Web 2.0 colonialism.”* That sums up how I feel about the much-praised (and widely backed) Structured Blogging initiative, which makes it easy for bloggers to use microformats to mark-up specific genres of blog posts – reviews, classified listings, and so on. Microformats make blog posts machine-readable, which in turn allow them to be used by applications. Jeff Clavier sees Structured Blogging “eventually pushing blogging into richer types of applications – and enabling new types of aggregation.” Indeed – if adopted, it will. Which is what irks me.

What irks Mr. Yardley? The fact that providing metadata will enable organizations to profit from the metadata. More, to do so without his being recompensed:

But I really don’t want to be placed in a position where I get nothing for my small part in someone else’s eight-digit payday. I don’t want to come across like too much of a tool, but if I’m going to structure my content, I need better ways to control its commercial use.

And thus the evil efforts of people like me are exposed. I lay before you now: a cyber thief; a stealer of data; no, a pusher if you will, trying to lure you all into the power of Meta.

Pssst. You wanna buy a dream? This is a class A dream.

I dunno. I don’t have any money.

You can’t buy this dream with money.

Well then, what do you want for your dream?

I want your metadata.

My metadata?

Yes, when you publish online, just insert this subliminal message into your page and you’ll have bought a piece of the dream.

But that means I’ll have to do a little extra work.

Yes, but isn’t it worth it, for dreams?

Dreams of what?

The Semantic Web

*gasp of horror, sound of footsteps running away*

There I am, being evil again. As Stow(e) Boyd writes, this is a kinder, gentler blogosphere, and my response to Mr. Yardley is neither kind nor particularly gentle, especially when you consider that he expresses concerns that others share. So let me put aside my essential evilness for a moment, and see if I can’t get in touch with my inner Mr. Rogers.

I notice that Greg and Stowe both have Technorati Tags at the bottom of their posts. Stowe must do so because he believes in the messy semantic web; Greg specifically mentions why he does, and that’s because Technorati gives him traffic. He also considers that the traffic from Google compensates him for Google exploiting his published material.

Oddly enough, both instances of traffic generation result from semantic web activity, though neither is particularly precise. For instance, I’m sure that Greg has received many visitors from Google through accidental search results, where a happenstance convergence of words from many posts meets some person’s odd (or not) search request. I imagine, also, that he’s received visitors from Technorati for publishing content under the tag name of “General”. But then, haven’t we all?

Is it a case, then, that when we get traffic in error, we should charge both Google and Technorati for using our bandwidth? Or is the important aspect of the exchange the traffic, regardless of accuracy?

You see, that’s where my evilness truly reaches inspired heights: I want to lessen the traffic that both Stowe and Greg get. Yes, I confess–this is my ultimate goal: to steal hits from webloggers.

By attaching more precise and detailed metadata to their posts, and by convincing search engines to become less enamored of their algorithms (or their horribly misbegotten ideas of centralized metadata stores), I hope to decrease the accidental traffic that both Stowe and Greg get.

But, you say, doesn’t this mean that ultimately Stowe and Greg will get visits that are based on true interest in a specific topic? And couldn’t they, in the end, actually get more traffic because of increased exposure of the true meaning of what it is they are writing? After all, if Stowe writes on an event and marks it with microformats or structured blogging or even RDF, and if Google or Yahoo or MSN eventually catch on and grab this information, when a person enters a request for information on the event into a search engine, wouldn’t Stowe’s entry pop up? Now, this match-up occurs only if the person’s search request happens to match the words that Stowe uses, and Stowe’s page rank is high enough to push down other entries that may, or may not, also be about the event.

True, I say.

But then, you say, isn’t this a good thing?

True, I say.

But then, you ponder, where is the evil?

Ah, I reply, with a smile that exposes far more teeth than is normal: Google and Yahoo and MSN and other companies that aggregate this data make money from the results. And, I smirk, we all know that money is the root of all evil. After all, only the homeless are true saints.

But, but, but, you sputter–they make money now, and for a lot less accuracy!

That’s the kicker, I cackle gleefully! Because now the search results are authentic! Authentic is good, I cry, and I only do evil!

So, you murmur, ears twitching, eyebrows furrowed, you’re exchanging authentic for accurate, and by doing this, you’re turning good into evil?

Precisely!

You look perplexed, you look confused and then you say: I’m sorry, but I don’t see the evil in what you’re doing.

I lose my smile, my shoulders slump, my butt droops, and my cat cries. Saddened by your loss of comprehension at my Plan, I can only shake my head and wonder how a woman can continue to do evil when those around her just don’t get it.

Categories
Semantics

The meta wars

For all that people are saying 2006 is going to be the year of this or that, I think that 2006 is going to be the year of metadata, and as such, we’re about to see some of the bloodiest battles in blogging. She who controls the metadata rules the world, and if the sly hints and naysaying I’m reading online are but the tip of the iceberg, what isn’t visible would make the US Democrats and Republicans blanch and give fervent thanks that though they may be politicians, at least they aren’t, thank (God | politically correct non-sectarian object of choice), in the metadata business.

The Structured Blogging Initiative made its announcement yesterday, with a rollout of Structured Blogging plugins for WordPress and Movable Type. I’ve been playing around with these in order to create OutputThis, and you can see my test weblogs based in WordPress (and here), Blogger/Blogspot, and Movable Type. I installed the WordPress plugin in the Testing 2 weblog, and have been playing around with the different SB types, such as reviews, lists, and so on.

First a disclaimer: as of this morning I no longer work for Broadband Mechanics. I will be working on OutputThis, adding new functionality and making any fixes to make it a true 1.0 production system; however, I am doing so as a volunteer.

To reassure folks, I am not going to starve by making this move, and no, there is no acrimonious relationship between me and the Broadband folks. But I did find myself constrained in what I wanted to write to Burningbird, what I felt should be written; worried that because of my relationship with Broadband, I could be hurting that company with what I wrote. Now, though I won’t divulge any confidences I received during my tenure with the company, I feel anything that’s out in the ‘public domain’ so to speak, is fair game.

The plugins that Phil, Kimbro, Marc, and Chad provided are some fairly sophisticated bits of coding, and add a rather impressive set of editing interfaces to Movable Type and WordPress. I thought the use of XML templates, or Micro Content Descriptors (MCDs), to drop in a new plugin interface was both open and clever. In addition, the code is open source (GPL), and can even be incorporated into other tools by pulling out the bits and pieces you want.

I’ve long thought of extending my own RDF/XML metadata generation through the use of templates that can be used to generate the content. Though we differ in how we provide the metadata (my system serves it as pure RDF/XML when you attach ‘/rdf/’ to any of my posts, while SB embeds it), this approach of providing format descriptions is very adaptable.

Will I alter the SB WordPress plugin to work with Wordform (my fork of WordPress)? No, but that’s because I’ve chosen a different direction in how I work with metadata. In the end, with the help of Danny Ayers, RDF/XML can be pulled from the SB effort, which means none of our stuff is incompatible.

As for the criticisms, all were valid, but there are a couple I want to address specifically.

Niall Kennedy mentioned during the presentation that the generated XML/XHTML/RSS didn’t validate. Good point, comparable to all those folks who said that Technorati’s performance sucked, and the results were unreliable. At the time of highest criticism of Technorati, Dave Sifry said, “We will fix it”. Yesterday, Marc Canter and the SB team responded with, “We will fix it.” Isn’t it nice to know both organizations are willing to acknowledge user concern about application problems with a willingness to repair them? Compare this with Google, whose only response to user exclamations of, “It’s broke!” is .

(To hear or see the response, you have to wear magic Google filters in order to pull it out of the aether. I’m thinking of selling mine on eBay, but then you’d have to have a filter to see the filter offered for sale. It’s very tough to make a buck nowadays. What’s a girl to do to earn money for the holidays?)

Stowe Boyd wrote:

My bet is that Structured Blogging will fail, not because people wouldn’t like some of the consequences — such as an easy way to compare blog posts about concrete things like record reviews, and so on — but because of the inherent, and wonderful messiness of the world of blogging…

I am not sure who is benefited if everyone falling into line and adopting consistent standards for the structure of blog posts. Perhaps companies like PubSub — one of the driving force behind all this — who would like to be able to sort out all the blog posts about hotels, gadgets, and wine out there, and aggregate the results in some algorithmic fashion, and then make money from the resulting ratings and reviews. But I am not sure that it would be a better world for bloggers, or even blog readers.

So I favor the microformat approach, which is messy, puts more of a burden on the blogger, and will require a host of tools to be built to make it all work. But microformats will work bottom-up — tiny little tagged bits of information buried in the blog posts — as opposed to structurally. And I am betting — as always — on bottom-up.

My first reaction was to say that Stow Boyd wouldn’t be able to find a leafy, green vegetable in a field of lettuce, but that wouldn’t be civil and god knows, we all need to be civil.

So instead what I’ll say is that microformats, which are adding tags to existing elements such as links, and Structured Blogging are not an either/or; same as neither is incompatible with my own RDF efforts. All efforts are bottom up; all efforts are top down; all support a semantic web because at some point, someone has to make a decision to attach a bit of metadata to a chunk of web space. How you do so is irrelevant.

I can easily create SB structured content from microformatted data, and generate microformatted data from SB content, and RDF/XML from both. Piece. Of. Cake.
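As a rough illustration of one of those conversions, microformatted markup to RDF-style statements, here is a sketch using Python's standard `html.parser`. The class names are loosely modeled on hReview, but the property list, the `review:` prefix and the sample markup are hypothetical, and the parser assumes simple, well-formed markup (no void elements such as `<br>` written without a closing slash). It also shows the class attribute problem from the first post: styling names and microformat names share one whitespace-separated list, and the extractor has to filter one from the other.

```python
from html.parser import HTMLParser

# Hypothetical property names, loosely modeled on hReview.
REVIEW_PROPS = {"summary", "rating", "reviewer", "description"}


class ClassTokenExtractor(HTMLParser):
    """Collect text from elements whose class list contains a known property.

    Only the innermost open element is consulted for each run of text.
    """

    def __init__(self, props):
        super().__init__()
        self.props = props
        self.stack = []   # property tokens of each currently open element
        self.found = {}   # property name -> collected text

    def handle_starttag(self, tag, attrs):
        classes = dict(attrs).get("class") or ""
        # class is a whitespace-separated list: styling hooks, Ajax hooks and
        # microformat names all share it, so keep only the known properties.
        self.stack.append(set(classes.split()) & self.props)

    def handle_endtag(self, tag):
        if self.stack:
            self.stack.pop()

    def handle_data(self, data):
        if self.stack and data.strip():
            for prop in self.stack[-1]:
                self.found[prop] = self.found.get(prop, "") + data.strip()


def to_triples(subject, props):
    """Turn extracted properties into sorted (subject, predicate, object) tuples."""
    return sorted((subject, "review:" + name, value) for name, value in props.items())


if __name__ == "__main__":
    html_text = ('<div class="hreview"><span class="summary big-red">Great pizza</span>'
                 ' rated <span class="rating">4</span>'
                 ' by <span class="reviewer vcard">Shelley</span></div>')
    parser = ClassTokenExtractor(REVIEW_PROPS)
    parser.feed(html_text)
    parser.close()
    for triple in to_triples("post:7", parser.found):
        print(triple)
```

Going the other way (statements back into marked-up HTML, or into RDF/XML) is a matter of serialization once the statements exist.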

As for Boyd’s rather unsubtle dig at so-called hidden agendas, and why is PubSub doing this, et al.: one might as well ask why Technorati just started Explore, if not to bite itself a piece of that richly tasting, and potentially fruity, semantic web pie.

Four years ago, the name of the game was weblogging; three years ago it was syndication; last year was search engines and this year, podcasting; next year it will be metadata. Companies will fuss and fidget and claim to be first or best, or that they’re only operating in the best interests of us (with an implication that other companies are not). We know better, but we don’t mind because the more dirt they dish up on each other, the more flowers the rest of us can plant.

Categories
Semantics Travel

Google base II

Made it to the airport, despite moose in the road.

First time at an airport since 9/11. Had to unload each laptop into its own tray, and my shoes, my coat, and my camera into separate trays (not to mention my two bags). But security was very nice and helpful. And hey! Wireless everywhere!

I’ve been reading some of the more positive reactions to Google Base, such as those from Michael Parekh and Burnham’s Beat. Burnham writes:

As for RSS, Google Base represents a kind of Confirmation. With Google’s endorsement, RSS has now graduated from a rather obscure content syndication standard to the exalted status of the web’s default standard for data integration.

First of all, Base supports uploads in RSS 1.0, RSS 2.0, and Atom, not just RSS 2.0. Regardless, saying that RSS is some form of default standard for data integration is the same as saying that we can have any data we want — as long as it fits into a primitive single level hierarchy and can be defined with a few simple attributes. Sure, go ahead: build a data empire on that. When you’re done, I have a nice 25 million row Access database to sell you.

He also writes:

In addition, it should not be lost on people that once Google assimilates all of these disparate feeds, it can combine them and then republish them in whatever fashion it wishes.

That’s true — so do think about this, because you may or may not like how Google takes your data and ‘morphs’ it. And if you decide to host content in the space that Google provides? Note that doing so turns over royalty free/copyright free access to whatever it is you upload.

Oh, but I can hear the little soldiers now: Sharing is good! All right-thinking people share! I don’t have time to point to it, but you might remember the lesson that the Corante Between Lawyers learned when ‘sharing’ isn’t completely defined.

Parekh waxes ecstatic on how Base is going to allow Google to effectively wipe the floor with any and all big companies online:

This makes Google Base kind of the elephant being described by blind-folded folks:
1. “It’s Online Classifieds” and will go after Craigslist.
2. “It’s Online shopping” and will go after eBay and Amazon.
4. “It’s an Online repository for photos, music and videos” and will go after Flickr, iTunes and others.
5. “It’s a way to tag content” and will go after del.icio.us and others.
6. “It’s a way to to put resumes online” and will go after Monster, Indeed and others.
7. “It’s a way to do online photos, music, videos, etc.” and will go after Flickr, iTunes, and others.
8. “It’s a way to back into online databases, potentially word processors and spreadsheets”, and so go after Microsoft.

And so on. The answer is it can be all of those things. And none of them.

And as a bonus for Google, it takes some wind from the sails of all these potential competitors, Web 2.0 or not.

I would beg to differ that this can …be all those things. Even if by some stretch and perversion of RSS we can squish all these things into a syndication feed format (remember syndication feeds?), to define a technology in terms of companies squashed shows an alarming corruption of technology, where tech is now valued based on market share rather than any form of good use or design or even interest.

Regardless, every time I see the glow of gold in the eyes of folks, there’s this little devil that pops up and says, “Eh, time to go to work, Shelley”.

Google Base is centralized. No amount of ‘Google desktop’ integration will change the fact that the Google imprint exists on any and all of this metadata. If Base folds, so does your data. This is the wrong approach to take.

Even if we can store our metadata locally and upload to Base, trying to shove all the world in a little bitty syndication feed box shows that we’re not even interested in stretching ourselves into a world of really rich data. We’re willing to settle for tags, more tags, and maybe a title or two. Is this what we see for the bright new world of the future of the web?

Where’s the hunger in folks? Is being able to ‘monetize’ a technology all that matters anymore?

Bah.

I think Google Base is a fun experiment, and I’m willing to play a little. It will be interesting to see the directory, especially if the company provides web services that aren’t limited to so many queries a day. But I never forget that Google is in the business to make a profit. If we give it the power, it will become the Wal-Mart of the waves–by default if not by design. Is that what you all want? If it is, just continue getting all misty eyed, because you’ll need blurred vision not to see what should be right in front of you.

See what moose do to me? Nothing like a good scare at 3 in the morning to get the creative juices going. See you all in St. Louis.