Categories
RDF

Ignore the fact that it’s working

Uche Ogbuji is another voice raised in the “RDF is too hard, make it more simple” crew that seems to have reached a crisis all at the same time. Perhaps it’s the moon. Maybe it’s the water.

Uche wrote:

I get the feeling that in trying to achieve the ontological purity needed for the Semantic Web, it’s starting to leave the desperate hacker behind. I used to be confident I could instruct people on almost all of RDF’s core model in an hour. I’m no longer so confident, and the reality is that any technology that takes longer than that to encompass is doomed to failure on the Web.

Well damn, there goes my use of MySQL. PHP, too. I’m also working with REST and SOAP. Then there are syndication feeds–if anyone thinks you can talk about ‘syndication feed’ in less than an hour, you don’t know the people associated with RSS, RDF/RSS, or Atom.

Uche also mentions microformats, but as he’s found, these are anchored to whatever structure is used within a web page, and that’s not encompassing enough for all metadata needs. He then goes on to say that he’ll stick with RDF for now, hoping to be able to do what he needs to do without the more esoteric elements getting in the way.

Getting in the way. Hmmm. Well, let’s see:

Mozilla/Firefox has been quietly using RDF for much of its underlying menu structure and other uses for six years or so now.

RDF/RSS, known as RSS 1.0, has been providing syndication feeds for years.

FOAF is used to drive networking in various environments.

Isn’t there a music site that outputs its data in RDF? I know the government is heavily into it, but that’s not necessarily a recommendation.

As for my own work, I update the metadata in my photographs in Photoshop, and that metadata provides information such as name and description to Flickr when I upload the pictures. When I embed a photo in this page, I create a data store of RDF information for the photos, either by accessing this data directly from the photo or by getting it from web service calls to Flickr. This includes translating the EXIF data into RDF/XML format. I then make all of this accessible just by attaching /rdf/ to any post. This is used to drive the Tinfoil Project and the photo page. In the photo page, I also reference the Google Maps API to use the geotagging included in the RDF to pinpoint on a map where the photo was taken. I also use RDF to manage my syndication feed, and to provide references and pointers to other externally associated web pages.
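To make the EXIF-to-RDF/XML step concrete, here is a minimal sketch in Python. It is not the code that runs this site: the function name, the example URI, and the choice of the Dublin Core and W3C EXIF vocabularies are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC_NS = "http://purl.org/dc/elements/1.1/"
# Assumption: the W3C EXIF vocabulary is used for camera data.
EXIF_NS = "http://www.w3.org/2003/12/exif/ns#"


def photo_to_rdfxml(photo_uri, metadata):
    """Serialize a flat dict of photo metadata as a small RDF/XML document.

    `metadata` maps (namespace, local_name) pairs to literal values.
    This helper is hypothetical, written only for this illustration.
    """
    ET.register_namespace("rdf", RDF_NS)
    ET.register_namespace("dc", DC_NS)
    ET.register_namespace("exif", EXIF_NS)

    root = ET.Element(f"{{{RDF_NS}}}RDF")
    # One rdf:Description per photo, identified by its URI.
    description = ET.SubElement(
        root, f"{{{RDF_NS}}}Description", {f"{{{RDF_NS}}}about": photo_uri}
    )
    # Each metadata entry becomes one property element with a literal value.
    for (namespace, name), value in metadata.items():
        prop = ET.SubElement(description, f"{{{namespace}}}{name}")
        prop.text = str(value)
    return ET.tostring(root, encoding="unicode")


# Made-up values, standing in for what a photo or a Flickr call might supply.
document = photo_to_rdfxml(
    "http://example.org/photos/glow.jpg",
    {
        (DC_NS, "title"): "Balloon Glow",
        (DC_NS, "description"): "Balloons lit up at Forest Park",
        (EXIF_NS, "model"): "Example Camera",
    },
)
print(document)
```

Each photo becomes one described resource and each piece of metadata becomes one property hanging off it, which is about all the RDF you need to know to do this.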

And these are just the beginning of the uses of RDF I’m incorporating into my pages. Best of all, the data that I generate has been picked up by others–I know because I was asked to clean up my use of dates, which I did. Which means, then, that you can use the data however you want.

Every technology has its controversial elements, its more esoteric side. Most technology has aspects that many of the people using it aren’t even aware of. RDF is no different, and one can get by using RDF without even once having to become proficient with reification, or use a container. I know this. I have proved this. I have created several applications, have tried to give away code, have written about it time and again and what…not a damn thing. But then, I’m not one of the heads of RDF.

(What makes a person a ‘head’ in RDF? I could define ‘head’ at this moment, but this is a PG 13 weblog post.)

What’s even more frustrating is that when I focused on the more practical aspects of RDF before the specification was even on the street, I did not receive universal approbation from the RDF community, because my coverage of these more esoteric elements was light. Or because I covered this implementation but not that, and so on. Now, these same people are calling out for a ‘kinder, simpler’ RDF.

*bang bang bang* If a technologist falls over in the forest, does she make a sound?

I am giving a talk called “Pushing Triples: An Introduction to Street RDF” at XML 2005, but I’ve about had it with talking.

I was once challenged to put code down to prove a point. So here is my response: put your code on the table, gentlemen. Put your code down. I have.

Categories
Events of note Photography

Balloon Glow

Last night, I went to the Forest Park Glow: the lighting of the hot air balloons before today’s hound and hare balloon race. It was about the most amazing thing I’d ever seen. The weather was perfect–cool and overcast and without the heat that’s oppressed the area this summer. The crowd was mellow and excited and friendly, and the balloons! Dozens of them, dotting the hill at Forest Park below the World’s Fair Pavilion. I had my camera on my tripod and spent three hours dashing everywhere to take pictures, always with an ear for the signal to call all balloons to light ’em up; chatting with friendly folk everywhere I went.

More Glow

When I got home, I became quite sick–whether food poisoning or something else I don’t know, though I’m suspecting the something else. Because of it I had to forgo the actual balloon race today; more time to work on the projects, which makes me so very disciplined. Besides, I had so much fun at the Glow last night that I didn’t mind.

Forest Park Balloon Race Friday Glow

That last paragraph used a semicolon. I use these frequently, without being aware that semicolons are bad, according to US usage. Not, though, according to a great article written by Trevor Butterworth, pointed out by Tim Bray. Now, if I could only cure myself of comma overuse.

Crowded Skies

The fall rains started this week, bringing with them the cool of Autumn and the promise of hikes again in the woods among trees heavy with colorful leaves. I have forgotten these walks; this summer has been too long.

Night Sky and Glow

Categories
RDF Semantics Web

Semantic web lite: same great taste, less reified

Most of the time the feeds at Planet RDF reference isolated items with general interest. Other times, though, the thoughts featured strike sparks against each other, leading to a chain reaction whereby everyone jumps in and Things Happen.

Starting a few days ago, people have been referencing two stories, both of which I find very interesting. The first is Kendall Clark’s SPARQL: Web 2.0 Meet the Semantic Web; the second is Ian Davis’s Internet Alchemy Crises.

Kendall argues that what’s missing in Web 2.0 is a common query language, and it just so happens that SPARQL is a common query language, backed by a common data model (RDF) and syntax (RDF/XML). He suggests that the Web 2.0 folks provide an RDF wrapper for their data, so that both groups can benefit from the same query language, which will make things a whole lot simpler:

So what, really, can SPARQL do for Web 2.0? Imagine having one query language, and one client, which lets you arbitrarily slice the data of Flickr, delicious, Google, and yr three other favorite Web 2.0 sites, all FOAF files, all of the RSS 1.0 feeds (and, eventually, I suspect, all Atom 1.0 feeds), plus MusicBrainz, etc.

And this leads us to Ian Davis and a cognitive crisis he underwent at DC2005 (DC as in Dublin Core), relating to a pissy-ant, pick-a-une problem with dc:creator:

Danbri referred us to work he had done after the last DC meeting in 2004 on a SPARQL query to convert between the two forms. Discussion then moved onto special case processing for particular properties, along the lines of “if you see a dc:creator property with a literal value then you should insert a blank node and hang the literal off of that”. Note that I’m paraphrasing, no-one actually said this but it was the intent.

That’s when my crisis struck. I was sitting at the world’s foremost metadata conference in a room full of people who cared deeply about the quality of metadata and we were discussing scraping data from descriptions! Scraping metadata from Dublin Core! I had to go check the dictionary entry for oxymoron just in case that sentence was there! If professional cataloguers are having these kinds of problems with RDF then we are f…
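The special-case rule Ian paraphrases is easy enough to sketch, which is part of what makes it feel like scraping. What follows is a rough Python illustration, not anything proposed at the conference: triples are plain tuples, the `Literal` marker class and blank node labels are hypothetical, and hanging the literal off the blank node with foaf:name is my own assumption, since the paraphrase doesn’t name a property.

```python
from itertools import count

DC_CREATOR = "http://purl.org/dc/elements/1.1/creator"
# Assumption: the literal is re-attached with foaf:name. The quoted
# discussion does not say which property would be used.
FOAF_NAME = "http://xmlns.com/foaf/0.1/name"


class Literal(str):
    """Marks a string as a literal value rather than a URI or blank node."""


def normalize_creators(triples):
    """Rewrite literal dc:creator values to hang off a fresh blank node."""
    ids = count()
    result = []
    for subject, predicate, obj in triples:
        if predicate == DC_CREATOR and isinstance(obj, Literal):
            bnode = f"_:creator{next(ids)}"  # hypothetical labelling scheme
            result.append((subject, predicate, bnode))
            result.append((bnode, FOAF_NAME, obj))
        else:
            # Resource-valued creators and other properties pass through.
            result.append((subject, predicate, obj))
    return result


doc = "http://example.org/doc"
for triple in normalize_creators([(doc, DC_CREATOR, Literal("Ian Davis"))]):
    print(triple)
```

The code is short, but every consumer of the data has to know to run something like it, and that is the problem.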

Ian then recommended paring down RDF into an implementation subset, which focuses primarily on RDF, as it is used to define relationships. This means jettisoning some of the more cumbersome elements of the model — those that tend to send traditional XMLers screaming from the room:

What if we jilted the ugly sisters of rdf:Bag, rdf:Alt and rdf:Alt and took reification out back and shot it? How many tears would be shed?

What if we junked classes, domains and ranges? Would anyone notice? The key concept in RDF is the relationship, the property.

The end result would be an RDF-Lite: a proper subset of RDF that can be upwardly compatible with the model as a whole, though the converse would not be true. If this subset were formalized, then libraries could be created just for it that would be significantly less complex, and correspondingly leaner, than the libraries needed for full-featured RDF.
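As a rough illustration of what formalizing the subset might involve, here is a Python sketch of a check that flags triples falling outside it. No such specification exists; the list of excluded terms is simply my reading of Ian’s two questions (containers, reification, classes, domains and ranges), and the function name is made up.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"

# Assumption: this list is an interpretation of the quoted proposal,
# not a published definition of any RDF subset.
EXCLUDED_TERMS = {
    RDF + "Bag", RDF + "Seq", RDF + "Alt",            # containers
    RDF + "Statement", RDF + "subject",               # reification
    RDF + "predicate", RDF + "object",
    RDFS + "Class", RDFS + "domain", RDFS + "range",  # schema machinery
}


def outside_lite_subset(triples):
    """Return the triples that use any term the hypothetical subset omits."""
    return [
        triple for triple in triples
        if any(term in EXCLUDED_TERMS for term in triple)
    ]


page = "http://example.org/weblog/post"
graph = [
    (page, "http://purl.org/dc/elements/1.1/title", "Semantic web lite"),
    ("_:b0", RDF + "type", RDF + "Bag"),
]
for triple in outside_lite_subset(graph):
    print("not in the subset:", triple)
```

A fuller check would also need to catch container membership properties such as rdf:_1, which this sketch ignores. Anything that passes is still perfectly ordinary RDF, which is the upward compatibility mentioned above.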

This, then, leads back to Kendall’s interest in seeing if Web 2.0 couldn’t be wrapped, morphed, or bridged on to RDF and thus allow us to assume one specific data model, and more importantly, one specific query language for use with all metadata easily and openly available on the web–not just the RDF bits. If a simple subset of RDF could be derived, it could be trivial to map any use of metadata into RDF. More importantly, since the capabilities of the technology are never the issue, those generating the disparate bits of XML or other metadata might actually be willing to go this extra step.

True, an RDF-Lite would not have the same inferential power as the fully aspected RDF model, but frankly, most of our general web-based uses of RDF aren’t using this power anyway. And if we can make RDF tastier to the general web developer, we’re that much closer to an RDFalized web. To Kendall, an RDFalized Web 2.0 could be a powerful thing:

How powerful? Well, imagine being able to ask Flickr whether there is a picture that matches some arbitrary set of constraints (say: size, title, date, and tag); if so, then asking delicious whether it has any URLs with the same tag and some other tag yr interested in; finally, turning the results of those two distributed queries (against totally uncoordinated datasets) into an RSS 1.0 feed. And let’s say you could do that with two if-statements in Python and three SPARQL queries.

Pretty damn cool.

Well, not necessarily. What Kendall describes is something already relatively easy to access through Web services. And, as we’re finding, how tags are used with Flickr differs rather dramatically from how tags are used within delicious, and so on. I do agree that being able to do something like all of this with a couple of statements and SPARQL queries would be nifty; but the technology is still going to be limited by the need for a common understanding of the data being manipulated. Even with something as simple as tags, we have different understandings of what the term means across different applications.

I don’t necessarily agree across the board with Ian, either. For instance, you can take my blank nodes (bnodes to use popular terminology) only if you pry them from my cold dead APIs, but his general points are good. My own recent work has been focusing more on using RDF for its ability to map the relationships, and less on its participation in grander semantic schemes (though the data is available for any person/bot interested in such).

More, I’ve been exploring the capabilities of using RDF as a lightweight, portable, self-contained database–one to a unit, with unit being weblog page. I’ve been steadily pulling bits of metadata out of MySQL and embedding them into an RDF document, which then drives some of this site’s functionality.

There is a line between taking advantage of MySQL’s caching and managing my own with RDF, but I’m finding that a hybrid solution is not only quite workable, it is also a very effective solution for data that is meant to be open, unrestricted, and consumed by many agents.

The best aspect of all is that because of two specific aspects of RDF–ease of capturing a relationship, and the use of a URI to map the relationships correctly–it’s trivial for me to just ‘throw’ more metadata into the pot, and not have to worry about modifying existing tables in my database, or re-arranging a hierarchy and running into a possible namespace collision in a straight XML document. I’m also not constrained by being dependent purely on primitive keyword-value pairs, a limitation that makes it difficult for me to make multiple statements about the same noun-object pairs.
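To show what ‘throwing more metadata into the pot’ looks like in practice, here is a minimal Python sketch of a per-page store of statements. It is an illustration only, not the PHP running behind this weblog: the class, its methods, and the example page URI and values are all hypothetical.

```python
DC = "http://purl.org/dc/elements/1.1/"


class TripleStore:
    """A tiny in-memory store of (subject, predicate, object) statements."""

    def __init__(self):
        self._triples = set()

    def add(self, subject, predicate, obj):
        # A new kind of metadata is just another statement: no table
        # to alter and no element hierarchy to rearrange.
        self._triples.add((subject, predicate, obj))

    def match(self, subject=None, predicate=None, obj=None):
        """Return matching triples, sorted; None acts as a wildcard."""
        pattern = (subject, predicate, obj)
        return sorted(
            triple for triple in self._triples
            if all(want is None or want == got
                   for want, got in zip(pattern, triple))
        )


store = TripleStore()
page = "http://example.org/weblog/balloon-glow"  # made-up page URI
store.add(page, DC + "subject", "balloons")
store.add(page, DC + "subject", "Forest Park")   # same property, second value
store.add(page, DC + "date", "2005-09-17")       # illustrative value only

for triple in store.match(page, DC + "subject"):
    print(triple)
```

Because properties are named by full URIs, two vocabularies can use the same local name without colliding, and the same subject can carry any number of statements for one property, which a plain keyword-value table can’t do without extra work.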

It is all becoming very, very fun, and I am busy ripping the guts out of my current weblog tool implementation in order to incorporate the hybrid data store.

All of this effort, though, presupposes one thing: that I have a small subset of classes to manage the RDF bits, and to meet this, I experimented around with RAP (a PHP RDF library) until I had a trimmed, core set of functionality that, by happenstance, would meet Ian’s criteria for RDF-Lite. There isn’t a SPARQL implementation yet, but I know that this is on the way, and when released, I will use it to replace my use of the existing RDQL implementation.

Categories
Stuff

Focusing

I have several tasks that need finishing, and I’ve set myself a deadline to finish them. The projects include an overdue software documentation project (which should give hope to the poor soul waiting on it); an article on syndication feeds for O’Reilly; an outline of a presentation on RDF for XML 2005; and a PHP NuSOAP interface to the new NewsGator API for a client. As such, posting will be light until all of it is finished.

Last night, I did take a break to go to the Forest Park Glow: the lighting of the hot air balloons before today’s hound and hare balloon race. It was about the most amazing thing I’d ever seen. The weather was perfect–cool and overcast and without the heat that’s oppressed the area this summer. The crowd was mellow and excited and friendly, and the balloons! Dozens of them, dotting the hill at Forest Park below the World’s Fair Pavilion. I had my camera on my tripod and spent three hours dashing everywhere to take pictures, always with an ear for the signal to call all balloons to light ’em up; chatting with friendly folk everywhere I went.

When I got home, I became quite sick–whether food poisoning or something else I don’t know, though I’m suspecting the something else. Because of it I had to forgo the actual balloon race today; more time to work on the projects, which makes me so very disciplined. Besides, I had so much fun at the Glow last night that I didn’t mind.

That last paragraph used a semicolon. I use these frequently, without being aware that semicolons are bad, according to US usage. Not, though, according to a great article written by Trevor Butterworth, pointed out by Tim Bray. Now, if I could only cure myself of comma overuse.

The fall rains started this week, bringing with them the cool of Autumn and the promise of hikes again in the woods among trees heavy with colorful leaves. I have forgotten these walks; this summer has been too long.

Look for photos from time to time at my Flickr account.

Update:

Fine quip and counter-quip on punctuation and alcoholic beverages by Joe Duemer and Trevor Butterworth–the author of the aforementioned article on semicolons–here.

What I want to know from these masters of English and elixir is: if commas are beer and semicolons are single malt whiskey, then what are dashes and ellipses? Are exclamation points the jello shots of punctuation?

Categories
XHTML/HTML

Repeating

Dare Obasanjo writes:

Repeat after me, a web page is not an API or a platform.

Versioning APIs is hard enough, let alone trying to figure out how to version an HTML website so screen scrapers are not broken. Web 2.0 isn’t about screenscraping. Turning the Web into an online platform isn’t about legitimizing bad practices from the early days of the Web. Screen scraping needs to die a horrible death. Web APIs and Web feeds are the way of the future.

Consider it repeated. Just because people are using XHTML for their pages doesn’t mean that they’re following any specific data model. XHTML is meant to be both open and loose. As for screen scraping: ew, ew, ew.