Categories
Semantics

And Nerds become queens: Yahoo and Smart searches

Recovered from the Wayback Machine.

Great idea on the part of Yahoo to begin incorporating semantic web information into its search open platform. How deep the semantics will go, and in how many directions, is still TBA, but I’m pleased to see interest in microformats and more structured semantic data via RDF. I’ll be even more pleased when we start to see working examples.

Marshall Kirkpatrick believes that Google will follow suit. I just don’t see it. Google might embrace microformats, but the company has long pitted its algorithms against human annotation of data, and the semantic web is based on some human annotation–even if the annotation is based, indirectly, on checking an option in a page.

My biggest concern about all of this was that we would limit semantics to microformats. It’s with relief that I see that Yahoo is going beyond just microformats into the broader scope of structured semantics based on RDF and its various serializations. Paul Miller also brings up other needed caveats:

The tools to create and embed that structure need to follow, of course. And issues that efforts like Dublin Core struggled with over a decade ago need to be thrashed out in some more detail, as the malicious, the malevolent, the careless and the mischievous rush to ‘game’ the rich structured data with which their web pages will soon be filled.

Putting pressure on the tool makers is essential, though probably not as essential as it once was because most tools provide a plug-in infrastructure that enables expansion. Still, there’s a lot more that tools can do, which is one reason why I’ve been so interested in Drupal: this tool is definitely ahead of this curve.

What’s key to all of this is showing people what they can get if they go that little extra step. I read people who write reviews on books. If we start showing more intelligent search results based on adding a little additional information to their writings that reflect that the work is a book review of a certain book by a certain author, etc., they will, most likely, be willing to spend a little time adding this additional information.

Someday when I’m looking for a new book to download from the web, I’ll be able to pull up a browser in my Kindle ebook reader and see all the reviews written about this book, online. Everywhere. We are so close to making this work, and I’m not normally the type to tap dance every time someone comes along, breathing the words “semantic web”, through lips moist with anticipation.

Yahoo should have received a hostile takeover bid a long time ago. Lately, the company has been galvanized.

Categories
RDF Semantics Specs Web

Semantic web: dull as dishwater edition

Recovered from the Wayback Machine.

Mathew Ingram has decided that the problem with the semantic web is that it’s as boring as dry toast. Of course, by Mathew’s standard, all the stuff that makes the web work is also boring as hell. It’s probably a good thing, then, that some people looked beyond the need for immediate titillation when it comes to the tech underlying this environment, or Mathew’s audience for his opinions would be his immediate family members, and perhaps those neighbors not quick enough to run away when seeing him approach.

He also writes:

It’s all about plumbing and widgets and data standards, all of which have names like FOAF and TOTP and SIOC and whatnot. It’s right off the dork-o-meter. The Lone Gunmen from The X-Files would have a hard time getting interested in this stuff, let alone anyone who isn’t married to their slide rule or their pocket protector.

Now, taking Mathew’s complaints of No glitter! No glitter! Mama, Mama, where’s my glitter! seriously, I decided to put my slide rule down for a sec and see if I couldn’t respond to his one statement about no one knowing what this all means.

First, there was the web. The web was dumb, but it was hyperlinked.

Then, there was search. Search followed hyperlinks, scraped pages, massaged keywords and tested the strength of the links. The web was still dumb, but number crunching helped generate some smarts. Think of your favorite dog. Yeah, that smart.

Next, there was the semantic web. The semantic web says, You and I can derive understanding from this blob of text on this page, but applications can’t. Applications can pull keywords and run algorithms, but can only approximate what this blob of text is all about. What if we add a little information to this blob of text so that applications don’t have to crunch numbers or make guesses as to what we mean?

How do we add a little information? A hundred different ways. We can use microformats, or RDFa, or RDF, or whatever the HTML5 people cook up for us. With this little bit of extra information, an application can access a web page list that’s created with UL/LI elements, but instead of having to look at the text in the list and try to guess what the list is all about, it can read that little bit of data and know that the list consists of recommended books. Perhaps it can take that little list of books and use another application to look up these books at Amazon. Or at the library. Or better yet, we click a button and load all the books into our Kindle. (Assuming that Mathew doesn’t subscribe to the Steve Jobs “We don’t read, we ain’t got no books, gimme the vids” school of thought.)
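To make the recommended-books list a bit more concrete, here is a minimal sketch in Python, using only the standard library. The class names `recommended-books` and `book-title` are invented for this example and don't come from any published microformat; the only point is that the application reads the annotation rather than guessing from the text.

```python
from html.parser import HTMLParser

# Hypothetical markup: the class names are made up for this sketch.
PAGE = """
<ul class="recommended-books">
  <li><span class="book-title">Unix Power Tools</span></li>
  <li><span class="book-title">Another Book</span></li>
</ul>
<ul>
  <li>Milk</li>
  <li>Eggs</li>
</ul>
"""


class BookListParser(HTMLParser):
    """Collect titles found inside a list annotated as recommended books."""

    def __init__(self):
        super().__init__()
        self.in_book_list = False
        self.in_title = False
        self.titles = []

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "ul":
            # Only a list carrying the annotation counts as a book list.
            self.in_book_list = "recommended-books" in classes
        elif self.in_book_list and "book-title" in classes:
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "ul":
            self.in_book_list = False
        elif tag == "span":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title and data.strip():
            self.titles.append(data.strip())


parser = BookListParser()
parser.feed(PAGE)
print(parser.titles)
```

The grocery list is skipped without any keyword analysis, simply because it doesn't carry the annotation. Nested lists aren't handled here; this is a sketch, not a parser for real pages.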

The little bit of information might, instead, be an address for an event, triggering the browser to add that event information to a desktop calendar application.

It could be information about people we know and how we know them, so that when we move from Facebook, which is today’s darling, to MyPowerBase, we can tell MyPowerBase to add all people who we have defined as friends, but not those defined as just contacts.
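XFN is one existing way to record this: it puts relationship values such as `friend`, `contact`, and `met` in a link's `rel` attribute. A rough sketch of the "friends, but not contacts" filter might look like the following. The blogroll, names, and URLs are placeholders, and MyPowerBase's import side is left to the imagination.

```python
from html.parser import HTMLParser

# Hypothetical blogroll; the people and URLs are placeholders.
BLOGROLL = """
<a href="https://example.com/alice" rel="friend met">Alice</a>
<a href="https://example.com/bob" rel="contact">Bob</a>
<a href="https://example.com/carol" rel="friend">Carol</a>
<a href="https://example.com/about">About</a>
"""


class XFNParser(HTMLParser):
    """Collect (href, rel values) pairs from links that carry a rel attribute."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr = dict(attrs)
        rels = set((attr.get("rel") or "").split())
        if attr.get("href") and rels:
            self.links.append((attr["href"], rels))


def people_with(links, relationship):
    """Return the hrefs of links whose rel values include the relationship."""
    return [href for href, rels in links if relationship in rels]


parser = XFNParser()
parser.feed(BLOGROLL)
friends = people_with(parser.links, "friend")
print(friends)
```

Bob is listed only as a contact, so he stays behind, and the plain "About" link is ignored because it says nothing about a relationship.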

If the information is embedded in a photo–wow, information embedded in a photo, how dull–when we upload the photo to a site like Flickr, it could automatically be added to a map, with all the other photos from the same location. It can be pulled up on a search someday, when we ask the web to show us all photos for St. Louis, or for a certain block in St. Louis. Perhaps it can even help us find photos that are licensed Creative Commons so we can steal them.

I might write about a product or company, and the little bit of information I add to my post might help others who are thinking of doing business with the company, or buying that product. Sure, search engines can scrape the content and try to glean useful bits based on keywords such as the product or company name, but we’ve all had enough really strange search results to know how far search can go, no matter how brainy the algorithm.

Someday, I’ll be able to write about movies and add just a little bit of extra information, and we can do the same for movies. Or music. Or cooking recipes (“give me all recipes on the web that use apricot jam and bourbon, but I don’t want chicken”). Or even poetry, though don’t mention poetry around Sir Tim–it makes him peevish.

Mathew is very addicted to FriendFeed, which allows him to pull in all the activities of his friends in various places. I bet if we scratched the surface of this application, a lot of the data that makes the application tick comes courtesy of the semantic web dorks.

I could go on and on, but I’ve already been away from my slide rule too long. Instead I’ll save the best for last: because all of these different ways of adding that tiny little bit of useful information to blocks of text or photos or video files or what have you are based on agreed-upon specifications, we can use applications to merge this data and use it for something new; something we haven’t thought of yet. See, now that’s when it really gets exciting because rather than coming up with an idea and then taking five years to get enough data to test it, we’ll already have the data, at no extra effort or cost.

Maybe I’ve been cooped up in my cube with my computers and code for too long, but that strikes me as kind of interesting. In a dorky sort of way.

Categories
RDF Specs SVG XHTML/HTML

Our bouncing baby markup has growed up

Recovered from the Wayback Machine.

On today’s tenth anniversary of the birth of XML, Norm Walsh writes:

I joined O’Reilly on the very first day of an unprecedented two-week period during which the production department, the folks who actually turn finished manuscripts into books, was closed. The department was undergoing a two-week training period during which they would learn SGML and, henceforth, all books would be done in SGML…My job, I learned on that first day, would be to write the publishing system that would turn SGML into Troff so that sqtroff could turn it into PostScript. “SGML”, I recall thinking, “well, at least I know how to spell it.”

Ah yes. “Unix Power Tools” was formatted as SGML, the one and only book at O’Reilly I worked on that wasn’t in a Word format. I must express a partiality to my NeoOffice, though the SGML system was ideal for cross-referencing and indexing. OpenOffice ODT, or OpenDocument text, will be the most likely format for the next UPT. Just another example of the permanent/impermanence of web trends.

Norm also mentions HTML5 possibly being the nail in this child of SGML’s coffin, but as I wrote recently, the folks behind HTML5 have solemnly assured us this specification also includes XHTML5. I’d hate to think we’re giving up on the benefits of XHTML just when they’re finally being realized by a more general audience.

Of course, I’m also fond of RDF/XML, which seems to cause others a great deal of pain, the pansies. And I’ve never hidden my SVG fandom and SVG is based in XML. I must also confess to preferring XML over JSON–you know, good enough for granddad, good enough for me. Atom rules. Or is that, Atom rocks? I’m also sure XML has squeezed between the joints of many of my other applications, and I just don’t know it.

Categories
People RDF

Accidental friendships

Recovered from the Wayback Machine.

I tried out one of the applications for Google’s new Social Graph API. The application looks for XFN or a FOAF file connected to your weblog to see who you connect to, and who connects to you.

I don’t have XFN or a FOAF file. I did have one once, though, under my old URL, weblog.burningbird.net, so I tried that URL. No connections outward, of course, since I haven’t had a FOAF file for quite a while. There were, however, a few connections incoming. Just a few–alas, I am so friendless in this friend-saturated environment.

All but one of the incoming connections were from people I know well, though, contrary to what one connection states, we’ve not physically met. The only unknown in the list was artisopensource.net. I have no idea who this is and I don’t necessarily recommend that you click the link, either.

The concept of some global space to pull together friends and colleagues does sound intriguing except that, as we’ve discussed in the past in regards to FOAF files, the linkage is one way. Unless both parties maintain a FOAF file and list each other equally, the one-way connection implies nothing.
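The one-way problem is easy to see in a small sketch. Assume each site's FOAF file has already been reduced to a set of the sites it claims to know; the `claims` data and the `split_connections` helper below are hypothetical, not part of FOAF or of Google's API.

```python
# Hypothetical claims: each key maps a site to the sites it says it knows.
claims = {
    "alice.example": {"bob.example", "carol.example"},
    "bob.example": {"alice.example"},
    "carol.example": set(),
}


def split_connections(claims, site):
    """Split a site's claimed connections into mutual and one-way sets."""
    outgoing = claims.get(site, set())
    # A connection is mutual only if the other side lists this site too.
    mutual = {other for other in outgoing if site in claims.get(other, set())}
    return mutual, outgoing - mutual


mutual, one_way = split_connections(claims, "alice.example")
print("mutual:", mutual)
print("one-way:", one_way)
```

Only the mutual set says anything about a relationship both parties acknowledge. An aggregator that skips this check treats every one-way claim as if it were confirmed.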

However, taking this information out of this context removes the known FOAF caveat and we’re left with applications taking a connection at face value: I have physically met Phil, artisopensource.net is a ‘friend’. More importantly, as the years go by our ‘connections’ do change, yet we’ve long known that Google is unwilling to give up any ‘old’ data. I can imagine joining some new social network only to find out the network has sent an invite to be ‘friends’ with the woman who fired me, or the former boyfriend I went through a painful breakup with.

I think the idea of social networks consuming or producing a FOAF file so you can move your ‘social graph’ around from network to network is a good idea. Persisting such information in a centralized store where you have no control over the data does not strike me as …a major step in the development of what I’ve called “the Internet Operating System.” (And what’s with the eblog without the ‘w’ and why is Norm Walsh claiming to be me?)

From what I can see of the associated group forum, I am not the only person raising concerns about the application. (Hey Julian, hey Danny–why aren’t you my friends?) There are surprisingly few messages in this group considering the fooflah this new application has generated in the buzz sheets. One message mentioned utilizing this in medical research, which reminded me that Google now wants to collect health information about all of us in the future, too.

FOAF Papa Dan Brickley and Danny Ayers both say this is the start of interesting times. I agree that there is something interesting about the first web-wide aggregation of semantically annotated data. My concern is that the focus has been on the data and the functionality, with little consideration of the consequences.

I would also hate to think that the only semantic web possible is one controlled by Google, because it’s the only company with the resources necessary to aggregate all of the separate bits.

On a separate note: Hey! How about that Microsoft/Yahoo thing?

Categories
Burningbird RDF

Stripes

A couple of people have noticed the new look for the weblog, including the stripes. They’re now mentioned in my will.

After much fussing around, I took my color sampling of the photo and used it to create five stripes, each with a different color sampled from the photo, and created a stretch header. I also removed the comment graph. I found the graph to be too distracting, in more ways than one. First of all, it cut across the photo. Secondly, it was like watching your favorite aunt’s heart monitor as she lay on a hospital bed: will it beat, or not? Will it? Won’t it? Will it? Won’t it?

Really, the only heart that should beat in this space is my own.

My next twistie is I’m adding metadata using the RDF in the EXIF portion of the photos in order to drive out a footer to go with the header image. Remember, I can drop my photos into a folder and they’re automatically included, code pulling out color, size, and now metadata in order to ‘present the page’. Is the information cached? Sure–within the photo, each of which becomes a little mini-dataset.