Categories
Semantics

Stop justifying RDF and RDFa

Update: The discussion on RDFa in HTML5 is quite active on the WhatWG mailing list, so I’m closing comments down here and encouraging the discussion in that location. There is no restriction on joining the mailing list. A place to start would be a thread I started, but I’m sure new threads will be springing up.

I did want to apologize for assuming that the XHTML errors I had recently were due to WhatWG members having fun at my expense. I’ve had people deliberately break my XHTML-based comments in the past when I’ve written about XHTML, and the break was documented with a screenshot on the website of a WhatWG member. I put 2 and 2 together and came up with 5.


I was reading the back-and-forth argument about support for RDFa in HTML5 when it hit me that those of us who support RDF and its embedded serialization technique, RDFa, are going about this all wrong.

The question that gets asked, repeatedly, on the HTML5 and WhatWG mailing lists is, “What problem does RDFa solve?” This typically leads to lengthy discussions about RDFa versus microformats, and how one needs only rel, class, meta, and script in order to seemingly record the same information. Or that marking this information up in any way is unnecessary, as people won’t use it, will use it badly, or will use it for evil purposes, and the only direction forward for the web is natural language processing…yada, yada, yada. You’ve heard it all before.

But what if we stop focusing on the perceived purpose of RDF/RDFa? What if, instead of defending RDFa as a format for discovering semantics on the web, in competition with other techniques, we focus on RDF the way others have focused on MathML and SVG: as a rich, mature specification with its own unique purpose and its own unique benefit? In other words, begin with the assumption that RDF has value in and of itself, and does not need to be “justified”. Instead, let’s focus on whether HTML5 can support RDF, the rich, mature specification, as is, with the existing HTML5 extension mechanisms.

The quintessential aspect of RDF is the triple of subject, predicate, and object. For simplicity’s sake: the thing, the property of the thing, and the property’s value.

For the most part, the thing is identified by a URI, a Uniform Resource Identifier, in order to distinguish it from every other thing when different instances of data are combined. To repeat the underlying basis of this particular thought experiment: disregard, for the moment, that RDF is used to record semantics. Focus, instead, on the essential structure of the RDF data model. Now ask yourself: can we represent RDF within an HTML5 document, using HTML5’s current mechanisms for extensibility? My assertion in this writing is that the answer is no.

To demonstrate, let’s look at the RDF/XML output derived from an examination of the RDFa currently embedded in this page. Case in point, the following:

<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:ns0="http://www.w3.org/1999/xhtml/vocab#"
  xmlns:ns1="http://purl.org/dc/elements/1.1/"
  xmlns:ns2="http://www.w3.org/2000/01/rdf-schema#">
  <rdf:Description rdf:about="http://realtech.burningbird.net/semantic-web/semantic-markup/oh-look-its-not-just-us-semantic-web-dweebs-who-noticed">
    <ns1:title rdf:parseType="Literal"><a xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" href="/semantic-web/semantic-markup/oh-look-its-not-just-us-semantic-web-dweebs-who-noticed">Oh, look. It's not just us Semantic Web Dweebs who noticed.</a></ns1:title>

    <ns1:subject rdf:parseType="Literal">Semantic Web: <a xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" href="/semantic-web/semantic-markup">Semantic Markup</a></ns1:subject>
  </rdf:Description>
</rdf:RDF>

The RDFa from which this RDF model was derived is the following:

<div id="node-572" class="node" about="/semantic-web/semantic-markup/oh-look-its-not-just-us-semantic-web-dweebs-who-noticed">
      <h2 class="node-title" property="dc:title">
      <a href="/semantic-web/semantic-markup/oh-look-its-not-just-us-semantic-web-dweebs-who-noticed">Oh, look. It's not just us Semantic Web Dweebs who noticed.</a>
    </h2>          
     <div class="taxonomy">
      Tagged: <ul class="links inline"><li property="dc:subject">Semantic Web: <a href="/semantic-web/semantic-markup">Semantic Markup</a></li></ul>    </div>
...
</div>

The triple we’ll focus on states that a given story (the subject) belongs to a particular category of story (the predicate), which in this case is “Semantic Markup” (the object).

In the example, the subject is identified with the about attribute attached to the outer div element, which encompasses the actual text of the story. The predicate associated with the subject is identified in the property attribute, which is attached to a list item element (li), and the RDF object is the text, “Semantic Markup”, contained within the list item element’s opening and closing tags. The two element attributes used in this example, which are not a part of HTML5, are “about” and “property”. The question then is: can we use HTML5’s current extensibility mechanisms to record the same data, maintaining the same essential structure, in order to derive the same RDF data model when the page is passed to some RDF extraction mechanism?
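To make the mapping concrete, here is a minimal Python sketch of how an extractor walks the markup: about sets the subject for everything nested inside it, property names the predicate, and the element’s text becomes the object. This is a toy, not a conforming RDFa processor. It handles only about and property, it produces plain-text literals (the RDF/XML above records XML literals via rdf:parseType="Literal"), the "dc" prefix mapping is assumed, and the base URL is taken from the rdf:about value in the RDF/XML output. The names TinyRDFaExtractor and expand_curie are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Assumption: "dc" is bound to Dublin Core, as in the RDF/XML output above.
PREFIXES = {"dc": "http://purl.org/dc/elements/1.1/"}

# Trimmed copy of the page markup above, with the "..." portion left out.
SNIPPET = """
<div id="node-572" class="node" about="/semantic-web/semantic-markup/oh-look-its-not-just-us-semantic-web-dweebs-who-noticed">
  <h2 class="node-title" property="dc:title">
    <a href="/semantic-web/semantic-markup/oh-look-its-not-just-us-semantic-web-dweebs-who-noticed">Oh, look. It's not just us Semantic Web Dweebs who noticed.</a>
  </h2>
  <div class="taxonomy">
    Tagged: <ul class="links inline"><li property="dc:subject">Semantic Web: <a href="/semantic-web/semantic-markup">Semantic Markup</a></li></ul>
  </div>
</div>
"""


def expand_curie(curie):
    """Hypothetical helper: turn 'dc:subject' into a full property URI."""
    prefix, _, local = curie.partition(":")
    return PREFIXES[prefix] + local


class TinyRDFaExtractor(HTMLParser):
    """Toy extractor for about + property only (plain-text literals).
    Not a conforming RDFa processor."""

    def __init__(self, base):
        super().__init__()
        self.base = base
        self.stack = []    # one frame per open element
        self.triples = []  # (subject, predicate, object) tuples

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        # The subject comes from about, otherwise it is inherited from the parent.
        inherited = self.stack[-1]["subject"] if self.stack else self.base
        about = a.get("about")
        subject = urljoin(self.base, about) if about is not None else inherited
        prop = a.get("property")
        self.stack.append({
            "subject": subject,
            "predicate": expand_curie(prop) if prop else None,
            "text": [],
        })

    def handle_data(self, data):
        # Text counts toward every open element that carries a property.
        for frame in self.stack:
            if frame["predicate"]:
                frame["text"].append(data)

    def handle_endtag(self, tag):
        frame = self.stack.pop()
        if frame["predicate"]:
            # Collapse whitespace so the literal is a single clean string.
            literal = " ".join("".join(frame["text"]).split())
            self.triples.append((frame["subject"], frame["predicate"], literal))


extractor = TinyRDFaExtractor("http://realtech.burningbird.net/")
extractor.feed(SNIPPET)
extractor.close()
for triple in extractor.triples:
    print(triple)
```

The point of the exercise is that every step relies on rules RDFa defines: which attribute sets the subject, how the subject is inherited by nested elements, and how a prefixed property name becomes a URI.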

Goodness knows it would seem simple to represent the RDF bits in existing HTML5 attributes. For instance, we could add “subject” as another class value and thus eliminate the need for the RDFa property attribute. We already have the link contained within the list item, which would seem to serve the purpose of identifying the resource uniquely, so we don’t need about. In other words, HTML5’s extension mechanism would seem to be sufficient. Except, of course, it’s not.

If the data so documented existed solely within the page, I could use the class attribute to denote the RDF property, but is the “subject” I use in my document the same as the “subject” in someone else’s document? Who knows. Other than a similarity of text, we have no idea whether they mean the same thing. This is a critical breakdown, too, because precision of the data model is also an essential element of RDF. Otherwise, we wouldn’t be able to combine documents found on the web with any degree of confidence.

However, I suppose we could annotate the “subject” class value with an abbreviation of the domain from which it derives, in this case the Dublin Core domain, or “dc:” for short. By doing so, when you have a dc:subject in your document, and I have a dc:subject in mine, and both documents attach this property to the same subject, then the data can be safely merged. There is no confusion about what each of us “means” when we use “subject”.

Of course, we’ll then have to negotiate for a shared meaning behind “dc:”. And we’ll have to ensure that everyone in the world uses the same designation for Dublin Core. Then we’ll have to repeat this exercise for every existing and new vocabulary that comes along…
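A minimal Python sketch of why the bare abbreviation is fragile: two documents can both write “dc:subject” while binding “dc” to different vocabularies. The second vocabulary URI (under the reserved example.org domain) and the helper name expand are made up for illustration; the first URI is the Dublin Core namespace used on this page.

```python
# Two hypothetical documents that both write "dc:subject". The second author
# uses "dc" for a made-up vocabulary (example.org is a reserved example domain).
doc_a = {"prefixes": {"dc": "http://purl.org/dc/elements/1.1/"}, "term": "dc:subject"}
doc_b = {"prefixes": {"dc": "http://example.org/dance-club/"}, "term": "dc:subject"}


def expand(doc):
    """Resolve the abbreviated term against that document's own prefix mapping."""
    prefix, _, local = doc["term"].partition(":")
    return doc["prefixes"][prefix] + local


print(doc_a["term"] == doc_b["term"])  # True: the abbreviations look identical
print(expand(doc_a) == expand(doc_b))  # False: they name different properties
```

Comparing the abbreviations alone would wrongly merge the two properties; only the full URIs keep them apart, which is why every document has to carry, or agree on, the mapping behind “dc:”.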

Perhaps the abbreviated designation isn’t as feasible as it would first seem. So, what we’ll do, then, is annotate the subject with the full domain name URI, and still use the class attribute:

<li class="inline node http://purl.org/dc/elements/1.1/subject">Semantic Markup</li>

Well, that’s going to be interesting to see in our web page documents. Of course, we’ll have to duplicate the domain name URI with every reference to the property, increasing the overall size of the document. And, unfortunately, the dozens, potentially hundreds, of RDF parsers that already exist will have to be modified to account for the difference in handling between RDFa embedded in HTML5 and RDFa embedded in XHTML, but that’s a small price to pay for HTML5 compatibility. Really. The RDFa processors will have to look at every use of class in a document, which could potentially slow down processing and make the applications more sluggish, but that’s also a small price to pay.
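A rough Python sketch of what a processor would face under the class-attribute workaround: it must split and inspect every class attribute on the page just to find the few tokens that are predicate URIs. The rule that any http(s) token is a predicate is an assumption made for this sketch, and ClassURIScanner is a hypothetical name. The last line counts the extra characters each use costs compared with a bare “subject” class value.

```python
from html.parser import HTMLParser

DC_SUBJECT = "http://purl.org/dc/elements/1.1/subject"


class ClassURIScanner(HTMLParser):
    """Hypothetical scanner for the class-attribute workaround: every class
    attribute on the page has to be split and checked for URI tokens."""

    def __init__(self):
        super().__init__()
        self.classes_examined = 0
        self.predicates = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "class" and value:
                self.classes_examined += 1
                for token in value.split():
                    # Assumption: a token with an http(s) scheme is an RDF predicate.
                    if token.startswith(("http://", "https://")):
                        self.predicates.append(token)


page = (
    '<ul class="links inline">'
    f'<li class="inline node {DC_SUBJECT}">Semantic Markup</li>'
    "</ul>"
)
scanner = ClassURIScanner()
scanner.feed(page)
scanner.close()

# Extra characters per use compared with the bare "subject" class value.
overhead = len(DC_SUBJECT) - len("subject")
print(scanner.classes_examined, scanner.predicates, overhead)
```

Even in this two-element fragment, half of the class attributes examined carry no RDF at all; on a real page that ratio is far worse.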

Really.

So, we’ve accounted for the predicate, the property in our triple. Next, we need the ability to uniquely identify the resource.

A possible HTML5 attribute we could use is the rel attribute, supplying the URI for the subject. However, a quick glance at the HTML5 wiki page for rel shows that, though rel can be str-e-e-e-e-tched almost beyond recognition, there are limits. Our use of rel as a way of recording a specific URI does not fit within the HTML5 boundaries for permissible uses of the attribute, because it’s not a repeating value that we can define in a table ahead of time.

In our web pages, we can point out our sweethearts, our timesheets, our muse, and a crush. We can’t, however, use rel to point to the resource to which a specific RDF property is attached.

If not rel, how about one of the other HTML5 attributes? For instance, a likely named alternative is id. Would id work?

Currently, the HTML5 specification supports id to identify a web page element uniquely, but only an element specific to the document and the document’s DOM, or Document Object Model. It’s handy for whizzing the element about the page using JavaScript, and playing pretty, pretty with CSS, but how will it combine with, say, the data from a hundred web pages? A thousand?

Well, it doesn’t combine at all, because the id supported in HTML5 is semantically not the same as the URI necessary for RDF. Though the name of the game in HTML5 is “overloading R us”, in this case the meaning of the term must stretch too much in order to successfully encompass both needs.

So, what is wrong with using a hypertext link to identify a resource? And convincing the HTML5 crew to add “rdf-resource” to “sweetheart” and “muse” in the list of valid rel attribute values?

Ah, now that’s where the rubber meets the road when it comes to RDF. This takes us all the way back to the beginning of the discussions about RDF, and the emphasis placed on the fact that a URI is not the same as a URL. And though a URL is an instance of a URI, not every instance of a URI can be safely used in place of a URL. In other words, we can’t depend on using a hypertext link to identify a resource.
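A small Python illustration of the URI/URL distinction, using only the standard library’s urllib.parse. The URN is an example from RFC 3986; the http identifier combines this site’s host with the category path from the markup above. The looks_dereferenceable rule is a rough heuristic assumed for this sketch, not a test defined by any specification.

```python
from urllib.parse import urlparse

# Identifiers as they might appear in RDF. The URN is an RFC 3986 example;
# the http URI is this site's category page.
identifiers = [
    "http://realtech.burningbird.net/semantic-web/semantic-markup",
    "urn:oasis:names:specification:docbook:dtd:xml:4.1.2",
]


def looks_dereferenceable(uri):
    """Rough heuristic (assumption): treat only http(s) URIs with a host as
    something a browser could follow as a hypertext link."""
    parts = urlparse(uri)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


for uri in identifiers:
    print(uri, "->", urlparse(uri).scheme, looks_dereferenceable(uri))
```

Both strings are perfectly good RDF identifiers, but only the first could sit in an href; the URN has no host to fetch from.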

OK, then, what about limiting our RDF to those cases where the URI is a URL?

Unfortunately, this also fails to map cleanly between HTML5 and RDF. In the example, the actual hypertext link associated with the list element with the given property of “dc:subject” isn’t the RDF triple subject, at all. That link is associated with the web page leading to a list of related postings. It’s handy, but it doesn’t uniquely identify the subject being described. No, the actual resource, or subject, is the story, itself.

Now, the story is identified by a hypertext link, but the link in this case isn’t attached in any meaningful way to the element containing our “dc:subject” property attribute. More importantly, from a viewpoint of achieving a clean mapping between the RDF model and bits embedded within the HTML5 document, there is no logic or set of rules within HTML5 to associate the two; not in such a way that we can guarantee the same RDF data model with each iteration of usage within an HTML5 document.

We can assume there’s another link containing the URI within the parent block somewhere that uniquely identifies the resource. There is no formal logic, however, nor set of rules that guarantees we’ll always be able to derive the same RDF model, each and every time.

In other words, the extension mechanisms built into HTML5 can’t ensure that the embedded data can then be used to safely derive and return a consistent RDF model.

RDFa, on the other hand, does define these rules. It defines them well enough that I can make minor modifications to my Drupal template to embed the RDF data, and use a packaged PHP-based API to pull this same RDF data back out. Not just me, either: anyone wanting to annotate their web pages with RDF could do so, without negatively impacting any other aspect of the page or its consumption by other agents, such as browsers. And any application can then pull the data out using any number of language-based APIs. Unfortunately, though, RDFa does not fit cleanly into the current HTML5 specification. It doesn’t fit, and, seemingly, is not welcome.

In the recent discussions related to once again having to “prove” the worthiness of RDF/RDFa, HTML5 lead editor Ian Hickson wrote the following in a note posted to one of the HTML working group’s email lists.

Also, while the solutions we’re designing will almost certainly still be in use decades from now, and will almost certainly influence the solutions in use centuries from now, we are not actually designing the solutions for the problems seen decades from now.

That is to say, we are trying to solve the problems of today and the next few years, with a design that will be extensible in the future by the maintainers of HTML once they know what the problems of the future are. HTML5 is not the end of the road; when HTML5 is widely deployed and used, then we will be able to design HTML6 on top of it. And so forth.

Thus there is no need for HTML5 to have author-usable features for extensibility to solve the problems of decades from now. The extensibility mechanisms for authors (and HTML5 has many …) should solve _today’s_ problems; and the language should be designed in such a way that the future maintainers of HTML can later extend the language to fix their problems. This is just how HTML4 was done; it’s how CSS was done; it’s how XML was done (you can’t invent new XML syntax, for instance, that would require a new version of XML).

In this writing, I’ve only looked at the most trivial aspects of the RDF model and its RDFa serialization. If HTML5 fails with something as primitive as a simple RDF triple, it will certainly continue to fail for anything more complex. However, the point of this writing isn’t to highlight the shortcomings of HTML5 as a whole, but to demonstrate that the extension mechanisms within HTML5 are not sophisticated enough to handle existing needs. Not some future extensibility, as Ian notes, but a need that exists today.

RDF is a rich data model with widespread use, documented by a mature specification, supported by any number of tools in any number of applications, in use by any number of companies, for any number of purposes. It is not some lightweight Johnny-come-lately that can be disregarded and ignored because it doesn’t satisfy a small group’s determination of what is, or is not, essential to the web. We don’t have to justify our interest in RDF, and therefore are fully within our rights to ask that it be supported in any web page markup currently under development by the W3C.

Now, if the HTML5 working group wishes to demonstrate that the RDF model can be implemented in HTML5, as is, then they should do so. They should not, though, demand that we give up the RDF model in order to support some other model, just because they don’t happen to see the need for RDF for themselves.

We in the RDF community are not asking the HTML5 working group to support …extensibility to solve problems of decades from now. We’re asking for a solution to a problem that exists today. Now. This very moment.

Categories
Burningbird

Web stats

As of this first week in January, 2009, the web statistics at my five main sites read as follows (only values greater than or equal to two percent are listed):

Burningbird (main page)

Browser stats
Browser and version (if provided) Percentage
MSIE 5.5 4.3%
MSIE 6.0 6.8%
MSIE 7.0 14.6%
Firefox 3.0.5 16%
NetWireNews 8.3%
Safari 6.4%
NewsGator 5.3%
Mozilla 2.7%
Operating System
Operating System and version Percentage
Windows XP 28.7%
Windows Vista 9.8%
Windows 2000 4.9%
GNU Linux 2.2%
Mac OS X 22.2%

Burningbird RealTech (this site)

Browser stats
Browser and version (if provided) Percentage
MSIE 5.5 3.8%
MSIE 6.0 13.8%
MSIE 7.0 8.2%
MSIE 8.0 2.2%
Firefox 2.0 2.0%
Firefox 3.0.5 25.3%
Firefox 3.1 6.4%
Safari 9.5%
Opera 5.9%
Mozilla 3.8%
Operating System
Operating system and version Percentage
Windows XP 39.8%
Windows Vista 9.2%
Windows 2000 5.5%
Linux Ubuntu 3.8%
GNU Linux 2.2%
Mac OS X 25.6%

MissouriGreen

Browser stats
Browser and version (if provided) Percentage
MSIE 6.0 8.8%
MSIE 7.0 29%
MSIE 8.0 2.1%
Firefox 2.0 2.0%
Firefox 3.0.5 14.3%
Firefox 3.1 8.7%
Safari 11.2%
Operating System
Operating system and version Percentage
Windows XP 42.7%
Windows Vista 6.7%
Windows 2003 3.9%
Mac OS X 24.3%

Secret of Signals

Browser stats
Browser and version (if provided) Percentage
MSIE 6.0 8.3%
MSIE 7.0 12.6%
MSIE 8.0 2.2%
Firefox 3.0.5 19.9%
Firefox 3.1 20.5%
Safari 10.8%
Opera 5.5%
Mozilla 2.0%
Operating System
Operating system and version Percentage
Windows XP 39.9%
Windows Vista 10.2%
Windows 2000 5.5%
Mac OS X 32.8%

Just Shelley

Browser stats
Browser and version (if provided) Percentage
MSIE 6.0 12.1%
MSIE 7.0 29.3%
Firefox 2.0 2.0%
Firefox 3.0.5 24.5%
NetWireNews 16.8%
Safari 6.4%
Operating System
Operating system and version Percentage
Windows XP 38.3%
Windows Vista 13%
Windows 2003 4.4%
Mac OS X 27.6%

Analysis:

I’m not surprised to see the Windows 2000 users, and am assuming the MSIE 6 users among my stats are primarily running Windows 2000. This state may continue into the new year because of Microsoft’s decision to provide MSIE7 to Windows XP users and up, without providing an official upgrade path for those people still using Windows 2000. Not every Windows 2000 machine can easily upgrade to Windows XP. However, if people can’t upgrade their OS, they can upgrade their browser to Firefox 3.x or Opera 9.x, and possibly to other supported browsers.

As for MSIE 5.5, good golly folks, it’s time to move on. And no, these are not Mac Classic users, as the Mac Classic OS percentage is typically less than 1%, if it shows at all in my site stats. No, I would imagine that most of these people bought a Windows 95 or 98 machine that came installed with 5.5, and the thing is now too infested with viruses for them to use, much less upgrade the software.

Speaking of upgrading, Firefox 2.x users: as of December, Mozilla is no longer supporting your browser. Firefox 3.1 is just around the corner, and is very sexy. Time for you to move on, too.

There are few other browser percentage surprises. My primarily tech sites, RealTech and Secret of Signals, feature a larger percentage of Firefox users than my two non-tech sites, MissouriGreen and Just Shelley. What was pleasantly surprising, though, is that Firefox is becoming the dominant browser at the sites. Just Shelley is about the only one still heavily dominated by MSIE.
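As a quick arithmetic check on the tech versus non-tech comparison, here is a short Python sketch that sums the per-version MSIE and Firefox rows from the tables above for each site. Since only rows at or above 2% were listed, these sums are lower bounds, not complete browser shares; the variable names are illustrative only.

```python
# Percentages copied from the tables above. Only rows at or above 2% were
# listed, so these sums are lower bounds rather than complete shares.
stats = {
    "Burningbird": {"MSIE": [4.3, 6.8, 14.6], "Firefox": [16.0]},
    "RealTech": {"MSIE": [3.8, 13.8, 8.2, 2.2], "Firefox": [2.0, 25.3, 6.4]},
    "MissouriGreen": {"MSIE": [8.8, 29.0, 2.1], "Firefox": [2.0, 14.3, 8.7]},
    "Secret of Signals": {"MSIE": [8.3, 12.6, 2.2], "Firefox": [19.9, 20.5]},
    "Just Shelley": {"MSIE": [12.1, 29.3], "Firefox": [2.0, 24.5]},
}

# Sum each browser family per site, rounded to one decimal place.
totals = {
    site: {family: round(sum(values), 1) for family, values in families.items()}
    for site, families in stats.items()
}

for site, t in totals.items():
    print(f"{site}: MSIE {t['MSIE']}%, Firefox {t['Firefox']}%")
```

With these sums, the two technical sites show the higher Firefox totals (33.7% for RealTech and 40.4% for Secret of Signals) compared with the two non-technical sites (25.0% for MissouriGreen and 26.5% for Just Shelley).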

Safari’s use is increasing, which isn’t surprising because it really is the best general browser for Mac OS X, and it’s now available for Windows as well. Safari/WebKit’s graphics rendering engine is the best, a topic I’ll talk about more directly in a writing I’m doing on SVG.

I would have expected, though, some increase in Opera use. I started last year with Opera at about 5%, and it’s still about 5%. Actually, the lack of change is a little spooky—who ever heard of a straight line in a chart related to the web?

But where’s Chrome? That’s what I thought when looking at the stats, and finally spotted it at under 1% for this site, only. What did the pundits say last year? Chrome was going to be a threat to Firefox? Well, I don’t think we need to dump our Firefox t-shirts just yet.

Based on the trends from last year to now, when I compare this year’s stats against next year’s stats, I predict they will show the following:

  • The number of users of the new Windows 7 operating system will be inversely proportional to the number of Windows Vista users.
  • More Chrome users, but Firefox and Safari should still see incremental growth.
  • Fewer MSIE users, with most switching to Chrome or Firefox.
  • After MSIE8 releases, we’ll quickly be able to see who are the MSIE personal users, versus MSIE corporate users, because of the MSIE8 upgrade blocker.
  • We’ll see a significant reduction in MSIE corporate users, as many will get laid off.
  • Mac OS X use will continue incremental growth, and everyone will still be questioning Steve Jobs’ health.
  • Opera will continue with 5% of the browser market. Spooky.
Categories
Semantics

Oh look it’s not just us Semantic Web dweebs who noticed

A List Apart has a new article out on semantics in HTML5. John Allsopp writes:

We’ll start by posing the question: “why are we inventing these new elements?” A reasonable answer would be: “because HTML lacks semantic richness, and by adding these elements, we increase the semantic richness of HTML—that can’t be bad, can it?”

By adding these elements, we are addressing the need for greater semantic capability in HTML, but only within a narrow scope. No matter how many elements we bolt on, we will always think of more semantic goodness to add to HTML. And so, having added as many new elements as we like, we still won’t have solved the problem. We don’t need to add specific terms to the vocabulary of HTML, we need to add a mechanism that allows semantic richness to be added to a document as required. In technical terms, we need to make HTML extensible. HTML 5 proposes no mechanism for extensibility.

On reading of which, I hurt my head by banging it, suddenly and with force, against my desk.

Categories
Stuff

Amazon VOD on Roku

Recovered from the Wayback Machine.

A favorite game with Roku owners is guessing which service will be added to the box first. The game is now over because, evidently, Amazon’s Video On Demand is going to be the next video entry for the Roku boxes.

This puts the box on par with AppleTV in offerings. Well, actually a little beyond AppleTV, with Netflix streaming. Add Hulu and Roku is a video killer.

Categories
Programming Languages

Practice…but not typing

A post by Karl Martino reminded me of Jeff Atwood’s We are typists first, programmers second. Atwood was responding, in hearty agreement, to a post by Steve Yegge, who wrote:

I was trying to figure out which is the most important computer science course a CS student could ever take, and eventually realized it’s Typing 101.

The really great engineers I know, the ones who build great things, they can type.

As I wrote in Karl’s comments, saying that fast typing is what makes a great programmer is little different than saying what makes a good carpenter is how fast they swing their hammers.

Fast typing is a by-product of extensive creation, whether that creation is web page markup, a stylesheet, or code. The more we create code, web pages, and designs, the more efficient we get with all of the tools used, including but not limited to, typing.

In addition, times have changed. I have no doubts that today’s generation of kids are speed demons on the keyboard—whether it’s on their cellphone or attached to their computers. A typing class would most likely slow them down.

If anything, what we should be encouraging is more practice with problem solving—the ability to figure something out on one’s own, without having to Google an answer or ask friends on Twitter—not typing.