Media Specs

Notes from writing HTML5 Media

Recovered from the Wayback Machine.

This last weekend I finished my latest book for O’Reilly: HTML5 Media. This is one of O’Reilly’s shorter books (about 100 pages), primarily focused at the eBook market, though you can get a hard copy with print-on-demand.

The book focuses on the HTML5 audio and video elements. I cover how to use the elements in a web page and go into detail on the attributes for each element, as well as cover video and audio codec support. I also devote a couple of chapters on developing with both elements, including how to create a custom control, as well as integrating the media elements with the canvas element and SVG.

In one chapter, I touch on the newest media element API functionality including the brand new and unimplemented media controllers and support for multiple audio, video, and text tracks. Though no browser currently provides support for captions/subtitles, I also explain how to use JavaScript libraries and SRT or WebVTT files to add captions and subtitles to videos.

I enjoyed working on this book. I enjoyed worked with the media elements, though I’m more partial to the video element. Working on the book was also a learning experience—even, at times, an eyebrow raising experience. I thought I would share with you all some of the notes I wrote while working on the book.

WebVTT versus TTML

The WHATWG group started working on a subtitle/caption format based on SRT (SubRip) text format. The original name was WebSRT, but it was recently renamed to WebVTT. The LeanBack Player web site provides a good review of WebVTT.

WebVTT is a pretty basic format, consisting of line numbers, timelines, and text with formatting options. There are plans to add additional capabilities, but what we have now should meet most needs.

There’s been interest in bringing WebVTT over to the W3C. However, the W3C already has a timed text specification, TTML. TTML is an XML based format that is more sophisticated than WebVTT, but also more complicated to use.

I covered WebVTT in the book in detail, but only briefly mentioned TTML. The reason I didn’t spend time with TTML is because of existing support and the industry movement away from XML.

None of the various JavaScript libraries I tested that provided some caption/subtitle support worked with TTML. They worked with SRT or WebVTT, but none that I tried worked with TTML.

Additionally, TTML is an XML format. Now, XML might have been the approach to take a half dozen years ago, when most everything at the W3C was heading in an XML direction. In the last several years, though, we’ve seen the popularity of the RDF/XML serialization fade in favor of Turtle or RDFa, and XHTML2 abandoned in favor of HTML5. SVG is still holding on, but now there’s rumblings of an API that will generate SVG or canvas API calls, and basically hide most of the XMLness of SVG from view. I vaguely remember reading something somewhere that the folks working on TTML were even thinking of creating a JSON version of the spec.

Whether intended or by accident, there is a subtle but noticeable shift away from XML in the W3C. At the same time, there is a strong core of support for XML formats in the W3C. Between both seemingly contradictory paths, I’m thinking we should just skip the interim pain and anguish of yet another format war, and go right to the end point. So I covered SRT and WebVTT and only mentioned TTML in passing.

Protecting the Users from the Big Bad Web Developers

I like HTML5 video and audio, I really do. I had a great deal of fun writing this book. However, despite my affection for these elements I must also admit to some irritation with their design and implementation. (Well, other than the fact that an entire block of the specification changed mysteriously one night, requiring a sudden and unexpected re-write in one of my chapters.)

The part about the HTML5 media elements I like the least is the seeming level of distrust directed at web page authors and developers.

For instance, if you’re creating a custom control and remove the controls attribute, you may think you then have complete control over the media playback. You don’t, though—at least, not in most browsers.

In the section of the HTML5 spec related to the media element’s user interface, implementors are advised to provide playback control in some manner regardless of whether the controls attribute is present or not:

Even when the attribute is absent, however, user agents may provide controls to affect playback of the media resource (e.g. play, pause, seeking, and volume controls), but such features should not interfere with the page’s normal rendering. For example, such features could be exposed in the media element’s context menu.

If you right mouse click on a video element in Firefox, you’re given the options to play or pause the video, mute the volume, play the video in fullscreen, show or hide the controls, as well as save the video or play the video by itself in another page. Chrome provides options to play, pause, or mute the video, as well as show or hide the controls, open the video in another tab, or save the video. Opera’s context menu options are similar to Chrome’s, minus the option to open the video in a new tab. IE10 provides play, pause, mute options, the ability to save the video, and the ability to control playback speed. Safari is the only browser that doesn’t provide context menu options to control the video. At least, not yet.

There is absolutely no way to directly control what does or does not display in the context menu that the browser provides. There is no way to control some of the actions that people can take in the context menu, such as preventing the fullscreen display of the video, if you don’t want it played fullscreen.

If you’re providing custom controls for the video, you have to account for the fact that the video playback is being managed by the context menu as well as your controls. One of my examples in the book provides a video playback control that consists of separate buttons for play, pause, and stop. These controls are disabled based on what action the user takes. It seems like a simple act to just disable and enable the appropriate buttons at the same time you play or pause the video, but you actually have to capture two sets of events: the click events from the buttons, and the play and pause event from the video.

Of course, the amount of extra code to do something like enable and disable buttons based on playback is trivial. But what isn’t trivial is controlling which options are made available to the user. If, for whatever reason, you don’t want the video to be played fullscreen, there is absolutely no way to prevent this from happening with Firefox.

The only way to prevent the context menu from displaying for the video is to provide a transparent div overlay for the video, so that the context menu reflects the div element, not the video. That or turn the video element’s display off, and play the video by redrawing it into a canvas element—a case of overkill, just to be able to control video playback.

The conflict between the context menu and customization isn’t the only web developer/author restriction.

There are the times when the web page author wants the audio or video to begin automatically when the page loads. The media elements do provide attributes for this: autoplay and loop. To ensure automatic playback, the author removes the controls attribute, adds autoplay and possibly loop, and when the page loads, the media element begins playing. The web page author can also remove the audio element completely from display so all that’s left is the sound. The video element is, of course, left displayed, but the control UI should not be showing. We can’t control the context menu options, but at least the control UI isn’t displaying. Well, not unless scripting is disabled, that is.

If the user has scripting disabled, the control UI is automatically re-displayed with the media element—even if you don’t want it to be displayed. If scripting is disabled, you cannot control the visibility of the control UI. According to the HTML5 specification:

If the attribute is present, or if scripting is disabled for the media element, then the user agent should expose a user interface to the user.

I’ve been told by members of the HTML WG that “should” in this context is equivalent to “must”. The two terms are not the same, but I gather that they become one in HTML5 land.

Currently, only Opera provides a visual control UI when scripting is disabled. Firefox doesn’t display a visual control UI when scripting is disabled regardless of whether the controls attribute is present or not. Safari, Chrome, and IE currently do not display the control UI. If “should” is equivalent to “must”, then Firefox, Safari, Chrome, and IE are all in error in their handling of disabled scripting and the media elements. I imagine bugs will be filed, if they haven’t already been filed, and these browsers will also automatically add the control UI when scripting is disabled.

I hear people cheering. You’re cheering, aren’t you? You’re all insanely happy with the power given the end user with the HTML5 video and audio elements.

Most of us remember those times when we opened a web page and some horrid music was blaring, or a video automatically plays with some idiot in a suit talking about his constipation. If we’re at work, we keep our machines permanently muted, lest something embarrassing blare out at an inopportune moment. If screen readers are not sophisticated enough to automatically lower background sound when a page is opened, the background sound competes badly with the reader.

Automatic audio, bad. Automatic video, bad.

I also imagine most of you have forgotten your visits to sites where you expected music or a video to play, and how much you enjoyed a well crafted multimedia experience.

Consider sites devoted to movies. Currently the last of the Harry Potter movies, the latest Transformer, and the new Spielberg Super 8 are playing in movie theaters. All three movies have their own web sites. If you open all three movie sites, you’ll find extensive use of both audio and video media.

The Harry Potter site opens with a preview of the movie with its own custom control that automatically starts playing as soon as it is sufficiently loaded. Among the options not provided with this video are the ability to open the movie out of context of the frame, such as opening the video in fullscreen. At most you can start or stop the video, choose a different video format, or skip the video and go to the site offerings.

The site offerings page has audio playing in the background. In addition, the bottom of the page features an animated video of owls. You can do nothing to stop either.

The Transformer movie site also provides background sound, as well as a video that begins to play automatically and loops continuously in the splash page. When you enter the site, another video plays continuously in the background of the page. Again, sound is used. There is very little about the site that is text-based: it’s all eye and ear candy.

The Super 8 movie site provides an automatically playing trailer with its own control. The page also has background audio when the trailer is finished. One of the sections of the site is the Editing Room. This page features a video playing automatically in an old 8mm style. Once this video if finished, rows of film are displayed. You can click a control that opens another video, again in super 8 style, providing a back story for the movie. You’re provided with controls to play the video and mute the sound.

None of these movie sites provide a context menu for their videos, other than what you would expect to see with a Flash movie. None of the sites allowed you to open the videos in fullscreen or play in a separate tab, because the videos are part of an integrated whole. The sites don’t allow you to switch off audio that I can see. I realize that automatically playing audio can be irritating for some, and can play havoc with screen readers, but again, none of this is unexpected for a movie site.

These types of sites will never be created using HTML5— not because HTML5 isn’t capable of creating most of the effects, but because HTML5 deliberately circumvents finer control over the video element. Can you imagine what would happen with the Transformer site with scripting disabled? The browser would then automatically plunk the control UI over the video in the page, which would ruin the overall effect the page creator was trying to make.

The unfortunate consequence of making HTML5 video and audio unattractive for these sites is that once they start using Flash for one component of the site, they continue to use Flash for every component of the sites. If you open these pages and use a screen reader such as NVDA, the only sound you’ll get is the background audio because every last bit of the site is in Flash: the text, the menus, all of it.

We want these sites to consider using HTML5 instead of just Flash, because if they do, the sites will end up being more accessible rather than less. Yes, even if the HTML5 media elements don’t have a control UI, and audio and video are played automatically. If we want to convince people to use something other than Flash, we need to ensure they have the same level of control that they had with Flash. Currently, the HTML5 video and audio elements do not provide this level of control.

HTML Media and Security

During the recent brouhaha related to WebGL security, the HTML5 editor, Ian Hickson, discovered that the video element, as it was currently defined, would not allow the cross-domain access that the img element provides. In other words, if the video you linked in with the src attribute was not from the same domain as your web page, the video wouldn’t play. This restriction was lifted, and the video (and track) resources are now treated the same as image resources.

However, one of the safety features related to cross-domain resource access was the concept of canvas tainting. If the image or video drawn into a canvas element is from another domain, the canvas is marked as tainted (the origin-clean flag is set to false). When the canvas is tainted, the toDataURLgetDataImage, and measureText methods generate a security exception. You couldn’t circumvent the same-origin restriction by using Ajax, either, because it would not allow cross-domain resource access.

Of course, much of this has changed because of the WebGL security issues. Originally WebGL was limited to using only same-origin image access for canvas textures, but a more recent version of the specification allowed for cross-domain image access. WebGL developers wanted to add images (and potentially video) from other domains as textures for their 3D creations. Unfortunately, when the WebGL specification and implementations enabled cross-domain image access, they also opened up a security violation: the WebGL could be manipulated in such a way as to create a “data leak”, giving the web pages access to actual image (and video) data.

In order to allow WebGL to proceed without having to tackle the functionality causing the data leak (I’m told a daunting task), the WebGL community requested and received a new attribute that can be added to the img, audio, and video elements in HTML5crossorigin. This attribute allows same-origin privileges with cross domain resources, as long as the resource server concurs with this use. This is a concept known as Cross-Origin Resource Sharing, or CORS.

CORS is another specification in work at the W3C. It originated as a way for web developers to access cross-domain resources using XMLHttpRequest (Ajax). The concept has since been expanded to include workarounds for the same-origin security restrictions in other uses, including the newest related to canvas tainting.

It sounds all peaches and cream except that there are issues related to the concept, especially when accessing image and video data from cloud services such as Amazon’s AWS or centralized image systems, such as Flickr. For CORS and the crossorigin attribute to work, these services must be willing to support CORS. The WebGL and other developers assumed the sites would be more than willing to do so. However, I know that Amazon has already expressed reservation about supporting CORS, and I wouldn’t be surprised if there wasn’t some reluctance on the part of other services.

I also had reservations about the breathlessly quick addition of crossorigin to HTML5, starting with the unanswered question, “What would WebGL had done if HTML5 was too far along in the recommendation track to add this change?” I still have concerns about quickly adding in functionality that routes around security protocols because another specification needs to have this functionality because of a security violation. I’ve long been a fan of 3D effort on the web, beginning with the earlier VRML and continuing with my interest in WebGL (I covered it in my Painting the Web book). However, I’m even more of a fan of web security. That and a stable specification. What would have happened if WebGL had made this request after HTML5 had progressed to candidate recommendation status?

Yes, I am a stick in the mud. I like stable specifications and secure web pages. I’m just old fashioned that way.

Anyway, for those wanting to integrate HTML5 video and canvas element, be aware of this very new functionality. You won’t find it included in the HTML5 Last Call document, you’ll only find it in the HTML5 editor’s draft.

Codec Support

You would expect to find tables with audio and video browser container/codec support littering the internet, and you do. The only problem is, none of the tables seem to agree.

Trying to determine exactly what container/codec each browser supports is actually a pain in the butt. I’m sure each and every browser has a page somewhere that explicitly lists what it supports in all possible environments. Wherever these pages are, though, must be one of the better kept web secrets.

It’s not as if there’s a simple yes/no answer to audio or video codec. After all, if you use the HTMLMediaElement’s canPlayType method with various audio or video codecs, you’ll either get a “maybe”, “probably”, or an empty string. Maybe and probably are not normally viewed as decisive words. It also doesn’t help when Chrome answers either maybe or probably to everything.

Then there are the quirks.

Firefox and Chrome only like uncompressed WAV files. Opera and Safari don’t seem to mind compressed WAV files. Technically, though, all four browsers “support” WAV.

Both these statements are true: only Safari supports AAC; Safari, Chrome, and IE support AAC.

If you use a tool such as the Free MP3/Wma/Ogg Converter (, you’re given an option to convert your sound file to several different formats, including AAC and M4A. Many people will tell you AAC and M4A are one in the same. Well, yes and no.

The AAC option creates an AAC file that is packaged in a streaming format called Audio Data Transport System (ADTS). The M4A option is an AAC file that’s packaged in MPEG-4. Since Safari can play whatever QuickTime can play on a system, and QuickTime can play the ADTS AAC file, the AAC file only plays in Safari. Chrome and IE can also play the AAC file, but only if it’s wrapped in the MPEG-4 container, which Safari also supports.

But wait…there’s more!

No, no. I’m just joshing you.

Well, there really is more but I don’t want to be cruel.

The confusion about support is further exacerbated by the politics surrounding container/codec support. Yes, Chrome supports MP4. No, Chrome does not support MP4. Yes, Ogg is the open source community’s fair haired child. No, WebM is the open source community’s fair haired child … they just don’t know it yet. Speaking of WebM, yes, WebM is a video container/codec, but it’s also an audio container/codec—just leave out the video track.

Remember when everything was going to be Ogg and life was simpler?

Anyway, to add to the audio/video container/codec noise on the internet, my own versions of browser/codec support for the HTML5 audio and video elements.

Are they accurate? Sure. Why not.

What day is it?

Popular HTML5 audio container/codec support by browser
Container/Codec IE Firefox Chrome Safari Opera
WAV(PCM) No *Yes *Yes Yes Yes
MP3 Yes No Yes Yes No
Ogg Vorbis No Yes Yes No Yes
MPEG-4 AAC Yes No Yes Yes No
WebM Vorbis No Yes Yes No Yes

*Make darn sure the WAV file is uncompressed

Popular HTML5 video container/codec support by browser
Container/Codecs IE Firefox Chrome Safari Opera
MP4+H.264+AAC Yes No *No Yes No
Ogg+Theora+Vorbis No Yes Yes No Yes
WebM+V8+Vorbis No Yes Yes No Yes

*Google has announced that Chrome will not support H.264. However, there are faint traces of support—ghosts if you will—still left in Chrome.

Official HTML5 Video Mascot

The official HTML5 video mascot is ….

Big Buck Bunny!


The W3C HTML WG decision on RDFa prefixes

Recovered from the Wayback Machine.

One HTML WG decision I agree with is the one associated with Issue 120 on RDFa prefixes.

Considering that RDFa support in XHTML/HTML to this point has made use of prefixes, I don’t understand why we even contemplated not supporting prefixes just because RDFa is being ported to HTML5. Frankly, it’s not the HTML5 WG’s design decision to make—RDFa in HTML5 is a port, the design for RDFa resides with another group.

As for RDFa prefixes being confusing, one of the most fundamental design patterns, in computer tech and elsewhere, is the concept of variable/value pairs, with a shorter, easy to type and remember variable or abbreviation used in place of a longer, more complex value.

Then there’s the fact that RDFa has significant adoption, and dropping support for prefixes will break the web. I’ve heard that this is an important criteria for other HTML5 design decisions. If nothing else, consistency demands we support prefixes.

I could go on, but the proposal to keep prefixes does a commendable job and I don’t need to repeat its arguments.


Maxwell’s Silver Hammer: RDFa and HTML5’s Microdata

Being a Beatles fan, I must admit to being intrigued about the new Beatles box set that will be available in September. I have several Beatles albums, but not all. None of the CDs I own have been re-mastered or re-mixed, including one of my favorite songs, from Abby Road: Maxwell’s Silver Hammer:

Joan was quizzical; Studied pataphysical
Science in the home.
Late nights all alone with a test tube.
Oh, oh, oh, oh.

Maxwell Edison, majoring in medicine,
Calls her on the phone.
"Can I take you out to the pictures,
Joa, oa, oa, oan?"

But as she's getting ready to go,
A knock comes on the door.

Bang! Bang! Maxwell's silver hammer
Came down upon her head.
Bang! Bang! Maxwell's silver hammer
Made sure that she was dead.

I love the chorus, Bang! Bang! Maxwell’s silver hammer came down upon her head…

Speaking of Bang! Bang! Jeni Tennison returned from vacation, surveyed the ongoing, and seemingly unending, discussion on RDFa as compared to HTML5’s Microdata, and wrote HTML5/RDFa Arguments. It’s a well-written look at some of the issues, primarily from the viewpoint of a heavy RDFa user, working to understand the perspective of an HTML5 advocate.

Jeni lists all of the pushback against RDFa that I’m aware of, including the reluctance to use namespacing, because of copy and paste issues, as well as the use of prefixes, such as FOAF, rather than just spelling out the FOAF URI. Jeni also mentions the issue of namespaces being handled differently in the DOM (Document Object Model) when the document is served as HTML, rather than XHTML.

The whole namespace issue goes beyond just RDFa, and touches on the broader issue of distributed extensibility, which will, in my opinion, probably push back the Last Call date for HTML5. It may seem like accessibility issues are the real kicker, but that’s primarily because no one wants to look at the elephant in the corner that is extensibility. Right now, Microsoft is tasked to provide a proposal for this issue—yes, you read that right, Microsoft. When that happens, an interesting discussion will ensue. And unlike other issues, whatever happens will take more than a few hours to integrate into HTML5.

I digress, though. At the end of her writing, Jeni summarizes her opinion of the RDFa/namespace/HtmL5/Microdata situation with the following:

Really I’m just trying to draw attention to the fact that the HTML5 community has very reasonable concerns about things much more fundamental than using prefix bindings. After redrafting this concluding section many times, the things that I want to say are:

  • so wouldn’t things be better if we put as much effort into understanding each other as persuading each other (hah, what an idealist!) so we will make more progress in discussions if we focus on the underlying arguments so we need to talk in a balanced way about the advantages and disadvantages of RDF or, in a more realistic frame of mind:
  • so it’s just not going to happen for HTML5
  • so why not just stop arguing and use the spare time and energy doing?
  • so why not demonstrate RDF’s power in real-world applications?

My own opinion is that I don’t care that RDFa is not integrated into HTML5. In fact, I don’t think RDFa belongs in HTML5. I think a separate document detailing how to integrate RDFa into HTML5, as happened with XHTML, is the better approach.

Having said that, I do not believe that Microdata belongs in the HTML5 document, either. The HTML5 document is already problematical, bloated, and overly complex. It encompasses too much, a fault of the charter, as much as anything else. Removing the entire Microdata section would help, as well as several other changes, but we’ll focus on the Microdata section for the moment.

The problem with the Microdata section is that it is a competing semantic web approach to RDFa. Unlike competition in the marketplace, competition in standards will actually slow down adoption of the standards, as people take a sit-back and see what happens, approach. Now, when we’re finally are seeing RDFa incorporated into Google, into a large CMS like Drupal 7, and other uses, now is not the time to send a message to people that “Oops, the W3C really doesn’t know what the fuck it wants. Better wait until it gets its act together. ” Because that is the message being sent.

“RDFa and Microdata” is not the same as “RDFa and Microformats”. RDFa, or I should say, RDF, has co-existed peacefully with microformats for years because the two are really complementary, not competitive, specifications. Both can be used at a site. Because Microformat development is centralized, it will never have the extensibility that RDF/RDFa provides, and the number of vocabularies will always, by necessity, be limited. Microformats, on the other hand, are easier to use than RDFa, though parsing Microdata is another thing. They both have their strengths and weaknesses. Regardless, there’s no harm to using both, and no confusion, either. Microformats are managed by one organization, RDFa by the W3C.

Microdata, though, is meant to be used in place of RDFa. But Microdata is not implemented in any production capable tool, has not been thoroughly checked out or tested, has not had any real-world implementation that I know of, has no support from any browser or vendor, and isn’t even particularly liked by the HTML WG membership, including the author. It provides only a subset of the functionality that RDFa provides, and necessitates the introduction of several predefined vocabularies, all of which could, and most likely will, end up out of sync with the organizations responsible for the extra-HTML5 vocabulary specification. And let’s not forget that Microdata makes use of the reversed DNS identifier that sprang up, like a plague of locusts, in HTML5, based on the seeming assumption that people will find the following:


Easier to understand and use then the following:

Which, heaven knows, is not something any of us are familiar with these last 15-20 years.

RDFa and HTML5/Microdata, otherwise known as Issue 76 in the HTML 5 Tracker database. I understand where Jeni is coming from when she writes about finding a common ground. Finding common ground, though, presupposes that all participants come to the party on equal footing. That both sides will need to listen, to compromise, to give a little, to get a little. This doesn’t exist with the HTML5 effort.

Where the RDFa in XHTML specification was a group effort, Microdata is the product of one person’s imagination. One single person. However, that one single person has complete authorship control over the HTML 5 document, and so what he wants is what gets added: not what reflects common usage, not what reflects the W3C guidelines, and certainly not what exists in the world, today.

While this uneven footing exists, I can’t see how we can find common ground. So then we look at Jeni’s next set of suggestions, which basically boil down to: because of the HTML WG charter, nothing is going to happen with HTML5, so perhaps we should stop beating our heads against the wall, and focus, instead, on just using RDFa, and to hell with HTML5 and microdata.

Bang! Bang!

I am very close to this. I had started my book on the issues I have with HTML5, and how I would change the specification, but after a while, a person gets tired of being shut out or shut down. I’m less interested in continuing to “bang my head against the wall”, as Jeni so eloquently put it.

But then I get an email this week, addressed to several folks, asking about the introduction of Microdata: so what does the W3C recommend, then? What should people use? Where should they focus their time?

Confusion. Confusion because the HTML5 specification is being drafted specifically to counter several initiatives that the W3C has been nurturing over the last decade: Microdata over RDF/RDFa; HTML over XHTML; Reverse DNS identifiers over namespaces, and URIs; the elimination of non-visual cues, not only for metadata, but also for the visually challenged. And respect. There is no respect for the W3C among many in the HTML Working Group. And I know I lose more respect for the organization the closer we get to HTML5 Last Call.

In fact, HTML Working Group is a bit of a misnomer. We don’t have HTML anymore, we have a Web OS.

We don’t have a simple HTML document, we have a document that contains the DOM, garbage collection, the Canvas object and a 2D API, a definition for web browser objects, interactive elements, drag and drop, cross-document communication, channel messaging, Microdata, several pre-defined vocabularies, probably more JavaScript than the ECMAScript standard, and before they were split off, client-side SQL, web worker threads, and storage. I’m sure there’s a partridge in a pear tree somewhere in there, but I still haven’t made it completely through the document. It’s probably in Section 10. I know there’s talk of extending to the document to include a 3D API, and who knows what else.

There’s a lot of stuff in HTML5. What isn’t in the HTML5 document is a clean, straightforward description of the HTML or XHTML syntax, and a clearly defined path for people to move to HTML5 from other specifications, as well as a way of being able to cleanly extend the specification—something that has been the cornerstone of both HTML and XHTML in the past. There’s no room for the syntax, in HTML5. It got shoved down by Microdata and the 2D API. There’s no room for the past, the old concepts of deprecated and obsolete have been replaced by such clear terms as “Conforming but obsolete”. And there’s certainly no room for future extensibility. After all, there’s always HTML6, and HTML7, …, HTMLN—all based on the same open, encompassing attitude that has guided HTML5 to where it is today.

If we don’t like what we see, we do have options. We can create our own HTML5 documents, and submit “spec text” for a vote. But what if it’s the whole document that needs work? That many of the pieces are good, but don’t belong in the parent document, or even in the HTML WG?

The DOM should be split out into its own section and should take all of the DOM/interactive, and browser object stuff with it. The document should be re-focused on HTML, without this mash-up of HTML syntax, scripting references, and API calls that exists now. The XHTML section should be fleshed out and separated out into its own section, too, if no other reason to perhaps reassure people that no, XHTML is not going away. We should also be reminded that XHTML is not just for browsers—in fact, the eBook industry is dependent on XHTML. And it doesn’t need Canvas, browser objects, or drag and drop.

Canvas should also be split out, to a completely separate group whose interest is graphics, not markup. As for Microdata, at this point, I don’t care if Microdata is continued or not, but it has no place in HTML5. If it’s good, split it out, and let it prove itself against RDFa, directly.

The document needs cleaning up. There are dangling and orphaned references to objects from Web Workers and Storage still littering the specification. It hops around between HTML syntax and API call, with nothing providing any clarity as to the rhyme or reason for such jumping about. Sure there’s a lot of good stuff in the document, but it needs organization, clean up, and a good healthy dose of fresh air, and even a fresher perspective.

Accessibility shouldn’t be added begrudgingly, woodenly, resentfully. It should be integrated into the HTML, not just pasted on in order to quiet folks because LC is coming up.

The concepts of deprecated and obsolete should be returned, to ensure a sense of continuity with HTML 4. And no, these did not originate with HTML. In fact, the use of deprecated and obsolete have been fairly common with many different technologies. I can guarantee nothing but the HTML5 document has a term like “conforming but obsolete”. I know, I searched high and low in Google for it.

And we need extensibility, and no, I don’t mean Microdata and reverse DNS identifiers. If extensibility was part of the system, folks who want to use RDFa could use RDFa, and not have to beg, hat in hand, to be allowed to sit at the HTML 5 table. This endless debate wouldn’t be happening, and everyone could win. Extensibility is good that way. Extensibility has brought us RDFa, SVG, MathML, and, in past specifications, will continue to bring whatever the future may bring.

whatever the future may bring…

Finding common ground? Walk a mile in each other’s moccasins? Meet mano a mano? Provide alternative specification text?

Bang! Bang!

Jeni’s a pretty smart lady.


Arbitrary Vocabularies and Other Crufty Stuff

I went dumpster diving into the microformats IRC channel and found the following:

singpolyma – Hixie: that’s the whole point… if you don’t have a defined vocabulary, you end up with something useless like RDF or XML, etc
@tantek – exactly
Hixie – folks who have driven the design of XML and RDF had “write a generic parser” as their priority
@tantek – The key piece of wisdom here is that defined vocabularies are actually where you get *user* value in the real world of data generated/created by humans, and consumed eventually by humans.
Hixie – i’m not talking about this being a priority though — in the case of the guy i mentioned earlier, it was like or
Hixie – but it was still a reason he was displeased with microformats
@tantek – Hixie – ironically, people have written more than one generic parser for microformats, despite that not being a priority in the design
Hixie – url?
@tantek – mofo, optimus
@tantek –
@tantek – not exactly hard to find
@tantek – it’s ok that writing a generic parser is hard, because not many people have to write one
Hixie – optimus requires updating every time you want to use a new vocabulary, though, right
@tantek – OTOH it is NOT ok to make writing / marking up content hard, because nearly far more people (perhaps 100k x more) have to write / mark up content.
Hixie – yes, writing content should be easy, that’s clear
Hixie – ideally it should be even easier than it is with microformats 🙂
singpolyma – Of course you have to update every time there’s a new vocabulary… microformats are *exclusively* vocabularies
Hixie – there seems to be a lot of demand for a technology that’s as easy to write as microformats (or even easier), but which lets people write tools that consume arbitrary vocabularies much more easily than is possible with text/html / POSH / Microformats today
singpolyma – Hixie: isn’t that what RDFa and the other cruft is about?
Hixie – RDFa is a disaster insofar as “easy to write as microformats” goes
singpolyma – Not that I agree arbitrary vocabularies can be used for anything…
Hixie – and it’s not particularly great to parse either

Hixie – is it ok if html5 addresses some of the use cases that _are_ asking for those things, in a way that reuses the vocabularies developed by Microformats?

Well, no one is surprised to see such a discussion about RDFa in relation to HTML5. I don’t think anyone seriously believed that RDFa had a chance of being incorporated into HTML5. Most of us have resigned ourselves to no longer support the concept of “valid” markup, as we go forward. Instead, we’ll continue to use bits of HTML5, and bits of XHTML 1.0, RDFa, and so on.

But I am surprised to read a data person write something like, if you don’t have a defined vocabulary, you end up with something useless like RDF or XML. I’m surprised because one can add SQL to the list of useless things you end up with if you don’t have defined vocabularies, and I don’t think anyone disputes the usefulness of SQL or the relational data model. A model specifically defined to allow arbitrary vocabularies.

As for XML, my own experiences with formatting for eBooks has shown how universally useful XML and XHTML can be, as I am able to produce book pages from web pages, with only some specialized formatting. And we don’t have to form committees and get buy off every time we create a new use for XML or XHTML; the same as we don’t have to get some standards organization to give an official okee dokee to another CMS database, such as the databases underlying Drupal or WordPress.

And this openness applies to programming languages, too. There have been system-specific programming languages in the past, but the widely used programming languages are ones that can be used to create any number of arbitrary applications. PHP can be used for Drupal, yes, but it can also be used for Gallery, and eCommerce, and who knows what else—there’s no limiting its use.

Heck HTML has been used to create web pages for weblogs, online stores, and gaming, all without having to redefine a new “vocabulary” of markup for each. Come to think of it, Drupal modules and WordPress plug-ins, and widgets and browsers extensions are all based on some form of open infrastructure. So is REST and all of the other web service technologies.

In fact, one can go so far as to say that the entire computing infrastructure, including the internet, is based on open systems allowing arbitrary uses, whether the uses are a new vocabulary, or a new application, or both.

Unfortunately, too many people who really don’t know data are making too many decisions about how data will be represented in the web of the future. Luckily for us, browser developers have gotten into the habit of more or less ignoring anything unknown that’s inserted into a web page, especially one in XHTML. So the web will continue to be open, and extensible. And we, the makers of the next generation of the web can continue our innovations, uninhibited by those who want to fence our space in.


Pinky and the Markup Brains

What ended up being the ultimate irritation of my brief foray into HTML5 land, is that I found out, after careful perusal of my original use of RDFa, that I wasn’t using it incorrectly. However, by the time I got through listening to all the arguments, back and forth, and round and round, I was beginning to doubt whether an angle bracket really looked like < and >. I am correct, aren’t I? These are angle brackets, right?

Of course not. I call them angle brackets, but others call them diamond brackets, and I’m sure someone else, most likely from the UK, calls them elbow brackets or the Queen’s brackets, or some such thing.

However, the back and forth, and round and round, wouldn’t be an issue, could even be a journey of discovery, if it weren’t for the arrogance of some of the participants. Or, what I perceive to be arrogance. Variations of, “But that’s wrong and here’s why”, followed up with references to other specifications that hurt, actually physically hurt just to look at, given in a tone of, “How could you think otherwise?” Or responses based on some absolutely obscene piece of markup minutia, repeated over and over again, in attempts to hammer the point home to we, the seemingly dense as bricks.

The end product of such discussions, though, is that people like myself flee the discussion—literally flee, as if the hounds of hell were chomping at our butts. The downside of running away, though, is we’re left feeling that we have no input, no control over what the web of the future will, or will not, allow. That the web of the future of the web is designed by and for the web designers, and not thee and me.

The real problem, though, has less to do with communication style, and more to do with differing levels of expertise and interest. People like me, who are consumers of specs, are mixed in with people who create the parsers and the browsers, and live and breath, eat and sleep this stuff. What else can we, the consumers, do, though? There seemingly is no way for those of us, on the dumb side of markup, to communicate our concerns, wishes, and desires to the other side. But when we do venture into the lists, we are quickly overwhelmed with the specs, the references, the minutia. Our interests get lost in the fact that we don’t have the language to participate. Worse, we don’t have the language to participate in a field notorious for being both competitive, and impatient.

Unbeknownst to ourselves, we have become Pinky to the markup Brains.

So we consumers flee the lists and leave them to the developers and designers, and the end result is that we have specifications, and eventually implementations, that, well, frankly, scare the shit out of most of us.

Don’t believe me? How else could you explain the Yellow Screen of Death that appears whenever you make a simple mistake in markup for the post you’re writing? Not a helpful error, or an error that gently points out where and why the problem occurs; an error that tries to work with you to correct the problem.

No, it is an ugly error, an angry error, with red on yellow, that screams, “Bad, Shelley! Bad”, before it invariably trails off to uselessness on the right side of the browser. You don’t think an actual person like you and me would have designed a specification that encourages this behavior, or a browser that implements it, do you?

The true irony, though, is when you do voice concerns, or criticism, you’re typically met with, “If you want something, you need to participate in the email lists working on the specifications”, and the cycle begins anew. Narf.