Media Specs

Notes from writing HTML5 Media

Recovered from the Wayback Machine.

This last weekend I finished my latest book for O’Reilly: HTML5 Media. This is one of O’Reilly’s shorter books (about 100 pages), aimed primarily at the eBook market, though you can get a hard copy with print-on-demand.

The book focuses on the HTML5 audio and video elements. I cover how to use the elements in a web page and go into detail on the attributes for each element, and I also cover video and audio codec support. I devote a couple of chapters to developing with both elements, including how to create a custom control, as well as integrating the media elements with the canvas element and SVG.

In one chapter, I touch on the newest media element API functionality including the brand new and unimplemented media controllers and support for multiple audio, video, and text tracks. Though no browser currently provides support for captions/subtitles, I also explain how to use JavaScript libraries and SRT or WebVTT files to add captions and subtitles to videos.

I enjoyed working on this book. I enjoyed working with the media elements, though I’m more partial to the video element. Working on the book was also a learning experience and even, at times, an eyebrow-raising one. I thought I would share with you all some of the notes I wrote while working on the book.

WebVTT versus TTML

The WHATWG group started working on a subtitle/caption format based on the SRT (SubRip) text format. The original name was WebSRT, but it was recently renamed to WebVTT. The LeanBack Player web site provides a good review of WebVTT.

WebVTT is a pretty basic format, consisting of line numbers, timelines, and text with formatting options. There are plans to add additional capabilities, but what we have now should meet most needs.
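To make the format concrete, here is a minimal sketch of reading a WebVTT file: each cue is an optional identifier (the "line numbers" above), a timing line, and the cue text. The sample file and the function names are invented for illustration, and the parser ignores cue settings, comments, and most of the finer points of the draft specification.

```python
import re

# Hypothetical sample: a WEBVTT header followed by two cues. Each cue has an
# optional identifier line, a timing line, and one or more lines of text.
SAMPLE = """WEBVTT

1
00:00:01.000 --> 00:00:04.000
Hello, <b>world</b>.

2
00:00:05.500 --> 00:00:08.250
A second cue
spanning two lines.
"""

# Timestamps are hh:mm:ss.ttt, or mm:ss.ttt when the hours are left out.
TIMING = re.compile(
    r"(?:(\d+):)?(\d{2}):(\d{2})\.(\d{3})\s+-->\s+"
    r"(?:(\d+):)?(\d{2}):(\d{2})\.(\d{3})"
)


def to_seconds(hours, minutes, seconds, millis):
    """Convert the captured timestamp parts to seconds."""
    return int(hours or 0) * 3600 + int(minutes) * 60 + int(seconds) + int(millis) / 1000


def parse_cues(text):
    """Return a list of cues as dicts with id, start, end, and text."""
    cues = []
    # Cues are separated from each other by blank lines.
    for block in re.split(r"\n\s*\n", text.strip()):
        lines = block.splitlines()
        for i, line in enumerate(lines):
            match = TIMING.match(line)
            if match:
                parts = match.groups()
                cues.append({
                    "id": lines[0] if i == 1 else None,
                    "start": to_seconds(*parts[:4]),
                    "end": to_seconds(*parts[4:]),
                    "text": "\n".join(lines[i + 1:]),
                })
                break  # blocks without a timing line (the header) are skipped
    return cues


for cue in parse_cues(SAMPLE):
    print(cue["id"], cue["start"], cue["end"], repr(cue["text"]))
```

The inline `<b>` tag is left in the cue text untouched; rendering it is up to whatever displays the captions.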

There’s been interest in bringing WebVTT over to the W3C. However, the W3C already has a timed text specification, TTML. TTML is an XML based format that is more sophisticated than WebVTT, but also more complicated to use.

I covered WebVTT in the book in detail, but only briefly mentioned TTML. The reason I didn’t spend time with TTML is because of existing support and the industry movement away from XML.

None of the various JavaScript libraries I tested that provided some caption/subtitle support worked with TTML. They worked with SRT or WebVTT, but none that I tried worked with TTML.

Additionally, TTML is an XML format. Now, XML might have been the approach to take a half dozen years ago, when most everything at the W3C was heading in an XML direction. In the last several years, though, we’ve seen the popularity of the RDF/XML serialization fade in favor of Turtle or RDFa, and XHTML2 abandoned in favor of HTML5. SVG is still holding on, but now there’s rumblings of an API that will generate SVG or canvas API calls, and basically hide most of the XMLness of SVG from view. I vaguely remember reading something somewhere that the folks working on TTML were even thinking of creating a JSON version of the spec.

Whether intended or by accident, there is a subtle but noticeable shift away from XML in the W3C. At the same time, there is a strong core of support for XML formats in the W3C. Between both seemingly contradictory paths, I’m thinking we should just skip the interim pain and anguish of yet another format war, and go right to the end point. So I covered SRT and WebVTT and only mentioned TTML in passing.

Protecting the Users from the Big Bad Web Developers

I like HTML5 video and audio, I really do. I had a great deal of fun writing this book. However, despite my affection for these elements I must also admit to some irritation with their design and implementation. (Well, other than the fact that an entire block of the specification changed mysteriously one night, requiring a sudden and unexpected re-write in one of my chapters.)

The part about the HTML5 media elements I like the least is the seeming level of distrust directed at web page authors and developers.

For instance, if you’re creating a custom control and remove the controls attribute, you may think you then have complete control over the media playback. You don’t, though—at least, not in most browsers.

In the section of the HTML5 spec related to the media element’s user interface, implementors are advised to provide playback control in some manner regardless of whether the controls attribute is present or not:

Even when the attribute is absent, however, user agents may provide controls to affect playback of the media resource (e.g. play, pause, seeking, and volume controls), but such features should not interfere with the page’s normal rendering. For example, such features could be exposed in the media element’s context menu.

If you right mouse click on a video element in Firefox, you’re given the options to play or pause the video, mute the volume, play the video in fullscreen, show or hide the controls, as well as save the video or play the video by itself in another page. Chrome provides options to play, pause, or mute the video, as well as show or hide the controls, open the video in another tab, or save the video. Opera’s context menu options are similar to Chrome’s, minus the option to open the video in a new tab. IE10 provides play, pause, mute options, the ability to save the video, and the ability to control playback speed. Safari is the only browser that doesn’t provide context menu options to control the video. At least, not yet.

There is absolutely no way to directly control what does or does not display in the context menu that the browser provides. There is no way to control some of the actions that people can take in the context menu, such as preventing the fullscreen display of the video, if you don’t want it played fullscreen.

If you’re providing custom controls for the video, you have to account for the fact that the video playback is being managed by the context menu as well as your controls. One of my examples in the book provides a video playback control that consists of separate buttons for play, pause, and stop. These controls are disabled based on what action the user takes. It seems like a simple act to just disable and enable the appropriate buttons at the same time you play or pause the video, but you actually have to capture two sets of events: the click events from the buttons, and the play and pause events from the video.

Of course, the amount of extra code to do something like enable and disable buttons based on playback is trivial. But what isn’t trivial is controlling which options are made available to the user. If, for whatever reason, you don’t want the video to be played fullscreen, there is absolutely no way to prevent this from happening with Firefox.

The only way to prevent the context menu from displaying for the video is to provide a transparent div overlay for the video, so that the context menu reflects the div element, not the video. That or turn the video element’s display off, and play the video by redrawing it into a canvas element—a case of overkill, just to be able to control video playback.

The conflict between the context menu and customization isn’t the only web developer/author restriction.

There are the times when the web page author wants the audio or video to begin automatically when the page loads. The media elements do provide attributes for this: autoplay and loop. To ensure automatic playback, the author removes the controls attribute, adds autoplay and possibly loop, and when the page loads, the media element begins playing. The web page author can also remove the audio element completely from display so all that’s left is the sound. The video element is, of course, left displayed, but the control UI should not be showing. We can’t control the context menu options, but at least the control UI isn’t displaying. Well, not unless scripting is disabled, that is.

If the user has scripting disabled, the control UI is automatically re-displayed with the media element—even if you don’t want it to be displayed. If scripting is disabled, you cannot control the visibility of the control UI. According to the HTML5 specification:

If the attribute is present, or if scripting is disabled for the media element, then the user agent should expose a user interface to the user.

I’ve been told by members of the HTML WG that “should” in this context is equivalent to “must”. The two terms are not the same, but I gather that they become one in HTML5 land.

Currently, only Opera provides a visual control UI when scripting is disabled. Firefox doesn’t display a visual control UI when scripting is disabled regardless of whether the controls attribute is present or not. Safari, Chrome, and IE currently do not display the control UI. If “should” is equivalent to “must”, then Firefox, Safari, Chrome, and IE are all in error in their handling of disabled scripting and the media elements. I imagine bugs will be filed, if they haven’t already been filed, and these browsers will also automatically add the control UI when scripting is disabled.

I hear people cheering. You’re cheering, aren’t you? You’re all insanely happy with the power given the end user with the HTML5 video and audio elements.

Most of us remember those times when we opened a web page and some horrid music was blaring, or a video automatically played with some idiot in a suit talking about his constipation. If we’re at work, we keep our machines permanently muted, lest something embarrassing blare out at an inopportune moment. If screen readers are not sophisticated enough to automatically lower background sound when a page is opened, the background sound competes badly with the reader.

Automatic audio, bad. Automatic video, bad.

I also imagine most of you have forgotten your visits to sites where you expected music or a video to play, and how much you enjoyed a well crafted multimedia experience.

Consider sites devoted to movies. Currently the last of the Harry Potter movies, the latest Transformer, and the new Spielberg Super 8 are playing in movie theaters. All three movies have their own web sites. If you open all three movie sites, you’ll find extensive use of both audio and video media.

The Harry Potter site opens with a preview of the movie with its own custom control that automatically starts playing as soon as it is sufficiently loaded. Among the options not provided with this video are the ability to open the movie out of context of the frame, such as opening the video in fullscreen. At most you can start or stop the video, choose a different video format, or skip the video and go to the site offerings.

The site offerings page has audio playing in the background. In addition, the bottom of the page features an animated video of owls. You can do nothing to stop either.

The Transformer movie site also provides background sound, as well as a video that begins to play automatically and loops continuously in the splash page. When you enter the site, another video plays continuously in the background of the page. Again, sound is used. There is very little about the site that is text-based: it’s all eye and ear candy.

The Super 8 movie site provides an automatically playing trailer with its own control. The page also has background audio when the trailer is finished. One of the sections of the site is the Editing Room. This page features a video playing automatically in an old 8mm style. Once this video is finished, rows of film are displayed. You can click a control that opens another video, again in super 8 style, providing a back story for the movie. You’re provided with controls to play the video and mute the sound.

None of these movie sites provide a context menu for their videos, other than what you would expect to see with a Flash movie. None of the sites allowed you to open the videos in fullscreen or play in a separate tab, because the videos are part of an integrated whole. The sites don’t allow you to switch off audio that I can see. I realize that automatically playing audio can be irritating for some, and can play havoc with screen readers, but again, none of this is unexpected for a movie site.

These types of sites will never be created using HTML5, not because HTML5 isn’t capable of creating most of the effects, but because HTML5 deliberately circumvents finer control over the video element. Can you imagine what would happen with the Transformer site with scripting disabled? The browser would then automatically plunk the control UI over the video in the page, which would ruin the overall effect the page creator was trying to make.

The unfortunate consequence of making HTML5 video and audio unattractive for these sites is that once they start using Flash for one component of the site, they continue to use Flash for every component of the sites. If you open these pages and use a screen reader such as NVDA, the only sound you’ll get is the background audio because every last bit of the site is in Flash: the text, the menus, all of it.

We want these sites to consider using HTML5 instead of just Flash, because if they do, the sites will end up being more accessible rather than less. Yes, even if the HTML5 media elements don’t have a control UI, and audio and video are played automatically. If we want to convince people to use something other than Flash, we need to ensure they have the same level of control that they had with Flash. Currently, the HTML5 video and audio elements do not provide this level of control.

HTML Media and Security

During the recent brouhaha related to WebGL security, the HTML5 editor, Ian Hickson, discovered that the video element, as it was currently defined, would not allow the cross-domain access that the img element provides. In other words, if the video you linked in with the src attribute was not from the same domain as your web page, the video wouldn’t play. This restriction was lifted, and the video (and track) resources are now treated the same as image resources.

However, one of the safety features related to cross-domain resource access was the concept of canvas tainting. If the image or video drawn into a canvas element is from another domain, the canvas is marked as tainted (the origin-clean flag is set to false). When the canvas is tainted, the toDataURL, getImageData, and measureText methods generate a security exception. You couldn’t circumvent the same-origin restriction by using Ajax, either, because it would not allow cross-domain resource access.
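The tainting decision comes down to an origin comparison: scheme, host, and port must all match. The following is a rough sketch of that comparison only, not of any browser's implementation; the function names and example URLs are made up, it expects absolute URLs, and it ignores any CORS opt-in.

```python
from urllib.parse import urlsplit

# Ports assumed when a URL does not name one explicitly.
DEFAULT_PORTS = {"http": 80, "https": 443}


def origin(url):
    """Return the (scheme, host, port) triple that identifies an origin."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()
    port = parts.port or DEFAULT_PORTS.get(scheme)
    return (scheme, host, port)


def taints_canvas(page_url, media_url):
    """True if drawing media_url into a canvas on page_url would taint it."""
    return origin(page_url) != origin(media_url)


# Same host, default port spelled out: same origin, canvas stays clean.
print(taints_canvas("http://example.com/page.html", "http://example.com:80/clip.webm"))
# Different host: the canvas is tainted.
print(taints_canvas("http://example.com/page.html", "http://media.example.com/clip.webm"))
```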

Of course, much of this has changed because of the WebGL security issues. Originally WebGL was limited to using only same-origin image access for canvas textures, but a more recent version of the specification allowed for cross-domain image access. WebGL developers wanted to add images (and potentially video) from other domains as textures for their 3D creations. Unfortunately, when the WebGL specification and implementations enabled cross-domain image access, they also opened up a security violation: the WebGL could be manipulated in such a way as to create a “data leak”, giving the web pages access to actual image (and video) data.

In order to allow WebGL to proceed without having to tackle the functionality causing the data leak (I’m told a daunting task), the WebGL community requested and received a new attribute that can be added to the img, audio, and video elements in HTML5: crossorigin. This attribute allows same-origin privileges with cross-domain resources, as long as the resource server concurs with this use. This is a concept known as Cross-Origin Resource Sharing, or CORS.

CORS is another specification in work at the W3C. It originated as a way for web developers to access cross-domain resources using XMLHttpRequest (Ajax). The concept has since been expanded to include workarounds for the same-origin security restrictions in other uses, including the newest related to canvas tainting.
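How the resource server "concurs" is, at its simplest, a response header: Access-Control-Allow-Origin names the requesting origin or uses the wildcard `*`. Here is a simplified sketch of that check for an anonymous (no credentials) request. The function name and the example origins are hypothetical, and a real user agent does considerably more than this.

```python
def cors_allows(page_origin, response_headers):
    """Simplified check: does the server's response permit this origin?

    Assumes an anonymous request; with credentials the wildcard is not
    accepted, and that case is not modeled here.
    """
    # Header names are case-insensitive, so normalize them first.
    headers = {name.lower(): value.strip() for name, value in response_headers.items()}
    allowed = headers.get("access-control-allow-origin")
    return allowed in ("*", page_origin)


# A server that opts in for everyone, one that opts in for a single origin,
# and one that sends no CORS header at all.
print(cors_allows("http://example.com", {"Access-Control-Allow-Origin": "*"}))
print(cors_allows("http://example.com", {"access-control-allow-origin": "http://example.com"}))
print(cors_allows("http://example.com", {"Content-Type": "video/webm"}))
```

The last case is the one that matters for the services discussed next: if the server sends nothing, the cross-domain resource stays off limits.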

It sounds all peaches and cream except that there are issues related to the concept, especially when accessing image and video data from cloud services such as Amazon’s AWS or centralized image systems, such as Flickr. For CORS and the crossorigin attribute to work, these services must be willing to support CORS. The WebGL and other developers assumed the sites would be more than willing to do so. However, I know that Amazon has already expressed reservations about supporting CORS, and I wouldn’t be surprised if there were some reluctance on the part of other services.

I also had reservations about the breathlessly quick addition of crossorigin to HTML5, starting with the unanswered question, “What would WebGL have done if HTML5 was too far along in the recommendation track to add this change?” I still have concerns about quickly adding in functionality that routes around security protocols because another specification needs to have this functionality because of a security violation. I’ve long been a fan of 3D efforts on the web, beginning with the earlier VRML and continuing with my interest in WebGL (I covered it in my Painting the Web book). However, I’m even more of a fan of web security. That and a stable specification. What would have happened if WebGL had made this request after HTML5 had progressed to candidate recommendation status?

Yes, I am a stick in the mud. I like stable specifications and secure web pages. I’m just old fashioned that way.

Anyway, for those wanting to integrate the HTML5 video and canvas elements, be aware of this very new functionality. You won’t find it included in the HTML5 Last Call document; you’ll only find it in the HTML5 editor’s draft.

Codec Support

You would expect to find tables with audio and video browser container/codec support littering the internet, and you do. The only problem is, none of the tables seem to agree.

Trying to determine exactly what container/codec each browser supports is actually a pain in the butt. I’m sure each and every browser has a page somewhere that explicitly lists what it supports in all possible environments. Wherever these pages are, though, must be one of the better kept web secrets.

It’s not as if there’s a simple yes/no answer to audio or video codec. After all, if you use the HTMLMediaElement’s canPlayType method with various audio or video codecs, you’ll either get a “maybe”, “probably”, or an empty string. Maybe and probably are not normally viewed as decisive words. It also doesn’t help when Chrome answers either maybe or probably to everything.
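Given those three possible answers, about the best a page can do is rank them. The sketch below shows that ranking logic in isolation: the answers are passed in as plain strings standing in for what canPlayType would return in a browser, and the function name and file names are hypothetical.

```python
# The three possible answers, ranked from most to least encouraging.
RANK = {"probably": 2, "maybe": 1, "": 0}


def pick_source(answers):
    """Pick the source with the most confident answer.

    answers is a list of (url, answer) pairs in the author's preferred
    order; earlier entries win ties. Returns None if nothing is playable.
    """
    best, best_rank = None, 0
    for url, answer in answers:
        rank = RANK.get(answer, 0)
        if rank > best_rank:
            best, best_rank = url, rank
    return best


print(pick_source([("clip.mp4", ""), ("clip.ogv", "maybe"), ("clip.webm", "probably")]))
```

A browser that says "maybe" to everything makes the ranking useless, which is rather the point of the complaint above.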

Then there are the quirks.

Firefox and Chrome only like uncompressed WAV files. Opera and Safari don’t seem to mind compressed WAV files. Technically, though, all four browsers “support” WAV.

Both these statements are true: only Safari supports AAC; Safari, Chrome, and IE support AAC.

If you use a tool such as the Free MP3/Wma/Ogg Converter, you’re given an option to convert your sound file to several different formats, including AAC and M4A. Many people will tell you AAC and M4A are one and the same. Well, yes and no.

The AAC option creates an AAC file that is packaged in a streaming format called Audio Data Transport System (ADTS). The M4A option is an AAC file that’s packaged in MPEG-4. Since Safari can play whatever QuickTime can play on a system, and QuickTime can play the ADTS AAC file, the AAC file only plays in Safari. Chrome and IE can also play the AAC file, but only if it’s wrapped in the MPEG-4 container, which Safari also supports.
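The two packagings can be told apart from the first few bytes of a file. An ADTS stream starts with a 12-bit sync word of all ones followed by a zero layer field, while an MPEG-4 file normally starts with an `ftyp` box whose type sits at bytes 4 through 7. The following is a rough sniffing sketch with a made-up function name; it does not handle files that begin with an ID3 tag or other leading data.

```python
def sniff_aac_packaging(data):
    """Guess how AAC audio is packaged from the first bytes of a file."""
    # MPEG-4: a 4-byte box size followed by the box type "ftyp".
    if len(data) >= 8 and data[4:8] == b"ftyp":
        return "mp4"
    # ADTS: 12 sync bits set (0xFFF), then a version bit, then a 2-bit
    # layer field that is always 00. The mask 0xF6 ignores the version
    # bit and the final protection bit.
    if len(data) >= 2 and data[0] == 0xFF and (data[1] & 0xF6) == 0xF0:
        return "adts"
    return "unknown"


print(sniff_aac_packaging(b"\x00\x00\x00\x20ftypM4A "))            # an .m4a style header
print(sniff_aac_packaging(bytes([0xFF, 0xF1, 0x50, 0x80])))        # an ADTS style header
```

In practice you would read the first eight bytes of the file in binary mode and pass them in.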

But wait…there’s more!

No, no. I’m just joshing you.

Well, there really is more but I don’t want to be cruel.

The confusion about support is further exacerbated by the politics surrounding container/codec support. Yes, Chrome supports MP4. No, Chrome does not support MP4. Yes, Ogg is the open source community’s fair haired child. No, WebM is the open source community’s fair haired child … they just don’t know it yet. Speaking of WebM, yes, WebM is a video container/codec, but it’s also an audio container/codec—just leave out the video track.

Remember when everything was going to be Ogg and life was simpler?

Anyway, to add to the audio/video container/codec noise on the internet, here are my own versions of browser/codec support for the HTML5 audio and video elements.

Are they accurate? Sure. Why not.

What day is it?

Popular HTML5 audio container/codec support by browser

Container/Codec   IE    Firefox   Chrome   Safari   Opera
WAV (PCM)         No    *Yes      *Yes     Yes      Yes
MP3               Yes   No        Yes      Yes      No
Ogg Vorbis        No    Yes       Yes      No       Yes
MPEG-4 AAC        Yes   No        Yes      Yes      No
WebM Vorbis       No    Yes       Yes      No       Yes

*Make darn sure the WAV file is uncompressed

Popular HTML5 video container/codec support by browser

Container/Codecs    IE    Firefox   Chrome   Safari   Opera
MP4+H.264+AAC       Yes   No        *No      Yes      No
Ogg+Theora+Vorbis   No    Yes       Yes      No       Yes
WebM+VP8+Vorbis     No    Yes       Yes      No       Yes

*Google has announced that Chrome will not support H.264. However, there are faint traces of support (ghosts, if you will) still left in Chrome.

Official HTML5 Video Mascot

The official HTML5 video mascot is ….

Big Buck Bunny!


The W3C HTML WG decision on RDFa prefixes

Recovered from the Wayback Machine.

One HTML WG decision I agree with is the one associated with Issue 120 on RDFa prefixes.

Considering that RDFa support in XHTML/HTML to this point has made use of prefixes, I don’t understand why we even contemplated not supporting prefixes just because RDFa is being ported to HTML5. Frankly, it’s not the HTML5 WG’s design decision to make—RDFa in HTML5 is a port, the design for RDFa resides with another group.

As for RDFa prefixes being confusing, one of the most fundamental design patterns, in computer tech and elsewhere, is the concept of variable/value pairs, with a shorter, easy to type and remember variable or abbreviation used in place of a longer, more complex value.
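The prefix mechanism really is just that lookup: a short name bound to a long URI, expanded when the data is read. Here is a minimal sketch of the expansion, with a made-up function name; the two prefix bindings are the commonly used FOAF and Dublin Core namespaces, and real RDFa processing has more rules than this.

```python
# Prefix bindings: the short, easy-to-type name maps to the full URI.
PREFIXES = {
    "foaf": "http://xmlns.com/foaf/0.1/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def expand(curie, prefixes):
    """Expand prefix:reference to a full URI; leave anything else alone."""
    prefix, sep, reference = curie.partition(":")
    if not sep or prefix not in prefixes:
        return curie
    return prefixes[prefix] + reference


print(expand("foaf:name", PREFIXES))
print(expand("dc:title", PREFIXES))
```

Anything without a known prefix, including a full URI, passes through unchanged.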

Then there’s the fact that RDFa has significant adoption, and dropping support for prefixes will break the web. I’ve heard that this is an important criterion for other HTML5 design decisions. If nothing else, consistency demands we support prefixes.

I could go on, but the proposal to keep prefixes does a commendable job and I don’t need to repeat its arguments.


Maxwell’s Silver Hammer: RDFa and HTML5’s Microdata

Being a Beatles fan, I must admit to being intrigued about the new Beatles box set that will be available in September. I have several Beatles albums, but not all. None of the CDs I own have been re-mastered or re-mixed, including one of my favorite songs, from Abbey Road: Maxwell’s Silver Hammer:

Joan was quizzical; Studied pataphysical
Science in the home.
Late nights all alone with a test tube.
Oh, oh, oh, oh.

Maxwell Edison, majoring in medicine,
Calls her on the phone.
"Can I take you out to the pictures,
Joa, oa, oa, oan?"

But as she's getting ready to go,
A knock comes on the door.

Bang! Bang! Maxwell's silver hammer
Came down upon her head.
Bang! Bang! Maxwell's silver hammer
Made sure that she was dead.

I love the chorus, Bang! Bang! Maxwell’s silver hammer came down upon her head…

Speaking of Bang! Bang! Jeni Tennison returned from vacation, surveyed the ongoing, and seemingly unending, discussion on RDFa as compared to HTML5’s Microdata, and wrote HTML5/RDFa Arguments. It’s a well-written look at some of the issues, primarily from the viewpoint of a heavy RDFa user, working to understand the perspective of an HTML5 advocate.

Jeni lists all of the pushback against RDFa that I’m aware of, including the reluctance to use namespacing, because of copy and paste issues, as well as the use of prefixes, such as FOAF, rather than just spelling out the FOAF URI. Jeni also mentions the issue of namespaces being handled differently in the DOM (Document Object Model) when the document is served as HTML, rather than XHTML.

The whole namespace issue goes beyond just RDFa, and touches on the broader issue of distributed extensibility, which will, in my opinion, probably push back the Last Call date for HTML5. It may seem like accessibility issues are the real kicker, but that’s primarily because no one wants to look at the elephant in the corner that is extensibility. Right now, Microsoft is tasked to provide a proposal for this issue—yes, you read that right, Microsoft. When that happens, an interesting discussion will ensue. And unlike other issues, whatever happens will take more than a few hours to integrate into HTML5.

I digress, though. At the end of her writing, Jeni summarizes her opinion of the RDFa/namespace/HTML5/Microdata situation with the following:

Really I’m just trying to draw attention to the fact that the HTML5 community has very reasonable concerns about things much more fundamental than using prefix bindings. After redrafting this concluding section many times, the things that I want to say are:

  • so wouldn’t things be better if we put as much effort into understanding each other as persuading each other (hah, what an idealist!)
  • so we will make more progress in discussions if we focus on the underlying arguments
  • so we need to talk in a balanced way about the advantages and disadvantages of RDF

or, in a more realistic frame of mind:

  • so it’s just not going to happen for HTML5
  • so why not just stop arguing and use the spare time and energy doing?
  • so why not demonstrate RDF’s power in real-world applications?

My own opinion is that I don’t care that RDFa is not integrated into HTML5. In fact, I don’t think RDFa belongs in HTML5. I think a separate document detailing how to integrate RDFa into HTML5, as happened with XHTML, is the better approach.

Having said that, I do not believe that Microdata belongs in the HTML5 document, either. The HTML5 document is already problematical, bloated, and overly complex. It encompasses too much, a fault of the charter, as much as anything else. Removing the entire Microdata section would help, as well as several other changes, but we’ll focus on the Microdata section for the moment.

The problem with the Microdata section is that it is a competing semantic web approach to RDFa. Unlike competition in the marketplace, competition in standards will actually slow down adoption of the standards, as people take a sit-back-and-see-what-happens approach. Now, when we’re finally seeing RDFa incorporated into Google, into a large CMS like Drupal 7, and other uses, now is not the time to send a message to people that “Oops, the W3C really doesn’t know what the fuck it wants. Better wait until it gets its act together.” Because that is the message being sent.

“RDFa and Microdata” is not the same as “RDFa and Microformats”. RDFa, or I should say, RDF, has co-existed peacefully with microformats for years because the two are really complementary, not competitive, specifications. Both can be used at a site. Because Microformat development is centralized, it will never have the extensibility that RDF/RDFa provides, and the number of vocabularies will always, by necessity, be limited. Microformats, on the other hand, are easier to use than RDFa, though parsing microformats is another thing. They both have their strengths and weaknesses. Regardless, there’s no harm to using both, and no confusion, either. Microformats are managed by one organization, RDFa by the W3C.

Microdata, though, is meant to be used in place of RDFa. But Microdata is not implemented in any production capable tool, has not been thoroughly checked out or tested, has not had any real-world implementation that I know of, has no support from any browser or vendor, and isn’t even particularly liked by the HTML WG membership, including the author. It provides only a subset of the functionality that RDFa provides, and necessitates the introduction of several predefined vocabularies, all of which could, and most likely will, end up out of sync with the organizations responsible for the extra-HTML5 vocabulary specification. And let’s not forget that Microdata makes use of the reversed DNS identifier that sprang up, like a plague of locusts, in HTML5, based on the seeming assumption that people will find the following:


Easier to understand and use than the following:

Which, heaven knows, is not something any of us are familiar with these last 15-20 years.

RDFa and HTML5/Microdata, otherwise known as Issue 76 in the HTML 5 Tracker database. I understand where Jeni is coming from when she writes about finding a common ground. Finding common ground, though, presupposes that all participants come to the party on equal footing. That both sides will need to listen, to compromise, to give a little, to get a little. This doesn’t exist with the HTML5 effort.

Where the RDFa in XHTML specification was a group effort, Microdata is the product of one person’s imagination. One single person. However, that one single person has complete authorship control over the HTML 5 document, and so what he wants is what gets added: not what reflects common usage, not what reflects the W3C guidelines, and certainly not what exists in the world, today.

While this uneven footing exists, I can’t see how we can find common ground. So then we look at Jeni’s next set of suggestions, which basically boil down to: because of the HTML WG charter, nothing is going to happen with HTML5, so perhaps we should stop beating our heads against the wall, and focus, instead, on just using RDFa, and to hell with HTML5 and microdata.

Bang! Bang!

I am very close to this. I had started my book on the issues I have with HTML5, and how I would change the specification, but after a while, a person gets tired of being shut out or shut down. I’m less interested in continuing to “bang my head against the wall”, as Jeni so eloquently put it.

But then I get an email this week, addressed to several folks, asking about the introduction of Microdata: so what does the W3C recommend, then? What should people use? Where should they focus their time?

Confusion. Confusion because the HTML5 specification is being drafted specifically to counter several initiatives that the W3C has been nurturing over the last decade: Microdata over RDF/RDFa; HTML over XHTML; Reverse DNS identifiers over namespaces, and URIs; the elimination of non-visual cues, not only for metadata, but also for the visually challenged. And respect. There is no respect for the W3C among many in the HTML Working Group. And I know I lose more respect for the organization the closer we get to HTML5 Last Call.

In fact, HTML Working Group is a bit of a misnomer. We don’t have HTML anymore, we have a Web OS.

We don’t have a simple HTML document, we have a document that contains the DOM, garbage collection, the Canvas object and a 2D API, a definition for web browser objects, interactive elements, drag and drop, cross-document communication, channel messaging, Microdata, several pre-defined vocabularies, probably more JavaScript than the ECMAScript standard, and before they were split off, client-side SQL, web worker threads, and storage. I’m sure there’s a partridge in a pear tree somewhere in there, but I still haven’t made it completely through the document. It’s probably in Section 10. I know there’s talk of extending the document to include a 3D API, and who knows what else.

There’s a lot of stuff in HTML5. What isn’t in the HTML5 document is a clean, straightforward description of the HTML or XHTML syntax, and a clearly defined path for people to move to HTML5 from other specifications, as well as a way of being able to cleanly extend the specification—something that has been the cornerstone of both HTML and XHTML in the past. There’s no room for the syntax, in HTML5. It got shoved down by Microdata and the 2D API. There’s no room for the past, the old concepts of deprecated and obsolete have been replaced by such clear terms as “Conforming but obsolete”. And there’s certainly no room for future extensibility. After all, there’s always HTML6, and HTML7, …, HTMLN—all based on the same open, encompassing attitude that has guided HTML5 to where it is today.

If we don’t like what we see, we do have options. We can create our own HTML5 documents, and submit “spec text” for a vote. But what if it’s the whole document that needs work? That many of the pieces are good, but don’t belong in the parent document, or even in the HTML WG?

The DOM should be split out into its own section and should take all of the DOM/interactive, and browser object stuff with it. The document should be re-focused on HTML, without this mash-up of HTML syntax, scripting references, and API calls that exists now. The XHTML section should be fleshed out and separated out into its own section, too, if for no other reason than to reassure people that no, XHTML is not going away. We should also be reminded that XHTML is not just for browsers; in fact, the eBook industry is dependent on XHTML. And it doesn’t need Canvas, browser objects, or drag and drop.

Canvas should also be split out, to a completely separate group whose interest is graphics, not markup. As for Microdata, at this point, I don’t care if Microdata is continued or not, but it has no place in HTML5. If it’s good, split it out, and let it prove itself against RDFa, directly.

The document needs cleaning up. There are dangling and orphaned references to objects from Web Workers and Storage still littering the specification. It hops around between HTML syntax and API call, with nothing providing any clarity as to the rhyme or reason for such jumping about. Sure there’s a lot of good stuff in the document, but it needs organization, clean up, and a good healthy dose of fresh air, and even a fresher perspective.

Accessibility shouldn’t be added begrudgingly, woodenly, resentfully. It should be integrated into the HTML, not just pasted on in order to quiet folks because LC is coming up.

The concepts of deprecated and obsolete should be returned, to ensure a sense of continuity with HTML 4. And no, these did not originate with HTML. In fact, the use of deprecated and obsolete has been fairly common with many different technologies. I can guarantee nothing but the HTML5 document has a term like “conforming but obsolete”. I know, I searched high and low in Google for it.

And we need extensibility, and no, I don’t mean Microdata and reverse DNS identifiers. If extensibility was part of the system, folks who want to use RDFa could use RDFa, and not have to beg, hat in hand, to be allowed to sit at the HTML 5 table. This endless debate wouldn’t be happening, and everyone could win. Extensibility is good that way. Extensibility has brought us RDFa, SVG, and MathML in past specifications, and will continue to bring whatever the future may bring.

whatever the future may bring…

Finding common ground? Walk a mile in each other’s moccasins? Meet mano a mano? Provide alternative specification text?

Bang! Bang!

Jeni’s a pretty smart lady.