Categories
HTML5

Simplify

At last count, I believe the HTML5 specification is adding 35 new elements, give or take a couple. That’s a lot of new elements. So what, we say. After all, it’s just a bit of text in a specification.

Unfortunately, new elements are more than just a bit of text. They have to be supported in all of the user agents, and they also have to be supported in any number of other applications, including HTML editors, Content Management Systems, validators, debuggers, and so on. A new element is a costly thing, so it needs to be a needed thing. The question is: do we need all of these new elements, and attributes?

It’s an odd thing, but people seem to have developed a disdain for the lowly div element. It’s not a meaningful element, we hear. We use terms such as divitis or some variation. Poor little play dough element.

Yet the div element was never intended to be anything more than a structural construct. Its purpose was to be attached using the class attribute, styled using CSS, controlled with JavaScript, and given meaning with WAI-ARIA, RDFa, Microformats, and Microdata. Now, however, div is out, and article/section/footer/nav…are in.

Are we taking the right path with HTML5? It’s true that we can build expectations based on an element labeled nav or article that we can’t build on div elements with class names of “nav” or “article”. At the same time, though, adding new div elements with class names doesn’t require creating a new version of HTML, or require changes to browsers and other tools.

What kind of expectation can we build on these new elements? One expectation relates to the contents and structure. According to a blog entry written by James Graham, on the WhatWG weblog[1]:

HTML 5 introduces new elements like <section>, <article> and <footer> for structuring the content in your webpages. They can be employed in many situations where <div> is used today and should help you make more readable, maintainable, HTML source. But if you just go through your document and blindly replace all the <div>s with <section>s you are doing it wrong.

This is not just semantic nit-picking, there is a practical reason to use these elements correctly.

In HTML 5, there is an algorithm for constructing an outline view of documents. This can be used, for example by AT, to help a user navigate through a document. And <section> and friends are an important part of this algorithm. Each time you nest a <section>, you increase the outline depth by 1 (in case you are wondering what the advantages of this model are compared to the traditional <h1>-<h6> model, consider a web based feedreader that wants to integrate the document structure of the syndicated content with that of the surrounding site. In HTML 4 this means parsing all the content and renumbering all the headings. In HTML5 the headings end up at the right depth for free).

James even provides an example:


<body>
  <h1>This is the main header</h1>
  <section>
    <h1>This is a subheader</h1>
    <section>
      <h1>This is a subsubheader</h1>
    </section>
  </section>
  <section>
    <h1>This is a second subheader</h1>
  </section>
</body>

Which supposedly translates to the following navigational outline:

This is the main header
+--This is a subheader
    +--This is a subsubheader
+--This is a second subheader
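The depth rule James describes can be tried out in a few lines. What follows is only a rough sketch, not the HTML5 outline algorithm itself: it assumes that depth is simply the number of open sectioning elements, it ignores heading rank entirely, and the names OutlineSketch, sketch_outline, and OUTLINE_SAMPLE are made up for this illustration.

```python
from html.parser import HTMLParser

# Assumption: these four elements are the ones that open a new outline level.
SECTIONING = {"section", "article", "aside", "nav"}
HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


class OutlineSketch(HTMLParser):
    """Collects (depth, heading text) pairs, where depth is the number of
    sectioning elements open when the heading is closed."""

    def __init__(self):
        super().__init__()
        self.depth = 0
        self.in_heading = False
        self.buffer = []
        self.outline = []

    def handle_starttag(self, tag, attrs):
        if tag in SECTIONING:
            self.depth += 1
        elif tag in HEADINGS:
            self.in_heading = True
            self.buffer = []

    def handle_endtag(self, tag):
        if tag in SECTIONING:
            self.depth -= 1
        elif tag in HEADINGS and self.in_heading:
            self.outline.append((self.depth, "".join(self.buffer).strip()))
            self.in_heading = False

    def handle_data(self, data):
        if self.in_heading:
            self.buffer.append(data)


def sketch_outline(markup):
    parser = OutlineSketch()
    parser.feed(markup)
    parser.close()
    return parser.outline


# The example markup from the quoted blog entry.
OUTLINE_SAMPLE = """
<body>
  <h1>This is the main header</h1>
  <section>
    <h1>This is a subheader</h1>
    <section>
      <h1>This is a subsubheader</h1>
    </section>
  </section>
  <section>
    <h1>This is a second subheader</h1>
  </section>
</body>
"""

for depth, text in sketch_outline(OUTLINE_SAMPLE):
    print("    " * depth + text)
```

Every heading in the sample is an h1, yet the headings land at three different depths, which is the whole point of the section-based model.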

Shiny newness…except…what about previous guidelines? As one person brought up in comments, what about the Web Content Accessibility Guidelines[2], which say to use H1-H6 to define the structure of the document?

Well, that’s old stuff. This is new stuff.

OK, it’s new, but is it better?

I look at the outline for one of my sites, which is based on the “old” XHTML+RDFa[3], and compare it with the outline for the HTML5 Doctor web site, which is based on the “new” HTML5 markup[4], using an HTML5 Outliner tool[5]. Disregarding the different article counts, there is little difference between the two. The behavior is exactly the same.

Now, it’s true, we could ask for a generic header element (<H>) and use it with section/article to create a web page with an outline five, ten, fifteen levels deep, but seriously, how useful is this? When I work on my books, we have several headers we can use to signify the depth of the section, but we’re discouraged from going beyond a depth of three levels. You can only break up your outline so much, before you make things worse.

I don’t want to just pick on section and article, or even pick on these elements. The point I’m making is that sometimes we can get so caught up in the shiny new that we don’t take time to think about what’s being offered, and to challenge it if all it provides is newness without purpose.

References

[1] http://blog.whatwg.org/is-not-just-a-semantic

[2] http://www.w3.org/TR/2008/NOTE-WCAG20-TECHS-20081211/H42

[3] http://missourigreen.burningbird.net

[4] http://html5doctor.com/

[5] http://gsnedders.html5.org/outliner/

Issue 90

Summary

Summary: Remove the figure element.

Rationale

The following is the text for the initial bug[1] associated with this Issue:

Currently the HTML5 specification has an overly broad definition about what can be allowed in a figure element:

“The element can thus be used to annotate illustrations, diagrams, photos, code listings, etc, that are referred to from the main content of the document, but that could, without affecting the flow of the document, be moved away from that primary content, e.g. to the side of the page, to dedicated pages, or to an appendix.”

This is counter to understandings about figure in other businesses and environments, where figures are illustrative graphics of some form. In addition, this provides a confusing parallel in functionality between figure and aside, enough so that people are going to have a difficult time knowing which is which, and when to use one over the other. In fact, with this parallelism, we don’t need both.

Every discussion of figure I have read assumes that the element will contain a reference to an image of some form and a caption. Yet the caption is optional, and it sounds like anything can be included in figure. The specification examples show a poem and a code block, in addition to an image.

The figure element either should be pulled completely, in favor of the aside element, or it needs to have a tighter focus in its definition. It should consist of a graphic element, which could be an svg element, a mathml element, an img, an object, or, possibly, a video. It should then have one other element, which will be the caption. Since this element won’t be an svg, mathml, img, object, or video element, it could be anything, including just a regular paragraph. In fact, a regular element styled using CSS would be the best option.

This change would remove any confusion about this element, and there will be confusion. It would also eliminate the problem with having to create a special caption element, just for figure, as discussed in Issue 83.

In a second comment to the bug, I also added the canvas element to the list of allowable elements. The Editor’s rationale for marking the original bug WONTFIX was:

Rationale: I actually agree with Shelley on this, and that’s what HTML5 used to say. However, it is one of the very few topics which got a _huge_ outcry from Web authors around the Web, demanding that <figure> be allowed to contain basically any flow content (including sections, headings, paragraphs, lists, etc). That’s why the spec says what it does now.

Originally, my interest was only in cleaning up the figure element, to make it more consistent with standard practice in the print world. The more closely I examined the element and the discussions about it, though, the more I felt that we would be better off eliminating the figure element altogether.

The reason for specialized figure handling in the print world is typographic convention. This doesn’t really apply in the web world, because we have elements that can group, CSS that can style, WAI-ARIA for accessibility and semantics, as well as other semantic options. If we want to move the figure away from the text, but still have the two associated, we can do so just by linking the two. In fact, we would have to anyway, because there is no way to associate a figure element with its text if the two are in separate contexts, unless they are linked in some way.

There’s a good reason for specialized figure handling in the print world, but not for web pages. Because we don’t have a good understanding of why we have figure, we can’t determine what it should contain. We only have to look at the discussions about what should be allowed within the figure element to discover that no one really has a clear idea of what this element is for, or how it will be used. Well, other than something with an optional caption, that is tangentially related to the content of the page (as if “tangentially” has a great deal of meaning in a web context, considering that anything can be tangentially related to anything else with the simple addition of a link).

The figure element, as defined in the HTML5 spec, is also a far different construct than what was originally intended. The figure element originated from discussions related to finding a way to attach a caption to an img element[2], somewhat like the caption we attach to tables, but allowing markup rather than just text like the table’s caption attribute. I’m not sure at which step in the evolutionary path it went from a caption to an img, to this all encompassing something with an optional caption we have today.

I did find emails from Michael Fortin[3][4] and Simon Peters[5][6], providing use cases for the figure element. Several of the use cases that Michael pointed out were Apple online manual web pages. He classifies the code samples that Apple labels as listings as figures. However, Apple itself restricted the use of figure to illustrative images only. For tables it used the moniker Table; for listings, Listing. As such, Apple’s own terminology undermines the credibility of these pages as use cases for allowing actual code samples as figures. More specific to the point of this change proposal, if we add a new element for figure, why not for listings, too? That’s also a separate typographical entity in the print world.

Other use cases provided ran the gamut from the pre element for ASCII art, to actual tables, though we already have a table construct in HTML. And when we try to limit what’s allowed, someone will dig up some actual use case online, defending the particular use.

As can be seen, either we allow everything in the figure element in order to meet all possible sets of existing use cases online, in which case figure is really nothing more than a variation on the div element; or we restrict the element to a small subset of allowable elements, and continually fight the battle of, “Well, what about this? What about that?” All for an element that, in actuality, doesn’t provide much in the way of semantics or usefulness.

The figure element really is nothing more than a grouping mechanism, as was noted back in the beginning of the discussion about the element[7]. So why don’t we use what exists now, rather than create something new?

I was reminded recently that the WAI-ARIA states and roles are useful beyond their primary task, which is to provide information for screen readers such as JAWS and NVDA. Other “screen readers”, such as search web bots like Google’s, can also make use of the semantic information they provide. Among the semantics we can define with ARIA, we can assign an image role to a div element, and link the image’s caption to another HTML element.

As an example, in the WAI-ARIA 1.0 specification, there is an image listing that I modified, below:


<div role="img" aria-labelledby="caption">
  <img src="example.png" alt="Some descriptive text">
  <p id="caption">A visible text caption labeling the image.</p>
</div>

Compare with the figure element:


<figure>
<img src="example.png" alt="Some descriptive text">
<figcaption>A visible text caption labeling the image.</figcaption>
</figure>

There is little difference between the two, and the ARIA example has the added benefit that it is implemented in many screen readers today. Best of all, there’s nothing about this example that disallows its use by search bots or other tools and applications, which can then attach the right caption to the element rather than having to scan the surrounding text to derive a caption, or use the alt text.
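As a rough illustration of how a bot or other tool might pick up the caption from the ARIA markup, here is a minimal Python sketch. It is an assumption-laden toy, not how any real screen reader or search bot works: it only follows aria-labelledby on elements with role="img", and the names LabelResolver, captions_for_images, and ARIA_SAMPLE are hypothetical.

```python
from html.parser import HTMLParser

# Elements that never have an end tag, so they must not be pushed on the stack.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class LabelResolver(HTMLParser):
    """Records the text of every element that has an id, and the ids that
    role="img" elements point to through aria-labelledby."""

    def __init__(self):
        super().__init__()
        self.open_elements = []  # stack of (tag, id or None)
        self.text_by_id = {}     # id -> list of text chunks
        self.labelled = []       # ids referenced by role="img" elements

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if attributes.get("role") == "img":
            self.labelled.extend((attributes.get("aria-labelledby") or "").split())
        if tag not in VOID:
            element_id = attributes.get("id")
            self.open_elements.append((tag, element_id))
            if element_id is not None:
                self.text_by_id.setdefault(element_id, [])

    def handle_endtag(self, tag):
        # Pop back to the most recent matching start tag, if there is one.
        for index in range(len(self.open_elements) - 1, -1, -1):
            if self.open_elements[index][0] == tag:
                del self.open_elements[index:]
                break

    def handle_data(self, data):
        # Text belongs to every open element that carries an id.
        for _tag, element_id in self.open_elements:
            if element_id is not None:
                self.text_by_id[element_id].append(data)


def captions_for_images(markup):
    resolver = LabelResolver()
    resolver.feed(markup)
    resolver.close()
    return [" ".join("".join(resolver.text_by_id.get(ref, [])).split())
            for ref in resolver.labelled]


# The ARIA listing from this proposal.
ARIA_SAMPLE = """
<div role="img" aria-labelledby="caption">
  <img src="example.png" alt="Some descriptive text">
  <p id="caption">A visible text caption labeling the image.</p>
</div>
"""

print(captions_for_images(ARIA_SAMPLE))
```

The lookup is by id rather than by element name, so the same sketch works unchanged if the outer element is a pre, or if the caption is a span instead of a paragraph.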

If the figure is located apart from the text that references it, giving the outer div element an id attribute allows us to link the figure in the text. If we don’t need a physical link, we can use terminology, such as Figure 1, Figure 2, and relate the text and the illustration using this approach. There is nothing about the figure element that changes how the text/illustration are connected—you still need to link the two, or use the caption, itself, to connect the two.

You don’t have to use an img element within the div element. You don’t have to use a div, either. For the pre/ASCII art use case, attach the role and aria-labelledby attributes to the pre element:


<pre role="img" aria-labelledby="caption">
...
</pre>

It’s also a simple matter to style whatever we use, too. For instance, a CSS setting for the img example could be the following:


[role="img"]
{
  margin: 10px;
  border: 1px solid #ccc;
}

[role="img"] p
{
  margin-left: 20px;
  font-style: italic;
  font-size: .8em;
  line-height: 1em;
}

We could also further annotate the element using one of the three semantic annotation technologies available to us: RDFa, Microdata, and Microformats. In fact, we’re overrun with an abundance of semantic annotation capability, too much so to worry about creating single-purpose elements supposedly for semantic reasons.

Details

Based on the March 4th HTML5 specification, remove Section 4.5.12, on the figure element. Also remove any additional references to the figure element. In addition, remove Section 4.5.13, on the figcaption element, and any reference to it, too.

Impact

Positive

This alternative to figure I’ve provided in this change proposal is a frugal one that serves the same purpose for multiple user agents, multiple audiences, and uses available technology and specifications. It allows people to create any form of illustration, and ensures they’re accessible.

Removing the figure element and associated figcaption element helps trim down the overlarge number of elements that have been added with HTML5. Each new element we add to the specification has a related cost when it comes to implementation, not only across browsers, but also other tools, such as HTML editors, and HTML generation tools.

In addition, encouraging the use of existing HTML, CSS, and ARIA properties and attributes also encourages reuse over creating new, which should be a fundamental goal of this group. If there is a strong rationale for creating something new, and there really isn’t a good alternative, then we can feel justified in creating new elements. However, in the case of figure, as both Michael and Simon have pointed out, we’ve made do with what we have today. We can improve what we have with the addition of the ARIA states and roles, and ensure both a semantic and an accessible solution.

Negative

Change will require HTML5 editor time. As far as I know, there is no implementation of figure in browsers or other tools, and there is no dependence on it that I can see in web pages. There might have been some modification to validation tools to support the figure element.

References

[1] http://www.w3.org/Bugs/Public/show_bug.cgi?id=8404
[2] http://lists.whatwg.org/htdig.cgi/whatwg-whatwg.org/2006-November/007859…
[3] http://lists.whatwg.org/pipermail/whatwg-whatwg.org/2006-June/006828.html
[4] http://lists.whatwg.org/pipermail/whatwg-whatwg.org/2007-July/012194.html
[5] http://lists.whatwg.org/pipermail/whatwg-whatwg.org/2006-December/008301…
[6] http://lists.whatwg.org/pipermail/whatwg-whatwg.org/2006-December/008302…
[7] http://lists.whatwg.org/pipermail/whatwg-whatwg.org/2006-June/006822.html

Issue 91

Summary

Summary: Remove the aside element.

Rationale

Originally, my request for the aside element was to tighten its focus, and clean up the allowable content and usage. As I discovered with the figure element, though, covered in the Issue 90 change proposal, the more I looked at the element the harder time I had finding a good, solid reason for it to exist. Evidently, neither can the HTML5 editor, if one goes by the rationale he provided for rejecting my request to clarify the semantics, and structure, of the element:

Rationale: Concurred without counter-arguments above. As Shelley says, the draft was updated based on feedback from Web developers. Given that Web developers are the people who will be using this technology, that seems wise, and not at all something to be ashamed of.

If the element is so unimportant that the editor can’t provide a rationale other than a vague “web developers” want it, in the bug associated with it, then it is too unimportant to keep in the specification. However, the editor’s hubris aside, there is nothing “semantic” about the aside element, nor is there anything really useful about having the element. Consider its definition:

The aside element represents a section of a page that consists of content that is tangentially related to the content around the aside element, and which could be considered separate from that content. Such sections are often represented as sidebars in printed typography.

There’s a reason in the print world why sidebars are separate elements: they trigger different typesetting. There’s also a reason why sidebars are many times published slightly out of context of their use: they are large, and typographically, not always easy to fit into the text where they would be most appropriately placed. Tangential placement has meaning in the print world, but less so on the web, where everything can be tangentially related to everything else via a link or happenstance physical proximity because of web page design.

It’s a confusing element, too. Because of the use of “sidebar” in the definition, people are assuming that the aside element is really a sidebar element, as sidebar is known in the web context. They’re two separate things, though, regardless of name used. A sidebar in the web world should more properly be another section element, as the main column is a section, or perhaps a div element. Frankly, I’m not sold on the usefulness of section, either, but that’s outside of the scope of this change proposal.

The HTML5 editor did not intend aside to be a sidebar element[2], as sidebar is known on the web. What the original intent was, though, is lost in the confusion surrounding the element[3]. For instance, folks have had a hard time differentiating it from figure[4], because both sound almost identical, except that figure has a caption. Semantic markup should not be causing such confusion. That it does implies that people don’t understand the purpose for this element.

The key to understanding whether aside is useful or not is to ask ourselves what it provides from a web perspective that can’t be met by other elements (in combination with RDFa, Microformats, Microdata, ARIA, or CSS). If the material in the aside is supposedly material that can be removed from the document, document in this case being some form of article, then from a web perspective, this is material that is not contained as a child of the article element. If it is to be tangentially related, the article and the sidebar could share a parent element, or be tangentially related via a link.

Again, repeating what I stated earlier, we’re not faced with the same restrictions as the print world, where even tangentially related material has to be included within the actual document, itself. We can put material anywhere on the web, and tangentially relate it to the article with a link. If the aside is equivalent to a print world sidebar, then it could be just as easily moved to another section in the same web page, or even another web page and linked. We don’t need a special purpose element in the web world, because we’re not facing the same constraints as the print world.

Evidently the aside element can now be used as a sidebar, which further weakens its usefulness. How is the typical sidebar content we see on the web even tangentially related to the existing document, other than that a link to the document may or may not exist in the sidebar? Or do we even know what “document” is, in this case? Is the web page the document? Or only a specific article within a section within the web page? Again, what has meaning in the print world does not necessarily carry over easily into the web.

A semantically meaningful element should be one that, when a person first sees its description, he or she goes, “I can use this. I have needed this.” I’ve not seen this in response to the aside element, except when people are defending its existence in the HTML5 specification. Statements such as “I need this, I can use this” should come first, before the element is defined, not after.

I’m not sure who first asked for the element; I haven’t been able to trace its roots. Regardless, I’ve not seen many in the web design and development world jump up and down with joy for its existence, primarily because most people are still scratching their heads over it, wondering what it is, and why they should use it. The aside element has been a point of confusion in the past, is still a point of confusion now, and will not somehow become magically less confusing in the future[5][6][7][8].

The HTML5 so-called Superfriends wrote a manifesto of support for HTML5, which also included the following[7]:

We are excited about the the ability in HTML5 to scope headings via the section element. This proposes a significant improvement in fluidity of content reuse and eases the burden of creating mashups….We would like to encourage spec authors to be conservative in including new tags, and only do so when they[sic] addition of the tag allows for significant gains in functionality. (emphasis mine)

There is a cost associated with every new element, attribute, and specification change. The cost is to browser developers, but in the case of an element like aside, more so to HTML editors, Content Management Systems, and other tools that now have to incorporate support for yet another new element. The cost is also to web designers and developers, trying to figure out what to use, and when.

If we’re truly concerned about helping web developers, we’re not doing so by introducing confusing elements. If it’s not semantically meaningful, or structurally useful, it should be removed.

Details

Based on the March 4th publication of the HTML5 specification, remove Section 4.4.5, on the aside element, and any other reference in the specification to the aside element.

A better approach would be to use existing elements, such as a div element, style it with CSS, and attach semantics using ARIA, RDFa, Microformats, or Microdata. After all, we now have four different ways we can apply semantics to a web page—we don’t need to create single purposed elements, too.

Impact

Positive

Removing the aside element removes an element that has generated confusion since its first release, a confusion that doesn’t seem to lessen over time. The element provides little in the way of semantics, because it’s more or less based on a construct from the print world, and doesn’t really have much application in a web environment. Structurally, it provides nothing useful.

Removing the element reduces the confusion, and is also a cost saver in the future for HTML tool builders. Though browsers can more or less treat aside like a div element, HTML editors and other tools cannot. If there were a genuine purpose for the existence of the element, the cost would be justified. But the element’s definition is now so general that we might as well consider it a synonym for the div element.

Negative

Removing the element will require Editor time.

References

[1] http://www.w3.org/Bugs/Public/show_bug.cgi?id=8447

[2] http://lists.whatwg.org/htdig.cgi/whatwg-whatwg.org/2008-November/017596…

[3] http://lists.whatwg.org/pipermail/whatwg-whatwg.org/2009-October/023435….

[4] http://lists.whatwg.org/pipermail/whatwg-whatwg.org/2009-June/020267.html

[5] http://html5doctor.com/understanding-aside/

[6] http://html5doctor.com/aside-revisited/

[7] http://www.zeldman.com/superfriends/guide/

[8] http://www.webmonkey.com/2010/02/building_web_pages_with_html_5/#.3Casid…

Issue 92

Summary

Summary: Replace too-simple and somewhat odd example table and verbose text unrelated to the table element, with one example table, derived from real world data that best demonstrates the table element. Refocus the text specifically on the table element.

Rationale

In the bug[1] related to this issue, the HTML5 Editor’s rationale for not making this change was:

Rationale: Given how bad the current situation is regarding authors providing explanatory text for tables, I think we should given them as much information as possible, in as many places as possible. We do AT users a disservice if we pretend that their needs aren’t important enough to include advice on how to best serve them in the spec.

The current table element section provides one very small and inadequate table example, with a great deal of prose basically telling people what to put in text surrounding the table. None of this prose is related to the purpose and interoperable use of the table or its child elements, and it was seemingly added to the section only as a way of justifying removing the summary attribute. I hate to use cliches, but this seems like a true case of the tail wagging the dog.

Throwing lots of irrelevant text at authors does not make the table element any clearer, or ensure they use the element in the proper way. What’s needed is a good, succinct example, with a clear explanation of the element, and the table’s only unique attribute, summary.

What people incorporate into the text surrounding an HTML table is not the business of the W3C HTML Working Group. Such over-specification only adds to the noise, and if you throw enough noise at people, all they’ll do is tune out the important bits.

The space would be better used by providing a table listing that uses all of the table child elements, demonstrating how the elements work together, and then providing a screenshot of the table. By creating a listing, people can see how the table is put together; the figure demonstrates the visual rendering of the table.

Details

Replace the existing table element description section with the following (replace the URL for the img element in the figure with one local to the W3C; the image can be copied; link the term “table model”):

The table element represents data with more than one dimension, organized into non-empty columns and rows. It is the primary component of the table model.

Tables are used for data display, only, and should not be used for layout purposes. In particular, users of accessibility tools like screen readers are likely to find it difficult to navigate pages with tables used for layout.

The only unique attribute for the table element is the summary attribute. This attribute is optional, and should only be used with complex tables, in order to provide a visual description of the structure of the table—making the table easier to navigate when rendered non-visually. The summary may also include a brief description of the purpose of the table, if such purpose is visually apparent when viewing the entire table, but may not be apparent traversing the table, cell by cell. An example of a complex table is shown in Listing Table-1. Figure Table-1 is a visual rendering of the table.


<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<title>Example Table</title>
<style>
table
{
  border: 1px solid #ccc;
  border-collapse: collapse;
  margin: 0 20px;
}

td, th
{
  border: 1px solid #ccc;
  padding: 10px;
}

.female
{
  background-color: #ffc;
}

tfoot
{
  font-size: smaller;
}
</style>
</head>
<body>
<table summary="First row is column headers separated into year and degree programs
                (bachelor, master, doctor). Each degree program is further split into
                biology and technology fields on second row. Each topic field is separated into
                male and female graduates on third row. Years are listed in first column.">

   <caption>
      <p>Degrees in the biological and biomedical sciences compared with degrees 
         in technology conferred by degree-granting institutions, by level of 
         degree and sex of student: Selected years, 2002-2007.
      </p>

   </caption>

   <colgroup span="1"></colgroup>
   <colgroup>
      <col class="male" />
      <col class="female" />
   </colgroup>
   <colgroup>
      <col class="male" />

      <col class="female" />
   </colgroup>
   <colgroup>
      <col class="male" />
      <col class="female" />
   </colgroup>
   <colgroup>
      <col class="male" />
      <col class="female" />

   </colgroup>
   <colgroup>
      <col class="male" />
      <col class="female" />
   </colgroup>
   <colgroup>
      <col class="male" />
      <col class="female" />
   </colgroup>

   <thead>
      <tr>
         <th rowspan="3">Year</th><th colspan="4">Bachelor's Degrees</th>
         <th colspan="4">Master's Degrees</th>
         <th colspan="4">Doctor's Degrees</th>
      </tr>

      <tr>
         <th colspan="2">Biology</th><th colspan="2">Technology</th>
         <th colspan="2">Biology</th><th colspan="2">Technology</th>
         <th colspan="2">Biology</th><th colspan="2">Technology</th></tr> 
      <tr>
         <th>Male</th><th>Female</th><th>Male</th><th>Female</th><th>Male</th><th>Female</th>

         <th>Male</th><th>Female</th><th>Male</th><th>Female</th><th>Male</th><th>Female</th>
      </tr>
   </thead>
   <tbody>
      <tr>
       <td>2002</td><td>22,918</td><td>37,186</td><td>41,950</td><td>15,483</td>

                    <td>2,981</td><td>4,009</td><td>13,267</td><td>6,242</td>
                    <td>2,804</td><td>2,289</td><td>648</td><td>168</td> 
      </tr>
      <tr>
       <td>2003</td><td>23,248</td><td>38,261</td><td>44,585</td><td>14,903</td>

                     <td>3,227</td><td>4,430</td><td>13,868</td><td>6,275</td>
                     <td>2,804</td><td>2,438</td><td>709</td><td>200</td> 
      </tr>
      <tr>
       <td>2004</td><td>24,617</td><td>39,994</td><td>42,125</td><td>11,986</td>

                    <td>3,318</td><td>4,881</td><td>13,136</td><td>5,280</td>
                    <td>2,845</td><td>2,733</td><td>905</td><td>214</td>
      </tr>
      <tr>
       <td>2005</td><td>26,651</td><td>42,527</td><td>37,705</td><td>9,775</td>

                    <td>3,654</td><td>5,027</td><td>12,470</td><td>4,585</td>
                    <td>2,933</td><td>2,842</td><td>1,109</td><td>307</td>
      </tr>
      <tr>
       <td>2006</td><td>29,951</td><td>45,200</td><td>34,342</td><td>7,828</td>

                    <td>3,568</td><td>5,179</td><td>11,985</td><td>4,247</td>
                    <td>3,221</td><td>3,133</td><td>1,267</td><td>328</td>
      </tr>
   </tbody>
  
   <tfoot>

      <tr>
         <td colspan="13">Data from Institute of Education Sciences National Center
                for Education Statistics, derived from two tables: 
                <a href="http://nces.ed.gov/programs/digest/d08/tables/dt08_298.asp">
                Table  298. Degrees in the biological and biomedical sciences conferred by degree-granting 
                institutions, by level of degree and sex of students; selected years, 1951-52 through 2006-07
                </a> and <a href="http://nces.ed.gov/programs/digest/d08/tables/dt08_302.asp">
                Table 302. Degrees in computer and information sciences conferred by degree-granting 
                 institutions, by level of degree and sex of student: 1970–71 through 
                 2006–07</a></td>
      </tr>
   </tfoot>
</table>
</body>
</html>

Other changes:

For each of the table element children in the listing—tr, th, col, colgroup, caption, tbody, thead, tfoot—reference the existing example in Listing Table-1, and remove any other example table. One example should be sufficient to demonstrate all of the table model elements.

Though this is more related to Issue 32, it’s still relevant: remove the warning about the summary attribute, and remove the attribute from the “obsolete but conforming” section. We’ve beaten this horse so much that it’s practically glue. Time to open the gates and let it loose. Time to move on to other things.

Impact

Positive Effects

Replaces an unbelievable table example with one that is believable, using real data. In the spec, we should avoid contrived and made-up examples as much as possible, as they can undermine the credibility of the section and distract from the element(s) being demonstrated.

The change also clarifies the section and puts the focus back on the table element, rather than on everything but the table. The example also demonstrates how to use all of the table elements, and makes correct use of the summary attribute. Linking all of the table child elements back to the table element section and the listing allows people to see how these elements are used in context.

Negative Effects

Will take some of the editor’s time to make this change. The use of labels such as Listing Table-1 and Figure Table-1 may not fit within the writing style of the rest of the specification (adjust as necessary). The listing is also a little large, though as a demonstration of a family of elements, not overly so.

References

[1] http://www.w3.org/Bugs/Public/show_bug.cgi?id=8449


Issue 100

Summary

Remove the srcdoc attribute.

Rationale

The original bug report for removing srcdoc provided the following change request[1]:

This recent entry does not have universal acceptance, and the group was still discussing it when the editor added it to the specification.

The supposed use case for this attribute is weblog comments, but concerns about HTML security have been resolved with weblog and other application comments years ago. In addition, support for this attribute could give the impression that online sites don’t need any other security, which is false. Script injection is only one aspect of security related to weblog comments, and considered a fairly trivial one at that.

This needs to be removed from the specification.

The rationale given by the HTML5 Editor for keeping this attribute:

Rationale: I’m happy to remove this attribute from the W3C HTML5 specification if that’s what the working group wants. The last time I removed a feature based on a bug report such as this, I started a minor war, however, so I suggest that you raise this via the change proposal process if you really feel this way.

According to the HTML5 editor, there is no rationale for keeping this attribute. That made this change proposal more difficult to write, because I had to base my arguments on guesses and scraped email messages.

There was a great deal of contention about this attribute before it was added. It spawned another issue (Issue 103) because of concerns about escaping the markup in the attribute, especially for XHTML. That this caused some difficulty for members of this group, who are defining the next version of HTML/XHTML, should give us pause, because knowing what must be escaped is going to be that much more difficult for the average web developer[2][3].
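To make the escaping concern concrete, here is a minimal sketch, in Python, of what a weblogging tool would have to do before placing a comment into srcdoc. The function names and the sample comment are hypothetical. The sketch assumes the rules as I read them in the draft: escape ampersands and double quotes for the HTML syntax, and additionally escape the less-than sign for XHTML. It is an illustration, not any shipping implementation.

```python
from html.parser import HTMLParser


def escape_for_srcdoc(markup: str, xhtml: bool = False) -> str:
    """Escape markup for use inside a double-quoted srcdoc attribute.

    Assumption: the HTML syntax needs '&' and '"' escaped; the XHTML
    syntax also needs '<' escaped, since XML forbids a raw '<' in an
    attribute value.
    """
    escaped = markup.replace("&", "&amp;")  # must run first
    escaped = escaped.replace('"', "&quot;")
    if xhtml:
        escaped = escaped.replace("<", "&lt;")
    return escaped


def comment_iframe(markup: str, xhtml: bool = False) -> str:
    """Wrap an escaped comment in a sandboxed iframe (hypothetical helper)."""
    value = escape_for_srcdoc(markup, xhtml)
    sandbox = 'sandbox=""' if xhtml else "sandbox"
    return f'<iframe {sandbox} srcdoc="{value}"></iframe>'


class SrcdocReader(HTMLParser):
    """Read the srcdoc value back, decoded the way a parser decodes it."""

    def __init__(self):
        super().__init__()
        self.srcdoc = None

    def handle_starttag(self, tag, attrs):
        if tag == "iframe":
            self.srcdoc = dict(attrs).get("srcdoc")


# The comment is already valid HTML, so its own '&amp;' gets escaped again.
comment = '<p>See <a href="/gallery?mode=cover&amp;page=1">my gallery</a>.</p>'

print(comment_iframe(comment))              # HTML syntax
print(comment_iframe(comment, xhtml=True))  # XHTML syntax

reader = SrcdocReader()
reader.feed(comment_iframe(comment, xhtml=True))
print(reader.srcdoc == comment)  # the parser recovers the original markup
```

Note the double escaping: an ampersand that was already written as an entity in the comment has to be escaped a second time, which is exactly the kind of detail that is easy to get wrong by hand.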

When asked the purpose for srcdoc, the HTML5 Editor replied that the use case for the attribute is weblog comments[4]. Because the srcdoc attribute works within a sandboxed context, the use of the attribute would prevent script injection in comments. Since this change was targeted to a specific use related to weblog software, I asked Matt Mullenweg[5], the creator of WordPress, one of the more popular weblogging tools in use today, about the usefulness of this attribute. He responded with[6]:

We haven’t had any HTML-level problems in comments in a while.

We use and maintain a library called KSES that we use for all sanitation, and it has served us well.

I brought Matt into the discussion for two reasons. The first is that I wanted to bring in an “implementor”, and demonstrate that an implementor, in the case of weblog comments, is the group or individual responsible for the weblogging software. Too often this group is focused purely on browser developers as implementors, forgetting that browsers are not the only application group impacted by HTML5 changes.

The second reason was to demonstrate that no one from the weblogging community has asked for this, and that most of the weblogging community is very unlikely to use this uncomfortable, awkward attribute. The weblogging community has long had to deal with security problems, and has devised sophisticated tools and techniques to protect not only against script injection, but also against SQL injection, the greater hazard for weblog comments, and even against the accidental insertion of a wayward non-printing character in XHTML.
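The kind of server-side filtering these tools perform can be sketched roughly as follows. This is not KSES, which is a PHP library whose internals are not reproduced here. It is a minimal allowlist filter in Python, with hypothetical names (ALLOWED_TAGS, CommentFilter, filter_comment) and an arbitrarily small allowlist, meant only to show the general shape of the technique. It is not production code.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical allowlist: tag name -> attribute names permitted on it.
ALLOWED_TAGS = {
    "a": {"href"},
    "p": set(),
    "em": set(),
    "strong": set(),
    "blockquote": set(),
}


class CommentFilter(HTMLParser):
    """Rebuild a comment, keeping only allowlisted tags and attributes."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []

    def handle_starttag(self, tag, attrs):
        if tag not in ALLOWED_TAGS:
            return  # drop the tag itself; its text is still escaped below
        kept = []
        for name, value in attrs:
            if name not in ALLOWED_TAGS[tag] or value is None:
                continue
            # Deliberately strict: only absolute http(s) links survive.
            if name == "href" and not value.lower().startswith(
                ("http://", "https://")
            ):
                continue
            kept.append(f' {name}="{escape(value, quote=True)}"')
        self.out.append(f"<{tag}{''.join(kept)}>")

    def handle_endtag(self, tag):
        if tag in ALLOWED_TAGS:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        # All text is re-escaped, so it can never turn back into markup.
        self.out.append(escape(data, quote=False))


def filter_comment(markup: str) -> str:
    """Return the comment with everything off the allowlist removed."""
    parser = CommentFilter()
    parser.feed(markup)
    parser.close()
    return "".join(parser.out)


dirty = (
    '<p>Hi <script>alert(1)</script>'
    '<a href="javascript:alert(1)" onclick="x()">bad</a> '
    '<a href="http://example.com/?a=1&amp;b=2">ok</a></p>'
)
print(filter_comment(dirty))
```

The point of the sketch is that this work happens in the weblogging software before the comment is ever stored or served, so it does not depend on what any particular browser supports.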

In point of fact, relying on something such as srcdoc can make a site less secure rather than more, because it only touches on one vulnerability, when we’re faced daily with a host of new and ever more sophisticated threats[7].

So the use case is heavily flawed. What are the other issues associated with srcdoc? I’ve already mentioned the concerns about escaped characters, and how this will differ between HTML and XHTML, which in itself will discourage its use with most applications like Content Management Systems. Are there other issues?

Another issue is when something like srcdoc can be used, and whether the restrictions on its use are severe enough to defeat its purpose. This attribute can’t be used effectively, potentially for years to come, because existing web browsers don’t render what’s contained in an attribute unless specifically directed to do so[8]. Until browsers support it, the fallback is used, which is the iframe’s src attribute. In the meantime, our existing applications that do provide security become more sophisticated, more capable, and more tightly integrated, until by the time we could use srcdoc effectively, few of us will even remember what it is, and fewer still would be interested.

An alternative to srcdoc was suggested in the discussion surrounding this attribute. Instead of embedding markup in the attribute, something that has been actively discouraged for some time, we can use a data URI with the src attribute, getting the same functionality in a form that is usable sooner and doesn’t require us to embed markup in an attribute. However, the data URI has its own challenges, specifically the fact that the data would be rendered without the security controls in legacy browsers[9]. Again, though, a data URI in an iframe src attribute would most likely never be used for weblog comments. I find it unlikely that any approach related to the iframe and sandboxing will ever be used with weblog comments, so it might be best to find another use case with which to defend this attribute.

One use case that does come to mind is the plug-ins we drop into our web pages. The source of the plug-in comes from an external site, which could be cause for alarm. However, plug-in security is not related to the srcdoc attribute, so I have a hard time determining what use case would apply. Perhaps there is none, in which case there’s even more of a reason to remove this potentially harmful, most definitely problematic attribute.
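For comparison, the data URI approach can be sketched in Python as well. The helper names and the sample comment are hypothetical, and the sketch assumes a percent-encoded text/html data URI placed in the iframe’s src attribute. Because every reserved character is percent-encoded, nothing in the value needs attribute escaping.

```python
from urllib.parse import quote, unquote


def comment_data_uri(markup: str) -> str:
    """Percent-encode a comment into a text/html data URI."""
    # safe="" encodes everything except letters, digits, and "_.-~".
    return "data:text/html;charset=utf-8," + quote(markup, safe="")


def comment_iframe_src(markup: str) -> str:
    """Wrap the data URI in a sandboxed iframe (hypothetical helper)."""
    return f'<iframe sandbox src="{comment_data_uri(markup)}"></iframe>'


comment = '<p>See <a href="/gallery?mode=cover&amp;page=1">my gallery</a>.</p>'

uri = comment_data_uri(comment)
print(comment_iframe_src(comment))

# Decoding the payload returns the original markup unchanged.
print(unquote(uri.split(",", 1)[1]) == comment)
```

The sketch does nothing about the legacy browser problem: a browser that understands data URIs but not the sandbox attribute would still render the comment without any restrictions.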

Details

Remove all references to the srcdoc attribute from the HTML5 specification. If such a removal results in a gap in coverage, consider following one of two paths: remove whatever other material is necessary to eliminate the gap or work with the W3C HTML WG to come up with an alternative approach, if one can be found.

I would also strongly suggest finding another use case, if you want to pursue this type of functionality.

Impact

Positive

Removes a confusing, potentially harmful, and not really usable attribute, forcing us either to re-address the issue or to consider dropping this particular subset of web page security from the HTML5 specification. Perhaps there are some aspects of the web that cannot be managed by browsers.

Negative

Requires some of the Editor’s time to make the change. Could potentially leave a gap in coverage, if this subset of security is still of interest. Would require more work in the HTML WG. However, counter proposals to this proposal might be able to provide effective alternatives. Or not, if none really exists.

References

[1] http://www.w3.org/Bugs/Public/show_bug.cgi?id=8818

[2] http://www.w3.org/html/wg/tracker/issues/103

[3] http://lists.w3.org/Archives/Public/public-html/2010Mar/0431.html

[4] http://lists.w3.org/Archives/Public/public-html/2010Jan/1193.html

[5] http://lists.w3.org/Archives/Public/public-html/2010Jan/1223.html

[6] http://lists.w3.org/Archives/Public/public-html/2010Jan/1337.html

[7] http://lists.w3.org/Archives/Public/public-html/2010Jan/1318.html

[8] http://lists.w3.org/Archives/Public/public-html/2010Jan/1325.html

[9] http://lists.w3.org/Archives/Public/public-html/2010Jan/1346.html