Categories
Web

Google is the new Cloverfield monster

Recovered from the Wayback Machine.

Oh, the horror! Google hijacks 404 pages!

The reality is that the new Google beta toolbar doesn’t hijack the 404 page if the site provides a 404 page or other form of web error handling. I tried the toolbar out this morning, and the only case I found where the Google toolbar provided a search page is the site matching the screenshots below, and the site given in the original post on this topic. The latter site provided a lame looking redirect back to the main page. However, other sites that redirected back to the home page for 404 errors did not have this problem, so the problem seems to be unique to this site.

If you’ve ever seen default 404 error handling, you know it’s basically useless.

[missing image]

Compare that with a page managed by the toolbar.

[missing image]

I would expect a search engine toolbar to provide useful, alternative methods of finding the content if the web site uses default error handling. However, according to Codswallop, Google steals your visitors.

Why is this “helpful” behavior bad? As well as a link to the domain root they provide a prominent search box pre-filled with search terms. The temptation is going to be to hit that search button, effectively taking away your visitor.

I would say any webmaster that doesn’t provide effective error handling pages for 404 errors doesn’t really care about losing visitors, do they?

update

Matt Cutts from Google explained that the toolbar leaves a 404 page alone if the response is larger than 512 bytes. The example page is nothing more than a broken HTML page, with a meta refresh and a link, all of which is less than 512 bytes. Those sites that do a direct redirect don’t, of course, return 404 to trigger the toolbar. End of story.

end update
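
As a rough model of the rule in the update, here is a small Python sketch. The function name, the constant, and the handling of a page that is exactly 512 bytes are assumptions made for illustration; this is not the toolbar's actual code.

```python
# Hypothetical model of the behavior described in the update.
# The name and the exact boundary handling are assumptions.
SMALL_PAGE_LIMIT = 512  # bytes, the figure given in the update


def toolbar_would_step_in(status_code: int, body: bytes) -> bool:
    """Return True if the toolbar would show its own suggestions page."""
    # A redirect or a normal page never reports 404,
    # so there is nothing for the toolbar to react to.
    if status_code != 404:
        return False
    # A 404 response larger than the limit is treated as a real,
    # site-provided error page and left alone.
    return len(body) <= SMALL_PAGE_LIMIT


# A broken page with a meta refresh and a link, well under the limit.
tiny_page = b'<html><meta http-equiv="refresh" content="0;url=/"><a href="/">Home</a>'

# A custom error page with enough content to clear the limit.
custom_page = (
    b"<html><body>"
    + b"<p>Sorry, try the archives or the search box.</p>" * 20
    + b"</body></html>"
)

print(toolbar_would_step_in(404, tiny_page))
print(toolbar_would_step_in(404, custom_page))
print(toolbar_would_step_in(302, b""))
```

The practical upshot for a webmaster is the second case: any error page with real content clears the limit.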

What really surprised me about this story, though, is that if people are so quick to accuse Google of ‘evil’ behavior in innocuous situations like this, why was the idea of Google helping to bail out Yahoo to keep the latter out of the hands of Microsoft seen as a “good” thing? I would think a search engine monopoly in the hands of Google would be potentially more evil than Google providing useful features for default 404 error handling.

This environment is confusingly inconsistent at times.

Categories
RDF Specs SVG XHTML/HTML

Our bouncing baby markup has growed up

Recovered from the Wayback Machine.

On today’s tenth anniversary of the birth of XML, Norm Walsh writes:

I joined O’Reilly on the very first day of an unprecedented two-week period during which the production department, the folks who actually turn finished manuscripts into books, was closed. The department was undergoing a two-week training period during which they would learn SGML and, henceforth, all books would be done in SGML…My job, I learned on that first day, would be to write the publishing system that would turn SGML into Troff so that sqtroff could turn it into PostScript. “SGML”, I recall thinking, “well, at least I know how to spell it.”

Ah yes. “Unix Power Tools” was formatted as SGML, the one and only book at O’Reilly I worked on that wasn’t in a Word format. I must express a partiality to my NeoOffice, though the SGML system was ideal for cross-referencing and indexing. OpenOffice ODT, or OpenDocument text, will be the most likely format for the next UPT. Just another example of the permanent/impermanence of web trends.

Norm also mentions HTML5 possibly being the nail in the coffin of this child of SGML, but as I wrote recently, the folks behind HTML5 have solemnly assured us this specification also includes XHTML5. I’d hate to think we’re giving up on the benefits of XHTML just when they’re finally being realized by a more general audience.

Of course, I’m also fond of RDF/XML, which seems to cause others a great deal of pain, the pansies. And I’ve never hidden my SVG fandom and SVG is based in XML. I must also confess to preferring XML over JSON–you know, good enough for granddad, good enough for me. Atom rules. Or is that, Atom rocks? I’m also sure XML has squeezed between the joints of many of my other applications, and I just don’t know it.

Categories
SVG

Graphics tools

I really kick myself now for not including a mention of gnuplot in “Painting the Web”. I had one chapter on graphics and data, and it would have been a nice fit. However, it does need a nice installation environment for the Mac, and that was one of the criteria for including mention of tools.

We’re told that a Mac-specific installation of gnuplot is coming. When it does, I’ll include a link in the graphics tools section of the book’s supplementary site.

Another handy graphical tool is svgfig, which allows you to draw mathematical figures in SVG using Python. This tool should be very simple to install if you have Python installed. Using it, though, does require an understanding of math. Of course.

I would say that 2008 is the year of SVG in addition to the year of semantics. Works for me, though perhaps I should have called my book, “Painting the Semantic Web”.

(Thanks to Michael Bernstein for mention of svgfig)

Categories
XHTML/HTML

Adventures in XHTML

Recovered from the Wayback Machine.

During the recent lighthearted discussions revolving around IE8 and its faithful companion, Wonder Tag, a second topic thread broke out about XHTML. As is typical whenever XHTML is brought up, the talk circles around to the draconian error handling, or yellow screen of death, triggered by even a small, harmless-seeming discrepancy in a page’s markup.

However, the yellow screen of death is a factor of how Firefox deals with problems, not handling that’s inherent to serving XHTML as application/xhtml+xml. Safari’s error handling is much less extreme, attempting to render all of the ‘good’ markup up to the point where the ‘bad’ markup occurs.

Opera’s error handling is even more friendly. It provides the context of the error, which makes it the best tool for debugging a faulty XHTML page. You might say Opera is to XHTML, as Firebug is to JavaScript. The browser also provides an option to process the page as a more forgiving HTML.

To return to the discussion I linked earlier, in response to the mention of the draconian error handling, I wrote:

I can agree that the extreme error handling of the page can be intimidating, but it’s no different than a PHP page that’s broken, or a Java application that’s cracked, or any other product that hasn’t been put together right.

To which one of the commenters responded:

I don’t want to get off-topic either but I hear this nonsense a lot. You can’t simply compare a markup language with a programming language. They have very different intended authors (normal people versus programmers) and very different purposes.

I disagree. I believe you can compare a markup language with a programming language. Both are based on technical specifications and both require an agent to process the text in a specific way to get a usable response. As with PHP or Java, you have to know how to arrange XHTML in order to get something useful. Just because HTML has a more forgiving processor than XHTML or PHP doesn’t make it less technical, just inherently more ‘loose’, for lack of a better term.

In my opinion, the commenter, Tino Zijdel, was in error on a second point, as well: markup isn’t specific to programmers. In fact, programmers are no better at markup than ‘normal’ people. Case in point is the error pages I’ve shown in this post.

As most of you are aware, I serve my pages up with the application/xhtml+xml MIME type. Those of you who have tried to access this site using IE are also aware that I don’t use content negotiation, which tests to see if the browser is capable of processing XHTML and returns text/html if not.
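
For anyone who does want content negotiation, the check comes down to reading the browser's Accept header. Here is a minimal Python sketch of the idea. The function name is made up for illustration, and this is not how WordPress or any particular plug-in does it.

```python
def pick_content_type(accept_header: str) -> str:
    """Choose a MIME type for an XHTML page from the request's Accept header."""
    for item in accept_header.split(","):
        parts = [p.strip() for p in item.split(";")]
        media_type = parts[0].lower()

        # Default quality is 1.0; a q=0 entry means "explicitly not acceptable".
        quality = 1.0
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    quality = float(value)
                except ValueError:
                    quality = 0.0

        if media_type == "application/xhtml+xml" and quality > 0:
            return "application/xhtml+xml"

    # Fall back to HTML for anything that doesn't ask for XHTML by name.
    return "text/html"


print(pick_content_type("text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8"))
print(pick_content_type("image/gif, image/jpeg, */*"))
```

A bare */* is deliberately not treated as support, on the assumption that a browser which can really process XHTML says so explicitly.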

Before yesterday, I still served up the WordPress administration pages as text/html, rather than application/xhtml+xml. Yesterday I threw the XHTML switch on the administration pages as well, and ended up with some interesting results. For instance, both plug-ins I use that have an options page had bad markup. In fact one, a very popular plug-in that publishes del.icio.us links into a post, had the following errors:

  • The ‘wrap’ class name wasn’t in quotes.
  • Five input fields were not properly terminated.
  • The script element didn’t have a CDATA wrapper.
  • Properties such as ‘disabled’ and ‘readonly’ were given as standalone values.
  • Two extraneous opening TR tags.
  • One non-terminated TR element.
  • Two terminating label elements without any starting tag.

For all of that, though, it didn’t take me more than about 15 minutes to fix the page, with a little help from Opera.
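
The same kinds of mistakes can be caught outside the browser with any XML parser. Below is a small Python sketch using the standard library's ElementTree. The sample fragments are invented stand-ins for the errors listed above, not the plug-in's actual markup.

```python
import xml.etree.ElementTree as ET


def first_xml_error(markup: str):
    """Return (line, column, message) for the first well-formedness error, or None."""
    try:
        ET.fromstring(markup)
    except ET.ParseError as err:
        line, column = err.position  # where the parser gave up
        return line, column, str(err)
    return None


# Invented fragments, one per kind of error in the list above.
samples = {
    "unquoted attribute": "<div class=wrap>options</div>",
    "unterminated input": '<p><input type="text" name="tag"></p>',
    "standalone attribute": '<p><input type="text" disabled /></p>',
    "script without CDATA": "<script>if (a < b) { go(); }</script>",
    "fixed": '<div class="wrap"><input type="text" disabled="disabled" /></div>',
}

for label, markup in samples.items():
    print(label, "->", first_xml_error(markup))
```

Like Opera, the parser reports where the problem is, which is most of the work of fixing it.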

The WordPress administration pages work except for the Dashboard, where the version of jQuery that comes with WordPress didn’t seem to handle the Ajax calls to fill the page. I updated jQuery with the latest version, and the feed from the WordPress weblog shows, but not the other two items. At least, not with Firefox 3 or Safari, but all the content does show with Opera.

The Text Control plug-in had one minor XHTML error in the options page, but even when that was fixed, selecting a new text formatting option in the post doesn’t work–the selection goes back to the default. That one will end up being more challenging to fix, because I haven’t a clue what’s stopping the update.

WordPress does a decent job of generating proper XHTML content when using the default formatting. In fact the only problem I’ve had, other than when I embed SVG inline, was my own inaccurate use of markup. I used <code> elements, by themselves, when displaying block code. What I should have used is <code> wrapped in <pre>. When I do, the WordPress default formatting works without problems.

remove_filter('comment_text', 'wpautop', 30);
remove_filter('comment_text', 'wptexturize');
add_filter('comment_text', 'tc_comment');

My error and the errors of the plug-in creators all demonstrate that, though programmers might be more familiar with the consequences of making a mistake with technical text, we don’t make fewer mistakes than anyone else when it comes to using web page markup. Our only advantage is we’re not as intimidated by pages with errors. Regardless of how they’re displayed, or of our relative technical expertise, though, these error messages aren’t necessarily a bad thing.

One of the advantages to serving the pages with application/xhtml+xml is that we catch mistakes before we serve the pages up to our readers. We definitely catch the mistakes before we release code that generates badly formed markup, or provide broken option pages to accompany our coded plug-ins. I can’t for the life of me understand why any programmer, web developer, or designer would want less than 100% accuracy from their web pages. That’s tantamount to saying, “Hire me. I write sloppy shit.”

Of course, being able to program can have advantages when working with XHTML, especially with many of today’s applications. WordPress does a good job at working in an XHTML environment, but not a great one. One example of where the application fails, badly, is in the Atom feed.

In Atom, WordPress outputs the HTML type as an attribute to many of the fields:

<summary type="<?php html_type_rss(); ?>">
<![CDATA[<?php the_excerpt_rss(); ?>]]></summary>
<?php if ( !get_option('rss_use_excerpt') ) : ?>

This is all well and good except for one thing: when the type is returned as ‘xhtml’, Atom feeds are supposed to use the following syntax for the content:

<summary type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml">
...</div></summary>

This is an outright error in how the Atom feed is coded in WordPress. I’ve had to correct this in my own feed, and then remember not to overwrite my copy of the code whenever there’s an update. What the code should be doing is testing the type, and then providing the wrapper accordingly.
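
The test-the-type-then-wrap logic is simple enough to sketch. The version below is in Python rather than WordPress template code, and the function name is hypothetical; it only illustrates the branching, not the fix I applied to my own feed.

```python
XHTML_NS = "http://www.w3.org/1999/xhtml"


def atom_summary(content: str, content_type: str) -> str:
    """Wrap feed content according to the Atom 'type' attribute."""
    if content_type == "xhtml":
        # XHTML content goes inside a single namespaced div, unescaped.
        # The content itself is assumed to be well-formed already.
        return (
            f'<summary type="xhtml"><div xmlns="{XHTML_NS}">'
            f"{content}</div></summary>"
        )
    if content_type in ("html", "text"):
        # A CDATA section can't contain "]]>", so split it if it appears.
        safe = content.replace("]]>", "]]]]><![CDATA[>")
        return f'<summary type="{content_type}"><![CDATA[{safe}]]></summary>'
    raise ValueError(f"unsupported Atom content type: {content_type!r}")


print(atom_summary("<p>Hello</p>", "xhtml"))
print(atom_summary("<p>Hello</p>", "html"))
```

The important part is the first branch: when the type is ‘xhtml’, the CDATA wrapper goes away and the div takes its place.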

A second issue with WordPress is more subtle, and has to do with that part of XML I don’t consider myself overly familiar with: character sets and encoding. As soon as I switched on XHTML at my old weblog, I started to have problems with certain characters in my comments, and had to adjust the WordPress comment processing to allow for UTF-8 encoding. As it is, I’m not sure that I’ve covered all the bases, though I haven’t had any re-occurrence of the initial problems.

However, during the XHTML discussion, Philip Taylor demonstrated another problem in the WP code, in this case sending through a couple of characters that the WP search function did not like.

I checked with one of my two XHTML experts, Jacques Distler (the other being Sam Ruby), and the characters were Unicode, specifically:

utf-8 0xEFBFBE = U+FFFE
utf-8 0xEFBFBF = U+FFFF 

From Jacques I found that Philip likes the U+FFFE and U+FFFF Unicode characters because they’re not part of the W3C’s recommended regular expression for filtering illegal characters.
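
A filter that does catch them can be built from the XML 1.0 definition of a legal character, which excludes U+FFFE and U+FFFF. Here is a minimal Python sketch; the function name is made up for illustration, and this is neither the W3C's regular expression nor WordPress code.

```python
import re

# Anything outside the XML 1.0 "Char" production:
# #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
_ILLEGAL_XML_CHARS = re.compile(
    r"[^\u0009\u000A\u000D\u0020-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def strip_illegal_xml_chars(text: str) -> str:
    """Remove characters that may not appear in an XML 1.0 document."""
    return _ILLEGAL_XML_CHARS.sub("", text)


comment = "caf\u00e9 \ufffe\uffff ok"
print(repr(strip_illegal_xml_chars(comment)))

# The byte sequences quoted above are the UTF-8 encodings of the two characters.
print("\ufffe".encode("utf-8"), "\uffff".encode("utf-8"))
```

The filter only helps if every entry point runs it: comments, search, and anything else that echoes user input back into the page.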

Unfortunately, protecting against these characters in search as well as comments required code in more than one place and, in fact, hacking into the back end of WordPress. This is not an option available to someone who isn’t a programmer. However, this example doesn’t demonstrate that you have to be a coder to serve pages as XHTML; it demonstrates that applications such as WordPress have a ways to go before being technically, rather than just cosmetically, compliant with XHTML.

Having said that, I can almost hear the voices now: Why bother, they say. After all, no one uses XHTML, do they?

Why bother? Well, for one thing, XHTML served as XML provides a way to integrate other XML-based specifications into the page content, including in-line SVG, as well as MathML, and even RDF/XML if we’re so inclined. The point is, serving XHTML as XML provides an open platform on which to build. Otherwise, we’re dependent on committees to hash through what will or will not be allowed into a specification, based on one company or another’s agenda.

We can include SVG into a page using an object element, but we can’t integrate something like SVG and MathML together without the ability to include both inline. We certainly can’t incorporate SVG into the overall structure of the page–at least not easily using separate files. There is no room in an HTML implementation for all the other XML-based vocabularies, and we can only cram so much into class attributes before the entire infrastructure collapses.
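
To make the inline case concrete, here is a small invented page that mixes XHTML, MathML, and SVG in one document, with a few lines of Python that list the vocabularies an XML parser finds in it. The page content and the function name are illustrative assumptions only.

```python
import xml.etree.ElementTree as ET

# An invented XHTML page with MathML and SVG included inline.
PAGE = """<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>Mixed vocabularies</title></head>
  <body>
    <p>A circle of radius
      <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>r</mi></math>:
    </p>
    <svg xmlns="http://www.w3.org/2000/svg" width="100" height="100">
      <circle cx="50" cy="50" r="40" fill="teal" />
    </svg>
  </body>
</html>"""


def vocabularies(markup: str) -> set:
    """Return the set of namespace URIs used by elements in the document."""
    root = ET.fromstring(markup)
    # ElementTree writes namespaced tags as "{namespace}localname".
    return {
        el.tag[1:].split("}")[0]
        for el in root.iter()
        if el.tag.startswith("{")
    }


for uri in sorted(vocabularies(PAGE)):
    print(uri)
```

Each vocabulary keeps its own namespace, so nothing has to be squeezed into class attributes to share the page.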

No, we need both: an HTML implementation for those not ready to commit to an XML-based implementation, and XHTML for the rest of us.

During the recent discussions on IE8, several people asked Chris Wilson from Microsoft whether IE8 will support the application/xhtml+xml MIME type. So far, we’ve not had an answer. Whatever the company decides, though, XHTML is not going away. The HTML5 working draft, which was just released, is about a vocabulary, not a specific implementation of that vocabulary. Both HTML and XHTML implementations are covered in the document, though XHTML isn’t covered as fully because most of the aspects of processing XHTML are covered in other documents. At least, that’s what we’re being told.

What’s critical for the HTML5 effort is that browsers support both implementations. Even the smallest mobile device is not going to be so overburdened by the requirements that it can’t consume pages delivered up as proper XHTML. It’s a sure thing that handling clean markup takes fewer resources than handling a mess.

I’d also hate to think we’re willing to trade well designed and constructed web sites for pages filled with missing TR end tags, poorly nested elements, and unquoted class names, just because Microsoft can’t commit to the spec, and Firefox took the “bailing out now!” approach to error handling.

Categories
Browsers

And they’re off

The ACID3 race has begun. Coming around the first lap…

Firefox 3 is in first place, with a commendable lead. Way to burn up the track, foxy!

[image gone]

Coming up from behind, we find the ACID crowd favorite, *Opera!

[image gone]

Winded, but still giving it all she’s got…Safari! (Is that a picture of a cat?)

[image gone]

And in the tail position, dragging, but not dead yet…IE!

[image gone]

The next lap is in six months. Get your bets in now.

Update

*Testing with Opera’s 9.5 beta, we have a new winner, going into the first lap…

[image gone]