Categories
RDF Specs

Proving yet again

Recovered from the Wayback Machine.

…why Atom is the only syndication format to use (if you all persist in finding RDF too hard that is, and go icky poo with RSS 1.x).

Rogers Cadenhead:

In part to address his concerns (and some voiced by Palfrey), I launched a new site for the board and we’ve been working on a newly written specification that seeks to resolve long-standing issues with RSS that make it difficult to implement, such as a lack of clarity on whether an item’s description is the only element that can carry HTML. (The spec’s not official — it’s published to solicit public review for at least 60 days. I encourage people who are interested in it to join the RSS-Public mailing list.)

Winer has now decided that the board doesn’t exist and never had authority over the RSS specification, even though it has published six revisions from July 2003 to the present.

God giveth. God taketh away.

Categories
Specs

XML Introduction

Originally published in NetscapeWorld, sometime in 1997. Note that the examples in this article work only with IE 4.x, and have only been tested with IE 4.01 on Windows 95 and Windows NT. Netscape does not have XML parsing built into Navigator 4.x at this time, something that will probably change with Navigator 5.0.

The concept is simple: an implementation-independent language that can define the structure of any information and that can be both human- and machine-readable. There is no assumption about what information is defined, or how it is used. The only guarantees are that the information is defined in structures, that the structures follow certain rules, and that the information contained within the structures can be accessed automatically or manually.

So why is a descriptive language that describes things and that doesn’t do anything so important? Considering that the concept I just defined is for SQL, the database access language that is in wide and accepted use with relational databases and other data stores, the importance is fairly well known and is a proven value. SQL is used more than any other technique for relational database access within most corporations.

SQL is also an example of a multi-purpose language used to define data structures, and query those same data structures without concern about how the information is displayed or used, either by the machine or a person.

XML, or Extensible Markup Language, is another implementation-neutral language that performs the same function of defining structures for information, in the form of elements, element attributes, and element content. Except that instead of defining data that is stored in a physical storage medium usually accessible only by a database engine, XML describes data that is stored in and accessed from documents.

What is XML?

XML is a subset of SGML (Standard Generalized Markup Language), a generalized markup language that was adopted as an ISO standard (ISO 8879) in 1986. Rather than specifying a language’s elements directly, SGML is used to define the rules that constrain the elements of a specific language.

What is SGML?

SGML grew out of the need to define a document’s structure and to define rules used to determine whether the document is valid and well formed. The document’s structure is defined through the use of markup tags, which delimit the elements, and Document Type Definition (DTD) files that define each element’s structure and content, providing the grammar for the document.

As an example of SGML, a customer element within a document could have the following structure:

<CUSTOMER name="Shelley Powers" id="CUST011A1">
<PO id="PO23349008">
<POITEM id="POI1">
<ITEM id="14453">
Item ID: 14453
Item Desc: some description
</ITEM>
</POITEM>
</PO>
</CUSTOMER>

To validate the markups used to define the structure of the document, an associated DTD would be created, and for the example could have statements similar to the following:

<!ELEMENT CUSTOMER - - (PO)+>
<!ATTLIST CUSTOMER
   name CDATA
   id   CDATA
>

The extremely simplified and abbreviated DTD uses an Extended Backus-Naur Form (EBNF) syntactic notation to create the grammar.

Using a standardized meta-language to define the elements means that SGML parsers can pull out the individual elements within the document, such as the customer element just described, along with any associated attributes and content, and an application can then use the information for some purpose. Possible application purposes include:

  • To define information in a database-neutral format for transport between unlike databases
  • To provide a search engine that allows a person to query on the element type as well as the data
  • To generate reports, or even to drive an online hypertext order processing form that allows the document reader to “drill” down within the document to find the information wanted
  • To define a standard language for a specific industry or science, such as the petroleum industry, or chemistry, including any special notational conventions

The concept of SGML is very attractive: define a language that in turn defines a document structure used for a specific group of documents and which can be extended without impacting on the underlying language generation mechanism. Unfortunately, the downside to SGML is that it is not trivial to define the DTD for a language, and SGML is a complex standard.

SGML did, however, provide the roots of the first Web document specification, HTML.

HTML, a Derivative of SGML

HTML was a derivation of SGML, except that a group of elements was predefined that controlled the delivery of a Web page’s content. In addition, the original HTML elements were expanded to include suggested presentation elements that controlled the appearance of the Web page. SGML does not control the presentation of elements, only the element structure and semantics.

With HTML, the following would define an element that is an unnumbered list element, which is defined by the DTD associated with the HTML 4.0 specification as having a start and end tag, and containing at least one list item:

<!ELEMENT UL - - (LI)+>

According to the EBNF associated with SGML, this DTD states that UL is an element, the double dashes assert that the element requires a start and end tag, and the element consists of at least one, and possibly more than one, list item (LI). When a user agent such as a browser parses an HTML element it knows to look for both beginning and ending UL tags, and at least one LI element contained within those tags.
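The “one or more” rule expressed by (LI)+ can be sketched in a few lines of JavaScript. This is only an illustration of the content model, not how any browser or SGML parser actually works: the element shape ({ name, children }) and the function name satisfiesOneOrMore are assumptions made up for the example.

```javascript
// Hypothetical in-memory element: { name: "UL", children: [ { name: "LI" }, ... ] }.
// Returns true when the element has at least one child and every child
// carries the expected name, which is what a (childName)+ content model requires.
function satisfiesOneOrMore(element, childName) {
  const children = element.children || [];
  return children.length > 0 &&
         children.every((child) => child.name === childName);
}

const list = { name: "UL", children: [{ name: "LI" }, { name: "LI" }] };
const emptyList = { name: "UL", children: [] };

console.log(satisfiesOneOrMore(list, "LI"));      // true
console.log(satisfiesOneOrMore(emptyList, "LI")); // false
```

A validating parser performs the same kind of check for every element, driven by the DTD rather than by hand-written code.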

Associated with the DTD for HTML 4.0 is an implied visual presentation of an unnumbered list, which is that each list item has a specified list graphic, each list item is on a separate line, and each list item lines up beneath the previous list item. However, not all user agents (such as browsers) are visual, so presentation can only be a suggestion.

After the first releases of HTML, new elements crept into the language to provide control over page presentation. Among them is the FONT element, which controlled the size, color, and typeface of any text the element contained. The problem with a presentation-specific tag such as this is that non-standardized tags can lead to wide differences among user agents in the presentation of a Web page. In addition, an element’s appearance is better treated as an attribute of the element than introduced through a new element.

To separate an element’s structure from its presentation within HTML, the W3C also issued a recommendation for CSS1, or Cascading Style Sheets Level 1, a specification that provides presentation information for HTML elements.

The advantage of HTML is that it was relatively easy to read and to follow, which aided the massive growth of the Web. In fact, if Web document access had begun with XML, chances are you wouldn’t be reading this article right now, and Web access would probably be limited to the scientific community or to internal use within larger corporations. We initially needed a simple-to-use mechanism to create Web documents, and HTML was it. The very lack of flexibility was the language’s strength.

Now that the Web and Internet access have matured, or at least we like to think they have matured, flexibility needs to be built into documents such as Web pages to increase their effectiveness and applicability.

Enter XML

XML arose from a need to create more generalized markup languages, but without having to follow the fairly large and complex SGML standard. The implication with XML is that a markup language can be defined that is generic and well-formed, but that does not have to be validated, which means that an associated DTD is not required, though one can be included. Additionally, XML uses only a subset of the rules for SGML, leading to quicker understanding of the principles, and quicker implementation of the technology.

The main requirements for a well-formed XML document are:

  • The document may begin with a valid XML declaration, or prolog
  • There is one element that acts as the root element and which acts as parent to all other elements
  • Elements are either not empty, or the empty element has a “hint” encoded within the element that conveys this information to the XML parser
  • Non-empty elements must have start and end tags
  • All elements except the root element are contained within some element, referred to as the element’s parent; all contained elements are referred to as the parent element’s children
  • Elements can contain character data, other elements, CDATA sections, processing instructions, or comments
  • Each parsed element within the document is well-formed
  • Character data that might otherwise be processed as XML markup is enclosed within CDATA sections
  • Documents can include comments, white space, and processing instructions
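A few of these requirements (a single root element, matching start and end tags, and the trailing-slash hint on empty elements) can be checked mechanically. The following is a minimal sketch in JavaScript, not a real XML parser: the function name checkNesting is made up for this illustration, and the sketch deliberately ignores comments, CDATA sections, DTDs, and attribute values that contain angle brackets.

```javascript
// Checks only tag nesting and the single-root rule.
// Assumption: no comments, CDATA sections, or '<' and '>' inside attribute values.
function checkNesting(xml) {
  // Captures: optional leading '/', the tag name, attribute text, optional trailing '/'.
  // Declarations such as <?XML ...?> do not match and are simply skipped.
  const tagPattern = /<(\/?)([A-Za-z_][\w.\-:]*)([^<>]*?)(\/?)>/g;
  const open = [];    // stack of element names whose end tag is still expected
  let rootCount = 0;  // number of top-level elements seen
  let match;
  while ((match = tagPattern.exec(xml)) !== null) {
    const [, closing, name, , selfClosing] = match;
    if (closing) {
      // An end tag must match the most recently opened start tag.
      if (open.pop() !== name) return false;
    } else {
      if (open.length === 0) rootCount++;
      // An empty-element tag ("/>") opens nothing that needs closing.
      if (!selfClosing) open.push(name);
    }
  }
  return open.length === 0 && rootCount === 1;
}

console.log(checkNesting('<ARTICLE name="XML"/>'));  // true
console.log(checkNesting("<A><B></A></B>"));         // false
```

The stack is the important design choice here: because elements must nest, the only end tag that is ever legal is the one matching the top of the stack.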

How Does XML Work?

XML is a meta-language: it provides rules for defining a set of tags that can be used within a document. These tags delimit an XML element, its attributes, and its contents, and define the semantics and syntax of those elements. The tags are read by an XML processor, which in turn gives applications access to the elements. The application, and there can be more than one, then performs one or more actions on the XML elements.

XML processors can be validating, which means that they make use of an associated DTD in order to ensure valid structures, or non-validating. Regardless of whether validation is used, an XML document is considered well-formed as long as it matches the syntax defined for an XML document, and each element within the document meets the syntax for a well-formed XML element.

More formally, a valid and well-formed XML document consists of the following EBNF non-terminal symbols (non-terminal meaning that the symbols are themselves expanded elsewhere):

document ::= prolog element Misc*

A complying document could be as simple as:

<?XML VERSION="1.0" ENCODING="UTF-8"?>
<ARTICLE name="XML" author="Shelley Powers"/>

This document consists of the prolog section, which includes the XML declaration (“<?XML”), the version number of the XML definition, and the encoding declaration. It also contains one element, ARTICLE, which has two attributes, name and author. Because the element is empty, it ends with a forward slash to signal to the processor that the element contains no other content. This is necessary for a non-DTD (non-validating) document. Otherwise the XML processor would not know when to look ahead in the parsing for required element content. This is one of the key features of XML: forward processing information is embedded directly within the document, removing the need to create an associated DTD.

The example just provided is a well-formed document, but not a valid one, as no DTD is provided to use for validation. The example also demonstrates the simplicity of XML. An even simpler version of the document would be:

<ARTICLE name="XML" author="Shelley Powers"/>

To make the document a valid one, I could have added a DTD for the ARTICLE element directly into the document, or linked in an external DTD file:

<?XML VERSION="1.0" ENCODING="UTF-8"?>
<!DOCTYPE ARTICLE SYSTEM "article.dtd">
<ARTICLE name="XML" author="Shelley Powers"/>

XML in Action

Though XML is new, there are already several XML parsers that check whether an XML document and its associated DTD fit the rules for a valid XML document. In addition, these same parsers may return the elements within a document, exposed in their tree-like form, for use by applications.

As an example of XML in action, Microsoft has defined an XML application it terms “Channel Definition Format”, or CDF. CDF files contain elements that describe the contents of an active channel. Following the accepted technique for XML, CDF files do not contain a reference to a DTD file, and use “clues” embedded within the tags and tag definitions to provide forward-looking information for the XML parser.

The implied purpose of CDF is to provide a document that defines the use of push technology at a specific web site, including which pages are to be displayed as channels, which icons to display, update schedules, and so on. With this information the XML processor provides the key elements that a channels-based application can then use to control channel access of the web site, including which pages are updated and the schedule on which the pages are updated.

The following code shows the CDF file I have defined for use at my web site. The root element for the file is the CHANNEL element. It is the parent element for several other elements such as a LOGO element, an ITEM element, and an ABSTRACT element. Each of the elements within the document may or may not have attributes, and a child element may in turn be the parent for another element:

<?XML VERSION="1.0" ENCODING="UTF-8"?>
<CHANNEL HREF="http://www.yasd.com/plus/index.htm"
    BASE="http://www.yasd.com/plus/">
    <TITLE>YASD+</TITLE>
    <ABSTRACT>YASD+ pages, using the newest technologies</ABSTRACT>
    <LOGO HREF="http://www.yasd.com/mm/wide_logo.gif" STYLE="IMAGE-WIDE"/>
    <LOGO HREF="http://www.yasd.com/mm/logo.gif" STYLE="IMAGE"/>
    <LOGO HREF="http://www.yasd.com/mm/icon.gif" STYLE="ICON"/>
    <SCHEDULE>
        <INTERVALTIME DAY="1"/>
        <EARLIESTTIME HOUR="0"/>
        <LATESTTIME HOUR="12"/>
    </SCHEDULE>
    <ITEM HREF="http://www.yasd.com/samples/bytes/daily.htm">
        <LOGO HREF="http://www.yasd.com/mm/icon.gif" STYLE="ICON"/>
        <ABSTRACT>YASD Code Byte</ABSTRACT>
    </ITEM>
    <ITEM HREF="http://www.yasd.com/samples/bytes/cheap.htm">
        <LOGO HREF="http://www.yasd.com/mm/icon.gif" STYLE="ICON"/>
        <ABSTRACT>Cheap Page Tricks</ABSTRACT>
    </ITEM>
</CHANNEL>

Notice that the first line contains the XML declaration, with a version number and an encoding declaration. The main element within the document is the CHANNEL element, enclosing other elements such as TITLE, ITEM, ABSTRACT, and LOGO. Each of these elements falls within the allowable XML definition for elements:

element      ::= EmptyElemTag | STag content ETag
EmptyElemTag ::= '<' Name (S Attribute)* S? '/>'
STag         ::= '<' Name (S Attribute)* S? '>'
ETag         ::= '</' Name S? '>'
content      ::= (element | CharData | Reference | CDSect | PI | Comment)*

Without continuing to resolve the non-terminal references, what the syntax just shown states is that each element is either an empty element, in which case it ends with a slash and angle bracket combination (‘/>’), or it has start and end tags which enclose content. A constraint, actually called a “well-formedness” constraint, is placed on the start and end tags in that the Name used in both must be the same. The enclosed content can be other elements, comments, processing instructions, or other well-formed XML content. Both empty and non-empty elements can have zero or more attributes, as the following demonstrates:

<CHANNEL HREF="http://www.yasd.com/plus/index.htm" 
	BASE="http://www.yasd.com/plus/">
…
</CHANNEL>

or

<INTERVALTIME DAY="1"/>
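The three tag productions can be approximated with regular expressions. The sketch below is hedged accordingly: the Name and Attribute patterns are simplified ASCII-only assumptions rather than the full definitions from the XML grammar, and classifyTag is a name invented for this example.

```javascript
// Simplified building blocks (assumption: ASCII names, quoted attribute values).
const NAME = "[A-Za-z_:][A-Za-z0-9_.:\\-]*";
const ATTRIBUTE = `${NAME}\\s*=\\s*(?:"[^<"]*"|'[^<']*')`;

// EmptyElemTag ::= '<' Name (S Attribute)* S? '/>'
const EMPTY_ELEM_TAG = new RegExp(`^<${NAME}(?:\\s+${ATTRIBUTE})*\\s*/>$`);
// STag ::= '<' Name (S Attribute)* S? '>'
const START_TAG = new RegExp(`^<${NAME}(?:\\s+${ATTRIBUTE})*\\s*>$`);
// ETag ::= '</' Name S? '>'
const END_TAG = new RegExp(`^</${NAME}\\s*>$`);

// Reports which production a single tag string matches.
function classifyTag(tag) {
  if (EMPTY_ELEM_TAG.test(tag)) return "EmptyElemTag";
  if (START_TAG.test(tag)) return "STag";
  if (END_TAG.test(tag)) return "ETag";
  return "not a tag";
}

console.log(classifyTag('<INTERVALTIME DAY="1"/>')); // EmptyElemTag
console.log(classifyTag("</CHANNEL>"));              // ETag
```

Note that a slash inside a quoted attribute value, as in the BASE attribute of CHANNEL, does not make the tag an empty-element tag; only a slash immediately before the closing angle bracket does.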

Internet Explorer 4.0 has an associated XML parser that pulls out the element information from the document. IE 4.0 uses this parsed element information to create the channel for the web site, including the two sub-channel items.

Accessing the CDF file directly with IE 4.0 opens a dialog asking the individual how they would like to subscribe to the web site channel, and allowing the reader to determine how and when the channel contents are downloaded to the client machine.

The Uses of XML

You have had a brief overview of XML, and a chance to see it in action. You might be wondering now what the practical uses of XML would be. Well, as shown with the CDF example, XML can be used to define presentation and organization of documents for specific purposes.

Microsoft and Marimba, Inc. have also proposed a new use for XML called the Open Software Description (OSD) format, which can be used to control the download and installation of software within a company. One real expense for larger corporations, especially those that are geographically distributed, is installing and maintaining software upgrades on the employees’ machines. One small upgrade to a popular piece of software can take days of planning and weeks of actual implementation. While installation is under way, employees will have different versions of the same software, which can create problems. With OSD, software upgrades can be handled automatically using push technology. Combine this with the use of the same push technology to distribute information about upcoming software upgrades, and what was a logistics problem becomes much more routine.

SGML, and XML, have been used to create a Chemical Markup Language (CML). With this vocabulary, molecular structures can be defined within a document and the information either posted or transmitted. XML processors can pull out the CML elements and pass these to applications, which can do one of several things with the information, such as prepare a hard copy printout, either textually or graphically, or create an online three-dimensional model using VRML or some other technology.

Netscape, Apple, and others have proposed a Meta Content Framework (MCF) created in XML that can expose a Web site structure for navigation or online exploration. MCF can be used to do such things as generate a three-dimensional site map, and can be used for web site publication and administration. The technology is currently used by Apple’s ProjectX/HotSauce browser, and “Xspace” compatible content can also be viewed using a plug-in available from Apple. Note: According to Netscape and the W3C, the MCF proposal has been absorbed into the Resource Description Framework (RDF) specification, and Netscape fully supports RDF. Netscape also states that it will support any standard, such as XML and RDF, as the standard moves from W3C working draft to recommendation.

Returning to SQL and relational databases, XML can be used to define a relational database meta-language, which can then be used to describe documents containing relational database information. These same documents can be easily generated from the relational database dictionaries, which are repositories of information about the information stored in the database. The XML language can then be used to create documents in context, such as “all information pertaining to any purchases, week of January 16 through January 23” rather than within the context-neutral database format. In addition, supporting information can be pulled into the document such as images, or reference material, information that is not part of the data in the database.
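As a rough illustration of generating such a document from relational data, the sketch below turns an array of row objects into a small XML document. Everything in it is an assumption made for the example: the ROW element, the choice of one attribute per column, and the function names rowsToXml and escapeXml. The rows are hard-coded rather than read from a real database.

```javascript
// Escapes the characters that would otherwise be read as markup
// inside a double-quoted attribute value.
function escapeXml(value) {
  return String(value)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Builds one root element named after the table, with one empty
// ROW element per row and one attribute per column (hypothetical layout).
function rowsToXml(tableName, rows) {
  const lines = [`<${tableName}>`];
  for (const row of rows) {
    const attributes = Object.entries(row)
      .map(([column, value]) => `${column}="${escapeXml(value)}"`)
      .join(" ");
    lines.push(`  <ROW ${attributes}/>`);
  }
  lines.push(`</${tableName}>`);
  return lines.join("\n");
}

console.log(rowsToXml("PURCHASE_ORDER", [
  { order_id: 32245, customer_id: "CUST011A1" },
]));
```

Because the column names come straight from the row objects, the same function works for any table, which is the point made above about generating these documents from the database dictionaries.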

An XML processor can process this context-based data document, and the information can be used to present reports, for online research and queries, or even to create interactive three-dimensional models of the data. Instead of issuing a SQL statement such as:

select customer_name, customer_address, city, state, zip_code from customer, purchase_order
where purchase_order.order_id = 32245 and customer.customer_id = purchase_order.customer_id;

I could enter a three-dimensional VRML world at a purchase_order portal, and scan a virtual filing cabinet for my purchase order number. Once found, I can open it into another room with doors, including ones labeled “Purchase Order items” and “Customer”, and follow the Customer door into another room containing the information I am looking for. Best of all, the documents containing the context-based data could be generated automatically, processed automatically, and presented automatically. This means a change in the database table can be handled automatically.

If a three-dimensional database does not appeal, then consider that the data defined in the document can be used as a method to convert database data in one format, such as relational data, into another format, such as an object-based database.

Sound far-fetched? Not really. The resources section at the end of this article has a reference to a preliminary XML representation of a relational database.

In addition, with XML processors (or XML parsers, if you prefer), the most difficult aspect of XML has already been implemented: pulling the elements out of the document.

Returning to the CDF example, not only can the document be used by Internet Explorer 4.0 to provide information about the structure of a web site’s channels, I can also access the XML elements within JScript, or C++, or Java and use the information for other purposes. As an example, the following JScript functions open a CDF file, pull out information about the elements contained within the CDF file, and print this information out in a newly opened window.

<script language="jscript">
<!--
var doc = new ActiveXObject("msxml");
var wndw = null;

// display elements in CDF file
// file reference must be fully resolved Internet reference
function DisplayElements(cdffile)
{
// Display this with an appropriate message in a popup window
wndw = window.open("","CDFFile",
"resizable,scrollbars=yes");
wndw.document.open();
doc.URL = cdffile;

// begin displaying elements at root
displayElement(doc.root);

wndw.document.write("</body>");
wndw.document.close();

}

// display element tagname, if any
// and information about element such as any attributes (even if undefined for element)
// and text and element type
function displayElement(elem) {
if (elem == null) return;
wndw.document.writeln("<p>");
if (elem.type == 0)
    wndw.document.writeln("Document contains element with tagname: " + elem.tagName);
else
    wndw.document.writeln("Document contains element with no tagname");
wndw.document.writeln("<br>Element is of type: " + 
				GetType(elem.type) +"<br>");
wndw.document.writeln("Element text: " + elem.text + "<br>");
wndw.document.writeln("Element href: " + elem.getAttribute("href") + "<br>");
wndw.document.writeln("Element base: " + elem.getAttribute("base") + "<br>");
wndw.document.writeln("Element style: " + elem.getAttribute("style") + "<br>");
wndw.document.writeln("Element day: " + elem.getAttribute("day") + "<br>");
wndw.document.writeln("Element hour: " + elem.getAttribute("hour") + "<br>");
wndw.document.writeln("Element minute: " + elem.getAttribute("min") + "<br>");

// check to see if element has children, and display each one recursively
var elem_children = elem.children;
if (elem_children != null)
   for (var i = 0; i < elem_children.length; i++) {
      var element_child = elem_children.item(i);
      displayElement(element_child);
   }

}

// element type
function GetType(type) { 
if (type == 0) 
	return "ELEMENT"; 
if (type == 1) 
	return "TEXT"; 
if (type == 2) 
	return "COMMENT"; 
if (type == 3) 
	return "DOCUMENT"; 
if (type == 4) 
	return "DTD"; 
else 
	return "OTHER";
}

//-->
</script>

You can actually try this yourself by accessing the demonstration Example.
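The recursive walk in the script above depends on the msxml ActiveX control that comes with IE 4.x. The same traversal pattern can be sketched in plain JavaScript over an ordinary object tree. The tree shape ({ tagName, children }) and the function name listTagNames are assumptions for illustration, not part of any parser’s API.

```javascript
// Walks a hypothetical element tree depth-first and returns one line per
// element, indented two spaces for each level of nesting.
function listTagNames(element, depth = 0, lines = []) {
  lines.push(" ".repeat(depth * 2) + element.tagName);
  for (const child of element.children || []) {
    listTagNames(child, depth + 1, lines);
  }
  return lines;
}

// A hand-built tree mirroring part of the CDF file shown earlier.
const channel = {
  tagName: "CHANNEL",
  children: [
    { tagName: "TITLE" },
    { tagName: "ITEM", children: [{ tagName: "LOGO" }, { tagName: "ABSTRACT" }] },
  ],
};

console.log(listTagNames(channel).join("\n"));
```

As in displayElement, each element is handled first and its children afterwards, so the output follows document order.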

Show me the Info

A key to the true usefulness of XML is that once an XML parser has been created to process an XML document, it can be used to parse out element information from any document containing any well-formed XML content.

In the last section, I used Internet Explorer’s exposure of XML elements, attributes, and content to create a web page that listed the elements, their attributes, and some content. An interesting example, but not really useful. However, what if I were to define my own XML document, including my own XML elements and attributes, and then use the IE built-in XML parser to create my own graphical menu Web page application? This is not only doable, it is fairly simple and only took a couple of hours of playing around to accomplish.

First, I defined my own “CDF” file, and created my own entities, as shown in the following code block:

<?XML VERSION="1.0" ENCODING="UTF-8"?>

<DOCUMENT>
    <TITLE>YASD+</TITLE>
    <STYLESHEET HREF="http://www.yasd.com/css/daily.css" />
    <ITEM HREF="http://www.yasd.com/plus/plus.htm">
        <IMAGE HREF="http://www.yasd.com/plus/logo.jpg">
            <ALT>YASD+ Main Page</ALT>
        </IMAGE>
    </ITEM>
    <ITEM HREF="http://www.yasd.com/samples/bytes/daily.htm">
        <IMAGE HREF="http://www.yasd.com/plus/logo.jpg">
            <ALT>YASD Code Byte</ALT>
        </IMAGE>
    </ITEM>
    <ITEM HREF="http://www.yasd.com/samples/bytes/cheap.htm">
        <IMAGE HREF="http://www.yasd.com/plus/logo.jpg">
            <ALT>YASD Cheap Page Tricks</ALT>
        </IMAGE>
    </ITEM>
</DOCUMENT>

I redefined what ITEM is, created a new root element called “DOCUMENT”, and added the new elements IMAGE, STYLESHEET, and ALT. I followed the XML convention for well-formed elements, and opening up this document for parsing within IE 4.0 generates no errors.

I then created an application that consists of two frames, and that uses the images associated with the items to create a graphical menu bar in the top frame of the window, and set the link associated with each image to open in the bottom frame of the window. The window originally opens with the form to access the “CDF” file and process its contents. This form is then “overwritten” with the processing results. The code for the form and to process the form contents is shown in the next block:

<script language="jscript">
<!--
var doc = new ActiveXObject("msxml");
var wndw = null;

var title = "";
var stylesheet = "";
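var items = new Array();      // link target for each ITEM
var itemimages = new Array(); // image URL for each ITEM
var itemalts = new Array();   // ALT text for each ITEM
var ct = -1;                  // index of the last ITEM found; -1 means none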

function createWindow(cdffile)
{
doc.URL = cdffile;

// find main document and any associated item documents
findElements(doc.root);

// if at least one ITEM was found (ct is the index of the last one)
if (ct >= 0) {
  var strng = "<HTML><HEAD><TITLE>" + title + 
	"</TITLE><LINK REL=STYLESHEET TYPE='text/css'" +  
	" HREF='" + stylesheet + "'></HEAD><BODY>";
  for (var i = 0; i <= ct; i++) 
     strng+="<a href='" + items[i] + 
		"' target='Body'><IMG src='" + itemimages[i] + "' ALT='" + 
		itemalts[i] + "' border=0>" + 
		"</a>"; 
  strng+="</BODY></HTML>";
  document.open();
  document.writeln(strng);
  document.close();
  }
}

// collect the document title, the stylesheet reference,
// and the link, image, and ALT text for each ITEM,
// then recurse into any child elements
function findElements(elem) {
if (elem == null) return;
if (elem.type == 0) {
    if (elem.tagName == "TITLE")
        title = elem.text;
    if (elem.tagName == "STYLESHEET")
        stylesheet = elem.getAttribute("href");
    if (elem.tagName == "ITEM") {
        ct++;
	  items[ct] = elem.getAttribute("href");
        }
    if (elem.tagName == "ALT") 
        itemalts[ct] = elem.text;
    if (elem.tagName == "IMAGE")
        itemimages[ct] = elem.getAttribute("href");
    }
        
// check to see if element has children, and search each one recursively
var elem_children = elem.children;
if (elem_children != null)
   for (var i = 0; i < elem_children.length; i++) {
      var element_child = elem_children.item(i);
      findElements(element_child);
   }
}
//-->
</script>

I could have defined any elements within the XML document as long as I used well-formed XML elements, and could process the results in virtually any way I want using just simple scripting techniques. All of this within an existing browser (Internet Explorer 4.01, on Windows 95).

Linking and Style information

In addition to the XML specification, other efforts are currently underway to add supporting specifications. The first is XML part 2, which includes linking. Another is XSL, the Extensible Stylesheet Language, which is a specification for an XML stylesheet.

Linking has been extended considerably with XML. One can specify an attribute that determines how a resource is displayed, whether the resource is displayed automatically, and can specify multiple layers of linkage. Of particular interest is being able to define a group of links, associating documents together in such a way that the person following the links does not have to hunt around trying to find related documents. If you have ever been to a web site page following a link from another site, you know how frustrating it can be to establish the “context” of the link in order to find related documents.

XSL would be specified using XML and would provide a means to define presentation elements, such as those used currently in HTML. Current examples include the emphasis element, delimited with <EM>…</EM> tags, the strong element, delimited with <STRONG>…</STRONG>, and others. However, with XSL, styles could be created to provide “recommendations” for how an XML element is rendered. XSL is considered to be part 3 of the XML specification.

XML is great, so what’s the downside?

The concept of XML is great: provide an implementation neutral technique to define a structure for the contents of a document, which can then be parsed and the information used for multiple purposes, in multiple applications. However, it is the very nature of the flexibility of XML that may cause problems.

Returning one last time to my CDF example, I asked myself the question: what exactly can I do with the information pulled from the CDF files? One answer that came to me was to create my own web page. And that’s what I did. I created a simple JScript application that opens the main channel page and all the associated pages in a frames-based web page. The main page opens into the topmost frame, and each individual CDF ITEM element opens into one of the smaller frames located along the bottom of the document.

This isn’t a problem for my own CDF file, which is relatively simple. However, applying the same application to another CDF file, one I neither created nor control, creates a web page that probably does not meet the expectations of the originator, in this case the IDG Net channel.

To create this page, I used a publicly accessible file, IDG Net’s CDF file, and publicly exposed XML elements to create a presentation neither Microsoft nor IDG Net intended.

This demonstrates a side effect to XML, in that application purpose is implied but not guaranteed. Even with the new effort in XSL, currently only a W3C proposal, there is no guarantee that the information exposed so nicely, and accessible so easily, with XML will be used in anything approaching the intended purpose of the original XML document creator.

Another potential problem area with XML is apparent when one considers the CDF specification used throughout this document. The concept is great: an XML-based document that can be used by different push technology vendors and which can ensure relatively comparable results with each vendor’s implementation. But what happens if a vendor supports channels, but doesn’t want to use CDF? Do we then have a proliferation of channel-flavored XML document specifications? Does the W3C then create a different standards specification for channels, another for chemistry, another for math, another for finance, and so on in order to ensure that only one specification for each “topic” or “business” is created? Or do we purchase tools with the sole purpose of translating between each of the XML document definitions?

These are definitely issues, though how serious they are is yet to be determined.

Summary

Even with the issues just discussed, and issues that I deliberately sought in order to present a comprehensive article, XML is a terrific addition to Web and other application development. The aspect in application programming that is most difficult when dealing with documents is extracting the structure as well as the contents of the document, especially if the same document must be created, and read, by humans.

XML has made this process a whole lot easier. During the recent XML/SGML conference in Washington DC (December 10-12, 1997), XML became a proposed recommendation of the W3C, the last remaining step before becoming a recommendation. As has been demonstrated throughout this article, even in its early infancy there are practical uses for, and applications created from, XML. It may be only a matter of time before XML is just as common a term within corporations as SQL is today.

Categories
Specs Standards

Sugar and spice

Recovered from the Wayback Machine.

I finally found out what was causing the problems with the post When We Are Needed in IE: it’s called the “Magic Creeping Text” bug. It’s caused by having a left border for a blockquote (or other marginalized blocks), without having an accompanying bottom border. I’ve since fixed the bug, by adding a bottom border the same color as the background.

I found out about this bug through a post that Molly Holzschlag published that referenced another post written by Chris Wilson, a member of the IE team, that listed it among the fixed bugs in IE7. When I saw the title of the bug, “Magic Creeping Text”, I knew it was my bug and sure enough a search on that term returned a description of the problem and the workaround.

Chris published his post because Microsoft has been taking a lot of heat for the release of IE7, and the fact that this first beta release hadn’t fixed some of these longterm bugs. He wanted to reassure people that the next beta release will have these bugs fixed, and to be patient.

The WaSP organization has shared in some of the heat, primarily because members such as Molly have been very supportive of Microsoft, especially since Microsoft has invited the WaSP members in to work with the organization to ensure a standards compliant browser. Many people in the web development community feel that WaSP has been romanced by Microsoft into pulling in its stinger, and I will have to admit that the WaSP of today is very different from the one of several years ago.

I remember back in the late ’90s, when the Mozilla development project was building its infrastructure that would eventually not only become the foundation for Mozilla and Firefox, but also Thunderbird and a host of other tools. I watched the members of the development team, many of whom worked for Netscape at the time, as they created a brilliant component-based architecture that I knew was going to be capable of amazing things. And, as we have seen, it has been.

This, however, slowed up the development of the tool, and at times the browser development side was slow in responding with new browser releases fixing this standards bug or that. Well, this pissed off the WaSP folks, who started a campaign to harass, and there is no other word for it, Mozilla into dropping its development on all that ‘fancy stuff’ and refocusing on delivering a browser that was standards compliant.

I wrote a couple of articles for publications about the potential of the Mozilla framework (including Digital Play Dough, Designing Applications with XUL, Web Techniques, 2000 and Browser, Browser Not for O’Reilly), but the WaSP wasn’t having any of it: that organization was Peeved at Mozilla for not delivering a standards-based browser right now.

So then I wrote Tyranny of Standards, saying:

I’ve long been a fan of the W3C, and I think that the Web and the Internet would be a much more chaotic environment without this organization. However, my fondness for the W3C does not necessarily extend itself to the WSP.

If you haven’t heard of the WSP, it is an example of what happens when standards enforcement is left to the masses. This organization’s intentions are pure: It’s a nonprofit organization of Web developers, designers, and artists who encourage browsers to support standards equally and completely. However, somewhere along the way, the WSP took on the aspect of a holy war, a Web jihad.

The WSP’s behavior is tantamount to lynch mob justice. After all, there are no gray areas of justice: only black and white, right or wrong. The same can be said of support for the enforcement of standards: A company supports standards 100 percent, or the company is noncompliant and, therefore, evil.

Note that I agree with the WSP in spirit: Our lives would be much easier if Microsoft and Mozilla and Netscape would support the W3C specifications fully and equally. I’m more than aware of the cost of having to write different Web pages for different browsers because each has implemented technologies in a different way. I’ve been doing this for years.

However, I’ve also benefited when an organization has expressed an innovation that exists outside of a specification, such as the aforementioned innerHTML, or Mozilla’s support for XUL (Extensible User Interface Language). If having all browsers be 100 percent standards compliant means not having access to these innovations, then I’ll take noncompliance even if it does mean extra effort to compensate for differences.

I encourage Microsoft and Mozilla and Netscape to support the W3C specifications and other standards, but I also encourage these same organizations to continue their innovative efforts, even if the result is a bit of chaos in a world that would otherwise run smoothly, and without a wrinkle.

And who’s to say that a little chaos is such a bad thing?

Oh, my, didn’t I hear about this post. You can see from the reader comments that few people agreed with me. Most disagreed with the words, but more than a few responded at a very personal level:

Tim Bray:

In words of one syllable (the apparent level of discourse here): It is good to add new stuff, OK?

Is this hard to understand?

legLess:

In closing, I’m frankly surprised that O’Reilly would post a piece so obviously inflamatory. There are no hard facts here, just wild and unspecific accusations. The only people who could take this fluff seriously are those completely ignorant of the subject to begin with, and that’s a sad disservice to the web at large.

Tyson Kingsbury:

While the article is well written, it seems to me that it shows the glaring difference between those that ‘do’ and those who only write about it.

I am a web designer. It’s my humble opinion that if Shelley Powers were too, this article would have been very different….Web jihad indeed…hahaha

(Author’s note: I’ve been working with web application development and design since 1994…)

Lauren B:

Content free article.

and so on

The comments weren’t just restricted to the article’s comment section. (Even showing up in later years.)

Web design and standards compliance in browsers have long been an emotionally laden topic, as designers and web page developers have been caught between clients’ unrealistic expectations and inherently buggy browsers and inconsistent application of specifications. I was philosophical about the reaction, knowing that I had used the Marketing 101 technique of “Kicking the Bear” to get my point across: taking an outrageous point of view, to make people realize that perhaps their own perspective is equally unrealistic, as they argued through why my opinion sucked and I was an idiot.

I’ve since become friends with many of the people who disagreed with me, and even worked with one of them (Simon St. Laurent) as editor of my book on RDF. The point is, I knew that I was going to generate discussion, and much of it unhappy discussion, and had to accept responsibility for the reactions to my writing.

Fast forward to 2005 and WaSP, the same WaSP that started a campaign to send obnoxious email to web designers telling them their pages were not standards compliant, is now working hand in glove with Microsoft. More, telling web designers to ‘be patient’ because IE 7 is beta and the company is trying. As Molly wrote:

As a fellow WaSP Microsoft Task Force member bluntly pointed out to me as I was trying to strategize how to respond to upset developers, WaSP should never act as Microsoft’s public relations department. And he’s absolutely right. WaSP isn’t here to forgive Microsoft for past practices.

However, as the relationship person here, I can only do my honest best to communicate both sides of what is clearly a complex concern. I can only work to assure you that I, and everyone within this Task Force is extremely motivated to make sure we keep things positive, honest, and respectful so we can continue to work together and hopefully, once and for all, achieve the goals we didn’t succeed at before

WaSP’s continued effort to work with rather than against Microsoft at a very frustrating time in history means that we all have to have patience, and we have to ask everyone to have patience with us in kind. This isn’t easy for anyone, not the Microsoft developers, not WaSP as an organization and of course not the working Web designer and developer.

Having felt the sting of the angry WaSP in the past, I will have to admit that my own jaw dropped when reading a WaSP member telling developers to be patient. With Microsoft of all companies.

Frankly, it was going against human nature to ask web page developers–frustrated for seven years with having to deal with IE bugs, all the while listening to Bill Gates smugly telling business what a superior product IE is–to focus purely on constructive criticism. Good intentions of the IE team aside, Microsoft sat on a buggy browser for years after crushing Netscape, and only now, after the growing success of Firefox, has the company responded–like a slow moving dinosaur, message finally reaching its tiny brain that someone kicked its tail months ago. The WaSP organization should have expected to take some heat.

And heat it did get, if comments in Molly’s post are anything to go by. For the most part, the heat has been directed at Microsoft, and some, indirectly, at WaSP, as an organization. In fact, unless there were a lot of personal emails and IM messages that said otherwise, there were no personal attacks in any of the commentary.

However, I can understand that not all communication happens in the open, so I wasn’t surprised to read today that Molly had been getting some flak, personally, for her defense of Microsoft and the IE team. I wouldn’t have blamed Molly for telling people to f**k off, the team is doing the best it can and to be patient for crissakes.

What I wasn’t expecting was to read the following:

Somehow by being an advocate and defending Microsoft and doing one thing – asking for patience from the community while all this unravels – has made a lot of people mad at me. This includes friends, some within WaSP and at least two I really have deep personal feelings for. That hurt so much I crawled into a bottle of wine and cried for most of the day.

I’m a sensitive girl.

For some, the idea of standards implementation is work-related, placed in a box, not worried about beyond the end of the day. For me, it’s religion. Why? I really don’t know the full answer to that, but I do know that it has to do in part with wanting to do something that strengthens the foundations of a technology I truly believe can, does and will continue to change the world in positive ways. Give something to the world that matters before I die.

Some women have families, husbands, children and other passions besides their careers. I don’t have those things. Unless I’m at a conference socializing with Web people, I live alone, eat alone, drink alone and mostly move through the world alone caring about the Web and the people who work it with a consuming, fiery passion. You can make fun of me all you want, say I’m wasting my time, I’m Don Quixote, self-destructive, I’m tilting windmills, I should get a life, I’m a dreamer, an idealist, a stupid girl.

I’m a sensitive girl. Some women have families, husbands, children and other passions besides their careers. I’m Don Quixote, self-destructive, I’m tilting windmills, I should get a life, I’m a dreamer, an idealist, a stupid girl.

And in comments, one person after another writing, “You go, girl!” and one writing: anybody who makes my little girl cry again will get their kneecaps readjusted.

I wrote in comments:

I do find that WaSP’s response to Microsoft’s effort to be a puzzle after what the group did to Mozilla about five years back. When one considers that it has taken Microsoft what, those same five years and more to finally start fixing these problems I can understand both the frustration and wariness. I would have been surprised if the WaSP expected anything less.

Having said that, I don’t think anyone should have personally attacked you, and wasn’t aware that they had. From comments I read attached to the post, it seemed more that they were angry at WaSP and Microsoft. If you were personally attacked, of course it’s wrong.

As for being a ‘sensitive girl’, and mentioning not having family, friends, etc. not sure what this has to do with your position in WaSP or your being a technologist or even your being an advocate.

I can empathize with Molly if she wants to react to being hurt by friends by crying or spending a day with a bottle of wine. Each of us reacts to hurt in our own ways. I used to cry, then I used to swear a lot and, lately, I take walks and sometimes they are very sad, and very quiet walks– but each individual must deal with hurt in their own way.

What I found troubling and disconcerting was Molly’s emphasis on being a girl–as if somehow this made the reactions that much more heinous.

Molly responded to comments, mine and others, with one of her own:

Thanks for all the kind words, folks. I needed some love as I was feeling pretty beat up there.

Many people have pointed out that taking any stand when it comes to Microsoft is going to arouse anger and frustration. Intellectually, I knew that, but until I began getting emails the other day calling me a ‘whore for satan’ and questioning my personal agenda ‘oh, you just want to keep yourself close to the consulting gigs’ and otherwise stating that what was perceived as my apologetics on behalf of Microsoft was the wrong thing to do, I had to face up to a fact I prefer to ignore: people sometimes really suck.

And once again, I’ve been asked to explain why there’s no apparent separation between the personal and the professional in my writing. Shelley says:

‘As for being a ‘sensitive girl’, and mentioning not having family, friends, etc. not sure what this has to do with your position in WaSP or your being a technologist or even your being an advocate.’

Shelley, first, please don’t misquote me – I never wrote I don’t have family or friends. I referred to husband, children and outside passions. I’m really struggling to get this communicated properly: there is no separation from the flesh-and-blood-person that I am and what I do in my career.

I am not compartmentalized. I realize that’s a fairly unique quality, and I also know that I seem to generally feel more emotion than most people. That passion and unity of vision is what enables me to do the amount of work I do, to achieve what I hope are good things for the Web and for the community of designers and developers with whom I work.

I don’t think that’s ever going to change. Even if one day I decide to stop blogging or walk away from the Web (and I actually see that happening at some point) I will still be the same way. My mother tells me I was like that from birth, and here it is 42 years later: singleminded, stubborn, highly emotional and exceptionally productive.

No one is asking Molly to become an automaton, and not to react emotionally to such personal and vicious attacks. And if someone referred to Molly as a whore for Satan, then they used Molly’s sex as a weapon to attack her at a personal level, like so many others have done in the past –using a woman’s sex in stereotypical terms as a weapon. To this person, throwing Molly’s femaleness back at her, using ‘whore’, was the worst that they could do. It was the ultimate insult. You’re not only a woman but you’re a bad woman, as society judges women.

If Molly wanted to re-assert that yes, she is a woman, but what does that and her supposed sex life have to do with her work with WaSP, good on her. And if she wanted to respond that, yes, she was hurt by such a personal attack, damn straight she should be hurt–angry, too. But how did Molly respond? She used her sex as a shield. I am a sensitive girl she writes.

I am a sensitive girl.

When you pick up a shield made of the same material as the sword being used to attack you, you don’t turn the attack; all you do is validate the use of the sword.

I had other things to write this weekend, but first I have to rediscover the reasons for doing so. I’m going for a walk.

Some of Molly’s commenters have said that I’m overreacting. That Molly was just talking about herself, and her reference to herself as ‘girl’ was part of it. Nothing more, nothing less.

Their point is good and perhaps I did overreact. I am sensitive to being a woman in tech, and how others perceive women in tech. And if I dislike guys playing the ‘girl’ card, I dislike women doing the same. However, there is no indication that’s what Molly was doing. My apologies to Molly if I caused her additional hurt.

Categories
Specs

Knots

I’ve been quieter than anticipated this week, primarily because I’m working on a very long essay, which I should be able to post tomorrow. I hope so because I need to finish my work for Roger at JournURL especially since I keep causing him work (“Say, Roger, you know wouldn’t this be nice if…”) Beware you sons and daughters of the computer, of the Mark of the Documentor.

As much as I need to finish work this week, and get outside more, I am very glad for the essay I’m writing. It’s helped me look more closely at some of the frustrations I’ve experienced the last few years, and more closely at some of the anger, too. Both aren’t necessarily gone–just better understood, which is more important. The boogeyman is just a heap of clothes when you turn on the lights. But we need the boogeyman.

In the meantime, the Atom 1.0 specification was released and there was a lot of back and forth on this, ignored by many, because most of us will produce and consume both, anyway. Ignored, that is, other than to give a nod of thanks to both sides, and say well done, because we shouldn’t take either the producing or the consuming for granted. This was hard work, and hard work should always be appreciated.

And I like how Tim Bray has learned how to apply Marketing 101: Kicking the Bear.

Categories
Specs

Google doesn’t REST

Thanks to Sam Ruby for a heads up on a potentially nasty problem with Google’s new Web Accelerator, and badly designed REST applications. He linked to two sites that go into the details. The short version is that users of a specific web service were finding that they were losing data and after investigation, the service discovered that the Web Accelerator was the culprit.

The Web Accelerator is one of Google’s newest releases, and supposedly will help with the server-side backlogs that can occur when you’re accessing a site with a faster DSL or cable connection. How it works is that when you navigate to a site it ‘pre-fetches’ information by clicking on all the various links and, it would seem, caching the results.

All dandy (if confusing; we asked for this?) except for one thing: since this little deskside bot operates on a page under your name and authority at whatever site you’re at, if you’re at a web application that has links that do things such as ‘delete’ a web page or make some other form of update, the bot is just as happy clicking those links as not. Even if there are JavaScript alerts that should ask, “Are you really sure you want to do that?”, it manages these behind the scenes. Before you even know what’s happened, your data is gone.

Leaving aside that perhaps this won’t rival Google Maps for being handy, this bot does prove out a problem that people like Sam have been pointing out for some time — we’re using REST incorrectly, and because of this, we’re going to get bit in the butt some day.

Well, there’s a rottweiler hanging off some asses now, and it has “Google” on its name tag. (Uh, no metaformat pun intended.)

REST is an extremely simple web application protocol, and is what I’m using for my metadata layer in Wordform. Before I started implementing it, I researched what is needed for an application to be RESTful, and the primary constraints are knowing when to use GET and when not to use GET. Really, really knowing when not to use GET.

You’re familiar with GET operations. In simplest form, when you search in a search engine, or access a weblog entry and there’s a URL with a bunch of parameters attached, that’s a GET request being made to the server. In this case, the parameters are passed as part of the URL.
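To make the "parameters as part of the URL" point concrete, here is a small Python sketch that pulls a GET request's parameters back out of its URL. The host name and the parameter names `q` and `page` are made up for the example.

```python
from urllib.parse import urlsplit, parse_qs

# Hypothetical search URL; the host and parameter names are made up.
url = "http://search.example.com/results?q=xml+channels&page=2"

parts = urlsplit(url)
params = parse_qs(parts.query)

print(parts.path)  # the resource being requested
print(params)      # the parameters carried in the URL itself
```

Everything the server needs to answer the request is visible in the address, which is why anything that can follow a link can repeat the request.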

Lots of applications have been using GET, not only to fetch information, but also to create or remove resources or make updates. However, what they should be doing is using methods other than GET, because this HTTP request type is only supposed to be used for operations that don’t have side effects. In other words: you can invoke the same service again and again and nothing will happen to the data with each iteration. Because of this, it’s also an operation that usually isn’t overly protected, other than perhaps a login being required to access the page or service. To the non-tech, it’s a link.

For operations that change data, we should be using POST, PUT, and DELETE. These requests are different in that all three have side effects. A POST is used to create a resource; PUT to update it; and DELETE to remove it.
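The division of labor between the four methods can be sketched with a toy in-memory service. This is only an illustration of the convention described above, not the Wordform metadata layer or any real API. The class name `PostStore`, the `handle` signature, and the status codes chosen are assumptions.

```python
class PostStore:
    """In-memory stand-in for a web service that manages weblog posts."""

    def __init__(self):
        self.posts = {}
        self.next_id = 1

    def handle(self, method, post_id=None, body=None):
        """Dispatch on the HTTP method; return (status_code, payload)."""
        if method == "GET":
            # Safe: reads only, never changes self.posts.
            if post_id in self.posts:
                return 200, self.posts[post_id]
            return 404, None
        if method == "POST":
            # Creates a new resource and reports its id.
            post_id = self.next_id
            self.next_id += 1
            self.posts[post_id] = body
            return 201, post_id
        if method == "PUT":
            # Updates an existing resource.
            if post_id not in self.posts:
                return 404, None
            self.posts[post_id] = body
            return 200, post_id
        if method == "DELETE":
            # Removes the resource.
            if post_id not in self.posts:
                return 404, None
            del self.posts[post_id]
            return 204, None
        return 405, None  # method not allowed


store = PostStore()
status, new_id = store.handle("POST", body="first draft")
store.handle("GET", new_id)
store.handle("GET", new_id)  # repeating GET changes nothing
store.handle("PUT", new_id, "second draft")
store.handle("DELETE", new_id)
```

Only the POST, PUT, and DELETE branches touch the stored data. The GET branch can be called any number of times without consequence.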

These types of operations are associated with a certain sequence of events–you click some kind of Submit button and usually another page or alert box opens up that perhaps asks if you’re really sure you want to do this; you don’t see the parameters, and you don’t even necessarily see the service the request went to before a response page is shown. They are not links, and they don’t have the same global accessibility that a link has.

More importantly, potentially destructive agents such as Google’s Web Accelerator can’t do damage when you use the four REST commands correctly.
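A small simulation shows the difference. It assumes, as the description above suggests, that a prefetching agent only ever follows links with GET. It says nothing about how Google's Web Accelerator is actually built. The toy app, its paths, and the `prefetch` helper are all hypothetical.

```python
def make_app():
    """Return (handle, data) for a toy app with two ways to delete an entry."""
    data = {"entry-1": "hello", "entry-2": "world"}

    def handle(method, path):
        # Badly designed: a plain link (GET) that deletes.
        if method == "GET" and path.startswith("/delete?id="):
            data.pop(path.split("=", 1)[1], None)
            return 200
        # Better: removal only happens on an explicit DELETE request.
        if method == "DELETE" and path.startswith("/entries/"):
            data.pop(path.rsplit("/", 1)[1], None)
            return 204
        if method == "GET":
            return 200  # ordinary page view, no side effects
        return 405

    return handle, data


def prefetch(handle, links):
    """Mimic a prefetching agent: it follows every link with GET, nothing else."""
    for link in links:
        handle("GET", link)


handle, data = make_app()
prefetch(handle, ["/entries/entry-1", "/delete?id=entry-2"])
print(sorted(data))  # entry-2 is gone; entry-1 survived the prefetch
```

The entry exposed through a GET delete link disappears as soon as the agent touches the page, while the entry that can only be removed with DELETE is still there afterwards.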

Today I tried to run my metadata flickr update application on my new weblog post and the flickr API is not responding. Since flickr is not using REST correctly –it’s using GET operations for events that have side effects–I am assuming that the web service is offline while the folks work on this. I haven’t been able to find an update anywhere on this, though, so this is my assumption only. Since I’m only using flickr for fetches, hopefully this won’t result in me having to change my code.

As for my metadata layer–it’s not as open as some of the applications that have used all GETs, but it isn’t fully RESTful either, which is why it won’t be released until it is–not when Google releases such a potentially harmful application. To be honest, though, my own pride should demand that if I’m going to use a specific protocol, I use it correctly.

Bottom line: Do not use Google Web Accelerator unless you know all web service applications you use are fully RESTful. If you do, you’ll most likely be unhappy as you watch your data disappear.

Two excellent articles on how REST works: Joe Gregorio’s How to Create a REST Protocol and Dare Obasanjo’s Misunderstanding REST: A look at the Bloglines, del.icio.us and Flickr APIs.

Oh, and don’t miss Phil’s Launch the Nuclear Weapon.