Interoperability Standards Are Your Friend

Date Posted: 25 June 2005
Last Updated: 16 September 2005

Background

In May of 2005, my friend, fellow grad student, and (most recently) co-worker Dan Noland posted a log entry expressing indignation regarding flagrant abuse of well-understood standards by popular C/C++ compilers. While I appreciated his post on a surface level, I couldn't get very worked up about it. In retrospect, I couldn't get worked up about it because I hadn't been the one bitten by brain-dead, non-compliant software.

That has since changed, and thus begins the rant.

A Brief History of Ti-- Err, Web Standards

The native language for Web documents is the Hyper-Text Markup Language (HTML). It has gone through a couple of releases as more and more functionality was requested from web browsers. The official specification for HTML can be found here. It very clearly spells out exactly what is valid HTML and how it should be rendered on the screen.

XHTML is a reformulation of HTML 4 into XML. XHTML basically "beefs up" the HTML standard so that the resulting code is XML-compliant. This means that all XHTML documents are guaranteed to be parseable by XML parsers while still maintaining backwards-compatibility with HTML parsers. The two main advantages of XHTML in my opinion are that XHTML is much more precise than HTML, and that XHTML documents can be parsed with an XML parser that, assuming it was well-implemented, should be both smaller and faster than an HTML-only parser.

The Problem

One weekend, when I clearly had far too much free time on my hands, I decided to migrate my home page from the ancient, non-standard HTML it was written in back in undergrad to XHTML. It took a goodly while, as I quickly realized I had some unexpected learning to do, including my first introduction to Cascading Style Sheets (CSS).

After a handful of curse-filled hours, the website was in a state I was happy with, not to mention compliant with the XHTML 1.0 Strict standard, as verified by the W3C itself. I considered the project a success and proceeded to spend the remainder of my weekend in far more social and healthy ways.

The next week at work, I showed my new homepage to a co-worker. We pulled up my page in his default browser, which happened to be Internet Explorer 6.0, and I had a very rude shock. The table that contained the links on the page wasn't centered as it was supposed to be. I figured that I'd forgotten to save the last batch of changes I'd made, so I wandered back to my computer and pulled it up. The table was centered, just like it should be. The only difference I could see? I was viewing the webpage in Firefox 1.0.4.

Here is the page in question. In Firefox, the main table centers as it should. In IE6, the table stubbornly sticks to the left-hand side of the screen. You can click the "W3C XHTML 1.0" button at the bottom and it will assure you that the code in question does, in fact, meet the XHTML 1.0 Strict standard.

"Waittasecond," I thought to myself, "the whole goal of this was to write an XHTML document that conformed to a very strict standard. As such, there should be zero ambiguity about how it should be rendered by a browser. How could this be?"

XML Prologue

The almighty Google quickly revealed some disturbing information. I found a forum discussion that indicated that Internet Explorer did not like the "XML prologue" I was using. This prologue was the first line of the XHTML document which contained this:

<?xml version="1.0" encoding="utf-8"?>

The very next line in the document is a DOCTYPE declaration, which should force the browser into standards-compliant mode. However, Internet Explorer 6 does not behave that way. Internet Explorer 6 sees the XML prologue, decides it doesn't know what to do with it, and switches to "quirks mode." As Patrick Griffiths put it, when the browser is in quirks mode it "...will think that you don't know what the hell you're doing and make up [its] own mind on what to do with your code. You can be the greatest HTML ninja ever to have walked the earth. Your HTML can be flawless and your CSS simply perfect, but ... your web pages can look like they were put together by a short-sighted, one-eyed infant gibbon with learning difficulties."

(Edit: after trading e-mails with Scott Yost, I feel I should stress the fact that I'm a big fan of browsers having a quirks mode; the number of good, compliant web pages is tiny compared to the number of pages with garbage HTML that some high-schooler shotgunned into his editor. I just feel strongly that this is a case where IE shouldn't have fallen back to quirks mode.)

At this point, I figured the problem was mine since I was very new to XHTML. So I started questioning my assumptions. Where had I gotten the idea to include that XML prologue anyway? Oh, yes, that's right; I got it from a fairly shady reference: the XHTML specification itself.

In section 3.1.1 of the specification, misleadingly titled (*cough*) "Strictly Conforming Documents," they give an example XHTML document -- including the XML prologue. In fact, in that section of the XHTML specification, they quite explicitly say that the XML prologue should be included:

An XML declaration is not required in all XML documents; however XHTML document authors are strongly encouraged to use XML declarations in all their documents.

This makes sense. An XHTML document is, by definition, an XML document. In section 2.8 ("Prolog and Document Type Declaration") of the specification for XML documents, it says:

XML documents SHOULD begin with an XML declaration which specifies the version of XML being used.

Determinations

Internet Explorer 6 publicly claims to be XHTML-compliant. However, due to poor adherence to a widely-accepted, peer-reviewed, and unambiguous standard -- contrary to their marketing -- Microsoft's Internet Explorer 6 incorrectly renders completely valid XHTML documents, as verified by the standards organization that wrote the XHTML standard.

If I leave the XML prologue out of the document, as I do here, IE and Firefox both render the document correctly. I heartily encourage everyone to diff this version of the file with the one above that renders differently between the two browsers -- the only difference is the XML prologue.
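For anyone who would rather script that comparison, here is a minimal sketch in Python. The two documents are hypothetical stand-ins rather than my actual homepage, and changed_lines is a helper name made up for this illustration. It only shows the shape of the check: the one line that differs is the XML prologue.

```python
import difflib

# Hypothetical stand-in documents, not the real homepage source.
DOCTYPE = ('<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" '
           '"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">')

without_prologue = [
    DOCTYPE,
    '<html xmlns="http://www.w3.org/1999/xhtml">',
    "</html>",
]
with_prologue = ['<?xml version="1.0" encoding="utf-8"?>'] + without_prologue


def changed_lines(a, b):
    """Return only the lines that ndiff marks as removed or added."""
    return [line for line in difflib.ndiff(a, b) if line[:2] in ("- ", "+ ")]


diff = changed_lines(with_prologue, without_prologue)
for line in diff:
    print(line)
```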

Solution

My homepage's PHP code now has a workaround in it. The webserver inspects the browser's request to see whether it advertises support for the application/xhtml+xml content type. In Section 5.1 ("Internet Media Types") of the XHTML specification, it says that XHTML content may be served as text/html or application/xhtml+xml.

IE 6 only supports the text/html content type. In that case, my homepage is sent to IE6 as text/html without the XML prologue.

Firefox, on the other hand, does support application/xhtml+xml. In this case, the server sends the homepage as application/xhtml+xml with the XML prologue.
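The logic is simple enough to sketch. What follows is not my actual PHP; it's a rough Python illustration that assumes the check is made against the HTTP Accept request header. The function names (accepts_xhtml, negotiate) and the sample header strings are invented for the example. It also deliberately ignores wildcards such as */*, on the assumption that a browser which truly handles application/xhtml+xml will list it by name.

```python
XML_PROLOGUE = '<?xml version="1.0" encoding="utf-8"?>\n'


def accepts_xhtml(accept_header):
    """Return True if the Accept header names application/xhtml+xml with q > 0.

    Wildcards (*/*) are deliberately not treated as a match.
    """
    for item in accept_header.split(","):
        parts = [p.strip() for p in item.split(";")]
        if parts[0].lower() != "application/xhtml+xml":
            continue
        q = 1.0  # HTTP default quality when no q parameter is given
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed q value as "not acceptable"
        return q > 0
    return False


def negotiate(accept_header, body):
    """Choose the content type and decide whether to prepend the XML prologue."""
    if accepts_xhtml(accept_header):
        return "application/xhtml+xml", XML_PROLOGUE + body
    return "text/html", body


# Illustrative headers only, not captured from real browsers.
print(negotiate("application/xhtml+xml,text/html;q=0.9", "<html/>")[0])
print(negotiate("image/gif, image/jpeg, */*", "<html/>")[0])
```

The first call picks application/xhtml+xml and keeps the prologue; the second falls back to text/html and leaves the body untouched.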

The homepage now renders correctly in both browsers. You can rest assured that I'm only going to use Firefox for the foreseeable future, however.

2005-09-16: Bug Fixed in IE 7

The ever-vigilant Scott Yost found a recent entry on the IE 7 Developer Blog that says this exact bug is being fixed! They say they won't be supporting the application/xhtml+xml content type, which is slightly disappointing. Since supporting it is optional I can easily forgive that decision, however.

I felt the bug was serious enough to spend a significant portion of a weekend afternoon writing this entry about it. I'm glad to hear that the IE 7 development team agreed with me.


Copyright © 2005-2017, Terry D. Ott
