Date Posted: 25 June 2005
Last Updated: 16 September 2005
In May of 2005, my friend, fellow grad student, and (most recently)
co-worker Dan Noland posted a
log entry expressing indignation regarding flagrant
abuse of well-understood standards by popular C/C++ compilers.
While I appreciated his post on a surface level, I couldn't get very
worked up about it. In retrospect, the reason I couldn't get worked
up about it is that I hadn't been the one bitten by
brain-dead, non-compliant software.
That has since changed; thus begins the rant.
The native language for Web documents is the HyperText Markup
Language (HTML). It has gone through a couple of releases as more
and more functionality was requested from web browsers. The official
specification for HTML can be found
here.
It very clearly spells out exactly what is valid HTML and how
it should be rendered on the screen.
XHTML
is a reformulation of HTML 4 into XML. XHTML basically "beefs up"
the HTML standard so that the resulting code is XML-compliant. This
means that all XHTML documents are guaranteed to be parseable by
XML parsers while still maintaining backwards-compatibility with
HTML parsers. The two main advantages of XHTML in my opinion are
that XHTML is much more precise than HTML, and that XHTML
documents can be parsed with an XML parser that, assuming it was
well-implemented, should be both smaller and faster than an
HTML-only parser.
One weekend, when I clearly had far too much free time on my hands,
I decided to migrate my home page
to XHTML from the ancient, non-standard HTML I had written it in back
in undergrad. It took a goodly while, as I quickly realized I had some
unexpected learning to do, including my first introduction to
Cascading Style Sheets (CSS).
After a handful of curse-filled hours, the website was in a state
I was happy with, not to mention it was compliant with the
XHTML 1.0
Strict standard, as verified by the W3C's own validator. I considered the project a success and
proceeded to spend the remainder of my weekend in far more social
and healthy ways.
The next week at work, I showed my new homepage to a co-worker. We
pulled up my page in his default browser, which happened to be
Internet Explorer 6.0, and I had a very rude shock. The table that
contained the links on the page wasn't centered as it was supposed to
be. I figured that I'd forgotten to save the last batch of changes
I'd made, so I wandered back to my computer and pulled it up. The
table was centered, just like it should be. The only difference I
could see? I was viewing the webpage in Firefox 1.0.4.
Here is the page
in question. In Firefox, the main table centers as it should.
In IE6, the table stubbornly sticks to the left-hand side of the
screen. You can click the "W3C XHTML 1.0" button at the bottom and
it will assure you that the code in question does, in fact, meet the
XHTML 1.0 Strict standard.
"Waittasecond," I thought to myself, "the whole goal of this was to
write an XHTML document that conformed to a very strict standard.
As such, there should be zero ambiguity about how
it should be rendered by a browser. How could this be?"
The almighty Google quickly revealed some disturbing information. I found a forum discussion indicating that Internet Explorer did not like the "XML prologue" I was using. This prologue was the first line of the XHTML document:
<?xml version="1.0" encoding="utf-8"?>
The very next line in the document is a DOCTYPE
declaration, which should force the browser into standards-compliant mode.
However, Internet Explorer 6 does not behave that way. Internet
Explorer 6 sees the XML prologue, decides it doesn't know what to do
with it, and switches to
"quirks mode." As
Patrick Griffiths put it, when the browser is in quirks mode it
"...will think that you don't know what the hell you're doing and
make up [its] own mind on what to do with your code. You can be the
greatest HTML ninja ever to have walked the earth. Your HTML can be
flawless and your CSS simply perfect, but ... your web pages can look
like they were put together by a short-sighted, one-eyed infant gibbon
with learning difficulties."
(Edit: after trading
e-mails with Scott Yost, I feel I should stress the fact that I'm a
big fan of browsers having a quirks mode; the number
of good, compliant web pages is tiny compared to the number of
pages with garbage HTML that some high-schooler shotgunned into his
editor. I just feel strongly that this is a case where IE shouldn't
have fallen back to quirks mode.)
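To make the failure concrete, here is a deliberately simplified model of the decision described above, written in Python. It encodes only what this post describes (an XML declaration ahead of the DOCTYPE sends IE6 into quirks mode) and assumes that a DOCTYPE in first position means standards mode. IE6's real doctype sniffing also depends on *which* DOCTYPE is used, which this sketch ignores; the function name `ie6_rendering_mode` is hypothetical.

```python
import re

# Matches an XML declaration such as <?xml version="1.0" encoding="utf-8"?>
XML_DECLARATION = re.compile(r"<\?xml\b[^?]*\?>")

def ie6_rendering_mode(document: str) -> str:
    """Simplified, hypothetical model of the behavior described above.

    Only the first thing in the document is examined: an XML
    declaration there means quirks mode, a DOCTYPE there means
    standards mode. Which DOCTYPE is used is deliberately ignored.
    """
    head = document.lstrip()
    if XML_DECLARATION.match(head):
        return "quirks"       # prologue precedes the DOCTYPE
    if head[:9].upper() == "<!DOCTYPE":
        return "standards"    # DOCTYPE comes first
    return "quirks"           # no DOCTYPE in first position at all

DOCTYPE = ('<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" '
           '"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">')
PROLOGUE = '<?xml version="1.0" encoding="utf-8"?>'

print(ie6_rendering_mode(PROLOGUE + "\n" + DOCTYPE + "\n<html></html>"))  # quirks
print(ie6_rendering_mode(DOCTYPE + "\n<html></html>"))                    # standards
```

The only input that changes between the two calls is the leading XML declaration, which is exactly the difference between the two versions of the page.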
At this point, I figured the problem was mine since I was very new to
XHTML. So I started questioning my assumptions. Where had I gotten
the idea to include that XML prologue anyway? Oh, yes, that's right;
I got it from a fairly shady reference: the
XHTML specification itself.
In
section
3.1.1 of the specification, misleadingly titled (*cough*)
"Strictly Conforming Documents," they give an example
XHTML document -- including the XML prologue. In fact, in
that section of the XHTML specification, they quite explicitly say that
the XML prologue should be included:
An XML declaration is not required in all XML documents; however XHTML document authors are strongly encouraged to use XML declarations in all their documents.
This makes sense. An XHTML document is, by definition, an XML document. In section 2.8 ("Prolog and Document Type Declaration") of the specification for XML documents, it says:
XML documents SHOULD begin with an XML declaration which specifies the version of XML being used.
Internet Explorer 6
publicly
claims to be XHTML-compliant. Contrary to that marketing, however,
Microsoft's poor adherence to a widely accepted, peer-reviewed, and
unambiguous standard means that Internet Explorer 6 incorrectly renders
XHTML documents that are completely valid, as verified by the very
standards organization that wrote the XHTML standard.
If I leave the XML prologue out of the document, as I do
here, IE and Firefox both render
the document correctly. I heartily encourage everyone to diff this
version of the file with the one above that renders differently between
the two browsers -- the only difference is the XML
prologue.
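If you don't have a dedicated diff tool handy, Python's standard `difflib` module can do the comparison. The sketch below uses short hypothetical stand-ins for the two files rather than the real page source; against the real files you would read each one into a list of lines instead.

```python
import difflib

DOCTYPE = ('<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" '
           '"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">')

# Hypothetical stand-ins for the two versions of the page.
with_prologue = [
    '<?xml version="1.0" encoding="utf-8"?>',
    DOCTYPE,
    '<html xmlns="http://www.w3.org/1999/xhtml">',
    '<head><title>Home</title></head>',
    '<body><p>Hello</p></body>',
    '</html>',
]
without_prologue = with_prologue[1:]  # identical except for the first line

def changed_lines(old, new):
    """Return only the lines ndiff marks as removed ('- ') or added ('+ ')."""
    return [line for line in difflib.ndiff(old, new)
            if line.startswith(("- ", "+ "))]

for line in changed_lines(with_prologue, without_prologue):
    print(line)  # prints: - <?xml version="1.0" encoding="utf-8"?>
```

`ndiff` is used rather than `unified_diff` so there are no `---`/`+++` file headers to filter out.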
My homepage's PHP code now has a workaround in it. The web server
checks whether the browser advertises support for the
application/xhtml+xml content type. In
Section
5.1 ("Internet Media Types") of the XHTML specification, it says
that XHTML content may be served as
text/html or application/xhtml+xml.
IE6 supports only the text/html content type, so in that
case my homepage is sent to IE6 as text/html,
without the XML prologue.
Firefox, on the other hand, does support
application/xhtml+xml. In this case, the server sends the
homepage as application/xhtml+xml
with the XML prologue.
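The author's PHP workaround isn't shown here, so the following is only a minimal sketch of the same content-negotiation idea in Python. It assumes the decision is made from the HTTP `Accept` request header; the names `accepts_xhtml` and `negotiate`, and the choice of `charset=utf-8`, are hypothetical. A real deployment would also send a `Vary: Accept` response header so that caches keep the two variants apart.

```python
XML_DECLARATION = '<?xml version="1.0" encoding="utf-8"?>\n'

def accepts_xhtml(accept_header: str) -> bool:
    """True if the Accept header explicitly lists application/xhtml+xml
    with a nonzero quality value. A bare */* wildcard is deliberately
    NOT treated as support: IE6 sends */* but cannot render XHTML
    served as application/xhtml+xml."""
    for item in accept_header.split(","):
        parts = [p.strip() for p in item.split(";")]
        if parts[0].lower() != "application/xhtml+xml":
            continue
        quality = 1.0  # default q when none is given
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    quality = float(value)
                except ValueError:
                    quality = 0.0  # treat a malformed q as "not acceptable"
        return quality > 0
    return False

def negotiate(page: str, accept_header: str):
    """Return (Content-Type, body) for the requesting browser.

    `page` is the document WITHOUT an XML declaration; the declaration
    is prepended only when the page is served as application/xhtml+xml.
    """
    if accepts_xhtml(accept_header):
        return "application/xhtml+xml; charset=utf-8", XML_DECLARATION + page
    return "text/html; charset=utf-8", page

PAGE = ('<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" '
        '"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">\n'
        '<html xmlns="http://www.w3.org/1999/xhtml">...</html>')

# Illustrative Accept headers, not captured from real browsers.
print(negotiate(PAGE, "text/html,application/xhtml+xml,*/*;q=0.5")[0])
# prints: application/xhtml+xml; charset=utf-8
print(negotiate(PAGE, "image/gif, image/jpeg, */*")[0])
# prints: text/html; charset=utf-8
```

Keeping the stored page free of the declaration and adding it only on the XHTML branch means the text/html variant can never accidentally carry the prologue that trips up IE6.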
The homepage now renders correctly in both browsers. You can rest
assured that I'm only going to use Firefox for the foreseeable future,
however.
The ever-vigilant Scott Yost found a recent entry on the
IE 7 Developer Blog that says this exact bug is being fixed!
They say they won't be supporting the
application/xhtml+xml content type, which is slightly
disappointing. Since supporting it is optional, I can easily forgive
that decision, however.
I felt the bug was serious enough to spend a significant portion of a weekend afternoon writing this entry about the bug. I'm glad to hear that the IE 7 development team agreed with me.