XHTML

It is possible to use HTML 4.01 to build modern, structured, and standards-compliant websites. However, to make the move to clean, semantic markup, and to be better prepared for a possible transition to XML and other future markup languages, I recommend XHTML 1.0 Strict for new websites, and it is what the examples in this document use.

XHTML 1.0 is a reformulation of HTML 4 in XML 1.0, and was developed to replace HTML. XHTML 1.0 Strict, which is what I recommend using, does not allow presentational markup (neither does HTML 4.01 Strict, but XHTML is what I’m focusing on here). Because of this, XHTML 1.0 Strict enforces separation of structure from presentation.

XHTML 1.1, which is the latest version of XHTML, is technically a bit more complicated to use, since the specification states that XHTML 1.1 documents should have the MIME type application/xhtml+xml, and should not be served as text/html. It isn’t strictly forbidden to use text/html, but it is not recommended. XHTML 1.0, on the other hand, should use application/xhtml+xml but may also use the MIME type text/html if it is HTML compatible. The W3C Note XHTML Media Types contains an overview of MIME types that are recommended by the W3C.

Unfortunately, some older web browsers, and Internet Explorer, do not recognize the MIME type application/xhtml+xml, and can end up displaying the source code, or even refuse to display the document.

If you want to use application/xhtml+xml, have the server check whether the browser requesting a document can handle that MIME type. If it can, serve the document as application/xhtml+xml; otherwise, fall back to text/html.

If you’re using PHP for server-side scripting, the following content negotiation script can be used to serve documents with different MIME types for different browsers:

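<?php
// Read the request headers defensively; either one may be missing.
$accept = isset($_SERVER["HTTP_ACCEPT"]) ? $_SERVER["HTTP_ACCEPT"] : "";
$agent = isset($_SERVER["HTTP_USER_AGENT"]) ? $_SERVER["HTTP_USER_AGENT"] : "";

if (stristr($accept, "application/xhtml+xml") ||
    stristr($agent, "W3C_Validator")) {
    // The client accepts XHTML, or it is the W3C validator.
    header("Content-Type: application/xhtml+xml; charset=iso-8859-1");
    header("Vary: Accept");
    echo "<?xml version=\"1.0\" encoding=\"iso-8859-1\"?>\n";
}
else {
    // All other clients, including Internet Explorer, get text/html.
    header("Content-Type: text/html; charset=iso-8859-1");
    header("Vary: Accept");
}
?>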

The script checks whether the user agent sends an Accept HTTP header that contains the value “application/xhtml+xml”, or whether the user agent is the W3C HTML Validator, which does not send a proper Accept HTTP header but still handles application/xhtml+xml. If either condition is true, the document is served as application/xhtml+xml, and the client is also sent an XML declaration. To other browsers, including all versions of Internet Explorer, the document is served as text/html. No XML declaration is added in that case, since it would put IE/Win into Quirks mode, which we don’t want.

After the Content-Type header, a Vary header is sent to tell intermediate caches, like proxy servers, that the content type of the document varies depending on the capabilities of the client which requests the document.
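For readers who use neither PHP nor ASP, the rule described above can be sketched in Python. This is only an illustration under assumptions: the function names choose_content_type and response_headers are made up for this example, and the code returns the header values rather than sending them, since how headers are sent depends on your server setup.

```python
def choose_content_type(accept_header, user_agent):
    """Apply the negotiation rule described above.

    Returns a (mime_type, send_xml_declaration) tuple.
    """
    # Either header may be missing from a request, so treat None as empty.
    accept = (accept_header or "").lower()
    agent = (user_agent or "").lower()

    # Case-insensitive substring checks, like stristr() in the PHP script.
    if "application/xhtml+xml" in accept or "w3c_validator" in agent:
        return "application/xhtml+xml", True
    return "text/html", False


def response_headers(accept_header, user_agent, charset="iso-8859-1"):
    """Build the two response headers as (name, value) pairs."""
    mime_type, _ = choose_content_type(accept_header, user_agent)
    return [
        ("Content-Type", f"{mime_type}; charset={charset}"),
        # Tell caches that the response depends on the Accept header.
        ("Vary", "Accept"),
    ]
```

The Vary header is returned in both cases, because a cache must not hand the text/html version to a client that would have received application/xhtml+xml, or the other way around.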

For a more advanced PHP content negotiation script, visit Serving up XHTML with the correct MIME type. That script takes the requesting user agent’s q-rating (how well it claims to handle a certain MIME type) into account, and converts XHTML to HTML 4 before sending the document as text/html to user agents that don’t handle application/xhtml+xml.
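To give an idea of what taking the q value into account can look like, here is a minimal Python sketch. It is not the script mentioned above, and it makes simplifying assumptions: the function names parse_accept and prefers_xhtml are hypothetical, wildcard entries such as */* are not expanded, and a malformed q value is treated as 0.

```python
def parse_accept(accept_header):
    """Return a dict mapping each listed media type to its q value.

    A media type without an explicit q parameter gets the default 1.0.
    """
    prefs = {}
    for item in accept_header.split(","):
        parts = [p.strip() for p in item.split(";")]
        media_type = parts[0].lower()
        if not media_type:
            continue
        q = 1.0
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # assumption: treat a malformed q as "not acceptable"
        prefs[media_type] = q
    return prefs


def prefers_xhtml(accept_header):
    """True if application/xhtml+xml is accepted and rated at least as high as text/html."""
    prefs = parse_accept(accept_header or "")
    xhtml_q = prefs.get("application/xhtml+xml", 0.0)
    html_q = prefs.get("text/html", 0.0)
    return xhtml_q > 0 and xhtml_q >= html_q
```

With this rule, a client that lists application/xhtml+xml with a lower q value than text/html is served text/html, even though a plain substring check would have sent it XHTML.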

Here is a similar script for those who use ASP and VBScript:

<%
' Set the character set before any output is written.
Response.Charset = "iso-8859-1"
If InStr(Request.ServerVariables("HTTP_ACCEPT"), _
        "application/xhtml+xml") > 0 _
Or InStr(Request.ServerVariables("HTTP_USER_AGENT"), _
        "W3C_Validator") > 0 Then
    ' The client accepts XHTML, or it is the W3C validator.
    Response.ContentType = "application/xhtml+xml"
    Response.Write("<?xml version=""1.0"" encoding=""iso-8859-1""?>" & vbCrLf)
Else
    Response.ContentType = "text/html"
End If
%>

Note that when the MIME type is application/xhtml+xml, some browsers, for example Mozilla, will not display documents that contain errors. This can be a good thing during development, but may cause problems on a live site that gets updated by people who are not XHTML experts. If you cannot ensure that all code stays valid, you may want to consider using HTML 4.01 Strict instead.

Here is a list of the things that are most important to consider when using XHTML 1.0 Strict instead of HTML:

  • Always use lower case, and quote all attributes: All element and attribute names must be in lower case. All attribute values must be quoted.

    Incorrect: <A HREF="index.html" CLASS=internal>
    Correct: <a href="index.html" class="internal">

  • Close all elements: In HTML, some elements don’t have to be closed. Such elements are automatically closed when the next element starts. XHTML does not allow that. All elements must be closed, even those that have no content, like <img>.

    Incorrect: <li>Item 1
    Correct: <li>Item 1</li>

    Incorrect: <p>Lorem ipsum dolor sit amet, consectetuer adipiscing elit.
    Correct: <p>Lorem ipsum dolor sit amet, consectetuer adipiscing elit.</p>

    Incorrect: <br>
    Correct: <br />

    Incorrect: <img src="image.jpg" alt="">
    Correct: <img src="image.jpg" alt="" />

  • Attributes cannot be minimized: In HTML, certain attributes can be minimized. XHTML does not allow this.

    Incorrect: <input type="checkbox" id="checkbox1" name="checkbox1" checked>
    Correct: <input type="checkbox" id="checkbox1" name="checkbox1" checked="checked" />

  • Don’t use deprecated elements: Some elements and attributes that are deprecated but still allowed in HTML 4.01 Transitional and XHTML 1.0 Transitional are not allowed in XHTML 1.0 Strict (or in HTML 4.01 Strict). A few examples are <font>, <center>, alink, align, width, height (for some elements), and background.
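Several of the rules above (closing all elements, quoting attribute values, not minimizing attributes) are XML well-formedness rules, so an XML parser can catch violations before a browser does. The following Python sketch uses the standard library’s xml.etree.ElementTree for that. The function name is_well_formed is made up for this example, and the check covers well-formedness only: it does not validate against the XHTML DTD, so it will not flag upper case names or deprecated elements.

```python
import xml.etree.ElementTree as ET


def is_well_formed(fragment):
    """Return True if the fragment parses as XML with a single root element."""
    try:
        ET.fromstring(fragment)
    except ET.ParseError:
        return False
    return True


# Pairs taken from the incorrect/correct examples in the list above,
# wrapped in a parent element where needed.
samples = [
    '<ul><li>Item 1</ul>',
    '<ul><li>Item 1</li></ul>',
    '<p>Lorem ipsum<br></p>',
    '<p>Lorem ipsum<br /></p>',
    '<input type="checkbox" checked>',
    '<input type="checkbox" checked="checked" />',
]

for sample in samples:
    status = "well-formed" if is_well_formed(sample) else "NOT well-formed"
    print(f"{status}: {sample}")
```

A check like this can be run on content before it is published, which is one way to reduce the risk of a browser refusing to display a page served as application/xhtml+xml.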

Doctype

Currently, very few HTML documents have a correct and full doctype, or document type declaration. It used to be more decorative than functional, but starting a few years ago, the presence of a doctype can greatly affect the rendering of a document in a web browser.

All HTML and XHTML documents must have a doctype declaration to be valid. The doctype states what version of HTML or XHTML is being used in the document, and is used by the validator when validating, and by web browsers to determine which rendering mode to use. If a correct and full doctype is present in a document, many web browsers will switch to standards mode, which means that they will follow the CSS specification more closely. The document will also render more quickly because the browser doesn’t have to interpret and try to compensate for invalid HTML. This will also reduce the difference in rendering between browsers.

The following doctype declares that the document is XHTML 1.0 Strict, and will make the web browsers that have so-called “doctype switching” use their standards mode.

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">

Character encoding

All XHTML documents should specify their character encoding.

The best way of specifying the character encoding is to configure the web server to send an HTTP content-type header with the character encoding. For detailed information on how to do this, check the documentation for the web server software you are using.

If you’re using Apache, you can specify the character encoding by adding one or more rules to your .htaccess file. For example, if all your files use utf-8, add this:

AddDefaultCharset utf-8

To specify a character encoding for files with a certain filename extension, use this:

AddCharset utf-8 .html

If your server lets you run PHP scripts, you can use the following to specify the character encoding:

<?php
    // The header value must be on a single line.
    header("Content-Type: application/xhtml+xml; charset=utf-8");
?>

To serve your pages as HTML, change application/xhtml+xml to text/html.

If you, for whatever reason, are unable to configure your web server to properly specify the character encoding you are using, use a <meta> element in the document’s <head> section. It’s a good idea to specify the character encoding this way even if your server is configured correctly.

For example, the following <meta> element tells the browser that a document uses the ISO-8859-1 character encoding:

<meta http-equiv="content-type" content="text/html; charset=ISO-8859-1" />
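Both the HTTP header and the <meta> element carry the encoding as a charset parameter on a content type value. To illustrate how a client can read that parameter, here is a small Python sketch using the standard library’s email.message.Message class, which parses this kind of header. The function name charset_from_content_type is made up for this example.

```python
from email.message import Message


def charset_from_content_type(header_value):
    """Return the charset parameter of a Content-Type value, or None if absent.

    get_content_charset() returns the charset name in lower case.
    """
    msg = Message()
    msg["Content-Type"] = header_value
    return msg.get_content_charset()


for value in (
    "text/html; charset=ISO-8859-1",
    "application/xhtml+xml; charset=utf-8",
    "text/html",
):
    print(value, "->", charset_from_content_type(value))
```

The last value has no charset parameter, which is the situation a browser is in when neither the server nor the document specifies the encoding: it has to guess.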