news and thoughts on and around the development of the iCite net
by Jay Fienberg

To X or not to X HTML, that could always be a fine question

posted: Aug 3, 2003 4:46:25 PM

MikeyC left some good comments on my posts about XHTML ("XHTML the way mom used to make it" and "Mark's dips into XHTML"). What follows are my responses to his comments.

I started using HTML around version 2, after some earlier experience with SGML. From the perspective of SGML parsing, I see no big reason for XHTML. In other words, if someone creates a well-formed HTML document, an SGML parser should be able to parse it cleanly.

An XHTML document isn't necessarily more likely to be well-formed than an HTML document. But my argument has probably sounded as if I think XHTML is good because I expect XHTML documents to be more likely well-formed, and easier to parse consistently, than HTML documents.

This is an essentially flawed argument (at least from the SGML perspective), and I am guilty of some wishful thinking about this.

My real interest in parsing X/HTML is in being able to derive and consistently make use of some "semantic" structure in any particular document. For example, looking at this page as X/HTML, is there a consistent way to extract just this blog post and not the page navigation?
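A minimal sketch of that kind of extraction, assuming the page is well-formed XHTML and that the post is wrapped in an element carrying a known id. The id values "post" and "nav", the sample markup, and the helper names are all hypothetical, made up for illustration. Nothing in HTML or XHTML itself says which element holds the post, so this only works where such a convention exists.

```python
import xml.etree.ElementTree as ET

# Hypothetical page: the ids "nav" and "post" are assumptions, not part of any spec.
PAGE = """<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>blog</title></head>
  <body>
    <div id="nav"><a href="/">home</a></div>
    <div id="post"><h2>To X or not to X HTML</h2><p>Post text.</p></div>
  </body>
</html>"""


def extract_by_id(xhtml_text, element_id):
    """Return the first element whose id attribute matches, or None."""
    root = ET.fromstring(xhtml_text)  # raises ParseError if not well-formed XML
    for element in root.iter():
        if element.get("id") == element_id:
            return element
    return None


def text_of(element):
    """Join the non-blank text pieces inside an element with single spaces."""
    pieces = (piece.strip() for piece in element.itertext())
    return " ".join(piece for piece in pieces if piece)


post = extract_by_id(PAGE, "post")
print(text_of(post))
```

The parser only guarantees a tree; deciding that the element with id "post" is the blog entry is a convention layered on top, not something the document type provides.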

The document definitions of HTML and XHTML, as hierarchic structures of "semantic" meaning, are almost totally equivalent. In particular, neither says much about the meaning of the document. So, I don't think valid HTML or valid XHTML gives more refined meaning than simply well-formed HTML (though a valid document doesn't contain certain "surprises" that might get in the way of parsing).

Nevertheless, XHTML is necessary to whatever degree it is useful to know that an HTML document is also an XML document. A well-formed HTML 4 document may or may not be well-formed XML, but a well-formed XHTML document is always a well-formed XML document. (Right?)
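A small sketch of that difference, using the XML parser in Python's standard library. The two fragments and the function name are made up for illustration, and the DOCTYPE declarations that real documents would carry are left out to keep the example short. The HTML 4 style fragment uses an unclosed <br>, which HTML 4 allows but XML does not.

```python
import xml.etree.ElementTree as ET

# HTML 4 style: <br> has no end tag, which is fine for HTML but not for XML.
HTML4_FRAGMENT = (
    "<html><head><title>t</title></head>"
    "<body><p>one<br>two</p></body></html>"
)

# XHTML style: the empty element is written <br />, so every tag is closed.
XHTML_FRAGMENT = (
    '<html xmlns="http://www.w3.org/1999/xhtml"><head><title>t</title></head>'
    "<body><p>one<br />two</p></body></html>"
)


def is_well_formed_xml(text):
    """Return True if an XML parser accepts the text, False otherwise."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


print(is_well_formed_xml(HTML4_FRAGMENT))
print(is_well_formed_xml(XHTML_FRAGMENT))
```

An HTML 4 document that happens to close every element and quote every attribute would pass the same check, which is the "could be, or could not be" case.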

So, at this point, I am not using any SGML parsers but am using XML parsers. Pretty much, I think XHTML is good for fitting into XML processing scenarios.

What I would argue is that it seems good for XHTML to exist alongside HTML. I think it is good to have a common XML document type (XHTML 1) that matches (at least a subset of) HTML 4 and can be used in HTML browsers.

I think it is good if there are HTML documents that can be parsed directly as XML (because they are XHTML). Maybe I am mistaken to assume this, but I think XHTML 1 is good as, essentially, a convenient XML subset of HTML 4.

With XHTML 2.0, I am not excited about XHTML developing as a replacement for HTML, or about XHTML totally deprecating major features of HTML. With this, XHTML ceases to be a subset of HTML, and I don't see that as a good course of development for X/HTML.

