
news and thoughts on and around the development of the iCite net
by Jay Fienberg

Good grief over escapes in XML

posted: Aug 25, 2003 12:35:35 AM

The possibility of working with escaped markup in XML, along the lines of RSS, is something that I am dealing with right now in my work on the iCite net. So, I have been interested in the opinions about this on the Atom/N/Echo/Pie wiki, and I just read Norman Walsh's recent article on XML.com titled "Embedded Markup Considered Harmful".

Mark Pilgrim mentioned Norman's article in his "Everything considered harmful" post, and also linked to the always far out Ted Nelson's 1997 article on XML.com, also titled "Embedded Markup Considered Harmful".

There is also a good discussion of Norman's article on Sam Ruby's blog in a post titled "Escaped Markup Considered Harmful". The back-and-forth there, especially between Tim Bray and Dare Obasanjo on character encoding issues in escaped content, is particularly good.

By now, at least, I hope you are glad that I didn't also name this post "Escaped Markup Considered Harmful".

As Mark points out, Ted Nelson is talking about all markup embedded in text as harmful, whereas everyone else is discussing escaped markup within a marked-up document.

It was interesting for me to read Ted Nelson's ideas because I had, a long time ago (before I had learned about SGML, before the web, etc., but after reading his "Literary Machines") developed a system that attempted to do what he describes as "parallel markup".

As I am writing this document, I am marking it up with basic XHTML markup. I do this so often that it has become second nature to me. In fact, I think the semantics of basic language presentation layout (e.g., putting in paragraphs, quotes, italics, etc.) are quite a natural part of typing text.

I mention this because I think Ted Nelson's logic around the separation of concerns of text and markup is totally insightful, but I also find a certain amount of embedded markup as natural as typing, and both, I find, require more notation than speaking.

But, anyway, with regard to the points about escaped markup within markup, I think there are two different concepts converging in the case of RSS-type feeds: a new document of content and a new document that references other content.

On the one hand, an RSS-type feed is an XML document of content. Norman Walsh's point is basically that this document is XML, and so it shouldn't wrongly use escaping to stuff possibly invalid markup into the document.
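
To make the difference concrete, here is a minimal sketch in Python using the standard library's ElementTree parser. The two feed items are made up for illustration and are not taken from any real feed. It only shows what an XML parser sees in each case: escaped markup arrives as one opaque string, while embedded markup arrives as real elements.

```python
import xml.etree.ElementTree as ET

# Hypothetical feed item carrying escaped markup, RSS-style.
escaped_item = (
    "<item><description>"
    "&lt;p&gt;Hello &lt;em&gt;world&lt;/em&gt;&lt;/p&gt;"
    "</description></item>"
)

# The same content embedded as real, well-formed markup.
embedded_item = (
    "<item><description>"
    "<p>Hello <em>world</em></p>"
    "</description></item>"
)

escaped_desc = ET.fromstring(escaped_item).find("description")
embedded_desc = ET.fromstring(embedded_item).find("description")

# Escaped: the parser hands back a plain string and no child elements.
# Whether that string is valid markup is left for the consumer to find out.
escaped_text = escaped_desc.text
escaped_child_count = len(escaped_desc)

# Embedded: the parser sees a <p> element that contains an <em> element.
embedded_child_tags = [child.tag for child in embedded_desc]
emphasized_text = embedded_desc.find("p/em").text
```

In the escaped case, a consumer has to run a second parse over `escaped_text`, with no guarantee that it will succeed.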

On the other hand, the RSS-type feed document is really a compilation of other documents. So, this blog post shows up in my RSS-type feeds, but each feed is really compiled from posts like this one, and that compilation is what I would call a convenience that helps bridge remote and local referencing.

In the iCite net, this kind of compilation is thought of as a form of caching. In other words, the feed references documents elsewhere, and, as a convenience, it may include copies of those documents for more immediate consumption.

I mention all of this because I think there are two cases for processing: 1) you can convert the original document to well-formed Unicode XML and it becomes part of a single Unicode XML document, and 2) you can't because of markup issues, multiple character sets, etc.
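
The two cases can be sketched as a simple check. This is only an illustration under my own assumptions: the function name `classify_content` and the "embed" / "reference" labels are hypothetical, and a real feed generator would likely try to repair or transcode content before giving up on it.

```python
import xml.etree.ElementTree as ET


def classify_content(raw: bytes) -> str:
    """Return "embed" if raw parses as well-formed XML, else "reference".

    Hypothetical helper: "embed" corresponds to case 1 (the content can
    become part of the single XML document), "reference" to case 2 (the
    feed can only point at the original document).
    """
    try:
        # Passing bytes lets the parser apply the XML encoding rules itself
        # (UTF-8 unless the document declares something else).
        ET.fromstring(raw)
    except ET.ParseError:
        return "reference"
    return "embed"


# Well-formed UTF-8 content: case 1.
well_formed = "<p>caf\u00e9</p>".encode("utf-8")

# An unclosed <br> tag, as in much HTML: case 2 because of markup issues.
unclosed_tag = b"<p>Hello<br></p>"

# Latin-1 bytes with no encoding declaration: case 2 because of
# character set issues.
wrong_charset = "<p>caf\u00e9</p>".encode("latin-1")

results = [
    classify_content(doc)
    for doc in (well_formed, unclosed_tag, wrong_charset)
]
```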

In that second case, you can nevertheless reference all the documents and, in the iCite net (or so I am trying!), deliver all those separate documents as a single package of data but not as a single XML document.
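
One way to picture such a package is a MIME multipart container, where each document keeps its own markup and character set. To be clear, this is an assumption for the sake of illustration and not a description of how the iCite net actually packages data. The `build_package` function and the example.org URLs are hypothetical.

```python
from email.message import EmailMessage


def build_package(documents):
    """Bundle documents into one multipart package.

    `documents` is a list of (url, subtype, charset, raw_bytes) tuples.
    Each document is stored as its original bytes, so nothing has to be
    converted to a single character set or parsed as XML.
    """
    package = EmailMessage()
    package.make_mixed()
    for url, subtype, charset, raw in documents:
        part = EmailMessage()
        part.set_content(
            raw,
            maintype="text",
            subtype=subtype,
            params={"charset": charset},
        )
        # Record where the original document lives, so that the copy in
        # the package still acts as a cache of a referenced document.
        part["Content-Location"] = url
        package.attach(part)
    return package


# Two made-up documents: one well-formed UTF-8 XML, one Latin-1 HTML
# with an unclosed tag that could not be embedded in an XML feed.
xml_doc = "<post><title>caf\u00e9</title></post>".encode("utf-8")
html_doc = "<p>caf\u00e9<br>".encode("latin-1")

package = build_package([
    ("http://example.org/posts/1", "xml", "utf-8", xml_doc),
    ("http://example.org/posts/2", "html", "iso-8859-1", html_doc),
])
parts = list(package.iter_parts())
```

Each part can then be read back byte-for-byte with `get_payload(decode=True)`, along with its own declared character set.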




