news and thoughts on and around the development of the iCite net
by Jay Fienberg

Weeds in your RSS, to parse or not to parse

posted: Feb 3, 2004 2:32:40 PM

A lot of RSS is full of weeds. What do you do?

Danny Ayers' post "A matter of form" critiques some of Mark Pilgrim's comments in Mark's post about his Universal Feed Parser 3.0 beta release. I added some comments to Danny's post suggesting that Danny and Mark seemed to be using the word "data" in slightly different ways, which, I believe, might account for some of their apparent difference of opinion.

At the end of my comments, I suggested an analogy to ecology and sustainable farming. Later, it occurred to me that parsing RSS from a variety of sources raises the issue of dealing with the equivalent of weeds in the data.

The purist school is saying something like "don't pick your crops until you are sure there are no weeds: weeds are poisonous and they'll take over, so don't let any get mixed in". (Stop parsing if the data is not well-formed.)

The pragmatist school is saying something like "you need to pick your crops now, and you can remove the weeds as you go: even if weeds are poisonous, you can get enough of them out that no one will get sick and they won't take over". (Keep parsing even if the data is not well-formed: use work-around extraction techniques to get the weeds out.)
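
To make the two schools concrete, here is a minimal sketch in Python. It is only an illustration of the idea, not how Universal Feed Parser or any other real parser works: the sample feed, the function names parse_strict and parse_lenient, and the regular-expression fallback are all assumptions made up for this example. The "weed" is an unescaped ampersand, which makes the feed not well-formed.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical feed with one "weed": an unescaped ampersand in the second title.
FEED = """<rss version="2.0"><channel>
<item><title>First post</title></item>
<item><title>Weeds & crops</title></item>
</channel></rss>"""


def parse_strict(text):
    """Purist: return the titles, or raise ET.ParseError if the
    feed is not well-formed XML."""
    root = ET.fromstring(text)
    return [title.text for title in root.iter("title")]


def parse_lenient(text):
    """Pragmatist: try the strict parse first, then fall back to a
    regex work-around that pulls out whatever <title> elements it can find."""
    try:
        return parse_strict(text)
    except ET.ParseError:
        return re.findall(r"<title>(.*?)</title>", text, flags=re.DOTALL)
```

With this feed, parse_strict(FEED) stops with a ParseError, while parse_lenient(FEED) still harvests both titles, weed included. The trade-off is visible even at this size: the strict version never hands you a poisonous weed, and the lenient version never leaves you without a crop.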

My basic sense is that both approaches have their place, and both have negative consequences if followed too strictly too often.
