news and thoughts on and around the development
of the iCite net
by Jay Fienberg
posted: Feb 3, 2004 2:32:40 PM
A lot of RSS is full of weeds. What do you do?
Danny Ayers' post "A matter of form" critiques some of the comments Mark Pilgrim made in his post about the Universal Feed Parser 3.0 beta release. I added some comments to Danny's post suggesting that Danny and Mark seemed to be using the word "data" in slightly different ways, which, I believe, might account for some of their apparent difference of opinion.
At the end of my comments, I suggested an analogy to ecology and sustainable farming. Later, it occurred to me that parsing RSS from a variety of sources raises the issue of dealing with what is the equivalent of weeds in the data.
The purist school is saying something like "don't pick your crops until you are sure there are no weeds: weeds are poisonous and they'll take over, so don't let any get mixed in". (Stop parsing if the data is not well-formed.)
The pragmatist school is saying something like "you need to pick your crops now, and you can remove the weeds as you go: even if weeds are poisonous, you can get enough of them out that no one will get sick and they won't take over". (Keep parsing even if the data is not well-formed: use work-around extraction techniques to get the weeds out.)
My basic sense is that both approaches have their place, and both have negative consequences if followed too strictly too often.
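To make the two schools concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions: the sample feeds and the function names are made up for this post, and the regular-expression fallback is a stand-in for "work-around extraction techniques" in general, not a description of how the Universal Feed Parser actually works.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical sample feeds, invented for illustration.
CLEAN_FEED = "<rss><channel><item><title>Crops</title></item></channel></rss>"
# The bare ampersand means this feed is not well-formed XML: a weed.
WEEDY_FEED = "<rss><channel><item><title>Crops & weeds</title></item></channel></rss>"

# Crude work-around pattern; an assumption, not a robust feed grammar.
TITLE_RE = re.compile(r"<title>(.*?)</title>", re.DOTALL)


def purist_titles(feed_text):
    """Purist: stop parsing if the data is not well-formed.

    ET.fromstring raises ET.ParseError on the first well-formedness error,
    so nothing is harvested from a weedy feed.
    """
    root = ET.fromstring(feed_text)
    return [el.text or "" for el in root.iter("title")]


def pragmatist_titles(feed_text):
    """Pragmatist: keep going, pulling weeds along the way.

    Try the strict parser first; if it fails, fall back to extracting
    whatever looks like a title with a regular expression.
    """
    try:
        return purist_titles(feed_text)
    except ET.ParseError:
        return [text.strip() for text in TITLE_RE.findall(feed_text)]


if __name__ == "__main__":
    print(pragmatist_titles(CLEAN_FEED))
    print(pragmatist_titles(WEEDY_FEED))
    try:
        purist_titles(WEEDY_FEED)
    except ET.ParseError as err:
        print("purist gave up:", err)
```

The trade-off shows up in the fallback: the pragmatist gets a title out of the weedy feed, but what it returns is raw text that was never validated, so some weeds may still end up mixed in with the crop.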