
news and thoughts on and around the development of the iCite net
by Jay Fienberg

Search engines must change in web 2.0

posted: Apr 21, 2005 10:13:08 PM

(Sorry if you don't like the "web 2.0" label. I needed that kind of shorthand for my title, but it isn't essential to the rest of this post.)

One feature of the iCite net is that the same contents can live in multiple places but share a single identity. This means that, for example, the same blog post can live at many URLs on the web (on many websites), but, in the iCite net, also be clearly identified as the exact same post.
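As a rough illustration of "many URLs, one identity" (a sketch only, not how the iCite net actually does it, which this post doesn't specify), one could derive an identifier from the content itself rather than from its location. The `content_id` function, the whitespace normalization, and the example URLs below are all assumptions made up for the example.

```python
import hashlib


def content_id(text: str) -> str:
    """Return a location-independent identifier derived from the text itself.

    Hypothetical scheme: hash the whitespace-normalized text with SHA-256.
    """
    # Collapse runs of whitespace so trivial reformatting by a mirror
    # does not change the identity.
    normalized = " ".join(text.split())
    return "sha256:" + hashlib.sha256(normalized.encode("utf-8")).hexdigest()


# Hypothetical copies of the same post living at two different URLs.
copies = {
    "http://example.org/blog/post-1": "Search engines must change.\n",
    "http://example.net/mirror/post-1": "Search  engines must change.",
}

ids = {url: content_id(body) for url, body in copies.items()}

# Both URLs map to one identifier, so they can be recognized as the same post.
same_identity = len(set(ids.values())) == 1
```

A pure content hash breaks as soon as a copy is edited or excerpted, so a real system would more likely rely on an identifier assigned at publication time and carried along with each copy.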

While that might sound far out (in the sense that this feature of the iCite net is not available yet), it has long been common for resources on the web to be mirrored or cited in whole in multiple places. And, with RSS/Atom and other data-oriented forms of publishing, copying the contents of one site to many others on the web is becoming both technically easier and more widely permitted.

One thing weighing heavily on many folks' minds is how the duplication of one's site contents can be surreptitiously used by another to garner search engine traffic and/or sell advertising. There are a number of issues there in terms of copyright, commercial vs non-commercial usage, personal vs public usage, and just general rip-off and manipulation. Many of these are discussed in the comments to Dave Winer's post, How to allow subscription but not syndication?, and in the thread NoIndex, again initiated by Nikolas 'Atrus' Coukouma on the Atom syntax mailing list.

But, as a general context, one underlying problem is in the way current search engines work with site contents copied across multiple URLs, namely that the engines don't (and/or can't) properly account for the contents' origin.

Search engines like Google tend to be treated as if they are part of the infrastructure of the web (because none of us wants a web without them, so fair enough, in some sense). But these engines aren't part of the commons infrastructure of the web: they are commercial enterprises trying to capitalize on their strengths and to protect themselves against web practices that highlight their weaknesses.

So, with regard to duplicating site contents, on one side, search engines look to penalize sites that duplicate. And, on the other side, publishers look to enforce license restrictions that will keep duplicates out of the search engines.

But, I think we'd have a different dynamic if the search engines would themselves clearly indicate attribution and list origin sites in a primary position while including duplicate sites only in a secondary position. In other words, we could imagine a search engine that capitalizes on the value in duplication (e.g., part of why this page is authoritative is that its contents are copied by many others) rather than trying to eliminate it.
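To make that idea concrete, here is a minimal sketch of such a results listing. It does not describe how any real search engine works. The `Hit` fields, the rule that the earliest-seen copy is the origin, and the 10% boost per duplicate are all hypothetical choices for the example.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Hit:
    url: str
    content_id: str  # hypothetical shared identity of the content
    first_seen: int  # hypothetical crawl timestamp; earliest copy is treated as the origin
    score: float     # hypothetical base relevance score


@dataclass
class Result:
    origin: Hit       # shown in the primary position
    duplicates: list  # shown in a secondary position, attributed to the origin
    score: float


def group_by_origin(hits):
    """Group hits that share a content identity and rank the groups."""
    groups = defaultdict(list)
    for hit in hits:
        groups[hit.content_id].append(hit)

    results = []
    for copies in groups.values():
        copies.sort(key=lambda h: h.first_seen)
        origin, duplicates = copies[0], copies[1:]
        # Assumption: being copied counts in the origin's favor,
        # so each duplicate adds a small boost instead of a penalty.
        boosted = origin.score * (1 + 0.1 * len(duplicates))
        results.append(Result(origin, duplicates, boosted))

    results.sort(key=lambda r: r.score, reverse=True)
    return results


hits = [
    Hit("http://copy-a.example/p", "post-1", first_seen=5, score=1.0),
    Hit("http://origin.example/post", "post-1", first_seen=1, score=1.0),
    Hit("http://copy-b.example/p", "post-1", first_seen=9, score=1.0),
    Hit("http://other.example/page", "post-2", first_seen=3, score=1.1),
]
results = group_by_origin(hits)
```

In this example the copied post outranks the uncopied page even though its base score is lower, and the two copies appear only under their origin. Using the earliest crawl time to pick the origin is easy to game, so a real engine would need a more trustworthy origin signal, such as an identifier declared by the publisher.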

As I've written about many times in this blog, the ongoing development of web service interfaces (APIs) and web data formats encourages more and more websites that remix and repurpose the contents of other sites into new uses. And, I think we should look to search engines to adapt more seriously to that evolution, rather than crippling what we're doing in order to compensate for the engines' current weaknesses in handling it.


Comments and Trackbacks

Comment by: Chris Dent · http://www.burningchrome.com/~cdent/mt/
posted: Apr 21, 2005 10:33:26 PM

"same contents can live in multiple places but share a single identity"

Hurrah! I wish I could think of something clever to say here other than that, but there it is. Persistent, location independent identifiers are going to make lots of good things possible.
