Ken Norton nailed something critical in “The web is full of tags”:
[T]agging isn’t new; the web is full of tags. But they’re not in meta keywords, they’re in the links. The text of the links pointing to other web pages are simply the web publisher’s best effort to describe the page she’s linking to. And it turns out those links are some of the most valuable metadata we have to work with in search. And you know what? They’re subject to all of the flaws people say will doom tagging. Spammers lie. The spelling is atrocious. And there’s ambiguity everywhere. But given a huge population of links, you can begin to make sense of the madness. Why? Well, there are humans on both ends of the search rope. There’s a person searching, and there’s a person who’s written some content. The job of the search engine is to simply connect the two. Traditional software engineers, in their endless pursuit of the elimination of ambiguity, sometimes forget this. Search engineers embraced it.
There are humans at both ends of the rope. It seems so simple, but technologies that can rely on this fact have a huge advantage, since the human brain is terrific at extracting signal in environments that consistently defeat machine strategies. This is one of the reasons semweb-flavored approaches to metadata try to express data in an unambiguous format: if there is a machine at the other end of the rope, even slight ambiguities defeat the recipient’s interpretive capabilities. As the man said, time flies like an arrow, but fruit flies like a banana.
Once you have humans at both ends of the rope, though, even purely contextual tags that can’t be extracted from the tagged content itself, tags like “cool” and “toread”, become valuable. This is why attempts to improve tagging by making it less ambiguous miss the point: the ambiguity allows a huge reduction in both markup cost and conceptual brittleness, because human brains serve as the final endpoints.
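To make the link-text point concrete, here is a minimal sketch of how anchor text from many links might be pooled into a description of the page they point to. It is an illustration under loose assumptions, not how any real search engine works: the function name `describe_targets`, the tiny stopword list, the example URLs and the link data are all made up. Plain word counting stands in for the much richer weighting a real engine would use.

```python
import re
from collections import Counter, defaultdict

# Hypothetical stopword list; a real system would use a much larger one.
STOPWORDS = {"a", "an", "and", "for", "of", "the", "to"}


def describe_targets(links, top_n=3):
    """Treat each link's anchor text as a vote describing its target URL.

    `links` is an iterable of (target_url, anchor_text) pairs. Returns a dict
    mapping each target URL to its `top_n` most frequent anchor-text words.
    """
    votes = defaultdict(Counter)
    for target, anchor_text in links:
        # Lowercase and split on anything that isn't a letter or digit.
        for word in re.findall(r"[a-z0-9]+", anchor_text.lower()):
            if word not in STOPWORDS:
                votes[target][word] += 1
    return {
        target: [word for word, _ in counts.most_common(top_n)]
        for target, counts in votes.items()
    }


if __name__ == "__main__":
    # Made-up links: two honest descriptions, one misspelling, one spam.
    links = [
        ("http://example.com/sf", "San Francisco travel guide"),
        ("http://example.com/sf", "A guide to San Francisco"),
        ("http://example.com/sf", "san fransisco guide"),
        ("http://example.com/sf", "cheap pills"),
    ]
    print(describe_targets(links))
    # prints {'http://example.com/sf': ['san', 'guide', 'francisco']}
```

The misspelled “fransisco” and the spammy “cheap pills” each get a single vote, so they fall below the terms most linkers agree on. Nothing here removes ambiguity; the sketch just relies on a large population of human-written links outvoting the noise.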
I agree with you that links are way better tags than “tags” as we know them.
Stephen Downes has been writing about the shortcomings of tagging in the blogosphere as well, and has come up with an alternative: edu_rss.
I wrote a post about what he has to say on metadata.
The Self-Organizing Network
Downes’ idea of an optimal, meaningful network is one that organises itself by the use of …
Third party metadata:
* metadata about a resource not created by the author of the resource …
* includes: links, references, ratings, annotation, context of use…
* attached both to the resource (and hence to the resource author) and to the commentator
* creates a multi-dimensional semantic social network
Comment by tuur — May 15, 2005 @ 9:26 am
There is a cost as well: it requires those brains to do work (i.e. thinking), which humans try to avoid whenever possible. This is especially a cost for people who are new to a content organization system or use it too rarely to learn its intricacies. How is a newbie to know that some information on “San Francisco” will be at sanfrancisco, some at san_francisco, some at san+francisco, some at “bayarea” and some at “bay_area”? This could potentially be overcome with UI, but current tagging systems certainly don’t provide any sort of hints right now. Those hints and that improved precision are what you pay for with the extra cost of marking up content in a controlled system.
Comment by Joel — May 16, 2005 @ 10:36 am
In my experience, creating a single meaning (removing ambiguity) is useful in contexts where work needs to be done, like on a corporate intranet. This means a controlled vocabulary with only one San Francisco. This isn’t very interesting, but there’s a lot of money in it.
Allowing ambiguity is useful in contexts where learning needs to be done, like on a site like this or Del.icio.us. This means an uncontrolled vocabulary with San Francisco, san_francisco, and sanFran. This is much more interesting, but there isn’t as much money in it.
Like Joel says, learning (thinking) is an odd duck because it’s an investment that some people clearly do not want to make. Others seem to revel in it.
Comment by Joshua Porter — May 16, 2005 @ 11:10 am
Joel: some tagging systems DO help you overcome the “San Francisco” vs “sanfrancisco” problem.
Take a look at this screenshot to see how Simpy does it.
People still have the choice. You can use one of the offered tags, your own tag, or both of them at the same time. You can tag newyork.craigslist.com with “San Francisco” or “Frisco” if you like; Simpy doesn’t care. Tag it whichever way helps you and those close to you.
Comment by Otis — May 17, 2005 @ 11:07 am
[…] of reasons. First, it recognizes the value of some semantics in the system (that is, the humans-at-both-ends-of-the-rope approach would be insufficient on its […]
Pingback by You’re It! » Blog Archive » Tom Coates on bubble-up folksonomies — September 2, 2005 @ 1:24 pm