November 29, 2005

David W on ‘Everything is Miscellaneous’

Especially for those who, like me, missed David W’s keynote today at the London Online conference, here are some notes I made during his excellent talk at Nature yesterday. It was on the subject of his work in progress, Everything is Miscellaneous, which is about information organisation in the age of the web and focuses quite heavily on tagging.

At absolutely no extra charge, there are also notes from a talk by our other distinguished guest yesterday: Jimmy Wales of Wikipedia.

October 19, 2005

Peter Morville: the Tagsonomy interview

Last week I got the chance to talk to Peter Morville about his recent article Authority, his excellent new book Ambient Findability, and the future when everything will be taggable.

As usual Peter has some provocative ideas. I’ve asked him to watch the comments here, so feel free to post your comments or ask questions.

Gene: How is authority related to findability?

Peter: My authority article stirred up a fascinating discussion on Web4Lib centered around this question. Historically, librarians have been comfortable with the notion that the most frequently cited academic papers (and their authors) are also the most popular, findable, and authoritative. But many are horrified by the migration of this concept to the public web. In truth, the comparison is not totally fair. Scholars invest more thought and structure into their citations than we invest in our links. But the revolution in authority is real. In a world where we can select our sources and choose our news, we must increasingly make our own decisions about what to believe and who to trust. And thanks to the well-documented anchoring bias, we’re highly influenced by the first information we find. In this sense, Google’s algorithms are as much about authority as relevance. And this is why the subtitle of Ambient Findability is “What We Find Changes Who We Become.”

I know many people who don’t get tagging. Do you think tagging is a novelty? Or can you see some persistent value in it that will keep tags around?

I hate tagging. It’s too much work. It’s so much easier to drag and drop an email message into a folder than it is to construct keywords that define its aboutness. And with respect to refindability, using Google Desktop’s full text search is infinitely better than relying on the semantic poverty of tags. On the other hand, as one element of Google’s multi-algorithmic search solution, tags in the form of links are a wonderful source of collective intelligence. Also, as ubiquitous computing yields an Internet of (non-textual) objects, user-defined tags will be important alongside the manufacturer-supplied metadata.

If “it’s so much easier to drag and drop an email message into a folder than it is to construct keywords that define its aboutness” then what do you do when an email fits equally well in multiple folders?

I’m an impatient information architect. I spend no more time organizing than absolutely necessary. When faced with this taxonomic dilemma, I used to agonize for a few seconds before deciding which folder to use. Once every few thousand messages, I would decide the message was important enough to cross-list in multiple folders, so I’d make a copy. But now, I never cross-list, and I generally agonize for less than a split second, because I know it doesn’t really matter which folder I use, since I have Google Desktop. I can’t wait until all my books, clothes, keys, remote controls, and other physical possessions have RFID tags, so I can search for them too.

With respect to personal information architecture, less is more.

Can you elaborate a bit on how tags–user-added descriptive metadata–will help us navigate the internet of objects?

A big difference between barcodes and RFID tags is that barcodes identify a product category whereas RFID tags also identify the product as a unique object. So, if you consider the way that Amazon starts with manufacturer-supplied metadata, but then allows users to enhance that base explicitly (by adding customer reviews) and through their behavior (navigation and purchasing), and you extend that model to a world where we can tag the product as a discrete object with a history and a location and an owner, you begin to get an idea of where things are headed. Imagine an eBay where you can learn about the history of every object that’s for sale.

Users will be able to tag objects, but that’s only a tiny piece of the puzzle. We haven’t even begun to talk about the embedded sensors that will add eyes and ears (and senses we humans lack) to our objects. Things are about to get very weird.

In your article you downplay the significance of tagging on del.icio.us and Flickr. But in del.icio.us we find a kind of authority as well–the URL that has been tagged hundreds or thousands of times. (Jesse James Garrett’s Ajax article is a good example of this.)

Isn’t this tag-driven authority the same as the collective intelligence we find at work in Wikipedia? Or put another way, what’s the difference between the del.icio.us hive mind and the Wikipedia hive mind that makes the latter authoritative and the former not?

In some ways, an article that’s been frequently tagged in del.icio.us possesses more authority than a Wikipedia article that may have been written or re-written by a single ignorant user. I love the Wikipedia, but outside the most popular, highly edited articles, you’ve got to seriously question (and cross-reference) your sources. In other words, there’s not a whole lot of collective intelligence in the long tail of the Wikipedia. For me, what’s important is that as a society we’re beginning to examine what we mean by authority, and we’re finding it’s a very slippery concept. As I argue in Ambient Findability, “Like relevance, authority is subjective and ascribed by the viewer.” Not many people agree with me yet. But they will.

What do you think of Yahoo’s My Web 2.0 that integrates search, tags and social networks?

To be honest, until you asked the question, I’d never heard of it. After giving it a very quick look, I’d say it’s too complicated for anyone outside the geekorati. For now, I’ll stick with Google.

You say the future is multi-algorithmic. Google’s search relevance algorithms have often been called objective (sometimes even by the company itself). What’s your take on this — can algorithms be objective?

Assuming that by objective we mean “free from bias,” algorithms for data retrieval (where there are right and wrong answers) and information retrieval (where answers are more or less relevant to a particular user) can be objective. It’s hard (but perhaps not impossible) to argue that traditional full text retrieval algorithms possess bias.

However, Google’s multi-algorithmic solution is not even close to objective.

Google’s algorithms are optimized to produce the greatest advertising revenue to Google Inc. in the short-term and the greatest shareholder value to GOOG in the long-term. To be fair, Google has exercised great restraint. They understand that for long-term success, Google must provide the most useful results and the best user experience, so they have maintained a clean interface, and they haven’t yet tilted their algorithms too far towards commerce.

But we can already see a subtle bias towards the types of web sites most likely to host Google’s sponsored links. This partially explains why Google searches on “melanoma” and “breast cancer” don’t present the authoritative content from the National Cancer Institute in the first few hits. Government web sites are not great clients for advertising, so Google doesn’t like .gov.

This is why I like Yahoo! Mindset. It uncovers the hidden bias and puts the user in charge of the algorithms. Algorithmic openness is a great strategy for Yahoo! I’m not sure Google can maintain its algorithmic secrecy indefinitely without consequence. I’m in favor of more transparent, user-configurable algorithms.

We’ve hardly mastered user experience on the web and now we’re facing a future where the detritus of our lives will be “tagged” with RFID chips. How do we design for ambient findability?

The user experience will begin with a keyword search, but we’ll have all sorts of new facets and filters for refining our query. I may want to Google my possessions or the contents of my house or your bookshelf. Last week I went to the shopping mall for the first (and last) time this year. It was a horrible experience. I had to physically drag my body from store to store in search of a specific product. I desperately wanted to Google the Mall. And thanks to RFID, it may not be too long before that’s possible. Of course, Endeca may be a better choice than Google, since their Guided Navigation approach fits perfectly with an increasingly faceted future.

Of course, it’s tough to predict how this will all pan out. A few futurists including Adam Greenfield, Mike Kuniavsky, Bruce Sterling, and myself have written about the user experience in a world of ambient findability, but we’re only scratching the surface, and we’re all probably suffering from apophenia anyway.

Tim O’Reilly was recently quoted as saying that collective intelligence is the innovation that will most alter how we live in the next few years. What do you think?

Collective intelligence existed before humans could speak (or tag). The waggle dance of honeybees is all about the wisdom of the crowd (and finding the best food). So is gossip. So is the stock market. But I agree with Tim that the Web (or Web 2.0 if you prefer) is spreading the hive mind into far more nooks and crannies than we could have imagined only a few years ago. So get ready to be found. Collective intelligence is coming to a niche near you.

Thanks, Peter!

October 16, 2005

IA Summit call for papers

The Information Architecture Summit call for papers was released a few weeks ago. The deadline is October 31 for session proposals, and December 5 for posters. In addition to the regular presentations and panels there will be a research track this year.

If you’re doing interesting work with tagging and folksonomies, consider submitting something. The summit is March 23 to 27, 2006 in Vancouver, Canada.

September 27, 2005

Rashmi Sinha on the cognitive process behind tagging

Rashmi Sinha has posted an interesting hypothesis on the cognitive psychology behind tagging (with easy illustrations for those of us who don’t remember psych class): A cognitive analysis of tagging (or how the lower cognitive cost of tagging makes it popular).

September 16, 2005

Ian Davis on Why Tagging Is Expensive

Last week Ian Davis wrote an interesting post on Why Tagging is Expensive:

On the surface tagging seems to offer a new paradigm of organising information, one that reduces the cost of entry and so enables a long tail of participation to emerge. I’ve come to realise that the cost isn’t removed, instead it’s displaced and possibly increased. Tagging bulldozes the cost of classification and piles it onto the price of discovery.

There’s a saying I’ve heard once or twice (I wish I could attribute it): “The cost of metadata is in its application, but the value of metadata is in its use.”

Not exactly something you’ll be quoting at dinner parties, but it nicely captures the cost/benefit gaps of metadata.

The arguments against professional classification (including Clay’s views on tagging) have almost always worked on the cost side of the equation. Automated indexing, search and now tagging are seen as ways to drive down classification costs. But as Davis explains, classification costs are only one part of the system:

In my view the total cost of an information retrieval system is the cost of classification plus the cost of discovery. In the formal classification world you have a very small number of people incurring a high cost in order to reduce the costs incurred by a very large number of people. In contrast the tagging world has the unit costs reversed: it’s cheap to classify, expensive to find. But the numbers of people involved are large in both cases so you end up with a lot of people paying a tiny cost to classify added to a lot of people paying a high price to discover. I think it’s pretty likely that the total cost is going to end up much higher than in the classification scenario.
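A toy sketch of the arithmetic in that quote may help. The function name and every number below are invented for illustration (arbitrary "effort units"); they are not Davis's figures, only one set of assumptions under which his conclusion holds.

```python
def total_cost(classifiers, cost_per_classification, searchers, cost_per_discovery):
    """Total system cost = all classification effort + all discovery effort."""
    return classifiers * cost_per_classification + searchers * cost_per_discovery


# Hypothetical formal scheme: a few professionals pay a lot to classify,
# so that many searchers each pay very little to find things.
formal = total_cost(classifiers=10, cost_per_classification=1000,
                    searchers=100_000, cost_per_discovery=1)

# Hypothetical tagging scheme: many people pay a tiny cost to tag,
# and many searchers each pay more to find things.
tagging = total_cost(classifiers=100_000, cost_per_classification=1,
                     searchers=100_000, cost_per_discovery=5)

print(formal)   # 110000
print(tagging)  # 600000
```

The comparison is only as good as the assumed per-person discovery cost: lower that one number for the tagging case and the gap narrows or disappears.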

Here’s an analogy. I visit a lot of thrift stores. The true cost of an item in a thrift store is a function of the time it takes me to find it, not the price (which is always cheap). A very large thrift store is more likely to have what I want, but at a greater discovery cost. Like del.icio.us, a thrift store is great for serendipitous discovery but not so good for known item retrieval. Put another way, del.icio.us wouldn’t be your first choice if you needed articles on Rousseau and the French Revolution, just like the Sally Ann wouldn’t be your first choice if you needed a smoking jacket, size 42T.

Where I think Davis might be wrong is suggesting that the discovery costs are shifted back to the user. In fact, the costs are shifted to search, blogs and other more efficient discovery tools. In large part this is because the domain of tagging systems has been the “big messy” web.

In that case, the “classical” economics of information retrieval don’t apply because there are often multiple ways of finding things. Or because Google can radically lower your discovery costs by selling keyword advertising to offset their infrastructure. Or because algorithms can do much of the heavy lifting. Or because users’ expectations are for “just good enough” results. Or because users are not interested in finding so much as tracking. And so on.

But I’d argue that once the domain is constrained–by subject, by context, by user population, by privacy/security, by business goals, or by those things in combination–the economic principles of classification and retrieval come back into play. Because other discovery tools are either not available or not optimal, poorly designed retrieval systems do shift the burden back to the user. (Karl Fast’s thoughts on problems in the middle are worth a read here.)

In that middle ground–and the “big messy” web contains probably millions of cases where local structure is valuable, not to mention information systems that aren’t part of the “big messy” web–I think there’s a large area where a mixture of emergent, algorithmic, formal and now social classification systems will make for optimal retrieval.

September 2, 2005

Tom Coates on bubble-up folksonomies

Tom Coates has a great post on bubble-up folksonomies–using tags to augment conceptual hierarchies. The example he gives involves tagging songs (in his Phonetags project) and using those tags to understand broader categories like album and artist.

…because you have a semantic understanding of the relationship between concepts like a ‘song’, an ‘album’ and an ‘artist’ you can allow people to drill-down or move up through various hierarchies of data and track the changes in an artist’s style over time. For me, this is a pretty compelling argument that understanding semantic relationships between concepts makes folksonomic tagging even more exciting, rather than less so, and may indicate a changing role for librarians towards owning formal conceptual relationships rather than descriptive, evocative metadata. But that’s a post for another time. (My emphasis)

I like this idea for a couple of reasons. First, it recognizes the value of some semantics in the system (that is, the humans-at-both-ends-of-the-rope approach would be insufficient on its own). Second, it solves a real problem–helping people find good music that they’ll like.

And in the comments Kevin Marks says that Technorati is using the bubble-up approach to recommend blogs by tag (rather than just posts).
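The bubble-up idea can be sketched in a few lines. This is only an illustration of the general technique, not how Phonetags or Technorati actually work; the song data, field names and the `bubble_up` helper are all hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical data: each song sits in a formal hierarchy (album, artist)
# and carries free-form tags added by listeners.
songs = [
    {"title": "Song A", "album": "First Album", "artist": "Some Band",
     "tags": ["mellow", "acoustic"]},
    {"title": "Song B", "album": "First Album", "artist": "Some Band",
     "tags": ["mellow", "sad"]},
    {"title": "Song C", "album": "Second Album", "artist": "Some Band",
     "tags": ["loud", "electronic"]},
    {"title": "Song D", "album": "Second Album", "artist": "Some Band",
     "tags": ["loud", "mellow"]},
]


def bubble_up(songs, level):
    """Sum song-level tag counts into their parent at `level` ('album' or 'artist')."""
    totals = defaultdict(Counter)
    for song in songs:
        totals[song[level]].update(song["tags"])
    return totals


by_album = bubble_up(songs, "album")
by_artist = bubble_up(songs, "artist")

print(by_album["First Album"].most_common(1))   # [('mellow', 2)]
print(by_album["Second Album"].most_common(1))  # [('loud', 2)]
print(by_artist["Some Band"].most_common(1))    # [('mellow', 3)]
```

The tags themselves stay free-form; only the song-to-album-to-artist relationship is formal, and that relationship is what lets a shift from "mellow" to "loud" between albums show up at all.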

July 31, 2005

Tag Team interview

Adam Weinroth asked Clay and me five questions about tagging in an email interview posted here.

As for uniform tagging, that can only work in situations where there is enough force to expend making the users behave uniformly, and where it is worth expending that force. For example, the Diagnostic and Statistical Manual, (DSM-IV) used by American psychiatrists and psychologists, provides a relatively standard way to diagnose mental disorders. It only works as well as it does, however, and that not perfectly, because it is produced by the American Psychiatric Association, which body can exert considerable force over its members, and DSM-IV is only to be used by the members.

The amount of human cost, in other words, in creating and enforcing uniformity is so high that attempts at such uniformity will fail in most cases. Fortunately, tagging allows for degenerate cases such as alternate spellings and phrasing. This would be a problem if there were only one tagger, responsible for a large group of users, but with every user a tagger, the loss stemming from degenerate cases quickly shrinks, while the value from multiple points of view grows.

July 22, 2005

Tagging and Participative Journalism

Jon Udell has an interesting piece on (among other things) the use of del.icio.us tagging by InfoWorld editors as a way for them to work with each other and also interact with their readers.

We’re finding similar things at Nature. First, our social bookmarking service for scientists, Connotea, is proving useful as a collaborative tool for our journalists and editors. For example, editorial teams can use tagged links to communicate ideas and leads among themselves. Also, journalists researching particular stories can use the system to store and retrieve informative links under suitable tag names — and can choose to keep those links private, at least temporarily, if they’re worried about being scooped.

Second, Connotea enables greater interaction with readers. For example, collections of links gathered by a writer during their research can be released on publication of their article in order to provide readers with further sources of information. A recent example of this was Declan Butler’s Nature article on the new generation of laboratory information systems, which pointed interested readers to his accompanying collection of links.

As Jon Udell points out, such collections are future-proof because they can grow even after the URL has been distributed. This means that sometimes, as with Declan’s own collection of avian flu links, they can become important community resources that continue to be tracked by significant numbers of interested readers, potentially even long after the original article has become obsolete. Of course, readers can themselves contribute simply by using the same tag names. For example, the Connotea collections on bioinformatics and open access have attracted groups of users that turn these pages into something like pared-down group blogs.

With participative (or grassroots or citizen) journalism becoming an increasingly important theme inside media organisations and beyond, it’s intriguing to see that tagging also seems to have a role to play in facilitating exchanges between writers and their readers, and in blurring the boundaries between those traditionally distinct roles.

June 13, 2005

The Death of Hierarchy?

John Hiler has an interesting post on Microcontentnews called Google’s War on Hierarchy, and the Death of Hierarchical Folders. He talks about how that computing standby the folder is being replaced by search and tags:

Hierarchical Folders have helped us manage information for decades. They’ve proven themselves as some of the most flexible tools ever created: organizing wildly different industries, from Web Directories, to Email and Desktop File Systems.

But Folders rarely solve the core problem that they address – and often create new ones, like forcing you to create new folders just to manage new information. Solutions like Search, Archives, Stars and Labels get more directly at the core problem… and promise that the future of information management will look very different from its past.

Dan Brown posted a thoughtful follow-up that digs into the distinctions between hierarchy and structure:

Hiler is right to point out that folder-based navigation is going away, but I think it’s dangerous to extend the demise of the folder (a bad metaphor) to the demise of hierarchy and formal structure. There is still a place for formal structure in interface design, even if it doesn’t look or behave like our old friend the folder.

It’s also dangerous to compare “hierarchy” with “search.” Hierarchy is, most typically, a part-whole organization of things. Search, on the other hand, is a behavior where users specify some criteria and the computer does the work of locating objects that share something in common with them. These two notions are hardly mutually exclusive. Perhaps Hiler meant to compare search with browse, a behavior where users select from menus of options to arrive at the desired thing.

In Hiler’s three “search” case studies, there is evidence of formal structure, though it’s under the surface. With Gmail, for example, there’s still the notion of a thread which contains messages. There is an inbox and an archive, which contain threads. There are relationships between original messages and replies. These are abstract hierarchies that are inherent to the information architecture, not layered on top like a folder structure. They may seem self-evident, but constructing these hierarchies requires a careful, user-aware design process.

Indeed, structure is useful. And instead of one structural option–the folder–we now have derived structure (like search engine indexes and the derived polyhierarchies in iTunes) and user-applied structure (tags, labels, links, playlists). This is not the death of hierarchy; it’s the augmentation of hierarchy.

June 5, 2005

Tagging Saves Categorization From Itself

One axis of human sense-making runs from applied to derived; some things we understand by applying explanations to them, and others we understand by deriving explanations from them.

The component parts of water and the component parts of breakfast can both be described. Breakfast, however, exists only by definition — breakfast is the morning meal because we say it is, and even then, there are lots of caveats. A restaurant can advertise “Breakfast All Day!”; the same food at the same time can be breakfast or brunch; and so on.

Water is easier to explain than breakfast because water exists independently of any community — its existence can be derived by independent observers. Breakfast exists only because of and among people who say it does — it is an applied category, and therefore contingent on various ways of generating and enforcing shared understanding.

Because applied categories are social facts, they take more energy to define, and those definitions are less encompassing, less coherent, and less robust than for derived categories. And of course this is not a bifurcation but a spectrum. Imagine the difficulty of explaining each of three categories: “the New York Times,” “the media,” and “the liberal media”; or “the Republican Party,” “the conservative movement,” and “the vast right-wing conspiracy.” Each of those three categories is increasingly difficult to explain, because each is more of an applied category than the last, which is to say each requires more shared assumptions between sender and receiver.

Systems of classification frequently mix derived and applied categories. The more ambitious a system, in fact, the likelier it is to do so. If a significant number of the categories we use in our daily lives are applied, then large-scale classification systems will weaken with any of several changes, including especially an increase in the number or mix of users, an increase in the number of things to be classified, a decrease in coordination among the people doing the classifying, and the passage of time.

One strategy for improving classification schemes, or at least making them resistant to the inherent weakness of applied categories, is to reduce, possibly to zero, the number of universal assumptions or constraints within the system. This will minimize the energetic requirements of communicating and enforcing applied categories, while allowing the categories themselves to flow where and among whom they are considered useful or valid.

And this, of course, is exactly what tagging does. Much of the shared value, with little of the required force.