January 17, 2006
Tim O’Reilly posts on the O’Reilly Radar that Jakob Nielsen’s concern about search engines “strikes a chord.”
It’s easy to see why folks with paid content businesses would be concerned about giving away too much information via search engines, but it’s really interesting to see the same concerns springing up around free content sites. Google and Yahoo! have done a good job of providing ad revenue back to small content providers with AdSense and Overture, but their model is also a threat to many prevalent kinds of advertising. And of course, the search engines get a huge amount of revenue from advertising on the index pages themselves. I tend to think that the search engines earn their keep, but I’ve got my ear to the ground, and Jakob makes a thoughtful case.
Since Tim authored the Web 2.0 piece I linked from my previous post, I thought I should note that his take on the Nielsen piece was more supportive.
December 22, 2005
This was a big year for tags. You could even say that tags went mainstream in 2005 (if tags were a band, they’d be The Killers). So, given we’re at the end of 2005, I thought I would take a look back at the major announcements and events in the world of ad hoc, user-created metadata.
These are the events I thought were important (feel free to add your own in the comments):
Technorati introduces tags (January). Technorati’s tags were the first implementation of distributed tagging (i.e. T’rati doesn’t own the tags the way Flickr does–it picks them up from blog posts while it’s crawling). It’s been criticized, but it’s widely used.
Folksonomies at the IA Summit (March). Tagging was one of the hot topics at the Information Architecture Summit in Montreal, and we kicked off the discussion with a pretty good panel (full disclosure: I moderated the panel).
Ontology is Overrated (March). Clay Shirky’s provocative talk at Etech predicted that the rise of tagging meant the death of hierarchy. Or something like that. A bit too dogmatic for me, but it was received like a papal bull and produced some interesting critiques and counter-critiques. (To give credit where it’s due, this blog wouldn’t exist but for Clay’s presentation.)
Yahoo buys Flickr (March). The first big acquisition of what some people call the “Web 2.0 era” (I call it “Ned”).
Yahoo launches My Web 2.0 (June). Yahoo integrated search, social networks and tags in its My Web 2.0 product. Some people noted the product’s lacklustre growth, which perhaps set the stage for another socsoft/web 2.0 acquisition by Yahoo at the end of the year.
Flickr adds interestingness and clustering (August). This was a big one for me–it proved that with some algorithmic mojo tags could act more like categories, separating like things from unlike things, wholes from parts and so on. It can even tell dog noses from cat noses.
Hurricane Katrina (September). Tags not only helped keep people connected during the aftermath of Hurricane Katrina, they tethered together many disparate pieces of content so we could make sense of complex, evolving, intertwingled events. (Or try to make sense of them, anyway.)
Google adds tagging. Kind of. (October) As part of its search history feature, Google allows tagging of pages. The tags are private, and the feature seems peripheral. To paraphrase Elvis Mitchell, this one would have to work a lot harder to earn the appellation uninspired.
Amazon launches tags (November). Amazon now lets you tag books. I wasn’t all that impressed with their implementation–their product pages seem cluttered–but Amazon has always been smart about social IA and user-created content. It’ll be interesting to see how they leverage their tag data. (SIPs are cool, too.)
Everyone must have tags! (November) Anil Dash points out that tags–like Ajax and Ruby on Rails–are becoming web 2.0 cliches.
Google Base launches with tagging. Kind of. (November) Google uses the term labels, but we know them as tags. Search Engine Watch didn’t think much of Google Base’s tagging, but it’s just one of several kinds of classification used on that product. Key takeaway: Tagging is a means, not an end.
Tag formats: Can’t we all just get along? (December) Matt from 37signals points out the multiple variations of tagging UIs. The big question: “Will all these different formats still be around a year from now or will a standard emerge?”
Yahoo buys del.icio.us (December). The rumoured price was around $30 million, which has Yahoo spending about $100 for each of the 300,000 or so del.icio.us users. The interesting question is how does tagging behaviour figure into the price? I think (and this is purely speculation) Yahoo paid for millions of tagged URLs plus a community of active taggers, both of which promise to boost the relevance of search results.
Folksonomy makes the NY Times Magazine “Year in Ideas” list (December). Despite its humble origins and many doubters (I’m looking at you, Morville), folksonomy is named one of 2005’s best ideas by the New York Times Magazine.
Back in April, Tim Bray asked some important questions:
Are tags useful? Are there any questions you want to ask, or jobs you want to do, where tags are part of the solution, and clearly work better than old-fashioned search? I really want to believe that tagging is big, a game-changer, but the longer I go on asking this question and not getting an answer, the more nervous I get.
2005 has proven that tags are both big (in the financial sense) and useful. Whether or not tagging is a game-changer will, I think, depend on what Yahoo, Amazon and Google do with tags in 2006. But with three big players in the tagging game there’s a lot of opportunity for innovation.
December 9, 2005
Subject says it all. Film at 11.
November 29, 2005
Especially for those who, like me, missed David W’s keynote today at the London Online conference, here are some notes I made during his excellent talk at Nature yesterday. It was on the subject of his work in progress, Everything is Miscellaneous, which is about information organisation in the age of the web and focuses quite heavily on tagging.
At absolutely no extra charge, there are also notes from a talk by our other distinguished guest yesterday: Jimmy Wales of Wikipedia.
November 16, 2005
Tagging inside the corporate firewall seems to me to be one of the great emerging hopes for this model of information management. Not only would it be cool if companies could use tags to help sort out their internal (and infernal) information haystacks, it might even be a money-spinner for software vendors. At Web 2.0, Josh Schachter of del.icio.us mentioned that he regularly gets requests from companies for ‘Intranet Delicious’, though he didn’t sound particularly interested in developing it as a business line. IBM appears to see things differently: David W reports that they have created ‘dogear’, a prototype corporate tagging system.
At Nature, we’ve had similar experiences to Josh’s with our own application, Connotea. Since our readers and other customers are largely academics, and since we’ve made Connotea Code available under the GPL, the main trend has been for academic institutions to download it and experiment with their own internal instances (though there’s been plenty of interest from pharma and biotech companies too). Most of this attention has been focused on the use of tagging to organise research information at these institutions, but this overlooks another key activity at many of them: education.
That’s one reason why I was so interested in this email to the Connotea discussion list by Tony Hirst from the Open University in the UK, which has long been at the forefront of using technology in education, particularly in distance learning. Tony has been thinking about how to use social bookmarking and tagging as ways for students and teachers to share information during a course. He has in mind a controlled model in which there would be at least some level of moderation. In some ways this runs counter to the free and open nature of social bookmarking, at least on sites like del.icio.us. Tony is clearly conscious of this and gives the following reasons for taking a more conservative route, at least initially:
- social bookmarking could become a value-adding service, the reliable provision of which forms part of the institution’s contract with the student; as such, liveness and reliability need to be guaranteed; uniformity of provision (and a guaranteed high quality of provision) to students
- if social bookmarking is to be introduced as part of the learning strategy, some would argue that its use needs to be carefully managed to ensure that it is capable of delivering whatever the learning designer requires; (unfortunately, this approach reminds somewhat of unintended learning as a bad thing);
- integration with other cohort related batch processes (e.g. registering all new students, or setting particular group privileges for a particular cohort);
- Educational establishments need to be wary of what gets published on their domains. Lists of links to adult resources, for example, are unlikely to be tolerated.
He goes on to give an example of how social bookmarking could be used in practice during a real course.
He then signs off his email with:
This is all very ‘possible’ and ‘potential’ at the moment, but if my institution goes for this, they may go for it in a big way….
October 19, 2005
Last week I got the chance to talk to Peter Morville about his recent article Authority, his excellent new book Ambient Findability, and the future when everything will be taggable.
As usual Peter has some provocative ideas. I’ve asked him to watch the comments here, so feel free to post your comments or ask questions.
Gene: How is authority related to findability?
Peter: My authority article stirred up a fascinating discussion on Web4Lib centered around this question. Historically, librarians have been comfortable with the notion that the most frequently cited academic papers (and their authors) are also the most popular, findable, and authoritative. But many are horrified by the migration of this concept to the public web. In truth, the comparison is not totally fair. Scholars invest more thought and structure into their citations than we invest in our links. But the revolution in authority is real. In a world where we can select our sources and choose our news, we must increasingly make our own decisions about what to believe and who to trust. And thanks to the well-documented anchoring bias, we’re highly influenced by the first information we find. In this sense, Google’s algorithms are as much about authority as relevance. And this is why the subtitle of Ambient Findability is “What We Find Changes Who We Become.”
I know many people who don’t get tagging. Do you think tagging is a novelty? Or can you see some persistent value in it that will keep tags around?
I hate tagging. It’s too much work. It’s so much easier to drag and drop an email message into a folder than it is to construct keywords that define its aboutness. And with respect to refindability, using Google Desktop’s full text search is infinitely better than relying on the semantic poverty of tags. On the other hand, as one element of Google’s multi-algorithmic search solution, tags in the form of links are a wonderful source of collective intelligence. Also, as ubiquitous computing yields an Internet of (non-textual) objects, user-defined tags will be important alongside the manufacturer-supplied metadata.
If “it’s so much easier to drag and drop an email message into a folder than it is to construct keywords that define its aboutness” then what do you do when an email fits equally well in multiple folders?
I’m an impatient information architect. I spend no more time organizing than absolutely necessary. When faced with this taxonomic dilemma, I used to agonize for a few seconds before deciding which folder to use. Once every few thousand messages, I would decide the message was important enough to cross-list in multiple folders, so I’d make a copy. But now, I never cross-list, and I generally agonize for less than a split second, because I know it doesn’t really matter which folder I use, since I have Google Desktop. I can’t wait until all my books, clothes, keys, remote controls, and other physical possessions have RFID tags, so I can search for them too.
With respect to personal information architecture, less is more.
Can you elaborate a bit on how tags–user-added descriptive metadata–will help us navigate the internet of objects?
A big difference between barcodes and RFID tags is that barcodes identify a product category whereas RFID tags also identify the product as a unique object. So, if you consider the way that Amazon starts with manufacturer-supplied metadata, but then allows users to enhance that base explicitly (by adding customer reviews) and through their behavior (navigation and purchasing), and you extend that model to a world where we can tag the product as a discrete object with a history and a location and an owner, you begin to get an idea of where things are headed. Imagine an eBay where you can learn about the history of every object that’s for sale.
Users will be able to tag objects, but that’s only a tiny piece of the puzzle. We haven’t even begun to talk about the embedded sensors that will add eyes and ears (and senses we humans lack) to our objects. Things are about to get very weird.
In your article you downplay the significance of tagging on del.icio.us and Flickr. But in del.icio.us we find a kind of authority as well–the URL that has been tagged hundreds or thousands of times. (Jesse James Garrett’s Ajax article is a good example of this.)
Isn’t this tag-driven authority the same as the collective intelligence we find at work in Wikipedia? Or put another way, what’s the difference between the del.icio.us hive mind and the Wikipedia hive mind that makes the latter authoritative and the former not?
In some ways, an article that’s been frequently tagged in del.icio.us possesses more authority than a Wikipedia article that may have been written or re-written by a single ignorant user. I love the Wikipedia, but outside the most popular, highly edited articles, you’ve got to seriously question (and cross-reference) your sources. In other words, there’s not a whole lot of collective intelligence in the long tail of the Wikipedia. For me, what’s important is that as a society we’re beginning to examine what we mean by authority, and we’re finding it’s a very slippery concept. As I argue in Ambient Findability, “Like relevance, authority is subjective and ascribed by the viewer.” Not many people agree with me yet. But they will.
What do you think of Yahoo’s My Web 2.0 that integrates search, tags and social networks?
To be honest, until you asked the question, I’d never heard of it. After giving it a very quick look, I’d say it’s too complicated for anyone outside the geekorati. For now, I’ll stick with Google.
You say the future is multi-algorithmic. Google’s search relevance algorithms have often been called objective (sometimes even by the company itself). What’s your take on this — can algorithms be objective?
Assuming that by objective we mean “free from bias,” algorithms for data retrieval (where there are right and wrong answers) and information retrieval (where answers are more or less relevant to a particular user) can be objective. It’s hard (but perhaps not impossible) to argue that traditional full text retrieval algorithms possess bias.
However, Google’s multi-algorithmic solution is not even close to objective.
Google’s algorithms are optimized to produce the greatest advertising revenue to Google Inc. in the short-term and the greatest shareholder value to GOOG in the long-term. To be fair, Google has exercised great restraint. They understand that for long-term success, Google must provide the most useful results and the best user experience, so they have maintained a clean interface, and they haven’t yet tilted their algorithms too far towards commerce.
But we can already see a subtle bias towards the types of web sites most likely to host Google’s sponsored links. This partially explains why Google searches on “melanoma” and “breast cancer” don’t present the authoritative content from the National Cancer Institute in the first few hits. Government web sites are not great clients for advertising, so Google doesn’t like .gov.
This is why I like Yahoo! Mindset. It uncovers the hidden bias and puts the user in charge of the algorithms. Algorithmic openness is a great strategy for Yahoo! I’m not sure Google can maintain its algorithmic secrecy indefinitely without consequence. I’m in favor of more transparent, user-configurable algorithms.
We’ve hardly mastered user experience on the web and now we’re facing a future where the detritus of our lives will be “tagged” with RFID chips. How do we design for ambient findability?
The user experience will begin with a keyword search, but we’ll have all sorts of new facets and filters for refining our query. I may want to Google my possessions or the contents of my house or your bookshelf. Last week I went to the shopping mall for the first (and last) time this year. It was a horrible experience. I had to physically drag my body from store to store in search of a specific product. I desperately wanted to Google the Mall. And thanks to RFID, it may not be too long before that’s possible. Of course, Endeca may be a better choice than Google, since their Guided Navigation approach fits perfectly with an increasingly faceted future.
Of course, it’s tough to predict how this will all pan out. A few futurists including Adam Greenfield, Mike Kuniavsky, Bruce Sterling, and myself have written about the user experience in a world of ambient findability, but we’re only scratching the surface, and we’re all probably suffering from apophenia anyway.
Tim O’Reilly was recently quoted as saying that collective intelligence is the innovation that will most alter how we live in the next few years. What do you think?
Collective intelligence existed before humans could speak (or tag). The waggle dance of honeybees is all about the wisdom of the crowd (and finding the best food). So is gossip. So is the stock market. But I agree with Tim that the Web (or Web 2.0 if you prefer) is spreading the hive mind into far more nooks and crannies than we could have imagined only a few years ago. So get ready to be found. Collective intelligence is coming to a niche near you.
Thanks, Peter!
September 16, 2005
Last week Ian Davis wrote an interesting post on Why Tagging is Expensive:
On the surface tagging seems to offer a new paradigm of organising information, one that reduces the cost of entry and so enables a long tail of participation to emerge. I’ve come to realise that the cost isn’t removed, instead it’s displaced and possibly increased. Tagging bulldozes the cost of classification and piles it onto the price of discovery.
There’s a saying I’ve heard once or twice (I wish I could attribute it): “The cost of metadata is in its application, but the value of metadata is in its use.”
Not exactly something you’ll be quoting at dinner parties, but it nicely captures the cost/benefit gaps of metadata.
The arguments against professional classification (including Clay’s views on tagging) have almost always worked on the cost side of the equation. Automated indexing, search and now tagging are seen as ways to drive down classification costs. But as Davis explains, classification costs are only one part of the system:
In my view the total cost of an information retrieval system is the cost of classification plus the cost of discovery. In the formal classification world you have a very small number of people incurring a high cost in order to reduce the costs incurred by a very large number of people. In contrast the tagging world has the unit costs reversed: it’s cheap to classify, expensive to find. But the numbers of people involved are large in both cases so you end up with a lot of people paying a tiny cost to classify added to a lot of people paying a high price to discover. I think it’s pretty likely that the total cost is going to end up much higher than in the classification scenario.
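To make that arithmetic concrete, here is a minimal sketch in Python of the model Davis describes: total cost equals the cost of classification plus the cost of discovery. Every number below is invented purely for illustration (none of them comes from Davis's post), the units are arbitrary "effort units", and the function name `total_cost` is hypothetical.

```python
def total_cost(classifiers, cost_per_classifier, searchers, cost_per_searcher):
    """Total cost = cost of classification + cost of discovery."""
    classification = classifiers * cost_per_classifier
    discovery = searchers * cost_per_searcher
    return classification + discovery


# Formal classification: a few people pay a lot so that many pay little.
# (All figures are made-up assumptions.)
formal = total_cost(classifiers=10, cost_per_classifier=1000,
                    searchers=100_000, cost_per_searcher=1)

# Tagging: many people pay a tiny cost to classify, many pay more to find.
tagging = total_cost(classifiers=100_000, cost_per_classifier=1,
                     searchers=100_000, cost_per_searcher=5)

print(formal, tagging)
```

With these assumed inputs the tagging scenario comes out more expensive, which is the outcome Davis expects. The result depends entirely on the numbers chosen, so treat it as a way to explore the argument rather than evidence for it.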
Here’s an analogy. I visit a lot of thrift stores. The true cost of an item in a thrift store is a function of the time it takes me to find it, not the price (which is always cheap). A very large thrift store is more likely to have what I want, but at a greater discovery cost. Like del.icio.us, a thrift store is great for serendipitous discovery but not so good for known item retrieval. Put another way, del.icio.us wouldn’t be your first choice if you needed articles on Rousseau and the French Revolution, just like the Sally Ann wouldn’t be your first choice if you needed a smoking jacket, size 42T.
Where I think Davis might be wrong is suggesting that the discovery costs are shifted back to the user. In fact, the costs are shifted to search, blogs and other more efficient discovery tools. In large part this is because the domain of tagging systems has been the “big messy” web.
In that case, the “classical” economics of information retrieval don’t apply because there are often multiple ways of finding things. Or because Google can radically lower your discovery costs by selling keyword advertising to offset their infrastructure. Or because algorithms can do much of the heavy lifting. Or because users’ expectations are for “just good enough” results. Or because users are not interested in finding so much as tracking. And so on.
But I’d argue that once the domain is constrained–by subject, by context, by user population, by privacy/security, by business goals, or by those things in combination–the economic principles of classification and retrieval come back into play. Because other discovery tools are either not available or not optimal, poorly designed retrieval systems do shift the burden back to the user. (Karl Fast’s thoughts on problems in the middle are worth a read here.)
In that middle ground–and the “big messy” web contains probably millions of cases where local structure is valuable, not to mention information systems that aren’t part of the “big messy” web–I think there’s a large area where a mixture of emergent, algorithmic, formal and now social classification systems will make for optimal retrieval.
August 7, 2005
We all know about tagging on Flickr and del.icio.us, and we also know how it’s being used in some fringier sites like Consumating or Dinnerbuzz.
My question: are tagging and folksonomies occurring in more mundane enterprises? In an essay I wrote, I imagined a tagging interface for an intranet. Has that happened? Gene has told me of how he developed tagging for an internal employee directory (where you could tag your skills), but that’s pretty much the only mundane example I’ve heard of.
Anyway, please post comments if you’re familiar with tagging and folksonomies in less fringe-y web use.
July 31, 2005
Adam Weinroth asked Clay and me five questions about tagging in an email interview posted here.
As for uniform tagging, that can only work in situations where there is enough force to expend making the users behave uniformly, and where it is worth expending that force. For example, the Diagnostic and Statistical Manual, (DSM-IV) used by American psychiatrists and psychologists, provides a relatively standard way to diagnose mental disorders. It only works as well as it does, however, and that not perfectly, because it is produced by the American Psychiatric Association, which body can exert considerable force over its members, and DSM-IV is only to be used by the members.
The amount of human cost, in other words, in creating and enforcing uniformity is so high that attempts at such uniformity will fail in most cases. Fortunately, tagging allows for degenerate cases such as alternate spellings and phrasing. This would be a problem if there were only one tagger, responsible for a large group of users, but with every user a tagger, the loss stemming from degenerate cases quickly shrinks, while the value from multiple points of view grows.
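The point about degenerate cases can be shown with a small Python sketch. The users, tags, and helper names (`findable`, `tag_counts`) are all hypothetical, made up for illustration. The sketch simply checks whether one item can still be found under a common tag when some taggers use variant spellings.

```python
from collections import Counter

# Hypothetical tags that four users applied to the same URL.
tags_by_user = {
    "ann": ["folksonomy", "tagging"],
    "bob": ["folksonomies", "tagging"],  # variant spelling
    "cat": ["folksonomy", "tags"],       # variant phrasing
    "dan": ["folksonomy", "tagging"],
}


def findable(tags_by_user, query):
    """True if at least one tagger applied exactly the queried tag."""
    return any(query in tags for tags in tags_by_user.values())


def tag_counts(tags_by_user):
    """Count how many distinct users applied each tag."""
    counts = Counter()
    for tags in tags_by_user.values():
        counts.update(set(tags))
    return counts


# One tagger with a variant spelling: a search on "folksonomy" misses the item.
print(findable({"bob": tags_by_user["bob"]}, "folksonomy"))

# Every user a tagger: the variants matter far less.
print(findable(tags_by_user, "folksonomy"))
print(tag_counts(tags_by_user).most_common(2))
```

In this toy data the variants still exist, but the most widely used tags dominate the counts, so a single tagger's odd spelling no longer decides whether the item can be found.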
July 6, 2005
Hi, I’m Don Turnbull – an Assistant Professor in the School of Information at the University of Texas at Austin. I teach several courses on issues that relate to tagging, including one on what I call Knowledge Management Systems, another on Web Information Retrieval, Evaluation & Design, and one on designing information systems from the perspective of Information Architecture & Design.
I’ve also been doing this kind of work in industry, including leading the research efforts at a startup that was acquired by Google. I currently also spend a bit of time consulting with software companies to develop new kinds of information systems as well as with other businesses to collect, manage and retrieve their own information.
I’m interested in tagging and ad-hoc taxonomies for a number of reasons, primarily because they seem to be leveraging the power of large numbers in concert with the practical intelligence of normal people, not just automated systems. The advent of user-defined taxonomies or even user-driven world views (let’s call them ontologies for this post) is transforming how we as individuals manage our own information, how we share it with others (like-minded or not) and how we can both browse and search through personal and networked resources in a world where tags might just offer the silver bullet of metadata that helps us manage information overload.
I’ll do my best to talk about issues related to all these claims and ideas both here and on my own blog: donturn.com.