August 21, 2009

Who is “Everybody”?

One of the most memorable quotes on the subject of ‘categorization’ and the formulation (or not) of ontologies and taxonomies in information-science (and in relation to web-bookmarking and/or tagging), came from Clay Shirky, when he said:

“The Only Group That Can Categorize Everything Is Everybody” – Clay Shirky

The statement is both simple and logical, and its message is reflected again in Shirky’s eminently worthy 2008 book, ‘Here Comes Everybody: The Power of Organizing Without Organizations’, and was previously explored in his July 2005 talk at Oxford, ‘Institutions vs Collaboration’, featured on TEDtalks, [link]. Shirky’s primary message in these publications has been about how the web has increased collaboration outside of institutional models by lowering ‘coordination-costs’, and in fact how the web has facilitated this by embedding ‘cooperation’ into its infrastructure. However, the quantum leap in cooperation enabled by the web is still subject to certain ‘bottle-neck’ effects by virtue of the client-server (provider-receiver) dichotomy upon which the web is based.

One of the great levelers on the web, however, has been another of Shirky’s areas of research, and that of course is ‘tagging’. Tagging is so simple to execute that literally ‘everybody’ can do it. As far as most people are concerned, though, tagging is just about labelling stuff within what are still ultimately web-silos, and the general web population doesn’t control those silos. Yet a tag has an inherent tripartite structure: it potentially informs us a) about the resource being tagged, b) about the identity of the tagger and c) about the interests of the tagger, and together these define a ternary relationship between tagger, tag and resource. When this is multiplied by ‘x’ number of tags in a given system, it opens a veritable Pandora’s box of application options.
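To make that ternary relationship concrete, here is a minimal sketch in Python. The `Tagging` record, the `index_taggings` helper and the sample data are all hypothetical names invented for illustration; no particular tagging service stores its data this way as far as I know.

```python
from collections import defaultdict
from typing import NamedTuple


class Tagging(NamedTuple):
    """One act of tagging: who applied which tag to which resource."""
    user: str
    resource: str
    tag: str


def index_taggings(taggings):
    """Build the three views that the user-resource-tag relation supports."""
    tags_by_resource = defaultdict(set)  # a) what a resource is about
    users_by_tag = defaultdict(set)      # b) who uses a given tag
    tags_by_user = defaultdict(set)      # c) what a tagger is interested in
    for t in taggings:
        tags_by_resource[t.resource].add(t.tag)
        users_by_tag[t.tag].add(t.user)
        tags_by_user[t.user].add(t.tag)
    return tags_by_resource, users_by_tag, tags_by_user


# Hypothetical sample data.
SAMPLE = [
    Tagging("alice", "http://example.org/a", "ontology"),
    Tagging("bob", "http://example.org/a", "tagging"),
    Tagging("alice", "http://example.org/b", "tagging"),
]
```

Each stored triple feeds all three indexes at once, which is why even a small number of tags can support several quite different applications.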

In his 2005 talks ‘Ontology is Overrated’ and ‘Folksonomies & Tags’ Shirky provocatively stated: “…there is no shelf. There is no file system. The links alone are enough.” [link] …and so began an ongoing debate between those who believe that it is the natural role and responsibility of ‘experts’ to classify and categorize information, (the top-downers) and those who believe that prescribing rules and ontologies interfere with a potentially more natural and egalitarian process, (the bottom-uppers).

The application of Shirky’s prophetic statement about the role of ‘everybody’ (which, let me say, truly inspires me) in the real world is not only not straightforward, but (I personally believe) is effectively under challenge by (another group of top-downers) the exponents of the ever-increasing number of ‘standards’ that are being developed by an army of coders and ‘webocrats’, creating (what amounts to) so many endless firmware-updates to de-bug the truly massive virtual operating-system that our web has become.

At the centre (more or less) of this cornucopia of solutions and tweaks lies Tim Berners-Lee’s ‘Semantic-Web’ and its more recent incarnation as ‘Linked Data’. However, in this ecosystem of rules, and rules about rules, we also have: the ‘Resource Description Framework’ RDF (which grew from the ‘Meta Content Framework’ MCF, combined with XML, whereupon Microsoft created ‘Channel Definition Format’ CDF, to revolutionize “push” technology, only to be trumped when the W3C backed RDF); then we got ‘Web Ontology Language’ OWL, which (according to some) can be trumped by ‘Extensible Stylesheet Language Transformations’ XSLT… and, just to be sure, now we’ve got RDFa, to help deal with issues with RDF. As Greg Boulten has opined on his blog [link]:

“If you really needed the whole stack for Linked Data to work, you would have a bigger problem on your hands: now you’d have to explain to users that to try Linked Data you need to understand and use RDF, URIs, SPARQL, OWL, and the other pieces of the stack. Good luck with driving adoption that way… Oh, but wait, that’s what has been done so far, with the slow results we know of.” – Greg Boulten

So… how does this all relate to tagging and who is best to categorise ‘everything’? Well… It’s ultimately about ‘prescriptive-solutions’ (and I’m thinking mainly of the ‘semantic web’ movement here) that are conceived as potentially machine-readable conventions that will somehow help us, by removing dependencies on that unreliable faculty: ‘intuitive human decision making’. This is where the polarity exists… There are those who believe in prescribing rules and systems to deal with the exponentially expanding web, and those who recognise that an evolutionary process is underway, both in new forms of language creation (emergent semantics) and in the need for alternative ways to deal with this unprecedented explosion of information.

In folksonomies we have become used to the counter-hierarchical processes of selection that are displayed in tag-clouds, but we are still in the early stages of this evolution. As was discussed on the AllPeers blog (before their VCs pulled the plug):

“…folksonomies work because they leverage a very efficient natural language processing tool: the human brain. By offloading the task of disambiguation onto the user, folksonomies reduce the need for all of those fiddly niceties like hierarchy that ontologists have traditionally considered necessary.” [link]

So, ‘Who is Everybody?’… and how will ‘everybody’ maintain the promise and power of Clay Shirky’s beautifully simple and self-evidently true statement? I do believe there definitely is a role for technology, protocols and systems to help ‘everybody’ do this, but we have to make sure that a clear distinction is drawn between ‘top-down’ prescriptive solutions and ‘bottom-up’ models which would allow ‘everybody’ to not just be part of the process, but in fact to BE the process.

July 21, 2007

Is Tagging A Disruptive Innovation?

Regarding my post Tagging and the Hype Cycle, Xian said:

…You write: “Tagging does not seek to displace existing technologies or entrenched vendors” but are there not automated taxonomy generating tools that might be disrupted by the widespread adoption of tagging?

More broadly, isn’t tagging something of a threat to top-down ontology and taxonomy approaches?

Great to see some chatter here to dispel the “trough” idea.

Indeed there are classes of existing metadata management tools which may suffer a decline as the practice of social / distributed tagging spreads. And tagging can also be seen as a challenge to top-down approaches, with the corollary of it being a challenge to the software tools / services / hardware connected to those approaches. Good points, both.

I should make clear that I’m drawing boundaries for the conversation at this first step, looking at tagging as it compares to and contrasts with the other common candidates for the Hype Cycle style analysis Keller offered. That means comparing tagging to the broad class of IT solutions tracked by the (now myriad) Hype Cycles, and, amongst other analyst offerings, their close cousins the Forrester Waves (there must be almost 200 of each by now…). These solutions are themselves parts of the larger IT ecosystem which includes well defined roles (a bit like niches) for all the parties involved; vendor, buyer, partner, competitor, regulator, etc.

In these terms, it is difficult to identify direct market actors (business or otherwise) associated with tagging. To date, there are few potential or actual agents trying to take on any of the above roles available in the IT ecosystem. There are some recently available tagging solutions – in the traditional style of software you lease / install / subscribe to – offered for purchase. Does anyone know how well they are selling…?

Thus, I don’t think the Hype Cycle comparison holds. In simple financial terms, I’m not aware of anyone making or losing substantial amounts of money specifically in relation to tagging. For many reasons, tagging has not yet emerged – and may never emerge – as a category of technology investment and activity for businesses.

Moving forward, Xian’s done good work reframing the conversation to address another level. Xian’s questions shift the discussion outside the tight boundaries I drew, to consider the impact of tagging on existing solutions for metadata management and related parties. And underlying this impact assessment is the larger question of whether tagging is a disruptive innovation: will tagging change the shape of the metadata management ecosystem? Will tagging lead to new niches?

In comparison to established metadata management solutions, tagging shows several of the characteristics of disruptive innovations:

  • tagging is cheaper
  • tagging has low entry barriers
  • tagging is self-service

Not coincidentally, these attributes are the centerpieces of Clay Shirky’s earlier arguments in favor of tagging, and there is no need to revisit them in depth.

But there is still debate about the specifics of these attributes. For example, in what ways is tagging cheaper? And in what contexts (maybe not for me)? Or does tagging simply distribute costs differently; perhaps over time (pay now, or pay later…), or across actors (is free really free for *you*?), or by manifesting costs in different ways (time is often money. so is quality. so are mistakes)?

The conversations playing out around these questions indicate progress in how well tagging is understood. But they also demonstrate that the major cultural and organizational shifts in thinking – shifts that pave the way for people to invest, build, buy, and do all the other things that drive changes in the ecosystem – are still underway.

Though it’s been a few years since tagging became visible, it seems too early to understand what kind of changes – if any – will occur in the metadata management ecosystem as a result of tagging’s emergence. In the meantime, insights and examples of tagging’s impact from those better-informed (or more insightful) are welcome.

November 23, 2006

Do you have a tagging case study?

I’m looking for a couple of good tagging case studies for a project I’m working on. Enterprise or corporate tagging applications would be particularly good, but consumer web apps or even desktop tagging examples are welcome too.

These case studies may be published so it would be great if you…

  • Could provide screen captures of your application
  • Had clearance from your legal department (or other powers that be)
  • Could talk openly about the benefits/costs of tagging in your application, challenges or problems you or your users faced, how tags work with other classification/retrieval systems you might use… you know, the usual issues.

I can’t promise any compensation for participating aside from the esteem of your peers and recognition as an innovator in this emerging field. But I might be able to get you some swag.

Email me at genesmith [at] atomiq [dot] org if you’re interested, and I’ll give you more details about the project. Thanks!

May 24, 2006

Collaborative Tagging Workshop

Our own Don Turnbull posted from the Collaborative Tagging Workshop at WWW2006 in Edinburgh, mentioned in Christian’s last post. Don includes a link to a rich set of 16 papers on collaborative tagging. Fire up your Adobe Readers! (Posted by Jon L. as Admin)

April 28, 2006

Siderean’s tagged facets

Siderean, one of the interesting faceted classification companies, has announced some new capabilities that aim at automating the generation of metadata and that integrate tagging with facets.

The automation comes from entity extraction tools (plus the ability to integrate third party tools, because, frankly, Siderean is not in the entity extraction business) that isolate names of people, places, organizations, dates, etc. from a collection of pages. This addresses one of the real inhibitors of the use of faceted classification: The data has to already be well structured and well tagged. That makes it great for browsing databases but not as good for browsing big piles of unstructured data (= documents).

The system integrates tags in a useful way. Users can tag items and then use tags to further specify searches through the faceted interface. In fact, the tags can be “bucketed” and treated as facets. The tags can be marked as personal or public, and can be associated with groups and other contexts. Yes, the system does integrate with (Siderean fooled around with this in a beta project called — wonderfully —

Siderean also announced that it’s now using the faceted information to drive analytics. This is really “just” another way of displaying the faceted information. But it can be quite useful because a faceted system has so much data built into it. For example, a library system might know that (and this is a made-up example) there were fifteen times as many books about Iraq published in the past two years than in the past twenty; it has to know this if it’s going to let users browse for books by subject and then by year (or vice versa). Siderean’s analytics offering follows that of Endeca.
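The point that a faceted system already holds these counts can be shown with a small sketch. This is only an illustration under my own assumptions: the `BOOKS` records, their field names and the `facet_counts` helper are hypothetical, and it says nothing about how Siderean or Endeca actually implement analytics.

```python
from collections import Counter

# Hypothetical catalogue records; the field names are assumptions.
BOOKS = [
    {"title": "A", "subject": "Iraq", "year": 2005},
    {"title": "B", "subject": "Iraq", "year": 2005},
    {"title": "C", "subject": "Iraq", "year": 1990},
    {"title": "D", "subject": "Gardening", "year": 2005},
]


def facet_counts(records, facet, **selected):
    """Count the values of `facet` among records matching the selected facets."""
    matching = [
        r for r in records
        if all(r.get(name) == value for name, value in selected.items())
    ]
    return Counter(r[facet] for r in matching)
```

Calling `facet_counts(BOOKS, "year", subject="Iraq")` is the "browse by subject, then by year" step; the same counts that label the navigation links are the raw material for the analytics view.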

Faceted classification is young. It’s exciting watching imaginative companies like Siderean invent new twists and turns right under our eyes.

April 23, 2006

Impure folksonomies for retailers

Dan Klyn has some practical suggestions for retailers thinking about letting users tag merchandise. Why not pre-populate your catalog with tags drawn from the item descriptions? Why not rank tags higher based on the popularity of the page or item? What do you do about a product that’s tagged “crappy” or “over-priced”? (I think Dan’s answer to that last one is that you surface tags based in part on how popular they are.) The result is not a pure folksonomy, but purity isn’t always what we, merchants and shoppers alike, need.

He also points to as an example of a merchant using tags well.

March 30, 2006

Social information architecture, sorting, and tagging

Here are my raw notes from Rashmi Sinha’s talk at the IA Summit, “Sorting, Tagging and Social Information Architecture or The Missing Chapter in the Polar Bear Book”:

Who’s sick of hearing about tagging?

[Tagging provides a] focus on the individual….

Have you ever heard of “The man who could not sort”? The discussion of the Chandler card-sorting exercise reminded me of this. A man was asked to sort email into three categories. He couldn’t do it, saying “This is a waste of time.” It didn’t represent him. The test was torturing him. He finally gave up.

I noticed delicious around that time… something about categorization can be really hard, especially social categorization.

Cognitively speaking: analysis paralysis, balancing your scheme. Category boundaries change, labels become obsolete, systems hide items – mistakes are costly.

The idea of “the one right category,” people really struggle with it. It’s almost an existential question

How tagging works

It maps to the cognitive process, a reduced load. It’s fun. There is self-feedback, social feedback, no balancing of scheme.

Findability is still the missing bit. Here’s where IA comes in. How do you add sorting, exploration, discovery?

Sorting                          Tagging
higher cognitive cost            lower cognitive cost
richer data                      less rich data
harder to aggregate socially     easy to aggregate socially

How to reduce cognitive cost of categorization

Better interaction design: don’t hide item as soon as you add it to category, flat schemes [q: flatter schemes?], non-exclusive categories.

Categorization is going to make a comeback. These are all fashions. (applause)

Reference to Don’t take my folders away! Organizing personal information to get things done, the feeling of satisfaction that comes from filing things in folders.

Typical IA approach: card sorting… etc. Try it with tags

Brainstorm tags for Apple:

Calculate co-occurrence. Do hierarchical cluster analysis. You should get similar results if same domain (to heuristics?).
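A rough sketch of those two steps in Python, under my own assumptions. The brainstormed tag lists in `SAMPLE` are invented, and `cluster` is a deliberately simplified stand-in for a full hierarchical cluster analysis: it does single-linkage merging and stops at a fixed co-occurrence threshold, so it returns one flat cut rather than a whole tree.

```python
from collections import Counter
from itertools import combinations


def cooccurrence(tag_sets):
    """Count how often each pair of tags is given by the same participant."""
    counts = Counter()
    for tags in tag_sets:
        for a, b in combinations(sorted(set(tags)), 2):
            counts[(a, b)] += 1
    return counts


def cluster(tag_sets, min_count=2):
    """Single-linkage merging of tags, stopped at a co-occurrence threshold.

    Pairs are visited from strongest to weakest; two clusters are merged
    whenever a pair spanning them co-occurs at least `min_count` times.
    """
    counts = cooccurrence(tag_sets)
    clusters = {t: frozenset([t]) for tags in tag_sets for t in tags}
    # Strongest pairs first; ties broken alphabetically for repeatability.
    for (a, b), n in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        if n < min_count:
            break
        if clusters[a] != clusters[b]:
            merged = clusters[a] | clusters[b]
            for t in merged:
                clusters[t] = merged
    return sorted({tuple(sorted(c)) for c in clusters.values()})


# Hypothetical brainstormed tags for "Apple", one list per participant.
SAMPLE = [
    ["ipod", "music", "mac"],
    ["ipod", "music"],
    ["mac", "computer"],
    ["mac", "computer", "design"],
    ["fruit", "pie"],
]
```

With this sample, "ipod"/"music" and "computer"/"mac" each co-occur twice and end up grouped, while the remaining tags stay on their own. A statistics package would give you the full dendrogram; this is just enough to see the shape of the method.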

Hybrid approach: TagSorting

  1. Gather terms from
  2. Ask users to do cardsorting

Rashmi asks if anyone has done any other variation

Audience comment:
We do sorts and then ask them to tag the clusters (“how would you refer to this?”)

A lot of product and brand research involves understanding customer categorizations…

Understanding how people think… Reference to Gerald Zaltman, “How Customers Think” (2003)

Product Positioning

Consensus Building Techniques – KJ Method

  • Popular in Japan
  • Allows groups to quickly build consensus (back and forth between individual and group)

MindMapping for Stakeholder Analysis

  • map concepts across multiple stakeholders
  • Trochim’s method
    • ask stakeholders to sort statements related to issue
    • rate importance of each statement
    • create groups; through cluster analysis
    • depict importance of each group

Why tagging is sometimes appropriate…

The Web has become social

  • Findings from Pew Internet Report
  • internet & email play important role in maintaining dispersed social networks
  • people use internet to maintain contact with sizable social networks
  • people use internet to seek out others in their networks when they need help
  • concept of networked individualism (connections are indiv – to – indiv)

People hang out on the web just for fun – 40 million a day (US)

of men 34%
of women 26%

of 18-29 37%
of 30-49 31%
of 50-64 25%
of 65 20%

Tags make the web a shared experience

  • tags give you community
  • other social characteristics
  • social play
  • stalking
  • imitation
  • gossip
  • eavesdropping – [my addition --xian]

Concept of shared browsing, a way of socializing without having to deal with email list strife

Thomas VW: white hat and black hat stalking privacy issues

Why tagging, why now?

Pace layering: No time for consensus to emerge. Tags allow you to respond to fast-changing things. Categorization is about consensus.


No focus on early adopters. Most IAs on non tech-savvy users. Should balance that by studying early adopters.

Designers like control, but design of social system means letting go

You don’t need Jonathan Ive for MySpace, craigslist, or TagWorld. This is a completely different type of design (social systems)

Tagworld is taking over from Myspace

Menus and Tag Clouds

The tag cloud-menu is not the future…

Menus:

  • structured
  • stable over time
  • comprehensive

Tag clouds:

  • unstructured
  • relatively unstable
  • not comprehensive
  • let current stuff bubble to top

To respond to hurricane Katrina, most companies added link to the home page, but Flickr and Delicious didn’t need to do anything different.

Comment from audience: Cloud shows relative importance, something easier to assess than absolute importance
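The "relative importance" idea is easy to sketch: scale each tag's font size between a minimum and a maximum according to where its count falls in the observed range. The function name, the pixel bounds and the linear scaling are my own assumptions here; real tag clouds often use logarithmic or bucketed scaling instead.

```python
def cloud_sizes(tag_counts, min_px=12, max_px=36):
    """Map each tag's count to a font size, relative to the other tags."""
    if not tag_counts:
        return {}
    lo, hi = min(tag_counts.values()), max(tag_counts.values())
    span = hi - lo
    sizes = {}
    for tag, n in tag_counts.items():
        # When every tag has the same count, place them all mid-range.
        share = (n - lo) / span if span else 0.5
        sizes[tag] = round(min_px + share * (max_px - min_px))
    return sizes
```

Note that the output depends only on how counts compare with each other, not on their absolute values: multiplying every count by ten leaves the cloud unchanged.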

Q: why did MS adaptive menus fail
A: Because it’s not just you personally – it’s the social stuff

Design of Social Systems

Serve the individual’s selfish goal.

Create a symbiotic relation (avoid mob, tragedy of commons). Think about when should the individual feel alone, when part of group. How to encourage social sharing. How much mimicry to encourage. How to accommodate local groups. How to encourage expression of alternate viewpoints. When to introduce social networks. How to encourage wise crowds. How to augment navigation with tags.

Things to Try

  • Create an account on MySpace
  • Read Emergence, Wisdom of Crowds
  • Play a Multiplayer Online Game (World of Warcraft, Second Life)
  • Play with an API (Google maps API for example)
  • Think about what is fun on the web (not just tasks, work)

Q: what about bad-faith or ill-conceived early tagging, setting the wrong tone?
A: [I missed this]. Reference to Eric von Hippel at MIT, research on lead users

Q: I don’t use tags/tag clouds to find, I search. At Yahoo we use lots of tags

Q: Re tag drift, meanings change
A: Tom Coates wrote article on how the meaning of “Ajax” tag changed over time “Tags and Cultural Change”

Q: In spirit of fun and play, other good social applications in the local space (beyond DodgeBall)?
Comment: An app called Socialight out of ITP at NYU, allows you to add stories to buildings, “this is a great coffeeshop,” “there were three murders here in 1932 and everybody says this house is haunted.”


March 16, 2006

Linking Up Research Papers Using Tags

Back in my first post to this blog, I said that over here at Nature we’re interested in the question of "…how far tagging can take us in tackling the (formidable) information organisation needs of modern science." Today we’re starting on a cool (I think) new experiment that might help provide some early answers.

Many of you are no doubt familiar with Matt Biddulph’s wonderful mock-up of the BBC Radio 3 website as it might work with embedded functionality. (See in particular Matt’s Flash movie here.) Inspired by this, we’ve just released some code that adds the same type of functionality (but this time for real) to ‘institutional repositories’ (IRs) — websites that scientists and other academics use to share their work with each other.

One general problem with IRs is that, notwithstanding services like Google Scholar, a lot of their content isn’t very easy to find, and it certainly isn’t easy to browse between related items in different repositories. Our new code aims to improve things by allowing IR users to tag articles and see links to related content, all from within the IR web page itself. Behind the scenes, the software communicates with and/or Connotea (Nature’s own social bookmarking service for scientists). Since Connotea is open source, it will also work with any instance of Connotea Code.

The good folks at the University of Southampton’s Electronics and Computer Science Department have now put this code on their institutional repository, creating our first real-world installation (yeah! :) ). Here is an example of a tagged paper. You need to enter Connotea user details for it to work (because calls to Connotea’s web API require you to be a known user). For those who can’t be bothered with that, here’s a screenshot of the sort of thing you see just below the article abstract:

The recommendations (which are generated based mainly on tag co-occurrence) already seem OK to me, but they should get better as more links and tags get entered into the system.
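For readers wondering what tag-based recommendation can look like, here is a minimal sketch. It ranks other items by how many tags they share with the one being viewed, which is only my guess at the general idea and not a description of how Connotea actually computes its recommendations. The `related` function and the `PAPERS` data are hypothetical.

```python
def related(target, tags_by_item, limit=5):
    """Rank other items by how many tags they share with `target`."""
    target_tags = tags_by_item[target]
    scored = []
    for item, tags in tags_by_item.items():
        if item == target:
            continue
        shared = len(target_tags & tags)
        if shared:
            scored.append((shared, item))
    # Most shared tags first; ties broken by item name for repeatability.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [item for _, item in scored[:limit]]


# Hypothetical tagged papers.
PAPERS = {
    "paper-1": {"genomics", "yeast", "evolution"},
    "paper-2": {"genomics", "yeast"},
    "paper-3": {"evolution"},
    "paper-4": {"astronomy"},
}
```

It also shows why results should improve with use: every extra tag on an item gives the overlap score more to work with.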

There’s lots of different IR software out there, and our code currently only works with EPrints, which we chose because it’s very popular, is written in a language (Perl) that we’re familiar with, and has a friendly development team just down the road from us. If you’re the administrator of an EPrints repository then you can get instructions and code from here. I’m told that it’s a doddle to install.

More information is available in this blog entry by Ben Lund, who runs Connotea for Nature. We’re really grateful to the UK Joint Information Systems Committee for funding this work and would be very interested to hear about people’s experiences, either in comments posted here or by email (t DOT hannay AT nature DOT com).

March 12, 2006

Tagging 2.0 at South By


I’m going to try to group the comments into subject areas. Let’s see how well that works.

Tags going mainstream

Don Turnbull:

Who’d have thought we’d be talking about metadata on a beautiful Sunday morning in Austin?

Is tagging the key element of Web 2.0? (Probably not.) The ETech definition: Web 1.0 was the read-only web. Web 2.0 is the read-write web.

Thomas Vander Wal:

I coined the word folksonomy… and the correct definition wasn’t given in the beyond folksonomy panel.

People used to tag on the command line. Web 1.0 tagging didn’t work. Tools like Bitsy. Cory’s “metacrap” article. Web 2.0: delicious and flickr, actually useful for finding and re-finding information.

More than 40 sites are doing social bookmarking.

60 to 70 sites using tagging as their main way to bring people in. (7 travel sites, for example, using tagging as their appeal.) More than 200 services have included tagging (Amazon).

What are tags useful for?

Don Turnbull:

Are these systems useful beyond a few types of tasks or categories of information?

  • Re-finding information
  • Creating personal metadata
  • The new command line (quicker than drag/drop, sort, click)
  • Gateway to the next PIM?
  • Tags as verbs (“buy,” “sell”), expanding the vocabulary (ratings: “*,” “**,” “***” etc.)
  • People-centric view of data, vs. system-centric.
  • Good for keeping track of things you already know about, but what about discovery?
  • It’s more interesting to find a like mind than just a resource

Adina Levin:

Tagging is social, helpful to the individual and increasingly valuable to the group.

Tag games (Flickr came from the game design world), example of red and green game leading to joining the Japanese Maple group, aircraft spotters.

Jon Udell’s InfoWorld Explorer tool crawls delicious and aggregates InfoWorld articles by genre, author, date, tags, title

Why is Tagging better than Categorization?

Rashmi Sinha:
I’m going to be a cheerleader for tagging

When categorizing, we choose between multiple concepts. Tagging is easier. Joshua Schachter in his infinite wisdom figured out you can just write down what comes to mind. Note all concepts instead of choosing one and invoking a hierarchy.

Better than any other social system on the web, tagging approximates the wisdom of crowds:

  • cognitive diversity
  • independence
  • decentralization
  • easy aggregation

The moment of tagging is you and that object alone (but – I interject in my mind – what about delicious’s “recommendations”? – isn’t that influence from the crowd?).

Social formations supported by tags

  • ad hoc groups
  • lots of weak social ties
  • conceptually mediated ties

Flaws, Issues, Usability

Don Turnbull:
Are these systems usable beyond alpha geeks?

  • Interface improvements: Good import? Teach vocabulary? Make re-finding information easier.
  • Tag clouds probably not the answer
  • Spamming, gaming, TagFraud
  • Tagging is implicit (good and bad)
  • Not all resources are as identifiable (microcontent?)… granular, web pages; items, commercial products
  • Tags as identity (how so? i-tags?)

Vander Wal:

  • “Re-findability sucks… We need to fix the re-findability problem.”
  • Looks messy to others.
  • No identity in Flickr. (Example: can’t see the 40 things Don has tagged with “orange”)
  • Folksonomy triad (one person), dual folksonomy triad (including community) – really need slide to illustrate
  • Context often missing, it gets messy, we have silos

Prentiss Riddle:
Six dirty secrets of tagging

  1. It’s the content, stupid
  2. Ordinary people don’t get tags (text box prompt gets a sentence response or maybe a Google search) and tag clouds
  3. It’s the UX, stupid – flickr guides you
  4. Tags don’t play well with others (interop)

    • Character sets
    • Delimiter wars (commas, spaces, etc.)
    • Synonyms (singular vs. plural, homonyms)
    • aggregation, portability
  5. Rich functionality requires rich metadata (where’s my flying car? I wouldn’t want to use them for medical applications, managing money, hunting terrorists)
  6. Nobody wants “real tags” (simple keyword metadata, no control, no hierarchy, no syntax or semantics, minimal cognitive effort by the user). What people really want is “tagginess” (Stephen Colbert image)… delicious for:username, Shadows @group, geotagging, consensus tagging (sxsw2006, chosendarkness), hierarchical tagging (,… it’s the opposite of tagging
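The interop points under secret 4 (character sets, delimiters) are easier to see in code. The sketch below is one possible convention that I made up for illustration, not any service's actual rule: commas win if present, otherwise tags are split on whitespace with double-quoted phrases kept together, and text is normalised to Unicode NFC and lower-cased.

```python
import re
import unicodedata


def split_tags(raw):
    """Split a raw tag string into normalised tags (assumed convention)."""
    raw = unicodedata.normalize("NFC", raw)
    if "," in raw:
        parts = raw.split(",")
    else:
        # Keep "quoted phrases" together, split everything else on whitespace.
        parts = [quoted or word
                 for quoted, word in re.findall(r'"([^"]*)"|(\S+)', raw)]
    return [p.strip().lower() for p in parts if p.strip()]
```

Two services that pick different rules here will disagree about whether `new york` is one tag or two, which is exactly the portability problem.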

Faceted tagging: Mefeedia (by place, by content, etc.), is available for sale.

Adina Levin:
Tags are messy (blog, blogging, blogs) in tag clouds, compound words

Tag refactoring: consolidate synonyms, fix and standardize spelling, add hierarchy
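A small sketch of the consolidation step, assuming a hand-maintained mapping from variants to a preferred tag. The `CANONICAL` table and the `refactor` function are hypothetical, and adding hierarchy is left out.

```python
from collections import Counter

# Hypothetical, hand-maintained mapping from variant to preferred tag.
CANONICAL = {"blogs": "blog", "blogging": "blog", "weblog": "blog"}


def refactor(tag_counts, canonical=CANONICAL):
    """Merge counts for tags that differ only by case, spacing or synonym."""
    merged = Counter()
    for tag, n in tag_counts.items():
        key = tag.strip().lower()
        merged[canonical.get(key, key)] += n
    return merged
```

The mapping is where the top-down element creeps in: somebody has to decide that "blogging" and "blog" mean the same thing.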

Don’t make me think, loss of tag snark, loses “bottom-up” purity, a hybrid of top-down and the group mind

Rashmi Sinha:
Tips for tag designers

  1. How are you serving the individual motive?
  2. Does the individual understand and want to fulfill that goal?
  3. What is the relationship between social and personal?
  4. Is it too easy to mimic the tags of others?
  5. Is finding all about the most popular, most tagged?
  6. Enable discovery, exploration, finding new things
  7. Don’t force users to do things differently from what comes naturally
  8. Solve problems by ensuring good findability


Q: How to deal with Tag spam, tag fraud?

Thomas Vander Wal: blacklists, another reason why you need to see who tagged it and what object was tagged.

Question: How to work with synonyms and homonyms

Prentiss: Clustering at Flickr works well because they have so much rich metadata available to mine.

Adina Levin: I like delicious’s suggestions

Rashmi Sinha: In input let the user do what they want. In the findability stage deal with the problem.


March 11, 2006

Notes on Beyond Folksonomies at SXSW

I posted my notes on the Beyond Folksonomies panel at SXSW at my “The Power of Many” blog.

Update: Scot Hacker took notes on this panel too.
