September 1, 2007

The Tagging Growth Curve

A recent flurry of postings from the tagerati on the state of tagging follows up on the idea broached by Phillip Kelleher, and then addressed here in previous posts; to wit, tagging is in a bit of a lull, if not an authentic spate of the doldrums.

A quick listing of postings from the thread:

I’ll assume you’ve read all these worthwhile pieces, and move on to discuss what seems most interesting about both the state of tagging, and what the above interpretations of that state imply about how we think of the intertwined phenomena of technology and culture change in general.

The apparently irregular growth and spread of tagging is simply an example of the real nature of how innovations spread. Professional analysts and other meaning makers tend to draw smooth graphs to depict these things. But in reality, natural systems (and the tagging / technology landscape is a legitimate ecosystem) are noisy, cyclical, chaotic, complex, fuzzy, non-linear, and unpredictable. They only appear to follow smooth curves at a high level of abstraction, or a low level of resolution.

When the subject is growth, adoption, and change for tagging, a better comparison to use when gauging status (and thus implied progress) is punctuated equilibrium, the idea “that evolution jumps between stability and relative rapidity”. [Yes, this is also only an approximate frame for tagging. With that noted, I still believe it is better than those frames in current use.]

To set the stage for a look at how this maps to the growth of tagging, I’ll quote Stephen Jay Gould and Niles Eldredge on punctuated equilibrium:

In summarizing the impact of recent theories upon human concepts of nature’s order, we cannot yet know whether we have witnessed a mighty gain in insight about the natural world (against anthropocentric hopes and biases that always hold us down), or just another transient blip in the history of correspondence between misperceptions of nature and prevailing social realities of war and uncertainty. Nonetheless, contemporary science has massively substituted notions of indeterminacy, historical contingency, chaos and punctuation for previous convictions about gradual, progressive, predictable determinism.

These transitions have occurred in field after field; Kuhn’s celebrated notion of scientific revolutions is, for example, a punctuational theory for the history of scientific ideas. Punctuated equilibrium, in this light, is only palaeontology’s contribution to a Zeitgeist, and Zeitgeists, as (literally) transient ghosts of time, should never be trusted. Thus, in developing punctuated equilibrium, we have either been toadies and panderers to fashion, and therefore destined for history’s ashheap, or we had a spark of insight about nature’s constitution. Only the punctuational and unpredictable future can tell.

From Punctuated Equilibrium Comes of Age by Stephen Jay Gould and Niles Eldredge
Applying the frame of punctuated equilibrium to the growth of tagging implies a very differently shaped growth curve.

Tagging Growth Curve

This illustration shows a growth curve with several stages of rapid growth, followed by plateaus of comparative stability. Each stage is a complete cycle of diffusion throughout a community: Pioneers, Enthusiasts, Commercial Innovators, the Commercial Market. Technologies begin “below the cultural waterline”, meaning that they are not part of the generally known or accepted constellation of how things work, and move “above the waterline” to awareness and acceptance.

Boundaries formed by common interests, goals, levels of expertise, or expected investment separate the communities from one another. Each successive community is larger in size. The criteria for successful adoption and diffusion change with each community. Generally, the entry cost thresholds for members of a given community to adopt the technology will become lower, meaning less need for specialized knowledge, or substantial time or money investments. Of course, this simply means that different actors within successive communities bear investment costs in different proportions. In the pioneer days, everyone “pitches in”. As adoption proceeds across community boundaries, relative equality of participation in innovation – and thus cost sharing – declines.

If you have a commercial perspective – meaning you either want to make money on tagging, or you want someone else to figure out the difficult bits for you and just use what they come up with – the goal is to bring the technology “above the cultural waterline”. Crossing this threshold means successful commercialization and profit for those who invest to lower barriers for successive communities.

Note that it is during the plateaus that innovation occurs. These intervals that sometimes feel like doldrums are the periods when serious-minded people are quietly tinkering, building things, and circulating half-complete alphas to friends, colleagues, and thought leaders within their respective communities.

It is during the spikes that the members of the next and larger community adopt the new, refined technology.

What does this mean for tagging? More specifically, how should we understand the state of tagging with this model as a guide?

First, tagging is definitely past the Pioneer stage, when only a few even knew or heard of it. The burst of tag mania that began a few years ago (and is now, in retrospect, clearly over) marked the close of this phase, and the beginning of visible experimentation amongst Enthusiasts. Yes, thanks to vastly lowered design and development costs, organizations are often enthusiasts. Think of the ever-multiplying menagerie of social bookmarking tools that debuted in 2005 and 2006.

Second, tagging is in transition from the stage of experimental exploration by the Enthusiasts, to being legitimately productized, or transformed by money-making organizations – the Commercial Innovators – into something that can be sold for a profit. The recent eWeek demo of not one but *four* enterprise tagging tools from leading vendors BEA, Cogenz, Connectbeam, and IBM, shows this quite clearly.

As long as our current models of adoption and change hold true (and there are good reasons to think these fundamental modes of production are changing), tagging will follow two paths to varying degrees. The first path leads tagging to become commercialized as a recognized part of the technology ecosystem, in which case we can expect to see all the customary signs of productization and the Commercial Market such as packages, vendors, integrators, public release schedules, service tiers, big-ticket invoices, etc. The second path leads tagging to open source legitimacy, with ongoing commitment from a hybrid community of developers and users, and a permanent place in the open source infrastructure. A quick survey of the open source community shows several tagging projects underway, at varying levels of activity.

My current prediction is that tagging will progress along both paths for the next 12 months, pursuing commercialization under the aegis of existing enterprise solutions, while the open source community comes to some sort of consensus on the level of effort tagging needs or warrants. Looking at things from the macroscale, we should check in about a year…

November 23, 2006

Do you have a tagging case study?

I’m looking for a couple of good tagging case studies for a project I’m working on. Enterprise or corporate tagging applications would be particularly good, but consumer web apps or even desktop tagging examples are welcome too.

These case studies may be published so it would be great if you…

  • Could provide screen captures of your application
  • Had clearance from your legal department (or other powers that be)
  • Could talk openly about the benefits/costs of tagging in your application, challenges or problems you or your users faced, how tags work with other classification/retrieval systems you might use… you know, the usual issues.

I can’t promise any compensation for participating aside from the esteem of your peers and recognition as an innovator in this emerging field. But I might be able to get you some swag.

Email me at genesmith [at] atomiq [dot] org if you’re interested, and I’ll give you more details about the project. Thanks!

May 24, 2006

Collaborative Tagging Workshop

Our own Don Turnbull posted from the Collaborative Tagging Workshop at WWW2006 in Edinburgh, mentioned in Christian’s last post. Don includes a link to a rich set of 16 papers on collaborative tagging. Fire up your Adobe Readers! (Posted by Jon L. as Admin)

April 3, 2006

Interview with Gordon Luk (FreeTag)

Nearly ten months ago, at the suggestion of Andy Baio I interviewed Gordon Luk (via IM) about FreeTag, an “Open Source Tagging / Folksonomy module for PHP/MySQL applications” he originally created for Upcoming and announced almost a year ago in his blog.

In the meantime I’ve continually intended to edit the chat transcript into a coherent article and post it here. Unfortunately, a strange thing called “life” has intruded. Then, I ran into Andy in Austin at South By Southwest and my embarrassment over sitting on this dialogue returned to the surface, kicking the to-do back to the top of my list.

I started thinking I should touch base with Gordon again, and find out who else has adopted FreeTag lately and any other news updates or developments but then I realized this was just another form of procrastination. What the web wants me to do is post what I’ve got and then Gordon or anyone else can comment on it, or correct it, or update it, and so on.

So, without further ado, here is my interview with Gordon Luk:

xian: Can you tell me how you got the idea for freetag?

Gordon: Sure! It starts with a discussion of who I eat lunch with, actually. I am lucky enough to work with some really smart guys – among them, Andy Baio, Phil Fibiger, Greg Knauss, Christian Newton, and Jason Stuck.

We got to talking about tagging when the term folksonomy was coined.

I can’t remember exactly who had the idea, but we started discussing cross-site interactions between tags on different platforms.

In what sense?

The idea that you could be browsing puppies on flickr, and perhaps you could extract some of del.icio.us’s puppy-tagged links.

Was Technorati doing their pages yet that show items tagged by several different systems?

At that point, I don’t believe so. We got a few of our other friends involved, including the venerable Leonard Lin. Greg included Leonard Richardson on the email that he sent out that night by mistake, so we got some of his feedback too.

So when did it turn into a plan to actually do something?

Well, first it turned into a wiki.

Naturally…

I started off in the direction of creating a PHP class that would implement a standardized XML-RPC or REST communication layer. Greg was more of a proponent of the actual standard to be implemented by that layer.

At that point, we all got busy and it sat for a couple of months.

During another lunchtime conversation, I came up with the idea for eatlunch.at and made it that weekend.

I wanted to use it as a testbed so I could play with tagging, so instead of building it into the whole site, I made the tagging system generic.

One thing that interests me is the enabling or catalysing idea of not just pumping out yet another site or application but instead producing a plug-in that can be distributed across a whole class of projects.

It seems altruistic in the sense of it’s not yet another system trying to collect my contact info, but on the other hand, I’m surprised people don’t modularize like that more often.

Yeah, that’s absolutely very interesting – I wrote a post not too long ago about how I’m interested in the strange inversion of privacy preferences that we subject ourselves to on social services.

Especially public ones like del.icio.us.

We really wanted to enable cross-communication between sites, because it seemed like such a no-brainer once we started talking about it. Typically, when you’re dealing with hierarchies, every site dev has their own view of the world, and things don’t match too well. With freetagging (the term used back then), it doesn’t really matter, because the classification systems emerge from the utility of the application and data.

It’s interesting how tagging is emerging as a kind of meta-glue for the web (if it is – still not sure).

It’s interesting that tag clouds (and now del.icio.us’s recommended tags) are enforcing community standards for popular tags, because with a distributed system, you’d have that not only on a single site, but you could implement that across a wide range of sites.

There’s a tension there – still not clear where it’s going, but it’s fun to watch it emerge (or in your case, i suppose, help move it along). So, the wiki hosted the debate about how to implement or at what conceptual level to implement the idea?

Yes, it might actually still be around, too. It’s hard to say, because we all worked on it for about a week before getting too busy to do anything about it. It was mostly planning and RFC-style note-taking. It was a lot of design work, no coding involved.

Not even pseudocode?

Well, I guess it depends on your definition of that. I think there was some standard communication XML-RPC samples that were flying around, and there was also some API specs that I wrote up.

so did you just sit down and hack out the first version next?

I actually wrote it the same weekend as I wrote eatlunch.at’s core code. It was pretty crummy at first – had some serious issues with special chars, and just ignored quoted tags entirely, among other problems. But the core was there – the schema and a basic API.

Luckily, i’d been practicing with generalized module development through work. I owe Mike Benoit of phpGACL thanks for helping teach me generalized module style in PHP.

phpGACL is a generalized access control lists module that fits into PHP-MySQL apps. It’s an excellent module for anyone to start with. It’s pretty well separated and very generalized. I’d recommend looking at both that and Freetag, because each does things well in a different way. (I get nerdy when I talk about this stuff, so feel free to let me know if I go too far.)

OK, so was implementing it in Upcoming the next test case after eatlunch.at?

Yes, when Andy asked me if I’d like to help with Upcoming, I was chomping at the bit to implement Freetag and see how well it worked. I implemented the core Freetag API in Upcoming in about an hour and a half.

I had event tagging, listing of tags, and tag clouds all done within that timespan.

It made me really implement the trickier things about writing a tagging system, because Andy’s got such a big user base, I can’t get away with being lazy about certain bugs.

Specifically what did you have to nail down?

I really ended up polishing it up to support quoted tags, better ordering and limits on each API function, and normalization. I also had to rewrite the core to separate raw tags and normalized tags, because Andy wanted it to work like Flickr. But that wasn’t too hard once I understood what it meant.

When developing a generalized API, it’s important to provide as many parameters as possible to your core calls – such as offsets, limits, sort order, and sort direction.

So a limit on each API function in that sense means what exactly?

Such as, show me only 5 tags at once, and start 10 tags down in the list. In that case, 5 is the limit, and 10 is the offset.
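As a minimal sketch of the limit and offset idea Gordon describes, assuming a plain in-memory list of tags: the function name `list_tags` and its parameters are hypothetical and are not taken from Freetag's actual API.

```python
def list_tags(tags, offset=0, limit=None, descending=False):
    """Return a window of tags after sorting them alphabetically.

    offset: how many tags to skip from the start of the sorted list.
    limit:  maximum number of tags to return (None means no limit).
    """
    ordered = sorted(tags, reverse=descending)
    end = None if limit is None else offset + limit
    return ordered[offset:end]


# Invented sample data: twenty tags named tag00 .. tag19.
all_tags = [f"tag{i:02d}" for i in range(20)]

# "Show me only 5 tags at once, and start 10 tags down in the list."
page = list_tags(all_tags, offset=10, limit=5)
print(page)
```

Exposing offset, limit, and sort direction on every listing call lets the calling site build paging without modifying the module itself.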

I understand normalization in a database context but what does it mean when you talk about normalized tags?

It’s a tricky topic – if you look at flickr and upcoming, here’s what we do when someone tags something as “John’s First Movie!” We take that, and normalize it by removing any non-allowed characters, then we lowercase it. Then we store that as an independent tag in Upcoming.

I’m not sure how Flickr does theirs, but in each case, if you’re not the creator of that tag, you’ll see “johnsfirstmovie”. If you’re the actual creator, theoretically you wanted it to be “John’s First Movie,” at least so you can find it again later. So we keep that as a raw tag.
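The raw versus normalized distinction can be sketched roughly as follows. This assumes that "non-allowed characters" means anything other than ASCII letters and digits; the function names and the record layout are hypothetical and do not reflect Freetag's real schema.

```python
import re


def normalize_tag(raw_tag):
    """Strip non-allowed characters, then lowercase (the order described above)."""
    # Assumption: only ASCII letters and digits are allowed.
    stripped = re.sub(r"[^A-Za-z0-9]", "", raw_tag)
    return stripped.lower()


def make_tag_record(user, raw_tag):
    """Keep both forms: the raw tag for its creator, the normalized one for everyone."""
    return {"user": user, "raw": raw_tag, "normalized": normalize_tag(raw_tag)}


def display_tag(record, viewer):
    """The creator sees the tag as typed; other users see the normalized form."""
    return record["raw"] if viewer == record["user"] else record["normalized"]


record = make_tag_record("john", "John’s First Movie!")
print(display_tag(record, "john"))
print(display_tag(record, "someone_else"))
```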

Unfortunately, FreeTag doesn’t go completely normalized between raw and normalized tags, for performance reasons. So it’s not perfectly normalized, but it’s close.

I adjust most of the API functions to handle that so you don’t get duplicates, but that’s a bit technical, you probably don’t need to worry about that.

Sadly, Delicious doesn’t do that, so I have tags there called “foo and bar”

One of my recent Freetag releases implemented a feature where you can pass in all of your configuration parameters to the constructor of the class. That means you don’t have to go in and edit config files each time you upgrade.

One of the cool things that lets you do is keep around your custom valid characters pattern, so you can pick your normalization scheme for yourself.

That lets you keep dashes, underscores, spaces, or even high ascii (for internationalized sites) in the normalized format, if you want it.
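One way to picture configuration passed to the constructor, with a customizable valid-characters pattern, is the sketch below. The class name `TagNormalizer` and the option names `valid_chars` and `lowercase` are assumptions made for illustration; Freetag itself is a PHP class and its real option names may differ.

```python
import re


class TagNormalizer:
    # Hypothetical defaults; callers override them per instance
    # instead of editing a config file.
    DEFAULTS = {"valid_chars": "a-zA-Z0-9", "lowercase": True}

    def __init__(self, **options):
        unknown = set(options) - set(self.DEFAULTS)
        if unknown:
            raise ValueError(f"unknown options: {sorted(unknown)}")
        self.config = {**self.DEFAULTS, **options}
        # Anything outside the configured character class is removed.
        self._invalid = re.compile("[^" + self.config["valid_chars"] + "]")

    def normalize(self, raw_tag):
        cleaned = self._invalid.sub("", raw_tag)
        return cleaned.lower() if self.config["lowercase"] else cleaned


default = TagNormalizer()
permissive = TagNormalizer(valid_chars="a-zA-Z0-9_ -")  # keep underscores, spaces, dashes

print(default.normalize("Hello-World_2"))
print(permissive.normalize("Hello-World 2!"))
```

Because the overrides live in the calling code, upgrading the module does not wipe them out.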

I wonder if the web helps force you to plan ahead that way, as it is such a moving target of an environment. It’s almost never a good idea to nail things down too literally.

It’s one of the biggest challenges of developing a generalized module like Freetag. You really need to think ahead and make sure that it’s as generic as possible, so that people don’t have to hack into it themselves and potentially lose their modifications every time they want to upgrade.

It’s all so meta-

Yeah, it’s definitely pretty meta and kinda hard. I have a newfound respect for open source software maintainers.

Has the Upcoming user base given any feedback to you or Andy?

Yes, they actually ended up filing a bug about the tag normalization on the wiki. I ended up explaining it, and they moved it to its own page.

Meaning they thought the feature was a bug?

Yes, that’s what happened. I know that a lot of people really liked the contributions I made to Upcoming, just based upon the press when we released.

So that is a bit of intelligence into what people expect and what confuses them (I’m thinking like a UI/IA guy now).

Hehe, yeah, it confuses people when their perspective doesn’t match that of others. But I think you’ll see that more and more on the web, especially as sites get more complex.

Yeah, for sure. User-experience is a series of tradeoffs. It’s easy to stand off to one side and say it should be optimized for users just like oneself.

The other major things I’ve worked on with Upcoming have been the REST-like API, and the invite feature.

REST-like, does that mean not 100% RESTful?

Hah, I’m specifically using that word, because I know guys who bring up all the time that our API isn’t fully RESTian. AFAIK, there are very few fully RESTful web applications out there that are popular.

Everyone makes tradeoffs – like what happened with Backpack and their $_GET and google web accel fiasco.

Yeah, fundamentalism is never pretty.

I made sure to use $_POST instead on the state-changing calls, which turned out to be the right move. However, I didn’t design with the verb/noun aspect of REST, so I hear that all the time.

People are always mailing in, who don’t understand POST. It’s hard, because everyone understands how to construct a url and make a GET request.

So as far as making an easy platform for beginners to write apps upon, GET is probably the way to go.

In the beginning, it was written, that the HTTP should have four verbs, and Tim Berners-Lee saw that it was good.

Yes, but not even cURL implements DELETE. That’s why I don’t fix that bug.

Yeah, I think I’d be wary of using DELETE outside of a totally secure web app environment, and even then I’d have second thoughts.

well, I overload POST to DELETE for me, but you’ve got to authenticate, etc. But it’s a tricky subject, and I figure by saying REST-like instead of RESTful, I kind of avoid it.

REST-esque

That’s a good one.

It is interesting that you need to think about these things when you’re developing for such a wide potential base.

Yeah, it’s a lot more challenging, because I really want to do things the right way. That’s why i’m lucky to get emails from people smarter than me, telling me how to do things better.

Ok, so have there been any other (significant) implementations yet? I imagine that Upcoming really promoted the hell out of FreeTag, relatively speaking.

A few pretty cool ones – Blogskins implemented it over on their site really quickly too.

I’ve gotten some emails from people planning on using it, and when those go public I’ll be sure to announce it on the mailing list.

It could really speed up adoption of tagging.

OK, let’s take one step back and let me ask you where you think all this tagging is leading us, with the cross-platform tagging idea or maybe other things (that i can’t really imagine, yet) that might be built on top of a heavily tagged web.

Well, I think we’ll start to see tagging systems interoperate once the first person gets out the gate in implementing a tag communication standard. Maybe that will be me, I’m not sure.

But once that happens, I think we’ll see convergence on a wider scale into a really interesting set of tags.

What will that enable beyond the obvious ability to tag more than one kind of thing with the same gesture?

Really freakin big tag clouds.

I’m being a little facetious, but that is actually where you might see things go.

If you’ve ever seen Flittr, it kind of consolidates tagging systems in a one-off way, taking one tag and finding samples in different systems. It’s just kind of slow, unfortunately.

I’ll check it out – sounds interesting at least as a proof of concept.

I personally don’t have time to do this right now, but it would be awesome to have a tag thunderstorm, where you can browse a global tag cloud aggregated from many sites, and then dig down into individual ones.
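A rough sketch of the aggregation idea, assuming each site can already report its tags as a simple tag-to-count mapping. The site names and counts are invented, and nothing here talks to a real service.

```python
from collections import Counter


def merge_tag_counts(site_counts):
    """Combine per-site tag counts into one global cloud."""
    total = Counter()
    for counts in site_counts.values():
        total.update(counts)
    return total


def drill_down(site_counts, tag):
    """Show how a single tag breaks down across the individual sites."""
    return {
        site: counts[tag]
        for site, counts in site_counts.items()
        if counts.get(tag, 0) > 0
    }


# Invented sample data for two hypothetical sites.
sites = {
    "site_a": {"puppies": 5, "austin": 2},
    "site_b": {"puppies": 3, "sxsw": 4},
}

global_cloud = merge_tag_counts(sites)
print(global_cloud.most_common())
print(drill_down(sites, "puppies"))
```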

That does sound pretty cool! But don’t we already have problems with tag clouds (scaling, imposing norms on people vs. harnessing self-interest…)?

I don’t really mind tag clouds that much. In my API, the function that generates one is called silly_list.

Well, they are sort of a stab at the kinds of interfaces we’ve been waiting for for 20 years or so, with an almost 3-D sense of space, relative importance, closeness, etc.

Yeah, totally. I think sometimes it’s just popular to be contrarian.

I don’t think we’ll see the death of hierarchy anytime soon.

You just have to look at how hard it is sometimes to dig data out of niche wikis.

When there aren’t that many people tagging a set of stuff, it’s not really that useful.

Do you think folder-like hierarchies and free-tagging complement each other well?

Absolutely. Both are useful – in some ways, it’s kind of the opposition between Google and Yahoo.

I think tag systems are just the collapsed leaves of individual categorization trees, right? That’s totally my nutshell view of what’s going on.

Sure, in a sense, and they do overlapping well without a lot of either duplication or aliasing.

You’re basically flattening then merging personal hierarchies.

Well this is a lot for me to chew on. Thanks for taking the time out to talk to me.

Thanks for asking me to talk about it!

My pleasure, and we can thank Andy for suggesting it too. I’ll be keeping an eye on your stuff, I’m sure.

Sounds great. It was a lot of fun talking about it, and I’ll look forward to seeing what comes from it!

…and, scene.

Gordon, I apologize for taking so long on this. In the end I figured the conversation works better than any sort of “article” I could have turned it into.

March 30, 2006

Social information architecture, sorting, and tagging

Here are my raw notes from Rashmi Sinha’s talk at the IA Summit, “Sorting, Tagging and Social Information Architecture or The Missing Chapter in the Polar Bear Book”:

Who’s sick of hearing about tagging?

[Tagging provides a] focus on the individual….

Have you ever heard of “The man who could not sort”? The discussion of the Chandler card-sorting exercise reminded me of this. A man was asked to sort email into three categories. He couldn’t do it, saying “This is a waste of time.” It didn’t represent him. The test was torturing him. He finally gave up.

I noticed delicious around that time… something about categorization can be really hard, especially social categorization.

Cognitively speaking, analysis paralysis, balancing your scheme. Category boundaries change, labels become obsolete, systems hide items – mistakes are costly.

The idea of “the one right category,” people really struggle with it. It’s almost an existential question

How tagging works

It maps to the cognitive process, a reduced load. It’s fun. There is self-feedback, social feedback, no balancing of scheme.

Findability is still the missing bit. Here’s where IA comes in. How do you add sorting, exploration, discovery?

Sorting                          Tagging
higher cognitive cost            lower cognitive cost
richer data                      less rich data
harder to aggregate socially     easy to aggregate socially

How to reduce cognitive cost of categorization

Better interaction design: don’t hide item as soon as you add it to category, flat schemes [q: flatter schemes?], non-exclusive categories.

Categorization is going to make a comeback. These are all fashions. (applause)

Reference to Don’t take my folders away! Organizing personal information to get things done, the feeling of satisfaction that comes from filing things in folders.

Typical IA approach: card sorting… etc. Try it with tags

Brainstorm tags for Apple:
mac
osx
ipod
software
itunes
music
history
technology
windows
macintosh
hardware

Calculate co-occurrence. Do hierarchical cluster analysis. You should get similar results if same domain (to heuristics?).
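[A minimal sketch of the co-occurrence step, assuming each participant's brainstorm is recorded as a list of tags. The sample brainstorms are invented, reusing tag names from the list above. The hierarchical clustering step is not shown; the pair counts would be its input.]

```python
from collections import Counter
from itertools import combinations


def cooccurrence(tag_sets):
    """Count how often each pair of tags appears together in the same set."""
    pairs = Counter()
    for tags in tag_sets:
        # Sort so each pair has one canonical order, e.g. ("mac", "osx").
        for a, b in combinations(sorted(set(tags)), 2):
            pairs[(a, b)] += 1
    return pairs


# Invented sample data: four hypothetical brainstorms.
brainstorms = [
    ["mac", "osx", "software"],
    ["ipod", "itunes", "music"],
    ["mac", "osx", "hardware"],
    ["ipod", "music"],
]

pair_counts = cooccurrence(brainstorms)
print(pair_counts.most_common(3))
```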

Hybrid approach: TagSorting

  1. Gather terms from del.icio.us
  2. Ask users to do cardsorting

Rashmi asks if anyone has done any other variation

Audience comment:
We do sorts and then ask them to tag the clusters (“how would you refer to this?”)

A lot of product and brand research involves understanding customer categorizations…

Understanding how people think… Reference to Gerald Zaltman, “How Customers Think” (2003)

Product Positioning

Consensus Building Techniques – KJ Method

  • Popular in Japan
  • Allows groups to quickly build consensus (back and forth between individual and group)

MindMapping for Stakeholder Analysis

  • map concepts across multiple stakeholders
  • Trochim’s method
    • ask stakeholders to sort statements related to issue
    • rate importance of each statement
    • create groups; through cluster analysis
    • depict importance of each group

Why tagging is sometimes appropriate…

The Web has become social

  • Findings from Pew Internet Report
  • internet & email play important role in maintaining dispersed social networks
  • people use internet to maintain contact with sizable social networks
  • people use internet to seek out others in their networks when they need help
  • concept of networked individualism (connections are indiv – to – indiv)

People hang out on the web just for fun – 40 million a day (US)

of men 34%
of women 26%

age
of 18-29 37%
of 30-49 31%
of 50-64 25%
of 65 20%

Tags make the web a shared experience

  • tags give you community
  • other social characteristics
  • social play
  • stalking
  • imitation
  • gossip
  • eavesdropping – [my addition --xian]

Concept of shared browsing, a way of socializing without having to deal with email list strife

Thomas VW: white hat and black hat stalking privacy issues

Why tagging, why now?

Pace layering: No time for consensus to emerge. Tags allow you to respond to fast-changing things. Categorization is about consensus.

Problems?

No focus on early adopters. Most IAs focus on non-tech-savvy users. Should balance that by studying early adopters.

Designers like control, but design of social system means letting go

You don’t need Jonathan Ive for MySpace, craigslist, or TagWorld. This is a completely different type of design (social systems)

TagWorld is taking over from MySpace

Menus and Tag Clouds

The tag cloud-menu is not the future…

Menus

  • structured
  • stable over time
  • comprehensive

Tagclouds

  • unstructured
  • relatively unstable
  • not comprehensive
  • let current stuff bubble to top

To respond to hurricane Katrina, most companies added link to the home page, but Flickr and Delicious didn’t need to do anything different.

Comment from audience: Cloud shows relative importance, something easier to assess than absolute importance

Q: Why did MS adaptive menus fail?
A: Because it’s not just you personally – it’s the social stuff

Design of Social Systems

Serve the individual’s selfish goal.

Create a symbiotic relation (avoid mob, tragedy of commons). Think about when should the individual feel alone, when part of group. How to encourage social sharing. How much mimicry to encourage. How to accommodate local groups. How to encourage expression of alternate viewpoints. When to introduce social networks. How to encourage wise crowds. How to augment navigation with tags.

Things to Try

  • Create an account on MySpace
  • Read Emergence, Wisdom of Crowds
  • Play a Multiplayer Online Game (World of Warcraft, Second Life)
  • Play with an API (Google maps API for example)
  • Think about what is fun on the web (not just tasks, work)

Q: what about bad-faith or ill-conceived early tagging, setting the wrong tone?
A: [I missed this]. Reference to Eric von Hippel at MIT, research on lead users

Q: I don’t use tags/tag clouds to find, I search. At Yahoo we use lots of tags

Q: Re tag drift, meanings change
A: Tom Coates wrote article on how the meaning of “Ajax” tag changed over time “Tags and Cultural Change”

Q: In spirit of fun and play, other good social applications in the local space (beyond DodgeBall)?
Comment: An app called Socialight out of ITP at NYU, allows you to add stories to buildings, “this is a great coffeeshop,” “there were three murders here in 1932 and everybody says this house is haunted.”


March 16, 2006

Linking Up Research Papers Using Tags

Back in my first post to this blog, I said that over here at Nature we’re interested in the question of "…how far tagging can take us in tackling the (formidable) information organisation needs of modern science." Today we’re starting on a cool (I think) new experiment that might help provide some early answers.

Many of you are no doubt familiar with Matt Biddulph’s wonderful mock-up of the BBC Radio 3 website as it might work with embedded del.icio.us functionality. (See in particular Matt’s Flash movie here.) Inspired by this, we’ve just released some code that adds the same type of functionality (but this time for real) to ‘institutional repositories’ (IRs) — websites that scientists and other academics use to share their work with each other.

One general problem with IRs is that, notwithstanding services like Google Scholar, a lot of their content isn’t very easy to find, and it certainly isn’t easy to browse between related items in different repositories. Our new code aims to improve things by allowing IR users to tag articles and see links to related content, all from within the IR web page itself. Behind the scenes, the software communicates with del.icio.us and/or Connotea (Nature’s own social bookmarking service for scientists). Since Connotea is open source, it will also work with any instance of Connotea Code.

The good folks at the University of Southampton’s Electronics and Computer Science Department have now put this code on their institutional repository, creating our first real-world installation (yeah! :) ). Here is an example of a tagged paper. You need to enter Connotea user details for it to work (because calls to Connotea’s web API require you to be a known user). For those who can’t be bothered with that, here’s a screenshot of the sort of thing you see just below the article abstract:

The recommendations (which are generated based mainly on tag co-occurrence) already seem OK to me, but they should get better as more links and tags get entered into the system.
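As a rough illustration of what recommendations "based mainly on tag co-occurrence" could look like, here is a minimal Python sketch. It is not the code running on the EPrints installation or in Connotea; the sample articles, the function names, and the scoring weights are all assumptions made up for this example. It scores other articles by the tags they share with the current one, plus a smaller bonus for tags that tend to appear alongside the current article's tags elsewhere.

```python
from collections import Counter, defaultdict
from itertools import combinations

# Hypothetical sample data: article id -> set of tags users attached to it.
ARTICLE_TAGS = {
    "paper-a": {"folksonomy", "tagging", "metadata"},
    "paper-b": {"tagging", "metadata", "repositories"},
    "paper-c": {"repositories", "eprints"},
    "paper-d": {"genomics"},
}


def cooccurrence_counts(article_tags):
    """Count how often each pair of tags appears on the same article."""
    counts = defaultdict(Counter)
    for tags in article_tags.values():
        for a, b in combinations(sorted(tags), 2):
            counts[a][b] += 1
            counts[b][a] += 1
    return counts


def recommend(article_id, article_tags, limit=3):
    """Rank other articles by shared tags and by tags that co-occur with ours.

    The weighting (2 points per shared tag, 1 per co-occurrence) is an
    arbitrary choice for illustration only.
    """
    counts = cooccurrence_counts(article_tags)
    own = article_tags[article_id]
    scores = {}
    for other_id, other in article_tags.items():
        if other_id == article_id:
            continue
        score = 2 * len(own & other)
        score += sum(counts[t][u] for t in own for u in other - own)
        if score > 0:
            scores[other_id] = score
    # Highest score first; ties broken alphabetically for stable output.
    return sorted(scores, key=lambda k: (-scores[k], k))[:limit]


print(recommend("paper-a", ARTICLE_TAGS))
```

With this toy data, "paper-c" is suggested for "paper-a" even though the two share no tags, because "repositories" co-occurs with "tagging" and "metadata" on "paper-b". That is also why such recommendations should improve as more links and tags are entered: the co-occurrence counts get denser.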

There’s lots of different IR software out there, and our code currently only works with EPrints, which we chose because it’s very popular, is written in a language (Perl) that we’re familiar with, and has a friendly development team just down the road from us. If you’re the administrator of an EPrints repository then you can get instructions and code from here. I’m told that it’s a doddle to install.

More information is available in this blog entry by Ben Lund, who runs Connotea for Nature. We’re really grateful to the UK Joint Information Systems Committee for funding this work and would be very interested to hear about people’s experiences, either in comments posted here or by email (t DOT hannay AT nature DOT com).

March 12, 2006

Tagging 2.0 at South By

Panelists:

I’m going to try to group the comments into subject areas. Let’s see how well that works.

Tags going mainstream

Don Turnbull:

Who’d have thought we’d be talking about metadata on a beautiful Sunday morning in Austin?

Is tagging the key element of Web 2.0? (Probably not.) The ETech definition: Web 1.0 was the read-only web. Web 2.0 is the read-write web.

Thomas Vander Wal:

I coined the word folksonomy… and the correct definition wasn’t given in the beyond folksonomy panel.

People used to tag on the command line. Web 1.0 tagging didn’t work. Tools like Bitsy. Cory’s “metacrap” article. Web 2.0: delicious and flickr, actually useful for finding and re-finding information.

More than 40 sites are doing social bookmarking.

60 to 70 sites using tagging as their main way to bring people in. (7 travel sites, for example, using tagging as their appeal.) More than 200 services have included tagging (Amazon).

What are tags useful for?

Don Turnbull:

Are these systems useful beyond a few types of tasks or categories of information?

  • Re-finding information
  • Creating personal metadata
  • The new command line (quicker than drag/drop, sort, click)
  • Gateway to the next PIM?
  • Tags as verbs (“buy,” “sell”), expanding the vocabulary (ratings: “*,” “**,” “***” etc.)
  • People-centric view of data, vs. system-centric.
  • Good for keeping track of things you already know about, but what about discovery?
  • It’s more interesting to find a like mind than just a resource

Adina Levin:

Tagging is social, helpful to the individual and increasingly valuable to the group.

Tag games (Flickr came from the game design world), example of red and green game leading to joining the Japanese Maple group, aircraft spotters.

Jon Udell’s InfoWorld Explorer tool crawls delicious and aggregates InfoWorld articles by genre, author, date, tags, title

Why is Tagging better than Categorization?

Rashmi Sinha:
I’m going to be a cheerleader for tagging

When categorizing, we choose between multiple concepts. Tagging is easier. Joshua Schachter in his infinite wisdom figured out you can just write down what comes to mind. Note all concepts instead of choosing one and invoking a hierarchy.

Better than any other social system on the web, tagging approximates the wisdom of crowds:

  • cognitive diversity
  • independence
  • decentralization
  • easy aggregation

The moment of tagging is you and that object alone (but – I interject in my mind – what about delicious’s “recommendations”? – isn’t that influence from the crowd?).

Social formations supported by tags

  • ad hoc groups
  • lots of weak social ties
  • conceptually mediated ties

Flaws, Issues, Usability

Don Turnbull:
Are these systems usable beyond alpha geeks?

  • Interface improvements: Good import? Teach vocabulary? Make re-finding information easier.
  • Tag clouds probably not the answer
  • Spamming, gaming, TagFraud
  • Tagging is implicit (good and bad)
  • Not all resources are as identifiable (microcontent?)… granular, web pages; items, commercial products
  • Tags as identity (how so? i-tags?)

Vander Wal:

  • “Re-findability sucks… We need to fix the re-findability problem.”
  • Looks messy to others.
  • No identity in Flickr. (Example: can’t see the 40 things Don has tagged with “orange”)
  • Folksonomy triad (one person), dual folksonomy triad (including community) – really need slide to illustrate
  • Context often missing, it gets messy, we have silos

Prentiss Riddle:
Six dirty secrets of tagging

  1. It’s the content, stupid
  2. Ordinary people don’t get tags or tag clouds (a text box prompt gets a sentence response or maybe a Google search)
  3. It’s the UX, stupid – flickr guides you
  4. Tags don’t play well with others (interop)

    • Character sets
    • Delimiter wars (commas, spaces, etc.)
    • Synonyms (singular vs. plural, homonyms)
    • Aggregation, portability
  5. Rich functionality requires rich metadata (where’s my flying car? I wouldn’t want to use them for medical applications, managing money, hunting terrorists)
  6. Nobody wants “real tags” (simple keyword metadata, no control, no hierarchy, no syntax or semantics, minimal cognitive effort by the user). What people really want is “tagginess” (Stephen Colbert image)… delicious for:username, Shadows @group, geotagging, consensus tagging (sxsw2006, chosendarkness), hierarchical tagging (history.us.wwii, history.wwii.us)… it’s the opposite of tagging
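To make the "delimiter wars" point concrete, here is a small Python sketch of one way a service might accept both comma-separated and space-separated input. The rule it uses (a comma anywhere means commas are the delimiter, otherwise whitespace is) is an assumption chosen for illustration, not how delicious, Flickr, or any other service mentioned by the panel actually parses tags.

```python
def parse_tags(raw):
    """Split a raw tag string into a de-duplicated list of lowercase tags.

    Assumed convention: if the input contains a comma, commas delimit tags
    and a tag may contain spaces; otherwise whitespace delimits tags.
    """
    if "," in raw:
        parts = raw.split(",")
    else:
        parts = raw.split()
    tags = []
    for part in parts:
        # Collapse internal runs of whitespace and normalize case.
        tag = " ".join(part.split()).lower()
        if tag and tag not in tags:
            tags.append(tag)
    return tags


print(parse_tags("sxsw2006, Japanese Maple, tagging"))
print(parse_tags("sxsw2006 tagging SXSW2006"))
```

The interop problem shows up as soon as two services pick different rules: under this convention "Japanese Maple" survives as one tag when commas are used, but becomes two tags when the same text is pasted into a space-delimited box.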

Faceted tagging: Mefeedia (by place, by content, etc.), tagginess.com is available for sale.

Adina Levin:
Tags are messy (blog, blogging, blogs) in tag clouds, compound words

Tag refactoring: consolidate synonyms, fix and standardize spelling, add hierarchy

but…
Don’t make me think, loss of tag snark, loses “bottom-up” purity, a hybrid of top-down and the group mind
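The tag refactoring idea above (consolidate synonyms, standardize spelling) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the synonym table is hypothetical and hand-curated, and the function names are invented here. It also shows the trade-off Levin raises, since someone has to maintain the table from the top down.

```python
# Hypothetical synonym table a site maintainer might curate by hand.
CANONICAL = {
    "blogs": "blog",
    "blogging": "blog",
    "weblog": "blog",
}


def refactor_tag(tag, canonical=CANONICAL):
    """Map a raw tag to its canonical form: trim, lowercase, then look it up."""
    cleaned = tag.strip().lower()
    return canonical.get(cleaned, cleaned)


def refactor_tags(tags, canonical=CANONICAL):
    """Refactor a list of tags, keeping first-seen order and dropping duplicates."""
    result = []
    for tag in tags:
        new = refactor_tag(tag, canonical)
        if new and new not in result:
            result.append(new)
    return result


print(refactor_tags(["Blog", "blogging", "blogs", "sxsw"]))
```

Tags that are not in the table pass through untouched apart from trimming and lowercasing, so the bottom-up vocabulary is preserved wherever nobody has intervened.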

Rashmi Sinha:
Tips for tag designers

  1. How are you serving the individual motive?
  2. Does the individual understand and want to fulfill that goal?
  3. What is the relationship between social and personal?
  4. Is it too easy to mimic the tags of others?
  5. Is finding all about the most popular, most tagged?
  6. Enable discovery, exploration, finding new things
  7. Don’t force users to do things differently than what comes naturally
  8. Solve problems by ensuring good findability

Questions

Q: How to deal with Tag spam, tag fraud?

Thomas Vander Wal: blacklists, another reason why you need to see who tagged it and what object was tagged.

Q: How to work with synonyms and homonyms?

Prentiss: Clustering at Flickr works well because they have so much rich metadata available to mine.

Adina Levin: I like delicious’s suggestions

Rashmi Sinha: In input let the user do what they want. In the findability stage deal with the problem.

March 11, 2006

Notes on Beyond Folksonomies at SXSW

I posted my notes on the Beyond Folksonomies panel at SXSW at my “The Power of Many” blog.

Update: Scot Hacker took notes on this panel too.

January 17, 2006

Update: Tim O’Reilly on “Search Engines as Leeches”

Tim O’Reilly posts on the O’Reilly Radar that Jakob Nielsen’s concern about search engines “strikes a chord.”

It’s easy to see why folks with paid content businesses would be concerned about giving away too much information via search engines, but it’s really interesting to see the same concerns springing up around free content sites. Google and Yahoo! have done a good job of providing ad revenue back to small content providers with AdSense and Overture, but their model is also a threat to many prevalent kinds of advertising. And of course, the search engines get a huge amount of revenue from advertising on the index pages themselves. I tend to think that the search engines earn their keep, but I’ve got my ear to the ground, and Jakob makes a thoughtful case.

Since Tim authored the Web 2.0 piece I linked from my previous post, I thought I should note that his take on the Nielsen piece was more supportive.

January 16, 2006

Jakob Nielsen: Search Engines are Leeches!

Jakob Nielsen complains that search engines “are sucking out too much of the Web’s value, acting as leeches on companies that create the very source materials the search engines index.” He’s worried that search engines “build their business on other websites’ content,” and that “paid search confiscates too much of a website’s value.”

Reading between the lines, I think Nielsen’s complaint is not about search engines per se, but about “Web 2.0” and the evolving semantic web. If I’m right, his concern is also applicable to RSS (which he notes that he hasn’t researched yet) and tagging… we’re moving away from “site” and “page” as controlling metaphors, and focusing more on information, less on presentation. Nielsen’s been so focused on web page and site usability that he’s only just beginning to get the message.