Nearly ten months ago, at the suggestion of Andy Baio, I interviewed Gordon Luk (via IM) about FreeTag, an “Open Source Tagging / Folksonomy module for PHP/MySQL applications” he originally created for Upcoming and announced almost a year ago in his blog.
In the meantime I’ve continually intended to edit the chat transcript into a coherent article and post it here. Unfortunately, a strange thing called “life” has intruded. Then I ran into Andy in Austin at South By Southwest, and my embarrassment over sitting on this dialogue returned to the surface, kicking the to-do back to the top of my list.
I started thinking I should touch base with Gordon again and find out who else has adopted FreeTag lately, along with any other news or developments, but then I realized this was just another form of procrastination. What the web wants me to do is post what I’ve got, and then Gordon or anyone else can comment on it, or correct it, or update it, and so on.
So, without further ado, here is my interview with Gordon Luk:
xian: Can you tell me how you got the idea for freetag?
Gordon: Sure! It starts with a discussion of who I eat lunch with, actually. I am lucky enough to work with some really smart guys – among them, Andy Baio, Phil Fibiger, Greg Knauss, Christian Newton, and Jason Stuck.
We got to talking about tagging when the term folksonomy was coined.
I can’t remember exactly who had the idea, but we started discussing cross-site interactions between tags on different platforms.
In what sense?
The idea that you could be browsing puppies on flickr, and perhaps you could extract some of del.icio.us’s puppy-tagged links.
Was Technorati doing their pages yet that show items tagged by several different systems?
At that point, I don’t believe so. We got a few of our other friends involved, including the venerable Leonard Lin. Greg included Leonard Richardson on the email that he sent out that night by mistake, so we got some of his feedback too.
So when did it turn into a plan to actually do something?
Well, first it turned into a wiki.
Naturally…
I started off in the direction of creating a PHP class that would implement a standardized XML-RPC or REST communication layer. Greg was more of a proponent of the actual standard to be implemented by that layer.
At that point, we all got busy and it sat for a couple of months.
During another lunchtime conversation, I came up with the idea for eatlunch.at and made it that weekend.
I wanted to use it as a testbed so I could play with tagging, so instead of building it into the whole site, I made the tagging system generic.
One thing that interests me is the enabling or catalysing idea of not just pumping out yet another site or application but instead producing a plug-in that can be distributed across a whole class of projects.
It seems altruistic in the sense that it’s not yet another system trying to collect my contact info, but on the other hand, I’m surprised people don’t modularize like that more often.
Yeah, that’s absolutely very interesting – I wrote a post not too long ago about how I’m interested in the strange inversion of privacy preferences that we subject ourselves to on social services.
Especially public ones like del.icio.us.
We really wanted to enable cross-communication between sites, because it seemed like such a no-brainer once we started talking about it. Typically, when you’re dealing with hierarchies, every site dev has their own view of the world, and things don’t match too well. With freetagging (the term used back then), it doesn’t really matter, because the classification systems emerge from the utility of the application and data.
It’s interesting how tagging is emerging as a kind of meta-glue for the web (if it is – still not sure).
It’s interesting that tag clouds (and now del.icio.us’s recommended tags) are enforcing community standards for popular tags, because with a distributed system, you’d have that not only on a single site, but you could implement that across a wide range of sites.
There’s a tension there – still not clear where it’s going, but it’s fun to watch it emerge (or in your case, I suppose, help move it along). So, the wiki hosted the debate about how to implement the idea, or at what conceptual level to implement it?
Yes, it might actually still be around, too. It’s hard to say, because we all worked on it for about a week before getting too busy to do anything about it. It was mostly planning and RFC-style note-taking. It was a lot of design work, no coding involved.
Not even pseudocode?
Well, I guess it depends on your definition of that. I think there were some standard XML-RPC communication samples flying around, and there were also some API specs that I wrote up.
So did you just sit down and hack out the first version next?
I actually wrote it the same weekend as I wrote eatlunch.at’s core code. It was pretty crummy at first – had some serious issues with special chars, and just ignored quoted tags entirely, among other problems. But the core was there – the schema and a basic API.
Luckily, I’d been practicing with generalized module development through work. I owe Mike Benoit of phpGACL thanks for helping teach me generalized module style in PHP.
phpGACL is a generalized access control lists module that fits into PHP-MySQL apps. It’s an excellent module for anyone to start with. It’s pretty well separated and very generalized. I’d recommend looking at both that and Freetag, because each does things well in a different way. (I get nerdy when I talk about this stuff, so feel free to let me know if I go too far.)
OK, so was implementing it in Upcoming the next test case after eatlunch.at?
Yes, when Andy asked me if I’d like to help with Upcoming, I was chomping at the bit to implement Freetag and see how well it worked. I implemented the core Freetag API in Upcoming in about an hour and a half.
I had event tagging, listing of tags, and tag clouds all done within that timespan.
It made me really implement the trickier things about writing a tagging system, because Andy’s got such a big user base that I can’t get away with being lazy about certain bugs.
Specifically what did you have to nail down?
I really ended up polishing it up to support quoted tags, better ordering and limits on each API function, and normalization. I also had to rewrite the core to separate raw tags and normalized tags, because Andy wanted it to work like Flickr. But that wasn’t too hard once I understood what it meant.
When developing a generalized API, it’s important to provide as many parameters as possible to your core calls – such as offsets, limits, sort order, and sort direction.
So a limit on each API function in that sense means what exactly?
Such as, show me only 5 tags at once, and start 10 tags down in the list. In that case, 5 is the limit, and 10 is the offset.
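As a rough illustration of the limit and offset idea, here is a minimal Python sketch. It is not FreeTag's actual PHP API: the function name `list_tags`, its parameters, and the sample data are all hypothetical.

```python
def list_tags(tag_counts, offset=0, limit=None, sort_by="name", descending=False):
    """Return (tag, count) pairs, sorted, then windowed by offset and limit."""
    # Sort by tag name (index 0) or by usage count (index 1).
    index = 0 if sort_by == "name" else 1
    ordered = sorted(tag_counts.items(), key=lambda item: item[index], reverse=descending)
    # Skip `offset` entries, then keep at most `limit` entries.
    end = None if limit is None else offset + limit
    return ordered[offset:end]


# Twenty made-up tags, tag00 through tag19, each used once.
tag_counts = {f"tag{i:02d}": 1 for i in range(20)}

# "Show me only 5 tags at once, and start 10 tags down in the list."
page = list_tags(tag_counts, offset=10, limit=5)
print([name for name, _ in page])
```

Exposing the sort key and direction next to the window parameters lets each calling page choose its own view without touching the module.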
I understand normalization in a database context but what does it mean when you talk about normalized tags?
It’s a tricky topic – if you look at flickr and upcoming, here’s what we do when someone tags something as “John’s First Movie!” We take that, and normalize it by removing any non-allowed characters, then we lowercase it. Then we store that as an independent tag in Upcoming.
I’m not sure how Flickr does theirs, but in each case, if you’re not the creator of that tag, you’ll see “johnsfirstmovie”. If you’re the actual creator, theoretically you wanted it to be “John’s First Movie,” at least so you can find it again later. So we keep that as a raw tag.
Unfortunately, FreeTag doesn’t go completely normalized between raw and normalized tags, for performance reasons. So it’s not perfectly normalized, but it’s close.
I adjust most of the API functions to handle that so you don’t get duplicates, but that’s a bit technical, you probably don’t need to worry about that.
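The raw versus normalized distinction can be sketched in a few lines of Python. This is only an illustration under assumptions: the allowed character set (ASCII letters and digits), the field names, and the helper functions are hypothetical and are not taken from FreeTag or Upcoming.

```python
import re


def normalize_tag(raw_tag):
    # Drop everything except ASCII letters and digits, then lowercase.
    return re.sub(r"[^A-Za-z0-9]", "", raw_tag).lower()


def store_tag(store, user, raw_tag):
    # Keep the raw form for its creator and the normalized form for everyone else.
    store.append({"user": user, "raw_tag": raw_tag, "tag": normalize_tag(raw_tag)})


def display_tag(row, viewer):
    # The creator sees what they typed; other viewers see the normalized tag.
    return row["raw_tag"] if viewer == row["user"] else row["tag"]


store = []
store_tag(store, "john", "John's First Movie!")
print(display_tag(store[0], "john"))
print(display_tag(store[0], "xian"))
```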
Sadly, Delicious doesn’t do that, so I have tags there called “foo and bar”
One of my recent Freetag releases implemented a feature where you can pass in all of your configuration parameters to the constructor of the class. That means you don’t have to go in and edit config files each time you upgrade.
One of the cool things that lets you do is keep around your custom valid characters pattern, so you can pick your normalization scheme for yourself.
That lets you keep dashes, underscores, spaces, or even high ascii (for internationalized sites) in the normalized format, if you want it.
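A hedged sketch of the constructor-configuration idea, in Python rather than FreeTag's PHP. The class name `TagNormalizer` and the option names `allowed_chars` and `lowercase` are invented for illustration and are not FreeTag's real settings.

```python
import re


class TagNormalizer:
    # Hypothetical defaults; callers override them instead of editing a config file.
    DEFAULTS = {"allowed_chars": "A-Za-z0-9", "lowercase": True}

    def __init__(self, options=None):
        # Caller-supplied options win over the defaults.
        self.options = {**self.DEFAULTS, **(options or {})}
        # Build a pattern matching every character that is NOT allowed.
        self._disallowed = re.compile("[^" + self.options["allowed_chars"] + "]")

    def normalize(self, raw_tag):
        cleaned = self._disallowed.sub("", raw_tag)
        return cleaned.lower() if self.options["lowercase"] else cleaned


strict = TagNormalizer()
# Keep spaces, underscores, and dashes in the normalized form.
keeps_separators = TagNormalizer({"allowed_chars": "A-Za-z0-9 _-"})

print(strict.normalize("Bay-Area Food!"))
print(keeps_separators.normalize("Bay-Area Food!"))
```

Because the custom pattern lives in the calling code, upgrading the module would not overwrite it.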
I wonder if the web helps force you to plan ahead that way, as it is such a moving target of an environment. It’s almost never a good idea to nail things down too literally.
It’s one of the biggest challenges of developing a generalized module like Freetag. You really need to think ahead and make sure that it’s as generic as possible, so that people don’t have to hack into it themselves and potentially lose their modifications every time they want to upgrade.
It’s all so meta-
Yeah, it’s definitely pretty meta and kinda hard. I have a newfound respect for open source software maintainers.
Has the Upcoming user base given any feedback to you or Andy?
Yes, they actually ended up filing a bug about the tag normalization on the wiki. I ended up explaining it, and they moved it to its own page.
Meaning they thought the feature was a bug?
Yes, that’s what happened. I know that a lot of people really liked the contributions I made to Upcoming, just based upon the press when we released.
So that is a bit of intelligence into what people expect and what confuses them (I’m thinking like a UI/IA guy now).
Hehe, yeah, it confuses people when their perspective doesn’t match that of others. But I think you’ll see that more and more on the web, especially as sites get more complex.
Yeah, for sure. User-experience is a series of tradeoffs. It’s easy to stand off to one side and say it should be optimized for users just like oneself.
The other major things I’ve worked on with Upcoming have been the REST-like API, and the invite feature.
REST-like, does that mean not 100% RESTful?
Hah, I’m specifically using that word, because I know guys who bring up all the time that our API isn’t fully RESTian. AFAIK, there are very few fully RESTful web applications out there that are popular.
Everyone makes tradeoffs – like what happened with Backpack and their $_GET and google web accel fiasco.
Yeah, fundamentalism is never pretty.
I made sure to use $_POST instead on the state-changing calls, which turned out to be the right move. However, I didn’t design with the verb/noun aspect of REST, so I hear that all the time.
People who don’t understand POST are always mailing in. It’s hard, because everyone understands how to construct a url and make a GET request.
So as far as making an easy platform for beginners to write apps upon, GET is probably the way to go.
In the beginning, it was written, that the HTTP should have four verbs, and Tim Berners-Lee saw that it was good.
Yes, but not even cURL implements DELETE. That’s why I don’t fix that bug.
Yeah, I think I’d be wary of using DELETE outside of a totally secure web app environment, and even then I’d have second thoughts.
Well, I overload POST to DELETE for me, but you’ve got to authenticate, etc. But it’s a tricky subject, and I figure by saying REST-like instead of RESTful, I kind of avoid it.
REST-esque
That’s a good one.
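For readers wondering what overloading POST to stand in for DELETE can look like, here is a minimal Python sketch. The interview does not say how Upcoming's API does it; the `_method` parameter is one common convention and is an assumption here, as are the function names and status codes.

```python
SAFE_METHODS = {"GET", "HEAD"}


def effective_method(http_method, params):
    """Resolve the method a request really means (hypothetical `_method` convention)."""
    method = http_method.upper()
    # Only POST may be overloaded, so a plain GET link can never trigger a delete.
    if method == "POST":
        override = params.get("_method", "").upper()
        if override in {"DELETE", "PUT"}:
            return override
    return method


def handle(http_method, params, authenticated):
    """Return an HTTP status code for the request."""
    method = effective_method(http_method, params)
    # State-changing calls require authentication.
    if method not in SAFE_METHODS and not authenticated:
        return 401
    return 200


print(effective_method("POST", {"_method": "delete"}))
print(handle("POST", {"_method": "delete"}, authenticated=False))
```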
It is interesting that you need to think about these things when you’re developing for such a wide potential base.
Yeah, it’s a lot more challenging, because I really want to do things the right way. That’s why I’m lucky to get emails from people smarter than me, telling me how to do things better.
Ok, so have there been any other (significant) implementations yet? I imagine that Upcoming really promoted the hell out of FreeTag, relatively speaking.
A few pretty cool ones – Blogskins implemented it over on their site really quickly too.
I’ve gotten some emails from people planning on using it, and when those go public I’ll be sure to announce it on the mailing list.
It could really speed up adoption of tagging.
OK, let’s take one step back and let me ask you where you think all this tagging is leading us, with the cross-platform tagging idea or maybe other things (that I can’t really imagine, yet) that might be built on top of a heavily tagged web.
Well, I think we’ll start to see tagging systems interoperate once the first person gets out the gate in implementing a tag communication standard. Maybe that will be me, I’m not sure.
But once that happens, I think we’ll see convergence on a wider scale into a really interesting set of tags.
What will that enable beyond the obvious ability to tag more than one kind of thing with the same gesture?
Really freakin big tag clouds.
I’m being a little facetious, but that is actually where you might see things go.
If you’ve ever seen Flittr, it kind of consolidates tagging systems in a one-off way, taking one tag and finding samples in different systems. It’s just kind of slow, unfortunately.
I’ll check it out – sounds interesting at least as a proof of concept.
I personally don’t have time to do this right now, but it would be awesome to have a tag thunderstorm, where you can browse a global tag cloud aggregated from many sites, and then dig down into individual ones.
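A toy Python sketch of that idea: merge per-site tag counts into one global cloud, then dig back down into the individual sites for a given tag. The site names and counts are made up, and a real version would have to fetch the counts through each site's own API.

```python
from collections import Counter

# Made-up per-site tag counts standing in for data fetched from each service.
site_tags = {
    "photos": Counter({"puppies": 40, "austin": 5}),
    "links": Counter({"puppies": 12, "php": 30}),
}


def global_cloud(site_tags):
    """Sum tag counts across all sites."""
    total = Counter()
    for counts in site_tags.values():
        total.update(counts)
    return total


def drill_down(site_tags, tag):
    """Show how one tag's total breaks down by site."""
    return {site: counts[tag] for site, counts in site_tags.items() if tag in counts}


cloud = global_cloud(site_tags)
print(cloud.most_common(2))
print(drill_down(site_tags, "puppies"))
```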
That does sound pretty cool! But don’t we already have problems with tag clouds (scaling, imposing norms on people vs. harnessing self-interest…)?
I don’t really mind tag clouds that much. In my API, the function that generates one is called silly_list.
Well, they are sort of a stab at the kinds of interfaces we’ve been waiting for for 20 years or so, with an almost 3-D sense of space, relative importance, closeness, etc.
Yeah, totally. I think sometimes it’s just popular to be contrarian.
I don’t think we’ll see the death of hierarchy anytime soon.
You just have to look at how hard it is sometimes to dig data out of niche wikis.
When there aren’t that many people tagging a set of stuff, it’s not really that useful.
Do you think folder-like hierarchies and free-tagging complement each other well?
Absolutely. Both are useful – in some ways, it’s kind of the opposition between Google and Yahoo.
I think tag systems are just the collapsed leaves of individual categorization trees, right? That’s totally my nutshell view of what’s going on.
Sure, in a sense, and they do overlapping well without a lot of either duplication or aliasing.
You’re basically flattening then merging personal hierarchies.
Well this is a lot for me to chew on. Thanks for taking the time out to talk to me.
Thanks for asking me to talk about it!
My pleasure, and we can thank Andy for suggesting it too. I’ll be keeping an eye on your stuff, I’m sure.
Sounds great. It was a lot of fun talking about it, and I’ll look forward to seeing what comes from it!
…and, scene.
Gordon, I apologize for taking so long on this. In the end I figured the conversation works better than any sort of “article” I could have turned it into.
Panelists:
I’m going to try to group the comments into subject areas. Let’s see how well that works.
Tags going mainstream
Don Turnbull:
Who’d have thought we’d be talking about metadata on a beautiful Sunday morning in Austin?
Is tagging the key element of Web 2.0? (Probably not.) The ETech definition: Web 1.0 was the read-only web. Web 2.0 is the read-write web.
Thomas Vander Wal:
I coined the word folksonomy… and the correct definition wasn’t given in the beyond folksonomy panel.
People used to tag on the command line. Web 1.0 tagging didn’t work. Tools like Bitsy. Cory’s “metacrap” article. Web 2.0: delicious and flickr, actually useful for finding and re-finding information.
More than 40 sites are doing social bookmarking.
60 to 70 sites using tagging as their main way to bring people in. (7 travel sites, for example, using tagging as their appeal.) More than 200 services have included tagging (Amazon).
What are tags useful for?
Don Turnbull:
Are these systems useful beyond a few types of tasks or categories of information?
- Re-finding information
- Creating personal metadata
- The new command line (quicker than drag/drop, sort, click)
- Gateway to the next PIM?
- Tags as verbs (“buy,” “sell”), expanding the vocabulary (ratings: “*,” “**,” “***” etc.)
- People-centric view of data, vs. system-centric.
- Good for keeping track of things you already know about, but what about discovery?
- It’s more interesting to find a like mind than just a resource
Adina Levin:
Tagging is social, helpful to the individual and increasingly valuable to the group.
Tag games (Flickr came from the game design world), example of red and green game leading to joining the Japanese Maple group, aircraft spotters.
Jon Udell’s InfoWorld Explorer tool crawls delicious and aggregates InfoWorld articles by genre, author, date, tags, title
Why is Tagging better than Categorization?
Rashmi Sinha:
I’m going to be a cheerleader for tagging
When categorizing, we choose between multiple concepts. Tagging is easier. Joshua Schachter in his infinite wisdom figured out you can just write down what comes to mind. Note all concepts instead of choosing one and invoking a hierarchy.
Better than any other social system on the web, tagging approximates the wisdom of crowds:
- cognitive diversity
- independence
- decentralization
- easy aggregation
The moment of tagging is you and that object alone (but – I interject in my mind – what about delicious’s “recommendations”? – isn’t that influence from the crowd?).
Social formations supported by tags
- ad hoc groups
- lots of weak social ties
- conceptually mediated ties
Flaws, Issues, Usability
Don Turnbull:
Are these systems usable beyond alpha geeks?
- Interface improvements: Good import? Teach vocabulary? Make re-finding information easier.
- Tag clouds probably not the answer
- Spamming, gaming, TagFraud
- Tagging is implicit (good and bad)
- Not all resources are as identifiable (microcontent?)… granular, web pages; items, commercial products
- Tags as identity (how so? i-tags?)
Vander Wal:
- “Re-findability sucks… We need to fix the re-findability problem.”
- Looks messy to others.
- No identity in Flickr. (Example: can’t see the 40 things Don has tagged with “orange”)
- Folksonomy triad (one person), dual folksonomy triad (including community) – really need slide to illustrate
- Context often missing, it gets messy, we have silos
Prentiss Riddle:
Six dirty secrets of tagging
- It’s the content, stupid
- Ordinary people don’t get tags (text box prompt gets a sentence response or maybe a Google search) and tag clouds
- It’s the UX, stupid – flickr guides you
- Tags don’t play well with others (interop):
  - Character sets
  - Delimiter wars (commas, spaces, etc.)
  - Synonyms (singular vs. plural, homonyms)
  - Aggregation, portability
- Rich functionality requires rich metadata (where’s my flying car? I wouldn’t want to use them for medical applications, managing money, hunting terrorists)
- Nobody wants “real tags” (simple keyword metadata, no control, no hierarchy, no syntax or semantics, minimal cognitive effort by the user). What people really want is “tagginess” (Stephen Colbert image)… delicious for:username, Shadows @group, geotagging, consensus tagging (sxsw2006, chosendarkness), hierarchical tagging (history.us.wwii, history.wwii.us)… it’s the opposite of tagging
Faceted tagging: Mefeedia (by place, by content, etc.), tagginess.com is available for sale.
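The "delimiter wars" point is easy to see in code. In this small Python sketch the same user input yields different tags depending on which convention a site picked. Both splitter functions are hypothetical and do not reproduce any particular site's parser.

```python
import re


def split_space_delimited(text):
    # Spaces separate tags; double quotes group a multi-word tag.
    return [quoted or word for quoted, word in re.findall(r'"([^"]+)"|(\S+)', text)]


def split_comma_delimited(text):
    # Commas separate tags; spaces inside a tag are kept.
    return [part.strip() for part in text.split(",") if part.strip()]


user_input = "new york, austin"
print(split_comma_delimited(user_input))
print(split_space_delimited(user_input))
print(split_space_delimited('sxsw "new york" austin'))
```

Moving tags between two such systems therefore needs an explicit agreement on delimiters, not just a copy of the text box contents.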
Adina Levin:
Tags are messy (blog, blogging, blogs) in tag clouds, compound words
Tag refactoring: consolidate synonyms, fix and standardize spelling, add hierarchy
but…
Don’t make me think, loss of tag snark, loses “bottom-up” purity, a hybrid of top-down and the group mind
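Consolidating synonyms, the first refactoring step above, can be sketched as a simple alias table in Python. The mapping and the counts are invented for illustration. Choosing the canonical form is exactly the top-down decision the "but…" warns about.

```python
from collections import Counter

# Hypothetical alias table: variant spelling -> canonical tag.
SYNONYMS = {"blogs": "blog", "blogging": "blog"}


def refactor(tag_counts, synonyms):
    """Merge counts of variant tags into their canonical form."""
    merged = Counter()
    for tag, count in tag_counts.items():
        merged[synonyms.get(tag, tag)] += count
    return merged


messy = Counter({"blog": 10, "blogging": 4, "blogs": 6, "wiki": 3})
print(refactor(messy, SYNONYMS))
```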
Rashmi Sinha:
Tips for tag designers
- How are you serving the individual motive?
- Does the individual understand and want to fulfill that goal?
- What is the relationship between social and personal?
- Is it too easy to mimic the tags of others?
- Is finding all about the most popular, most tagged?
- Enable discovery, exploration, finding new things
- Don’t force users to do things differently than what comes naturally
- Solve problems by ensuring good findability
Questions
Q: How to deal with Tag spam, tag fraud?
Thomas Vander Wal: blacklists, another reason why you need to see who tagged it and what object was tagged.
Q: How to work with synonyms and homonyms?
Prentiss: Clustering at Flickr works well because they have so much rich metadata available to mine.
Adina Levin: I like delicious’s suggestions
Rashmi Sinha: In input let the user do what they want. In the findability stage deal with the problem.
Technorati tags: sxsw2006, sxsw (in case Technorati’s not picking up our native tags)
This was a big year for tags. You could even say that tags went mainstream in 2005 (if tags were a band, they’d be The Killers). So, given we’re at the end of 2005, I thought I would take a look back at the major announcements and events in the world of ad hoc, user-created metadata.
These are the events I thought were important (feel free to add your own in the comments):
Technorati introduces tags (January). Technorati’s tags were the first implementation of distributed tagging (i.e. T’rati doesn’t own the tags the way Flickr does–it picks them up from blog posts while it’s crawling). It’s been criticized, but it’s widely used.
Folksonomies at the IA Summit (March). Tagging was one of the hot topics at the Information Architecture Summit in Montreal, and we kicked off the discussion with a pretty good panel (full disclosure: I moderated the panel).
Ontology is Overrated (March). Clay Shirky’s provocative talk at Etech predicted that the rise of tagging meant the death of hierarchy. Or something like that. A bit too dogmatic for me, but it was received like a papal bull and produced some interesting critiques and counter-critiques. (To give credit where it’s due, this blog wouldn’t exist but for Clay’s presentation.)
Yahoo buys Flickr (March). The first big acquisition of what some people call the “Web 2.0 era” (I call it “Ned”).
Yahoo launches My Web 2.0 (June). Yahoo integrated search, social networks and tags in its My Web 2.0 product. Some people noted the product’s lacklustre growth, which perhaps set the stage for another socsoft/web 2.0 acquisition by Yahoo at the end of the year.
Flickr adds interestingness and clustering (August). This was a big one for me–it proved that with some algorithmic mojo tags could act more like categories, separating like things from unlike things, wholes from parts and so on. It can even tell dog noses from cat noses.
Hurricane Katrina (September). Tags not only helped keep people connected during the aftermath of Hurricane Katrina, they tethered together many disparate pieces of content so we could make sense of complex, evolving, intertwingled events. (Or try to make sense of them, anyway.)
Google adds tagging. Kind of. (October) As part of its search history feature, Google allows tagging of pages. The tags are private, and the feature seems peripheral. To paraphrase Elvis Mitchell, this one would have to work a lot harder to earn the appellation uninspired.
Amazon launches tags (November). Amazon now lets you tag books. I wasn’t all that impressed with their implementation–their product pages seem cluttered–but Amazon has always been smart about social IA and user-created content. It’ll be interesting to see how they leverage their tag data. (SIPs are cool, too.)
Everyone must have tags! (November) Anil Dash points out that tags–like Ajax and Ruby on Rails–are becoming web 2.0 cliches.
Google Base launches with tagging. Kind of. (November) Google uses the term labels, but we know them as tags. Search Engine Watch didn’t think much of Google Base’s tagging, but it’s just one of several kinds of classification used on that product. Key takeaway: Tagging is a means, not an end.
Tag formats: Can’t we all just get along? (December) Matt from 37 signals points out the multiple variations of tagging UIs. The big question: “Will all these different formats still be around a year from now or will a standard emerge?”
Yahoo buys del.icio.us (December). The rumoured price was around $30 million, which has Yahoo spending about $100 for each of the 300,000 or so del.icio.us users. The interesting question is how does tagging behaviour figure into the price? I think (and this is purely speculation) Yahoo paid for millions of tagged URLs plus a community of active taggers, both of which promise to boost the relevance of search results.
Folksonomy makes the NY Times Magazine “Year in Ideas” list (December). Despite its humble origins and many doubters (I’m looking at you Morville), folksonomy is named one of 2005’s best ideas by the New York Times Magazine.
Back in April, Tim Bray asked some important questions:
Are tags useful? Are there any questions you want to ask, or jobs you want to do, where tags are part of the solution, and clearly work better than old-fashioned search? I really want to believe that tagging is big, a game-changer, but the longer I go on asking this question and not getting an answer, the more nervous I get.
2005 has proven that tags are both big (in the financial sense) and useful. Whether or not tagging is a game-changer will, I think, depend on what Yahoo, Amazon and Google do with tags in 2006. But with three big players in the tagging game there’s a lot of opportunity for innovation.