April 28, 2006

Siderean’s tagged facets

Siderean, one of the interesting faceted classification companies, has announced some new capabilities that aim at automating the generation of metadata and that integrate tagging with facets.

The automation comes from entity extraction tools (plus the ability to integrate third party tools, because, frankly, Siderean is not in the entity extraction business) that isolate names of people, places, organizations, dates, etc. from a collection of pages. This addresses one of the real inhibitors of the use of faceted classification: The data has to already be well structured and well tagged. That makes it great for browsing databases but not as good for browsing big piles of unstructured data (= documents).

The system integrates tags in a useful way. Users can tag items and then use tags to further specify searches through the faceted interface. In fact, the tags can be “bucketed” and treated as facets. The tags can be marked as personal or public, and can be associated with groups and other contexts. Yes, the system does integrate with del.icio.us. (Siderean fooled around with this in a beta project called, wonderfully, fac.etio.us.)

Siderean also announced that it’s now using the faceted information to drive analytics. This is really “just” another way of displaying the faceted information. But it can be quite useful because a faceted system has so much data built into it. For example, a library system might know that (and this is a made-up example) there were fifteen times as many books about Iraq published in the past two years as in the previous twenty; it has to know this if it’s going to let users browse for books by subject and then by year (or vice versa). Siderean’s analytics offering follows that of Endeca.
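To make the point concrete, here is a minimal sketch of why the counts come "for free" in a faceted system. It is not Siderean's or Endeca's implementation; the records and the `facet_counts` helper are invented for illustration.

```python
from collections import Counter

# Hypothetical catalog records: (subject, publication year).
BOOKS = [
    ("Iraq", 2005), ("Iraq", 2004), ("Iraq", 2005),
    ("Iraq", 1991), ("Gardening", 2005), ("Gardening", 1998),
]

def facet_counts(records):
    """Count records per subject and per (subject, year) pair."""
    by_subject = Counter(subject for subject, _ in records)
    by_subject_year = Counter(records)
    return by_subject, by_subject_year

by_subject, by_subject_year = facet_counts(BOOKS)
print(by_subject["Iraq"])               # 4
print(by_subject_year[("Iraq", 2005)])  # 2
```

The same tallies that label each facet value in the browsing interface are what an analytics view would chart.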

Faceted classification is young. It’s exciting watching imaginative companies like Siderean invent new twists and turns right under our eyes.

April 26, 2006

Tagging a Jane Jacobs memorial

I wrote a post yesterday about Jane Jacobs’ legacy for online community, and concluded with a proposal for a tag-based memorial:

These still-early days of online community-building amount to explorations of the potential that Jacobs identified: the potential for supporting real human relationships with virtual ecosystems. And in a wonderful tribute to Jacobs’ continued influence, many of these experiments feed back into twenty-first century cities by providing new tools for supporting urban sustainability. I’ve bookmarked a few of my favorite examples on del.icio.us with the tag JaneJacobsArchive; I hope others will contribute their own examples of how the Internet can support the kinds of cities that Jacobs so eloquently advocated.

This idea of a tag-based memorial is kind of an online, populist equivalent to a festschrift: the academic tradition of compiling a volume of articles by different scholars in honor of a colleague’s retirement or passing. Has anyone come across any other examples of tag-based memorials or tributes? I’d be really curious to see others, and to find out what worked (or didn’t) in terms of getting folks to contribute.

April 23, 2006

Impure folksonomies for retailers

Dan Klyn has some practical suggestions for retailers thinking about letting users tag merchandise. Why not pre-populate your catalog with tags drawn from the item descriptions? Why not rank tags higher based on the popularity of the page or item? What do you do about a product that’s tagged “crappy” or “over-priced”? (I think Dan’s answer to that last one is that you surface tags based in part on how popular they are.) The result is not a pure folksonomy, but purity isn’t always what we (merchants and shoppers) need.

He also points to Etsy.com as an example of a merchant using tags well.
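One way to picture the "impure" approach is the rough sketch below. It is my own reading of the suggestions, not Dan's design: the stopword list, the `min_count` threshold, and both function names are assumptions made for illustration.

```python
import re
from collections import Counter

# Hypothetical stopword list; a real catalog would need a better one.
STOPWORDS = {"a", "an", "and", "the", "with", "for", "of", "in"}

def seed_tags(description):
    """Derive starter tags from an item description (crude word split)."""
    words = re.findall(r"[a-z]+", description.lower())
    return {w for w in words if w not in STOPWORDS}

def surface_tags(seeded, user_tags, min_count=2, limit=5):
    """Show tags used at least min_count times, most popular first."""
    counts = Counter(user_tags)
    for tag in seeded:
        counts[tag] += min_count  # merchant-seeded tags always clear the bar
    popular = [(t, n) for t, n in counts.items() if n >= min_count]
    popular.sort(key=lambda pair: (-pair[1], pair[0]))
    return [t for t, _ in popular[:limit]]

seeded = seed_tags("Hand-knit wool scarf")
print(surface_tags(seeded, ["wool", "cozy", "cozy", "crappy"]))
```

A single shopper's “crappy” stays stored but is not surfaced until enough other people agree with it.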

April 3, 2006

Interview with Gordon Luk (FreeTag)

Nearly ten months ago, at the suggestion of Andy Baio I interviewed Gordon Luk (via IM) about FreeTag, an “Open Source Tagging / Folksonomy module for PHP/MySQL applications” he originally created for Upcoming and announced almost a year ago in his blog.

In the meantime I’ve continually intended to edit the chat transcript into a coherent article and post it here. Unfortunately, a strange thing called “life” has intruded. Then, I ran into Andy in Austin at South By Southwest and my embarrassment over sitting on this dialogue returned to the surface, kicking the to-do back to the top of my list.

I started thinking I should touch base with Gordon again and find out who else has adopted FreeTag lately, along with any other news or developments, but then I realized this was just another form of procrastination. What the web wants me to do is post what I’ve got and then Gordon or anyone else can comment on it, or correct it, or update it, and so on.

So, without further ado, here is my interview with Gordon Luk:

xian: Can you tell me how you got the idea for freetag?

Gordon: Sure! It starts with a discussion of who I eat lunch with, actually. I am lucky enough to work with some really smart guys - among them, Andy Baio, Phil Fibiger, Greg Knauss, Christian Newton, and Jason Stuck.

We got to talking about tagging when the term folksonomy was coined.

I can’t remember exactly who had the idea, but we started discussing cross-site interactions between tags on different platforms.

In what sense?

The idea that you could be browsing puppies on flickr, and perhaps you could extract some of del.icio.us’s puppy-tagged links.

Was Technorati doing their pages yet that show items tagged by several different systems?

At that point, I don’t believe so. We got a few of our other friends involved, including the venerable Leonard Lin. Greg included Leonard Richardson on the email that he sent out that night by mistake, so we got some of his feedback too.

So when did it turn into a plan to actually do something?

Well, first it turned into a wiki.

Naturally…

I started off in the direction of creating a PHP class that would implement a standardized XML-RPC or REST communication layer. Greg was more of a proponent of the actual standard to be implemented by that layer.

At that point, we all got busy and it sat for a couple of months.

During another lunchtime conversation, I came up with the idea for eatlunch.at and made it that weekend.

I wanted to use it as a testbed so I could play with tagging, so instead of building it into the whole site, I made the tagging system generic.

One thing that interests me is the enabling or catalysing idea of not just pumping out yet another site or application but instead producing a plug-in that can be distributed across a whole class of projects.

It seems altruistic in the sense of it’s not yet another system trying to collect my contact info, but on the other hand, I’m surprised people don’t modularize like that more often.

Yeah, that’s absolutely very interesting - I wrote a post not too long ago about how I’m interested in the strange inversion of privacy preferences that we subject ourselves to on social services.

Especially public ones like del.icio.us.

We really wanted to enable cross-communication between sites, because it seemed like such a no-brainer once we started talking about it. Typically, when you’re dealing with hierarchies, every site dev has their own view of the world, and things don’t match too well. With freetagging (the term used back then), it doesn’t really matter, because the classification systems emerge from the utility of the application and data.

It’s interesting how tagging is emerging as a kind of meta-glue for the web (if it is - still not sure).

It’s interesting that tag clouds (and now del.icio.us’s recommended tags) are enforcing community standards for popular tags, because with a distributed system, you’d have that not only on a single site, but you could implement that across a wide range of sites.

There’s a tension there - still not clear where it’s going, but it’s fun to watch it emerge (or in your case, I suppose, help move it along). So, the wiki hosted the debate about how to implement or at what conceptual level to implement the idea?

Yes, it might actually still be around, too. It’s hard to say, because we all worked on it for about a week before getting too busy to do anything about it. It was mostly planning and RFC-style note-taking. It was a lot of design work, no coding involved.

Not even pseudocode?

Well, I guess it depends on your definition of that. I think there were some standard communication XML-RPC samples that were flying around, and there were also some API specs that I wrote up.

So did you just sit down and hack out the first version next?

I actually wrote it the same weekend as I wrote eatlunch.at’s core code. It was pretty crummy at first - had some serious issues with special chars, and just ignored quoted tags entirely, among other problems. But the core was there - the schema and a basic API.

Luckily, I’d been practicing with generalized module development through work. I owe Mike Benoit of phpGACL thanks for helping teach me generalized module style in PHP.

phpGACL is a generalized access control lists module that fits into PHP-MySQL apps. It’s an excellent module for anyone to start with. It’s pretty well separated and very generalized. I’d recommend looking at both that and Freetag, because each does things well in a different way. (I get nerdy when I talk about this stuff, so feel free to let me know if I go too far.)

OK, so was implementing it in Upcoming the next test case after eatlunch.at?

Yes, when Andy asked me if I’d like to help with Upcoming, I was chomping at the bit to implement Freetag and see how well it worked. I implemented the core Freetag API in Upcoming in about an hour and a half.

I had event tagging, listing of tags, and tag clouds all done within that timespan.

It made me really implement the trickier things about writing a tagging system, because Andy’s got such a big user base, I can’t get away with being lazy about certain bugs.

Specifically what did you have to nail down?

I really ended up polishing it up to support quoted tags, better ordering and limits on each API function, and normalization. I also had to rewrite the core to separate raw tags and normalized tags, because Andy wanted it to work like Flickr. But that wasn’t too hard once I understood what it meant.

When developing a generalized API, it’s important to provide as many parameters as possible to your core calls - such as offsets, limits, sort order, and sort direction.
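(An aside, not part of the chat: the quoted-tag support Gordon mentions might look something like the sketch below. It is written in Python for brevity and is not FreeTag's actual PHP code; `split_tags` is a made-up name.)

```python
import re

def split_tags(raw_input):
    """Split a tag string on whitespace, keeping double-quoted phrases whole."""
    pattern = r'"([^"]+)"|(\S+)'
    # Each match fills exactly one of the two groups; keep whichever is set.
    return [quoted or bare for quoted, bare in re.findall(pattern, raw_input)]

print(split_tags('"new york" pizza "John\'s First Movie!"'))
# ['new york', 'pizza', "John's First Movie!"]
```

Without the quoted branch, “new york” would be stored as two separate tags, which is the behavior the early version reportedly had.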

So a limit on each API function in that sense means what exactly?

Such as, show me only 5 tags at once, and start 10 tags down in the list. In that case, 5 is the limit, and 10 is the offset.
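(Another aside: a minimal sketch of the limit, offset, sort order, and sort direction parameters described here. The `list_tags` function and its signature are assumptions for illustration, not FreeTag's API.)

```python
def list_tags(tag_counts, offset=0, limit=None, sort_by="count", descending=True):
    """Return a sorted, paged slice of (tag, count) pairs."""
    if sort_by not in ("count", "tag"):
        raise ValueError("sort_by must be 'count' or 'tag'")
    if sort_by == "count":
        key = lambda item: item[1]
    else:
        key = lambda item: item[0]
    ordered = sorted(tag_counts.items(), key=key, reverse=descending)
    end = None if limit is None else offset + limit
    return ordered[offset:end]

# Hypothetical data: twenty tags with distinct counts (100 down to 81).
counts = {f"tag{i:02d}": 100 - i for i in range(20)}

# "Show me only 5 tags at once, and start 10 tags down in the list."
print(list_tags(counts, offset=10, limit=5))
```

In a database-backed module the same two numbers would typically be passed through to the query rather than applied in memory.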

I understand normalization in a database context but what does it mean when you talk about normalized tags?

It’s a tricky topic - if you look at flickr and upcoming, here’s what we do when someone tags something as “John’s First Movie!” We take that, and normalize it by removing any non-allowed characters, then we lowercase it. Then we store that as an independent tag in Upcoming.

I’m not sure how Flickr does theirs, but in each case, if you’re not the creator of that tag, you’ll see “johnsfirstmovie”. If you’re the actual creator, theoretically you wanted it to be “John’s First Movie,” at least so you can find it again later. So we keep that as a raw tag.
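(Aside: the raw-versus-normalized split described above can be sketched as follows. The allowed character set and the record layout are assumptions; this is not Upcoming's or FreeTag's actual schema.)

```python
import re

def normalize_tag(raw_tag, allowed="a-zA-Z0-9"):
    """Strip characters outside the allowed set, then lowercase."""
    return re.sub(f"[^{allowed}]", "", raw_tag).lower()

def tag_record(raw_tag, tagger):
    """Keep both forms: what the tagger typed and the shared normalized tag."""
    return {"tagger": tagger, "raw_tag": raw_tag, "tag": normalize_tag(raw_tag)}

def display_tag(record, viewer):
    """The creator sees the raw tag; everyone else sees the normalized one."""
    return record["raw_tag"] if viewer == record["tagger"] else record["tag"]

record = tag_record("John’s First Movie!", tagger="john")
print(display_tag(record, viewer="john"))   # John’s First Movie!
print(display_tag(record, viewer="alice"))  # johnsfirstmovie
```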

Unfortunately, FreeTag doesn’t go completely normalized between raw and normalized tags, for performance reasons. So it’s not perfectly normalized, but it’s close.

I adjust most of the API functions to handle that so you don’t get duplicates, but that’s a bit technical, you probably don’t need to worry about that.

Sadly, Delicious doesn’t do that, so I have tags there called “foo and bar”

One of my recent Freetag releases implemented a feature where you can pass in all of your configuration parameters to the constructor of the class. That means you don’t have to go in and edit config files each time you upgrade.

One of the cool things that lets you do is keep around your custom valid characters pattern, so you can pick your normalization scheme for yourself.

That lets you keep dashes, underscores, spaces, or even high ascii (for internationalized sites) in the normalized format, if you want it.
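(Aside: a rough sketch of constructor-based configuration with a custom valid-characters pattern. The class name, option names, and defaults below are invented for illustration and should not be read as FreeTag's real options.)

```python
import re

class TagNormalizer:
    # Hypothetical defaults; callers override them without editing this file.
    DEFAULTS = {"valid_chars": "a-zA-Z0-9", "lowercase": True}

    def __init__(self, options=None):
        self.options = {**self.DEFAULTS, **(options or {})}
        # Anything outside the configured character class is stripped.
        self._invalid = re.compile(f"[^{self.options['valid_chars']}]")

    def normalize(self, raw_tag):
        cleaned = self._invalid.sub("", raw_tag)
        return cleaned.lower() if self.options["lowercase"] else cleaned

default = TagNormalizer()
keeps_dashes = TagNormalizer({"valid_chars": "a-zA-Z0-9 -"})
print(default.normalize("Sci-Fi Movies!"))       # scifimovies
print(keeps_dashes.normalize("Sci-Fi Movies!"))  # sci-fi movies
```

Because the overrides live in the calling code, upgrading the module does not wipe them out.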

I wonder if the web helps force you to plan ahead that way, as it is such a moving target of an environment. It’s almost never a good idea to nail things down too literally.

It’s one of the biggest challenges of developing a generalized module like Freetag. You really need to think ahead and make sure that it’s as generic as possible, so that people don’t have to hack into it themselves and potentially lose their modifications every time they want to upgrade.

It’s all so meta-

Yeah, it’s definitely pretty meta and kinda hard. I have a newfound respect for open source software maintainers.

Has the Upcoming user base given any feedback to you or Andy?

Yes, they actually ended up filing a bug about the tag normalization on the wiki. I ended up explaining it, and they moved it to its own page.

Meaning they thought the feature was a bug?

Yes, that’s what happened. I know that a lot of people really liked the contributions I made to Upcoming, just based upon the press when we released.

So that is a bit of intelligence into what people expect and what confuses them (I’m thinking like a UI/IA guy now).

Hehe, yeah, it confuses people when their perspective doesn’t match that of others. But I think you’ll see that more and more on the web, especially as sites get more complex.

Yeah, for sure. User-experience is a series of tradeoffs. It’s easy to stand off to one side and say it should be optimized for users just like oneself.

The other major things I’ve worked on with Upcoming have been the REST-like API, and the invite feature.

REST-like, does that mean not 100% RESTful?

Hah, I’m specifically using that word, because I know guys who bring up all the time that our API isn’t fully RESTian. AFAIK, there are very few fully RESTful web applications out there that are popular.

Everyone makes tradeoffs - like what happened with Backpack and their $_GET and google web accel fiasco.

Yeah, fundamentalism is never pretty.

I made sure to use $_POST instead on the state-changing calls, which turned out to be the right move. However, I didn’t design with the verb/noun aspect of REST, so I hear that all the time.

People are always mailing in, who don’t understand POST. It’s hard, because everyone understands how to construct a url and make a GET request.

So as far as making an easy platform for beginners to write apps upon, GET is probably the way to go.

In the beginning, it was written, that the HTTP should have four verbs, and Tim Berners-Lee saw that it was good.

Yes, but not even cURL implements DELETE. That’s why I don’t fix that bug.

Yeah, I think I’d be wary of using DELETE outside of a totally secure web app environment, and even then I’d have second thoughts.

Well, I overload POST to DELETE for me, but you’ve got to authenticate, etc. But it’s a tricky subject, and I figure by saying REST-like instead of RESTful, I kind of avoid it.
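(Aside: one common way to overload POST as DELETE is a method-override field, sketched below. Gordon does not say how Upcoming does it, so the `_method` field name and both function names are pure assumptions.)

```python
STATE_CHANGING = {"POST", "PUT", "DELETE"}

def effective_method(http_method, params):
    """Treat a POST carrying a hypothetical `_method=DELETE` field as a DELETE."""
    method = http_method.upper()
    override = params.get("_method", "").upper()
    if method == "POST" and override == "DELETE":
        return "DELETE"
    return method  # a GET is never promoted, so it can never change state

def authorize(http_method, params, authenticated):
    """Return the effective method, refusing unauthenticated state changes."""
    method = effective_method(http_method, params)
    if method in STATE_CHANGING and not authenticated:
        raise PermissionError("state-changing calls require authentication")
    return method

print(authorize("POST", {"_method": "delete"}, authenticated=True))  # DELETE
print(authorize("GET", {"_method": "delete"}, authenticated=False))  # GET
```

Ignoring the override on GET is the point: a prefetching tool that follows links should never be able to trigger a deletion.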

REST-esque

That’s a good one.

It is interesting that you need to think about these things when you’re developing for such a wide potential base.

Yeah, it’s a lot more challenging, because I really want to do things the right way. That’s why I’m lucky to get emails from people smarter than me, telling me how to do things better.

Ok, so have there been any other (significant) implementations yet? I imagine that Upcoming really promoted the hell out of FreeTag, relatively speaking.

A few pretty cool ones - Blogskins implemented it over on their site really quickly too.

I’ve gotten some emails from people planning on using it, and when those go public I’ll be sure to announce it on the mailing list.

It could really speed up adoption of tagging.

OK, let’s take one step back and let me ask you where you think all this tagging is leading us, with the cross-platform tagging idea or maybe other things (that I can’t really imagine, yet) that might be built on top of a heavily tagged web.

Well, I think we’ll start to see tagging systems interoperate once the first person gets out the gate in implementing a tag communication standard. Maybe that will be me, I’m not sure.

But once that happens, I think we’ll see convergence on a wider scale into a really interesting set of tags.

What will that enable beyond the obvious ability to tag more than one kind of thing with the same gesture?

Really freakin big tag clouds.

I’m being a little facetious, but that is actually where you might see things go.

If you’ve ever seen Flittr, it kind of consolidates tagging systems in a one-off way, taking one tag and finding samples in different systems. It’s just kind of slow, unfortunately.

I’ll check it out - sounds interesting at least as a proof of concept.

I personally don’t have time to do this right now, but it would be awesome to have a tag thunderstorm, where you can browse a global tag cloud aggregated from many sites, and then dig down into individual ones.
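(Aside: the “tag thunderstorm” could start from something as small as the sketch below, which merges per-site tag counts into a global cloud while keeping the per-site breakdown for drilling down. The counts are made up, and fetching them from each site is left out entirely.)

```python
from collections import Counter

def merge_clouds(site_clouds):
    """Combine per-site tag counts into global totals plus a per-tag breakdown."""
    totals = Counter()
    breakdown = {}
    for site, cloud in site_clouds.items():
        for tag, count in cloud.items():
            totals[tag] += count
            breakdown.setdefault(tag, {})[site] = count
    return totals, breakdown

# Hypothetical counts, invented for the example.
clouds = {
    "flickr": {"puppies": 40, "sunset": 25},
    "del.icio.us": {"puppies": 12, "php": 30},
}
totals, breakdown = merge_clouds(clouds)
print(totals.most_common(1))   # [('puppies', 52)]
print(breakdown["puppies"])    # {'flickr': 40, 'del.icio.us': 12}
```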

That does sound pretty cool! But don’t we already have problems with tag clouds (scaling, imposing norms on people vs. harnessing self-interest…)?

I don’t really mind tag clouds that much. In my API, the function that generates one is called silly_list.

Well, they are sort of a stab at the kinds of interfaces we’ve been waiting for for 20 years or so, with an almost 3-D sense of space, relative importance, closeness, etc.

Yeah, totally. I think sometimes it’s just popular to be contrarian.

I don’t think we’ll see the death of hierarchy anytime soon.

You just have to look at how hard it is sometimes to dig data out of niche wikis.

When there aren’t that many people tagging a set of stuff, it’s not really that useful.

Do you think folder-like hierarchies and free-tagging complement each other well?

Absolutely. Both are useful - in some ways, it’s kind of the opposition between Google and Yahoo.

I think tag systems are just the collapsed leaves of individual categorization trees, right? That’s totally my nutshell view of what’s going on.

Sure, in a sense, and they do overlapping well without a lot of either duplication or aliasing.

You’re basically flattening then merging personal hierarchies.
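(Aside: one possible reading of “flattening then merging personal hierarchies”, sketched under the assumption that every folder name on an item's path becomes a tag. The nested-dict layout and the sample folders are invented for illustration.)

```python
from collections import Counter

def flatten(tree, path=()):
    """Yield (item, tag) pairs: each folder name on an item's path becomes a tag."""
    for name, child in tree.items():
        if isinstance(child, dict):
            yield from flatten(child, path + (name,))
        else:  # child is a list of items filed in folder `name`
            for item in child:
                for tag in path + (name,):
                    yield item, tag.lower()

def merge(*trees):
    """Merge several people's flattened folder trees into per-item tag counts."""
    tags = {}
    for tree in trees:
        for item, tag in flatten(tree):
            tags.setdefault(item, Counter())[tag] += 1
    return tags

alice = {"Pets": {"Dogs": ["puppy.jpg"]}}
bob = {"Photos": {"Dogs": ["puppy.jpg"], "Trips": ["beach.jpg"]}}
merged = merge(alice, bob)
print(merged["puppy.jpg"]["dogs"])  # 2
```

The two trees disagree about where the photo lives, yet after flattening they agree on “dogs”, which is the overlap without duplication or aliasing mentioned above.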

Well this is a lot for me to chew on. Thanks for taking the time out to talk to me.

Thanks for asking me to talk about it!

My pleasure, and we can thank Andy for suggesting it too. I’ll be keeping an eye on your stuff, I’m sure.

Sounds great. It was a lot of fun talking about it, and I’ll look forward to seeing what comes from it!

…and, scene.

Gordon, I apologize for taking so long on this. In the end I figured the conversation works better than any sort of “article” I could have turned it into.