Knowledge and Uncertainty: Classification

James Governor posts an interesting conclusion drawn from a post on Joho the Blog.

Formal taxonomy is a function of scarcity: not of the resources being classified, but of the resources to classify them.

I think there is a further conclusion. The structure of the taxonomy reflects the structure of the social process that creates the taxonomy. So if you have a formal taxonomy produced by a single authority (such as the great Linnaeus himself), you can expect a hierarchy (for some of the reasons explained by Herbert Simon). If you have an informal tagsonomy produced by a heterogeneous network - surprise, surprise - you get a network.

Shifting attention from the end-product to the process often yields new insights. (It's the opposite of reification - I call it ratification.)

The process of classification is in two halves. One half is attaching tags - the other half is using them. There is no point optimizing one half (against a given landscape of available resources) if the other half is impossible or seriously flawed.

Where knowledge is scarce, the first half is centralized. One great biologist creates the taxonomy, an elite of followers maintains it (adding content, but making no significant structural changes), and everyone else just uses it. Under certain conditions, using a hierarchical taxonomy is much the simplest option and requires the least thought. (Everything YOU need to know is on a single page - if it's not on that page, YOU don't need to know it.)

On the Internet, there is loads of knowledge, but with a vast quality range. Countless classification schemes are invented, presumably most of them meaningful to their creators. The challenge is in using these schemes.

In the present state of the art, I find the usefulness of other people's tags patchy. It helps if I can broadly identify the tagger - is this someone for whom the tag "CA" is likely to mean Computer Associates rather than California or Canada? No doubt the software boffins are working on some clever bit of software that can perform this kind of semantic analysis.
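
To make the "CA" example concrete, here is a minimal sketch of how such software might guess a tag's meaning from the tagger's other tags. Everything in it is an assumption made for illustration: the SENSES table, the context tags listed for each meaning, and the guess_sense function are all hypothetical, and a real system would need far richer evidence than simple overlap counting.

```python
from collections import Counter

# Hypothetical sense inventory (an assumption for illustration): each candidate
# meaning of an ambiguous tag is described by tags that tend to accompany it.
SENSES = {
    "CA": {
        "Computer Associates": {"software", "enterprise", "mainframe", "vendor"},
        "California": {"usa", "travel", "sanfrancisco", "wine"},
        "Canada": {"toronto", "travel", "hockey", "maple"},
    }
}


def guess_sense(tag, tagger_history):
    """Guess what `tag` means for a tagger, given the other tags they have used.

    Returns the best-scoring sense, or None when the tag is unknown,
    there is no supporting evidence, or two senses tie.
    """
    senses = SENSES.get(tag)
    if not senses:
        return None

    # Count how often the tagger has used each tag (case-insensitive).
    history = Counter(t.lower() for t in tagger_history)

    # Score each sense by how many of the tagger's tags fall in its context set.
    scores = {
        sense: sum(history[word] for word in context)
        for sense, context in senses.items()
    }

    best = max(scores.values())
    winners = [sense for sense, score in scores.items() if score == best]
    if best == 0 or len(winners) > 1:
        return None  # not enough evidence to choose
    return winners[0]


if __name__ == "__main__":
    print(guess_sense("CA", ["software", "enterprise", "SOA", "vendor"]))
    print(guess_sense("CA", ["travel"]))
```

Note that the sketch declines to answer when the evidence is balanced: a tagger known only for "travel" could equally mean California or Canada, which is exactly the patchiness described above.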

And with sufficiently clever software, all the semantic problems of knowledge and classification will be solved. The Google, the Whole Google, and Nothing But the Google. Igor Palmer and Kingsley Idehen point out some of the flaws of what they call "The World Wide Web of Junk". In my previous analysis of Google, I argued that Perfect Googling gets ever closer to what A.A. Milne called Thinking With the Majority.

Update: Fiona Leslie has just posted some interesting comments on the panlibus blog about tagging the taggers: Tag - You're It! She also reviews Yahoo's new social search. Meanwhile Jackson Miller is sceptical about the new HonorTags proposal.

Update [July 2006]: Seth Godin applauds the fact that the Internet "reorganizes the scattered threads of discourse, creating a few (instead of a million or a billion) reading lists" and says "this satisfies a basic human need ... to do what others are doing, to read what others are reading". But as A.A. Milne wrote, it is the third-class mind that is only happy when it is thinking with the majority. Surely Seth and his readers would not wish to be so characterized?



Monday, June 20, 2005
