The Noisy Channel


Blog Analysis Tools: Not Quite Ready for Prime Time

November 24th, 2008 · 9 Comments · General

Today, I heard about two blog analysis tools. Ever the empiricist, I decided to try them out.

Let’s start with GenderAnalyzer, which claims it can determine the gender of a blog author.

Well, it thinks I’m probably male:

That’s close enough to spare me any major gender identity issues. Well, maybe. Let’s look at a few blogs that are written by women:

Perhaps GenderAnalyzer doesn’t appreciate women in science and technology. Or perhaps it isn’t doing much better than random.

On to Typealyzer, which claims to perform a Myers-Briggs Type Indicator (MBTI) personality test on a blog. Let’s set aside the unscientific basis of the theory and try it out.

ISTJ, huh. I realize that some of you have never met me, but I assure you that I am an off-the-chart extrovert. If these personality tests have any merit, I’m somewhere between an ENTJ and an ENFJ. And, while Typealyzer has a disclaimer that “writing style on a blog may have little or nothing to do with a person´s self-percieved personality”, I assure you that the personality of my blog accurately reflects the personality of its author.

So I wouldn’t put too much stock in blog analysis tools. Perhaps these two aren’t the best examples of the genre. But for now I’d suggest they be used for entertainment purposes only.

9 responses so far ↓

  • 1 Breanne // Nov 26, 2008 at 12:07 am

    Those sites are total junk. I wrote a blog specifically about how bad Typealyzer is:

  • 2 Daniel Tunkelang // Nov 25, 2008 at 11:22 pm

    Well, I see that you do agree they’re good for entertainment. To be clear, I was inspired to check these sites out because there is a lot of discussion about applying text mining to blogs. And some serious work too, e.g.,

  • 3 Daniel Tunkelang // Nov 25, 2008 at 11:23 pm

    BTW: Breanne’s comment was posted at 11:07pm. I finally adjusted my blog to account for the time change a few weeks ago, but time stamps aren’t adjusted retroactively.

  • 4 Jason Adams // Nov 26, 2008 at 8:55 am

    I come down on the side of fun for this kind of thing. Yeah most of these things are unscientific and are the intellectual equivalent of peanut brittle blah blah blah, but it amuses me to speculate on just how they are coming to their conclusions. In the case of Typealyzer, it also pointed me to uClassify, which has been interesting to play with. Win-win. And I just like the idea of a classifier sitting in the cloud that you can call through an API.

    Btw, my blog came up as ISTP, though I consistently test as INTP. Not too far off! I wonder if they do 16-way classification or if they combine 4 classifiers, one for each attribute. Or something else. Seems like combining 4 would be better for dealing with sparsity.

  • 5 Daniel Tunkelang // Nov 26, 2008 at 9:15 am

    Hey, if you want a free personality test, there’s always the Scientologists. Though the E-meter costs extra.

  • 6 Jason Adams // Nov 26, 2008 at 9:34 am

    Touché! 🙂 Those come with a little more extra baggage than I’m prepared to deal with. No free lunch and all..

  • 7 Daniel Tunkelang // Nov 26, 2008 at 9:40 am

    Are you suggesting I should remove Chief Scientologist from my business cards? 🙂

    Seriously, we’ve been doing a lot of work with clarity (a concept related to “query clarity”) at Endeca, so it’s been hard for me to resist the Scientology jokes. Xenu saves!

  • 8 Erica Naone // Dec 3, 2008 at 4:38 pm

    I’m always a little slow getting through the feed reader, but I just saw this. I tried GenderAnalyzer on my blog (which is mostly about literature), and it declared it was highly likely the blog was written by a male, so it doesn’t appear to be just science and technology that confuses it. Typealyzer was wildly off for me as well. I test INFJ, but my blog came up ISTP.

  • 9 Daniel Tunkelang // Dec 6, 2008 at 10:56 pm

    Come to think of it, I’ve never found a blog that GenderAnalyzer believes to be published by a female.
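
Jason’s question in comment 4, one 16-way classifier versus four binary ones, can be made a bit more concrete. The sketch below is only an illustration of the sparsity argument, not a description of how Typealyzer or uClassify works (the thread doesn’t say). The function names and the sample labels are invented for the example. It counts how many training examples each class would see under both setups, and how many of the four axes two types disagree on.

```python
from collections import Counter

# The four MBTI axes; each type picks one letter from each pair.
AXES = ("EI", "SN", "TF", "JP")


def split_type(mbti):
    """Split a type such as 'INTP' into its four per-axis letters."""
    letters = tuple(mbti.upper())
    if len(letters) != 4 or any(l not in axis for l, axis in zip(letters, AXES)):
        raise ValueError(f"not an MBTI type: {mbti!r}")
    return letters


def axes_differing(a, b):
    """Count the axes on which two types disagree (0 to 4)."""
    return sum(x != y for x, y in zip(split_type(a), split_type(b)))


def examples_per_class(labels):
    """Return (16-way class counts, list of four per-axis letter counts)."""
    whole = Counter(label.upper() for label in labels)
    per_axis = [Counter() for _ in AXES]
    for label in labels:
        for counter, letter in zip(per_axis, split_type(label)):
            counter[letter] += 1
    return whole, per_axis


if __name__ == "__main__":
    # Made-up training labels, for illustration only.
    labels = ["INTP", "ISTP", "ENTJ", "INFJ", "ISTJ", "ENFJ"]
    whole, per_axis = examples_per_class(labels)
    print("16-way:", dict(whole))
    for axis, counter in zip(AXES, per_axis):
        print(axis, dict(counter))
    print("ISTP vs INTP differ on", axes_differing("ISTP", "INTP"), "axis")
```

With these six made-up labels, each 16-way class has a single example, while every per-axis classifier gets to learn from all six. By the same measure, Jason’s ISTP-versus-INTP result is off on one axis, and Erica’s ISTP-versus-INFJ result is off on three.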
