Alerted by Jeff and Iadh, I recently read What Should Blog Search Look Like?, a position paper by Marti Hearst, Matt Hurst, and Sue Dumais. For those readers unfamiliar with this triumvirate, I suggest you take some time to read their work, as they are heavyweights in some of the areas most often covered by this blog.
The position paper suggests focusing on three kinds of search tasks:
- Find out what are people thinking or feeling about X over time.
- Find good blogs/authors to read.
- Find useful information that was published in blogs sometime in the past.
The authors generally recommend the use of faceted navigation interfaces, something I’d hope would be uncontroversial by now for search in general.
But I’m more struck by their criticism that existing blog search engines fail to leverage the special properties of blog data, and by their argument, based on work by Mishne and de Rijke, that blog search queries differ substantially from web search queries. I don’t doubt the data they’ve collected, but I’m curious whether their results account for the rapid proliferation and mainstreaming of blogs. The lines between blogs, news articles, and informational web pages seem increasingly blurred.
So I’d like to turn the question around: what should blog search look like that is not applicable to search in general?
11 replies on “Is Blog Search Different?”
Daniel, the task of blog distillation really does seem different from most traditional ad hoc tasks. In this task, we’re really retrieving individual authors or small groups of authors based on their discussion of a topic. In our SIGIR paper, we looked closely at the blog distillation queries as compared to other ad hoc task queries, and found the blog search queries were on average shorter, much more general, and typically represented multifaceted information needs.
This analysis was done on TREC data, not data collected from a real-world search engine, but I do think the distinctions are valid. The information needs of blog searchers are not the same as the information needs of web searchers in general.
“I’m more struck by their criticism that existing blog search engines fail to leverage the special properties of blog data…”
Isn’t this where the heavy lifting and hard work is? Instead of indexing a set of pages and determining the number of links to and from them, now we are asking what the topics are about, whether they are reliable (however that is defined), how many comments there are (what if comments are turned off?), the number of RSS subscribers, the attitude of the blogger, tags (do they use tags or not?), etc.
Faceted search is valuable; however, it appears that there has only been limited work towards building out some of the facets needed for navigation.
A follow-up question: if I want to find out about a topic, why shouldn’t I be able to get a picture put together from both blogs and the usual web sites?
Jon, I have no doubt that TREC tasks differ, but I’m interested in the real-world distinctions. And I grant that, today, blog searchers may be pursuing different information needs than web searchers in general. But I’d think the two are converging. I know for myself that I often am not sure whether to use web search or blog search to satisfy an information need.
David, point taken. I suppose blogs are the leading edge of a wholesale conversion of the web from unstructured to semi-structured content. I guess my question is whether all web pages are starting to look more like blog pages, as everything can have comments, tags, RSS feeds, etc.
How important is blog search, really? I’ve never picked up a new blog and said, “I wonder what this person says about X” and then read through their old posts. How often have others actually had that desire or need?
Finding new, interesting blogs is a worthwhile problem, but then I almost feel that something like clustering might be really interesting. “Show me 10 blogs that are similar to this blog or set of blogs that I read and enjoy.” That could be cool.
Ah, maybe this is just a misunderstanding. By blog search, I mean searching the blogosphere. Basically, all of the information seeking tasks associated with Web search, but restricted to blog posts. It’s analogous to news search, only that it’s the space of blogs rather than news articles. I agree that searching one person’s blog for specific information isn’t a particularly compelling use case.
If we take the position that blogs are becoming more and more similar to the rest of the web in structure (feeds, comments, tags, links, etc.), then the question becomes how we can make retrieval of the data more transparent, rather than simply offering a black-box solution.
Maybe there are different criteria for searching blogs than there would be for the rest of the web. For example, number of external links, number of comments, number of posts, etc. In that way it is similar to news search where there are opinions, writing style, etc. as opposed to a corporate or business web site which normally would not have comments, etc.
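To make the idea of blog-specific criteria a little more concrete, here is a minimal sketch of how signals such as comment counts and tags might be exposed as facets. Everything in it is an assumption for illustration: the `Post` fields, the comment buckets, and the `facet_counts` and `refine` helpers are hypothetical names, not taken from the position paper or from any existing blog search engine.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Post:
    # Hypothetical record for one blog post in a result set.
    title: str
    blog: str
    comments: int
    external_links: int
    tags: frozenset = field(default_factory=frozenset)


def comment_bucket(n):
    # Assumed bucketing; real facet boundaries would come from the data.
    if n == 0:
        return "none"
    if n < 10:
        return "1-9"
    return "10+"


def facet_counts(posts):
    # Count how many posts fall under each value of each facet.
    return {
        "comments": Counter(comment_bucket(p.comments) for p in posts),
        "tags": Counter(t for p in posts for t in p.tags),
    }


def refine(posts, *, comments=None, tag=None):
    # Keep only the posts matching every facet value the user selected.
    return [
        p for p in posts
        if (comments is None or comment_bucket(p.comments) == comments)
        and (tag is None or tag in p.tags)
    ]
```

A searcher would see the counts next to each facet value and narrow the results by picking one, for example `refine(posts, tag="ir", comments="10+")`. The same structure would apply to any page that has comments and tags, whether or not it is a blog.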
Clustering would be a way of tackling the problem of showing related blogs.
However, in some ways that work is already done by the links that the blogger has set up. For example, if I look at Daniel’s blogroll, my guess is that those blogs are related in some way to information retrieval or enterprise search. If I like this blog, chances are I may like the other blogs. It isn’t the only way to find similar blogs but it seems to me that it would give you a head start in pointing you in the right direction.
Dave, I think we’re on the same page. As the content on the web becomes more collaboratively written and more elaborately interconnected, I do think that almost all search will benefit from the kinds of techniques that would help blog search today.
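The blogroll idea can be sketched roughly as follows. This is only an illustration under stated assumptions: blogrolls are represented as sets of blog names, similarity is measured by Jaccard overlap (one of several reasonable choices), and `jaccard` and `similar_blogs` are hypothetical helper names.

```python
def jaccard(a, b):
    # Share of linked blogs the two blogrolls have in common.
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similar_blogs(target, blogrolls, k=10):
    # Rank the other blogs by how much their blogroll overlaps the target's.
    base = blogrolls[target]
    scored = [
        (jaccard(base, roll), name)
        for name, roll in blogrolls.items()
        if name != target
    ]
    # Drop blogs with no overlap; sort by score, breaking ties by name.
    scored = [(score, name) for score, name in scored if score > 0]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for _, name in scored[:k]]
```

Calling `similar_blogs("some-blog", blogrolls)` would return up to ten blogs whose blogrolls overlap most with that blog's. It only gives a head start: blogs without a blogroll, or with an unusual one, would need other signals such as clustering on content.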
And, of course, I think transparency is a no-brainer.
I’d be really interested to hear what folks have to say about our effort here: http://www.blognetnews.com/search
Probably the most developed section covers the news and public affairs blogs writing about state and local affairs in each of the 50 states.
Dave, I tried it, and it doesn’t seem designed for people like me whose interests are primarily in software and technology issues. Perhaps it is more targeted at a different sort of reader.