Categories
Uncategorized

Information Accountability, Still Unsolved

The top story on Techmeme as I write is Apple Denies Steve Jobs Heart Attack Report: “It Is Not True”. An excerpt:

It is significant that this report appeared on a site owned by CNN.  CNN does not profess to be directly responsible for iReport, but its name is at the top of the site. It’s possible that reports like this will significantly damage CNN’s credibility, and we wouldn’t be surprised if this caused them to pull back from association with “citizen journalism.”

I absolutely agree. We have to work out the social and legal norms of information accountability.

Categories
General

Information Seeking for the Political Process

Yesterday, I participated in a discussion about technology issues facing the next United States administration. The New York CTO Club, which is non-partisan, invited both presidential campaigns to participate, but unfortunately we only had representation from one of them. Still, it was an earnest, informed discussion that excited me despite my deep skepticism about the political process.

One of the issues we discussed was the challenge of communication to inform policy, whether from government to the population at large or vice versa. In particular, we discussed the problem of distilling countless emails to ensure that politicians, whose time is extremely scarce, are aware of the best ideas coming from citizen activists.

The conversation could have been about search and relevance ranking. Concerns ranged from managing near-duplicate documents (people often copy and paste letters from organizations) to anonymous authorship and reputation systems. Indeed, the issue of communicating about policy amounts to a collection of information seeking problems for politicians, non-political staffers, activists, and the general population.
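To make the near-duplicate concern concrete, here is a minimal sketch of one common approach: compare letters by their overlapping word sequences (shingles) and group those whose Jaccard similarity clears a threshold. This is only an illustration, not something discussed at the meeting; the function names, the shingle size of 3, and the 0.5 threshold are all assumptions chosen for the example.

```python
import re


def shingles(text, k=3):
    """Return the set of k-word shingles in a text (lowercased, punctuation dropped)."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if len(words) < k:
        # Very short texts become a single shingle (or none if empty).
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets: size of intersection over size of union."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def group_near_duplicates(letters, threshold=0.5, k=3):
    """Greedily group letters: each joins the first group whose
    representative (its first member) is at least `threshold` similar."""
    groups = []  # list of (representative_shingles, member_indices)
    for i, text in enumerate(letters):
        s = shingles(text, k)
        for rep, members in groups:
            if jaccard(s, rep) >= threshold:
                members.append(i)
                break
        else:
            groups.append((s, [i]))
    return [members for _, members in groups]
```

A form letter with a personal sign-off appended would land in the same group as the original, so a staffer could read one representative and see how many people sent it. The greedy grouping is quadratic in the worst case; at real mail volumes one would likely reach for something like MinHash instead.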

I was happy to see some other folks agreeing that any relevance measure would be suspect, given the adversarial nature of the political process. What may be good enough for casual web search is surely inadequate when policy decisions are at stake. I see the implication as a need for transparent information seeking support systems that offer users control and guidance. Moreover, what is good for policy-driven information seeking seems broadly applicable to information seeking in general.

To be clear, any improvements to our current process of communicating between government and citizenry would be welcome. But we should not cut corners in our aspirations.

Categories
Uncategorized

Classification: Not Just Dewey Decimal

Lynda Moulton at Gilbane wrote a nice post reminding us that there is more to classification than the Dewey Decimal System. She talks about using subject headings as facets, an idea you can see in action at Endeca-powered library catalogs like the Triangle Research Libraries Network and the State University Libraries of Florida.

Categories
Uncategorized

Busy Day, No Time to Post

After keeping up an absurdly high posting rate over the past few days, I feel embarrassed to go almost 24 hours without a peep. But, as someone once said, you can either live life or record it, and today I didn’t have any time to record.

Instead, I participated in a discussion with a technology advisor to the Obama campaign, heard presentations from Google’s enterprise search team, and served on a panel of IBM alumni to help IBM elaborate its alumni community strategy. All good stuff, and some of it even relevant to this blog. I’ll write more when I have a chance to gather my thoughts.

Categories
Uncategorized

Oodles of Design

As I was testing out various blog search options today, I discovered a cool blog called Oodles of Design. I’m embarrassed not to have discovered it sooner, given its focus on faceted search and interaction design. Check it out; I’ve added it to my Blogroll.

Categories
General

Google Blog Search: Not Different Enough

A couple of weeks ago, I commented on a recent position paper by Marti Hearst, Matt Hurst, and Sue Dumais and asked whether blog search was fundamentally different from other information seeking tasks on the web.

Well, I read today on ReadWriteWeb that Google launched a new home page for blog search. Of course, I tried it immediately. Indeed, the home page was enticing, a portal style reminiscent of their news home page. But then I quickly realized that all they’d really done was copy their news design and apply it to blogs. Once you search, you’re back to what is essentially a ranked list of results.

It’s a clean, well-engineered implementation, but I was hoping for something different. I know that Google isn’t big on innovative search interfaces, but I had somehow imagined that they’d recognize that blog search really calls for innovation. So do news search and web search, but blogs, as commenters here have pointed out, make the clearest case.

Oh well, an opportunity wasted for Google, an opportunity preserved for someone else. Faceted, exploratory blog search, anyone?

Categories
Uncategorized

Build Your Own “In Quotes”

Last week I posted about Google’s new “In Quotes” beta. Today Bob Carpenter posted about how to extract quotes using LingPipe. Worth a read.

Categories
Uncategorized

Relevance and Blog Traffic

My colleague Oscar Berg at The Content Economy posted about how the name of his blog is drawing a surprising amount of traffic from people looking for content about the economy:

The last weeks the most commonly used search keywords are:

“what’s the worst that could happen with the economy” and “how does the economy crash effect people as individuals” (in slightly different variations).

The search statistics also show that a lot can be done to improve relevancy in search.

Indeed. We really need to do better when it comes to blog search–and, by extension, search for semistructured content on the web.

Categories
Uncategorized

Avrim Blum Google-Hacked?

Maybe it’s just an accident, but I was surprised to see that the top search result on Google for Avrim Blum is not Avrim’s page, but rather John Langford’s:

No idea whether this is a prank or just an accident.

Categories
Uncategorized

Workshop on Empirical Hypothesis Spaces

Thanks to Kristiaan Pelckmans for posting about the upcoming Workshop on Empirical Hypothesis Spaces at NIPS 2008:

This workshop asks for insights how far we may/can push the theoretical boundary of using data in the design of learning machines. Can we express our classification rule in terms of the sample, or do we have to stick to a core assumption of classical statistical learning theory, namely that the hypothesis space is to be defined independent from the sample?

The workshop chairs are an impressive crew: Maria-Florina Balcan, Shai Ben-David, Avrim Blum, Kristiaan Pelckmans, and John Shawe-Taylor.