Categories
Uncategorized

Wolfman vs. Googzilla

What’s not to love about a good fight? Check out David Talbot’s “Wolfram Alpha and Google Face Off” in Technology Review. I don’t come away with a sense that I’ll regularly use either Wolfram Alpha or Google Public Data, but it’s nice to start seeing people take them off-road and compare the results. The Wolfram Alpha launch is supposed to be this month, so presumably we’ll all be able to do that in a matter of weeks, if not days. Google Public Data is available already, integrated into web search.

Sadly, neither of these guys seems interested in providing a non-NLP interface. In my view, that would be far more useful. But I suppose it’s not what sells papers.

Categories
General

YouTube vs. Unauthorized Advertising

I was intrigued to see a flurry of posts today about how YouTube is cracking down on unauthorized advertising. Naturally, since YouTube’s raison d’être is to make videos available for free, they’d like a cut of any advertising revenue associated with the content they serve–particularly since they’re bleeding money to pay for bandwidth. But some uploaders work around Google’s revenue model for YouTube by embedding ads in the videos–in violation of YouTube’s terms of service.

Am I the only person who finds this situation comical, or at least a bit ironic? There’s been a lot of discussion about how newspapers–and publishers in general–are losing revenue because Google is monetizing their audiences through its own ads. But now the tables are turned, and it’s content publishers (though probably not the mainstream media) who are obtaining ad revenue at Google’s expense.

Google is certainly acting within its legal rights; the terms of service make it quite clear that YouTube prohibits unauthorized commercial use, with the noted exception of “uploading an original video to YouTube, or maintaining an original channel on YouTube, to promote your business or artistic enterprise”. In other words, you can promote yourself, but you can’t sell ads. Still, as Lawrence Lessig says, “code is law”, and Google will have an uphill battle to prevent unauthorized advertising without a lot of collateral damage.

I do hope that Google is experiencing a moment of empathy. As Google defends against this threat to YouTube’s business model, perhaps it will better understand what the newspaper industry is going through.

Categories
General

A Topology of Search Concepts

Vegard Sandvold has an interesting post entitled “Help Me Design a Topology of Search Concepts” in which he visualizes assorted search approaches in a two-dimensional space, the two dimensions being the degree of information accessibility and whether the approach is algorithm-powered or user-powered.

His four quadrants:

  • Low information accessibility + algorithm-powered = simple search (e.g., keyword search)
  • Low information accessibility + user-powered = superficial search (e.g., collaborative filtering)
  • High information accessibility + algorithm-powered = ingenious search (e.g., question answering)
  • High information accessibility + user-powered = diligent search (e.g., faceted search)
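
For the programmers among us, the four quadrants above amount to a lookup on two binary axes. Here is a minimal sketch in Python; the enum and function names are my own invention for illustration, not anything from Vegard’s post, and treating each axis as strictly binary is a simplifying assumption.

```python
from enum import Enum


class Accessibility(Enum):
    """Degree of information accessibility (hypothetical binary axis)."""
    LOW = "low"
    HIGH = "high"


class PoweredBy(Enum):
    """Who does the work: the algorithm or the user (hypothetical binary axis)."""
    ALGORITHM = "algorithm"
    USER = "user"


# Quadrant name and example approach, exactly as listed in the four bullets above.
QUADRANTS = {
    (Accessibility.LOW, PoweredBy.ALGORITHM): ("simple search", "keyword search"),
    (Accessibility.LOW, PoweredBy.USER): ("superficial search", "collaborative filtering"),
    (Accessibility.HIGH, PoweredBy.ALGORITHM): ("ingenious search", "question answering"),
    (Accessibility.HIGH, PoweredBy.USER): ("diligent search", "faceted search"),
}


def classify(accessibility: Accessibility, powered_by: PoweredBy) -> tuple:
    """Return (quadrant name, example approach) for a point on the two axes."""
    return QUADRANTS[(accessibility, powered_by)]
```

Calling `classify(Accessibility.HIGH, PoweredBy.USER)` looks up the “diligent search” quadrant. Real approaches presumably fall somewhere along each axis rather than at the corners, so take the binary split as a convenience.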

I’m not sure how I feel about the quadrant names (though I like how my employer and I are champions of diligence!), but I do like this attempt to lay out different approaches to supporting information seeking, and I like his choice of axes.

More importantly, I hope this analysis helps advance our ability as technologists to match solutions to information seeking problems. Many of us have an intuitive sense of how to do so, but I rarely see principled arguments–particularly from vendors who may be reluctant to forgo any use case that could translate into revenue.

Of course, it would be nice to quantify these axes, or at least to formalize them a bit more rigorously. For example, how do we measure the amount of user input into the process–particularly for applications that may involve human input at both indexing and query time? Or how do we measure information accessibility in a corpus that might include junk (e.g., spam)?

Still, this is a nice start as a framework, and I’d be delighted to see it evolve into a tool that helps people make technology decisions.

Categories
General

SIGIR ’09 Registration Details

You can now register for SIGIR 2009! Here are the details from the registration page:

Registration fees for ACM members are as follows:

  • $695 for the main three-day conference, including the conference banquet; $395 for students
  • $175 for half-day tutorials
  • $295 for full-day tutorials or two half-day tutorials
  • $150 for workshops
  • $250 for Wednesday’s Industry Track
  • Attendees who are not members of the ACM are charged higher registration fees.
  • Students have special discount conference/tutorial/workshop package options.
  • These early registration rates will end at midnight on May 24, at which point rates will rise, generally by $50. Normal registration rates will end at midnight on July 12, at which point rates will rise again, generally by $50.

I know that some have complained about the increase in price relative to previous years. Unfortunately, I suspect that one of the consequences of our current economic climate is that it’s much harder to subsidize conferences through sponsorship.

What I can say, however, is that the $250 for the one-day Industry Track is a steal–compare that to the fees for other industry conferences, and look at the speaker line-up! I strongly recommend it for practitioners. Of course, if you can afford to attend the whole conference, even better. This year’s papers and posters particularly emphasize work from industry, and I’m already excited about learning what everyone has been up to!

Categories
General

Conferences, Conferences, Conferences!

Apologies for the lull in blogging this week, but it’s been a busy week in what looks to be a busy spring (and summer!) of conferences related to information access.

This week, I was in Boston, presenting at the Infonortics Search Engine Meeting and the International Association of Scientific, Technical & Medical Publishers Spring Conference.

The Search Engine Meeting was fun, if a bit cozier than in previous years (the recession is definitely taking a toll on travel budgets). The keynote by David Evans, entitled “E-Discovery: A Signature Challenge for Search”, made a phenomenal case for weaning researchers from web search as the canonical domain for information retrieval, and instead setting our sights on more valuable problems that emphasize recall, require human-in-the-loop processing, and lack training data or established evaluation metrics. He didn’t call it HCIR, but he was certainly preaching it! You can find copies of most of the presentations here.

The STM conference was a unique experience for me, starting from the keynote by a lobbyist for stronger copyright law. Indeed, the first day of the conference was largely concerned with addressing two threats to STM publishers’ current business models: copyright infringement and open access. Not everyone at the conference saw open access as a threat, and that made for a healthy debate. The second day focused on the present and future of semantic technologies–somewhat more familiar territory for me. I particularly liked a presentation by Priya Parvatikar that explained the semantic web in clear, hype-free terms. In fact, I’m looking forward to re-using it when she or the conference organizers post it!

Meanwhile, there are more conferences coming up! The Enterprise Search Summit takes place May 12-13 at the Hilton New York. I’ll be presenting on a panel about “Emergent Social Search Experiences”. The conference isn’t cheap, but they are offering a great recession-busting special: a free “VIP Pass” that includes admission to the keynotes and showcase. I hope that means I’ll see more of you at the summit in a couple of weeks!

Still to come in June and July: the 5th Annual Text Analytics Summit, Endeca Discover ’09, SIGMOD 2009, and of course SIGIR ’09.