Month: August 2009

Reminder: HCIR 2009 Submission Deadline is August 24th!

By Daniel Tunkelang
August 3, 2009

Just a quick reminder that the submission deadline for HCIR 2009, the Third Annual Workshop on Human-Computer Interaction and Information Retrieval, is August 24th, which is just 3 weeks away! Please spread the word; I know people can be forgetful during the summer months. The workshop itself will be held on October 23rd, at Catholic University in Washington, DC.

General

SIGIR 2009: Day 3, Industry Track: Vendor Panel

By Daniel Tunkelang
August 3, 2009

The last session of the SIGIR 2009 Industry Track was the enterprise search vendor panel. Originally, I’d hoped to have CTOs (or the equivalent) from Autonomy, Endeca, and FAST–specifically, Peter Menell (CTO of Autonomy), Adam Ferrari (CTO of Endeca), and Bjørn Olstad (formerly CTO of FAST, now a Microsoft Distinguished Engineer).

Since it would have been inappropriate for me to moderate a panel that included my own manager and representatives of two of Endeca’s competitors, I recruited Liz Liddy, who is not only the chair of SIGIR but also, I felt, uniquely qualified to understand both the research and business sides of this field. As if that weren’t enough, James Allan managed to procure Bruce Croft as the academic responder; his many achievements include both a Salton Award and a Research in Information Science Award, the highest honors in information retrieval and information science. And I recruited our very own commenter-in-chief Jeremy Pickens to serve as a time-keeper. I wish I’d also enlisted him for the panel I moderated!

That, at least, was the plan. Plans, of course, are subject to change. Only a couple of days after I reached out to Bjørn, I saw that he had been promoted to replace former FAST CEO John Markus Lervik. He suggested Øystein Torbjørnsen, chief architect of FAST’s core search. Fine by me. Two weeks before the conference, however, Bjørn wrote to inform me that Øystein had to back out for personal reasons. But he offered senior product manager Jeff Fried (the acquisition shaved a notch off his former title of VP of Product Management) as a substitute. All good.

Fortunately, I knew that I could trust my manager to live up to his commitments. I did have to fill out his online registration form on site at 7:30am, but he more than made up for it by buying me drinks after the conference. Two down, one to go.

Then there was Autonomy. Strangely, despite having worked in this space for nearly a decade, I’d never actually met anyone from Autonomy. That meant I’d have to cold-call someone to have any chance of getting them to participate in the panel. Who to call? I decided that, since industry conference participation was probably under the umbrella of corporate marketing, I’d try their head of PR, one of the few Autonomy employees whose contact information is published on the open web.

Success! Or so I thought at the time. She told me that Peter Menell would participate in the panel, and, with my roster complete, I proceeded to start publicizing the Industry Track. However, a week and a half before the conference, she called to let me know that Peter couldn’t attend, and that in fact no one from Autonomy was available to take his place.

I was on vacation at the time, but of course conference organizers don’t get to take vacation, at least not when panelists cancel at the last minute. I started mulling over my options.

Including an empty chair on the panel would have given me the satisfaction of exposing Autonomy’s snub (both to me personally and to the information retrieval community), but I realized that wouldn’t be at all fair to the attendees.

Instead, I started going through the short list of possible panelists, favoring those who lived in the Boston area. And then, by fortuitous coincidence, I received an email from Raul Valdes-Perez, Executive Chairman and co-founder of Vivisimo: “If by any chance you need any last-minute help or stand-in, let me know.” I made a mental note to investigate Raul as a possible psychic and then happily welcomed him on board. And breathed a sigh of relief.

For all of the upheaval in bringing the panel together, the actual session went like clockwork. The panelists were respectful, disciplined, and yet not afraid to take risks in their stances. They talked about the challenges of holistically evaluating search applications, the interplay between relevance ranking and faceted search, the pros and cons of federated search, and much more. HCIR was a major theme, though perhaps that’s not surprising given the participants. As usual, Mary McKenna took more detailed notes.

I take pride both in the overall quality of the panel and specifically in the performance of my manager, whose background is more in systems and databases than in information retrieval. Of course, I’m biased, and he does sign my expense forms. 🙂

Regardless, I think I can muster enough objectivity to say that the panel was a huge success. Bruce Croft’s response, which also ended the Industry Track, was a fitting valedictory address: he urged us not to just walk away from the conversation. I am proud of the success of the Industry Track as an event, but I hope it is only the beginning of a deeper mingling of researchers and practitioners.

To James Allan and Jay Aslam, the SIGIR 2009 co-chairs, I thank you for the privilege and opportunity to organize the Industry Track. And, to whoever takes on such a responsibility at future SIGIR conferences, I hope you can benefit from my experience without having to relive it!

General

SIGIR 2009: Day 3, Industry Track: Analyst Panel

By Daniel Tunkelang
August 3, 2009

The morning sessions of the SIGIR 2009 Industry Track consisted of five individual presentations; the afternoon consisted of two panels. The requirement to synchronize with the research talks led to the allocation of 90 minutes for each panel–which was a bit more than I’d originally planned on (and this change, like many, occurred in the two weeks before the conference). James Allan, one of the SIGIR 2009 co-chairs, suggested that we add an academic responder to each of the panels to account for the additional time, and we went with that approach.

The first of the two afternoon panels consisted of industry analysts: Whit Andrews (Gartner), Sue Feldman (IDC), and Theresa Regli (CMS Watch). I moderated the panel–or, more accurately, attempted to moderate it. Marti Hearst served as the academic responder.

The panel opened with each of the three panelists making an opening statement, sharing their perspectives about the key business concerns and trends in the search industry. I asked them to talk about enterprise search in the broadest sense of the term–search applications that companies buy or build–rather than in the narrow sense of no-frills intranet search.

It became immediately clear from their opening statements that the panelists had wildly different perspectives and styles. While I think the term “food fight” that I heard bandied around afterward is a bit of an exaggeration, they certainly engaged in a heated debate.

One topic that attracted particular controversy was how enterprise search applications should assign relevance to search results. Whit suggested that, in an enterprise setting, the main objective function is the profitability of the enterprise, and that relevance should essentially be money driven. Sue and Theresa disagreed sharply, mainly arguing that relevance should be user-controlled.

I’m probably oversimplifying their arguments, and in any case shouldn’t take sides in a debate among analysts! Still, Whit probably won’t be surprised that my sympathies generally lie with the users. That said, Whit is right that enterprise search companies sell to enterprises, not directly to customers, and those enterprises (like web search companies that sell to advertisers) may have interests that aren’t always aligned with those of users. At Endeca, we advise our customers on how to configure and communicate a relevance ranking strategy, but ultimately our customers make their own decisions. After all, it’s their site and their money.

And that leads to the other topic that caught my attention and came up during Marti’s responder session: the question of how analyst firms make money. All three of the panelists were open about how their employers make money, whether from enterprise buyers, vendors, or some combination thereof. My personal preference would be that analysts make money primarily from enterprise buyers–but of course I work for a frugal vendor. I’ve heard from a variety of sources that Endeca “doesn’t spend enough” on analyst services–or on corporate marketing in general. Since neither vendors nor analyst firms open up their books, I can only speculate. Fortunately, it’s clear the analysts on the panel not only have integrity, but also have strong enough views that they probably couldn’t be swayed by money or pressure.

As organizer of the track, I intended for the panel to offer an audience of mostly academic types a chance to see people whose opinions influence tens (if not hundreds) of millions of dollars in purchasing decisions. I hope I accomplished that. I’m especially grateful to these highly billable analysts for freely sharing their time and ideas. Neither SIGIR nor I could possibly have afforded their market rates!

For other perspectives, take a look at Theresa’s blog post, “Know Your Relevance”, or Mary McKenna’s summary post.

General

SIGIR 2009: Day 3, Industry Track: Nick Craswell

By Daniel Tunkelang
August 2, 2009

One of the things I didn’t consider when I signed on to organize the SIGIR 2009 Industry Track was that I’d have to replace speakers and panelists on less than two weeks’ notice. But what I couldn’t even have imagined was replacing a speaker on less than 24 hours’ notice!

Tuesday morning, the second day of the conference and the day before the Industry Track, I woke up to an email from Tip House of the OCLC, whom I’d planned to have speak about his experiences developing Worldcat.org, the world’s largest bibliographic database. Unfortunately, he had fallen ill and would not be able to make it to the conference.

I was determined not to have a hole in the program. I immediately sent an email to the Director of Search at LinkedIn, whom I had just met at the poster session the previous evening, hoping he might have a presentation tucked away about LinkedIn’s recent launch of faceted people search. I also turned to Twitter–which actually earned me a plausible suggestion.

But it was during the morning coffee break that serendipity struck. As I walked by the Bing exhibitor table, I saw Jan Pedersen, Chief Scientist of Core Search at Microsoft, chatting with Peter Bailey, an applied researcher on the Bing team. I turned to them and, in my most charming voice, asked if they might be interested in having someone on their team talk about Bing the next day. They took a few minutes to think it over, and then replied in the affirmative, producing Nick Craswell, also an applied researcher. Problem solved, and I can proudly say that I Binged for it!

Nick talked about query modeling, focusing on issues like query ambiguity, session context, and temporal query dynamics (particularly seasonality). He talked a bit about a technique that involved random walks on click logs–a technique I remember striking me when I first heard him talk about it at ECIR 2008.

The talk was a bit raw–understandably so given the short notice. But it was great to see a major web search practitioner connecting information retrieval research to actual product. Yes, there were the standard caveats about not revealing secret sauce, but the talk was open and substantive. Indeed, I hope Nick will be able to share the slides!

UPDATE: Nick emailed me the slides and gave me permission to post them here.

Uncategorized

An Apology to Vijay Gill

I don’t know if Google’s Vijay Gill reads this blog. But a post of his just caught my attention, and I feel I owe him an apology.

A little over a month ago, I wrote a post entitled “Even Google Should Beware Of Hubris”. I stand by much of that post. But I specifically said:

And, just a few days ago, Google’s senior manager of engineering and architecture punctuated a panel discussion at the Structure 09 conference–where he was sharing a stage with a counterpart from Microsoft–with the punchline “If you Bing for it, you can find it.”

Apparently I shouldn’t believe everything I read in The Register. Vijay Gill, the manager quoted above, wrote a post on his blog that appeared shortly after The Register article (and after my post), entitled “Google Does Not Mock Bing”. Here’s the most relevant paragraph:

I wasn’t mocking Bing when I said “Bing for it, you can find it.” I meant that seriously, in the spirit of giving props to a competitor, and a good one at that. Najam and I have been friends since before Google had a business plan, and I have the greatest respect for him and for Microsoft as a company. The Microsoft approach has some good points, which work for their business plan. I was speaking of one particular approach, among several others, which can solve the same problem. There was no undercutting anything, there are two approaches and thats that.

Vijay, if you’re reading this, I’m sorry for taking so long to notice your clarifying post. I hope most of your fellow Googlers are as respectful of your competitors.

General

SIGIR 2009: Day 3, Industry Track: Evan Sandhaus

By Daniel Tunkelang
August 2, 2009

Back to our regularly scheduled blogging about the SIGIR 2009 Industry Track. For those who haven’t been reading along, we covered the first three talks:

Matt Cutts (Google): Web Spam and Adversarial IR: The Road Ahead
danah boyd (Microsoft Research): The Searchable Nature of Acts in Networked Publics
Vanja Josifovski (Yahoo! Research): Ad Retrieval – A New Frontier of Information Retrieval

As you can see, that covered the three major web search engine companies, at least at the time of the conference. Sure, danah’s talk wasn’t exactly what people might have expected, but I had some creative license as an organizer, and the audience loved her talk. Besides, as we’ll get to in the next post, Microsoft had other opportunities to present representatives of its more conventional information retrieval divisions.

The next speaker, according to the original plan, was to be Tom Tague, who leads the Open Calais project at Thomson Reuters. Unfortunately, a week before SIGIR, I found out that he would be unable to make it. One of his colleagues offered to present in his stead less than 24 hours after his cancellation, but by then I’d already found a replacement on my own: Evan Sandhaus, Semantic Technologist in the New York Times Research and Development Labs, agreed to talk about “Corpus Linguistics and Semantic Technology at the New York Times”.

You can get a good idea of his talk from these slides–or, better yet, from this video. Both are from the closing keynote that he and his colleague Rob Larson delivered at the 2009 Semantic Technology Conference.

I won’t try to recapture Evan’s fascinating narrative about the history of information storage and retrieval at the New York Times. Rather, I’ll skip to the parts that should matter most to information retrieval researchers and practitioners: the availability of the New York Times Annotated Corpus through the Linguistic Data Consortium (LDC), and the New York Times’s intention to contribute to the Linked Data Cloud.

For me personally, the annotated corpus is the bigger deal. It represents 1.8 million articles written over 20 years. It is annotated both with manually-supplied summaries and tags–the latter drawn from a controlled vocabulary of people, organizations, locations and topic descriptors–and with algorithmically-supplied tags that are manually verified. My colleagues and I at Endeca have been working with the annotated corpus, and it is a delight. I hope that the information retrieval community will make heavy use of this wonderful new resource.

General

Are Academic Conferences Broken? Can We Fix Them?

By Daniel Tunkelang
August 2, 2009

I’d hoped to get through all of the SIGIR 2009 Industry Track before blogging about anything else (such as Yahoo! search going bada-Bing), but clearly I’m taking too long. So I’m following Daniel Lemire’s suggestion that I post a recent comment on Lance Fortnow’s blog (actually a response to his CACM column entitled “Time for Computer Science to Grow Up”) here at The Noisy Channel.

It’s nice to see this piece joining a growing chorus questioning the way we conflate the distinct concerns of disseminating knowledge, establishing professional reputation, and building community. This problem is not unique to computer science, but we are certainly in a position to lead by example in addressing it.

In an age where distribution is nearly free, I agree that we should move the filtering role from content publishers to content consumers. There’s no economic reason today why scholarship (or purported scholarship) shouldn’t be published online. Of course, the ability to publish digital content for free (or close to free) does not imply anyone will (or should) read what you write. The blogosphere offers an instructive example: the overwhelming majority of blogs attract few (if any) readers. I suspect that the same holds true for arXiv.org. Of course, peer-reviewed content may not fare that much better, particularly given the proliferation of peer-reviewed venues. Regardless, it makes no sense for publishers to act as filters in an age of nearly-free digital distribution.

That brings us to the question of how researchers should establish their professional reputation–and, in the case of academics, obtain tenure and promotion. Today, they have to publish in peer-reviewed journals and conferences. Even if we accept the weaknesses of the current peer-review regime, we should be able to separate content assessment from distribution. The peer-review process (and review processes in general) should serve to endorse content–and ideally even to improve it–rather than to filter it.

Finally, conferences should primarily serve to build community. I find the main value of conferences and workshops to be face-to-face interaction, and I’ve heard many people express similar sentiments. Part of the problem is that so few presenters at conferences invest in (or have the skills for) delivering strong presentations. But more fundamentally it’s not even clear that the presentations are the point of a conference–after all, an author’s main motive for submitting an article to a conference seems to be getting it into the proceedings.

Here are some questions I’d like to suggest we consider as a community:

What if presentation at a conference were optional, and an author’s decision to present had no effect on inclusion in the proceedings? Would there be significantly fewer presentations? Would those fewer presentations be of higher quality?

What if the process of peer-reviewing conference submissions required the submission of presentation materials rather than (or in addition to) a paper? Would the accepted presentations be of higher quality? Would researchers invest more in presentation skills? What would happen to strong researchers without such skills?

Can we update the traditional conference format to foster more productive interaction among researchers? For example, should we have more poster sessions and fewer paper presentations?

I’d love to see the computer science community take the lead in evolving what increasingly feel like dated procedures for disseminating knowledge, establishing professional reputation, and building community. I’ve tried to do my small part, co-organizing workshops on Human-Computer Interaction and Information Retrieval (HCIR) that emphasize face-to-face interaction and organizing the SIGIR 2009 Industry Track as a series of invited talks and panels from strong presenters. But I’m encouraged to see “establishment” types like Moshe Vardi and Lance Fortnow leading the charge to question the status quo.