Day 2 of the NSF Sponsored Symposium on Semantic Knowledge Discovery, Organization and Use at NYU brought out representatives of the titans of web search:
- Yahoo: Patrick Pantel (who actually just joined Yahoo) regaled us with entertaining tales “Of Search and Semantics”, whisking us through the history of search, arguing that semantics are making a commercial impact today, and then describing some current research at Yahoo. He elaborated a fair amount on SeeLEx, which stands for seed list expansion and is, in his own words, Yahoo’s version of Google Sets. But, unlike Google Sets, SeeLEx offers at least some transparency into the basis for similarity. I couldn’t find anything published other than the slides in my notebook from the symposium, but this is interesting work.
- Google: Marius Pasca delivered an excellent talk, though its title, “Web Search as an Online Word Game for Knowledge Discovery”, was a bit misleading. Unfortunately, neither of the Google speakers provided slides for the symposium notebook–I hope that isn’t a mandate from Google’s corporate policy. In any case, the talk was along the lines of his AAAI 2008 paper, “Turning Web Text and Search Queries into Factual Knowledge: Hierarchical Class Attribute Extraction”. It presented an intriguing approach for inferring class attributes based on a distributional analysis of query space–an approach that reminded me of his earlier CIKM paper on “Acquisition of categorized named entities for web search”. Unfortunately, neither of these papers is available without the appropriate memberships, but perhaps Marius will send them to you if you ask nicely.
- Microsoft: Bill Dolan, Principal Researcher and manager of MSR’s Natural Language Processing Group, asked “Where does NLP stop and AI Begin?” He took us through the history of the MindNet project, and ruefully explained that it “worked beautifully” but “just not very often”. He made a compelling argument that semantic knowledge discovery researchers need to take a step back from AI-hard problems and focus on problems like paraphrase that are more amenable to the kind of progress we’ve seen in areas like machine translation.
Their presentations were followed by a three-hour (!) poster / demo session. As one of the demo presenters, I had 90 seconds in the poster boaster session to pitch my demo, and then attendees could wander around the posters and demos during the two-hour lunch break. I had some great conversations with the folks who did swing by, but I’m not sure this was the ideal format for conducting such a session.
The session after the demos was a bit of a blur for me, but the final discussion session was very engaging. One of the hot topics in the semantic knowledge community, much as it is in the information retrieval community, is the need for query logs–and, more generally, for good data–to conduct research. Having representatives of the three major web search engines there certainly made the conversation more interesting.
All in all, an excellent symposium, and I’m very grateful to Satoshi Sekine for organizing it.