Author: Daniel Tunkelang

High-Class Consultant.

Microsoft Delivers a Bundle of Joy

Post author By Daniel Tunkelang
Post date February 10, 2009

The news, which has been an open secret among enterprise search insiders for a while, is finally official: Microsoft announced at FASTForward ’09 that they will be bundling the FAST enterprise search product with their SharePoint product. To be more precise, it will be available as a $25K/server add-on for SharePoint customers. Considering the total cost of a typical SharePoint deployment, that’s practically free. Or, as Mark Bennett put it, an “aggressive price”.

Who are the winners and losers here? Unclear. Is this initiative aimed at Autonomy, Endeca, or Google? At all of the above? What are the implications for FAST’s legacy ESP 5.x customers, who have been promised 10 years of support?

My take: Microsoft sees SharePoint + FAST as a response to Autonomy + Interwoven, or vice versa–Microsoft’s acquisition predates Autonomy’s by a year. Both companies seem to believe that, in order to sell a search engine, they need to bundle it into a content management system.

Not everyone believes that. Google and Endeca are focusing on search as the main prize, not to mention a slew of smaller vendors. We may all disagree on the right way to crack the search nut, but we haven’t given up on building a better nutcracker.

Did Microsoft pay $1.3B just to upgrade SharePoint Search? I’m only a researcher, so perhaps I don’t appreciate the economics of the transaction. I’m just glad not to be the guy who will ever have to justify that decision to my manager, let alone a board of directors.

Uncategorized

Writing a Book on Faceted Search

Post author By Daniel Tunkelang
Post date February 10, 2009

I’ve been slowly telling friends and family about my upcoming project, but now that it’s published online, I thought I’d share the news more publicly: I’m writing a book about faceted search. It will be part of the Morgan & Claypool series of Synthesis Lectures on Information Concepts, Retrieval, and Services, edited by none other than Gary Marchionini.

I discovered this series a few months ago, when Gary recruited me to review another book in the series, a lecture on exploratory search by Ryen White and Resa Roth. Perhaps my most significant contribution was to suggest the subtitle they ultimately used.

In any case, I was impressed with the book and more generally with the series, so I humbly asked Gary what might be involved in contributing to it. The next thing I knew, I was signing a contract to write a book on faceted search!

I’m very excited about this book, even though I’m aware that it will take a toll on the rest of my personal and professional life as I work on it. I’m grateful to Gary and to the folks at Morgan & Claypool for giving me the chance to write it. I do ask your indulgence as readers if I slow down a bit as I divert some of my cycles to writing rather than blogging. I’ll try to make up for it by sharing some of my observations about the differences between blogging and old-fashioned writing.

General

So you think you can run a search conference…

Post author By Daniel Tunkelang
Post date February 9, 2009

Today is the first day of FASTForward ’09, the annual user conference hosted by FAST (the enterprise search company acquired by Microsoft last year). I thought it would be a good day to reflect on the variety of search-related conferences and user groups that are competing for our attention and wallets these days.

First, there are the vendor user conferences. These include FASTForward, Endeca Discover, and a slew of events hosted by smaller vendors. These are great events for those vendors’ current customers, and occasionally are used as sales tools to persuade select prospective customers. At their best, they emphasize knowledge sharing among customers, as well as substantive presentations about the vendors’ products and services. At their worst, they offer a mixture of propaganda and entertaining (but not necessarily relevant) guest lectures. And they aren’t cheap: it’s $795 to attend Discover and $1,695 for FASTForward–plus travel and lodging! Still, if you learn something that saves you a few days of consulting services, you’ll get your money’s worth.

Then there are the vendor-independent industry conferences. A few that are coming up:

The good news about these is that, because they are vendor-independent conferences, you’re likely to hear a variety of perspectives. If they’ve selected their speakers well, you’ll learn about the different technologies and even the philosophies underlying those technologies. If not, then be braced for a bunch of warmed-over sales pitches. Because that’s the bad news: it’s hard to run a for-profit search conference and actually make a profit–and no, these usually aren’t cheap either. The main sponsors are usually vendors and consultancies, which is both expected and appropriate. The sponsorship model only becomes a problem when the speakers earn their slots through sponsorship rather than through the merits of their content.

And then there are the academic conferences, particularly CIKM and SIGIR. The good news with these is that the content is top-notch: peer-reviewed presentations from top researchers around the world, hailing from both academia and industry–the latter usually representing the major industry research labs. They are also relatively inexpensive, since they are run by non-profit organizations and rely heavily on volunteers. The bad news is that research isn’t always immediately relevant to practice. Indeed, some of the presentations will make your head spin twice: first, as you focus to understand them; second, as you struggle to figure out how to apply their results or key insights to your real-world problems.

In my opinion, all of these leave a gap–a need for conferences that bring the rigor and seriousness of academia to bear on content that is relevant to industry practitioners. I’m hoping that the SIGIR ’09 Industry Track helps fill this gap. But I further hope that everyone in the conference-organizing business, regardless of their business model, shares the aspiration to deliver quality content that has real impact on the practical world of search.

Uncategorized

Note to Bloggers: Don’t Quit Your Day Job

Post author By Daniel Tunkelang
Post date February 8, 2009

Dan Lyons, better known to most as Fake Steve Jobs, wrote an article in Newsweek today entitled “Time to Hang Up the Pajamas”, or “Growing Rich by Blogging Is a High-Tech Fairy Tale”.

An excerpt:

My first epiphany occurred in August 2007, when The New York Times ran a story revealing my identity, which until then I’d kept secret. On that day more than 500,000 people hit my site—by far the biggest day I’d ever had—and through Google’s AdSense program I earned about a hundred bucks. Over the course of that entire month, in which my site was visited by 1.5 million people, I earned a whopping total of $1,039.81. Soon after this I struck an advertising deal that paid better wages. But I never made enough to quit my day job.

Read the whole post, especially if you’ve ever entertained fantasies of blogging to generate a primary income. I’m not saying you can’t use a blog to promote yourself and cultivate a reputation that you can monetize. But I think it’s unlikely that you’ll make more money from Google AdSense than Fake Steve Jobs.
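
To put the excerpt’s numbers in perspective, here is a back-of-the-envelope sketch in Python. The figures come from the quote above; the helper name `revenue_per_thousand` is my own invention, and I treat “about a hundred bucks” as exactly $100, which is an assumption.

```python
def revenue_per_thousand(dollars: float, visitors: int) -> float:
    """Return dollars earned per 1,000 visitors."""
    return dollars / visitors * 1000


# Figures from the Newsweek excerpt.
# Assumption: "about a hundred bucks" is rounded to exactly $100.
best_day = revenue_per_thousand(100.0, 500_000)
whole_month = revenue_per_thousand(1039.81, 1_500_000)

print(f"Best day:    ${best_day:.2f} per 1,000 visitors")
print(f"Whole month: ${whole_month:.2f} per 1,000 visitors")
```

Either way, that works out to well under a dollar per thousand visitors, which is why even a million-visitor month doesn’t replace a salary.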

Uncategorized

WikiDashboard: Visualizing Wikipedia Edits

Post author By Daniel Tunkelang
Post date February 8, 2009

Ed Chi, a senior research scientist at the Palo Alto Research Center (PARC), recently delivered a presentation at MIT about WikiDashboard, a tool that he and PARC colleague Bongwon Suh developed in order to visualize the dynamic nature of Wikipedia’s collaborative editing process. Erica Naone, a regular here at The Noisy Channel, wrote a nice article about it in Technology Review, entitled “Who’s Messing with Wikipedia?”.

I like Ed Chi’s work, and we talked about the WikiDashboard project when I visited him at PARC just over a year ago. But, as I was quoted in the article, I do wonder what problem this visualization aims to solve. A picture, it is said, is worth a thousand words, but this feels too much like looking at a thousand words. I hope that Ed and the team at PARC invest in distilling a more consumable signal out of this wealth of data that can be applied to solve real problems.

I also hope that, as Rob Miller points out in the article, the collection and publication of such measurements does not simply encourage people to game them.

Uncategorized

Comments I Read: Jeremy Pickens

Post author By Daniel Tunkelang
Post date February 7, 2009

Jeremy Pickens doesn’t have a blog–as far as the blogosphere goes, he is homeless. Or rather, he likes to hang out at my house–which is great, because he’s the kind of guest who brings over good wine and then helps you with the cooking. He is by far the most active contributor to the comment threads here at The Noisy Channel. If you are reading this blog through an RSS reader and skipping the comments, here is a taste of what you’ve been missing:

I met Jeremy a few years ago–at RIAO 2007 in Pittsburgh if I recall correctly. He co-authored a paper on “Collaborative Exploratory Search” presented at the inaugural HCIR workshop that same year.

As his home page at FXPAL tells us:

Jeremy’s major research themes, since joining FXPAL in 2005, include Music Information Retrieval, Video Information Retrieval and Collaborative Exploratory Search (Collaborative Information Seeking). He earned his Ph.D. from the University of Massachusetts Amherst at the Center for Intelligent Information Retrieval (CIIR). Jeremy did his post-doctoral work at King’s College in London from 2004-2005.

Jeremy is too modest to claim credit for his outsize contributions to this blog, so I thought I’d break convention and allow his collective comments to qualify as a “blog I read”. They are certainly worth reading, and I hope he keeps contributing once he does have a blog of his own–which is inevitable.

General

What Would Google Do? / What Does Google Do?

Post author By Daniel Tunkelang
Post date February 5, 2009

This evening, I had the opportunity to hear Jeff Jarvis talk about his recently published book, “What Would Google Do?”. That opportunity was briefly in doubt: 277 people signed up for the event at the Daylife office, which had planned for a capacity of 150. Fortunately for me, my friend Ken Ellis let me in early, and I was not turned away at the door. Which is fortunate for you, since it means I have something to blog about!

Jarvis was entertaining, as expected. He is an excellent speaker, both when he’s delivering prepared material and when he’s put on the spot by aggressive audience members, who were in no short supply.

I perhaps deserve credit (responsibility?) for inciting the mob by asking the first question, suggesting that Google was the opposite of transparent (one of the most “Googly” qualities in his enumeration) and that, if we were to learn anything from Google, it was that success is best achieved through benign dictatorship. In fact, I told Jarvis that I thought he’d already seen the light on this issue.

Jarvis didn’t even flinch. First, he made clear that he was more interested in “the idea of Google” than the company itself. Second, he argued that Google has made the world transparent, even if Google isn’t always transparent itself. Finally, he suggested that being in continuous beta was a form of transparency. I didn’t have a chance to follow up after that, but others did, and I was happy to see that the crowd, on the whole, seemed unpersuaded by the culty premise of “the idea of Google”.

But what I enjoyed far more than the advertised event was the heated conversation I had following Jarvis’s presentation. Bob Wyman, an engineer at Google, offered a full-throated defense of as many of Google’s technology and business decisions as I could question. While we ultimately agreed to disagree, I credit him for making a serious case. Unfortunately, I couldn’t take notes and uphold my side of the argument at the same time, so I’ll apologize in advance for any details I’ve lost or garbled in my good-faith recollection.

Here is what I recollect from our discussion:

Our biggest point of contention was about the black-box nature of Google’s approach to relevance. Bob was quite familiar with the analogy to security through obscurity, but he objected that, before the discovery of public key cryptography, security through obscurity was the best game in town. In other words, the folks working on relevance ranking algorithms are still waiting for the equivalent of Diffie-Hellman or RSA.
He rejected attention bond mechanisms as gameable, though I don’t recall any explanation as to why. It’s possible that he simply wasn’t familiar with them, and that my explanation didn’t do justice to the concept.
He insisted that I wasn’t giving enough credit to Google for its experimentation–specifically, that I underestimated how much variation there was in result ranking based on the collection of simultaneous experiments running at any given time.
He felt I was being unreasonable to expect Google to disclose more about its retrieval approach, not only because it would help spammers, but also because it would unnecessarily give users more to think about.
Finally, he felt that almost any clever idea would break down because of the combined constraints imposed by the hordes of spammers, the scale of the data, and the challenges of freshness.

Bob defended Google well, and I can’t say that either of us “won” the fight. Indeed, so much hinges on whether you can believe, as he does, that Google is only limited by what technology makes possible and what its engineers can implement. He rejects my assertion that Google has crippled the user experience because of some philosophical predilection towards black box approaches. In fact, he maintains that Google is incredibly open for a company of its size.

Unfortunately, the truth resides in the ultimate black box: I can’t evaluate Google’s motivations from the outside. Bob invited me to work at Google to help solve the problem from the inside (I don’t think he meant that as a literal offer), but that’s neither here nor there. I think it’s only fair to judge a company from the outside. If Google wants to fix the misimpressions that I and others hold, it can certainly do so by providing more information. Absent such information from the source, I have to fall back on the public data and my powers of reasoning.

But let me say this clearly: I believe that most–perhaps all–Googlers mean well. Google’s China policy notwithstanding, I don’t think that Google is an evil company. On the whole, Google has done much more good than bad, and I believe that doing the right thing is a core company value, even if Google does not always live up to its aspirations.

My only problem is that, in the one area I’m most passionate about, it seems that Google is holding the world back. Google has become the legacy system that HCIR has to beat.

A parting joke, courtesy of Ken Ellis:

Q: What would Google do if they were a restaurant?

A: They would build a search engine and an internet ad auction system.

Uncategorized

ACM Recommendations on Open Government

Post author By Daniel Tunkelang
Post date February 5, 2009

The ACM U.S. Public Policy Committee just published its Recommendations on Open Government:

Data published by the government should be in formats and approaches that promote analysis and reuse of that data.

Data republished by the government that has been received or stored in a machine-readable format (such as online regulatory filings) should preserve the machine-readability of that data.

Information should be posted so as to also be accessible to citizens with limitations and disabilities.

Citizens should be able to download complete datasets of regulatory, legislative or other information, or appropriately chosen subsets of that information, when it is published by government.

Citizens should be able to directly access government-published datasets using standard methods such as queries via an API (Application Programming Interface).

Government bodies publishing data online should always seek to publish using data formats that do not include executable content.

Published content should be digitally signed or include attestation of publication/creation date, authenticity, and integrity.

I’ve advocated for such openness myself, and I’m delighted that the ACM, which represents the concerns of me and tens of thousands of computer science professionals, is taking a stand on this important policy issue.

General

The Banality of Crowds

The other day, my wife told me a story that struck me as a great parable about exploratory search–that is, if true stories qualify as parables.

It was lunch time and she was facing a problem familiar to many of us city dwellers (well, New Yorkers at least): she wanted to find some place interesting to eat among the overwhelming set of options. She decided on a new approach–her version of asking “What Would Google Do?”.

She saw a couple exiting an office building and decided to follow them (at a discreet distance) to their lunch spot. She ended up at…McDonald’s. And no, I didn’t make this up, much as I might have tried!

While pop philosophers and psychologists have made much of the “wisdom of crowds”, there’s a dark side: crowds rarely come up with anything interesting. Outsourcing your decisions to a crowd may yield a satisficing decision, but don’t get your hopes up for more.

Granted, there’s more than one way to crowdsource: no one says that you have to go with a majority vote, or to follow someone at random. But leveraging the wisdom of a crowd in a more meaningful way requires a means to comprehend the diversity of views among the individuals that comprise that crowd.

In other words, you need exploratory search.

Uncategorized

Matt Cutts: Google Still Has Big Ideas

Post author By Daniel Tunkelang
Post date February 3, 2009

While I strive to be fair and balanced in my coverage of companies–especially those that in any way compete with Endeca–somehow I seem to come down hard on Google.

But today I’m glad to have the opportunity to point readers to a post by Matt Cutts, the head of Google’s Webspam team (and a speaker at the SIGIR ’09 Industry Track!), defending Google against the oft-repeated charge (this time by Om Malik) that Google has run out of big ideas.

I do note that, other than the deep web research, which I covered in an earlier post, I don’t see much about how Google is innovating in search. Perhaps Google is done with search, and is focusing its innovation efforts elsewhere? While I’m personally interested in solving the open problems in the search space, I don’t doubt that investigating alternative energy is important too.