Categories
General

The Wolfram Cometh

If you’re curious about Wolfram Alpha but tired of reading second-hand reports and third-hand hysteria about it, then be assured that your wait for first-hand access is almost over. Their blog reports that they will launch today.

UPDATE: The Wolfram Alpha site says:

Watch a live webcast of the Wolfram|Alpha system being brought online for the first time. Friday, May 15, beginning at 7pm CST

It’s been an interesting pre-launch hype cycle, particularly since I’ve gotten to watch it from a pretty good seat–a perk of being an obsessive and prolific blogger.

The initial marketing and buzz offered a level of hyperbole comparable to the hype that surrounded the publication of Wolfram’s New Kind of Science (abbreviated NKS by Wolfram and fans) seven years ago. Regulars may recall that I responded by calling it “A New Kind of Marketing (NKM)”.

Apparently sensitive to the dangers of being hyped up (and thus set up to fail) as a “Google killer”, Wolfram Alpha’s marketing team reached out to influencers (and a few little people like me), offering demos and explanations. In fact, I think they were doing an admirable job of mitigating the original damage–up until April 28th. That day, as Wolfram himself was giving a public demonstration of Wolfram Alpha at Harvard, Google released Google Public Data to the general public. Ouch. To add insult to injury, Google’s Matt Cutts says the timing was a coincidence–an accidental upstaging!

In any case, the release of Google Public Data amplified the pressure on Wolfram Alpha. A week later, the latter was offering preview access to reporters and bloggers, presumably knowing that the testers would compare the two offerings side by side. Meanwhile, Google has continued raising the stakes through announcements like this week’s preview of Google Squared. I doubt the two companies are as focused on one another as the blogosphere makes them out to be, but it’s certainly an entertaining David vs. Goliath story (albeit where the David has a Goliath-sized ego).

And today is public launch day for Wolfram Alpha. I wish them luck! But I’m pretty sure that many of the people who’ve been waiting for this access will be disillusioned as they struggle with the NLP interface. None of the marketing team’s attempts at expectation management can mitigate the frustration of an undocumented, brittle interface. Ah well. I did try to tell them. I hope they can make it past that initial blow and then reconsider their approach to the interface.

Categories
General

Reprising the Enterprise Search Summit

’Tis the season of search conferences, at least for me, and I spent the last couple of days attending the Enterprise Search Summit in New York. I enjoyed it immensely–it’s one of the better networking events in the industry. It was great to catch up with analysts, consultants, information architects, and even competitors–or, as Nate put it, “respected foes”. I’m grateful to Michelle Manafy for putting it all together, and to Will Evans for getting me included on a social search panel.

I found the informal conversations and the final discussion session to be the most valuable activities during the two days. But I also appreciated a number of the presentations, particularly Jared Spool’s keynote on “Search, Scent, and the Happiness of Pursuit”. Perhaps unsurprisingly, I found a few of the talks disappointingly shallow or salesy. I wish that more vendors and consultants realized that people expect substance from a 45-minute talk, and that the best sales pitch is to deliver that substance. I’m determined to see that all the talks at the SIGIR ’09 Industry Track are substantive; I hope I’ve recruited a line-up that will deliver!

One thing I also liked about the conference is that the various participating vendors (and there were a lot of us!) were very collegial. There was, however, an exception that I feel compelled to point out. An article on internetnews.com sums it up: “Google Talks Enterprise Search, Bashes Microsoft”, in which Nitin Mangtani, lead product manager for Google enterprise search, said:

One way of doing enterprise search would be to start something in 2001 that didn’t work. You could then do a complete overhaul in 2003, which also didn’t work. In 2007, you could launch a rip-and-replace system and then … you could acquire a large, random, non-integrated system. I’m not going to name any specific company.

I thought that was catty, especially since the reference was obvious to an audience of enterprise search professionals. Moreover, Mangtani’s characterization of the enterprise search space was, to put it diplomatically, interesting. He described Google’s approach to enterprise search as being distinctive because of their attention to structured and semi-structured data. If you’re familiar with both Google’s and Endeca’s offerings (or FAST’s for that matter), I think you’ll share my surprise at this particular characterization. I don’t want to commit the same offense I’m criticizing. I’ve had cordial exchanges with a few of the Google enterprise folks (including Mangtani), and I think Google has a respectable offering in the enterprise space. But they should be big enough (and mindful enough of Google’s corporate reputation) to treat their competitors with respect–and they might do a bit more homework on competitive analysis.

In any case, I can’t complain about Endeca’s visibility among participants and speakers. It seemed that I was hearing Endeca mentioned in every other talk I attended–quite a feat considering that none of the talks were by Endeca partisans and that, until recently, Endeca’s marketing department studiously avoided using the term “enterprise search”! Even now, the official corporate positioning is that Endeca enables search applications. I understand the distinction, and in any case I probably shouldn’t use my blog to second-guess my colleagues in marketing. Still, it’s clear that many people looking for what they consider to be enterprise search want what Endeca has to offer, and I’m not one to let a vocabulary problem get in the way of selling software and meeting customers’ needs!

All in all, two days well spent, though I’m glad to get back to my blog–and to my day job. At least I get a small break before the Text Analytics Summit on June 1-2!

Categories
Uncategorized

Is Google Diving Head First Into HCIR?

I’ve been at the Enterprise Search Summit all day, so I didn’t have the chance to pore through the buzz about Google’s Searchology announcements. But I did see a snippet that struck me as very unusual language for Google:

Our first announcement today is a new set of features that we call Search Options, which are a collection of tools that let you slice and dice your results and generate different views to find what you need faster and easier. Search Options helps solve a problem that can be vexing: what query should I ask?

Google, focusing on query refinement and elaboration? I’m all ears! In fact, eyes and ears–here is the video tour:

Well, on second thought, it’s a bit of a rehash of features they’d already rolled out, and that I personally didn’t find overwhelming (see here and here). Still, I’m pleased that their marketing language is embracing HCIR–that’s a big step for a company that has perhaps done more than anyone to emphasize the primacy of relevance ranking in the search interface. Even if they’re only taking baby steps at this point, I am cautiously optimistic that they will build on them.

Categories
General

Catching Up With Hunch

Last week, I stopped by the Hunch office to learn more about what they’re doing, as well as to contribute my own thoughts about socially enhanced decision making. I consider Hunch, like Aardvark, to be an example of social search, but I recognize that I use the term in a broad sense. Perhaps, as Jeremy suggests, it’s better to think of social search and collaborative search as different aspects of multi-person search.

In any case, Hunch is doing some interesting things. Their mission, roughly speaking, is to become a Wikipedia for decision making. They are inspired by human computation success stories like 20Q.net and presumably the ESP Game. Their general approach is to learn about people by asking them multiple-choice questions that help cluster them demographically (“Teach Hunch About You”), and then to create customized decision trees to help people find their own answers to questions. The questions themselves are crowd-sourced from users (though now they are vetted first in a “workshop”).

They’re learning as they go along. For example, they’ve recognized that it’s important to distinguish between objective questions (e.g., concerning the price of a product) and questions of taste (e.g., what is art?). They’re also experimenting with interface tweaks, including giving users more control over what information their algorithms use to rank potential answers, and allowing users to short-circuit the decision tree at any time by skipping to the end.

Perhaps of particular interest to readers here, they’ve made an API available, which you can also play with in a widget on their blog.

As I told my friend at Hunch, I’m still skeptical about decision trees. Maybe I’m a bit too biased toward faceted search, but I don’t like having such a rigid decision making process. Apparently they’re not wedded to decision trees, but they are understandably concerned about creating a richer interface that might turn off or intimidate ordinary users. I can’t deny that decision trees are simple to use, and I can’t argue with their 77% success rate.

Still, the rigidity of a decision tree leaves me a bit cold. Even if it leads me to the right choice, it doesn’t give me the necessary faith in that choice. Transparency helps, and I like that you can click on “Why did Hunch pick this?” to see what in your question-specific or personal profile led Hunch to recommend that answer. But I’d like more freedom and less hand-holding.

I still have a handful of invites; let me know if you’re interested. As usual, first come, first served.

Categories
Uncategorized

Designing for Faceted Search

While I was inundated with conferences a couple of weeks ago, I missed a nice article by Stephanie Lemieux at User Interface Engineering (a site I recommend in general) entitled “Designing for Faceted Search”. It briefly explains faceted search and offers some usability tips. It’s not quite as comprehensive as my upcoming book, but it’s also free and is somewhat less than 100 pages.

Of course, I’m delighted that she uses a couple of Endeca-powered examples (NCSU Libraries, Buzzillions). She also cites the Financial Times, but links to ft.com (which I believe is powered by FAST, a subsidiary of Microsoft) rather than the recently launched Newssift, which uses Endeca.

Just one quibble: she says that “Just 3 facets with 5 terms each can represent 243 possible combinations.” I suspect she transposed the 3 and the 5. The right number of combinations is 125 = 5^3, since a combination represents 3 independent selections from 5 possible choices.
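The arithmetic is easy to check by brute force. Here is a minimal Python sketch, assuming that a combination means exactly one term chosen from each facet; the facet names and terms are made up purely for illustration.

```python
from itertools import product

# Hypothetical facets: 3 facets with 5 terms each (names are illustrative only).
facets = {
    "color": ["red", "orange", "yellow", "green", "blue"],
    "size": ["XS", "S", "M", "L", "XL"],
    "price": ["$", "$$", "$$$", "$$$$", "$$$$$"],
}

# One combination = one term chosen independently from each facet.
combinations = list(product(*facets.values()))

print(len(combinations))  # 5 * 5 * 5 = 5**3
print(3 ** 5)             # the transposed figure quoted in the article
```

If users may also leave a facet unselected, each facet has 6 states rather than 5, and the count becomes 6^3 = 216. Either way, it isn’t 243.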

Categories
General

The Twouble with Twitter Search

There has been a flurry of reports about Twitter search–whether about Twitter’s plans to improve their search functionality or about alternative ways to search Twitter. But Danny Sullivan makes a great point in a recent post about Google:

Ironically, Google gets a taste of its own medicine with Twitter. It still can’t access the “firehose” of information that Twitter has, in order to build a decent real-time search service. If it can’t strike a deal, expect to hear the company start pushing on how “real-time data” should be open.

Of course, that logic applies not only to Google, but also to anyone with aspirations to build a better mousetrap for Twitter search. As things stand, applications can’t do much better than post-processing native Twitter search results–which makes it hard to offer any noticeable improvement on them. If Twitter offered full Boolean set retrieval (e.g., if a search for star trek returned the set of all tweets containing both words), then applications could implement lots of interesting algorithms and interfaces on top of their API. I’d love to work on exploratory search applications myself! But the trickle that Twitter returns is hardly enough.
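To make “Boolean set retrieval” concrete, here is a minimal Python sketch of an inverted index with AND semantics. It is a toy under stated assumptions: the tweets are invented, tokenization is just lowercasing and splitting on whitespace, and none of this reflects how Twitter’s search or API actually works.

```python
from collections import defaultdict

def build_index(tweets):
    """Map each term to the set of tweet ids that contain it."""
    index = defaultdict(set)
    for tweet_id, text in tweets.items():
        for term in text.lower().split():
            index[term].add(tweet_id)
    return index

def boolean_and(index, query):
    """Return the ids of ALL tweets containing every query term."""
    terms = query.lower().split()
    if not terms:
        return set()
    # Look up each term's posting set without adding new keys to the index.
    postings = [index.get(term, set()) for term in terms]
    return set.intersection(*postings)

# Invented sample data, for illustration only.
tweets = {
    1: "Star Trek opens tonight",
    2: "star gazing in the park",
    3: "a long trek under a bright star",
}
index = build_index(tweets)
print(boolean_and(index, "star trek"))
```

The point is that the result is the complete matching set rather than a ranked, truncated sample, which is what an application would need in order to layer its own ranking, clustering, or faceting on top.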

I believe this limitation is by design–that Twitter knows the value of such access and isn’t about to give it away. I just hope Twitter will figure out a way to provide this access for a price, and that an ecology of information access providers develops around it. Of course, if Google or Microsoft buys Twitter first, that probably won’t happen.

Categories
General

What Should I Say About Social Search?

I’ll be at the Enterprise Search Summit in New York next week, participating on a panel Tuesday morning to discuss “Emergent Social Search Experience”. Our game plan as a panel is to discuss what social search is, why it matters, and how to implement it.

Obviously these are broad questions, but here are my rough notes:

WHAT: Social search means many things, but they have one common thread: improving information seeking through the knowledge and efforts of other people. Back in the mid 90s, researchers distinguished between semantic and social navigation as the ability to explore information based on its objective, semantic structure, versus choosing a perspective based on the activity of another person or group of people. Perhaps the earliest instance of social search was collaborative filtering, still popular today as a driver for product recommendations on sites like Amazon. But social search is much more than collaborative filtering. Building on the 90s vision of social navigation, we can give users full control over a social lens through which to view information, e.g., show me the local restaurants where women in my mom’s demographic like to eat brunch. Social search also includes explicit and implicit collaborative approaches, such as finding an expert to help you with a search, or building shared knowledge management artifacts that increase the collective efficiency of information seeking.

WHY: The “why” of social search depends on the specific aspect of social search that we’re discussing. But the common theme is this: we all know that, for a large swath of information needs, we would rather turn to a person than ask a machine. Sometimes that’s appropriate, and it’s a question of finding the right person to ask. But often we have no need to bother anyone; we just want to borrow someone else’s perspective–or to assemble a composite perspective. There’s an efficiency gain in not reinventing the wheel, as well as an upside of discovering people (or information by way of those people) that may be valuable to you in ways you didn’t anticipate.

HOW: Again, it depends on the aspect of social search. We need rich knowledge representations that treat both information and people as first-class objects, and interfaces that let people seamlessly use both. Endeca does this by supporting record relationship navigation for multiple entity types (e.g., documents, people), as do interfaces like David Huynh’s Freebase Parallax. To facilitate collective knowledge management, we need to make contribution both easy and rewarding: the reason people don’t contribute to such systems today is that they are onerous and don’t work. Some of the work Endeca has done with folksonomies is encouraging: we found that we can productively recycle folksonomies (or even search logs) in combination with automatic text mining techniques. Finally, we need to rethink our attitudes toward privacy, anonymity, and reputation. Consumer social networks like Facebook and Twitter have shown us that users are willing to forgo privacy in order to gain social benefits. Wikipedia has shown us that a group of strangers can assemble a valuable collective knowledge store. But Wikipedia, product reviews, blog comments, etc. have shown us that the default of anonymity can undermine the trust we have in these socially constructed artifacts. As we evolve these tools–and as we work to apply them within the enterprise–we need to simultaneously work to evolve our social norms.

Those are my thoughts. But, in the spirit of social search, I’d love to reach out to experts here for ideas. If you were attending a panel about social search, specifically in the context of an event targeted to enterprise search practitioners, what would you want to hear about? For that matter, if you were participating on such a panel, what would you talk about? Bear in mind that the audience will consist of practitioners, not researchers, and I’ll only have one third of a 45-minute session–some of that reserved for Q&A.

Categories
Uncategorized

More Thoughts on Image Retrieval

After my recent posts about Google’s similarity browsing for images, a colleague reached out to me to educate me about some of the recent advances in image retrieval. This colleague is involved with an image retrieval startup and felt uncomfortable posting comments publicly, so we agreed that I would paraphrase them in a post under my own name. I thus accept accountability for the post, but cannot take credit for expertise or originality.

Some of the discussion in the comment threads mentioned scale-invariant feature transform (SIFT), an algorithm to detect and describe local features in images. What I don’t believe anyone mentioned is that this approach is patented–certainly a concern for people with commercial interest in image retrieval.

There’s also the matter of scaling in a different sense–that is, handling large sets of images. People interested in this problem may want to look at “Scalable Recognition with a Vocabulary Tree” by David Nistér and Henrik Stewénius. They map image features to “visual words” using a hierarchical k-means approach. While mapping image retrieval to text retrieval approaches is not new, their large-vocabulary approach was novel and made significant improvement to scalability, as well as being robust to occlusion, viewpoint and lighting change. The paper has been highly cited.
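For readers who want a feel for the idea, here is a rough Python sketch of hierarchical k-means quantization. It is a toy under loud assumptions: it uses plain Lloyd-style k-means on made-up 2-D points rather than real image descriptors, all function names are my own, and it omits the scoring and indexing machinery a real retrieval system would need. It is not the authors’ implementation.

```python
import random

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean(points):
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))

def nearest(point, centers):
    """Index of the center closest to the point."""
    return min(range(len(centers)), key=lambda c: sq_dist(point, centers[c]))

def assign(points, centers):
    clusters = [[] for _ in centers]
    for p in points:
        clusters[nearest(p, centers)].append(p)
    return clusters

def kmeans(points, k, iters=10, seed=0):
    """Plain Lloyd iterations from a seeded random start."""
    centers = random.Random(seed).sample(points, k)
    for _ in range(iters):
        clusters = assign(points, centers)
        # Keep the old center if a cluster happens to become empty.
        centers = [mean(c) if c else centers[i] for i, c in enumerate(clusters)]
    return centers

def build_tree(points, k, depth, seed=0):
    """Recursively split the points into k clusters per level."""
    if depth == 0 or len(points) < k:
        return None  # leaf
    centers = kmeans(points, k, seed=seed)
    clusters = assign(points, centers)
    children = [build_tree(c, k, depth - 1, seed) for c in clusters]
    return {"centers": centers, "children": children}

def visual_word(tree, descriptor):
    """Descend greedily; the path of branch indices acts as the 'visual word'."""
    path, node = [], tree
    while node is not None:
        j = nearest(descriptor, node["centers"])
        path.append(j)
        node = node["children"][j]
    return tuple(path)

# Made-up 2-D "descriptors": two well-separated groups.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11)]
tree = build_tree(points, k=2, depth=2)
print(visual_word(tree, (0.2, 0.1)), visual_word(tree, (10.5, 10.5)))
```

Quantizing a descriptor costs only k distance comparisons per level, so a tree with branching factor k and depth d addresses up to k^d words with about k·d comparisons. That is the intuition behind how a large vocabulary can stay cheap to use.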

But there are problems with this approach in practice. For example, images from cell phone cameras are low-quality and blurry, and Nistér and Stewénius’s approach is unfortunately not resilient to blur. Accuracy and latency are also challenges.

In general, some of the conclusions in the vision literature about which features are best don’t seem to hold up so well outside the lab, and the reason may be that the images used for experiments in the literature are of much higher quality than those in the field–particularly cell phone images.

An alternative to SIFT is “gist”, an approach based on global descriptors. This approach is not resilient to occlusion or rotation, but it does scale much better than SIFT, and may serve well for some duplicate detection–a problem that, in my view, is a deal-breaker for applications like similarity browsing–and which certainly is a problem for Google’s current approach.

In short, image retrieval is still a highly active area, and different approaches are optimized for different problems. I was delighted to have a recent guest post from AJ Shankar of Modista about their approach, and I encourage others to contribute their thoughts.

Categories
General

Playing With Wolfram Alpha

Woo hoo, I have preview access to Wolfram Alpha! I’ve only had a short time to play with it, but I can already report that my experience confirms my previously expressed expectations: the NLP is very brittle, but there’s great potential for structured queries on quantitative data. Here is an example use case that, in my view, shows Wolfram Alpha’s strengths:

[Screenshot: Wolfram Alpha]

This bit of analysis tells a great story: Microsoft has almost three times as much revenue as Google, but Google has about 50% higher revenue per employee. Meanwhile, Yahoo is in third place on revenue, number of employees, and revenue per employee. Ouch.

As I said, this query shows Wolfram Alpha favorably. What you don’t see are the false starts it took me to get this query to work. The NLP interface, in my view, is a really bad idea. Instead, Wolfram Alpha should be helping users generate good structured queries–and, better yet, helping other businesses build such queries through APIs. Wolfram Alpha could deliver an excellent plug-in for Excel, if they can expose a workable query API. I have no idea whether the company is able or willing to go down this path, but I hope someone there is listening to this free advice.

I can’t share my account, but I’m willing to take suggestions for queries through the comment thread, and I’ll try my best to share what I learn.

Categories
Uncategorized

Got Hate Tweets?

I had the novel experience today of discovering that someone set up a Twitter account for the sole purpose of harassing me personally. I’m not sure what exactly I did to deserve this honor, but I’m amused by the personal attention, since I’m hardly a Twitter celebrity. Perhaps it’s someone I know, conducting a social experiment to see how I’ll react. Ah, the wonder of online anonymity.