Author: Daniel Tunkelang

High-Class Consultant.

Follow-Up Podcast for UIE Seminar on Faceted Search (Free!)

Post author By Daniel Tunkelang
Post date September 21, 2009

Last month, Pete Bell and I presented a virtual seminar on faceted search for Jared Spool’s User Interface Engineering (UIE). Whether or not you attended the seminar, you can listen to a free podcast in which we answer some of the questions we didn’t get to during the seminar. If you still have an unanswered question, I encourage you to ask it in the comment thread, and I’ll do my best to answer it!

Uncategorized

Live Tweeting from Transparent Text Symposium

Post author By Daniel Tunkelang
Post date September 21, 2009

As promised, I’ll blog about the two-day Transparent Text symposium when it’s over and I have a chance to collect and express my thoughts. But for now you can follow the live Twitter stream at #tt09.

Uncategorized

Project Gaydar: A Reminder That Privacy Isn’t Binary

Post author By Daniel Tunkelang
Post date September 20, 2009

There’s a nice article in the Boston Globe about “Project Gaydar”, a project to predict who is gay by statistically analyzing their Facebook networks. They’ve only done ad hoc validation of their predictions, but they claim that their results seem accurate. The involvement of distinguished MIT professor Hal Abelson (at least to the point where he’s quoted in the article) lends credibility to their effort.

I’m glad to finally see a real-world example of the issues I blogged about last year in a post entitled “Privacy and Information Theory”:

The mainstream debates treat information privacy as binary. Even when people discuss gradations of privacy, they tend to think in terms of each particular disclosure (e.g., age, favorite flavor of ice cream) as binary. But, if we take an information-theoretic look at disclosure, we immediately see that this binary view of disclosure is illusory.

I’m curious to see if this project advances the conversation. At the very least, I’m gratified to see my abstract ramblings validated by a real-world example!
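To make the information-theoretic point concrete, here is a minimal sketch. It is my own illustration, not code from the Project Gaydar researchers or from the earlier post. It treats an undisclosed yes/no attribute as passing through a noisy channel: an observer sees a signal (say, an inference drawn from someone’s friends) that matches the true value with some accuracy. The mutual information between the attribute and the signal measures how many bits are disclosed. The prior of 0.1 and the accuracy values are hypothetical.

```python
import math


def binary_entropy(p: float) -> float:
    """Entropy, in bits, of a yes/no variable that is 'yes' with probability p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def bits_disclosed(prior: float, accuracy: float) -> float:
    """Mutual information I(X;Y) between a hidden binary attribute X and a
    noisy signal Y that matches X with probability `accuracy`
    (a binary symmetric channel). Returns the bits Y reveals about X."""
    # Probability that the signal says 'yes'.
    q = prior * accuracy + (1 - prior) * (1 - accuracy)
    # I(X;Y) = H(Y) - H(Y|X); for a symmetric channel H(Y|X) = H_b(accuracy).
    return binary_entropy(q) - binary_entropy(accuracy)


if __name__ == "__main__":
    prior = 0.1  # hypothetical base rate of the attribute
    for accuracy in (0.5, 0.6, 0.8, 0.95, 1.0):
        print(f"accuracy={accuracy:.2f}: "
              f"{bits_disclosed(prior, accuracy):.3f} of "
              f"{binary_entropy(prior):.3f} bits disclosed")
```

At accuracy 0.5 the signal is a coin flip and discloses nothing; at accuracy 1.0 it discloses the attribute’s full entropy. Every value in between is a partial disclosure, which is exactly what a binary view of privacy misses.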

General

T2: Judgment Day for Twine?

Post author By Daniel Tunkelang
Post date September 19, 2009

Nova Spivack, CEO and founder of Radar Networks, just released a video preview announcing Twine 2.0, a semantic search engine to be released later this year. As Erick Schonfeld points out on TechCrunch, Twine hasn’t managed to attract broad adoption. I tried it briefly when it came out, and I have to confess that I never understood it.

But I can certainly see the appeal of delivering faceted search for the web to support exploratory information seeking. It’s the dream that’s been driving Bing and Freebase, not to mention smaller efforts like Kosmix. It’s hard, to be sure. But, as Sarah Lacy tells us, startups are supposed to be changing the world–and established companies can play too.

The demo video is appealing, but I’ll believe it when I can off-road on it–and on more than just recipes and restaurants, two highly structured domains that are already well covered by sites like Food Network and Yelp. Twine doesn’t necessarily have to cover all domains to be useful–perhaps a “short snout” approach like Bing’s will be good enough to drive adoption.

In any case, I’m impressed with Twine’s ambition. But ambition isn’t enough–especially given the increasing number of people and companies who share it. If Nova really wants to build a “World Wide Database”, then he’ll have to do more than swing for the fences and miss. I’ll be waiting for a beta invite, and I’ll let you know what I find out.

General

Transparent Text Symposium

One of the unexpected benefits of accepting an invitation to speak at SIGMOD 2009 was an invitation from fellow participant Martin Wattenberg to attend the upcoming Transparent Text symposium at the IBM Center for Social Software:

The Transparent Text symposium is a free event that will focus on ways to make large collections of documents understandable to laypeople and experts alike. We are interested in approaches that shed light on unstructured text, ranging from novel statistical techniques to web-based crowdsourcing.

The speaker list is impressive, ranging from familiar (at least to me) interface experts Ben Fry and Marti Hearst to social scientist Gary King and Sunlight Foundation Executive Director Ellen Miller. IBM also contributed some of its own researchers to the program, including David Ferrucci, who has been leading the Jeopardy project. There’s even an “Ignite-style” session where all attendees will have the opportunity to give five-minute presentations.

I’m looking forward to the eclectic mix of speakers and attendees. As Chris Dixon recently reminded us, it’s important to introduce some randomization into our intellectual diets so that we don’t get stuck in a rut of local optimization. While an event with a theme of transparency and interacting with textual information is hardly a detour for me, I am excited about the opportunity to hear a diversity of new perspectives on this topic. There will be videos of the speakers posted after the event, as well as a Twitter stream at #tt09.

Of course, I’ll blog about what I learn and recycle it in the discussion activities at the HCIR workshop next month.

Uncategorized

Blogs I Read: The Haystack Blog

Post author By Daniel Tunkelang
Post date September 17, 2009

It’s been quite the week in tech business news, with Adobe acquiring Omniture, Google acquiring reCAPTCHA and being rumored (falsely) to acquire Brightcove, Facebook announcing that it has over 300M users and is cash-flow positive, and Twitter closing a new round of funding at a $1B valuation. Recession? What recession?

But sometimes I like to get away from all that and turn back to my roots inside the ivory tower. And that leads me to one of my favorite university blogs: the Haystack Blog.

The Haystack Blog is published by faculty and grad students in the MIT Computer Science and AI Lab (CSAIL)–specifically those in the Haystack group. Principal Investigator (and occasional dance instructor) David Karger is its most prolific blogger–you might have read some of his SIGIR 2009 posts or his debate with Stefano Mazzocchi about how to properly use RDF. But other people’s posts are just as interesting–check out the most recent post by Eirik Bakke about bridging the gap between spreadsheets and relational databases.

I wish that more universities and departments would encourage their faculty and students to blog. As Daniel Lemire has pointed out, it’s a great way for academic researchers to get their ideas out and build up their reputations and networks. He should know–he leads by example. Likewise, Haystack is setting a great example for university blogs, and is a credit to MIT and CSAIL.

General

Udorse: Give Product Placement a Chance

Post author By Daniel Tunkelang
Post date September 15, 2009

Those of you who don’t live and breathe the software startup scene might not realize that a substantial fraction of Silicon Valley is following TechCrunch50, an annual competition hosted by TechCrunch. As if it weren’t enough to have A-list judges like Marissa Mayer and Paul Graham, there’s even the fortuitous timing of Intuit acquiring 2007 TC50 winner Mint for a respectable $170M.

Here in New York, I have to confess that I haven’t had my eyes glued to the proceedings. But I have been looking at some of the entries, and one that at least stands out as distinctive is Udorse (and no, I’m not just biased because they’re local). Their premise is simple: democratize product placement through “visual endorsement”. Everyone who shares photos can embed a “udorsement” and can either pocket the advertising revenue or donate it to charity. More details from TechCrunch (naturally) and VentureBeat.

Perhaps your reaction is like mine, uncertain whether to be awed or horrified by this simple concept. Indeed, given my penchant for using ad blockers, you might think I’d be ideologically against product placement.

But I’m not, as long as it’s transparent–and, as far as I can tell, Udorse passes that test. In theory, this is advertising done right: content creators monetizing their own content by advertising goods and services they believe in–and putting their own credibility on the line to do so.

Of course, it might turn out very differently in practice. Any way of making money online brings out the worst in people, and I’m sure we’ll see lots of people try to game this service if it takes off. Meanwhile, people like me will probably block the “udorsements” like any other ads.

Or maybe not. I certainly don’t block emails from friends recommending the products they like, and I actually wish it were easier to benefit from their sincere opinions. If Udorse succeeds in a way that feels like word-of-mouth marketing, I’ll be thrilled. I think it’s a long shot, but I’m at least intrigued by their approach.

ps. No, I wasn’t paid to write this post, nor do I have any stake in Udorse. I at least have to keep my record clean for the Ethics of Blogging panel next week!

Uncategorized

Bing Visual Search Beta

Bing launched a Visual Search beta today that is fun to play with. The name may be a bit misleading–this isn’t an image search engine, let alone one that allows you to find images based on visual similarity. Rather, it’s a graphically intensive (don’t forget to install Silverlight!) way to explore a small data collection.

I agree with Elisabeth Osmeloski at Search Engine Land that the galleries included with this beta launch emphasize novelty over utility. Still, it’s nice to see a visual faceted search application for exploring the periodic table. And it’s an interesting example of micro-IR.
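For readers who haven’t built one, the core loop of a faceted search interface over a small collection is simple: filter the items by the facet values the user has selected, then count the remaining values of each facet so the interface can show which refinements are still available. Here is a minimal Python sketch over a hypothetical dozen-element subset of the periodic table. It is not how Bing implemented Visual Search; the data layout and function names are my own assumptions.

```python
from collections import Counter
from typing import Dict, List

Item = Dict[str, str]

# Hypothetical sample: a dozen elements with two facets
# (category, and state at room temperature).
ELEMENTS: List[Item] = [
    {"symbol": "H", "category": "nonmetal", "state": "gas"},
    {"symbol": "He", "category": "noble gas", "state": "gas"},
    {"symbol": "Li", "category": "alkali metal", "state": "solid"},
    {"symbol": "C", "category": "nonmetal", "state": "solid"},
    {"symbol": "N", "category": "nonmetal", "state": "gas"},
    {"symbol": "O", "category": "nonmetal", "state": "gas"},
    {"symbol": "Ne", "category": "noble gas", "state": "gas"},
    {"symbol": "Na", "category": "alkali metal", "state": "solid"},
    {"symbol": "Cl", "category": "halogen", "state": "gas"},
    {"symbol": "Fe", "category": "transition metal", "state": "solid"},
    {"symbol": "Cu", "category": "transition metal", "state": "solid"},
    {"symbol": "Br", "category": "halogen", "state": "liquid"},
]

FACETS = ("category", "state")


def apply_filters(items: List[Item], selections: Dict[str, str]) -> List[Item]:
    """Keep only the items that match every selected facet value."""
    return [item for item in items
            if all(item[facet] == value for facet, value in selections.items())]


def facet_counts(items: List[Item]) -> Dict[str, Counter]:
    """Count how many of the given items carry each value of each facet."""
    return {facet: Counter(item[facet] for item in items) for facet in FACETS}


if __name__ == "__main__":
    gases = apply_filters(ELEMENTS, {"state": "gas"})
    print([e["symbol"] for e in gases])
    print(facet_counts(gases)["category"])
```

Selecting the gas state, for example, leaves six elements and tells the user that three of them are nonmetals, two are noble gases, and one is a halogen. Those counts are what let a user explore a small collection without typing a query.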

General

Is Bing Optimizing for the Short Snout?

Post author By Daniel Tunkelang
Post date September 14, 2009

In a post about Bing on CNET today, Rafe Needleman comments that “it makes business sense to pour resources into popular searches. Optimizing for the short snout pays.”

First, it’s an interesting counterpoint to the conventional wisdom that search (if not the future of business as we know it) is all about the “long tail”. But second and more importantly, it’s an intriguing claim about Bing’s strategy for differentiating itself from Google.

Needleman goes on to say:

I’d wager that this is how Bing is making its gains in market share. Latest Nielsen data says Bing gained 22 percent month-over-month in August, taking it to 10.7 percent of all U.S. searches. People probably try Bing for a travel or product search (where there’s also a cash-back financial kicker) and remember their good experience, and then they try it for more obscure searches and find it good enough. It highlights, I believe, an important flaw in Google’s historic strategy of indexing the entire Web equally well and making the user interface fast and consistent above all, as opposed to specializing as dictated by the query.

While I’ve never heard this claim about Bing before, it is consistent with something I’ve noticed–and which Nick Craswell said when he talked about Bing at SIGIR 2009. In the upper left area that Bing calls the table of contents (TOC), Bing selectively presents a refinement interface based on the entity type it infers for the search query. For example, a search for Argentina returns options that include Argentina Map, Argentina Tourism, and Argentina Culture; while a search for Abraham Lincoln returns options that include Abraham Lincoln Speeches and Abraham Lincoln Facts.

It’s a nifty feature, even if marketers and reporters have struggled to label it. But, as Needleman says, it does indeed focus on the short snout. For example, there are no TOC options when you search for faceted search, since the technical term doesn’t match a recognized entity type. Searches for names of auto companies, such as Toyota, yield a rich set of options, while those for scooter companies like Vespa do not. Similarly, searches for celebrities receive VIP treatment, as compared to searches for ordinary people that just return a list of search results.
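The short-snout tradeoff is easy to see in a sketch of how an entity-type-driven refinement list might work. The Python below is a hypothetical illustration, not Bing’s implementation. A small hand-built dictionary stands in for entity recognition, and the refinement templates come from the Argentina and Abraham Lincoln examples. Head queries that hit the dictionary get rich options, and everything else, such as faceted search, falls through to an empty list.

```python
from typing import Dict, List, Tuple

# Hypothetical stand-in for entity recognition:
# normalized query -> (canonical display name, inferred entity type).
ENTITIES: Dict[str, Tuple[str, str]] = {
    "argentina": ("Argentina", "country"),
    "abraham lincoln": ("Abraham Lincoln", "historical_figure"),
}

# Hypothetical refinement templates, one list per entity type.
TEMPLATES: Dict[str, List[str]] = {
    "country": ["{} Map", "{} Tourism", "{} Culture"],
    "historical_figure": ["{} Speeches", "{} Facts"],
}


def refinements(query: str) -> List[str]:
    """Return table-of-contents style refinements for a query, or an empty
    list when no entity type is recognized (the long-tail case)."""
    match = ENTITIES.get(query.strip().lower())
    if match is None:
        return []
    name, entity_type = match
    return [template.format(name) for template in TEMPLATES.get(entity_type, [])]


if __name__ == "__main__":
    for q in ("Argentina", "Abraham Lincoln", "faceted search"):
        print(q, "->", refinements(q))
```

The design choice is the whole story: every dictionary entry and template is curation effort, so it pays to spend that effort on popular queries and let the long tail fall back to a plain list of results.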

All in all, I’m inclined to agree with Needleman that Bing is focusing on the short snout–and I love that phrase to describe it. The open question is whether he’s right that users “remember their good experience, and then they try it for more obscure searches and find it good enough”. It would be great to see data to confirm or refute that hypothesis.

General

Micro vs. Macro Information Retrieval

Post author By Daniel Tunkelang
Post date September 12, 2009

The Probably Irrelevant blog has been quiet for a while, but I was happy to see a new post there by Miles Efron about “micro-IR”. He characterizes micro-IR, as distinct from macro or general IR, as follows:

In ad hoc (text) IR a principal intellectual challenge lies in modeling ‘aboutness.’ In micro-IR settings, the creativity comes into play in posing a useful (and tractable) question to answer. The engineering comes easily after that.
The constrained nature of micro-IR applications leads to a lightweight articulation of information need. There is a tight coupling here between task, query, and the unit of retrieval, a dynamic that I think is compelling. Pushing this a bit farther, we might consider the simple act of choosing to use a particular application from those apps on a user’s palette as part of the information need expression.
The tight coupling of task to data to ‘query’ enables a strong contextual element to inform the interaction. Context constitutes the foreground of the micro-IR interaction.

He then asks: “is micro-IR something at all? Is it actually related to IR?” Fernando Diaz answers that “the only difference between micro and macro IR is text.” Jinyoung Kim adds that in micro-IR “the context (searcher goal) is known, with domain-specific notion of relevance (goodness) and similarity measures.”

I hadn’t thought of making this particular distinction, but I like it. While I prefer to think about distinguishing the needs of information seekers–rather than the characteristics of search applications–I would be the first to argue that a well-designed search application caters to particular user needs. Indeed, I think the definition of a good micro-IR application implies that it addresses a highly constrained space of information needs. Just as importantly, micro-IR applications can often assume that their users are highly familiar with the information space the applications address, and thus that those users need less of the basic orienteering support that can be critical for success using macro-IR systems. That said, micro-IR users have (or should have) higher expectations of support for more sophisticated information seeking.

The other day, I speculated about why Google holds back on faceted search. I feel that the distinction between macro- and micro-IR is in the same vein: micro-IR settings (e.g., site search, enterprise search, vertical search) drive the need for richer interfaces and support for interaction, while macro-IR application developers (e.g., general web search) worry mostly about producing a reasonable answer for the query–and often lead users to micro-IR destinations that offer their own support for information seeking within their constrained domains.

In short, it’s a nice way to think about the IR application space, and it’s increasingly relevant (no pun intended!) as we see a proliferation of micro-IR applications. And it’s great to see activity on the Probably Irrelevant blog after all these months of radio silence!