Transparent Text Symposium: Day 1

Wow, what an intense day at the Transparent Text symposium! I won’t try to give detailed summaries of the talks–videos will be posted after the conference, and you can get a pretty good picture from the live tweet stream at #tt09. Instead, I’ll try to capture my personal highlights and reactions.

I’ll start with Deputy U.S. CTO Beth Noveck’s keynote about the Open Government Initiative. First, the very existence of such an initiative is remarkable, given the culture of secrecy traditionally associated with Washington. Second, I like that the top priority is releasing raw data so that other people can work on analyzing it, visualizing it, and generally making it more accessible, whether to the general public or to particular interest groups. This is very much what I had in mind in January when I posted “Information Sharing We Can Believe In”, and I’m glad to see tangible progress. I was never a big fan of faith-based initiatives. 🙂

The next session was a group of talks about watchdogs and accountability–people looking at how to ensure government transparency from the outside. New York Times editor Aron Pilhofer and software developer Jeremy Ashkenas talked about DocumentCloud, an ambitious project to enable exploratory search for news documents on the open web. Sunlight Foundation co-founder and executive director Ellen Miller offered a particularly compelling example of the power of visualization: a graph correlating the campaign contributions and earmarks associated with a congressman under investigation. But my favorite presenter in this session was ProPublica’s Amanda Michel, whose thoughts about a “human test of transparency” are worth a talk in themselves. For now, I recommend you look at the two projects she discussed: Stimulus Spot Check and Off the Bus.

After lunch, we shifted gears from government transparency to more of a focus on text. The first of the two afternoon sessions was entitled “Analyzing the Written Record” and featured Matthew Gray from Google Books, Tom Tague from Open Calais (a free text annotation service that almost all of the previous speakers raved about), and Ethan Zuckerman from Harvard’s Berkman Center. All of the talks were solid, but Ethan’s was outstanding. I blogged about his Media Cloud project back in March, but it’s come a long way in the past six months and is doing something I’ve been waiting years to see someone do: comparing how different news organizations select and cover news.

The final session was about visualization. David Small offered a presentation about literally transparent text that was, in the words of Marian Dörk, “refreshingly non-utilitarian and visually stimulating”. Ben Fry showed the power of visualizing changes in a document over time–specifically, a project called “the preservation of favoured traces” that illustrates the evolution of Darwin’s On the Origin of Species. But, as expected, IBM’s Many Eyes researchers Fernanda Viégas and Martin Wattenberg stole the show with an incredibly informative and entertaining presentation about the visualization of repetition in text. No summary can do it justice, so I urge you to watch the video when it is available.

After all that, we enjoyed a nice reception at the IBM Center for Social Software. I’m incredibly grateful to IBM for organizing and sponsoring this event, and to Martin Wattenberg for being so kind as to invite me. I’ll try to earn my keep in my 5 minutes at the “Ignite-style” session tomorrow morning.

Follow-Up Podcast for UIE Seminar on Faceted Search (Free!)

Last month, Pete Bell and I presented a virtual seminar on faceted search for Jared Spool’s User Interface Engineering (UIE). Whether or not you attended the seminar, you can listen to a free podcast in which we answer some of the questions we didn’t get to during the seminar. If you still have an unanswered question, I encourage you to ask it in the comment thread, and I’ll do my best to answer it!

Live Tweeting from Transparent Text Symposium

As promised, I’ll blog about the two-day Transparent Text symposium when it’s over and I have a chance to collect and express my thoughts. But for now you can follow the live Twitter stream at #tt09.

Project Gaydar: A Reminder That Privacy Isn’t Binary

There’s a nice article in the Boston Globe about “Project Gaydar”, a project that predicts whether someone is gay by statistically analyzing their Facebook network. They’ve only done ad hoc validation of their predictions, but they claim that the results seem accurate. The involvement of distinguished MIT professor Hal Abelson (at least to the point where he’s quoted in the article) lends credibility to their effort.
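The Globe piece doesn’t spell out the students’ method, but the underlying intuition is easy to sketch: friendships tend to connect similar people (homophily), so the composition of someone’s friend list leaks information about attributes they never disclose themselves. Here’s a toy illustration of that idea in Python; the network, labels, and threshold are made up, and it is not the MIT students’ actual model:

```python
# Toy homophily-based predictor: guess a hidden attribute from the
# fraction of a user's friends who disclose that attribute themselves.
# The network, labels, and threshold are illustrative assumptions.

friends = {
    "alice": ["bob", "carol", "dave"],
    "bob":   ["alice", "erin", "frank"],
}

disclosed = {"carol": True, "dave": True, "erin": False, "frank": False}

def predict(user, threshold=0.5):
    """Predict the attribute for `user` from friends who disclose it."""
    labels = [disclosed[f] for f in friends.get(user, []) if f in disclosed]
    if not labels:
        return None  # no disclosing friends, so no signal either way
    return sum(labels) / len(labels) >= threshold

print(predict("alice"))  # True: both disclosing friends have the attribute
print(predict("bob"))    # False: neither disclosing friend does
```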

I’m glad to finally see a real-world example of the issues I blogged about last year in a post entitled “Privacy and Information Theory”:

The mainstream debates treat information privacy as binary. Even when people discuss gradations of privacy, they tend to think in terms of each particular disclosure (e.g., age, favorite flavor of ice cream) as binary. But, if we take an information-theoretic look at disclosure, we immediately see that this binary view of disclosure is illusory.
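To make that point concrete, here’s a back-of-the-envelope sketch in the spirit of that post: every disclosure carries a measurable number of bits, and seemingly innocuous attributes add up quickly. The probabilities below are rough illustrative guesses, not real statistics:

```python
import math

# Self-information of a disclosure: the rarer the attribute, the more
# bits it reveals. Probabilities are rough illustrative guesses.
def bits(probability):
    return -math.log2(probability)

disclosures = {
    "gender":     0.5,        # ~1 bit
    "birth date": 1 / 365,    # ~8.5 bits
    "ZIP code":   1 / 40000,  # ~15.3 bits
}

total = 0.0
for name, p in disclosures.items():
    b = bits(p)
    total += b
    print(f"{name:12s} ~{b:4.1f} bits")

# Assuming independence (a simplification), the combined disclosure is
# the sum of the individual bits -- enough to single someone out in a
# population of roughly 2**total people.
print(f"combined     ~{total:4.1f} bits (narrows to 1 in {2**total:,.0f})")
```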

I’m curious to see if this project advances the conversation. At the very least, I’m gratified to see my abstract ramblings validated by a real-world example!

T2: Judgment Day for Twine?


Nova Spivack, CEO and founder of Radar Networks, just released a preview video announcing Twine 2.0, a semantic search engine to be released later this year. As Erick Schonfeld points out on TechCrunch, Twine hasn’t managed to attract broad adoption. I tried it briefly when it came out, and I have to confess that I never understood it.

But I can certainly see the appeal of delivering faceted search for the web to support exploratory information seeking. It’s the dream that’s been driving Bing and Freebase, not to mention smaller efforts like Kosmix. It’s hard, to be sure. But, as Sarah Lacy tells us, startups are supposed to be changing the world–and established companies can play too.

The demo video is appealing, but I’ll believe it when I can off-road on it–and on more than just recipes and restaurants, two highly structured domains that are already well covered by sites like Food Network and Yelp. Twine doesn’t necessarily have to cover all domains to be useful–perhaps a “short snout” approach like Bing’s will be good enough to drive adoption.

In any case, I’m impressed with Twine’s ambition. But ambition isn’t enough–especially given the increasing number of people and companies who share it. If Nova really wants to build a “World Wide Database”, then he’ll have to do more than swing for the fences and miss. I’ll be waiting for a beta invite, and I’ll let you know what I find out.

Transparent Text Symposium

One of the unexpected benefits of accepting an invitation to speak at SIGMOD 2009 was an invitation from fellow participant Martin Wattenberg to attend the upcoming Transparent Text symposium at the IBM Center for Social Software:

The Transparent Text symposium is a free event that will focus on ways to make large collections of documents understandable to laypeople and experts alike. We are interested in approaches that shed light on unstructured text, ranging from novel statistical techniques to web-based crowdsourcing.

The speaker list is impressive, ranging from familiar (at least to me) interface experts Ben Fry and Marti Hearst to social scientist Gary King and Sunlight Foundation Executive Director Ellen Miller. IBM also contributed some of its own researchers to the program, including David Ferrucci, who has been leading the Jeopardy project. There’s even an “Ignite-style” session where all attendees will have the opportunity to give five-minute presentations.

I’m looking forward to the eclectic mix of speakers and attendees. As Chris Dixon recently reminded us, it’s important to introduce some randomization into our intellectual diets so that we don’t get stuck in a rut of local optimization. While an event with a theme of transparency and interacting with textual information is hardly a detour for me, I am excited about the opportunity to hear a diversity of new perspectives on this topic. There will be videos of the speakers posted after the event, as well as a Twitter stream at #tt09.

Of course, I’ll blog about what I learn and recycle it in the discussion activities at the HCIR workshop next month.

Blogs I Read: The Haystack Blog

It’s been quite the week in tech business news, with Adobe acquiring Omniture, Google acquiring reCAPTCHA and being rumored (falsely) to acquire Brightcove, Facebook announcing that it has over 300M users and is cash-flow positive, and Twitter closing a new round of funding at a $1B valuation. Recession? What recession?

But sometimes I like to get away from all that and turn back to my roots inside the ivory tower. And that leads me to one of my favorite university blogs: the Haystack Blog.

The Haystack Blog is published by faculty and grad students in the MIT Computer Science and AI Lab (CSAIL)–specifically those in the Haystack group. Principal Investigator (and occasional dance instructor) David Karger is its most prolific blogger–you might have read some of his SIGIR 2009 posts or his debate with Stefano Mazzocchi about how to properly use RDF. But other people’s posts are just as interesting–check out the most recent post by Eirik Bakke about bridging the gap between spreadsheets and relational databases.

I wish that more universities and departments would encourage their faculty and students to blog. As Daniel Lemire has pointed out, it’s a great way for academic researchers to get their ideas out and build up their reputations and networks. He should know–he leads by example. Likewise, Haystack is setting a great example for university blogs, and is a credit to MIT and CSAIL.

Udorse: Give Product Placement a Chance

Those of you who don’t live and breathe the software startup scene might not realize that a substantial fraction of Silicon Valley is following TechCrunch50, an annual competition hosted by TechCrunch. As if it weren’t enough to have A-list judges like Marissa Mayer and Paul Graham, there’s even the fortuitous timing of Intuit acquiring 2007 TC50 winner Mint for a respectable $170M.

Here in New York, I have to confess that I haven’t had my eyes glued to the proceedings. But I have been looking at some of the entries, and one that stands out as distinctive is Udorse (and no, I’m not just biased because they’re local). Their premise is simple: democratize product placement through “visual endorsement”. Everyone who shares photos can embed a “udorsement” and either pocket the advertising revenue or donate it to charity. More details from TechCrunch (naturally) and VentureBeat.

Perhaps your reaction is like mine: unsure whether to be awed or horrified by such a simple concept. Indeed, given my penchant for using ad blockers, you might think I’d be ideologically opposed to product placement.

But I’m not, as long as it’s transparent–and, as far as I can tell, Udorse passes that test. In theory, this is advertising done right: content creators monetizing their own content by advertising goods and services they believe in–and putting their own credibility on the line to do so.

Of course, it might turn out very differently in practice. Any way of making money online brings out the worst in people, and I’m sure we’ll see lots of people try to game this service if it takes off. Meanwhile, people like me will probably block the “udorsements” like any other ads.

Or maybe not. I certainly don’t block emails from friends recommending the products they like, and I actually wish it were easier to benefit from their sincere opinions. If Udorse succeeds in a way that feels like word-of-mouth marketing, I’ll be thrilled. I think it’s a long shot, but I’m at least intrigued by their approach.

P.S. No, I wasn’t paid to write this post, nor do I have any stake in Udorse. I at least have to keep my record clean for the Ethics of Blogging panel next week!

Bing Visual Search Beta

Bing launched a Visual Search beta today that is fun to play with. The name may be a bit misleading–this isn’t an image search engine, let alone one that allows you to find images based on visual similarity. Rather, it’s a graphically intensive (don’t forget to install Silverlight!) way to explore a small data collection.

I agree with Elisabeth Osmeloski at Search Engine Land that the galleries included with this beta launch emphasize novelty over utility. Still, it’s nice to see a visual faceted search application for exploring the periodic table. And it’s an interesting example of micro-IR.
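For readers who haven’t encountered the term, faceted search just means filtering a collection along independent attributes and watching the remaining options update as you narrow down. Here’s a bare-bones sketch over a hand-made sample of elements; it has nothing to do with Bing’s actual implementation:

```python
# Minimal faceted filtering over a tiny periodic-table sample.
# The data is a hand-made excerpt; the point is the mechanics of
# narrowing by facet values, not completeness.

elements = [
    {"symbol": "H",  "category": "nonmetal",     "phase": "gas",   "period": 1},
    {"symbol": "He", "category": "noble gas",    "phase": "gas",   "period": 1},
    {"symbol": "Li", "category": "alkali metal", "phase": "solid", "period": 2},
    {"symbol": "Ne", "category": "noble gas",    "phase": "gas",   "period": 2},
]

def facet_counts(items, facet):
    """Count how many items fall under each value of a facet."""
    counts = {}
    for item in items:
        counts[item[facet]] = counts.get(item[facet], 0) + 1
    return counts

def refine(items, **selections):
    """Keep only items matching every selected facet value."""
    return [i for i in items if all(i[f] == v for f, v in selections.items())]

print(facet_counts(elements, "phase"))  # {'gas': 3, 'solid': 1}
gases = refine(elements, phase="gas")
print(facet_counts(gases, "category"))  # {'nonmetal': 1, 'noble gas': 2}
print([e["symbol"] for e in refine(gases, category="noble gas")])  # ['He', 'Ne']
```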

Is Bing Optimizing for the Short Snout?

In a post about Bing on CNET today, Rafe Needleman comments that “it makes business sense to pour resources into popular searches. Optimizing for the short snout pays.”

First, it’s an interesting counterpoint to the conventional wisdom that search (if not the future of business as we know it) is all about the “long tail”. Second, and more importantly, it’s an intriguing claim about Bing’s strategy for differentiating itself from Google.
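The arithmetic behind “optimizing for the short snout pays” is worth spelling out: under the Zipf-like distributions typical of query logs, a tiny fraction of distinct queries accounts for an outsized share of total query volume. A toy calculation (the vocabulary size and exponent are illustrative assumptions, not measured search-log statistics):

```python
# Toy Zipf calculation: what share of total query volume do the most
# popular queries account for? The vocabulary size and exponent are
# illustrative assumptions, not measured search-log statistics.

num_queries = 1_000_000   # distinct queries
exponent = 1.0            # classic Zipf

weights = [1 / (rank ** exponent) for rank in range(1, num_queries + 1)]
total = sum(weights)

for top in (100, 1_000, 10_000):
    share = sum(weights[:top]) / total
    print(f"top {top:>6,} queries ({top / num_queries:.2%} of distinct) "
          f"-> {share:.1%} of volume")
```

Under these assumptions, the top 100 queries alone cover roughly a third of all volume, which is exactly why pouring resources into the head (or snout) can pay off.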

Needleman goes on to say:

I’d wager that this is how Bing is making its gains in market share. Latest Nielsen data says Bing gained 22 percent month-over-month in August, taking it to 10.7 percent of all U.S. searches. People probably try Bing for a travel or product search (where there’s also a cash-back financial kicker) and remember their good experience, and then they try it for more obscure searches and find it good enough. It highlights, I believe, an important flaw in Google’s historic strategy of indexing the entire Web equally well and making the user interface fast and consistent above all, as opposed to specializing as dictated by the query.

While I’ve never heard this claim about Bing before, it is consistent with something I’ve noticed–and with what Nick Craswell said when he talked about Bing at SIGIR 2009. In the upper-left area that Bing calls the table of contents (TOC), it selectively presents a refinement interface based on the entity type it infers for the search query. For example, a search for Argentina returns options that include Argentina Map, Argentina Tourism, and Argentina Culture, while a search for Abraham Lincoln returns options that include Abraham Lincoln Speeches and Abraham Lincoln Facts.
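As a caricature of that kind of logic (the entity table, refinement lists, and lookup below are hypothetical, meant only to illustrate the selective behavior, not how Bing actually works):

```python
# Hypothetical sketch of entity-type-driven refinements: show a table
# of contents only when the query maps to a recognized entity type.
# The entity table and refinement lists are made up for illustration.

ENTITY_TYPES = {
    "argentina": "country",
    "abraham lincoln": "historical figure",
    "toyota": "auto company",
}

REFINEMENTS = {
    "country": ["Map", "Tourism", "Culture"],
    "historical figure": ["Speeches", "Facts"],
    "auto company": ["Models", "Dealers", "Reviews"],
}

def table_of_contents(query):
    """Return refinement links for the query, or none for tail queries."""
    entity_type = ENTITY_TYPES.get(query.lower())
    if entity_type is None:
        return []  # e.g. "faceted search" or "Vespa": plain result list
    return [f"{query} {suffix}" for suffix in REFINEMENTS[entity_type]]

print(table_of_contents("Argentina"))       # ['Argentina Map', 'Argentina Tourism', 'Argentina Culture']
print(table_of_contents("faceted search"))  # []
```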

It’s a nifty feature, even if marketers and reporters have struggled to label it. But, as Needleman says, it does indeed focus on the short snout. For example, there are no TOC options when you search for faceted search, since the technical term doesn’t match a recognized entity type. Searches for names of auto companies, such as Toyota, yield a rich set of options, while those for scooter companies like Vespa do not. Similarly, searches for celebrities receive VIP treatment, as compared to searches for ordinary people that just return a list of search results.

All in all, I’m inclined to agree with Needleman that Bing is focusing on the short snout–and I love that phrase to describe it. The open question is whether he’s right that users “remember their good experience, and then they try it for more obscure searches and find it good enough”. It would be great to see data to confirm or refute that hypothesis.