
An Open Letter to the USPTO

Following the Supreme Court’s decision in Bilski v. Kappos, the United States Patent and Trademark Office (USPTO) plans to release new guidance as to which patent applications will be accepted, and which will not. As part of this process, they are seeking input from the public about how that guidance should be structured. The following is an open letter that I have sent to the USPTO at Bilski_Guidance@uspto.gov. More information is available at http://en.swpat.org/wiki/USPTO_2010_consultation_-_deadline_27_sept and http://www.fsf.org/news/uspto-bilski-guidance. As with all of my posts, the following represents my personal opinion and is not the opinion or policy of my employer.

To whom it may concern at the United States Patent Office:

Since completing my undergraduate studies in mathematics and computer science at the Massachusetts Institute of Technology (MIT) and my doctorate in computer science at Carnegie Mellon University (CMU), I have spent my entire professional life in software research and development. I have worked at large software companies, such as IBM, AT&T, and Google, and I also was a founding employee at Endeca, an enterprise software company where I served as Chief Scientist. I am a named inventor on eight United States patents, as well as on eighteen pending United States patent applications. I played an active role in drafting and prosecuting most of these patents. I have also been involved in defensive patent litigation, which in one case resulted in the re-examination of a patent and a final rejection of most of its claims.

As such, I believe my experience gives me a balanced perspective on the pros and cons of software patents.

As someone who has developed innovative technology, I appreciate the desire of innovators to reap the benefits of their investments. As a founding employee of a venture-backed startup, I understand how venture capitalists and other investors value companies whose innovations are hard to copy. And I recognize how, in theory, software patents address both of these concerns.

But I have also seen how, in practice, software patents are at best a nuisance and innovation tax and at worst a threat to the survival of early-stage companies. In particular, I have witnessed the proliferation of software patents of dubious validity that has given rise to a “vulture capitalist” industry of non-practicing entities (NPEs), colloquially known as patent trolls, who aggressively enforce these patents in order to obtain extortionary settlements. Meanwhile, the software companies where I have worked follow a practice of accumulating patent portfolios primarily in order to use them as deterrents against infringement suits by companies that follow the same strategy.

My experience leads me to conclude that the only beneficiaries of the current regime are patent attorneys and NPEs. All other parties would benefit if software were excluded from patent eligibility. In particular, I don’t believe that software patents achieve either of the two outcomes intended by the patent system: incenting inventors to disclose (i.e., teach) trade secrets, and encouraging investment in innovation.

First, let us consider the incentive to disclose trade secrets. In my experience, software patents fall into two categories. The first category focuses on interfaces or processes, avoiding narrowing the scope to any non-obvious system implementation details. Perhaps the most famous example of a patent in this category is Amazon’s “one-click” patent. The second category focuses on algorithm or infrastructure innovations that are typically implemented inside proprietary closed-source software. An example in this category is the patent on latent semantic indexing, an algorithmic approach used in search and data mining applications. For the first category, patents are hardly necessary to incent disclosure, as the invention must be disclosed to realize its value. Disclosure is meaningful for patents in the second category, but in my experience most companies do not file such patents because they are difficult to enforce. Without access to a company’s proprietary source code, it is difficult to prove that the code infringes a patent. For this reason, software companies typically focus on the first category of patents, rather than the second. And, as noted, this category of innovation requires no incentive for disclosure.

Second, let us ask whether software patents encourage investment in innovation. Specifically, do patents influence decisions by companies, individual entrepreneurs, or investors to invest time, effort, or money in innovation?

My experience suggests that they do not. Companies and entrepreneurs innovate in order to further their business goals and then file patents as an afterthought. Investors expect companies to file patents, but only because everyone else is doing it, and thus patents offer only the limited deterrent value cited above. In fact, venture capitalists investing in software companies are some of the strongest voices in favor of abolishing software patents. Here are some examples:

Chris Dixon, co-founder of software companies SiteAdvisor and Hunch and of seed-stage venture capital fund Founder Collective, says:

Perhaps patents are necessary in the pharmaceutical industry. I know very little about that industry but it would seem that some sort of temporary grants of monopoly are necessary to compel companies to spend billions of dollars of upfront R&D.

What I do know about is the software/internet/hardware industry. And I am absolutely sure that if we got rid of patents tomorrow innovation wouldn’t be reduced at all, and the only losers would be lawyers and patent trolls.

Ask any experienced software/internet/hardware entrepreneur if she wouldn’t have started her company if patent law didn’t exist. Ask any experienced venture investor if the non-existence of patent law would have changed their views on investments they made. The answer will invariably be no (unless their company was a patent troll or something related).

http://cdixon.org/2009/09/24/software-patents-should-be-abolished/

Brad Feld, co-founder of early-stage venture capital firms Foundry Group, Mobius Venture Capital, and TechStars, says:

I personally think software patents are an abomination. My simple suggestion on the panel was to simply abolish them entirely. There was a lot of discussion around patent reform and whether we should consider having different patent rules for different industries. We all agreed this was impossible – it was already hard enough to manage a single standard in the US – even if we could get all the various lobbyists to shut up for a while and let the government figure out a set of rules. However, everyone agreed that the fundamental notion of a patent – that the invention needed to be novel and non-obvious – was at the root of the problem in software.

I’ve skimmed hundreds of software patents in the last decade (and have read a number of them in detail.) I’ve been involved in four patent lawsuits and a number of “threats” by other parties. I’ve had many patents granted to companies I’ve been an investor in. I’ve been involved in patent discussions in every M&A transaction I’ve ever been involved in. I’ve spent more time than I care to on conference calls with lawyers talking about patent issues. I’ve always wanted to take a shower after I finished thinking about, discussing, or deciding how to deal with something with regard to a software patent.

I’ll pause for a second, take a deep breath, and remind you that I’m only talking about software patents. I don’t feel qualified to talk about non-software patents. However, when you consider the thought that a patent has to be both novel AND non-obvious (e.g. “the claimed subject matter cannot be obvious to someone else skilled in the technical field of invention”), 99% of all software patents should be denied immediately. I’ve been in several situations where either I or my business partner at the time (Dave Jilk) had created prior art a decade earlier that – if the patent that I was defending against ever went anywhere – would have been used to invalidate the patent.

http://www.feld.com/wp/archives/2006/04/abolish-software-patents.html

Fred Wilson, managing partner of venture capital firm Union Square Ventures, says:

Even the average reader of the Harvard Business Review has a gut appreciation for the fundamental unfairness of software patents. Software is not the same as a drug compound. It is not a variable speed windshield wiper. It does not cost millions of dollars to develop or require an expensive approval process to get into the market. When it is patented, the “invention” is abstracted in the hope of covering the largest possible swath of the market. When software patents are prosecuted, it is very often against young companies that independently invented their technology with no prior knowledge of the patent.

http://www.unionsquareventures.com/2010/02/software-patents-are-the-problem-not-the-answer.php

In summary, software patents act as an innovation tax rather than a catalyst for innovation. Perhaps it is possible to resolve the problems of software patents through aggressive reform. But it would be better to abolish software patents than to maintain the status quo.

Sincerely,

Daniel Tunkelang


Search at the Speed of Thought

A guiding principle in information technology has been to enable people to perform tasks at the “speed of thought”. The goal is not just to make people more efficient in their use of technology, but to remove the delays and distractions that make them focus on the technology rather than on the tasks themselves.

For example, the principal motivation for the faceted search work I did at Endeca was to eliminate hurdles that discourage people from exploring information spaces. Most sites already offered users the ability to perform this exploration through advanced or parametric search interfaces–indeed, I recall some critics of faceted search objecting that it was nothing new. But there’s a reason that most of today’s consumer-facing sites place faceted search front and center while still relegating advanced search interfaces to an obscure page for power users. Faceted search offers users the fluidity and instant feedback that make exploration natural. Once you’re used to it, it’s hard to live without it, whether you’re looking for real estate (compare Zillow.com to housing search on craigslist), library books (compare the Triangle Research Libraries Network to the Library of Congress), or art (compare art.com to artnet).

Why is faceted search such a significant improvement over advanced or parametric search interfaces? Because it supports exploration at the speed of thought. If it takes you several seconds–rather than a single click–to refine a query, and if you have to repeatedly back off from pages with no results (aka dead ends), your motivation to explore a document collection fades quickly. But when that experience is fluid, you explore without even thinking about it. That is the promise (admittedly not always fulfilled) of faceted search.
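To make that contrast concrete, here is a minimal sketch (my own illustration in Python, not anything Endeca shipped) of the mechanism that makes dead ends avoidable: the engine counts, for every candidate refinement of the current result set, how many results it would return, and the interface offers only refinements with nonzero counts.

```python
from collections import Counter

# Toy document collection: each document carries values for a few facets.
docs = [
    {"title": "Loft in SoHo", "borough": "Manhattan", "bedrooms": 1},
    {"title": "Park Slope co-op", "borough": "Brooklyn", "bedrooms": 2},
    {"title": "Williamsburg studio", "borough": "Brooklyn", "bedrooms": 0},
]

def facet_counts(results, facet):
    """Count how many documents in the current result set carry each facet value."""
    return Counter(doc[facet] for doc in results)

def refine(results, facet, value):
    """Narrow the current result set to documents with the selected facet value."""
    return [doc for doc in results if doc[facet] == value]

# The interface shows only refinements with nonzero counts, so every click
# is guaranteed to land on results -- no dead ends, no backing out.
current = docs
print(facet_counts(current, "borough"))   # Counter({'Brooklyn': 2, 'Manhattan': 1})
current = refine(current, "borough", "Brooklyn")
print(facet_counts(current, "bedrooms"))  # Counter({2: 1, 0: 1})
```

The data structures here are beside the point; the contract is what matters: every refinement the user sees is guaranteed to be productive.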

Microsoft Live Labs director Gary Flake offered a similar message in his SIGIR 2010 keynote. He argued that we need to move from our current discrete interactions with search engines to a mode of continuous, fluid interaction in which the whole of the data is greater than the sum of its parts. While he offered Microsoft’s Pivot client as an example of this vision, he could also have invoked the title of a book that Bill Gates wrote in 1999: Business @ the Speed of Thought. Indeed, anyone who has ever worked on data analysis understands that you ask fewer questions when you know you’ll have to wait for answers. Speed changes the way you interact with information.

And at Google, speed has been an obsession since day one. It makes the top 3 on the “Ten things we know to be true” list:

3. Fast is better than slow.

We know your time is valuable, so when you’re seeking an answer on the web you want it right away – and we aim to please. We may be the only people in the world who can say our goal is to have people leave our website as quickly as possible. By shaving excess bits and bytes from our pages and increasing the efficiency of our serving environment, we’ve broken our own speed records many times over, so that the average response time on a search result is a fraction of a second. We keep speed in mind with each new product we release, whether it’s a mobile application or Google Chrome, a browser designed to be fast enough for the modern web. And we continue to work on making it all go even faster.

People have made much of Google VP Marissa Mayer’s estimate that Google Instant will save 350 million hours of users’ time per year by shaving two to five seconds per search. That’s an impressive number, but I personally think it understates the impact of this interface change. Rather, I’m inclined to focus on a phrase I’ve seen repeatedly associated with Google Instant: “search at the speed of thought”.
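For what it’s worth, the 350-million-hour figure is consistent with a quick back-of-the-envelope check. The query volume below is my own rough assumption, not an official number:

```python
searches_per_day = 1e9          # assumed order of magnitude, not a published figure
seconds_saved_per_search = 3.5  # midpoint of the quoted two-to-five-second range

hours_saved_per_year = searches_per_day * 365 * seconds_saved_per_search / 3600
print(f"~{hours_saved_per_year / 1e6:.0f} million hours per year")  # ~355 million
```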

What does that mean in practice? I see two major wins from Google Instant:

1) Typing speed and spelling accuracy don’t get in the way. For example, by the time you’ve typed [m n], you see results for M. Night Shyamalan, a name whose length and spelling might frustrate even his fans. A search for [marc z] offers results for Facebook CEO Mark Zuckerberg. Admittedly, the pre-Instant type-ahead suggestions already got us most of the way there, but the feedback of actual results offers not just guidance but certainty. (A toy sketch of the prefix-matching idea appears after this list.)

2) Users spend less time–and hopefully no time–in a limbo where they don’t know if the system has understood the information-seeking intent they have expressed as a query. For example, if I’m interested in learning more about the Bob Dylan song “Forever Young”, I might enter [forever young] as a search query–indeed, the suggestion shows up as soon as I’ve typed in “fore”. But a glance at the first few instant results for [forever young] makes it clear that there are lots of songs by this title (including those by Rod Stewart and Alphaville–as well as the recent Jay Z song “Young Forever” that reworks the latter). Realizing that my query is ambiguous, I type the single letter “d” and instantly see results for the Dylan song. Yes, I could have backed out from an unsuccessful query and then tried again, but instant feedback means far less frustration.
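Here is the promised toy sketch of the prefix-matching idea behind both wins. It is purely illustrative (the completions and popularity scores are made up), and it ignores spelling correction, ranking signals, and everything else that makes the real feature work:

```python
# Map a partial query to the most popular matching completion and show its
# results immediately. Completions and popularity scores are invented.
completions = {
    "m night shyamalan": 95,
    "mark zuckerberg": 90,
    "forever young bob dylan": 60,
    "forever young rod stewart": 55,
}

def instant_completion(prefix):
    """Return the highest-scoring completion in which every typed word is a
    prefix of some word in the completion (a deliberately loose match)."""
    typed = prefix.lower().split()
    def matches(candidate):
        words = candidate.split()
        return all(any(w.startswith(t) for w in words) for t in typed)
    candidates = [c for c in completions if matches(c)]
    return max(candidates, key=completions.get, default=None)

# Handling misspellings like [marc z] -> Mark Zuckerberg would require fuzzy
# matching, which is omitted here.
print(instant_completion("m n"))              # m night shyamalan
print(instant_completion("forever young"))    # forever young bob dylan (most popular)
print(instant_completion("forever young d"))  # forever young bob dylan (disambiguated)
```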

Google Instant also makes it a little easier for users to explore the space of queries related to their information need, but exploration through instant suggestions is very limited compared to using related searches or the wonder wheel–let alone what we might be able to do with faceted web search. I’d love to see this sort of exploration become more fluid, but I recognize the imperative to maintain the simplicity of the search box. Good for us HCIR folks to know that there’s still lots of work to do on search interface innovation!

But, in short, speed matters. Instant communication has transformed the way we interact with one another–both personally and professionally. Instant search is more subtle, but I think it will transform the way we interact with information on the web. I am very proud of my colleagues’ collective effort to make it possible.


New Web Site for HCIR Workshop

In 2007, I persuaded MIT graduate students Michael Bernstein and Robin Stewart (who was interning at Endeca that summer) to help organize the first Workshop on Human-Computer Interaction and Information Retrieval (HCIR 2007), which we held at MIT and Endeca. Its success convinced us to keep going, and we enjoyed record attendance at this year’s HCIR 2010, held at Rutgers University.

As the workshop has grown, we as organizers have realized that we need to invest a little in its online presence. A first step in that direction is a new site for the workshop: http://hcir.info/. The site contains all of the proceedings from the four annual workshops in one place. It is powered by Google Sites, which will make it easy for a bunch of us (and perhaps some of you) to collaboratively maintain it.

I hope everyone here finds the new site useful. Please feel free to come forward with ideas for improving it! But be warned–if you have a great idea, I might ask you to implement it yourself.


David Petrou Presents Google Goggles at NY Tech Meetup

Image recognition is one of those problems that has presented long-standing challenges to computer scientists, despite being taken for granted by science fiction writers. Google Goggles represents one of the most audacious efforts to implement image recognition on a massive scale.

Tonight, I had the pleasure of watching my colleague, David Petrou, present a live demo of Goggles to about 800 people who filled the NYU Skirball Center to attend the NY Tech Meetup. Many thanks to Nate Westheimer and Brandon Diamond for giving Google the opportunity to present this cool technology to a very engaged audience and in particular to show off some of the technology that Googlers are building here in New York City.

You can’t see the live demo in the slides, so I encourage you to view a recording of the presentation here.

Also, if you’re in the New York area and interested in hearing about upcoming Google NYC events, please sign up at http://bit.ly/googlenycevents.


Slouching Towards Creepiness

One of the perks of blogging is that publishers sometimes send me review copies of new books. I couldn’t help but be curious about a book entitled “The Man Who Lied to His Laptop: What Machines Teach Us About Human Relationships”–especially when principal author Clifford Nass is the director of the Communications between Humans and Interactive Media (CHIMe) Lab at Stanford. He wrote the book with Corina Yen, the editor-in-chief of Ambidextrous, Stanford’s journal of design.

They start the book by reviewing evidence that people treat computers as social actors. Nass writes:

to make a discovery, I would find any conclusion by a social science researcher and change the sentence “People will do X when interacting with other people” to “People will do X when interacting with a computer”

They then apply this principle by using computers as confederates in social science experiments and generalizing conclusions about human-computer interaction to human-human interaction. It’s an interesting approach, and they present results about how people respond to praise and criticism, similar/opposite personalities, etc. You can get a taste of Nass’s writing from an article he published in the Wall Street Journal entitled “Sweet Talking Your Computer”.

The book is interesting and entertaining, and I won’t try to summarize all of its findings here. Rather, I’d like to explore its implications.

Applying the “computers are social actors” principle, they cite a variety of computer-aided experiments that explore people’s social behaviors. For example, they cite a Stanford study on how “Facial Similarity Between Voters and Candidates Causes Influence”, in which secretly morphing a photo of a candidate’s face to resemble the voter’s face induces a significantly positive effect on the voter’s preference. They also cite another experiment on similarity attraction that varies a computer’s “personality” to be either similar or opposite to that of the experimental subject. A similar personality draws a more positive response than an opposite one, but the most positive response comes when the computer starts off with an opposite personality and then adapts to conform to the personality of the subject. Imitation is flattery, and–as yet another of their studies shows–flattery works.

It’s hard for me to read results like these and not see creepy implications for personalized user interfaces. When I think about the upside of personalization, I envision a happy world where we see improvement in both effectiveness and user satisfaction. But clearly there’s a dark side where personalization takes advantage of knowledge about users to manipulate their emotional response. While such manipulation may not be in the users’ best interests, it may leave them feeling more satisfied. Where do we draw the line between user satisfaction and manipulation?

I’m not aware of anyone using personalization this way, but I think it’s only a matter of time before we see people try. It’s not hard to learn about users’ personalities (especially when so many like taking quizzes!), and apparently it’s easy to vary the personality traits that machines project in generated text, audio, and video. How long will it be before people put these together? Perhaps we are already there.

O brave new world that has such people and machines in it. Shakespeare had no idea.


HCIR 2010: Bigger and Better than Ever!

Last Sunday was HCIR 2010, the Fourth Annual Workshop on Human-Computer Interaction and Information Retrieval, held at Rutgers University in New Brunswick, collocated with the Information Interaction in Context Symposium (IIiX 2010).

With 70 registered attendees, it was the biggest HCIR workshop we have held. Rutgers was a gracious host, providing space not only for the all-day workshop but also for a welcome reception the night before.

And, based on an informal survey of participants, I can say with some semblance of objectivity that this was the best HCIR workshop to date.

The opening “poster boaster” session was particularly energetic. There was no award for best boaster, but Cathal Hoare won an ovation by delivering his boaster as a poem:

If a picture is worth a thousand words
Surely to query formulation a photo affords
The ability to ask ‘what is that’ in ways that are many
But for years we have asked how can-we
Narrow the search space so that in reasonable time
We can use images to answer questions that are yours and mine
In my humble poster I will describe
How recent technology and users prescribe
A solution that allows me to point and click
And get answers so that I don’t feel so thick
About my location and my environment
And to my touristic explorations bring some enjoyment
Now if after all that you feel rather dazed
Please come by my poster and see if you are amazed….

As in past years, we enlisted a rock-star keynote speaker–this time, Google UX researcher Dan Russell. His slides hardly do justice to his talk–especially without the audio and video–but I’ve embedded them here so that you can get a flavor for his presentation on how we need to do more to improve the searcher.

[Embedded slides: Dan Russell’s HCIR 2010 keynote on search quality and user happiness (SlideShare)]

We accepted six papers for the presentation sessions–sadly, one of the presenters could not make it because of visa issues. The five presentations covered a variety of topics relating to tools, models, and evaluation for HCIR. The most intriguing of these (to me, at least) was a presentation by Max Wilson about “casual-leisure searching”–which he argues breaks our current models of exploratory search. Check out the slides below, as well as Erica Naone’s article in Technology Review on “Searching for Fun”.

[Embedded slides: Max Wilson’s HCIR 2010 presentation on casual-leisure search (SlideShare)]

As always, the poster session was the most interactive. Part of the energy came from HCIR Challenge participants showing off their systems in advance of the final session that would decide which of them would win. In any case, I felt like a heel having to walk through the hall of posters three times in order to herd people back to their seats.

Which brings us to the Challenge. When I first suggested the idea of a competition or challenge to my co-organizers back in February, I wasn’t sure we could pull it off. Indeed, even after we managed to obtain the use of the New York Times Annotated Corpus (thank you, LDC!) and a volunteer to set up a baseline system in Solr (thank you, Tommy!), I still worried that we’d have a party and no one would come. So I was delighted to see six very credible entries competing for the “people’s choice” award.

All of the participants offered interesting ideas: custom facets, visualization of the associations between relevant terms, multi-document summarization to catch up on a topic, and combining topic modeling with sentiment analysis to analyze competing perspectives on a controversial issue. The winning entry, presented by Michael Matthews of Yahoo! Labs Barcelona, was the Time Explorer. As its name suggests, it allows users to see the evolution of a topic over time. A cool feature is that it parses absolute and relative dates from article text–in some cases resolving references to past or future times outside the publication span of the collection. Moreover, the temporal visualization of topics allows users to discover unexpected relationships between entities at particular points in time, e.g., between Slobodan Milosevic and Saddam Hussein. You can read more about it in Tom Simonite’s Technology Review article, “A Search Service that Can Peer into the Future”.
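The date handling is the part that intrigued me most. As a rough illustration of what resolving absolute and relative dates against an article’s publication date involves (my own toy sketch using only the standard library, not the Yahoo! team’s implementation):

```python
import re
from datetime import date, timedelta

MONTHS = {m: i + 1 for i, m in enumerate(
    ["January", "February", "March", "April", "May", "June", "July",
     "August", "September", "October", "November", "December"])}

def resolve_dates(text, publication_date):
    """Extract a few simple absolute and relative date expressions and anchor
    the relative ones to the article's publication date."""
    resolved = []
    # Absolute dates such as "March 5, 2003".
    pattern = r"(" + "|".join(MONTHS) + r")\s+(\d{1,2}),\s+(\d{4})"
    for month, day, year in re.findall(pattern, text):
        resolved.append(date(int(year), MONTHS[month], int(day)))
    # A couple of relative expressions, anchored to the publication date.
    if "yesterday" in text.lower():
        resolved.append(publication_date - timedelta(days=1))
    if "next year" in text.lower():
        resolved.append(date(publication_date.year + 1, 1, 1))
    return resolved

article = "The trial, which began March 5, 2003, may not conclude until next year."
print(resolve_dates(article, publication_date=date(2003, 3, 10)))
# [datetime.date(2003, 3, 5), datetime.date(2004, 1, 1)]
```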

In short, HCIR 2010 will be a tough act to follow. But we’re already working on it. Watch this space…


Exploring Nuggetize

I’ve been exchanging emails with Dhiti co-founder Bharath Mohan about Nuggetize, an intriguing interface that surfaces “nuggets” from a site to reduce the user’s cost of exploring a document collection. Specifically, Nuggetize targets research scenarios where users are likely to assemble a substantial reading list before diving into it. You can try Nuggetize on the general web or on a particular site that has been “nuggetized”, e.g., a blog like this one or Chris Dixon’s.

I’m always happy to see people building systems that explicitly support exploratory search (and am looking forward to seeing the HCIR Challenge entries in a week!). Regular readers may recall my coverage of Cuil, Kosmix, and Duck Duck Go. And of course I helped build a few of my own at Endeca. So what’s special about Nuggetize?

Mohan describes it as a faceted search interface for the web. I’ll quibble here–the interface offers grouped refinement options, but the groups don’t really strike me as facets. Moreover, the interface isn’t really designed to explore intersections of the refinement options–rather, at any given time, you see the intersection of the initial search and a currently selected refinement. But it is certainly an interface that supports query refinement and exploration.

The more interesting features are the nuggets and the support for relevance feedback.

The nuggets are full sentences, and thus feel quite different from conventional search-engine snippets. Conventional snippets serve primarily to provide information scent, helping users quickly determine the utility of a search result without the cost of clicking through to it and reading it. In contrast, the nuggets are document fragments that are sufficiently self-contained to communicate a coherent thought. The experience suggests passage retrieval rather than document retrieval.
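To make the passage-retrieval intuition concrete, here is a deliberately crude sketch (my own toy version, not Dhiti’s algorithm): score individual sentences against the query and return the best-scoring sentences as nuggets, rather than ranking whole documents.

```python
import re

def nuggets(documents, query, k=3):
    """Return the k sentences that share the most terms with the query --
    passage retrieval in miniature, as opposed to ranking whole documents."""
    query_terms = set(query.lower().split())
    scored = []
    for doc in documents:
        for sentence in re.split(r"(?<=[.!?])\s+", doc):
            terms = set(re.findall(r"\w+", sentence.lower()))
            overlap = len(terms & query_terms)
            if overlap:
                scored.append((overlap, sentence))
    return [s for _, s in sorted(scored, key=lambda pair: -pair[0])[:k]]

docs = [
    "Faceted search supports exploration. It reduces dead ends for impatient users.",
    "Relevance feedback lets users rate results. Exploration benefits from fluid interaction.",
]
print(nuggets(docs, "exploration dead ends", k=2))
```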

The relevance feedback is explicit: users can thumbs-up or thumbs-down results. After supplying feedback, users can refresh their results (which re-ranks them) and are also presented with suggested categories to use for feedback (both positive and negative). Unfortunately, the research on relevance feedback tells us that, helpful as it could be in improving the user experience, users don’t bite. But perhaps users in research scenarios will give it a chance–especially with the added expressiveness and transparency of combining document and category feedback.
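I don’t know what Nuggetize actually does with the feedback, but the textbook way to fold thumbs-up/thumbs-down judgments into re-ranking is a Rocchio-style update: move the query vector toward liked documents and away from disliked ones. A minimal sketch under that assumption:

```python
import numpy as np

def rocchio(query_vec, liked, disliked, alpha=1.0, beta=0.75, gamma=0.15):
    """Classic Rocchio update: shift the query vector toward the centroid of
    liked documents and away from the centroid of disliked ones."""
    updated = alpha * query_vec
    if len(liked):
        updated = updated + beta * np.mean(liked, axis=0)
    if len(disliked):
        updated = updated - gamma * np.mean(disliked, axis=0)
    return np.clip(updated, 0.0, None)  # negative term weights are usually dropped

# Toy four-term vocabulary; rows are tf-idf-style document vectors.
query = np.array([1.0, 0.0, 0.0, 0.0])
liked = np.array([[0.9, 0.8, 0.0, 0.0]])     # thumbs-up
disliked = np.array([[0.0, 0.0, 1.0, 0.7]])  # thumbs-down
print(rocchio(query, liked, disliked))       # weight shifts toward the liked terms
```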

Overall it is a slick interface, and it’s nice seeing the various ideas Mohan and his colleagues put together. There’s certainly room for improvement–particularly in the quality of the categories, which sometimes feel like victims of polysemy. Open-domain information extraction is hard! Some would even call it a grand challenge.

Mohan reads this blog (he reached out to me a few months ago via a comment), and I’m sure he’d be happy to answer questions here.


Taking Blekko out for a Spin

[Embedded video: TechCrunch’s coverage of Blekko]

If you’re a search engine junkie like me, you’ve probably heard about Blekko, a search engine that has been percolating for over two years and recently launched a private beta. If not, I encourage you to watch the TechCrunch video I’ve embedded above. You can join the beta by following them on Twitter. I did that earlier this week, and my invitation arrived via a direct message the next day.

Blekko’s main differentiating feature is that it supports “slashtags”. These aren’t the same as the Twitter microsyntax proposed by Chris Messina and named by Chris Blow. Rather, they are a way for users to “spin” their search results using a variety of filters. For example, [climate /liberal] and [climate /conservative] return very different results, because they are restricted to different sets of sites.

In addition to providing a set of curated slashtags, Blekko allows users to define their own slashtags by specifying the sets of sites to be included. There’s a social aspect here too: you can use (and follow) other users’ slashtags. Blekko also has some special slashtags that don’t act as site filters, e.g., /date shows recent results and /seo offers indexing information about web sites.
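Mechanically, a site-restricting slashtag is just a whitelist applied to the result set. A minimal sketch of the idea (my own illustration with made-up slashtag definitions, not Blekko’s implementation):

```python
from urllib.parse import urlparse

# Hypothetical slashtag definitions: each tag maps to a set of allowed hosts.
slashtags = {
    "liberal": {"huffingtonpost.com", "motherjones.com"},
    "conservative": {"nationalreview.com", "weeklystandard.com"},
}

def host(url):
    """Normalize a URL to its host, dropping a leading 'www.'."""
    h = urlparse(url).netloc.lower()
    return h[4:] if h.startswith("www.") else h

def apply_slashtag(results, tag):
    """Keep only the results whose host belongs to the slashtag's site list."""
    allowed = slashtags[tag]
    return [r for r in results if host(r["url"]) in allowed]

results = [
    {"title": "Climate skeptics respond", "url": "http://www.nationalreview.com/some-article"},
    {"title": "Climate crisis deepens", "url": "http://www.huffingtonpost.com/some-entry"},
]
print(apply_slashtag(results, "liberal"))  # only the Huffington Post result survives
```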

Blekko emphasizes two characteristics that I find very appealing: transparency and user control. While they do not disclose their relevance ranking algorithm, they do expose some of the information they use to compute it. More significantly, their emphasis on slashtags de-emphasizes default ranking and instead encourages users to take more responsibility in the information-seeking process. Very HCIR!

I like the concept. But I’m not sure how I feel about the execution. I have three main concerns.

First, the set of slashtags is somewhat haphazard–to be expected in a beta, but I’m not sure how it will evolve. I’d love to see a vocabulary collectively (and transparently) curated like Wikipedia, but I fear it will look more like social tagging site Delicious, which is a case study in the “vocabulary problem”. As any information scientist can tell you, managing vocabularies is hard!

Second, I’m not sure if site filters are the right model. What happens to sites with heterogeneous content? Or to sites that have one-hit wonders and therefore are unlikely to show up in any slashtags? I’d prefer to see the sites used as seeds to train classifiers that could then be applied to the entire index. Something a bit more like what Miles Efron implemented in this research–only on a much larger scale and applied at a page rather than site level.
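What I have in mind is roughly the following sketch (using scikit-learn purely for illustration; nothing here is based on what Blekko or Efron actually built): pages crawled from a slashtag’s seed sites become positive training examples, pages from elsewhere become negatives, and the resulting classifier can then score any page in the index regardless of which site it came from.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB

# Hypothetical training data: page text labeled 1 if it came from a seed site.
pages = [
    ("Senate debates new climate legislation and emissions caps", 1),
    ("Global temperature trends and carbon policy analysis", 1),
    ("Best chocolate chip cookie recipe for the holidays", 0),
    ("Local team wins the championship in overtime thriller", 0),
]
texts, labels = zip(*pages)

vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(texts)
classifier = MultinomialNB().fit(X, labels)

# Any page can now be scored, even one from a site with a single relevant article.
new_page = ["Committee hears testimony on emissions and climate policy"]
print(classifier.predict(vectorizer.transform(new_page)))  # expected: [1]
```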

Third, I think transparency and user control need a complementary ingredient: guidance. As a user, I need to know which slashtags would lead me to interesting results, and ideally I’d want some kind of preview to make exploration as low-cost as possible.

I know I’m asking for a lot–especially from an ambitious startup that has just launched its private beta. But I think the stakes are high in this space, and going easy on a newcomer is no favor. I offer the tough love of a critic who would really like to see this kind of vision succeed.


HCIR 2010 Accepted Papers

The 4th Workshop on Human-Computer Interaction and Information Retrieval (HCIR 2010) is coming up on August 22 in New Brunswick, NJ, taking place immediately after the Information Interaction in Context conference (IIiX 2010). That’s just a few weeks away!

If you are interested in attending and haven’t already registered, please let me know as soon as possible via email or Twitter (speaking of which, follow the #hcir2010 hashtag). We’re making the remaining slots available to the community on a first-come, first-served basis.

Google user experience researcher Dan Russell will be delivering this year’s keynote on “Why is search sometimes easy and sometimes hard? Understanding serendipity and expertise in the mind of the searcher”.

Here is the list of accepted papers:

Oral Presentations

  • VISTO: for Web Information Gathering and Organization
    Anwar Alhenshiri, Carolyn Watters, and Michael Shepherd (Dalhousie University)
  • Time-based Exploration of News Archives
    Omar Alonso (Microsoft Corporation), Klaus Berberich (Max-Planck Institute for Informatics), Srikanta Bedathur (Max-Planck Institute for Informatics), and Gerhard Weikum (Max-Planck Institute for Informatics)
  • Combining Computational Analyses and Interactive Visualization to Enhance Information Retrieval
    Carsten Goerg, Jaeyeon Kihm, Jaegul Choo, Zhicheng Liu, Sivasailam Muthiah, Haesun Park, and John Stasko (Georgia Institute of Technology)
  • Impact of Retrieval Precision on Perceived Difficulty and Other User Measures
    Mark Smucker and Chandra Prakash Jethani (University of Waterloo)
  • Exploratory Searching As Conceptual Exploration
    Pertti Vakkari (University of Tampere)
  • Casual-leisure Searching: The Exploratory Search Scenarios that Break our Current Models
    Max L. Wilson (Swansea University) and David Elsweiler (University of Erlangen)

HCIR Challenge Reports

  • Search for Journalists: New York Times Challenge Report
    Corrado Boscarino, Arjen P. de Vries, and Wouter Alink (Centrum Wiskunde and Informatica)
  • Exploring the New York Times Corpus with NewsClub
    Christian Kohlschütter (Leibniz Universität Hannover)
  • Searching Through Time in the New York Times
    Michael Matthews, Pancho Tolchinsky, Roi Blanco, Jordi Atserias, Peter Mika, and Hugo Zaragoza (Yahoo! Labs)
  • News Sync: Three Reasons to Visualize News Better
    V.G. Vinod Vydiswaran (University of Illinois), Jeroen van den Eijkhof (University of Washington), Raman Chandrasekar (Microsoft Research), Ann Paradiso (Microsoft Research), and Jim St. George (Microsoft Research)
  • Custom Dimensions for Text Corpus Navigation
    Vladimir Zelevinsky (Endeca Technologies)
  • A Retrieval System Based on Sentiment Analysis
    Wei Zheng and Hui Fang (University of Delaware)

Research Posters

  • Improving Web Search for Information Gathering: Visualization in Effect
    Anwar Alhenshiri, Carolyn Watters, and Michael Shepherd (Dalhousie University)
  • User-oriented and Eye-Tracking-based Evaluation of an Interactive Search System
    Thomas Beckers and Norbert Fuhr (University of Duisburg-Essen)
  • Exploring Combinations of Sources for Interaction Features for Document Re-ranking
    Emanuele Di Buccio (University of Padua), Massimo Melucci (University of Padua), and Dawei Song (The Robert Gordon University)
  • Extracting Expertise to Facilitate Exploratory Search and Information Discovery: Combining Information Retrieval Techniques with a Computational Cognitive Model
    Wai-Tat Fu and Wei Dong (University of Illinois at Urbana-Champaign)
  • An Architecture for Real-time Textual Query Term Extraction from Images
    Cathal Hoare and Humphrey Sorensen (University College Cork)
  • Transaction Log Analysis of User Actions in a Faceted Library Catalog Interface
    Bill Kules (The Catholic University of America), Robert Capra (University of North Carolina at Chapel Hill), and Joseph Ryan (North Carolina State University Libraries)
  • Context in Health Information Retrieval: What and Where
    Carla Lopes and Cristina Ribeiro (University of Porto)
  • Tactics for Information Search in a Public and an Academic Library Catalog with Faceted Interfaces
    Xi Niu and Bradley M. Hemminger (University of North Carolina at Chapel Hill)

Position Papers

  • Understanding Information Seeking in the Patent Domain and its Impact on the Interface Design of IR Systems
    Daniela Becks, Matthias Görtz, and Christa Womser-Hacker (University of Hildesheim)
  • Better Search Applications Through Domain Specific Context Descriptions
    Corrado Boscarino, Arjen P. de Vries, and Jacco van Ossenbruggen (Centrum Wiskunde and Informatica)
  • Layered, Adaptive Results: Interaction Concepts for Large, Heterogeneous Data Sets
    Duane Degler (Design for Context)
  • Revisiting Exploratory Search from the HCI Perspective
    Abdigani Diriye (University College London), Max L. Wilson (Swansea University), Ann Blandford (University College London), and Anastasios Tombros (Queen Mary University London)
  • Supporting Task with Information Appliances: Taxonomy of Needs
    Sarah Gilbert, Lori McCay-Peet, and Elaine Toms (Dalhousie University)
  • A Proposal for Measuring and Implementing Group’s Affective Relevance in Collaborative Information Seeking
    Roberto González-Ibáñez and Chirag Shah (Rutgers University)
  • Evaluation of Music Information Retrieval: Towards a User-Centered Approach
    Xiao Hu (University of Illinois at Urbana Champaign) and Jingjing Liu (Rutgers University)
  • Information Derivatives: A New Way to Examine Information Propagation
    Chirag Shah (Rutgers University)
  • Implicit Factors in Networked Information Feeds
    Fred Stutzman (University of North Carolina at Chapel Hill)
  • Improving the Online News Experience
    V. G. Vinod Vydiswaran (University of Illinois) and Raman Chandrasekar (Microsoft Research)
  • Breaking Down the Assumptions of Faceted Search
    Vladimir Zelevinsky (Endeca Technologies)
  • A Survey of User Interfaces in Content-based Image Search Engines on the Web
    Danyang Zhang (The City University of New York)

You can also download the full proceedings here.


Overcoming Spammers in Twitter

[Embedded slides: “Overcoming Spammers in Twitter: A Tale of Five Algorithms” (CERI 2010, SlideShare)]

As I blogged a few months ago, University of Oviedo professor Daniel Gayo-Avello published a research paper entitled “Nepotistic Relationships in Twitter and their Impact on Rank Prestige Algorithms”, in which he concluded that TunkRank was the best of the measures he studied for ranking Twitter users. I recently discovered that he and David Brenes posted slides from their presentation at CERI 2010 on “Overcoming Spammers in Twitter”. Enjoy!