Categories
Uncategorized

UIE Virtual Seminar on Faceted Search

My colleague, Endeca co-founder Pete Bell, and I are giving a virtual seminar on faceted search next week for User Interface Engineering (UIE). It’s on Thursday, August 20th at 1:30PM EST. The regular price is $129, but Noisy Channel readers who are interested in attending can get a $30 discount by using TUNKELANG (yes, all caps) as a promo code. Attendees also receive a free copy of my book, Faceted Search.

Whether or not you can attend, I do encourage you to check out the UIE site. It’s got a lot of free, useful content, and Jared Spool is definitely someone worth following if you are interested in web usability.

Will Browsers Ship With Ad Blockers?

A while ago, I wrote a post entitled “Think Evil” in which I mused that:

A few years ago, when it became clear that Microsoft was losing the search wars to Google–but when they hadn’t lost much browser market share to Firefox–I thought they should have used a scorched earth strategy of including an ad-blocker in Internet Explorer. The ad blocker would be on by default and would block all ads, including sponsored links from search engines. Actually, I can’t bring myself to consider this particular approach evil–from my perspective, the means would justify the end.

I guess I’m not the only person with such musings. In a post with the descriptive (if uncreative) title “In five years all browsers will block internet advertisements by default”, Orin Thomas argues:

People have become conditioned to accessing content for free on the Internet and people also don’t want to see advertisements on the Internet. At some point in the not too distant future, Ad blocking will become a necessary browser feature like Tabs are today. Any browser that does not include the feature will suffer a dramatic downturn in market share as people move to platforms that “block those darn advertisements”. Within five years, all browsers will block advertisements by default because, in the end, it is a feature that most people want.

I’d like to believe that he’s right, but I’m pretty sure I made similar claims at least five years ago, and I’m not aware of even a niche browser that ships with a built-in ad blocker.

I’m curious what readers think. Is it a matter of time before we see another arms race, like we had a few years ago over pop-up ads? Or, as one of the commenters responded to Thomas, is it just a matter of equilibrium, where advertisers produce ads that users don’t want to block?

Indeed, are we already at that equilibrium? Is the lack of traction for easily available ad blockers a sign that people don’t mind ads, and that the ad-supported ecosystem can easily afford to ignore outliers like me who religiously use Adblock Plus and CustomizeGoogle to block all ads?

Guest Post: Rich Marr, Media As a Search Term

The following is a guest post by Rich Marr. Rich is the Director of Engineering at Pixsta, where he’s been working on Empora.com, a consumer-facing site that enables browsing of fashion products  according to image similarity (much like Modista).  Pixsta is a growing start-up focused on turning our R&D team’s ongoing search and image processing work into workable products. The post is entirely his, with the exception of links that I have added so that readers can find company sites and Wikipedia entries.

It’s an often-heard urban myth that Eskimos have many words for snow, but that we only have one.  This idea rings true because there’s value in being able to make precise distinctions when dealing with something important to you.  You can find specialised vocabularies in cultures and sub-cultures all over the world, from surfers to stock brokers. When there’s value in describing something, you’ll usually find someone has created a word to do the job.

In search we often come across problems caused by insufficient vocabulary.  People have an inconvenient habit of describing things in different ways, and some types of document are just plain difficult to describe.

This vocabulary problem has spawned armies of semantic search start-ups, providing search results based on inferred meaning rather than keyword matching. But semantic systems are text-driven, which means there are still vocabulary problems. For example, you might overhear some lyrics and then use a search engine to look up who wrote the song, but how would you identify a piece of unknown instrumental music? Most people don’t have the vocabulary to describe music in a way that can identify it. This type of problem is addressed by search tools that use Media As a Search Term, which I’ll abbreviate to MAST.

MAST applications attempt to fill the vocabulary gap by extracting meaning from the query media. These apps break down into two rough groups, one concerned with identification (e.g. SnapTell, TinEye, MusicBrainz, Shazam, and the field of biometrics) and the other concerned with similarity search (e.g. Empora, Modista, Incogna, and Google Similar Images).

These apps use media-specific methods to interpret objects and extract data in a meaningful form for the given context.  Techniques used here include wavelet decomposition, Fourier transforms, machine learning techniques, and a whole load of good old-fashioned pixel scraping.

The interpreted data is then made available in a searchable index, usually either a vector space that judges similarity using distance, or a conventional search index containing domain-specific ‘words’ extracted from the media collection. Both of these indexing mechanisms are a known quantity to programmers, which leaves the extraction of useful meaning as the main challenge, conceptually similar to using natural language processing (NLP) to interpret text.
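
To make the vector-space option concrete, here is a minimal sketch of similarity search by distance. It is an illustration only, not how Empora or any of the products named above actually work: the three-number feature vectors, the item names, and the choice of cosine distance are all assumptions made for the example. In a real system the vectors would come from the media-specific extraction step, and the index would hold far more items and dimensions.

```python
import math

def cosine_distance(a, b):
    """Return 1 - cosine similarity; smaller means more similar."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 1.0  # treat an all-zero vector as maximally dissimilar
    return 1.0 - dot / (norm_a * norm_b)

def most_similar(query, index, k=2):
    """Rank every indexed item by distance to the query and keep the k nearest."""
    ranked = sorted(index.items(), key=lambda item: cosine_distance(query, item[1]))
    return [name for name, _ in ranked[:k]]

# Hypothetical index: each vector is (redness, blueness, heel height),
# standing in for whatever features the extraction step would produce.
INDEX = {
    "red_pump": [0.9, 0.1, 0.8],
    "red_sneaker": [0.8, 0.2, 0.1],
    "blue_pump": [0.1, 0.9, 0.8],
}

# The query vector is extracted from the query image in the same way.
query_vector = [0.85, 0.15, 0.75]
nearest = most_similar(query_vector, INDEX, k=2)
```

Sorting the whole index is fine for a toy collection. The choice of features and distance measure is where the question of context comes in, since weighting colour over shape (or the reverse) changes what counts as ‘similar’.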

The challenge of extracting useful meaning is based largely around establishing context, i.e. what exactly the user intends when they request an item’s identity, or want to see a ‘similar’ item. What properties of a song identify it as the same? Should live versions of the same song also match studio versions? Is the user more interested in the shape of a pair of shoes, or the colour, or the pattern?

Framed in the context of the difficulties of NLP, it’s clear that there’s not likely to be an immediate leap in the capabilities of these apps but rather a gradual evolution. That said, these technologies are already good enough to surprise people and they’re quickly finding commercial use, which adds more resources and momentum. As our researchers chip away at these big challenges you’ll find MAST systems appearing in more and more places and becoming more and more important to the way people acquire and manage information.

Reminder: HCIR 2009 Submission Deadline is August 24th!

Just a quick reminder that the submission deadline for HCIR 2009, the Third Annual Workshop on Human-Computer Interaction and Information Retrieval, is August 24th, which is just 3 weeks away! Please spread the word; I know people can be forgetful during the summer months. The workshop itself will be held on October 23rd, at Catholic University in Washington, DC.

An Apology to Vijay Gill

I don’t know if Google’s Vijay Gill reads this blog. But a post of his just caught my attention, and I feel I owe him an apology.

A little over a month ago, I wrote a post entitled “Even Google Should Beware Of Hubris”. I stand by much of that post. But I specifically said:

And, just a few days ago, Google’s senior manager of engineering and architecture punctuated a panel discussion at the Structure 09 conference–where he was sharing a stage with a counterpart from Microsoft–with the punchline “If you Bing for it, you can find it.”

Apparently I shouldn’t believe everything I read in The Register. Vijay Gill, the manager quoted above, wrote a post on his blog that appeared shortly after The Register article (and after my post), entitled “Google Does Not Mock Bing”. Here’s the most relevant paragraph:

I wasn’t mocking Bing when I said “Bing for it, you can find it.” I meant that seriously, in the spirit of giving props to a competitor, and a good one at that. Najam and I have been friends since before Google had a business plan, and I have the greatest respect for him and for Microsoft as a company. The Microsoft approach has some good points, which work for their business plan. I was speaking of one particular approach, among several others, which can solve the same problem. There was no undercutting anything, there are two approaches and thats that.

Vijay, if you’re reading this, I’m sorry for taking so long to notice your clarifying post. I hope most of your fellow Googlers are as respectful of your competitors.

Heading to SIGIR

Hope to see lots of you at SIGIR! Sounds like there are already great tutorials underway. I’ll get there tonight for the reception, where they will announce the triennial Gerard Salton Award winner (who will deliver tomorrow’s opening keynote). I’m looking forward to the paper, poster, and demo presentations, and of course to the Industry Track on Wednesday. Unfortunately, I have to return to my day job on Thursday, so I won’t be able to attend any of the workshops.

If you’re attending, I hope you’ll find me and say hi–after over a year of blogging, there are far too many people I’ve gotten to know but never met face to face! If you’re not attending, then I encourage you to follow the coverage on Twitter. Since there seems to be some confusion about which hashtag to use, I suggest you follow sigir OR sigir09 OR sigir2009 (yes, there is sometimes value to favoring recall). I promise to blog about it when I get back, but I hope you’ll forgive me if The Noisy Channel is a bit quiet over the next few days.

Faceted Search Book Is Shipping

Amazon and Barnes & Noble are both shipping the faceted search book, so hopefully all of our pre-orders are finally leaving the warehouses. My apologies for the delays, and my thanks to everyone who has been so patient. Apparently a few people had trouble using the publisher’s site; at this point I suggest using Amazon or BN, since both offer competitive prices.

Taking Time Off

I’ll be offline for about a week, returning on July 13th. No, I’m not going to Argentina or even hiking the Appalachian Trail, but I am going off the grid to spend quality time with my wife and daughter. See you all soon!

Looking for an IR / Data Mining Job?

No, I’m not recruiting for my team–though I’m always open to research collaborations. But I wanted to call readers’ attention to at least two places that are hiring folks with expertise in information retrieval.

The first is Panjiva, a startup that I’m advising. You can read more about them here. They are looking for a hands-on developer (yes, someone who can code) with background in information retrieval or data mining. The job is in Cambridge, MA, and they want someone local. Check out their jobs page.

The second is Twitter. Yes, you’ve heard of them. What you might not know is that they’re aggressively hiring in their search group. Apparently the company is growing–I’d thought they were at ~30 people, but I just did a reference call for someone and learned that they’ve doubled in the past few months. I have no stake in Twitter except as a user, but I’d love to see them improve their search capabilities. So, if you’re in or near San Francisco and looking for a search job on the bleeding edge, check it out.

Other folks who are trying to hire people with search / information retrieval background: I encourage you to post opportunities in the comments!

Off to SIGMOD

I hope you’ve been enjoying my recent posting frenzy, because the noise will be a bit thin in the next couple of weeks. I’m about to head to Providence, where I’ll be attending SIGMOD 2009 and presenting an invited talk on “Design for Interaction”. Yes, database people care about HCIR too! I don’t know what sort of spare time or connectivity I’ll have, but I’ll try to sneak in a blog post or two. Then next week I’ll be on vacation!