Harry McCracken at Technologizer just posted “A Brief History of Google Killers”, in which he enumerates fourteen companies that “were supposed to do away with the Web’s biggest brand”. He forgot a few–I’d love to see a more comprehensive list (e.g., where’s Dipsie?). Still, it’s an informative and entertaining analysis, and would-be Google executioners would do well to read its lessons. I have a “So You Want To Kill Google” post in the virtual queue, but this will have to tide you over until I have time to write it.
Curt Monash has a nice post that turns around the question of innovating business models for online publishing. He considers the reasons that people consume information, and uses that as the basis for evaluating the potential of the various business models (e.g., freemium, metered) available to the companies that produce it.
It’s a long post, so I’ll excerpt his conclusions:
- “Freemium” models, in which one gives away some good information but charges for the best stuff, can succeed.
- Charging by some kind of usage metric doesn’t make sense.
- Grand cosmic all-you-can-consume-of-all-but-the-most-highly-valuable-information subscriptions — e.g., an “ASCAP for news” — could be marketable.
Monash doesn’t bring up the possibility of monetizing participation–a route that I think a number of publishers should consider. But he covers a lot of ground, and would-be saviors of the publishing industry would do well to read his sober, common-sense analysis before latching onto a new business model as a get-saved-quick scheme.
Wolfram Alpha is live, though it is experiencing some strain under load. There are lots of reactions on Techmeme, both commenting on the brief launch delay and offering mixed assessments of the service itself.
I encourage you all to try it, at least once it recovers from the initial load. If nothing else, I need everyone here to keep me honest after I’ve been spouting my opinions about Wolfram Alpha for the past few weeks!
I’ve been at the Enterprise Search Summit all day, so I didn’t have the chance to pore through the buzz about Google’s Searchology announcements. But I did see a snippet that struck me as very unusual language for Google:
Our first announcement today is a new set of features that we call Search Options, which are a collection of tools that let you slice and dice your results and generate different views to find what you need faster and easier. Search Options helps solve a problem that can be vexing: what query should I ask?
Google, focusing on query refinement and elaboration? I’m all ears! In fact, eyes and ears–here is the video tour:
Well, on second thought, it’s a bit of a rehash of features they’d already rolled out, and that I personally didn’t find overwhelming (see here and here). Still, I’m pleased that their marketing language is embracing HCIR–that’s a big step for a company that has perhaps done more than anyone to emphasize the primacy of relevance ranking in the search interface. Even if they’re only taking baby steps at this point, I am cautiously optimistic that they will build on them.
Designing for Faceted Search
While I was inundated with conferences a couple of weeks ago, I missed a nice article by Stephanie Lemieux at User Interface Engineering (a site I recommend in general) entitled “Designing for Faceted Search”. It briefly explains faceted search and offers some usability tips. It’s not quite as comprehensive as my upcoming book, but it’s also free and runs somewhat less than 100 pages.
Of course, I’m delighted that she uses a couple of Endeca-powered examples (NCSU Libraries, Buzzilions). She also cites the Financial Times, but links to ft.com (which I believe is powered by FAST, a subsidiary of Microsoft) rather than to the recently launched Newssift, which uses Endeca.
Just one quibble: she says that “Just 3 facets with 5 terms each can represent 243 possible combinations.” I suspect she transposed the 3 and the 5. The right number of combinations is 5^3 = 125, since a combination represents one independent selection from 5 possible choices in each of the 3 facets.
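To make the counting concrete, here is a quick Python sketch (the facet names are made up for illustration) that enumerates the combinations by brute force:

```python
from itertools import product

# Three hypothetical facets with five values each (names are made up).
facets = {
    "color": ["red", "blue", "green", "black", "white"],
    "size":  ["XS", "S", "M", "L", "XL"],
    "brand": ["A", "B", "C", "D", "E"],
}

# One selection per facet: 5 * 5 * 5 combinations.
combinations = list(product(*facets.values()))
print(len(combinations))   # 125 = 5 ** 3
print(3 ** 5)              # 243 -- what you'd get with 5 facets of 3 terms each
```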
More Thoughts on Image Retrieval
After my recent posts about Google’s similarity browsing for images, a colleague reached out to me to educate me about some of the recent advances in image retrieval. This colleague is involved with an image retrieval startup and felt uncomfortable posting comments publicly, so we agreed that I would paraphrase them in a post under my own name. I thus accept accountability for the post, but cannot take credit for expertise or originality.
Some of the discussion in the comment threads mentioned scale-invariant feature transform (SIFT), an algorithm to detect and describe local features in images. What I don’t believe anyone mentioned is that this approach is patented–certainly a concern for people with commercial interest in image retrieval.
There’s also the matter of scaling in a different sense–that is, handling large sets of images. People interested in this problem may want to look at “Scalable Recognition with a Vocabulary Tree” by David Nistér and Henrik Stewénius. They map image features to “visual words” using a hierarchical k-means approach. While mapping image retrieval onto text retrieval approaches is not new, their large-vocabulary approach was novel and significantly improved scalability, while remaining robust to occlusion and to changes in viewpoint and lighting. The paper has been highly cited.
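To give a feel for the idea (this is a minimal sketch, not the authors’ implementation), here is a toy vocabulary tree in Python: local descriptors are clustered recursively with k-means, and each descriptor is quantized to a “visual word” defined by the path it takes down the tree. A real system would use SIFT descriptors, a much larger branching factor and depth, and TF-IDF weighting over the resulting words; the descriptors below are random stand-ins.

```python
import numpy as np
from sklearn.cluster import KMeans

def build_vocab_tree(descriptors, branch_factor=4, depth=3):
    """Recursively cluster descriptors with k-means to build a vocabulary tree
    (in the spirit of Nister and Stewenius's hierarchical k-means)."""
    if depth == 0 or len(descriptors) < branch_factor:
        return None  # leaf node
    km = KMeans(n_clusters=branch_factor, n_init=10, random_state=0).fit(descriptors)
    children = [
        build_vocab_tree(descriptors[km.labels_ == i], branch_factor, depth - 1)
        for i in range(branch_factor)
    ]
    return {"kmeans": km, "children": children}

def visual_word(descriptor, node, path=()):
    """Quantize one descriptor to a 'visual word': its path down the tree."""
    if node is None:
        return path
    i = int(node["kmeans"].predict(descriptor.reshape(1, -1))[0])
    return visual_word(descriptor, node["children"][i], path + (i,))

# Toy example with random 128-dimensional "SIFT-like" descriptors.
rng = np.random.default_rng(0)
descriptors = rng.random((2000, 128))
tree = build_vocab_tree(descriptors)
print(visual_word(descriptors[0], tree))  # a path such as (2, 0, 3)
```

Retrieval then reduces to the familiar inverted-index machinery from text search, with visual words in place of terms.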
But there are problems with this approach in practice. For example, images from cell phone cameras are low-quality and blurry, and Nistér and Stewénius’s approach is unfortunately not resilient to blur. Accuracy and latency are also challenges.
In general, some of the features that the vision literature identifies as best don’t seem to work so well outside the lab, and the reason may be that the images used for experiments in the literature are of much higher quality than those encountered in the field–particularly images from cell phone cameras.
An alternative to SIFT is “gist”, an approach based on global descriptors. Gist is not resilient to occlusion or rotation, but it scales much better than SIFT, and it may serve well for duplicate detection–a problem that, in my view, is a deal-breaker for applications like similarity browsing, and one that certainly affects Google’s current approach.
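For illustration only, here is a toy global-descriptor sketch in Python. It is not gist (which builds on Gabor filter responses), just a crude grid of average intensities, but it shows the basic appeal: one compact vector per image makes near-duplicate checks cheap at scale.

```python
import numpy as np

def global_descriptor(image, grid=8):
    """Crude global descriptor: mean intensity over a grid x grid layout.
    A stand-in for gist-style descriptors, just to show the
    'one compact vector per image' idea."""
    h, w = image.shape
    gh, gw = h // grid, w // grid
    cells = image[:gh * grid, :gw * grid].reshape(grid, gh, grid, gw)
    return cells.mean(axis=(1, 3)).ravel()

def near_duplicates(desc_a, desc_b, threshold=0.05):
    """Flag two images as near-duplicates if their descriptors are close."""
    return np.linalg.norm(desc_a - desc_b) / np.sqrt(desc_a.size) < threshold

# Toy example: a synthetic image and a slightly brightened copy of it.
rng = np.random.default_rng(1)
img = rng.random((480, 640))
copy = np.clip(img + 0.02, 0, 1)
print(near_duplicates(global_descriptor(img), global_descriptor(copy)))  # True
```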
In short, image retrieval is still a highly active area, and different approaches are optimized for different problems. I was delighted to have a recent guest post from AJ Shankar of Modista about their approach, and I encourage others to contribute their thoughts.
Got Hate Tweets?
I had the novel experience today of discovering that someone set up a Twitter account for the sole purpose of harassing me personally. I’m not sure what exactly I did to deserve this honor, but I’m amused by the personal attention, since I’m hardly a Twitter celebrity. Perhaps it’s someone I know, conducting a social experiment to see how I’ll react. Ah, the wonder of online anonymity.
Wolfman vs. Googzilla
What’s not to love about a good fight? Check out David Talbot’s “Wolfram Alpha and Google Face Off” in Technology Review. I don’t come away with a sense that I’ll regularly use either Wolfram Alpha or Google Public Data, but it’s nice to see people start taking them off-road and comparing the results. The Wolfram Alpha launch is supposed to be this month, so presumably we’ll all be able to do that in a matter of weeks, if not days. Google Public Data is available already, integrated into web search.
Sadly, neither of these guys seems interested in providing a non-NLP interface. In my view, that would be far more useful. But I suppose it’s not what sells papers.
If you’re attending the Infonortics Search Engine Meeting in Boston next week, please let me know! I’ll be there all day Monday and Tuesday, and I’ll be talking on Tuesday afternoon about “Enabling the Information Seeking Process”.
I’m sticking around in Boston to attend the International Association of Scientific, Technical & Medical Publishers Spring Conference, where I’ll be presenting “Exploring Semantic Means” on Thursday morning. My presentation there will be similar to the one I delivered at the New York Semantic Web Meetup, but I’m also hoping to sneak in a live demo!
Hopefully I’ll see some of you there! I also apologize in advance if my blogging is a bit thin over the next several days. I’ll post reactions to the two conferences as soon as I have the time to gather them.
Slight Change to the HCIR ’09 CFP
I hope you all are gearing up for HCIR 2009! Those who have not yet read the call for participation or looked at the web site can safely ignore this message, which announces what we hope is a minor change for participants.
After receiving feedback to the CFP, we (Bill, Ryen, and I) decided to request more substantive position papers (up to 4 pages), and to select four to six of these for presentation in a workshop panel. We will still have a morning “poster boaster” session for all other participants, and we strongly encourage all attendees (including those on the panel) to present posters.
I hope the change does not cause any confusion or inconvenience–in any case, the submission date is still four months away! We made the change after being convinced that having a few peer-reviewed presentations would help continue the tradition that people bring their best work to the workshop, as they have done in previous years. But I do want to reinforce that, as we announced in the CFP, we will adjust the schedule to allow for more interaction time relative to previous years. We do recognize that the discussions are the best part of the workshop.
In any case, if you have any questions or concerns, please let me know! This is still a young workshop, and we’d like to make sure we are listening to the community we serve.