Google Similar Images: A Glitch?

After my post yesterday about Google’s new image similarity search, a colleague sent me this link:

Mmm, blackberries!

Oops! But, more importantly, it strongly suggests that Google isn’t using only image content to perform the similarity search. In fact, I’d assume from this example that text content can easily overwhelm the contribution of visual similarity. Very interesting…

By Daniel Tunkelang

High-Class Consultant.

22 replies on “Google Similar Images: A Glitch?”

It probably uses some kind of semantic hash of the pictures along with a full-text match of the keywords in the query (against the HTML page context, the image filename, or the text of links pointing to the image from other pages):

Semantic hashing using Deep Belief Networks: semantic_final.pdf

Semantic hashing using stacked autoencoders: ranzato-icml08.pdf

Semantic hashing using spectral hashing: spectralhashing.pdf
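The combination this comment describes can be sketched in a few lines. Everything below is my own toy illustration, not anyone’s actual system: random-hyperplane sign bits stand in for the learned binary codes from those papers, and a simple keyword-overlap score stands in for the full-text match; `binary_hash`, `similar`, and the weighting are all assumptions.

```python
def binary_hash(features, hyperplanes):
    """Turn a feature vector into a short binary code: one sign bit per
    hyperplane. (A crude stand-in for the learned encoders in the
    semantic-hashing papers above.)"""
    return tuple(
        1 if sum(f * p for f, p in zip(features, plane)) >= 0 else 0
        for plane in hyperplanes
    )

def hamming(a, b):
    """Number of differing bits between two codes."""
    return sum(x != y for x, y in zip(a, b))

def similar(query_code, query_terms, index, text_weight=0.5):
    """Rank indexed images by a blend of visual similarity (code
    distance) and textual similarity (keyword overlap)."""
    scored = []
    for name, (code, terms) in index.items():
        visual = 1.0 - hamming(query_code, code) / len(code)
        textual = len(query_terms & terms) / max(len(query_terms), 1)
        scored.append((text_weight * textual + (1 - text_weight) * visual, name))
    return [name for _, name in sorted(scored, reverse=True)]

# A two-image toy index: a BlackBerry phone and a blackberry fruit.
planes = [(1.0, -1.0), (0.5, 0.5)]
index = {
    "phone.jpg": (binary_hash((2.0, 1.0), planes), {"blackberry", "phone"}),
    "fruit.jpg": (binary_hash((1.0, 2.0), planes), {"blackberry", "fruit"}),
}
# With the text signal dominating, keyword overlap decides the ranking
# regardless of what the visual codes say.
print(similar(binary_hash((1.0, 2.0), planes), {"blackberry", "fruit"},
              index, text_weight=1.0))
# → ['fruit.jpg', 'phone.jpg']
```

The point of the sketch is the `text_weight` knob: a high-information term like “blackberry” can swamp the visual term, which is exactly the glitch the post is about.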


That makes sense. I’m just surprised because, in a TechCrunch interview, Google engineering director Radhika Malpani said it was based on “analyzing the content of the image”. I guess that doesn’t rule out the use of text, and, at the end of the interview, she does say that Google generally approaches image search by all means necessary, including the use of textual metadata. Still, I feel a bit bait-and-switched here, and I think the general perception was that this new image search application was based solely on visual content.


Well, hold on, let’s look at the pictures for a second.

They are all blackish blobs on a white background. They are all covered in bobbles (the buttons on the cell phones ARE bobbly). Even the icons on the screens add to the bobbly effect, and the variation in colour on the bobble buttons does look a little like the light reflection on the fruit’s bobbles. Most of the phones are at the same slant as the fruit. I’d say they look more like that piece of fruit than the other images I get for the search ‘blackberry fruit‘.

Maybe it is by image alone after all?


I’m guessing they are at least using image labeling from their version of the ESP Game (this is a gut feeling from trying a bunch of queries), and it just so happens that “blackberry” is such a high-information term that it tricked them.


Here’s another fun example where text overwhelms visual similarity. Again, it’s consistent with my ESP Game intuition: basically, that humans would think of the same word as the best label for unrelated images.


CueFlik is great. I’d love to see that sort of interaction in Google image search.

Again, on your latest example: they are all mostly on a white background, circular, yellow and black. I guess they are all images under the search term ‘kiwi’, though.

You might notice as well that Live Search’s similar-image search gives similar but different results.


Indeed, I see nothing wrong with combining evidence. I just would have liked a clearer explanation up front. Maybe there was one and I missed it. Not to beat a dead horse, but transparency goes a long way toward building trust.


Here’s a funny case of the associated word overwhelming the image: a search for “george” retrieves a non-famous example of a person with that name, but clicking the accompanying “similar images” pulls up mostly pictures of GEORGE BUSH, instead of finding faces like the non-famous George’s.


Got to this story a bit late. It would be pretty accurate to say that the revised Google Labs image search prototype is not a huge leap from what went before. TechCrunch has a couple of posts confirming this.

Have a look at our (beta) image search at:

Drag ONE or MORE images from the left-hand set of images into the search oval. The results will appear on the right.

Add additional images to the search by dragging and dropping more images. Remove images from the query by clicking on the image. Click the search oval to clear the search and start a new one.

Isn’t this what image search should be all about, i.e., dealing with an image as an item? Imagine being on any web page and dragging and dropping images from that page into the search oval to find similar images.



Dinesh, I’m not familiar enough with the state of the art to evaluate the Google Labs experiment relative to it. But I did take a look at Xyggy. It’s intriguing, but hard to evaluate without being able to pick from outside the 3,000 images in your set.

And of course I’d really like similarity browsing to do clustering, with as much transparency as possible about what holds a cluster together. But I realize this may be hard to do for visual similarity.


Daniel, thanks for checking out the image search demo at Xyggy. There are in fact 32,000 images to choose from.

Image features can range from the simple (e.g., color and texture) to the complex (edges, SIFT, and so on). The feature vector could also include text tags, if available and required. This means there is a lot of flexibility in the kinds of image services that can be built.

Given a query of image items, it is possible to cluster the results by specific features (e.g., color, texture, etc.).

For large-scale data sets, query results would likely contain many similar images. Depending on a user’s needs, you want them to be able to cluster the results along a spectrum from very similar to not very similar. We can do this.
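One simple way to picture that kind of feature-specific clustering is a greedy threshold-based grouping. This is purely a hypothetical sketch of the idea (the function `cluster_by_feature`, the greedy scheme, and the toy feature vectors are all my own assumptions, not Xyggy’s actual method):

```python
def cluster_by_feature(results, feature, threshold):
    """Greedy single-pass grouping: each result joins the first cluster
    whose seed lies within `threshold` of it on the chosen feature
    (e.g. a color descriptor); otherwise it seeds a new cluster.
    Raising `threshold` lumps "not very similar" images together;
    lowering it keeps only "very similar" images in each group."""
    clusters = []  # list of (seed_vector, member_names)
    for name, feats in results:
        vec = feats[feature]
        for seed, members in clusters:
            dist = sum((a - b) ** 2 for a, b in zip(vec, seed)) ** 0.5
            if dist <= threshold:
                members.append(name)
                break
        else:
            clusters.append((vec, [name]))
    return [members for _, members in clusters]

# Toy results: two dark images and one bright one, keyed by mean color.
results = [
    ("berry1.jpg", {"color": (0.1, 0.1)}),
    ("berry2.jpg", {"color": (0.2, 0.1)}),
    ("kiwi.jpg", {"color": (0.9, 0.8)}),
]
print(cluster_by_feature(results, "color", threshold=0.3))
# → [['berry1.jpg', 'berry2.jpg'], ['kiwi.jpg']]
```

The single `threshold` parameter is the dial between “very similar” and “not very similar” clusters described above.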

The (beta) Xyggy image search service is one example of “item search” where the query contains items (not keywords) and the results are items. An item could be a patent, a news article, a piece of music, a medical record and so on.

With regard to images, our next step is to use a much larger image set, in the tens of millions if not larger. The ultimate goal is image search from any web page by simply dragging and dropping images into the search oval. To us, this is how people do image search in real life and how we should do it in the digital world: it is natural and intuitive.



Sorry for losing a digit there! In any case, it would be nice if, as a user, I could supply my own image as input or, as in Google Similar Images, use an image I retrieve through search as a starting point for exploration. I imagine the latter is more plausible than the former, if you need to index all images in the collection.


Both are possible. Indexed images will return results faster than non-indexed images, but this can be mitigated with appropriate backend hardware. Starting with an image retrieved through (text) search isn’t an issue; in fact, the first demo we built showed this.

Note also that we support multiple images per query.
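One plausible way to score a multi-image query (an assumption on my part, not necessarily how Xyggy does it) is to rank each indexed image by its average feature distance to all of the query items:

```python
def multi_item_query(query_vecs, index):
    """Rank indexed images by their average Euclidean distance to ALL
    query vectors; a smaller average distance means more similar, so
    the result list is sorted ascending by that distance."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
    scored = [
        (sum(dist(vec, q) for q in query_vecs) / len(query_vecs), name)
        for name, vec in index.items()
    ]
    return [name for _, name in sorted(scored)]

# Dragging a second image into the search oval narrows the results to
# what both query images have in common.
index = {"near.jpg": (1.0, 0.0), "far.jpg": (10.0, 10.0)}
print(multi_item_query([(0.0, 0.0), (2.0, 0.0)], index))
# → ['near.jpg', 'far.jpg']
```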


Sorry, but the first demo is now on my laptop only. We are working on rolling out a few live services, after which we want to enhance the image service with a significantly larger data set. We can consider offering query both by text and by image.


Comments are closed.