The Noisy Channel

Google Image Search Gets Style

Post author By Daniel Tunkelang
Post date December 19, 2008
18 Comments on Google Image Search Gets Style

Clip art

Line drawing

Google announced today that its image search now supports search-by-style. As someone who regularly uses Google’s image search to find fodder for my presentations, I am excited about this enhancement. Moreover, I think it’s a clever application of the various image analysis algorithms Google has been developing.

They now include a drop-down that allows you to restrict searches to images from news content, faces, clip art, line drawings, and photo content. It’s not 100% accurate, but it’s not bad.

What is unfortunate is that the interface, whether you'd like to explore images by style or by size, doesn't give you any preview of the content in each category. I, for one, find it annoying to have to keep clicking to explore the space. But this is at least a baby step toward supporting exploratory search, in a domain that cries out for it.

By Daniel Tunkelang

High-Class Consultant.

18 replies on “Google Image Search Gets Style”

Live.com image search has been doing this for a while.

Live’s image search certainly looks more like a traditional faceted search interface, absent query previews. And their find-similar functionality is something Google doesn’t have, though its output is a bit erratic.

But they’re missing at least a couple of things Google gets right. I really like the way that Google’s image captions provide context. And Google’s style choices are more helpful than Live’s. A typical image search use case for me is finding online comics for presentations. Google nails it with “line drawings”. Nothing comparable on Live (or on Yahoo’s image search).

As a techie/engineer/scientist, I am also excited by such a development. Very cool technology indeed.

As a social observer/commenter, I come up with a big, fat question mark. What is the real, intended use of this technology? We are information retrieval scientists, right? So what is the information need of a user searching for a particular style, for example line drawings or clip art?

As far as I can tell, the main use of such a technology is to find images for inclusion in flyers, newsletters, blogs, etc.

Users specifying “clip art” are not doing so because they are researching clip art. They are doing so because they want to find images that they can reuse.

And so the serious issue this raises is whether this Google tool is wholeheartedly promoting copyright violation and theft. After all, who does a clip art search without actually looking to reuse clip art? And given that most of the clip art images on the web are probably not under a CC license, Google has developed a tool specifically geared toward violating copyright.

So I’m of mixed opinion on this. On the one hand, I applaud the baby step toward exploratory search, and love the multimedia aspect to the IR. On the other hand, I struggle to see utility that goes beyond outright copyright violation. At least with web search, most of the use is to find, read, and move on. Most web searches aren’t done with the user intent of full-scale content copying and reuse. But clip-art search? It does seem like that’s the primary use.

Jeremy, I think you’re right that a common (though not the only) use case for clip art search is reuse in other publications. Though it’s not clear to me that this reuse violates copyright. In my most recent presentation, for example, I cite all of my image sources with links. I’m not a lawyer, but this feels within the spirit of fair use, particularly given that the images have been published to make them freely available to the general public–with their own URIs.

I am not a lawyer either, but I agree with your usage. But is your use of such images in a presentation really a common one? Is it really what most people using the service do?

My feeling (with no evidence at all to back this up, mind you 🙂) is that the common usage of clip art is in church bulletins, company memos, family newsletters, etc. And in those cases, I’ll betcha (give you 20:1 odds) that most people do not include citations for their included clip art.

And I don’t quite know if I buy the “own URI” argument. Many photos on stock photo sites have their own URI, but that doesn’t mean the stock photo site intends for those photos to be “freely available to the general public”.

Unattributed copying is certainly plagiarism and is at least unethical, if not illegal.

On the legal front and specifically regarding URIs, I did find this tidbit from a discussion on Ars Technica about the legal battle between Google and “men’s” magazine Perfect 10:

On the question of whether Google was liable for framing third-party web sites containing infringing content, the district court had sided with Google, and Ikuta upheld that part of the ruling. She noted that Google merely provided the user’s browser with the address of the web site containing the infringing material; the material itself passed directly from the third-party web site to the user without ever coming into contact with Google’s servers.

http://arstechnica.com/news.ars/post/20070517-google-v-perfect-10-appeals-court-affirms-that-thumbnails-are-fair-use.html

So what’s the difference between Google and Kazaa? Kazaa also does not host the actual infringing content; it only links to 3rd party sites. And yet Kazaa was ruled not legal.

There is a parallel I am trying to draw here, between a general information search utility whose purpose includes information-seeking behaviors of all kinds, and a specifically targeted, highly filtered engine focused on finding one type of content, content whose primary use is illegal or infringing.

Google web search and Google image search are general information tools.

Kazaa is a specific tool, geared toward the dissemination of mostly copyrighted, improper use materials.

My point is that by adding a “clip art” filter onto general image search, you are turning Google Image Search into Kazaa. You’re turning it from a tool for general information seeking into a tool that is specifically geared toward finding material whose primary use is infringement-oriented.

That’s where I think the Perfect 10 case is not completely applicable, and where I still have a big, fat question mark.

Let’s put it this way: it would be very easy for Google to do what Baidu did and add a specific music audio filter to its search engine, exposed at the top level, for users to search the web for mp3 music files.

And yet they do not create such an easy-to-use, readily accessible filter. Why not? There is certainly user demand for such a filter. And certainly Google is not responsible for what users search for, nor for what webmasters post on their pages. The Perfect 10 case made that clear, right?

So why doesn’t the Google mp3 search engine exist? Why isn’t there a Google “onebox” link to an mp3 file, when I search for a song?

Now, think about the answer to that question, in relation to the “clip art” image filter search. What’s the difference? I see little.

The Grokster decision I believe you’re referring to is:

“We hold that one who distributes a device with the object of promoting its use to infringe copyright, as shown by clear expression or other affirmative steps taken to foster infringement, is liable for the resulting acts of infringement by third parties.”

http://en.wikipedia.org/wiki/MGM_Studios,_Inc._v._Grokster,_Ltd.#The_Court.27s_decision

What the Supreme Court ruled on was “Grokster’s alleged business model of actively inducing infringement”, which is a looser test than the previous “Betamax” legal standard that a product “need merely be capable of substantial noninfringing uses” in order to be legal.

http://en.wikipedia.org/wiki/Sony_Corp._v._Universal_City_Studios

I think that, even with the looser Grokster standard, Google is in the clear. There’s also a big difference between helping people find content that authors have published on the web on freely accessible web sites vs. helping people find illegal copies of content that the authors did not distribute on the open web–and having it on record that your business model is to do so. To the extent that Google is helping people find pirated content, there’s a stronger case against them, but I still doubt they’re “actively inducing” infringement.

I’m not saying there aren’t gray areas. But I think it’s a stretch to equate Google to Kazaa or Baidu.

Well, I was actually referring to Kazaa’s case, not Grokster’s, but let’s go with that. I think some flavor of “actively inducing infringement” is the key here. General image search isn’t active inducement. Lots of people need to find images of recent news events, which is a common and non-inducing behavior. I’m not claiming general search is comparable to Kazaa.

But when someone goes online and specifically searches for “clip art”, it is because their goal is to copy and reuse it in very specific ways (most often in publications, without attribution). When the search engine then supports this known behavior, by building a specific system geared toward finding that clip-art, the image search engine changes from general informational search, to “active inducement”. That much doesn’t seem like such a stretch to me.

I do agree with you that there is a big difference between helping people find content that authors have published on the web on freely accessible web sites vs. finding illegal copies, but only when the people who published that content on freely accessible web sites did so with the explicit purpose of letting users reuse it without attribution. Here is a website that I found using the Google clip art search for “halloween”:

http://www.uic.edu/depts/envh/Departmental/Newsletter/ObserverSept2006p3.htm

Is the clip art on this page really intended by the author to be freely reusable by the general public? It does not appear to be. But the clip-art search engine led the user straight to this page, and it did so knowing that the reason why the user was searching for clip art was to copy and reuse it.

So if putting certain intent-based media filters onto general image search does not constitute “actively inducing”, I again have to ask: Why does Google still not have a “music mp3” filter built into its general web search engine? It is a very easy machine learning problem to distinguish spoken audio from music audio. The right engineer could whip up a solution in a few weeks. And such a filter would be very useful to have; there is a lot of user demand. Again, we are actually talking here about music files published on the open web, not in some private protocol darkweb. So why doesn’t Google provide a music mp3 filter, the same way they provide a clip-art image filter?

I think it is because such a filter, even though it is on the open web, is by its nature “actively inducing” in spirit, don’t you think?

So why is clip art different? It can’t be because there is a lot of free clip art online. There is a lot of free music online, too, provided by musicians who want to be discovered and who want to have their content disseminated. And still, there is no “mp3 music” filter on the general Google search engine.

Don’t get me wrong: I think the technology itself is cool. I just find this particular clip art usage to be more toward the Kazaa side of gray.

I love this topic, by the way. So I’m not arguing “against” you as much as I just like the discussion 🙂

I suspect that the notion of inducing infringement goes far enough to cover a search engine that returns links to illicit copies, but not so far as to cover a search engine that returns links to legitimately published content that consumers are likely to then use illegally.

I think it would be different if there were a lot of clip art that was illegally reproduced and then published online. If that is the case, then I suspect it falls into the same category as music.

No, there is another issue that we keep missing here; I don’t know if I’ve properly framed it. It’s not just that some links are illicit. It’s not just that consumers will illegally use legal content. It’s not even the ratio of legally to illegally reproduced content.

It’s that the very nature of the search itself is set up for a usage that is known, a priori, to be illicit.

(Point 1) The purpose of “clip art” is for full-reproduction inclusion in publications, i.e. copy and paste. That’s what “clip art” is.

(Point 2) (Almost) everyone knows that (almost) no one cites clip art sources. I cannot remember the last church bulletin or company newsletter that I read with a clip art citation. It is de facto social knowledge that clip art is never cited.

(Point 3) Google’s “clip art” filter is explicitly designed to help people find “clip art”.

So if it is well-known that clip art is never cited, and it’s furthermore the case that non-cited clip art is an infringing use, and it’s furthermore the case that someone introduces a tool explicitly designed to find material that will be used in this way, then we have (it seems) a clear case of a tool designed to actively induce infringement.

None of this holds for the other categories: the image search filters that let you restrict to “Face”, “News”, etc. Those are not active inducement to infringement, no matter whether users use them illegally, and no matter whether most, all, or only some of the content is illegally reproduced. None of that matters, because there isn’t a broad social recognition that Face images or News images will be used in a non-citational, fully reproduced manner.

Clip art is different than the other categories, because of this known usage. That’s my main point here.

“I think it would be different if there were a lot of clip art that was illegally reproduced and then published online. If that is the case, then I suspect it falls into the same category as music.”

Before arriving at this same conclusion, I would have to ask whether most of the music published online is illegally reproduced or not. The “long tail” tells me that there are probably more songs legally published online by “garage bands” and “myspace bands” than there are illegally reproduced “britney spears” songs.

So if sheer quantity of one type of material versus the other is the core of the argument, then maybe there really should be an “mp3 music” engine.

But sheer quantity is not my core argument. My core argument is intentionality. Users searching for “clip art” have a known, infringing intentionality. The search engine then explicitly designs a tool to meet that known intentionality. Put those two together, and you have active inducement. In my mind, intentionality is more important than quantity.

I get what you’re after: that by offering clip art search, Google targets an audience that wants to commit copyright infringement. I’m not sure I agree. But even if that is the main or a significant use case, I believe the legality of the materials themselves is what determines Google’s legal culpability.

Ethically? I suppose that if they expect their users to mainly use this feature to facilitate illegal activity, then they’re on shaky ground.

Perhaps by analogy, consider that radar detectors are legal in many states, even though it’s hard to imagine anyone using them for a reason other than not getting caught speeding.

Good, at least we’ve reached understanding, though not agreement. 🙂

“But even if that is the main or a significant use case, I believe the legality of the materials themselves is what determines Google’s legal culpability.”

Again, I have to go back to the music example: I’ll bet that in terms of sheer volume, there is more “legal” music on the web, posted on websites by the struggling artist, college band, bedroom folk guitarist, or aspiring rap duo, than there is illegally posted music. The “long tail” pretty much guarantees this. For every illegally posted Christina Aguilera song, there are hundreds of aspiring nobodies legally posting their own songs.

Yet why isn’t there an “mp3 music” filter on Google’s main web search? Wouldn’t the same argument hold, about the legality of the majority of the materials themselves determining Google’s legal culpability?

My guess as to why there is no mp3 music filter, despite the plethora of garage and bedroom legal music, is that Google knows, and knows that everyone else knows that Google knows, that most users, most of the time, would focus their searches on the “illegal” uses of such a system.

Therefore, despite the fact that the majority of the music is posted legally, just like the majority of clip art is posted legally, it is the known behavior of the user that prevents Google from moving forward. I know you don’t agree; I just have to reiterate one more time that all the evidence I see boils down to user intentionality.

I like your analogy to radar detectors. Another (of course not perfect) analogy is the selling of bongs. Could someone conceivably be buying the bong to smoke tobacco or apple peels? I’m sure that’s a possibility. But everybody knows that everybody knows that’s not why folks buy bongs. Wink and a nudge, selling bongs is legal, just as you suspect Google Clip Art Search will remain legal. But ethically, like you say, I do believe that Google believes that their users are mainly using this feature to facilitate illegal activity. Just like bong sellers really know what their users are mainly doing, as well.

But now you’ve lost me. You say that “it is the known behavior of the user that prevents Google from moving forward”, but they did move forward on the clip art, even though you “believe that Google believes that their users are mainly using this feature to facilitate illegal activity.” Which is it?

It’s a double standard, is what it is.

Like I said, there is enough “legal” material with both clip art and web music that both media types could be protected under the law. On the other hand, both media types are so infused with illicit user intent that offering them is unethical and dodgy, and arguably should be illegal.

So you very rightly wonder, then, what the difference between the two is: why they move forward with one and not the other.

Either they should be ideologically consistent and move forward with both, or they should be ideologically consistent and move forward with neither.

There is a third variable here, beyond quantities, beyond user intent. And that is the RIAA.

Google moves forward on the clip art, despite knowing that the primary use of the tool is “bong-like”, because there is no large professional organization threatening to sue them.

Google doesn’t move forward on the mp3 music, despite knowing that there is a vast quantity of “legal” music that can be found on garage and bedroom band web pages, because of the RIAA.

So despite my desire for ideological consistency, the approach Google seems to be taking is “We can get away with one; we can’t easily get away with the other. So we’ll do what we can get away with.”

That’s the only logical conclusion I am able to figure out. The driving ideology is a “what can we get away with” ideology, rather than a “what is right” ideology.
