Author: Daniel Tunkelang

High-Class Consultant.

Challenge: Blog + Twitter vs. Aardvark

Post author By Daniel Tunkelang
Post date March 14, 2009

I asked Aardvark the following question this afternoon:

Trying to track down an animated short where a bunch of critters invent a machine to discover where they are, only to learn that they are a dream inside someone’s head. They ultimately turn into pink flamingos as the dream evolves. I remember them all chanting “Flick the switch!” when the invention is unveiled. No luck tracking it down with my web searching skills. 😦

The correct answer came within 6 hours. I’m curious if anyone who reads this message will find it independently–without using Aardvark themselves. If not, I’ll be forced to give Aardvark a very glowing review for answering a question that has been plaguing me for years. One way or the other, I’ll post the answer tomorrow night.

Added: Check out the rematch!

General

Clay Shirky: Save Society, Not Newspapers

Post author By Daniel Tunkelang
Post date March 14, 2009

There is so much writing about the impending demise of the newspaper industry that it’s becoming easy to tune it out. But it’s refreshing to see a cutting analysis like the one Clay Shirky makes in “Newspapers and Thinking the Unthinkable”.

He starts with an anecdote about how, in the early 90s, the Knight-Ridder newspaper chain was fighting the unauthorized online distribution of a Dave Barry column. He quotes Gordy Thompson, who then managed internet services at the New York Times: “When a 14 year old kid can blow up your business in his spare time, not because he hates you but because he loves you, then you got a problem.” Shirky’s most cutting remark:

The newspaper people often note that newspapers benefit society as a whole. This is true, but irrelevant to the problem at hand; “You’re gonna miss us when we’re gone!” has never been much of a business model.

His main point is this: “Society doesn’t need newspapers. What we need is journalism.” And he simply doesn’t see newspapers surviving as the way to deliver journalism.

I’m not so sure, but I think Shirky has his priorities right. We should be worrying about the end, not the means. I don’t subscribe to his and Jeff Jarvis’s faith-based optimism; I’m not convinced that the market demands it enough. Witness the challenge of sustaining other public goods, such as education and energy conservation. My own view is that journalism’s best hope is monetizing participation. I’m actually in the middle of writing an article about it; I’ll let you know if and when it gets published.

General

Vivisimo, Please Keep It Real

Post author By Daniel Tunkelang
Post date March 13, 2009

Let me preface this post with a clear disclaimer: I am the Chief Scientist of Endeca, a leading enterprise search vendor, but the views I express on this blog, including those about Endeca’s customers, partners, and competitors, are my own.

And one of my strong personal opinions is that marketing campaigns should be honest. One of the blogs I read is Search Done Right, a corporate blog maintained by Vivisimo. Regardless of my opinions about Vivisimo’s technology, I am a fan of their marketing department. They’ve achieved visibility disproportionate to their market share–in no small part by promoting interface innovation. I also have met their CEO and CTO, and I think they’re both great guys.

But today, I saw a post entitled “New Enterprise Search Pundit on the Scene?”, which I copy in full below:

When I logged into my Vivisimo email this morning, I had a message from a guy named Stan. He must have gotten my contact info from this blog or the LinkedIn Enterprise Search Professionals group as someone interested in search and sent me the manifesto below since I don’t recall meeting him at a recent Gartner PCC show or an Enterprise Search Strategy (ESS) event. At any rate, I bookmarked his site and will let you know when it goes live – should be interesting. Maybe he has the makings of a search industry pundit in him…

I don’t know about you guys, but I’m getting really sick of not being able to find things at my company. I mean, I have a hard enough time finding my coffee cup each morning because someone in Marketing (I know it’s you, Jan) keeps moving it. But when it comes to data, it’s just impossible to find anything. Think about it for a second. On when I’m on the Internet, I do a Google search for “product requirements document” and get more than 22 million results in less than half a second. Try doing that on your intranet. I mean seriously. Can you even find a search box on your intranet? My last company didn’t even have one.

And when there is a box, what does it really search? A couple of intranet pages? What happens if you have multiple intranets? I bet your search only looks at one. Then you have to use a separate desktop search. And another search for email. And another for your external website. All of this searching is stopping us from getting any work done.

Then I heard about this enterprise search thing. Frankly, it sounds too good to be true. Searching across multiple repositories from a single search box. Presenting results into topical clusters. Tagging and rating documents to impact future search relevancy. Sharing results with other users. All of this while respecting my individual security rights. Seems like pie in the sky to me. Is anybody actually doing this, or is it just some marketing hype?

So I’m starting my own website, www.meetstan.com, to figure things out for myself. The site will be up March 16. Come by and tell me what you think.

Curious, I went to Whois.com and looked up meetstan.com. As you can verify for yourself, the registrant is none other than Vivisimo.

Am I naive to be shocked at this sort of marketing gimmick? Perhaps. But I’m sensitive because there aren’t that many people who understand enterprise search, and there’s a lot of concern about analysts offering less than independent opinions. I assume that Vivisimo isn’t planning to use the site to promote a shill analyst, but rather is using this blog post to create pre-launch buzz around a marketing portal.

Please, Vivisimo, don’t play these sorts of games. Given how uninformed so many people are about enterprise search, many of them are likely to take a hoax like this seriously. That’s bad for Vivisimo’s reputation, and for the field as a whole. I’m sure this was an innocent mistake. I hope it will be quickly resolved, and that you will go back to promoting your technology and vision without resorting to such gimmicks.

I also encourage anyone from Vivisimo to comment here and offer clarification.

General

I Have 10 Aardvark Invites

Post author By Daniel Tunkelang
Post date March 12, 2009

The kind folks at Aardvark appreciated my write-up and sent me an invite, and, as per the usual viral rules, that means I can invite 10 more people. Let me know if you’re interested via the comments. I might be offline much of the day, but I’ll process requests in the order received.

General

Is the Aardvark a Social Animal?

Post author By Daniel Tunkelang
Post date March 12, 2009

A colleague alerted me to Aardvark, a social search service scheduled to launch during SXSW that lets users ask questions via instant messenger or email and receive live answers from their social networks. Check out recent coverage by John Battelle and ReadWriteWeb.

The initial press is quite positive. In particular, ReadWriteWeb compares it favorably to asking questions on Twitter:

In our internal tests, we realized that a lot of the answers often rivaled those we received when asking our Twitter network. Thanks to the fact that Aardvark automatically routed our questions to people with the right expertise, all the answers we received so far were top-notch. In case you didn’t like the answer (or if it was obscene), you can flag it and rate it on the service’s website.

I haven’t experienced the service, so I’m in no position to evaluate it. I can’t say I’ve been overwhelmed with social question answering on Google (R.I.P.), Yahoo, or LinkedIn. Asking questions on Twitter works well for me, but that’s probably because I have a substantial number of real, knowledgeable followers (the TunkRank is strong with this one!).

But what I don’t understand is Aardvark’s incentive system. I’ve looked at their blog and white paper, but I don’t see any mention of tangible or intangible incentives. Perhaps the incentives are reputation and the interaction itself.

In any case, I’m cautiously optimistic. If anyone has managed to get an invite and can share, I’d greatly appreciate a chance to try it out.

General

Media Cloud: Watch, Analyze, Learn

Post author By Daniel Tunkelang
Post date March 11, 2009

A couple of months ago, Tom Tague, who leads the Calais initiative at Thomson Reuters, presented at the New York Semantic Web Meetup. One of the projects he alluded to was announced today and reported in ReadWriteWeb: “Media Cloud Leverages Calais to Track News Trends”:

Media Cloud, a new project from the Berkman Center at Harvard University, has an ambitious goal: It will do the heavy lifting of analyzing stories from thousands of traditional news sources, analyzing the semantics of the content through Calais (covered here and here), and then providing tools to quickly get trending results.

The article also points to an interview with project developer Ethan Zuckerman by the Nieman Journalism Lab.

What particularly excites me about this project is the possibility of comparing how different news organizations–or, better yet, different clusters of similarly biased news organizations–select and cover news. Ever since hearing Miles Efron present “The Liberal Media and Right-Wing Conspiracies: Using Cocitation Information to Estimate Political Orientation in Web Documents” at CIKM 2004, I’ve been waiting for someone to take the next step and build analysis tools to compare the media “conspiracies”. For example, what stories are covered in the New York Times, but not in the National Review–and vice versa? Which details appear only in papers associated with one end of the political spectrum?

I don’t know that most people care about these questions. In fact, I suspect they don’t; my experience is that few people are interested in hearing viewpoints that challenge their own. But I fear that we are being personalized to death–that our control over what we read leads to the unfortunate behavior that we only let content through the filter if it reinforces our prejudices.

I know that Media Cloud won’t solve this problem on its own. But at least it’s a great tool for those who do want to broaden their perspectives, and I have hope that intellectually honest people will have the courage to learn from it.

General

Making Ads More Interesting…for Users or for Google?

Post author By Daniel Tunkelang
Post date March 11, 2009

Google announced today that:

We think we can make online advertising even more relevant and useful by using additional information about the websites people visit. Today we are launching “interest-based” advertising as a beta test on our partner sites and on YouTube. These ads will associate categories of interest — say sports, gardening, cars, pets — with your browser, based on the types of sites you visit and the pages you view. We may then use those interest categories to show you more relevant text and display ads.

They do realize that this announcement raises lots of hackles in a world that is increasingly distrustful of Google’s accumulation of data and its control over so much of our online experience. They offer the following as grounds for trusting them:

Transparency – We already clearly label most of the ads provided by Google on the AdSense partner network and on YouTube. You can click on the labels to get more information about how we serve ads, and the information we use to show you ads. This year we will expand the range of ad formats and publishers that display labels that provide a way to learn more and make choices about Google’s ad serving.

Choice – We have built a tool called Ads Preferences Manager, which lets you view, delete, or add interest categories associated with your browser so that you can receive ads that are more interesting to you.

Control – You can always opt out of the advertising cookie for the AdSense partner network here. To make sure that your opt-out decision is respected (and isn’t deleted if you clear the cookies from your browser), we have designed a plug-in for your browser that maintains your opt-out choice.

Despite the predictable reactions from privacy groups, I don’t know that I find behaviorally targeted ads any worse than ads in general. Indeed, Google is probably right that users will find the ads more relevant; after all, it has every incentive to increase click-through rates. Privacy groups are right to call out Google’s hypocrisy in changing its tune on behavioral advertising, but so what? If Google’s going to live and die by the ad-supported model and if the overwhelming majority of the online population is on board with it, then it’s to be expected that Google will optimize for ad revenue.

Of course, my idea of choice and control is to use an ad blocker (specifically, the CustomizeGoogle Firefox extension), and I think Google takes a very narrow view of transparency. Still, I’m amused that Google is drawing so much heat for what seems to me a minor, incremental change.

Well, a minor change for users. Perhaps it’s not a coincidence that Google’s stock is up 3% today. $3B in market cap is a significant increment, even for Google.

General

Exploring Semantic Means

http://static.slideshare.net/swf/ssplayer2.swf?doc=exploringsemanticmeans-090306212947-phpapp01&stripped_title=exploring-semantic-means

I gave a talk last week at the New York Semantic Web Meetup entitled “Exploring Semantic Means”, and I thought readers here might want to peruse the slides. You can see more pictures of the event here, as well as the slides Ken Ellis presented about the work he’s doing at Daylife. I was also interviewed for a few minutes after the talk; I’ll post a link to the podcast when it’s available.

Uncategorized

More Adventures with PR People

Post author By Daniel Tunkelang
Post date March 10, 2009

A few weeks ago, I wrote a reply to all PR people who seem to think that, because I blog, they should pitch their companies’ press releases at me. I’m not sure whether to be flattered or annoyed.

What I’ve decided to do is share some of my experiences with readers–my top 3 that I haven’t already obliterated beyond recovery. Hopefully these same PR people will learn that indiscriminate marketing isn’t always a net gain. I’ve removed any personally identifying information about the senders; I don’t want vigilante or mischievous readers to get any ideas. Well, at least not to act on them. Here they are, in reverse order of absurdity. Drum roll, please.

#3) The GodTubes Must Be Crazy

Hi Daniel,

I want to introduce you to a new social network called tangle.com. Originally launched in 2007 as GodTube.com, a video sharing site that set the record as the fastest growing Web site in the U.S. during its first month of operation, it attracted 2.7 million users a month. Now, tangle has expanded to become the go-to Web site for the family-friendly community to safely interact on a full social network. Below is the press release that went out this morning, announcing the tangle.com launch.

I’d be happy to arrange a phone interview for you with tangle CEO, Jason Illian, to discuss tangle.com. Jason can provide a unique look into family-friendly social media and how tangle.com differentiates itself from other social networking sites. Additionally, Jason is the author of “MySpace®, MyKids: A Parent’s Guide to Protecting Your Kids and Navigating MySpace.com.”

Please feel free to shoot me an e-mail at XXXXXX@XXXXXXXXXXXX.com or call me at (XXX) XXX-XXXX for more information on tangle.com or to schedule time to speak with Jason.

Thanks for your consideration.

Best,
XXXXXX for tangle.com

#2) Who Died?

Hello Daniel,

Last week, WSJ writer Jeffrey Zaslow reported that, starting next month, the Detroit News and the Detroit Free Press will be offering home delivery just three days a week. So, readers who’ve made a daily ritual of perusing obituaries with their morning coffee — and who won’t go out to buy the paper or go online — aren’t necessarily going to learn about the deaths of their acquaintances. See article here: http://online.wsj.com/article/SB123431793199571075.html

But what if there was a technology that kept readers informed about obituary news, anywhere and at any time of day?

That’s where Tributes.com steps in, the comprehensive resource for local and national obituary news and personal tributes. Tributes.com has over 82 million current and historical death records dating back to 1936.

Tributes.com makes sure that consumers can stay informed 24/7 and connected with accurate obit email alerts for any town in the US, alumni, family name, or military unit. Users can set up alerts based on the zip code they currently reside in as well as previous locations they have lived in, and when someone has passed away in their community, an email will be sent to them with names of those who have passed. Those who like to read the morning obits as much as they like their morning cup of joe won’t have to worry about missing the opportunity to leave a message of condolence or to attend a funeral because of missing the news in the paper.

Like it or not, many newspapers are cutting back on home delivery and people want their news quickly and accurately. Tributes.com is the best alternative, go-to resource for obituary news, making sure no one is left in the dark about a passing.

A few interesting facts surrounding this include:

The obituary market as a $750M-$1B nearly untouched industry

Obituaries- “last man standing” – every other classified section has gone online and made millions (Match.com, eHarmony.com, EBay, Craigslist, Monster.com, etc.)

Newspapers lost $64.5 billion in market value in 12 months in 2008

2.5 million people die in the U.S. every year, and 12,000 of those people are turning 50 every day

Please consider mentioning this in your blog. I would be happy to arrange an interview with Jeff Taylor, founder of Tributes.com and Monster.com, to speak about new online technology and a modern world is changing the face of the print obituary.

For more information or to arrange an interview, please contact me at (XXX) XXX-XXXX xXXX or email me at XXXXXXX@XXXXXXXXXXXX.XXX.

Thanks for your consideration.

Best,

XXXXXXX

And, finally, #1: a letter from the folks at Wolfram Alpha:

Hi Daniel,

I wanted to thank you for your interest in Wolfram|Alpha and for sharing our exciting news with your readers. The response has been fantastic.
We look forward to sharing more news about this new website soon!

If you haven’t already, please sign up to receive Wolfram|Alpha release news.
You can do so at:
http://www.wolframalpha.com/

Thanks again,
XXXXXX

XXXXXX XXXXXX
Wolfram Research, Inc.

I have to say, I was a bit stunned by this email–the only reasonable explanation was my recent post about Wolfram Alpha, which didn’t, in my view, merit such a grateful response. Bemused, I responded and volunteered to look at their technology with an open mind–and even under NDA–if they had anything they were willing to share. I’ll keep you all posted.

Anyway, I hope you all found this amusing. And to any PR people who found their materials reposted here, I hope you understand that unsolicited pitches are fair game. Perhaps this is actually what you hoped for. In that case, you’re welcome, and thank you for helping me entertain my readers.

Uncategorized

Functional Requirements for Bibliographic Records

Post author By Daniel Tunkelang
Post date March 10, 2009

This is the first of what I hope to be many guest posts. Our guest blogger is Kelley McGrath, a Cataloging and Metadata Services Librarian at Ball State University Libraries, and at my request she’s supplying a perspective that I feel is crucial for anyone interested in HCIR–that of an actual librarian who deals with the realities of cataloging technologies.

THE PROBLEM

Do people really judge books by their covers? If not, then why is it that the Penguin version of Passage to India is rated 5 stars while the Penguin Classic version only gets 3? A recent BBC blog entry asks this question and concludes that the problem is that bookstore and library records for books (or other things) are designed largely to support inventory tasks and are based on identifiers like ISBNs that relate to particular editions. The BBC points out that sometimes what we really need are what they call “cultural identifiers” for cultural artifacts that point to one place even if you’re talking about the large print or Spanish translation and I’m reading the standard English paperback.

A POSSIBLE SOLUTION

In fact, libraries and the publishing world are well aware of this problem. Since I’m in the library world and I work primarily with cataloging moving images (film, TV, video), I’m going to talk from that perspective. The library world’s proposed solution is an entity-relationship conceptual model called the Functional Requirements for Bibliographic Records (FRBR, often pronounced “ferber”). FRBR divides the bibliographic universe into four main entities, which from most abstract to most concrete are:

Work: This is the BBC’s “cultural artifact” or the abstract commonality of something that is considered the same essential creation. In most cases, this is clear-cut, but there can be disagreements about where to draw the boundaries. For example, is Gus van Sant’s frame-by-frame remake of Psycho a new work or is it an expression of Hitchcock’s work?

Expression: These are versions that vary in content in some significant way. If you actually want to get your hands on something, differences in expression are important. Examples of expressions are things like language translations (dubbed versions, subtitles), accessibility modifications (captions, audio descriptions), widescreen vs. full screen, colorized versions of black and white films, and theatrical release vs. uncut/unrated/director’s cut versions. Not all differences in expression may be practical to track (e.g., the various video versions of Star Wars).

Manifestation: This is more or less the same as what libraries or bookstores keep track of now—generally, a particular published edition or the set of items that have the same characteristics for ordering purposes, e.g., the Warner Home Video DVD with a certain ISBN released in a certain year. From a library perspective VHS vs. DVD vs. Blu-ray, as well as publisher names and publication dates, are manifestation-level attributes.

Item: This is the particular DVD that has a certain barcode on it that you’ve checked out from the library and neglected to bring back on time so now the library wants to charge you a fine.
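To make the nesting of the four entities concrete, here is a minimal sketch in Python. It is only an illustration of the hierarchy described above, not part of FRBR itself: the class and field names, the publisher "Example Home Video", and the barcodes are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Item:
    barcode: str  # one physical copy on a library shelf


@dataclass
class Manifestation:
    publisher: str  # manifestation-level attribute
    year: int       # publication date of this edition
    carrier: str    # e.g. "DVD", "VHS", "Blu-ray"
    items: List[Item] = field(default_factory=list)


@dataclass
class Expression:
    description: str  # what makes this version distinct in content
    manifestations: List[Manifestation] = field(default_factory=list)


@dataclass
class Work:
    title: str  # the abstract creation
    expressions: List[Expression] = field(default_factory=list)

    def all_items(self) -> List[Item]:
        """Walk Work -> Expression -> Manifestation -> Item."""
        return [
            item
            for expression in self.expressions
            for manifestation in expression.manifestations
            for item in manifestation.items
        ]


# Hypothetical data: one work, one version, one edition, two copies.
hamlet = Work(
    "Hamlet",
    [
        Expression(
            "widescreen, English audio",
            [Manifestation("Example Home Video", 2008, "DVD",
                           [Item("0001"), Item("0002")])],
        )
    ],
)
copies = hamlet.all_items()
```

A catalog built this way could show one overview entry per work and let the user drill down to versions, editions, and finally the copies on the shelf.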

Particularly as larger, shared library catalogs have become more common, the multi-level FRBR model could be used to present options to users in a more succinct and usable manner. Would it not be easier to see one basic overview record for Hamlet and choices for versions and availability rather than a long list of records of different editions of Hamlet with not much information on the initial hit list page to differentiate them?

STEPS TOWARD THE SOLUTION

So the library world has the challenge of seeing if we can get from where we are now to a FRBR-based model. What we have are records at the manifestation (published edition) level, e.g., the 2-disc special edition released by Paramount on DVD in 2008. This record probably includes information from other levels of the FRBR model such as the director’s name (work level) or the fact that it’s the full screen version (expression level). Unfortunately, these various bits of information are intermingled, not clearly identified in terms of what level they apply to, and often given in free text notes which are hard to analyze or turn into controlled data.

To get from where we are now to a world where the multi-level FRBR model is truly useful, there are a few things I think we need. Again, I am going to talk in terms of film and video, as that is what I work with. I am only going to talk in terms of two levels, work and manifestation (published edition), because I think starting with two levels is a more practical first step. For most materials this may be a viable approach even in the long run: the characteristics that identify the expression (version) have to be identified and verified for each new manifestation (published edition), and most of them can be coded in machine-readable form so that expressions (versions) could be calculated automatically.
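As a rough illustration of what "automatically calculated" expressions could look like, the sketch below groups manifestation records by their coded version attributes. The field names (`audio_language`, `aspect`, `cut`) and the sample records are assumptions made up for this example; real catalog records would need these values coded consistently first.

```python
from collections import defaultdict

# Hypothetical coded fields that together distinguish one version from another.
EXPRESSION_FIELDS = ("audio_language", "aspect", "cut")


def expression_key(record):
    """Build a version key from the machine-readable attributes."""
    return tuple(record[f] for f in EXPRESSION_FIELDS)


def group_by_expression(manifestations):
    """Map each computed expression key to the manifestation ids sharing it."""
    groups = defaultdict(list)
    for record in manifestations:
        groups[expression_key(record)].append(record["id"])
    return dict(groups)


# Illustrative manifestation records for a single work.
records = [
    {"id": "m1", "carrier": "DVD", "audio_language": "eng",
     "aspect": "widescreen", "cut": "theatrical"},
    {"id": "m2", "carrier": "VHS", "audio_language": "eng",
     "aspect": "full screen", "cut": "theatrical"},
    {"id": "m3", "carrier": "Blu-ray", "audio_language": "eng",
     "aspect": "widescreen", "cut": "theatrical"},
]
expressions = group_by_expression(records)
```

Here the DVD and the Blu-ray fall into the same computed expression because carrier is a manifestation-level attribute and is deliberately left out of the key.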

1. Work sets

The existing manifestation (published edition) records have to somehow be related to the work or works they contain. This is generally done by grouping the records into work sets or by linking the relevant manifestations to work records.

OCLC, a large nonprofit, membership, computer library service and research organization, has developed what is probably the best-known clustering-based FRBR algorithm. OCLC’s algorithm uses several approaches with the most commonly-occurring one based on primary author and title. Because library rules consider most moving images to be works of mixed responsibility without a primary author, this approach works less well for them. The sets that are created by this algorithm are sometimes closer to the expression (version) level than the work level and the data that is displayed to the end user is derived algorithmically from the set of manifestations.
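The following is a minimal sketch of author-and-title clustering in general. It is not OCLC’s actual algorithm, whose details are not given here; the normalization rules and the sample records are assumptions for illustration only.

```python
import re
from collections import defaultdict

LEADING_ARTICLES = ("the ", "a ", "an ")


def normalize(text):
    """Lowercase, drop punctuation, collapse spaces, strip a leading article."""
    text = re.sub(r"[^\w\s]", "", text.lower())
    text = re.sub(r"\s+", " ", text).strip()
    for article in LEADING_ARTICLES:
        if text.startswith(article):
            return text[len(article):]
    return text


def work_key(record):
    """Key on primary author (empty if the record has none) plus title."""
    return (normalize(record.get("author", "")), normalize(record["title"]))


def cluster(records):
    """Group manifestation record ids into candidate work sets."""
    work_sets = defaultdict(list)
    for record in records:
        work_sets[work_key(record)].append(record["id"])
    return dict(work_sets)


# Illustrative records; the two films have no primary author.
records = [
    {"id": "r1", "author": "Forster, E. M.", "title": "A Passage to India"},
    {"id": "r2", "author": "Forster, E. M.", "title": "Passage to India."},
    {"id": "r3", "title": "Psycho"},  # say, the 1960 film
    {"id": "r4", "title": "Psycho"},  # say, the 1998 remake
]
work_sets = cluster(records)
```

The two book records land in one set, as hoped. The two films also land in one set, because with no primary author the key is the title alone, which is one way to see why this approach works less well for moving images.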

LibraryThing, a social book cataloging site, suggests possible combinations, but relies on human intervention to create its work clusters based on what founder Tim Spalding calls the “cocktail party” test. This test asks whether two people would think they’re talking about the same book in casual conversation. LibraryThing does include basic work-level records, parts of which appear to be surfaced from the manifestation clusters and parts of which are entered manually by users.

Both OCLC and LibraryThing offer services that create work sets on the fly based on an ISBN (xISBN and thingISBN). Manifestation (published edition) records that include multiple works can be particularly problematic for these services.

2. Work Identifiers

As the BBC noted, we need some stable way to identify and refer to works. Both OCLC and LibraryThing offer work identifiers of a sort. OCLC’s database is more comprehensive, but its identifiers are less reliably at the work level. Ed Summers recently pointed out that these identifiers only provide human-readable data and not data useful for machines.

3. Work Records

In the long run there are serious limitations with the clustering approach and with displaying work-level information created by extracting and analyzing data in sets of manifestation (published edition) records. For one thing, if information is automatically generated from clusters, it is difficult to correct errors that may sneak in. This problem may be particularly prevalent in the library world due to the practice of copying much information on new records from previous editions without verification. It is also redundant to re-enter and store all this data in multiple manifestation records when it would be more efficient to assess and maintain it in a single work-level record. In addition, a single work-level record would present consistent information to all users. Currently manifestation records may be more or less complete and may give conflicting information about the same work so what a user finds is arbitrarily influenced by the particular records that happen to be in the library catalog being searched.

A group that I am part of, OLAC (Online Audiovisual Catalogers), has created a task force to examine what it might mean to create work-level records for moving images in a library context and also to what extent we might be able to leverage existing library data. We did a project to try to extract work-level data from existing manifestation (published edition) records. Lynne Bisko and I have an article about our experience in The Code4Lib Journal. Our basic idea is that by extracting what we can from existing library records, possibly in combination with information from external data sources, we can create basic, good-enough work-level records that can be corrected and maintained by human beings. These work-level records could then be linked to our existing manifestation-level records. This data could be used to create an interface to provide better access to moving images both in terms of the work (director, cast, original date, language, and country of production, place and time period of setting, etc.) and in terms of the characteristics of the expression and manifestation that will help users easily select the best items for their needs (language(s) of audio and subtitle tracks, captioning, DVD vs. VHS vs. Blu-ray, availability, etc.).
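One simple way to picture the "good-enough record, corrected by humans" idea is sketched below. It is not the method from the Code4Lib Journal article; it just takes the most common value for each work-level field across a work’s manifestation records and flags fields where the records disagree. The field names and sample data are hypothetical.

```python
from collections import Counter

# Hypothetical work-level fields to pull out of manifestation records.
WORK_FIELDS = ("director", "original_year", "country")


def draft_work_record(manifestations):
    """Return (draft record, fields whose values conflict across records)."""
    record, needs_review = {}, []
    for field_name in WORK_FIELDS:
        # Count only records that actually supply a value for this field.
        values = Counter(
            m[field_name] for m in manifestations if m.get(field_name)
        )
        if not values:
            continue  # no record had this field; leave it for a human
        record[field_name] = values.most_common(1)[0][0]
        if len(values) > 1:
            needs_review.append(field_name)
    return record, needs_review


# Illustrative records: the third one carries the edition's release year
# where the original production year should be.
manifestations = [
    {"director": "Hitchcock, Alfred", "original_year": 1960, "country": "US"},
    {"director": "Hitchcock, Alfred", "original_year": 1960},
    {"original_year": 2008, "country": "US"},
]
work_record, needs_review = draft_work_record(manifestations)
```

The flagged fields are where a cataloger’s attention would be most useful, which fits the goal of records that are generated automatically but maintained by people.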