Categories
Uncategorized

Google Discovers RSS

At long last, Google Alerts are available as RSS feeds! While I realize many people accept email as a means of integrating multiple information feeds, I’m glad that Google has embraced a protocol that is far more suited to the purpose.

Time for me to go off and convert my alerts from email to RSS…

All the News that’s Fit to Text Mine

My friend Evan Sandhaus at the New York Times Company told me the other day that the paper of record would be releasing a large collection of their articles. Well, the New York Times Annotated Corpus is here!

For full details check out this overview document, but here are some vital stats to whet your appetite:

  • Over 1.8 million articles written and published between January 1, 1987 and June 19, 2007.
  • Over 650,000 article summaries written by the staff of The New York Times Index Department.
  • Over 1.5 million articles manually tagged by The New York Times Index Department with a normalized indexing vocabulary of people, organizations, locations and topic descriptors.
  • Over 275,000 algorithmically-tagged articles that have been hand verified by the online production staff at NYTimes.com.

LDC members can obtain the corpus for free; non-members pay $300.

This is an exciting development, and yet another encouraging sign that old media dogs can learn new tricks. Thanks to Jon and Panos for posting about it today.

Is the Information Retrieval Community Making Progress?

Join the discussion over at Probably Irrelevant.

Freebasing

Just saw on Stefano’s Linotype that the social database Freebase has launched version 4.0. I think Stefano does a better job of marketing the redesign than the official Freebase blog does. The positives he cites:

  • reducing data agoraphobia
  • one size does not fit all
  • increasing relational density

I really want to be excited about Freebase, especially given the cool interface work that David Huynh is doing there. But I just don’t get it. Is there anyone here who has drunk the Kool-Aid and can explain it to me?

Video Conferences Distort Judgment?

Here’s the report via ChiefTech:

A small study raises questions about whether videoconferencing distorts interactions in a subtle but important way.

The study found that doctors and nurses who attended seminars via videoconference were more likely to be influenced by the charisma of the presenter.

In contrast, people who were face-to-face with the presenter were more likely to base their judgment of the presentation on the arguments that were used, the researchers said.

I’ve never been a big fan of videoconferences, but this is the first time I’ve even seen this argument proposed, let alone empirically validated. Personally, I’m more likely to tune out of a remote presentation than to be mesmerized by it. But perhaps that isn’t so different: remote presenters bear a much stronger burden of keeping the audience’s attention, which places a higher premium on charisma than in face-to-face meetings where the audience is more captive.

In any case, it’s a reminder that the way we consume information often matters as much as the information itself.

Andrew Tomkins to Academics: Work on Social Media Search

Yahoo Researcher Andrew Tomkins gave the keynote at the CIKM 2008 Workshop on Search and Social Media, entreating academics to forget about core web search, where they can’t compete on a level playing field with commercial search engine companies, and instead to focus on social media search. Notes here, courtesy of Matt Hurst.

Thanks to him and everyone else blogging and twittering from the conference!

Blogs I Read: Dave Kellogg

When I decided to start blogging, I scoured the blogosphere for role models. In particular, I looked for examples of people who, despite a strong corporate affiliation, managed to create a very non-corporate, personal voice. It didn’t take long to find Dave Kellogg.

Kellogg is the CEO of Mark Logic, a company that sells an “XML content platform”. Their corporate home page links to his blog, but describes it as a place where he “riffs, rants, and occasionally raves about content technology.”

It’s an accurate if incomplete description–I’m not sure he restricts his attention to content technology. Some of his exemplary posts:

While Kellogg’s blog probably violates every rule in the PR manual, it’s his strong, unadulterated voice that makes the blog worth reading. And I suspect that his blog gives his company more ROI on marketing than any other promotion efforts.

In any case, the blog makes for informative and entertaining reading. I may not always agree with his content, but I like his style.

HCIR ’08 Proceedings Now Online

The HCIR ’08 proceedings are now available at the workshop website. Enjoy!

Buzz for Buzzillions

One of Endeca’s partners, PowerReviews, got a nice write-up in the Wall Street Journal for their reviews and recommendation site, Buzzillions. They’re at the forefront of social navigation, something I think we’ll see increasingly in online retail and media.

Here’s a screenshot to get you in the Halloween spirit:

Google’s Getting MaverWiki

Sorry, I couldn’t resist. But I do thank Jeff for alerting me to Google’s SearchWiki efforts:

The new feature is a more transparent way to personalize search results; this time, Google allows users to decide which search results are the most relevant and to share those findings with other users. Instead of bookmarking the results or saving them in Google Notebook, you can make them more visible on a search results page and find them when you search later. Unfortunately, Google’s interface will become cluttered unless Google decides to hide the new options until you click on a link like “Edit the search results”.

Of course, the big question is whether and how a user’s feedback will affect result ranking for everyone else. Given Google’s experience with fighting spam, I’d imagine they know better than to provide an easily gamed feedback mechanism.