Categories
Uncategorized

Note to Bloggers: Don’t Quit Your Day Job

Dan Lyons, better known to most as Fake Steve Jobs, wrote an article in Newsweek today entitled “Time to Hang Up the Pajamas”, or “Growing Rich by Blogging Is a High-Tech Fairy Tale”.

An excerpt:

My first epiphany occurred in August 2007, when The New York Times ran a story revealing my identity, which until then I’d kept secret. On that day more than 500,000 people hit my site—by far the biggest day I’d ever had—and through Google’s AdSense program I earned about a hundred bucks. Over the course of that entire month, in which my site was visited by 1.5 million people, I earned a whopping total of $1,039.81. Soon after this I struck an advertising deal that paid better wages. But I never made enough to quit my day job.

Read the whole post, especially if you’ve ever entertained fantasies of blogging to generate a primary income. I’m not saying you can’t use a blog to promote yourself and cultivate a reputation that you can monetize. But I think it’s unlikely that you’ll make more money from Google AdSense than Fake Steve Jobs.
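
For a rough sense of scale, the figures in that excerpt can be turned into earnings per thousand visitors. This is a back-of-the-envelope sketch, not Lyons’s own calculation: it assumes “about a hundred bucks” means exactly $100 and treats each person counted as one visitor, and the function name `revenue_per_thousand` is my own label.

```python
def revenue_per_thousand(earnings_usd: float, visitors: int) -> float:
    """Return ad earnings per 1,000 visitors, in US dollars."""
    return earnings_usd / visitors * 1000


# Figures quoted in the excerpt above.
# Assumption: "about a hundred bucks" is taken as exactly $100.
month_rpm = revenue_per_thousand(1039.81, 1_500_000)
peak_day_rpm = revenue_per_thousand(100.0, 500_000)

print(f"Whole month: ${month_rpm:.2f} per 1,000 visitors")
print(f"Peak day:    ${peak_day_rpm:.2f} per 1,000 visitors")
```

Under those assumptions the month works out to roughly 69 cents per thousand visitors, and the record day to about 20 cents.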

WikiDashboard: Visualizing Wikipedia Edits

Ed Chi, a senior research scientist at the Palo Alto Research Center (PARC), recently delivered a presentation at MIT about WikiDashboard, a tool that he and PARC colleague Bongwon Suh developed in order to visualize the dynamic nature of Wikipedia’s collaborative editing process. Erica Naone, a regular here at The Noisy Channel, wrote a nice article about it in Technology Review, entitled “Who’s Messing with Wikipedia?”.

I like Ed Chi’s work, and we talked about the WikiDashboard project when I visited him at PARC just over a year ago.  But, as I was quoted in the article, I do wonder what problem this visualization aims to solve. A picture, it is said, is worth a thousand words, but this feels too much like looking at a thousand words. I hope that Ed and the team at PARC invest in distilling a more consumable signal out of this wealth of data that can be applied to solve real problems.

I also hope that, as Rob Miller points out in the article, the collection and publication of such measurements does not simply encourage people to game them.

Comments I Read: Jeremy Pickens

Jeremy Pickens doesn’t have a blog–as far as the blogosphere goes, he is homeless. Or rather, he likes to hang out at my house–which is great, because he’s the kind of guest who brings over good wine and then helps you with the cooking. He is by far the most active contributor to the comment threads here at The Noisy Channel. If you are reading this blog through an RSS reader and skipping the comments, here is a taste of what you’ve been missing:

I met Jeremy a few years ago–at RIAO 2007 in Pittsburgh if I recall correctly. He co-authored a paper on “Collaborative Exploratory Search” presented at the inaugural HCIR workshop that same year.

As his home page at FXPAL tells us:

Jeremy’s major research themes, since joining FXPAL in 2005, include Music Information Retrieval, Video Information Retrieval and Collaborative Exploratory Search (Collaborative Information Seeking). He earned his Ph.D. from the University of Massachusetts Amherst at the Center for Intelligent Information Retrieval (CIIR). Jeremy did his post-doctoral work at King’s College in London from 2004-2005.

Jeremy is too modest to claim credit for his outsize contributions to this blog, so I thought I’d break convention and allow his collective comments to qualify as a “blog I read”. They are certainly worth reading, and I hope he keeps contributing once he does have a blog of his own–which is inevitable.

ACM Recommendations on Open Government

The ACM U.S. Public Policy Committee just published its Recommendations on Open Government:

  • Data published by the government should be in formats and approaches that promote analysis and reuse of that data.
  • Data republished by the government that has been received or stored in a machine-readable format (such as online regulatory filings) should preserve the machine-readability of that data.
  • Information should be posted so as to also be accessible to citizens with limitations and disabilities.
  • Citizens should be able to download complete datasets of regulatory, legislative or other information, or appropriately chosen subsets of that information, when it is published by government.
  • Citizens should be able to directly access government-published datasets using standard methods such as queries via an API (Application Programming Interface).
  • Government bodies publishing data online should always seek to publish using data formats that do not include executable content.
  • Published content should be digitally signed or include attestation of publication/creation date, authenticity, and integrity.

I’ve advocated for such openness myself, and I am delighted that the ACM, which represents my concerns and those of tens of thousands of other computer science professionals, is taking a stand on this important policy issue.

Matt Cutts: Google Still Has Big Ideas

While I strive to be fair and balanced in my coverage of companies–especially those that in any way compete with Endeca–somehow I seem to come down hard on Google.

But today I’m glad to have the opportunity to point readers to a post by Matt Cutts, the head of Google’s Webspam team (and a speaker at the SIGIR ’09 Industry Track!), defending Google against the oft-repeated charge (this time by Om Malik)  that Google has run out of big ideas.

I do note that, other than the deep web research, which I covered in an earlier post, I don’t see much about how Google is innovating in search. Perhaps Google is done with search, and is focusing its innovation efforts elsewhere? While I’m personally interested in solving the open problems in the search space, I don’t doubt that investigating alternative energy is important too.

SIGIR ’09 Industry Track: The Details You’ve Been Waiting For

Several weeks ago, I announced that I’d be organizing the Industry Track at SIGIR ’09. Some of you offered helpful suggestions, and many people expressed excitement at this opportunity to bring together the too-often separate worlds of research and practice.

Today, I am proud to share more details about the Industry Track. It will take place during the regular conference program, on Wednesday, July 22nd. Everyone who signs up for the full conference program (July 19th – 23rd) will be allowed to attend the Industry Track sessions at no extra charge. There will also be a one-day registration option.

But the reason you’ll want to be there is the incredible set of people who will be presenting during this full-day event:

We will also have a pair of panels devoted to enterprise search:

For researchers, the Industry Track provides an opportunity to learn about the industry side of information retrieval from some of its leading lights. In particular, it brings together the major players from both web search and enterprise search.

For industry practitioners who may never have heard of SIGIR before, let alone attended one of its conferences, the Industry Track offers an unprecedented opportunity to learn about the science and technology of information retrieval in a vendor-neutral, analyst-neutral setting. All without paying an extortionary registration fee.

Finally, for everyone in the space, this is an incredible professional networking opportunity. I look forward to seeing you there!

Is Google Serious about Exploratory Search?

I just noticed this post by Sarah Perez on ReadWriteWeb: “Google: ‘We’re Not Doing a Good Job with Structured Data’”. Here is the excerpt that caught my attention:

The company wants to be able to separate exploratory queries (e.g., “Vietnam travel”) from ones where a user is in search of a particular fact (“Vietnam population”). The former query should deliver information about visa requirements, weather and tour packages, etc.

While I’m not sure how important it is to automatically distinguish between “exploratory queries” and fact-finding ones, I absolutely agree that neither Google nor its web search rivals have delivered much in the way of exploratory search capabilities to users.

Alon Halevy, who leads Google’s “deep web” efforts, seems to understand the problem, even if I might quibble over some of the details. The question is: how does he plan to solve it? And how will his solution integrate with Google’s decidedly non-exploratory approach to information seeking?

Search-Related Conferences: Where’s The Beef?

The other day, Stephen Arnold published a post entitled “Conference Spam or Conference Prime Rib” bemoaning “an increasing amount of conference spam” in the enterprise search space. Sharing his frustration with the marketing and hype that passes for technical discussion in this field, I posted a comment extolling the upcoming SIGIR Industry Track as an opportunity to bring some substance to the conversation.

To my delight, Arnold included the comment in a follow-up blog entry today. While I don’t take this as an explicit endorsement, I find his arguments very consonant with the case I made in my call to action several months ago. Moreover, Arnold doesn’t mince words when it comes to criticizing trade shows in which “the real losers are the attendees who spend money and invest time to hear lousy speakers or sales pitches advertised as original, substantive talks.”

Over the next few months, I’m attending the Enterprise Search Summit, presenting at the Infonortics Search Engine Meeting, and organizing the SIGIR Industry Track. I’m also presenting at Discover, Endeca’s annual user conference. At all of these events, I expect substance, not warmed-over sales pitches. Hopefully these two posts on Arnold’s widely read blog will help inspire such an outcome, or at least will serve to shame some of the worst offenders.

Wikipedia Embracing Information Accountability?

Given the recent debate on this blog over the merits of Wikipedia, I’m tickled to see that Jimbo Wales, Wikipedia’s controversial founder, is coming out in favor of requiring anonymous edits to be approved (via Matthew Webber at UID Teatime Blog). I’m stoked, since this is the one point on which I think Knol beats Wikipedia. Read my past rants on the subject here.

No Privacy Through Difficulty

I’ve blogged in the past about the increasing futility of pursuing privacy through difficulty, and I generally advocate an approach of “when in doubt, make it public”.

But if you have any doubt about the current state of personal privacy, read Robert Mitchell’s article in Computerworld entitled “What the Web knows about you”. As he says, “Social Security numbers are just the beginning.”