Michael Mitzenmacher has a great post this morning on a humorous attempt to outsource the undecidable halting problem to GetACoder.com, complete with advice from Georg Cantor.
Author: Daniel Tunkelang
High-Class Consultant.
Web Search Can Cause Cyberchondria
Ryen White and Eric Horvitz just published a tech report on “Cyberchondria: Studies of the Escalation of Medical Concerns in Web Search”:
The World Wide Web provides an abundant source of medical information. This information can assist people who are not healthcare professionals to better understand health and disease, and to provide them with feasible explanations for symptoms. However, the Web has the potential to increase the anxieties of people who have little or no medical training, especially when Web search is employed as a diagnostic procedure. We use the term cyberchondria to refer to the unfounded escalation of concerns about common symptomatology, based on the review of search results and literature on the Web. We performed a large-scale, longitudinal, log-based study of how people search for medical information online, supported by a large-scale survey of 515 individuals’ health-related search experiences. We focused on the extent to which common, likely innocuous symptoms can escalate into the review of content on serious, rare conditions that are linked to the common symptoms. Our results show that Web search engines have the potential to escalate medical concerns. We show that escalation is influenced by the amount and distribution of medical content viewed by users, the presence of escalatory terminology in pages visited, and a user’s predisposition to escalate versus to seek more reasonable explanations for ailments. We also demonstrate the persistence of post-session anxiety following escalations and the effect that such anxieties can have on interrupting user’s activities across multiple sessions. Our findings underscore the potential costs and challenges of cyberchondria and suggest actionable design implications that hold opportunity for improving the search and navigation experience for people turning to the Web to interpret common symptoms.
I am the Pegman, GOOG G’Job!
Google just released some nice enhancements to Street View. It’s tempting to make Clippy jokes about their mascot, Pegman, but the Google Maps team clearly understands how to make an avatar non-invasive.
Video tour below (narrated by Pegman, of course):
Ephemeral Conversation Is Dying
Bruce Schneier had a column in the Wall Street Journal a few days ago entitled “Why Obama Should Keep His BlackBerry – But Won’t”. He uses Obama’s BlackBerry dilemma to make the broader point that we’ve moved from an assumption of privacy to a world where everything is recorded. His argument is that, rather than trying to turn back the clock, we need to adjust our legal and cultural norms to the reality of our digital trails. (via Vincent Gable’s comment at whydoeseverythingsuck.com)
I’m reminded of these thoughts from Danah Boyd’s Master’s Thesis on managing identity in a digital world:
Although it may seem advantageous to have historical archives of social interactions, these archives take the interactions out of the situational context in which they were located. For example, by using a search engine to access Usenet, people are able to glimpse at messages removed from the conversational thread. Even with the complete archive, one is reading a historical document of a conversation without being aware of the temporal aspect of the situation. As such, archived data presents a different image to a viewer who is accessing it out of the context in which it was created.
Digital archives allow for situational context to collapse with ease. Just as people can access the information without the full context, they can search for information which, when presented, suggests that two different bits of information are related. For example, by searching for an individual’s name, a user can acquire a glimpse at the individual’s digital presentation across many different situations without seeing any of this in context. In effect, digital tools place massive details at one’s fingerprint, thereby enabling anyone to have immediate access to all libraries, public records and other such data. While advantageous for those seeking information, this provides new challenges for those producing sociable data. Although the web is inherently public, people have a notion that they are only performing to a given context at a given time. Additionally, they are accustomed to having control over the data that they provide to strangers. Thus, people must learn to adjust their presentation with the understanding that search engines can collapse any data at any period of time.
And Danah wrote those words seven years ago, before Facebook and Twitter invented micro-blogging and inspired people to voluntarily live in virtual fishbowls. I’ve blogged about the end of privacy through difficulty, but it seems we’re heading in a direction of no privacy at all. It will be interesting to see how the next generation frames this discussion.
A Little Bird Told You…
Since I’m taking a week off after Thanksgiving, I thought I’d be clever and schedule a week’s worth of posts to appear daily while I’m gone. What I didn’t count on, however, is that a bug in TwitterUpdater would post tweets as soon as I scheduled the posts, rather than when they were published. My apologies to Twitter followers for the broken links. But now that you know what’s coming, I hope you’ll come back soon and read the posts as they appear!
I just ran into an interesting post by Vidsense CEO Jaffer Ali entitled “Has Online Advertising Lost Its “Schwerpunkt”?”. Its premise: “Creativity and strategic thinking and planning have become subservient to technology under the guise of analytics.” Part of his evidence is how marketers evaluate advertising agencies:
In 2005, those marketers surveyed listed the order of qualities they looked for in their agencies:
1. Quality of Creative Content
2. Price/Cost
3. Innovation and Strategic value
4. Traditional print, offline services
5. Sophisticated analytics/measurement
6. Proficiency in emerging/interactive
In 2008 the results of the same survey were quite different:
1. Sophisticated analytics/measurement
2. Proficiency in emerging/interactive
3. Price/Cost
4. Quality of creative (virtual tie with price/cost)
5. Traditional print, etc.
6. Innovation and Strategic value
It’s a bit of a curmudgeonly piece, but it rings true. In the heat of web analytics, we sometimes forget that attention isn’t just a commodity, but represents real human beings exercising free will. Nice to see this coming from someone who sells ads on a Pay-Per-Click (PPC) basis.
Rethinking the ESP Game
Thanks to Amir Michail for a tweet that alerted me to a technical report by Ingmar Weber, Stephen Robertson, and Milan Vojnovic on “Rethinking the ESP Game”.
The ESP Game is a human-based computation game designed by Luis von Ahn to tag images. Part of his motivation for developing the game was that image tagging was too hard for machines. Hence, he decided to make it fun for humans to volunteer their own labor to the cause.
But it turns out that, once humans have supplied some initial labels to the game, a machine can take over. Here is the abstract for the technical report:
The ESP Game was designed to harvest human intelligence to assign labels to images – a task which is still difficult for even the most advanced systems in image processing. However, the ESP Game as it is currently implemented encourages players to assign “obvious” labels, which are most likely to lead to an agreement with the partner. But these labels can often be deduced from the labels already present using an appropriate language model and such labels therefore add only little information to the system.
We present a language model which, given enough instances of labeled images as training data, can assign probabilities to the next label to be added. This model is then used in a program, which plays the ESP game without looking at the image. Even without any understanding of the actual image, the program manages to agree with the randomly assigned human partner on a label for 69% of all images, and for 81% of images which have at least one “off-limits” term assigned to them. We then show how, given any generative probabilistic model, the scoring system for the ESP game can be redesigned to encourage users to add less predictable labels, thereby leading to a collection of informative, high entropy tag sets. Finally, we discuss a number of other possible redesign options to improve the quality of the collected labels.
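To get a feel for how a program could play without ever seeing the image, here is a minimal sketch of my own. It is not the model from the report: it just counts label co-occurrences in previously tagged images and guesses the labels that most often accompany the "off-limits" terms. The function names and the toy data are made up for illustration.

```python
from collections import Counter, defaultdict


def train(tagged_images):
    """Count how often each label appears and co-occurs with other labels."""
    cooccur = defaultdict(Counter)
    label_counts = Counter()
    for labels in tagged_images:
        unique = set(labels)
        label_counts.update(unique)
        for a in unique:
            for b in unique:
                if a != b:
                    cooccur[a][b] += 1
    return cooccur, label_counts


def guess_next_labels(off_limits, cooccur, label_counts, k=3):
    """Guess k labels from the off-limits terms alone, never the image."""
    scores = Counter()
    for known in off_limits:
        scores.update(cooccur.get(known, {}))
    if not scores:
        # Nothing to condition on: fall back to overall label popularity.
        scores = Counter(label_counts)
    for known in off_limits:
        # Off-limits terms cannot be played again.
        scores.pop(known, None)
    return [label for label, _ in scores.most_common(k)]


# Hypothetical training data: one list of labels per image.
toy_images = [
    ["sky", "cloud", "blue"],
    ["sky", "blue", "sea"],
    ["sky", "cloud", "plane"],
    ["dog", "grass", "ball"],
]
cooccur, label_counts = train(toy_images)
print(guess_next_labels(["sky", "cloud"], cooccur, label_counts))
```

Even this crude counter tends to propose exactly the "obvious" labels the abstract complains about, which is the point: a label that can be predicted from the existing ones adds little information.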
Daniel Lemire commented recently that spammers may help make AI a reality. It’s interesting to see how yesterday’s Turing Test has become today’s CAPTCHA, and the competition between humans and machines is making both smarter.
LinkedIn’s New Search Platform: A Review
I’m an avid LinkedIn user and fancy myself an expert on search, so I was excited today to see that LinkedIn has officially launched its new search platform. I recommend you watch the four-minute video below to get an overview of the new features.
First, the good news. The interface is slick and streamlined as promised. Type-ahead works smoothly, though it is restricted to the names of your contacts. The in-page options for refining your query by changing the sort or adding parameters are a welcome improvement to their previous interface, which took you to another page to make query modifications. And the presentation of results is clean and effective. Finally, there is a “saved searches” feature that also acts as a running query to alert you to new results. This is a great feature for recruiters.
Now, the bad news. There is still no support for exploratory search. I get 1,198 results when I search for Endeca (your results will depend on your personal network). That’s far too many to look through if I’m trying to establish contacts at a company, and sorting by relevance or relationship strength has limited value. What I’d really like is an overview of those 1,198 people that I can explore–by location, job title, their present and past relationship to Endeca, etc. Faceted search would certainly be more helpful than the unguided parametric search they offer.
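To make the request concrete, here is a minimal sketch of the kind of overview I have in mind: count the values of each facet across the result set, then let the user narrow the set by picking a value. The records and field names below are invented for illustration and say nothing about LinkedIn's actual data model or API.

```python
from collections import Counter


def facet_counts(results, facets):
    """Summarize a result set: for each facet, count how often each value occurs."""
    return {
        facet: Counter(r[facet] for r in results if r.get(facet))
        for facet in facets
    }


def refine(results, facet, value):
    """Narrow the result set to records whose facet has the selected value."""
    return [r for r in results if r.get(facet) == value]


# Hypothetical search results for a company query.
results = [
    {"name": "A", "location": "Boston", "title": "Engineer", "relationship": "current"},
    {"name": "B", "location": "Boston", "title": "Sales", "relationship": "past"},
    {"name": "C", "location": "New York", "title": "Engineer", "relationship": "current"},
    {"name": "D", "location": "Boston", "title": "Engineer", "relationship": "past"},
]

overview = facet_counts(results, ["location", "title", "relationship"])
for facet, counts in overview.items():
    print(facet, dict(counts))

# Drill down: current employees only, then summarize what remains.
current = refine(results, "relationship", "current")
print(facet_counts(current, ["location"]))
```

The difference from parametric search is that the counts come from the results themselves, so the user sees which refinements are available and how large each one is before committing to it.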
I don’t mean to damn LinkedIn with faint praise–this is a significant improvement over the experience they offered before. But I wish that they would recognize the importance of exploratory search in the context of a professional networking site. As it is, they are leaving so much on the table.
Today, I heard about two blog analysis tools. Ever the empiricist, I decided to try them out.
Let’s start with GenderAnalyzer, which claims it can determine the gender of a blog author.
Well, it thinks I’m probably male:
- We have strong indicators that https://thenoisychannel.com/ is written by a man (90%).
That’s close enough to spare me any major gender identity issues. Well, maybe. Let’s look at a few blogs that are written by women:
- Gwen Harris’s Taxonomy Watch:
We have strong indicators that http://taxonomy2watch.blogspot.com/ is written by a man (95%).
- Essays by Danielle Fong:
We think http://einfall.wordpress.com/ is written by a man (79%).
- Claudia Imhoff at BeyeNETWORK:
We think http://www.b-eye-network.com/blogs/imhoff/ is written by a man (89%).
Perhaps GenderAnalyzer doesn’t appreciate women in science and technology. Or perhaps it isn’t doing much better than random.
On to Typealyzer, which claims to perform a Myers-Briggs Type Indicator (MBTI) personality test on a blog. Let’s set aside the unscientific basis of the theory and try it out.
- The analysis indicates that the author of https://thenoisychannel.com/ is of the type: ISTJ.
ISTJ, huh. I realize that some of you have never met me, but I assure you that I am an off-the-chart extrovert. If these personality tests have any merit, I’m somewhere between an ENTJ and an ENFJ. And, while Typealyzer has a disclaimer that “writing style on a blog may have little or nothing to do with a person´s self-percieved personality”, I assure you that the personality of my blog accurately reflects the personality of its author.
So I wouldn’t put too much stock in blog analysis tools. Perhaps these two aren’t the best examples of the genre. But for now I’d suggest they be used for entertainment purposes only.
When I saw the post over at Peter Morville’s findability.org blog featuring this slideshow, I could only think of Eminem’s “The Real Slim Shady”. It’s a nice presentation. Enjoy!