Categories
Uncategorized

A Museum of Mathematics

Mathematics illuminates the patterns that abound in our world. The Math Factory strives to enhance public understanding and perception of mathematics. Its dynamic exhibits and programs will stimulate inquiry, spark curiosity, and reveal the wonders of mathematics. The museum’s activities will lead a broad and diverse audience to understand the evolving, creative, human, and aesthetic nature of mathematics.

The above is the mission statement of The Math Factory, an organization headed by former Renaissance Technologies analyst (and CTY alumnus) Glen Whitney that aspires to build a national museum of mathematics in New York. The effort is well underway–the organization has raised $4M to date, attracted an impressive group of trustees and advisors, and obtained quite a bit of enthusiastic press coverage. No wonder–the Math Midway it exhibited at the World Science Festival this past June was a wild success. I’d gone there to offer moral support, only to find that I was lucky to get close enough to see the exhibits!

Last night, I was fortunate enough to attend a gala at the Urban Academy and actually play with the exhibits–from riding a tricycle with square wheels to walking through a maze without making left turns. It was a blast! And, while I’ll admit to being favorably predisposed towards math, the exhibits hardly required such a predisposition–any more than the Exploratorium in San Francisco requires a predisposition towards science. Rather, experiences like these create excitement, overcoming the negative preconceptions that too many children (and adults!) have about this subject.

While I suspect that many Noisy Channel readers are already sold on both the enjoyment and core societal value of mathematics, I encourage you to think about how much better a world we would have if this appreciation were more widely shared. For those who have to think about large numbers just to manage their assets, I encourage you to think of The Math Factory as worthy of your philanthropy. I encourage everyone to contribute your ideas and endorsements to this visionary effort.

Categories
General

Privacy, Pseudonymity, and Copyright

A lunch conversation during the Transparent Text symposium about transparency in social media (also a hot topic in the Ethics of Blogging panel) led me to watch the following presentation from Lawrence Lessig on “Privacy 2.0”:

http://blip.tv/play/lG372wMC

Another topic in that conversation was pseudonymity. Someone pointed to a 2000 USENIX paper entitled “Can Pseudonymity Really Guarantee Privacy?” The challenges of implementing pseudonymity have, of course, received lots of attention in the past few years. The most notorious example is the AOL search data scandal, which made the front page of the New York Times. But there’s also the work co-authored by my friend Vitaly Shmatikov on de-anonymizing Netflix data. Indeed, some have expressed concern that the new Netflix competition is a privacy lawsuit waiting to happen.

Finally, danah boyd’s master’s thesis on “faceted id/entity: managing representation in a digital world” also came up–and I recently discovered by way of Robert Scoble that she’ll be keynoting at SXSW next year. Now I feel even more proud that I convinced her to speak at the SIGIR Industry Track this year. But I digress.

What does any of this have to do with copyright? Watch Lessig’s presentation–it’s long, but I promise you it’s worthwhile and entertaining to boot. Besides, I’ve made it easy by embedding it for you! He makes an analogy–rather, he makes fair use of Jonathan Zittrain’s analogy–between privacy rights and copyright.

The executive (and overgeneralized) summary is that both privacy-holders (“consumers”) and copyright-holders (“industry”) have complained that technology has undermined their rights, and both have sought out legal remedies. Consumers push back on industry, frustrated with legal strategies to enforce copyright at the expense of consumer freedom, preferring instead to let technology dictate policy; industry pushes back on consumers, frustrated with their legal strategies to enforce privacy rights at the expense of industry freedom, in this case preferring instead to let technology dictate policy. The analogy may not be perfect, but it is close enough to be compelling.

But I’d like to stretch the analogy further than Lessig and Zittrain to consider pseudonymity and derivative works. The pseudonymity challenge (e.g., the recent reports about Project Gaydar) reminds us that privacy isn’t binary, and that we have to accept at least some loss of privacy if we are going to live in a social world. Similarly, provisions like fair use exist because copyright is an inherent trade-off between protecting creators’ rights and embracing the value of creation in a social context.

As I said, I find Zittrain’s analogy and Lessig’s presentation compelling. While it may not answer any of society’s urgent questions about privacy and copyright, it may at least further the conversation. At the very least, I hope the topic is intellectually stimulating.

Categories
Uncategorized

Ethics of Blogging: Webcast Now Available

Thanks to Robin Fray Carey for posting the webcast of the Ethics of Blogging panel on the Social Media Today site. You can also catch the tweet stream at #SMTWebcast while it’s still indexed.

Categories
General

Human-Computer Information Retrieval in Layman’s Terms

One of the great benefits of practicing, as Daniel Lemire calls it, open scholarship is that I have many opportunities to see how ideas translate across the research / practice divide. In particular, I obtain invaluable feedback on the accuracy and effectiveness of that translation process.

A few days ago, I was exchanging email with serial entrepreneur Chris Dixon about human-computer information retrieval (HCIR). He’d just looked through the accepted submissions list for HCIR 2009 and said, if I may paraphrase: this is great stuff, but it needs to be better communicated for broader consumption. I quickly shot back a reaction that I’ll excerpt here (when in doubt, make it public!):

At some level it’s blindingly obvious: to err is human, to really screw up takes a computer. The HealthBase fiasco isn’t a shocker: lots of people are skeptical of pure AI approaches.

What people don’t get is that you can work to optimize the division of labor. I’m evangelizing it in places like Technology Review–a bit more mainstream than my blog. But ultimately the message has to resonate with entrepreneurs and investors who will make that vision a reality. Endeca is all about HCIR. Bing is a step in the right direction for the open web. But there’s a long way to go.

His response: that’s a lot more consumable than any other description of HCIR he’d seen to date (and he’s a regular reader here!). Having just finished reading Steve Blank’s Four Steps to the Epiphany, I appreciate his point: in a new market, the most critical priority is educating the potential customers.

As a number of us prepare for the HCIR 2009 workshop, that’s something to keep in mind. There’s a natural tension between rigorous scholarship and mass communication, but some of the greatest scholars (e.g., Richard Feynman and Linus Pauling) have shown the way for us mere mortals. Indeed, in a field as cross-disciplinary as HCIR, we would do well to make our work and vision as broadly consumable as possible, albeit without oversimplifying it to the point that it is vapid or even misleading.

Generally speaking, I blog in order to convince people that some of the esoteric ideas I encounter–and the occasional ideas I am fortunate enough to conceive–are worthy of broader consideration. I started blogging in order to bring greater visibility to HCIR–to convince people that the choice between human and machine responsibility is a false dichotomy in almost every aspect of the information seeking process.

In grade school, I learned that division of labor is the cornerstone of civilization–and perhaps our adaptive process of allocating effort is our greatest achievement as a species. As machines play an increasingly important role in our lives–and serve as the lenses through which we seek and consume almost all information–it is key that we not forget our roots. Let us be neither Luddites nor passive participants, but rather let us help computers help us.

Categories
General

Information Retrievability

Last year, I wrote a post about Leif Azzopardi and Vishwa Vinay’s work on information accessibility:

Instead of an actual physical space, in IR, we are predominately concerned with accessing information within a collection of documents (i.e., information space), and instead of a transportation system, we have an Information Access System (i.e., a means by which we can access the information in the collection, like a query mechanism, a browsing mechanism, etc). The accessibility of a document is indicative of the likelihood or opportunity of it being retrieved by the user in this information space given such a mechanism.

After reading a pre-print of my HCIR 2009 position paper about the information availability problem, Vinay pointed me at follow-up work he’d done with Leif on information retrievability. I agree with his observation that, while I look at information availability from a user-centric perspective, they consider retrievability from a document- or system-centric perspective. The approaches are complementary, and both add to a growing body of work that advocates a holistic model of how users access information, rather than a narrow focus on reductionist measures like precision and recall at the level of individual queries.
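
For readers who think better in code, here is a minimal sketch of what a document-centric retrievability measure might look like. It assumes a simple cumulative formulation that I am supplying for illustration: count how many queries in some sample return a document within a rank cutoff, then summarize how unevenly those counts are spread with a Gini coefficient. The function names, the toy rankings, and the cutoff are all hypothetical and are not taken from Leif and Vinay’s papers.

```python
def cumulative_retrievability(documents, rankings, cutoff):
    """Count, per document, how many queries return it in the top `cutoff` results.

    documents: iterable of document ids in the collection (hypothetical input).
    rankings:  dict mapping each query to its ranked list of document ids, best first.
    cutoff:    how far down the ranking a user is assumed to look.
    """
    scores = {doc: 0 for doc in documents}
    for ranked_docs in rankings.values():
        for doc in ranked_docs[:cutoff]:
            if doc in scores:
                scores[doc] += 1
    return scores


def gini(values):
    """Gini coefficient of non-negative values: 0 is perfectly even, larger is more skewed."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    weighted = sum((i + 1) * x for i, x in enumerate(xs))
    return (2 * weighted) / (n * total) - (n + 1) / n


# Toy example: four documents, three sample queries, users look at the top two results.
documents = ["d1", "d2", "d3", "d4"]
rankings = {
    "q1": ["d1", "d2", "d3"],
    "q2": ["d1", "d3", "d2"],
    "q3": ["d2", "d1", "d4"],
}
scores = cumulative_retrievability(documents, rankings, cutoff=2)
print(scores)                           # {'d1': 3, 'd2': 2, 'd3': 1, 'd4': 0}
print(round(gini(scores.values()), 3))  # 0.417
```

In this toy collection, d4 is never retrieved within the cutoff, which is exactly the kind of document a per-query precision or recall figure would never flag.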

To be clear, those reductionist measures still have their place. In fact, I’m looking forward to NIST’s Ellen Voorhees defending Cranfield next month to an HCIR crowd that is, for the most part, deeply suspicious of it.

Categories
General

Free Chapter on Faceted Search User Interface Design

If you are interested in user interface design for faceted search–and I know that’s a hot topic for many Noisy Channel readers–then be sure to check out this free book chapter by Moritz Stefaner, Sébastian Ferré, Saverio Perugini, Jonathan Koren, and Yi Zhang.

By the way, a chapter of my own book on faceted search is also available for free online, as is Marti Hearst’s entire book on search user interfaces.

Categories
Uncategorized

Ethics of Blogging Panel Today

Just a reminder that I’m participating in an online panel today (at 1pm EST) to discuss the Ethics of Blogging.

Maggie Fox, founder and CEO of Social Media Group, will moderate a panel composed of Augie Ray, who blogs at Experience: The Blog and is Managing Director of Experiential Marketing at interactive and social media agency Fullhouse; John Jantsch, who blogs at Duct Tape Marketing and is a marketing and digital technology coach; and yours truly. It’s free to attend; just register here.

Categories
General

HCIR 2009 Accepted Submissions

The agenda for HCIR 2009 is now online! As previously announced, Ben Shneiderman from the University of Maryland will be the keynote speaker. The accepted submissions are as follows:

Panel Presentations

  • Usefulness as the Criterion for Evaluation of Interactive Information Retrieval
    Michael Cole, Jingjing Liu, Nicholas Belkin, Ralf Bierig, Jacek Gwizdka, Chang Liu, Jun Zhang and Xiangmin Zhang (Rutgers University)
  • Modeling Searcher Frustration
    Henry Feild and James Allan (University of Massachusetts Amherst)
  • Query Suggestions as Idea Tactics for Information Search
    Diane Kelly (University of North Carolina at Chapel Hill)
  • I Come Not to Bury Cranfield, but to Praise It
    Ellen Voorhees (National Institute of Standards and Technology)
  • Search Tasks and Their Role in Studies of Search Behaviors
    Barbara Wildemuth (University of North Carolina at Chapel Hill) and Luanne Freund (University of British Columbia)

Posters and Demonstrations

  • Visual Interaction for Personalized Information Retrieval
    Jae-wook Ahn and Peter Brusilovsky (University of Pittsburgh)
  • PuppyIR: Designing an Open Source Framework for Interactive Information Services for Children
    Leif Azzopardi (University of Glasgow), Richard Glassey (University of Glasgow), Mounia Lalmas (University of Glasgow), Tamara Polajnar (University of Glasgow) and Ian Ruthven (University of Strathclyde)
  • Designing an Interactive Automatic Document Classification System
    Kirk Baker (Collexis)
  • The HCI Browser Tool for Studying Web Search Behavior
    Robert Capra (University of North Carolina at Chapel Hill)
  • A Graphic User Interface for Content and Structure Queries in XML Retrieval
    Juan M. Fernández-Luna, Luis M. de Campos, Juan F. Huete and Carlos J. Martin-Dancausa (University of Granada)
  • Improving Search-Driven Development with Collaborative Information Retrieval Techniques
    Juan M. Fernández-Luna (University of Granada), Juan F. Huete (University of Granada), Ramiro Pérez-Vázquez (Universidad Central de Las Villas) and Julio C. Rodríguez-Cano (Universidad de Holguín)
  • A visualization interface for interactive search refinement
    Fernando Figueira Filho (State University of Campinas), João Porto de Albuquerque (University of Sao Paulo), André Resende (State University of Campinas), Paulo Lício de Geus (State University of Campinas) and Gary Olson (University of California, Irvine)
  • Cognitive Dimensions Analysis of Interfaces for Information Seeking
    Gene Golovchinsky (FX Palo Alto Laboratory, Inc.)
  • Cognitive Load and Web Search Tasks
    Jacek Gwizdka (Rutgers University)
  • Visualising Digital Video Libraries for TV Broadcasting Industry: A User-Centred Approach
    Mieke Haesen, Jan Meskens and Karin Coninx (Hasselt University)
  • Log Based Analysis of How Faceted and Text Based Searching Interact in a Library Catalog Interface
    Bradley Hemminger (University of North Carolina), Xi Niu (University of North Carolina) and Cory Lown (NC State Libraries)
  • Freebase Cubed: Text-based Collection Queries for Large, Richly Interconnected Data Sets
    David Huynh (Metaweb Technologies, Inc.)
  • System Controlled Assistance for Improving Search Performance
    Bernard Jansen (Pennsylvania State University)
  • Designing for Enterprise Search in a Global Organization
    Maria Johansson and Lina Westerling (Findwise AB)
  • Cultural Differences in Information Behavior
    Anita Komlodi (University of Maryland Baltimore County) and Karoly Hercegfi (Budapest University of Technology and Economics)
  • Adapting an Information Visualization Tool for Mobile Information Retrieval
    Sherry Koshman and Jae-wook Ahn (University of Pittsburgh)
  • A Theoretical Framework for Subjective Relevance
    Katrina Muller and Diane Kelly (University of North Carolina)
  • Query Reuse in Exploratory Search Tasks
    Chirag Shah and Gary Marchionini (University of North Carolina at Chapel Hill)
  • Augmenting Cranfield-Style Evaluation with GOMS to Obtain Timed Predictions of User Performance
    Mark Smucker (University of Waterloo)
  • Text-To-Query: Suggesting Structured Analytics to Illustrate Textual Content
    Raphael Thollot (SAP Business Objects) and Marie-Aude Aufaure (Ecole Centrale Paris)
  • The Information Availability Problem
    Daniel Tunkelang (Endeca)
  • Exploratory Search Over Temporal Event Sequences: Novel Requirements, Operations, and a Process Model
    Taowei Wang, Krist Wongsuphasawat, Catherine Plaisant and Ben Shneiderman (University of Maryland)
  • Keyword Search: Quite Exploratory Actually
    Max Wilson (Swansea University)
  • Using Twitter to Assess Information Needs: Early Results
    Max Wilson (Swansea University)
  • Integrating User-generated Content Description to Search Interface Design
    Kyunghye Yoon (SUNY Oswego)
  • Ambiguity and Context-Aware Query Reformulation
    Hui Zhang (Indiana University)

Categories
Uncategorized

Goby Goes Deep

At the first HCIR workshop in 2007, Michael Stonebraker stood up in the middle of an open discussion session and told all assembled that we needed to be thinking about the deep web.

I don’t know how much the audience took heed of his call, but he certainly followed his own advice. He and Endeca alum Mark Watkins just launched Goby, a vertical search engine that exhorts you to “create your own adventure”. It’s fun–a sort of exploratory search for explorers. And it uses a deep web crawl to populate its index with semi-structured data.

Anyway, try it out! I’ve been in the private beta, but haven’t had the chance to see what they’ve been up to in the final stretch leading to the launch. You can also read more on Search Engine Land or CNET.

Categories
General

Transparent Text Symposium: Day 2

Given how intense yesterday was at the Transparent Text symposium, I couldn’t imagine that today would match it. But it did!

The morning kicked off with a series of 18 lightning talks in 90 minutes–that was 5 minutes apiece, with a ruthless gong for anyone who went overtime. The presentations were consistently intense, and I had the misfortune to follow one of the best talks–a very passionate presentation about crowd-sourced translation by IBM’s Uyi Stewart. Other notable presenters included design ninja Alexis Lloyd from the New York Times R&D Lab, Karrie Karahalios from the University of Illinois talking about the experimental WeMeddle Twitter client, MIT Media Lab professor and Berkman Fellow Judith Donath showing a stunning gallery of “data portraits”, and Dragon Systems co-founder Janet Baker explaining how the brain recognizes speech–with a skull as a prop! The session was incredible, and I hope other conferences adopt this model.

After the coffee break, there was a session on Text Analysis in the Large, featuring Dan Gruhl (IBM), Gary King (Harvard), and David Ferrucci (IBM). Dan Gruhl talked about web-scale text analysis–a topic up his alley, considering his role in architecting the IBM WebFountain project. Gary King gave a fascinating talk about using ensemble methods to improve on existing clustering methods–the idea is to synthesize a collection of derived clusterings and place them in an explorable metric space. You can read the full paper here. But the winner for this session was definitely David Ferrucci, who described the work IBM Research is doing to develop a machine Jeopardy player. He spent much of the talk building a case for the difficulty of the problem–and then delivered the punchline: in less than three years of research, they’ve developed a machine player whose performance is comparable to that of Jeopardy winners. Hopefully they’ll be competing on live television by next year!

After lunch, there was a session on Investigation, featuring MAPLight Research Director Emily Calhoun, UC Berkeley law professor Kevin Quinn, and Guardian news editor Simon Rogers. Emily Calhoun showed how MAPLight illuminates the connections between money and politics–it was great seeing data to correlate who supports and opposes bills with the associated campaign contributions from interest groups. Kevin Quinn’s presentation was a bit more technical, but his work reminds me a lot of Miles Efron’s work on estimating political orientation in web documents–but Quinn’s work is more general and goes beyond co-citation analysis to analyze the actual language of the documents. Great application of topic modeling! But my favorite presentation in this session was the one from Simon Rogers: he told the story of how the Guardian successfully crowd-sourced a project to investigate the expenses of UK Parliament members.

The final session was a panel discussion about how visualization might elevate or advance the debate over health care policy. The panelists were Ben Fry, Marti Hearst, Gary King, and Simon Rogers; Fernanda Viégas and Martin Wattenberg moderated. Unfortunately, the overwhelming sentiment from the panel was pessimism that anything we could do might actually lead to improved outcomes. Nonetheless, it’s clear that a lot of people are going to try.

Again, I want to thank Fernanda, Martin, Irene Greif, and everyone at IBM for organizing this fantastic event–and for inviting me to attend! I am impressed that anyone could manage to assemble such an impressive set of speakers in one place, and I appreciate the effort that everyone put into making the past two days so worthwhile. I look forward to seeing the videos available online, and I hope those who weren’t able to attend take the opportunity to watch some of them. I also encourage you to check out the live Twitter stream at #tt09 while it’s still available.