Categories
General

A Harmonic Convergence

This week, Forrester released a report entitled “Search + BI = Unified Information Access”. The authors assert the convergence of search and business intelligence, a case that Forrester has been developing for quite some time.

The executive summary:

Search and business intelligence (BI) really are two sides of the same coin. Enterprise search enables people to access unstructured content like documents, blog and wiki entries, and emails stored in repositories across their organizations. BI surfaces structured data in reports and dashboards. As both technologies mature, the boundary between them is beginning to blur. Search platforms are beginning to perform BI functions like data visualization and reporting, and BI vendors have begun to incorporate simple to use search experiences into their products. Information and knowledge management professionals should take advantage of this convergence, which will have the same effect from both sides: to give businesspeople better context and information for the decisions they make every day.

It’s hard to find any fault here. In fact, the convergence of search and BI is a corollary to the fact that people (yes, businesspeople are people too) use these systems, and that the same people have no desire to distinguish between “structured” and “unstructured” content as they pursue their information needs.

That said, I do have some quibbles with how the authors expect the convergence to play out. The authors make two assertions that I have a hard time accepting at face value:

    • People will be able to execute data queries via a search box using natural language.

    Sure, but will they want to? Natural language is fraught with communication challenges, and I’m no more persuaded by natural language queries for BI than I am by natural language queries for search.

    • Visual data representations will increase understanding of linkages among concepts.

    We’ve all heard the cliché that a picture is worth a thousand words. I know this better than most, as I earned my PhD by producing visual representations of networks. But I worry that people overestimate the value of these visualizations. Data visualization is simply a way to represent data analytics. I see more value in making analytics interactive (e.g., supporting and guiding incremental refinement) than in emphasizing visual representations.

    But I quibble. I strongly agree with most of their points, including:

    • BI interfaces will encourage discovery of additional data dimensions.
    • BI and search tools will provide proactive suggestions.
    • BI and search will continue to borrow techniques from each other.

    And it doesn’t hurt that the authors express a very favorable view of Endeca. I can only hope they won’t change their minds after reading this post!

    This Conversation is Public

    An interesting implication of blogging and other social media is that conversations once conducted privately have become public. The most common examples are conversations that take place through the comment areas for posts, rather than through private email.

    My initial reaction to this phenomenon was to bemoan the loss of boundaries. But, in keeping with my recent musings about privacy, I increasingly see the virtues of public conversations. After all, a synonym for privacy, albeit with a somewhat different connotation, is secrecy. Near-antonyms include transparency and openness.

    I can’t promise to always serve personally as an open, transparent information access provider. But I’ll do so where possible. Here at The Noisy Channel, the conversation is public.

    Business, Technology, and Information

    I was fortunate to attend the Tri-State CIO Forum these last couple of days, and I thought I’d change the pace a bit by posting some reflections about it.

    In his keynote speech last night, George Colony, Chairman and CEO of Forrester Research, called on the business community to drop the name “information technology” (IT) in favor of “business technology” (BT). His reasoning, in a nutshell, was that such nomenclature would reflect the centrality of technology’s role for businesses.

    Following similar reasoning but reaching a different conclusion, Julia King, an Executive Editor for Computerworld and one of today’s speakers, noted that IT titles are being “techno-scrubbed”, and that there is a shift from managing technology to managing information.

    While I can’t get excited about a naming debate, I do feel there’s an important point overlooked in this discussion. Even though we’ve achieved consensus on the importance of technology, we need a sharper focus on information. It is a cliché that we live in an information age, but expertise about information is scarce. Information scientists struggle to influence technology development, and information theory is mostly confined to areas like cryptography and compression.

    We have no lack of information technology. Search engines, databases, and applications built on top of them are ubiquitous. But we are still just learning how to work with information.

    Saracevic on Relevance and Interaction

    There is no Nobel Prize in computer science, despite computer science having done more than any other discipline in the past fifty years to change the world. Instead, there is the Turing Award, which serves as a Nobel Prize of computing.

    But the Turing Award has never been given to anyone in information retrieval. Instead, there is the Gerard Salton Award, which serves as a Turing Award of information retrieval. Its recipients represent an A-list of information retrieval researchers.

    Last week, I had the opportunity to talk with Salton Award recipient Tefko Saracevic. If you are not familiar with Saracevic, I suggest you take an hour to watch his 2007 lecture on “Relevance in information science”.

    I won’t try to capture an hour of conversation in a blog post, but here are a few highlights:

    • We learn from philosophers, particularly Alfred Schütz, that we cannot reduce relevance to a single concept, but rather have to consider a system of interdependent relevancies, such as topical relevance, interpretational relevance, and motivational relevance.
    • When we talk about relevance measures, such as precision and recall, we evaluate results from the perspective of a user. But information retrieval approaches necessarily take a systems perspective, making assumptions about what people will want and encoding those assumptions in models and algorithms.
    • A major challenge in information retrieval is that users–particularly web search users–often formulate queries that are ineffective, particularly because they are too short. Studies have shown that reference interviews can lead to improved retrieval effectiveness (typically through longer, more informative queries). He said that automated systems could help too, but he wasn’t aware of any that had achieved traction.
    • A variety of factors affect interactive information retrieval, including task context, intent, and expertise. Moreover, people react to certain relevance clues more than others, and more within some populations than others.
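
For readers who have not worked with the relevance measures mentioned above, here is a minimal sketch of how precision and recall are commonly computed for a single query. The document IDs and the relevance judgments are invented for illustration; nothing here comes from Saracevic's own work.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    retrieved: document IDs the system returned
    relevant:  document IDs a user judged relevant
    """
    retrieved_set = set(retrieved)
    relevant_set = set(relevant)
    hits = retrieved_set & relevant_set
    # Precision: fraction of what was returned that is relevant.
    precision = len(hits) / len(retrieved_set) if retrieved_set else 0.0
    # Recall: fraction of what is relevant that was returned.
    recall = len(hits) / len(relevant_set) if relevant_set else 0.0
    return precision, recall


# Hypothetical result list and judgments for a single query.
retrieved = ["d1", "d2", "d3", "d4"]
relevant = ["d2", "d4", "d7"]

precision, recall = precision_recall(retrieved, relevant)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Note that the judgments in `relevant` come from a person, while `retrieved` comes from the system, which is exactly the split between the user perspective and the systems perspective described above.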

    As I expected, I walked away with more questions than answers. But I did walk away reassured that my colleagues and I at Endeca, along with others in the HCIR community, are attacking the right problem: helping users formulate better queries.

    I’d like to close with an anecdote that Saracevic recounts in his 2007 lecture. Bruce Croft had just delivered an information retrieval talk, and Nick Belkin raised the objection that users need to be incorporated into the study. Croft’s conversation-ending response: “Tell us what to do, and we will do it.”

    We’re halfway there. We’ve built interactive information retrieval systems, and we see from deployment after deployment that they work. Not that there isn’t plenty of room for improvement, but the unmet challenge, as Ellen Voorhees makes clear, is evaluation. We need to address Nick Belkin’s grand challenge and establish a paradigm suitable for evaluation of interactive IR systems.

    Guided Summarization

    I’m still waiting for the ECIR organizers to post the slides from the Industry Day. I particularly liked Nick Craswell’s presentation on A Brief Tour of “Query Space”. Until his slides are up, I recommend this SIGIR ’07 paper to give you an idea of his approach.

    Slides are here as a PowerPoint show for anyone interested, or use the embedded SlideShare show below.

    List of Findability Solutions

    Dan Keldsen has posted a list of findability-related solutions at BizTechTalk. The 80 or so solutions that he lists are certainly an attempt to err on the side of recall, by including search, taxonomies, interfaces, and visualization as aspects of findability. Definitely a useful resource for anyone interested in enterprise information access.

    Privacy through Difficulty

    I had lunch today with Harr Chen, a graduate student at MIT, and we were talking about the consequences of information efficiency for privacy.

    A nice example is the company pages on LinkedIn. No company, to my knowledge, publishes statistics on:

    • the schools their employees attended.
    • the companies where their employees previously worked.
    • the companies where their ex-employees work next.

    If a company maintains these statistics, it surely considers them to be sensitive and confidential. Nonetheless, by aggregating information from member profiles, LinkedIn computes best guesses at these statistics and makes them public.
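
To make the aggregation concrete, here is a rough sketch of how such best guesses could be computed from nothing but individual profiles. I have no knowledge of how LinkedIn actually does it; the profile format, the company names, and the `company_stats` helper are all hypothetical.

```python
from collections import Counter

# Hypothetical member profiles; every name and field is invented for illustration.
# "employers" is assumed to be in chronological order.
profiles = [
    {"school": "State U", "employers": ["Acme", "Initech"]},
    {"school": "Tech Institute", "employers": ["Globex", "Acme", "Hooli"]},
    {"school": "State U", "employers": ["Acme"]},
    {"school": "State U", "employers": ["Initech", "Acme", "Globex"]},
]


def company_stats(profiles, company):
    """Estimate schools, previous employers, and next employers for a company."""
    schools, came_from, went_to = Counter(), Counter(), Counter()
    for profile in profiles:
        jobs = profile["employers"]
        if company not in jobs:
            continue
        schools[profile["school"]] += 1
        i = jobs.index(company)
        if i > 0:
            came_from[jobs[i - 1]] += 1   # employer just before this company
        if i < len(jobs) - 1:
            went_to[jobs[i + 1]] += 1     # employer just after this company
    return schools, came_from, went_to


schools, came_from, went_to = company_stats(profiles, "Acme")
print(schools.most_common())
print(came_from.most_common())
print(went_to.most_common())
```

No single profile reveals anything sensitive; the statistics only appear once someone does the counting, and the counting takes a dozen lines.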

    Arguably, information like this was never truly private, but was simply so difficult to aggregate that nobody bothered. As Harr aptly put it, they practiced “privacy through difficulty”–a privacy analog to security through obscurity.

    Some people are terrified by the increasing efficiency of the information market and look for legal remedies as a last-ditch attempt to protect their privacy. I am inclined towards the other extreme (see my previous post on privacy and information theory): let’s assume that information flow is efficient and confront the consequences honestly. Then we can have an informed conversation about information privacy.

    Social Navigation

    There has been a lot of recent buzz about social navigation, including some debate about what the phrase means. I dug into the archives and found a paper from the CHI ’94 Conference on Human Factors in Computing Systems entitled “Running Out of Space: Models of Information Navigation”. In it, Paul Dourish and Matthew Chalmers distinguish between semantic navigation and social navigation:

    [semantic navigation offers] the ability to explore and choose perspectives of view based on knowledge of the semantically-structured information.

    In social navigation, movement from one item to another is provoked as an artifact of the activity of another or a group of others.

    Back in 1994, the Web was only starting to reach a broad audience. The authors cite two examples of social navigation: personal home pages, where people listed sites they found interesting, and collaborative filtering (specifically, the Information Tapestry project at Xerox PARC).

    Today, a decade and a half later, the web has scaled by several orders of magnitude, search engines have largely obviated the listing of interesting sites on personal home pages, and collaborative filtering, while still going strong as a social influence on user experience, hardly feels like navigation. It does seem that the term “social navigation” deserves an update.

    Following Dourish and Chalmers, let us define social navigation as the ability to explore and choose perspectives of view based on social information. Importantly, social navigation is user-controlled navigation just like semantic navigation–only that the user is navigating by changing the social lens on the information rather than specifying semantic constraints.

    One example of social navigation is the ratings information at the Internet Movie Database (IMDB). For example, we can see from the ratings for Live Free or Die Hard that the movie appealed most to males under 18.

    Fandango (an Endeca customer) takes this concept a step further, offering users faceted navigation of the space of movie reviews, where facets include age, gender, whether or not the reviewer has children, and whether the reviewer lives near the user.
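
As a rough illustration of what navigating by a social lens might look like in code, here is a minimal sketch of faceted refinement over reviews. It is not how Fandango or Endeca implement it; the review records, facet names, and helper functions are all made up for the example.

```python
from collections import Counter

# Hypothetical reviews of one movie; facet names and values are invented.
reviews = [
    {"rating": 4, "age": "under 18", "gender": "male", "has_children": False},
    {"rating": 5, "age": "under 18", "gender": "male", "has_children": False},
    {"rating": 2, "age": "30-44", "gender": "female", "has_children": True},
    {"rating": 3, "age": "30-44", "gender": "male", "has_children": True},
]


def refine(reviews, **selected):
    """Keep only reviews whose reviewer matches every selected facet value."""
    return [r for r in reviews
            if all(r[facet] == value for facet, value in selected.items())]


def facet_counts(reviews, facet):
    """Count reviews per facet value, to show the user where they can go next."""
    return Counter(r[facet] for r in reviews)


def mean_rating(reviews):
    """Average rating of the current set, or None if the set is empty."""
    return sum(r["rating"] for r in reviews) / len(reviews) if reviews else None


# The user narrows the social lens to reviewers who have children.
parents = refine(reviews, has_children=True)
print(mean_rating(reviews), mean_rating(parents))
print(facet_counts(parents, "gender"))
```

The point of the sketch is that the user, not the system, chooses which reviewers count, and the facet counts tell them which further refinements are available.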

    More sophisticated interfaces will intermingle semantic and social navigation. Here is a screen shot from a prototype some of my colleagues put together and demonstrated at HCIR ’07:

    Social navigation, defined as above, offers users more than just the ability to be influenced by other people. It offers users transparency and control over the social lens. It allows us to think outside the black box.

    Happy Rota Day!


    Since this is a personal blog, I’d like to go a bit off-topic and take a moment to recognize my late mentor Gian-Carlo Rota, whose birthday is today. While I and countless others recall Gian-Carlo most fondly as a mentor and teacher, his crowning achievement was to make combinatorics a respectable branch of modern mathematics. Indeed, combinatorics and probability theory have been instrumental to the progress of information retrieval and information science.

    And this nugget of his advice about lecturing seems remarkably appropriate in the context of how information retrieval engines should work:

    Every lecture should state one main point and repeat it over and over, like a theme with variations. An audience is like a herd of cows, moving slowly in the direction they are being driven towards. If we make one point, we have a good chance that the audience will take the right direction; if we make several points, then the cows will scatter all over the field. The audience will lose interest and everyone will go back to the thoughts they interrupted in order to come to our lecture.

    Happy Birthday, Gian-Carlo.

    Database Usability

    Just as I was digesting Jeff Naughton’s presentation at DB/IR day, a colleague at Endeca emailed me the keynote that H. V. Jagadish (University of Michigan) presented at SIGMOD ’07 on making database systems usable. He enumerates the familiar pain points of today’s database systems: confusing schemas, too many choices to make, unexpected–and unexplained–system behavior, and too high a cost for initial creation. He proposes “systems that reflect the user’s model of the data, rather than forcing the data to fit a particular model.”

    As with Jeff’s presentation, the main take-away here is a framework (though both he and Jeff have taken initial steps to address the problems they describe). As a practitioner, I’m most encouraged by the fact that database researchers, like information retrieval researchers, are increasingly recognizing the importance of users.