The Noisy Channel

 

Sorting a Petabyte

November 22nd, 2008 · 4 Comments · Uncategorized

Google may be reactionary when it comes to information seeking approaches, but they are at the cutting edge of systems research. Their official blog post today on sorting a petabyte in six hours using MapReduce was a reminder of the impressive caliber of their systems team. You can learn more from their Technology RoundTable Series.

4 responses so far ↓

  • 1 Daniel Lemire // Nov 22, 2008 at 12:56 am

    The problem with MapReduce, as far as I can see, is that it is not necessarily power efficient.

    That’s like saying that if I put twelve V8 engines in my car, it will run faster. Big deal.

    I want to go fast, and save on gas.

  • 2 Daniel Tunkelang // Nov 22, 2008 at 1:01 am

    Given that Google’s energy consumption represents a significant cost for their daily operations, I have to imagine they’ve worked on the energy efficiency of MapReduce. And I also suspect they are funding this work at UC Berkeley: http://www.eecs.berkeley.edu/Research/Projects/Data/105613.html

  • 3 Jason Adams // Nov 22, 2008 at 3:14 am

    That’s a good way of thinking about it. Petascale computing should be measured in operations per kilowatt-hour (or pick your favorite unit of energy consumption). That forces the issue to be a combination of system architecture and algorithm design, as it should be.

  • 4 jeremy // Nov 22, 2008 at 10:59 pm

    “Google may be reactionary when it comes to information seeking approaches, but they are at the cutting edge of systems research.”

    Totally agree. In fact, one of my coworkers is fond of saying that when we look back at Google in 30 years, the lasting legacy, how we remember them and how they will have contributed overall to the field of computing, will not be their information retrieval advances. It will be their systems advances.
