I actually feel bad for Stephen Wolfram. After all the weeks of hype leading up to his public demonstration of Wolfram Alpha at Harvard this afternoon, Google upstaged him by releasing Google Public Data today. Catty? Perhaps, but in a very classy way. They just blogged about it and released it. No private demos, no fanfare; they just shipped it.
More importantly, they say:
The data we’re including in this first launch represents just a small fraction of all the interesting public data available on the web. There are statistics for prices of cookies, CO2 emissions, asthma frequency, high school graduation rates, bakers’ salaries, number of wildfires, and the list goes on. Reliable information about these kinds of things exists thanks to the hard work of data collectors gathering countless survey forms, and of careful statisticians estimating meaningful indicators that make hidden patterns of the world visible to the eye. All the data we’ve used in this first launch are produced and published by the U.S. Bureau of Labor Statistics and the U.S. Census Bureau’s Population Division. They did the hard work! We just made the data a bit easier to find and use.
Clearly Google doesn’t put much stock in Wolfram Alpha’s proprietary collection of ten trillion curated facts. Moreover, Google is also using curated data, except that its data is already freely available in the public domain. Perhaps Wolfram Alpha has collected broader data or built a more robust query parser, but now the onus is on them to prove it, and to prove that the difference is meaningful to users. I can’t imagine that Wolfram is loving Google right now.
Ironically, one of the things that Google may have inadvertently proved is that this kind of question answering isn’t really that valuable to users. The queries I’ve seen posted or tried myself are a novelty, but at best they are a minimal time saver: a modest improvement on Google Calculator. As I pointed out in an earlier post about Wolfram Alpha, I think the NLP interface is wrong-headed, and that they, or anyone else trying to create more value from objective data, should be focusing on APIs that make the data easier to integrate into other applications. But they don’t seem to be headed in that direction.
In any case, Google certainly wins this round. And the blogosphere is loving it.
10 replies on “Google Shows Wolfram Who’s The Alpha Dog”
I believe an NLP query processor can be an asset, but with a very large caveat.
1. At this time a completely open-ended NLP input system is neither practical nor really needed. Couple that with the Google-induced Pavlovian response to how a query box should operate, and users don’t actually know how to use an NLP box to its fullest. I can attest to this from testing my query box with some very smart people, and even they had impedance issues.
Now, even with all the negatives, I think an NLP query system that uses a controlled vocabulary aligned with a system’s DB, entity, or facet classes can provide an excellent jumping-off point for getting a selected set of results back, which can then be further explored using the UI. For example, let’s query the Twitter stream with: “All ‘people’ who mentioned Wolfram Alpha ‘Today’”. In this example the words in single quotes are entity/facet/DB classes and the rest are the filters (which must take synonyms, etc., into account). And again, other facets must be returned so users can navigate the data further. What I’m suggesting here is a simple NLP query system that has a bit more smarts than good old boolean.
Even this is difficult to pull off effectively. I will have concrete test data to share, as this is how I’m handling my query box: users can do the above or just use basic query commands.
So, in a nutshell, my personal take is that an integrated NLP query box has value, but the value comes not from supporting open-ended questions but from structured NLP queries: not so much for general search as for data navigation.
Follow-Up: Think of the query box I’m talking about as a combination of boolean + simple SQL. BoolQL? 😉
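As a rough illustration of the single-quote convention described above, here is a minimal Python sketch that splits a query such as “All ‘people’ who mentioned Wolfram Alpha ‘Today’” into facet classes and free-text filters. It is a hypothetical sketch, not the commenter’s implementation: the facet vocabulary, the stopword list, and the names `parse_query` and `ParsedQuery` are all assumptions, and synonym handling is left out.

```python
import re
from dataclasses import dataclass, field
from typing import List

# Hypothetical controlled vocabulary of facet classes known to the system.
KNOWN_FACETS = {"people", "places", "today"}

# Hypothetical function words dropped from the free-text filters.
STOPWORDS = {"all", "who", "the", "a", "an"}

# Text wrapped in straight or curly single quotes, e.g. 'people' or ‘people’.
QUOTED = re.compile(r"['\u2018]([^'\u2019]+)['\u2019]")


@dataclass
class ParsedQuery:
    facets: List[str] = field(default_factory=list)   # quoted, recognized facet classes
    filters: List[str] = field(default_factory=list)  # everything else, used as filters


def parse_query(query: str) -> ParsedQuery:
    parsed = ParsedQuery()
    for match in QUOTED.finditer(query):
        term = match.group(1).strip().lower()
        if term in KNOWN_FACETS:
            parsed.facets.append(term)
        else:
            # Unrecognized quoted text degrades to an ordinary filter term.
            parsed.filters.append(term)
    # Remove the quoted spans and keep the remaining words as filters.
    remainder = QUOTED.sub(" ", query).lower().split()
    parsed.filters.extend(w for w in remainder if w not in STOPWORDS)
    return parsed


print(parse_query("All 'people' who mentioned Wolfram Alpha 'Today'"))
# → ParsedQuery(facets=['people', 'today'], filters=['mentioned', 'wolfram', 'alpha'])
```

The quoted terms map onto facet classes, while everything else stays an ordinary filter, which is roughly the “boolean + simple SQL” split the comment describes.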
I think that search as a starting point for exploration is a great idea. Intelligent query interpretation makes it even better, but it’s hard to get right. So I think the NLP folks have to recognize the point of diminishing returns and instead focus on how to draw out more of the user’s information need by offering guidance for query elaboration / refinement. In fact, that should be easier to do in a highly structured query language.
A much better description of my rambling. 🙂
I think the trick to an effective next-generation query interface is 4-fold:
0. Support standard boolean, simple n-gram, and new structured semantics within one query box with intuitive type-ahead. Additionally, don’t be overly prescriptive about structure. I’ve introduced only one new piece of structure (single quotes) to denote subject facets.
1. Align the structured query language with the facet map used for post search navigation.
2. Allow for “loose semantics” in the query predicates. So from my original example “mentioned” could be “talked about”.
3. Don’t be too smart for your own good; when in doubt, fall back to baseline queries. I believe too much data will be more welcome than result sets that don’t align well with the query.
In my case I have a bonus fifth factor: the belief that regardless of the query type (simple or structured), the results should also return an exploratory navigation structure with the same facet topics as the structured query language. Full alignment between interface elements.
I spent the first two years working diligently on a fully NLP-driven query box, and not only is it difficult to execute, users just don’t “get it”.
I’m much more encouraged by the testing of my structured query box with “simulated” (aka forgiving) natural language predicates. I’ll be very interested to see how it is received by you and others once it’s out there.
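The factors listed above can be sketched as a tiny query planner. This is a hypothetical Python illustration under stated assumptions, not the commenter’s system: the facet list, the synonym table, and the names `plan_query` and `QueryPlan` are invented for the example. It shows loose predicate semantics (“talked about” treated as “mentioned”), a fallback to a baseline keyword query when the structured parse is unsure, and the same navigation facets returned for either query type.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Hypothetical facet vocabulary, shared by the query language and the
# post-search navigation so the two stay aligned.
FACETS = ("people", "places", "today")

# Hypothetical "loose semantics" table: several phrasings, one canonical predicate.
PREDICATE_SYNONYMS = {
    "talked about": "mentioned",
    "discussed": "mentioned",
    "mentioned": "mentioned",
}


@dataclass
class QueryPlan:
    kind: str                    # "structured" or "baseline"
    facets: Tuple[str, ...]      # recognized facet classes
    predicate: Optional[str]     # canonical predicate, if one was recognized
    terms: Tuple[str, ...]       # remaining filter terms
    navigation: Tuple[str, ...]  # facets offered for follow-up navigation


def plan_query(facets, text):
    """Build a structured plan, falling back to a baseline keyword query when in doubt."""
    lowered = " ".join(text.lower().split())
    predicate = None
    # Try longer phrasings first so multi-word synonyms are matched whole.
    for phrase in sorted(PREDICATE_SYNONYMS, key=len, reverse=True):
        if phrase in lowered:
            predicate = PREDICATE_SYNONYMS[phrase]
            lowered = lowered.replace(phrase, " ")
            break
    recognized = tuple(f.lower() for f in facets if f.lower() in FACETS)
    # The same navigation facets come back whatever the query type.
    navigation = FACETS
    if predicate is None or not recognized:
        # Don't be too smart: treat every typed word as a plain AND-ed keyword.
        return QueryPlan("baseline", (), None, tuple(text.lower().split()), navigation)
    return QueryPlan("structured", recognized, predicate, tuple(lowered.split()), navigation)


print(plan_query(["people", "today"], "talked about Wolfram Alpha").kind)  # structured
print(plan_query(["people"], "Wolfram Alpha").kind)                        # baseline
```

Falling back to a plain keyword query whenever either the predicate or the facets are unrecognized errs on the side of returning more data rather than a result set that misreads the query.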
I would say Wolfram Alpha is a good thing and you can’t compare it with Google at all ;). It’s a completely different thing, as I understand it, since it’s not a web search at all. I’m curious about it, and I think it’s a cool thing. Maybe it’s more comparable with something like Wikipedia, but in a more dynamic way…
Reznor, I’m comparing Wolfram Alpha to Google Public Data. Are you familiar with both projects? If so, how do you see them as completely different?
Danny Sullivan has just posted his thoughts following a demonstration of Wolfram Alpha at
During the demonstration, Stephen Wolfram noted that 150 people are currently working on data input and tagging, with one of the surprises being the significant errors being found in published data as it’s cross-checked against other sources. He also mentioned that these human editors assist in creating rules for combining data sources, and in anticipating how to initially respond to ambiguous queries.
I guess Danny is more impressed now that Wolfram has taken the time to brief him. 🙂
I don’t write off Wolfram Alpha, but I think the bar is now much higher. If Google’s results are close, then an incremental improvement won’t cut it. You can’t compete with good enough by just offering slightly better.
[…] Tunkelang, among others, has pointed out (here and here) that Wolfram | Alpha is making its life (and potentially its users’) more difficult […]
[…] of Wolfram Alpha at Harvard, Google releasing Google Public Data to the general public. Ouch. To add insult to injury, Google’s Matt Cutts says the timing was a coincidence–an […]