WolframAlpha, Getting Less Impressed Upon Closer Look
I have been looking forward to the release of WolframAlpha. Last Monday Stephen Wolfram gave a talk at the Harvard University Berkman Center demonstrating his new search engine.
The point of WolframAlpha is to give real answers to questions, not just search results. So if you ask for the GDP of France, you get the actual figure, not a list of pages that talk about and perhaps mention the GDP of France. It presents data to you in context. David Weinberger live-blogged a list of examples, and ReadWriteWeb has a number of screenshots to go with it.
The intelligent part of WolframAlpha seems to be exactly that: linguistically understanding what you are asking, and then presenting the data it finds in a way that has meaning to you, as well as pointing to more underlying data so you can dive deeper if you want.
The part where I am getting less impressed is how the data is collected, tagged ('curated', they call it) and accessed. Judging, that is, from David Weinberger's blog post mentioned earlier and from Danny Sullivan's.
It's quotes like these that make my enthusiasm burn lower:
"Knitting new knowledge into the system is tricky"
"Wolfram Alpha isn't crawling the web and "scraping" information"
"it's working with a variety of providers to gather public and private information"
"Wolfram Alpha succeeds because [...] it has produced its own centralized repository"
"Wolfram noted that a new moon of Saturn had just been discovered, "so someone is dutifully adding the information""
"a staff of over 150 people to ensure the information is clean and tagged", "It's been 150 for a long time. Now it's 250. It's probably going to be a thousand people."
No crawling? Centralized database, adding data from partners? Manual updating? Adding is tricky? Manually adding metadata (curating)?
For all its coolness on the front end, on the back end WolframAlpha sounds like the mechanical Turk of the semantic web.
Of course this may just be a necessary step to bring the semantic web closer, as right now there is little of the 'linked data' that Tim Berners-Lee envisions. That makes the curating of data understandable, arduous task though it is, but not the centralizing bit or the manual updating. Centralizing is easier to do, but wouldn't 'curating' the data in situ be the way to go? That would help data owners get to the linked-data stage, while taking care of the updating problem at the same time. And what about the scaling issues involved with all this manual work?
However, these aspects seem to be glossed over in the previews and reports about WolframAlpha that are available now. It would be good to hear Stephen Wolfram address them and explain the rationale behind the current set-up, as well as how it is envisioned to develop after this month's launch. I would be happy to be shown that I misunderstand the inner workings of WolframAlpha. But right now the available information is making me feel less impressed than before.
Permalink (posted by Ton Zijlstra at May 2, 2009 08:45 PM)
Weblog by Ton Zijlstra, Enschede, Netherlands