I have been looking forward to the release of WolframAlpha. Last Monday Stephen Wolfram gave a talk at Harvard University's Berkman Center, demonstrating his new search engine.

The point of WolframAlpha is to give real answers to questions, not just search results. So if you ask for the GDP of France, you get the actual figure, not a list of pages that talk about and perhaps mention the GDP of France. It presents the data to you in context. David Weinberger live-blogged a list of examples, and ReadWriteWeb has a number of screenshots to go with it.

The intelligent part of WolframAlpha seems to be exactly that: linguistically understanding what you are asking, then presenting the data it finds in a way that has meaning to you, as well as pointing to more of the underlying data so you can dive deeper if you want.
This is the bit that is jaw-dropping, and it gave me some of the excitement with which I first greeted the Web itself 15 years ago. Adding to the coolness is their intention of making APIs available at the presentation-layer level, as XML, and for individual data items. This is something straight off the wish list for open data access.
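To make that wish concrete: an XML query API of this kind could be called from a few lines of script. The sketch below is only an illustration of what such an interface might look like; the endpoint, the appid parameter, and the pod/plaintext response structure are my own assumptions, since the actual API has not been published yet.

```python
# Minimal sketch of querying a hypothetical WolframAlpha-style XML API.
# Endpoint, parameters and response layout are assumptions for illustration.
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

def ask(question, appid="YOUR-APP-ID"):
    # Hypothetical query endpoint that returns results as XML "pods"
    url = "http://api.wolframalpha.com/v2/query?" + urllib.parse.urlencode(
        {"input": question, "appid": appid, "format": "plaintext"})
    with urllib.request.urlopen(url) as resp:
        root = ET.fromstring(resp.read())
    # Each pod would be one block of the presentation layer; pull its plain text
    answers = {}
    for pod in root.findall(".//pod"):
        text = pod.findtext(".//plaintext")
        if text:
            answers[pod.get("title")] = text
    return answers

if __name__ == "__main__":
    for title, value in ask("GDP of France").items():
        print(f"{title}: {value}")
```

The appeal is that the same blocks shown on screen would be available as structured data, instead of having to scrape a results page.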

(a long video of the session at Berkman)
(a summary with a number of on-screen examples)

The part where I am less impressed is how the data is collected, tagged (‘curated’, they call it) and accessed. Judging, that is, from David Weinberger’s blog post mentioned earlier and from Danny Sullivan’s.

It’s quotes like these that make my enthusiasm burn lower:
“Knitting new knowledge into the system is tricky”
“Wolfram Alpha isn’t crawling the web and ‘scraping’ information”
“it’s working with a variety of providers to gather public and private information”
“Wolfram Alpha succeeds because […] it has produced its own centralized repository”
Wolfram noted that a new moon of Saturn had just been discovered, “so someone is dutifully adding the information”
“a staff of over 150 people to ensure the information is clean and tagged”
“It’s been 150 for a long time. Now it’s 250. It’s probably going to be a thousand people.”

No crawling? A centralized database, with data added from partners? Manual updates? Adding new knowledge is tricky? Manually adding metadata (curating)?

For all the coolness of WolframAlpha’s front end, the back end sounds like the Mechanical Turk of the semantic web.

Of course this may just be a necessary step to bring the semantic web closer, as right now there is little of the ‘linked data’ that Tim Berners-Lee envisions. That makes the curating of data understandable, arduous task though it is, but not the centralizing or the manual updating. Centralizing is easier to do, but wouldn’t curating the data in situ be the better way to go? It would help data owners get to the linked-data stage, and take care of the updating problem at the same time. And what about the scaling issues involved in all this manual work?
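To illustrate what curating in situ could look like: the data owner publishes a small, machine-readable statement next to the human-readable page, and any consumer (WolframAlpha included) reads it from the source. The sketch below uses Python with rdflib (version 6 or later); the vocabulary, the example URIs and the GDP figure are all placeholders of my own, not anything WolframAlpha or its data providers actually use.

```python
# Toy sketch of 'curating in situ': the data owner publishes typed statements
# at the source, so no central repository needs to be maintained by hand.
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import XSD

EX = Namespace("http://example.org/stats/")           # hypothetical vocabulary
france = URIRef("http://example.org/country/France")  # hypothetical identifier

# What the data owner would serve alongside the human-readable page
published = Graph()
published.add((france, EX.gdpUSD,
               Literal("2850000000000", datatype=XSD.integer)))  # placeholder figure
published.add((france, EX.gdpYear, Literal("2008", datatype=XSD.gYear)))
turtle = published.serialize(format="turtle")
print(turtle)

# A consumer simply parses the published graph; updates happen at the source,
# so there is nothing to re-curate centrally.
consumer = Graph()
consumer.parse(data=turtle, format="turtle")
for _, _, value in consumer.triples((france, EX.gdpUSD, None)):
    print("GDP of France (USD):", value)
```

The point of such a design is that an update, be it a new moon of Saturn or a revised GDP figure, happens once at the source, instead of someone “dutifully adding the information” to a central repository.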

However, these aspects seem to be glossed over in the previews and reports about WolframAlpha that are available now. It would be good to hear Stephen Wolfram address them and explain the rationale for the current set-up, as well as how it is envisioned to develop after the launch this month. I would be happy to be shown that I simply misunderstand the inner workings of WolframAlpha, but right now the available information leaves me less impressed than before.