I have been looking forward to the release of WolframAlpha. Last Monday Stephen Wolfram gave a talk at Harvard University’s Berkman Center demonstrating his new search engine.
The point of WolframAlpha is to give real answers to questions, not just search results. So if you ask for the GDP of France, you get the actual figure, not a list of pages that talk about, and perhaps mention, the GDP of France. It presents data to you in context. David Weinberger live-blogged a list of examples. ReadWriteWeb has a number of screenshots to go with it.
The intelligent part of WolframAlpha seems to be exactly that: linguistically understanding what you are asking, then presenting the data it finds in a way that has meaning to you, as well as pointing to more underlying data so you can dive deeper if you want.
This is the bit that is jaw-dropping, and it made some feel the kind of excitement with which I first greeted the Web itself 15 years ago. Adding to the coolness is their intention of making APIs available at the presentation layer, as XML, as well as for individual data items. This is something right out of the wish list for open data access.
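To make that open-data appeal concrete: an API at the presentation layer would let you send a natural-language question and get structured XML back to pick apart programmatically. Here is a minimal sketch in Python of what that might look like; the endpoint, parameters and element names are my own assumptions for illustration, since no WolframAlpha API has actually been published yet.

```python
# Hypothetical sketch of querying a presentation-layer XML API.
# The endpoint, parameters and element names are assumptions,
# not a published WolframAlpha interface.
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

def query_alpha(question: str, api_key: str) -> None:
    """Fetch a structured XML answer for a natural-language question."""
    params = urllib.parse.urlencode({"input": question, "key": api_key})
    url = "https://api.example-alpha.com/query?" + params  # placeholder host
    with urllib.request.urlopen(url) as response:
        tree = ET.parse(response)
    # Assume results come grouped in presentation blocks ("pods"),
    # each holding one or more plain-text data items.
    for pod in tree.iterfind(".//pod"):
        print(pod.get("title"))
        for item in pod.iterfind(".//plaintext"):
            print("  ", item.text)

query_alpha("GDP of France", api_key="YOUR-KEY")
```

The point is that the same blocks a human sees on screen would be addressable as data, which is what makes the promise so interesting.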
(a long video of the session at Berkman)
(a summary with a number of on-screen examples)
The part where I am getting less impressed is how the data is collected, tagged (‘curated’, they call it) and accessed. Judging, that is, from David Weinberger’s blogposting mentioned earlier and Danny Sullivan’s.
It’s quotes like these that make my enthusiasm burn lower:
“Knitting new knowledge into the system is tricky”
“Wolfram Alpha isn’t crawling the web and “scraping” information”
“it’s working with a variety of providers to gather public and private information”
“Wolfram Alpha succeeds because […] it has produced its own centralized repository”
“Wolfram noted that a new moon of Saturn had just been discovered, “so someone is dutifully adding the information””
“a staff of over 150 people to ensure the information is clean and tagged”, “It’s been 150 for a long time. Now it’s 250. It’s probably going to be a thousand people.”
No crawling? Centralized database, adding data from partners? Manual updating? Adding is tricky? Manually adding metadata (curating)?
For all the coolness of WolframAlpha’s front end, the back end sounds like the mechanical turk of the semantic web.
Of course this may just be a necessary step to bring the semantic web closer, as right now there is little of the ‘linked data’ that Tim Berners-Lee envisions. That makes the curating of data understandable, arduous a task as it is, but not the centralizing bit or the manual updating. Centralizing is easier to do, but wouldn’t ‘curating’ the data in situ be the way to go, helping data owners get to the linked-data stage while taking care of the updating problem at the same time? And what about the scaling issues involved in all this manual work?
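To sketch what curating ‘in situ’ could mean: the data owner, say a national statistics office, publishes its figures as linked data on its own servers, and aggregators merely dereference the URIs and always see the current values. A minimal illustration in Python using the rdflib library; the property URI and the GDP figure are placeholders of my own, not sourced data.

```python
# Minimal sketch of a fact published as linked data 'in situ' by
# the data owner. URIs follow DBpedia conventions; the property
# and the GDP figure are illustrative placeholders.
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import XSD

DBO = Namespace("http://dbpedia.org/ontology/")

g = Graph()
g.bind("dbo", DBO)

france = URIRef("http://dbpedia.org/resource/France")
# The owner hosts this triple; anyone (WolframAlpha included) can
# dereference it and always read the latest value.
g.add((france, DBO.gdp, Literal("2.9e12", datatype=XSD.double)))

print(g.serialize(format="turtle"))
```

The updating problem then stays where it belongs: the statistics office revises the triple, and every consumer sees the new figure on the next lookup.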
However, these aspects seem to be glossed over in the previews and reports about WolframAlpha that are available now. It would be good to hear Stephen Wolfram address them and explain the rationale of the current set-up, as well as how it is envisioned to develop after launch this month. I would be happy to be shown that I misunderstand the inner workings of WolframAlpha. But right now the available information is making me feel less impressed than before.
I love that description: “the mechanical turk of the semantic web”.
That sounds rather disappointing.
Unless they somehow succeed well enough to have most of the answers I’d think of looking for. Google, Facebook, Twitter, most sites we think are cool are also massively centralized, even though we use them in decentralized ways.
If this turns out to be really impressive in real life, the next phase would be figuring out how groups of people could be allowed to input semantically tagged bodies of knowledge into it. Does it know about comic books? If not, I’m sure there’ll be well-organized people who’d be happy to take over that part. I hope they’re thinking about that.
But you’re right, it is disturbing to hear that “Knitting new knowledge into the system is tricky”.
Hi Flemming. Indeed Google is highly centralized, but in essence it is not a centre for webpages; it is a centre for pointers to pages and the relationships between them. WolframAlpha, however, seems to be an actual central repository of the data it presents to us as content. At least judging from the descriptions currently out there.
Ton,
I think I understand what makes you less impressed. But do you agree that the basic concepts of Alpha incorporate a flexibility for evolutionary enhancements that could overcome such initial constraints? I doubt that Wolfram’s Alpha launch will be anything more than a beta product.
Best, Henry
Hi Henry, thanks for commenting. I agree there is flexibility for incremental change, but at the same time I wonder if it will be enough. Of course I am just building a picture from what little can be judged without WolframAlpha being available, but to me it looks like all the really great innovative stuff is in the front end: the linguistic processing of search queries and the way data is presented. Both are an important and big leap forward for the semantic web, very impressive, and the type of functionality I will gladly embrace and use.
On the back end I see what looks to me like a mechanical turk (again: based on what I read in the cited blogposts). I don’t think that is easily fixed with evolutionary enhancements. I even think it runs contrary to a few things that are key to the internet and web (a centralized repository of the content, gatekeepers making sure data is ‘correct’, for example), instead of working with the internet and web’s characteristics. That somehow sounds all wrong to my ears, like a step back.
It is as if all of Wolfram’s deep knowledge of complexity has been invested in the linguistic part and in presenting results in a meaningful way, while the structures needed to feed into that from the back end were added on without the same level of innovative energy. The way Wolfram describes how data is fed into the system reads like a job description for a role we have known for ages: a librarian.
As I said in the post, I’d love to be terribly wrong about this.
Ton,
Once the curators and “librarians” have primed the pump, I see no reason their task cannot be opened up to include peer-reviewed contributions from academia and other qualified (and even unqualified) sources. As I see it, Wolfram’s curation can easily be delegated to the interested world at large, once the sets of data formats have been standardized. Such an expandable database is very much akin to the Evaluated Nuclear Data File (ENDF), originated at Brookhaven National Laboratory (BNL) in 1952 and currently used by such world-class simulation systems as the Monte Carlo N-Particle (MCNP) radiation transport code of Los Alamos National Laboratory (LANL). Conceivably, ENDF itself could be translated to conform with Wolfram|Alpha formats and incorporated into Alpha’s growing database of useful information. In fact, I fully expect Alpha to ultimately aggregate its own computed results for the purpose of more intricate computation.
Best, Henry
Ton, it’s a curated site, so there is definitely a limit to its ability to scale. But it isn’t intended to scale infinitely, only to cover the topics Wolfram thinks it is important to cover. So, it doesn’t yet “know about” anatomy, but it will, and I suspect it will never know about Paris Hilton’s romances.
I believe Wolfram said in his Harvard talk that there are currently 250 people feeding its maw, and that he expects there to be about 1,000 once it’s truly up and running. Wolfram says that a content expert has to be in the flow to ensure quality. People also have to edit the relationships to present in response to a query. (If you query about a year, does it tell you who was born or died that year? Does it tell you what the average life expectancy was for people born then?)
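Purely to illustrate the shape of that kind of editorial decision (my own guess at a possible structure, not anything from WA’s internals), think of hand-maintained presentation rules per topic:

```python
# My own guess at the shape of a curated presentation rule, for
# illustration only: editors decide per entity type which facets
# a result should show.
PRESENTATION_RULES = {
    "year": ["notable_births", "notable_deaths", "life_expectancy_at_birth"],
    "country": ["population", "gdp", "capital"],
}

def facets_for(entity_type):
    # Unsupported topics simply return nothing: the curated scope
    # does not sprawl on its own.
    return PRESENTATION_RULES.get(entity_type, [])

print(facets_for("year"))  # which facts a query about a year surfaces
```

Every entry in such a table is a human judgment call, which is exactly why content experts have to be in the flow.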
All of this will definitely limit WolframAlpha’s ability to sprawl across every conceivable field of knowledge, but that’s not its aim. Nor is it a requirement, in my opinion, for the site to be useful and even exciting. There isn’t anything else much like it, is there?
Further, I don’t think we can know how much its manual curation will limit it. I find it hard to estimate how much of the knowledge munging can be automated. In addition, the staff of WA is developing ontologies that of course suffer from the limitations of all ontologies, but also benefit from the fact that their creators are working together on the same project. I thus find it hard to predict what sort of efficiencies and correlations might emerge.
So, I remain excited by it. I expect it to be a useful tool, and for some types of queries, essential. But I don’t expect it to become the universal answering machine. I expect I’ll still use Google and Wikipedia more often than WA, given my own interests. Unless something radical changes that enables widespread participation in data ingestion, calculation, and correlation, it will be a view of the world filtered through Stephen Wolfram’s interests. But, its ability to provide computed answers within its necessarily limited domain is still quite fascinating and promising. IMO, natch.
@Henry: priming the pump, and having an expandable database format develop into an open internet standard that allows it to be used in the exciting way WolframAlpha demonstrates (natural-language interpretation and linked, meaningful presentation of results), is a path I’d gladly see this take yet.
@David (thanks for adding to the conversation, David!): ‘not intended to scale infinitely’ is what worries me. I have no issue with limiting your scope at the start for the purpose of getting started, but not having an open-ended structure that could potentially come to include all fields of users’ interest over time makes it feel flawed, as it indeed limits us to a view filtered through the interests of Stephen Wolfram (‘unless something radical changes’). It sounds like ‘monopolistic’ gatekeeping to me, and that raises the hairs on the back of my neck.
Making sure WolframAlpha will never know anything about Paris Hilton’s escapades would be a breath of fresh air though!
With all that said, I do think WolframAlpha is fascinating, promising, and exciting. Just less so than I thought it would be based on the earlier announcements. I’ll be happy and excited to make use of what it does offer.
David Weinberger follows up with a good blogposting about where the transformational aspects of WolframAlpha may be, and where not.
Well, in a certain way this is just a souped-up version of the Google unit-conversion tool in the search dialogue. They’ve taken a very limited set of unambiguous datasets, and the potential queries on them, and exposed those.
This is great work and, like you say, an essential step to get semweb stuff working better and better, but you have to give them a lot more credit. When building a system like this, working from the ideal world you posit is not really possible. Tim Berners-Lee and the semweb people can envision a lot of stuff; we’d like to see some working, non-arduous examples.
The data files are poor pretty much everywhere, there are hardly any standard formats for these kinds of datasets, and where there are, they are badly adhered to; and then there is the question of how to link it all up… There just isn’t that much to work with, and instead of waiting for all of that to improve by itself, it’s laudable that they’ve started to make something.