Thinking as a Hobby

Nah, not really.

But Chris Anderson, the Editor-in-Chief of Wired, published a recent article claiming that the glut of data would fundamentally change the way science gets done. He claims that the future of science is basically just data mining, rather than coming up with hypotheses and models that make testable predictions.

John Timmer over at Ars Technica has a nice, succinct article which poops all over this idea. He writes:

It's easy to see what has Anderson enthused. Modern scientific data sets are increasingly large, comprehensive, and electronic. Things like genome sequences tell us all there is to know about the DNA present in an organism's cells, while DNA chip experiments can determine every gene that's expressed by that cell. That data's also publicly available—out in the cloud, in the current parlance—and it's being mined successfully. That mining extends beyond traditional biological data, too, as projects like WikiProteins are also drawing on text-mining of the electronic scientific literature to suggest connections among biological activities.

There is a lot to like about these trends, and little reason not to be enthused about them. They hold the potential to suggest new avenues of research that scientists wouldn't have identified based on their own analysis of the data. But Anderson appears to take the position that the new research part of the equation has become superfluous; simply having a good algorithm that recognizes the correlation is enough.

The source of this flight of fancy was apparently a quote by Google's research director, who repurposed a cliché that most scientists are aware of: "All models are wrong, and increasingly you can succeed without them." And Google clearly has. It doesn't need to develop a theory as to why a given pattern of links can serve as an indication of valuable information; all it needs to know is that an algorithm that recognizes specific link patterns satisfies its users. Anderson's argument distills down to the suggestion that science can operate on the same level—mechanisms, models, and theories are all dispensable as long as something can pick the correlations out of masses of data.

I can't possibly imagine how he comes to that conclusion. Correlations are a way of catching a scientist's attention, but the models and mechanisms that explain them are how we make the predictions that not only advance science, but generate practical applications. One only needs to look at a promising field that lacks a strong theoretical foundation—high-temperature superconductivity springs to mind—to see how badly the lack of a theory can impact progress. Put in more practical terms, would Anderson be willing to help test a drug that was based on a poorly understood correlation pulled out of a datamine?

That's a good question.
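Here's a small Python sketch of the multiple-comparisons problem. It makes the "correlation pulled out of a datamine" worry concrete. This is my own illustration, not something from Anderson or Timmer. The sketch generates 1,000 candidate variables and one outcome, all of them pure random noise with no real relationship. It then "mines" for the variable that correlates best with the outcome. With enough candidates, something always looks impressive. Drawing fresh data from the same process shows there was nothing behind it. The sample size, feature count, and seed are arbitrary assumptions chosen for illustration.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def mine_best_feature(n_samples, n_features, rng):
    """Generate independent noise for n_features candidate variables and one
    outcome, then return (index, r) for the candidate whose correlation with
    the outcome is largest in absolute value."""
    features = [[rng.gauss(0, 1) for _ in range(n_samples)]
                for _ in range(n_features)]
    outcome = [rng.gauss(0, 1) for _ in range(n_samples)]
    scores = [pearson(f, outcome) for f in features]
    best = max(range(n_features), key=lambda i: abs(scores[i]))
    return best, scores[best]


def replicate(n_samples, rng):
    """Draw a fresh, independent sample of the 'winning' variable and the
    outcome. Every variable in this toy setup is independent noise, so a new
    draw from the same distribution stands in for re-measuring the winner."""
    feature = [rng.gauss(0, 1) for _ in range(n_samples)]
    outcome = [rng.gauss(0, 1) for _ in range(n_samples)]
    return pearson(feature, outcome)


if __name__ == "__main__":
    rng = random.Random(2008)  # arbitrary seed, for reproducibility only
    idx, r_mined = mine_best_feature(n_samples=50, n_features=1000, rng=rng)
    r_fresh = replicate(n_samples=50, rng=rng)
    print(f"best mined variable #{idx}: r = {r_mined:.2f}")
    print(f"same variable on fresh data: r = {r_fresh:.2f}")
```

The mined correlation will typically land somewhere around 0.4 to 0.5 in magnitude. On fresh data, the same variable usually drops back toward zero. Only a model of *why* two things should be related tells you which correlations are worth trusting.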

The distinction underlying all this is one I bumped up against when I entered the Cognitive Science program here: the distinction between engineering and science. The two work in a mutualistic feedback loop, but at their core they are conceptually very different.

An engineer, e.g., one at Google, may or may not care exactly how something works, or whether it has explanatory power that extends beyond what he is working on. His primary concern is that it just works.

A scientist is primarily concerned with questions of ontology, trying to figure out what the true state of the universe is. Scientists may come at the problem from the bottom up (more data-driven) or from the top down (more theory-driven). But their goal is understanding, not a workable product. That understanding can then be used to make a workable product, and a workable product may give insights into underlying mechanisms. But the important distinction is that the engineer and the scientist have different, but often compatible, goals.

I think that's the key distinction Anderson is missing, and so it's just silly to suggest that the torrent of data and data-mining techniques are going to render standard science obsolete.

UPDATE: About an hour after I wrote this, I checked my Google Reader feed and found a post on the same subject by Seth Roberts over at Scientific Blogging, called "Science versus Engineering."

The End of Science? (posted 2008-06-26, 9:32 AM)