You Can't Replace Scientific Thinking With Computers

I've lamented multiple times the negative influence on scientific culture of some trends in the use of computational tools to analyze large datasets, particularly in biology.

Over at Nobel Intent, John Timmer brings up another issue related to computational models of complex phenomena: reproducibility:

In the past, reproduction was generally a straightforward affair. Given a list of reagents, and an outline of the procedure used to generate some results, other labs should be able to see the same things. If a result couldn't be reproduced, then it could be a sign that the original result was so sensitive to the initial conditions that it probably wasn't generally relevant; more seriously, it could be viewed as a sign of serious error or fraud...

But, when it comes to computational analysis, both the equivalent of reagents and procedures have a series of issues that act against reproducibility. The raw material of computational analysis can be a complex mix of public information and internally generated data—for example, it's not uncommon to see a paper that combines information from the public genome repositories with a gene expression analysis performed by an individual research team.

A lot of this data is in a constant state of flux; new genomes are being completed at a staggering pace, meaning that an analysis performed six months later may produce substantially different results unless careful versioning is used...

And that's just the data. An analysis pipeline may involve dozens of specialized software tools chained together in series, each with a number of parameters that need to be documented for their output to be reproduced. Like the data, some of these tools are proprietary, and many of them undergo frequent revisions that add new features, change algorithms, and so on. Some of them may be developed in-house, where commenting and version control often take a back seat to simply getting software that works. Finally, even the best commercial software has bugs.

The net result is that, even in cases where all the data and tools are public, it may simply be impossible to produce the exact same results.
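
To make the versioning point concrete: one partial mitigation is to record, for every run, exactly which data and tool versions went into the analysis. Below is a minimal sketch in Python of what such a run manifest might look like; the file names, the blastn example, and the parameter values are hypothetical placeholders, not taken from any particular pipeline:

```python
import hashlib
import json
import os
import platform
import shutil
import subprocess
from datetime import datetime, timezone

def sha256(path):
    """Checksum an input file so the exact data version used is pinned."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def tool_version(cmd):
    """Capture whatever version string an external tool reports."""
    out = subprocess.run(cmd, capture_output=True, text=True)
    return (out.stdout or out.stderr).strip()

# Hypothetical inputs and parameters -- substitute a real pipeline's.
input_files = ["expression_counts.tsv", "genome_build_notes.txt"]
parameters = {"e_value_cutoff": 1e-5, "min_read_count": 10}

tools = {"python": platform.python_version()}
# Record versions of external tools if they happen to be installed
# (blastn is just an illustrative example).
if shutil.which("blastn"):
    tools["blastn"] = tool_version(["blastn", "-version"])

manifest = {
    "timestamp": datetime.now(timezone.utc).isoformat(),
    "platform": platform.platform(),
    "inputs": {p: sha256(p) for p in input_files if os.path.exists(p)},
    "tools": tools,
    "parameters": parameters,
}

with open("run_manifest.json", "w") as f:
    json.dump(manifest, f, indent=2)
```

A manifest like this doesn't capture the internal algorithm changes between tool revisions, of course; that's exactly Timmer's point. But it at least makes it possible to say which data and which versions a published result came from.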

One proposed solution is that all software code used in such research should be open to inspection by other researchers. That's definitely a good start.

The other solution, at least in biology, is that conclusions generated by complex computational tools need to stay close to empirical results: they need to be focused on testable and relevant hypotheses. The experimental part of biology is not about to be replaced by computers and databases of sequence and interactome data.
