Celebrating Statistical Guidance and Integration in Biomedical Science


June 4, 2018
“When biostatisticians and biomedical scientists work together – early and often – the end result is more satisfying science,” says Paul Kayne, head of Genomics (Discovery), Bristol-Myers Squibb (Princeton, NJ). And, by satisfying science, Kayne and Robert Nadon, Ph.D., associate professor, Department of Human Genetics at McGill University (Montreal, Quebec, Canada), mean productive and verifiable science that translates effectively to clinical applications.

Kayne and Nadon are guest editors of the June 2018 SLAS Discovery Special Collection on Statistical Applications in Knowledge and Drug Screening. The issue includes four papers illustrating that statistical work is most effective when it is proactive and cooperative, displaying the combination of theoretical knowledge and practical acumen essential for applied data analysis. Nadon currently serves as a member of the SLAS Discovery Editorial Board; Kayne formerly served on the SLAS Technology Editorial Board.

Paul Kayne

“I went into molecular biology far too many decades ago working with simple single cell model organisms,” Kayne laughs. “Part of the reason I did so is that the questions we could ask then were fairly straightforward, like ‘is there any effect if I make a big change?’ and we could almost always take those questions to a yes or no conclusion. Because the numbers were much smaller, there really wasn’t a need for any statistical framework to do the analysis.

“My wife, on the other hand, was looking at mineral absorption and transport in rat intestines – a much more complicated system,” he continues. “Even now we don’t understand the full extent of the nature of intestinal absorption and transport. She used very elegant experiments with a statistical approach including mathematical modeling to comprehend what was actually taking place as they did perturbations to understand the impact of the changes. This was my first exposure to using mathematical modeling and statistical approaches to studying biology, and one that primed me to embrace a statistical approach to genomics when it became a reality.”

Fast forward to 2018.

“There has been a continuing and increasing awareness by those who watch this space that as genomics and other omics are getting more complicated, there is greater need for statistical guidance and integration,” Kayne says. He and Nadon have been talking about this for years and saw this special collection as a way to encourage and hasten a true partnership between biomedicine and biostatistics in the scientific community.

Robert Nadon

“Statistics is needed in the life sciences much more than it has been needed in the past – not just in drug discovery but across the life sciences,” says Nadon. “One needs statistics to handle the very large data sets being produced now with genomics technologies and various other technologies that generate very large amounts of data. These require specialized expertise. There is the other side as well. I know statisticians who want to apply very complicated statistical solutions to every problem that they see. Not all problems need that; very simple, straightforward methods would do just fine. The trick is to apply the level of analysis that is appropriate for the problem.

SLAS Discovery Special Collection

“Unfortunately, the principles of statistical design and analysis are very much underused and misunderstood – not just for high-throughput data but for low-throughput data as well,” Nadon continues. “It seemed like a good time to illustrate the benefits of statistical design and analysis to SLAS Discovery readers.”

Black Death and Big Data

James A. Hanley’s "Statistical Behaviors: Personal and Computer-Aided Observations" in the special collection begins with the author’s pronouncement: “I began my career as a biostatistician in 1973 ‘BC’ (before computers). The BC is not entirely accurate, since we did have mainframe computers, but they were slow and not user-friendly, and statistical packages were few, specialized and not very transportable.” Hanley is a member of the Department of Epidemiology, Biostatistics and Occupational Health, McGill University (Montreal, Quebec, Canada).

Using examples from his own career and providing analyses of statistics gathered during historical events like the 1348 Black Death, 1768 Devonshire colic and 1854 London cholera outbreaks, Hanley cautions statisticians and biologists alike to be vigilant in assessing and re-assessing, testing and retesting, thinking and rethinking.

“Over these 45 years that I have been a statistical observer, much in medical statistics has changed for the better,” he states. “But with bigger and more rapid data, more accessible statistical tools available to non-statisticians, more ways to subset our data and more journals to publish in, there is also a much higher risk not just of being wrong, but also of being more precisely wrong. We can now cause more harm or waste more resources.”

The SLAS Discovery guest editors appreciate Hanley’s descriptions of the challenges of doing data analysis in modern contexts and “how even experts can be led astray by theoretically well-understood but insidious statistical phenomena, such as regression to the mean.” His recollections and simulations demonstrate even more clearly the value of wedding best experimental practices with appropriate statistical methods.
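Regression to the mean is straightforward to demonstrate in simulation. The sketch below is illustrative only and is not drawn from Hanley's paper; the variable names and numbers are assumptions chosen for clarity. Subjects selected for extreme values on a noisy first measurement look markedly less extreme when remeasured, even though nothing about the subjects has changed.

```python
import random

random.seed(0)

# Each subject has a stable "true" value plus independent measurement noise.
true_vals = [random.gauss(100, 10) for _ in range(10000)]
first = [t + random.gauss(0, 10) for t in true_vals]
second = [t + random.gauss(0, 10) for t in true_vals]

# Select the top 10% of subjects on the first measurement.
cutoff = sorted(first)[int(0.9 * len(first))]
selected = [(f, s) for f, s in zip(first, second) if f >= cutoff]

mean_first = sum(f for f, _ in selected) / len(selected)
mean_second = sum(s for _, s in selected) / len(selected)

# On remeasurement, the extreme group drifts back toward the population
# mean of 100 — no intervention or real change is involved.
print(round(mean_first, 1), round(mean_second, 1))
```

An analysis that attributed the drop from the first to the second measurement to a treatment given in between would be exactly the kind of insidious error Hanley warns about.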

Proper Algorithm Development

“Identification and Correction of Additive and Multiplicative Spatial Biases in Experimental High-Throughput Screening” features a collaborative team from the Université du Québec à Montréal, McGill University Genome Quebec Innovation Centre and McGill University – Bogdan Mazoure, Iurie Caraus, Vladimir Makarenkov, along with Nadon.

“The authors contribute to the active area of algorithm development in high-throughput screening. Merging approaches from computer science, mathematics and statistics, the authors use simulated and Chembank empirical data to illustrate that systematic error (bias) can be either additive or multiplicative in complex ways with implications for measurement and analysis,” according to the guest editor introduction.

The article describes three additive and three multiplicative spatial bias models, as well as procedures for general bias detection and removal. Evidence from these six bias models shows their relevance for the analysis of experimental high-throughput, high-content and small molecule microarray data, and the authors' methodology “is designed to minimize the impact of a plate-specific (additive or multiplicative) spatial bias.”
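The distinction between the two bias types can be sketched with a toy plate simulation. This is a deliberate simplification, not the authors' models or correction algorithm: an additive row bias shifts every well in a row by a constant and can be removed by subtracting row medians, whereas a multiplicative bias scales each row and would call for division instead.

```python
import random
import statistics

random.seed(1)
ROWS, COLS = 8, 12  # a 96-well plate

# True signal for each well, before any systematic error.
true = [[random.gauss(100, 5) for _ in range(COLS)] for _ in range(ROWS)]

# Additive bias: a hypothetical top-to-bottom gradient adds a constant per row.
# (A multiplicative bias would instead scale each row, e.g. true[i][j] * (1 + 0.02 * i).)
bias = [2.0 * i for i in range(ROWS)]
observed = [[true[i][j] + bias[i] for j in range(COLS)] for i in range(ROWS)]

def remove_additive_row_bias(plate):
    """Subtract each row's median, then re-center at the plate-wide median."""
    grand = statistics.median(v for row in plate for v in row)
    return [[v - statistics.median(row) + grand for v in row] for row in plate]

def row_median_spread(plate):
    """Gap between the highest and lowest row medians: ~0 means no row bias."""
    meds = [statistics.median(row) for row in plate]
    return max(meds) - min(meds)

print(row_median_spread(observed))                            # large: bias present
print(row_median_spread(remove_additive_row_bias(observed)))  # ~0: bias removed
```

Applying the additive correction to multiplicatively biased data would leave residual distortion, which is why detecting which bias form is present matters before correcting it.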

Prevent Loss of Key Information

“Integrating Population Heterogeneity Indices with Microfluidic Cell-Based Assays” is of special interest to scientists working with cell-based assays and to computational scientists who wish to become more familiar with the challenges of this exciting and burgeoning field, say Kayne and Nadon. Authors Thomas A. Moore, Alexander Li and Edmond W.K. Young from the Institute of Biomaterials & Biomedical Engineering and the Division of Engineering Science at the University of Toronto (Toronto, Ontario, Canada) collaborated to study “statistical analyses using Pittsburgh Heterogeneity Indices (PHIs) to understand the heterogeneity and evolution of cell population demographics on datasets generated from a microfluidic single-cell-resolution cell-based assay.”

“The authors show that quantitative estimates of cell heterogeneity outperform graphics-based qualitative assessments of statistical distributions of cell measurements,” the guest editors say. “Interestingly, they present radar plots which, although likely unfamiliar to most scientists, show great promise to quickly and non-technically convey detailed quantitative information on cell heterogeneity indices.”

Moore et al. conclude that “microfluidic cell culture systems are increasingly being developed as advanced cell-based assays, including for use as chemosensitivity and resistance assays for cancer therapy selection. Continued application of microfluidic systems in biology and medicine will rely on our ability to increase throughput, reduce rare sample consumption and discover new ways to analyze data that capture important properties of entire cell populations, such as their heterogeneity. This study applied heterogeneity indices as quantitative metrics to analyze microfluidics-derived cell population data and showed that these indices can provide a new way to look at the evolving cell population distributions within a microfluidic setting. Importantly, heterogeneity indices were employed to reveal that it may be possible to reduce population size within a given assay, but that a limit to this reduction exists before significant differences are detected by these heterogeneity indices. Thus, this study provides support for achieving more advanced and potentially more useful assays through the integration of microfluidics with heterogeneity analyses.”
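The Pittsburgh Heterogeneity Indices themselves are defined in the paper, but the general idea of a quantitative heterogeneity index can be sketched simply. The example below is not a PHI; the function name and labels are illustrative assumptions. It uses normalized Shannon entropy over subpopulation fractions: 0 for a homogeneous population, 1 for a maximally mixed one.

```python
import math
from collections import Counter

def shannon_heterogeneity(labels):
    """Normalized Shannon entropy of subpopulation fractions.

    Returns 0.0 for a homogeneous population and 1.0 when all observed
    subpopulations are equally represented.
    """
    counts = Counter(labels)
    n = len(labels)
    h = -sum((c / n) * math.log(c / n) for c in counts.values())
    return h / math.log(len(counts)) if len(counts) > 1 else 0.0

print(shannon_heterogeneity(["live"] * 100))                 # homogeneous: 0.0
print(shannon_heterogeneity(["live"] * 50 + ["dead"] * 50))  # maximally mixed: 1.0
print(shannon_heterogeneity(["live"] * 90 + ["dead"] * 10))  # skewed: between 0 and 1
```

A single number like this can be tracked over time or compared across conditions, which is the kind of quantitative assessment the guest editors contrast with eyeballing distributions.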

Reproducibility and Agility

Hanspeter Gubler, Nicholas Clare, Laurent Galafassi, Uwe Geissler, Michel Girod and Guy Herr from the Novartis Institutes for BioMedical Research (Basel, Switzerland) author “Helios: History and Anatomy of a Successful In-House Enterprise High-Throughput Screening and Profiling Data Analysis System” in the SLAS Discovery special collection. They detail a data analysis software system that has been used successfully across Novartis globally for more than 10 years.

“A high degree of automation is reached by embedding the data analysis capabilities into a software ecosystem that deals with the management of samples, plates, and result data files, including automated data loading,” says the team. “The application provides a series of analytical procedures, ranging from very simple to advanced, which can easily be assembled by users in very flexible ways. This also includes the automatic derivation of a large set of quality control (QC) characteristics at every step. Any of the raw, intermediate and final results and QC-relevant quantities can be easily explored through linked visualizations. Links to global assay metadata management, data warehouses and an electronic lab notebook system are in place. Automated transfer of relevant data to data warehouses and electronic lab notebook systems are also implemented.”

“The paper nicely illustrates the advantages of a happy marriage between time-honored statistical principles and the automation needed for modern pharmaceutical data systems,” Kayne and Nadon say. “It is a must-read for anyone interested in comprehensive, statistically-principled and robust software development.”

The Topic Everyone Loves to Hate

Nadon loves statistics and admits he is in the minority. “But, as shown in the SLAS Discovery collection, it doesn’t have to be that way,” says Nadon. “Biostatisticians are fellow scientists! Paul and I have known each other for a couple of decades now. Hearing him talk about the kinds of misunderstandings those on the biology side might have about statistics was very helpful to me. Paul is a first-rate biologist, so I also learned a lot about the various issues that biologists have to contend with, which has informed my own work.”

“With this special collection, Bob and I wanted to give clear examples as to how the two groups working together actually lead to a better outcome, which often includes less work at less cost,” says Kayne. “It’s a sound investment. To realize the benefits, the statistician has to be willing to understand some of the pain points for the biologist, and likewise the biologist must grasp some of the basic principles relevant to statistics. It does take a mindset that recognizes there’s long-term value here and that in the end both sides will be doing better science.”

“I found that most biostatisticians are eager to collaborate on important and interesting scientific questions in biology and other disciplines,” Nadon explains. “Mutual ignorance of one another’s discipline often gets in the way. One thing biologists can do to help improve the interaction is to learn a little bit about statistics – in terms of both analysis and study design – to truly get a better understanding of how statisticians think about problems.”

Nadon recommends working scientists take a look at these books:

Intuitive Biostatistics: A Nonmathematical Guide to Statistical Thinking by Harvey Motulsky

What is a P Value Anyway? 34 Stories to Help You Actually Understand Statistics by Andrew J. Vickers

Improving Almost Anything: Ideas and Essays by George Box

“The Box series of essays is a personal favorite of mine,” says Nadon. “It offers quite deep and profound statistical ideas explained in a way that scientists can understand and see their value. There often are misconceptions about what biostatisticians can do. Very often, we’re called in after the fact to explain what the experiment died of. If we’re consulted in advance, we can often help scientists find effects they might otherwise miss.”

The guest editors are proud to showcase these four success stories in the special collection.

“As scientists, we’re familiar with delayed gratification; it can take a while to learn and train to do the things we want to accomplish,” Kayne says. “We’re trying to show through examples in this issue that the up-front cost is worth it. It is far more satisfying because when we do our analyses and we essentially have independent validation of our results with our partners, we can count on those results, believe in them, use them. My organization uses such analyses to make critical decisions. My group and the statisticians we’ve worked with have developed a very good track record. People have learned that when we bring things forward, they can trust them. In the end, that’s what research should be about.”


Related Reading

National Institutes of Health: Biostatistical Methods and Research Design Study Section

Too Many Numbers: Microarrays in Clinical Cancer Research

National Cancer Institute: Statistical Considerations for Trials Designed to Determine Clinical Utility of ctDNA Assays