
Images courtesy of Chris Bouton and Donald G. Jackson.

Tackle the Data, Capture the Meaning, Do Great Science

The tools are there and the data is flowing. But how do today's laboratory scientists make the most of the technology available to them to obtain the right data and then extract the appropriate meaning? SLAS members do what great scientists have always done – ask the right questions, design the experiments carefully and take advantage of the latest tools and technology.

"Technology keeps racing ahead," states Steve Hamilton, Ph.D., SLAS director of education. "Our ability to generate data can move ahead of our ability to really understand and extract good meaning from our data. We can get out of sync."

Hamilton says that's where good science steps in. You have to think hard about the questions you want to answer and be sure you design your experiments to get the data you need. And, you have to have in place a means of working with the data you generate.

Easier said than done? Maybe. Maybe not. How can we tackle the challenges presented by the potentially overwhelming amount of data gathered in today's outstanding laboratories and still achieve the desired results? Three SLAS members offer thoughts to help in the Fall 2012 SLAS Webinar Series, Extracting Meaning From Your Data, which is free to SLAS members, both live and at SLAS On-Demand.

Designing Experiments

Paul Taylor, M.S., is a group leader in automation and sample management at Boehringer Ingelheim Pharmaceuticals and a proponent of applying statistical design of experiments techniques to pharmaceutical research.

"Design of experiments (DOE) has been around a long time," he says, acknowledging Ronald Fisher's 1935 agricultural work on crop variation, which outlined DOE's experimental methodology. Key in applying the approach to the life sciences is taking into account that "most biology is not linear, necessitating careful consideration of design types."

Taylor explains that with classical assay development at the bench, physical logistics typically limit the number of factors and levels that can be tested at any one time. Common factors tested in assay development include reagent concentrations, buffer components, pH, temperature and plate types. Responses studied frequently include assay dynamic range, working solution stability and the effects of miniaturization. When assay development is carried out with manual techniques, result interpretation commonly focuses on main effects only, because of the difficulty in separating out confounded factor interactions within smaller experimental designs. "Historically, you didn't always know where the effect or the variability was coming from," he states.

"In contrast,' he continues, "applying design of experiments to pharmaceutical research assays gives researchers the ability to test an extraordinary number of conditions all at one time. You can look at your effects across response surfaces and be able to pull apart where your main effects and interactions are coming from. DOE gives the ability to construct very complex experiments, which would be either very difficult or impossible to carry out with classical experimental techniques.

"Over the last 10 years, automation has been developed where, for example, we now have the ability to design experiments for seven factors and five levels with each test condition run in triplicate," he shares. "Now we can design the experiments in a titration-centric manner and approach the experiment while thinking like a biologist."

At Boehringer, Taylor credits the addition of a Beckman Coulter BioRAPTR with extending the capability of running DOE experiments.

"We have the flexibility of using a Biomek FX for experiments requiring larger volumes and using the BioRAPTR for assays requiring smaller volumes, including those in 1,536 formats," he summarizes. "When we begin to work with scientists, the first request we make is to ask for an existing assay protocol, if it's available, and a matrix of factors and levels they'd like to investigate."

With this information, Taylor and his team can select an experimental design that uses factors and levels as stipulated by the investigating scientists. Most commonly, titrations with doubling dilutions are chosen, and effects are visualized as response surfaces as part of a predictive model.
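
To make the mechanics concrete, here is a minimal sketch of how a factorial design built from doubling-dilution titrations might be enumerated in Python. The factor names, starting concentrations and level counts are invented for illustration; in practice, groups like Taylor's rely on dedicated DOE software and automation, and often use fractional or optimal designs rather than the full grid shown here.

```python
from itertools import product

# Illustrative factors only -- a real assay would define its own factors,
# starting concentrations and number of doubling dilutions.
factors = {
    "enzyme_nM": 40.0,      # starting (highest) concentration
    "substrate_uM": 100.0,
    "MgCl2_mM": 10.0,
}
n_levels = 5        # five doubling-dilution levels per factor
replicates = 3      # each test condition run in triplicate

def doubling_dilutions(top, n):
    """Return n levels produced by serial 1:2 dilution from the top concentration."""
    return [top / (2 ** i) for i in range(n)]

levels = {name: doubling_dilutions(top, n_levels) for name, top in factors.items()}

# Full-factorial grid of all level combinations, each replicated.
design = [
    dict(zip(levels, combo), replicate=rep)
    for combo in product(*levels.values())
    for rep in range(1, replicates + 1)
]

print(f"{len(design)} wells")   # 5**3 level combinations x 3 replicates = 375 wells
print(design[0])
```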

Boehringer Ingelheim has applied statistical design of experiments to protein purification to find conditions that increase protein yields. Most recently, the technique has been used for ligand mixture studies as a tool for pathway interrogation.

"We took eight ligands and ran a cell-based assay using three-way mixtures of the eight ligands," he explains. "This made 512 three-way ligand mixtures and we were able to interrogate whether there was any kind of synergy between those ligands across large numbers of mixtures. Trying to do that manually would have been extremely difficult or impossible to do. With automation, the experiment was assembled in less than two hours. From that experiment, we found responses and combinations of ligands which were unexpected in terms of the types of stimulatory activity they provided."

Taylor indicates that, at this point in time, design of experiment techniques are more of a "niche tool" in pharmaceutical research. "In part, this is because the automation tools have needed to mature, but also because it's a bit of a different way of thinking," he explains. "At first, the tools may appear unfamiliar – in addition to biology, there are the added components of automation, statistical design and analysis. I've always known that the approach won't catch on unless it is easy to do, with all the complex technology aspects happening under the hood. That way, biologists can think as scientists rather than being distracted by the technology. Any great form of experimentation or technology needs to be friendly and accessible. DOE is making somewhat of a resurgence in our organization due to the development of improved tools; I always knew it had high potential, but I also knew that unless it became more friendly, I'd be the only guy using it."

Taylor will share more of his insight and experience regarding design of experiments as part of the Fall 2012 SLAS Webinar Series. He leads the December 11 webinar, "Application of Design of Experiments (DOE) in Protein Purification, Assay Development and Ligand Interaction Studies," beginning at 11:30 a.m. EST. Like all SLAS webinars, it is free to SLAS members. Participants will learn the technical steps involved and leave with ideas about how they might put them to work in their organizations.

"If, based on good science, you have a really well-designed experiment, using DOE in conjunction with easy to run technology will usually deliver scientific answers quite a bit faster," Taylor concludes. "SLAS is a good audience for this as it's a good example of the interface of technology and science. If you have one without the other, you're at risk of being somewhat hampered. You can do science the classical way and you'll eventually get to the answers, but not necessarily the best possible answers or the specific optimization you want. Likewise, you can invest in a myriad of fancy technology, but unless you have solid science guiding experimental approaches, it really won't be of much use. Good design of experiments emphasizes the combination of science and technology."

Selecting Tools

On November 13 at 9:00 a.m. EST, the second in the Fall 2012 SLAS Webinar Series features instructor Chris Bouton, CEO at Entagen. He will focus on the tools available to help scientists get at all the data they need in "TripleMap: Next Generation Semantic Search & Analytics for Life Sciences."

"At Entagen, we believe the confluence of two types of technology – semantic and big data capable – will allow for deeper insight into the complex questions that researchers are trying to answer in their daily work," Bouton explains. "The key advantage of combining semantic technology and big data capable technology is basically to be able to connect the dots, to tie together information through associations between entities like compounds, targets, diseases and so on. This allows scientists to more effectively generate hypotheses and identify unexpected associations more rapidly."

Bouton explains that semantic technology is the brainchild of Tim Berners-Lee, inventor of the World Wide Web and many of the web standards in use today. Berners-Lee introduced the semantic web in a 2001 Scientific American article.

"Imagine the World Wide Web as an interlinked network of documents, where the linkage between any two documents is a pointer," outlines Bouton. "In the semantic web, the proposal was for an interlinked network of concepts where each concept represents a thing and the linkages between those concepts has meaning, hence the term semantic. An example is ice as one concept and the other concept is water. Where you connect those two is referred to as a predicate – ice melts into water and water freezes into ice."

The semantic technology scientists need must not only tie concepts together but also uncover relationships between them, he adds. To streamline the process for scientists, Bouton offers that the meaning must be expressed in a form that is computable. "At Entagen, our work has been to build large networks of triples, which are mechanisms by which we can express meaning in computable form. We then allow computers to traverse these networks to identify connections that are difficult for humans to identify simply due to the scale of data that they are trying to deal with. Triples arise not only from structured data sources but also from unstructured data sources, like PubMed articles and documents people are pushing to SharePoint. These need to be tied together. Our specific technology to accomplish this is called TripleMap."
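
A toy example may help make the triple idea concrete. The sketch below stores subject-predicate-object statements and walks outward from one entity to surface indirect connections; the entities and predicates are invented for illustration, and this is not Entagen's TripleMap implementation.

```python
from collections import defaultdict

# Each statement is a (subject, predicate, object) triple. The entities and
# predicates below are invented examples, not real TripleMap content.
triples = [
    ("compound_X", "inhibits", "kinase_A"),
    ("kinase_A", "participates_in", "pathway_P"),
    ("pathway_P", "implicated_in", "disease_D"),
    ("ice", "melts_into", "water"),
    ("water", "freezes_into", "ice"),
]

# Index triples by subject so the graph can be walked outward from any entity.
graph = defaultdict(list)
for s, p, o in triples:
    graph[s].append((p, o))

def paths_from(entity, depth=3, prefix=()):
    """Yield chains of (predicate, object) hops reachable from an entity."""
    for p, o in graph.get(entity, []):
        chain = prefix + ((p, o),)
        yield chain
        if len(chain) < depth:
            yield from paths_from(o, depth, chain)

for chain in paths_from("compound_X"):
    print("compound_X " + " ".join(f"-{p}-> {o}" for p, o in chain))
# e.g. compound_X -inhibits-> kinase_A -participates_in-> pathway_P -implicated_in-> disease_D
```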

According to the Entagen website:

TripleMap represents everything in its semantic data core as "master entities" that integrate all information for any given entity (e.g. protein, gene, compound, clinical trial, disease) including names, synonyms, symbols, meta-data properties, and associations to other entities. By searching for and saving sets of entities, users build, share and analyze "dynamic knowledge maps" of entities and their associations. These knowledge maps give users a "bird's eye view" of patterns of interconnection between entities of interest, are used to continuously scan for novel information as it becomes available and allow users to find other users creating similar maps, thereby enabling collaborative knowledge exchange.
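
Read as a data model, a "master entity" along those lines might be sketched as a simple record like the one below. The class name, fields and example values are hypothetical stand-ins for the description above, not TripleMap's actual schema.

```python
from dataclasses import dataclass, field

@dataclass
class MasterEntity:
    """Hypothetical record mirroring the description above; not TripleMap's schema."""
    entity_id: str
    entity_type: str                                   # e.g. protein, gene, compound, disease
    names: list = field(default_factory=list)
    synonyms: list = field(default_factory=list)
    properties: dict = field(default_factory=dict)     # meta-data key/value pairs
    associations: list = field(default_factory=list)   # (predicate, other_entity_id) pairs

# Illustrative example only.
egfr = MasterEntity(
    entity_id="ENT:0001",
    entity_type="protein",
    names=["EGFR"],
    synonyms=["ErbB-1", "HER1"],
    properties={"organism": "Homo sapiens"},
    associations=[("target_of", "ENT:0042"), ("implicated_in", "ENT:0099")],
)
```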

"By allowing a researcher to quickly gain access and insight into how things are connected together across entity types, they can more readily identify things that are of interest to them in their work," Bouton notes.

"Biology over the last two decades has gone from being a research area in which the data was relatively manageable to a research area in which the data is overwhelming," Bouton adds. "Handling big data is a challenge – how do I migrate data, how do I store it in a secure and robust way and finally how do I make sense of all this information? One hopes to somehow convert data to information and then catalyze that information into knowledge that allows somebody to make insight and generate hypotheses. Enabling data to become information and then knowledge is big data analytics."

Bouton started his career as a bench scientist in the days when you could look at the five bands on your gel in the sunlight and get the information you needed. "Now terabytes of data are generated, but we still have the same fundamental need to understand the patterns and the connections between the bits of information in that experiment," Bouton says. "That's at the core of the challenge we have in applying technology to science as data grows in these fields. We believe there is going to be a continuing need for innovative tools to address these types of questions, from big data analytics to pattern finding to novel mechanisms, like iPads and touchscreens, that allow researchers to interact with data. We're incredibly excited to build those types of tools and see them put to use in science."

Analyzing Data

Donald G. Jackson, Ph.D., Bristol-Myers Squibb Research & Development, seconds Bouton regarding the tremendous growth in data available to be analyzed.

"One of the big challenges in analyzing multidimensional data is that compared to a lot of conventional assays where you might look at one or two measurements, such as a luciferase measurement and a viability measurement, high-content assays can give anywhere from tens to hundreds of measurements from a single well with different biological probes. That provides a lot more information about what's going on than an individual measurement. Instead of just knowing that the signal went up or the signal went down, you can ask questions like where in the cell is the signal, is it being distributed evenly, is it showing us specific compartments, things like that. But the systems we built for conventional screening management really weren't well suited to capture that. The other issue that we had was that high-content assays and RNA interference assays both by their nature are cell-based assays. There is a lot of additional information about the cell model and about how the assay was done that is critical to future analysis and understanding of the results. Systems that were built to capture results from biochemical assays really weren't designed with that in mind."

To address those issues, BMS embarked on an internal software development project on Jackson's watch; the resulting system was described in the Journal of Biomolecular Screening (Jackson, D. HCS Road: An Enterprise System for Integrated HCS Data Management and Analysis. J. Biomol. Screen. 2010, 15(7): 882-891).

"Across the organization, we had the tools we needed but they weren't in the right places," Jackson notes. "For example, we in genomics now needed tools to run IC50 analyses on our results. We couldn't make use of existing tools from lead discovery because they were tightly coupled to a database that was built for biochemical assays and we couldn't readily translate our assays into that data model. Conversely, our colleagues in lead discovery were finding they needed to make much more use of tools like hierarchical clustering and other hypothesis neutral tools. And, again, their systems weren't set up to share data with the genomics tools that could run the clustering and other analyses."

Jackson says nothing was available commercially at the time to help BMS manage such large volumes of data, with thousands of measurements per well, in a way that was fast enough to be usable but wouldn't completely overwhelm its databases.

"We also needed new data analysis tools – ways where people could look at the summary data, the IC50 data and other data and see how that correlated across many different measurements and ultimately to be able to go back and look at the original images that that information came from so that they could make the call," he adds. "The number changed, but does that mean the cell changed or does that mean there was something strange going on in this well that I didn't anticipate? So we had to build those tools together."

Using third-party software for the analysis, Jackson and his team focused on creating the data management system customized to work within the existing BMS infrastructure to interact with its compound management system and other databases.

"Now, we are working on merging HCS Road and our existing HTS database to figure out ways to get the best of both worlds," he continues. "I believe one of the most interesting things we're doing is combining approaches, methods and best practices from functional genomics, microarrays and screening communities to build tools and solutions that combine the best qualities of all."

Jackson, who trained as a developmental biologist, moved into bioinformatics during his postdoc and joined BMS in that area. He changed roles to take over a BMS group that does experimental biology with an emphasis on target discovery, target validation and biomarker identification and makes extensive use of high-content screening assays. But he has always been the guy in the laboratory playing with the computer instead of being at the bench.

Jackson presented "Data Management, Analysis and Visualization Tools for Understanding Multidimensional Screening Results" on September 27 as part of the Fall 2012 SLAS Webinar Series, where he talked about the tools and the system put in place for high-content data management and data analysis across Bristol-Myers. His session is available at SLAS On-Demand.

Fall 2012 SLAS Webinar Series Recap

Extracting Meaning From Your Data, a three-part webinar series, is free to SLAS members, both live and On-Demand.

September 27, 2012, 11:30 a.m. ET
Data Management, Analysis and Visualization Tools for Understanding Multidimensional Screening Results
Presenter: Donald G. Jackson, Ph.D., Bristol-Myers Squibb Research & Development

November 13, 2012, 9:00 a.m. ET
TripleMap: Next Generation Semantic Search & Analytics for Life Sciences
Presenter: Christopher ML Bouton, Ph.D., Entagen

December 11, 2012, 11:30 a.m. ET
Application of Design of Experiments (DOE) in Protein Purification, Assay Development and Ligand Interaction Studies
Presenter: Paul Taylor, M.S., Boehringer Ingelheim Pharmaceuticals, Inc.

October 22, 2012