
The Dark Genome: Let There Be Light

“The dark genome represents the set of genes and their corresponding proteins that remain unstudied or understudied. Given that target validation is a critical bottleneck in the discovery of new therapeutics, illumination of the dark genome could identify druggable targets relevant to diseases,” says Rajarshi Guha, research scientist at the U.S. National Institutes of Health (NIH) National Center for Advancing Translational Sciences and chair of the SLAS2017 Informatics Session on Let There Be Light: Informatics Approaches to Exploring the Dark Genome. 

By The Lab Man

(AKA SLAS Director of Education Steve Hamilton)

Quarter Century and Still Learning  

In the 25 years since the initiation of the Human Genome Project, many revelations about our genetic makeup have come to light. For instance, we now understand that roughly 98 percent of the human genome is non-coding, in that it does not contain a blueprint for making proteins. This used to be referred to as “junk DNA.” Guha notes, “Analyzing data from the NIH RePORTER indicates that a relatively small set of targets is the focus of the bulk of publicly funded research. Furthermore, bibliometric searches corroborate this observation. Given that target validation is a critical bottleneck in the discovery of new therapeutics, illumination of the dark genome could identify druggable targets relevant to diseases.

“The NIH Illuminating the Druggable Genome (IDG) program is facilitating these efforts in two, parallel and complementary tracks,” continues Guha. “First, the IDG program has funded several technology development groups whose mandate is to develop novel technologies that can be used to shed light on specific targets or target families. These include novel imaging modalities that enable assigning function to targets in in vivo settings such as those developed by the Yeh group at Massachusetts General Hospital and the Kokel group at the University of California, San Francisco; and methodologies to interrogate the druggable human GPCRome by the Roth group at the University of North Carolina and the Shoichet group at the University of California, San Francisco. Second, the IDG has funded the development of a Knowledge Management Center (KMC) whose goal is to integrate disease, pathway, protein, gene, chemical, bioactivity, drug discovery and clinical status databases and documents (publications & patents), supported by innovative algorithmic platforms, knowledge management tools and user interfaces. The KMC bridges clinical, biological, chemical and genomic data to prioritize targets from within these privileged target families for further experimental evaluation and analyses by the broader scientific community.”

Guha indicates that the requests for abstracts for the next phase of the IDG program have been released and invites interested parties to respond by Feb. 14, 2017. Data and Resource Generation Centers for Illuminating the Druggable Genome (U24) can be found here, while Knowledge Management Center for Illuminating the Druggable Genome (U24) is here.

The Role of Informatics

“Informatics plays a foundational role in this effort,” explains Guha. “First, the KMC must collate multiple data sources and data types that characterize what is known about targets. This involves developing workflows to ensure that entities are correctly resolved and linked. Second, having collated the data, it must be made available to users in an intuitive fashion. This involves the development of effective search methods, visualization tools and interface development. Third, given that target relevance is context dependent, computational approaches to target prioritization can play a critical role in focusing experimental research on specific targets. Finally, target prioritization usually involves multiple parameters – what does the target do? Is it druggable? And so on. For dark targets, this information is not always available from the literature or from experimental resources and thus computational methods to impute this information can be useful for prioritization pipelines.”
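To make the last two points concrete, here is a minimal sketch of multi-parameter target prioritization with a simple fill-in for missing values. It is not the KMC's method: the target names, the three evidence parameters, the scores, the weights and the mean-based imputation are all assumptions chosen for illustration, and a real pipeline would likely use a far richer imputation model.

```python
from statistics import mean

# Hypothetical evidence scores in [0, 1]; None marks information that is
# missing, as is common for dark targets. Names and values are illustrative.
targets = {
    "TARGET_A": {"function_known": 0.9, "druggability": 0.8, "disease_link": 0.7},
    "TARGET_B": {"function_known": None, "druggability": 0.6, "disease_link": 0.95},
    "TARGET_C": {"function_known": None, "druggability": None, "disease_link": 0.4},
}

# Hypothetical weights expressing how much each parameter matters in context.
weights = {"function_known": 0.2, "druggability": 0.4, "disease_link": 0.4}


def impute_missing(records):
    """Fill each missing score with the mean of the observed scores for that parameter."""
    params = {p for scores in records.values() for p in scores}
    fill = {}
    for p in params:
        observed = [s[p] for s in records.values() if s.get(p) is not None]
        fill[p] = mean(observed) if observed else 0.0
    return {
        name: {p: (s[p] if s.get(p) is not None else fill[p]) for p in params}
        for name, s in records.items()
    }


def prioritize(records, parameter_weights):
    """Rank targets by the weighted sum of their (imputed) evidence scores."""
    completed = impute_missing(records)
    scored = {
        name: sum(parameter_weights[p] * s[p] for p in parameter_weights)
        for name, s in completed.items()
    }
    return sorted(scored.items(), key=lambda item: item[1], reverse=True)


ranking = prioritize(targets, weights)
for name, score in ranking:
    print(f"{name}: {score:.2f}")
```

Changing the weights changes the ranking, which is one simple way to express the idea that target relevance is context dependent.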

Dark Genome Informatics at SLAS2017, Feb. 4-8, Washington, DC

The Dark Genome Informatics Session at SLAS2017 features four highly respected speakers. Guha offers the following insight into these upcoming presentations:

Denise Carvalho-Silva
European Molecular Biology Laboratory, European Bioinformatics Institute, Cambridge, UK
Open Targets Platform: Mining Gene-Disease Evidence for Improved Drug Target Selection

Carvalho-Silva talks about the EMBL-EBI Open Targets initiative, whose goal is to provide a resource that identifies and validates the causal links between targets, diseases and pathways. Her presentation is relevant to the topic because it is a complementary effort to the IDG, and given its focus on target validation, presents an alternative view to the notion of illumination of dark targets.

Tudor I. Oprea
University of New Mexico, Albuquerque, NM
Illuminating the Druggable Proteome

Oprea talks about the IDG Knowledge Management Center and its efforts to collate diverse data sources, implement drug-target ontology and present all of this via an intuitive user interface. Using this resource, he highlights the existing knowledge deficit (i.e., we have no functional information on a significant fraction of the proteome) and the fact that only three percent of the proteome is therapeutically addressed by drugs.

Avi Ma’ayan
Mount Sinai Hospital, New York, NY
A Web Application to Predict Drugs to Modulate the Expression of a Specific Gene

Ma’ayan describes an approach to prioritize small molecules that are predicted to affect the expression of specific genes based on experimental data from the NIH Library of Integrated Network-Based Cellular Signatures (LINCS) L1000 and the NIH National Center for Biotechnology Information Gene Expression Omnibus (GEO) resources. The talk is relevant to the topic because it provides a way to identify probes of molecular function, and therefore, a means to illuminate genes with unknown or unclear functions.

Viswanath Devanarayan
Gene-Network Based Predictive Modeling to Identify Biomarkers for High Dimensional Genomic Data

Devanarayan describes an approach that reconstructs gene regulatory networks and uses features derived from these networks (such as hub nodes) to build predictive biomarker models. This is in contrast to previous approaches that tend to build such models using individual genes. As with Ma’ayan’s presentation, this is important because it provides a computational approach to identifying relevant genes – which may include dark genes – for a given condition.
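As a rough illustration of what a network-derived feature such as a hub node looks like, the sketch below counts connections per gene in a small, made-up network and picks the most connected genes. It is not Devanarayan's method: the gene names and edges are hypothetical, the network is assumed to be already reconstructed, and degree is only one of several ways to define a hub.

```python
from collections import defaultdict

# Hypothetical undirected gene-gene edges from an already reconstructed network.
edges = [
    ("G1", "G2"), ("G1", "G3"), ("G1", "G4"),
    ("G2", "G3"), ("G5", "G1"), ("G6", "G7"),
]


def degree_by_gene(edge_list):
    """Count the distinct neighbors of each gene, ignoring self-loops."""
    neighbors = defaultdict(set)
    for a, b in edge_list:
        if a != b:
            neighbors[a].add(b)
            neighbors[b].add(a)
    return {gene: len(linked) for gene, linked in neighbors.items()}


def hub_genes(edge_list, top_n=1):
    """Return the top_n most connected genes; ties are broken alphabetically."""
    degrees = degree_by_gene(edge_list)
    ranked = sorted(degrees, key=lambda gene: (-degrees[gene], gene))
    return ranked[:top_n]


hubs = hub_genes(edges, top_n=2)
print(hubs)
```

A predictive model could then use such hubs, rather than every individual gene, as its candidate features.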


“The Dark Genome Informatics Session is of interest to experimentalists interested in learning about methods and resources that can be used for target prioritization and generating a research portfolio around a target of interest and to computational scientists interested in learning about integrated data sources and novel predictive methods,” says Guha. “In addition, the session can be of interest to funding agencies and program managers who are interested in resources that could enable them to get a comprehensive overview of the research landscape around specific targets and target families, thereby enabling decisions on funding specific research areas.”


About the Author

The Lab Man is SLAS Director of Education Steve Hamilton, Ph.D., a creative change maker who delivers the fresh thinking and energy that have helped make SLAS the go-to resource for those in life sciences discovery and technology. After years in the drug discovery world, spearheading many leading-edge automation projects for companies such as Eli Lilly, Scitec and Amgen, Hamilton joined the SLAS professional team in 2010. He received his Ph.D. in analytical chemistry from Purdue University and a B.S. in chemistry from Southeast Missouri State University.

November 21, 2016