Predicting Genomic Variation Effects on Human Gene Transcription


De novo prediction of tissue-specific expression effect and disease risk for every mutation in a human genome - ExPecto


Princeton Docket # 18-3423


Sequence-based models, with their scalability and use of sequence dependencies, enable mutation analysis at unprecedented scale, which can yield a new perspective on the role of genetic variation in human diseases and complex traits. Researchers in the Department of Computer Science and the Lewis-Sigler Institute for Integrative Genomics at Princeton University and the Center for Computational Biology at the Flatiron Institute, a part of the Simons Foundation, have developed data-driven models (ExPecto) that predict tissue-specific transcription levels for each gene in the human genome directly from 40 kbp promoter-proximal sequences, leveraging sequence features learned from chromatin profiling data. ExPecto can predict the expression-altering effects of any mutation with high confidence across more than 200 tissues and cell types. ExPecto was used to systematically predict likely expression-altering human genome variants, which were in turn used to prioritize causal variants within GWAS disease- or trait-associated loci. The researchers showed experimentally that, at three loci associated with four diseases, the putative causal SNPs predicted by ExPecto, but not the lead SNPs identified by the original GWAS, cause expression-altering effects.
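As an illustration of the general idea of sequence-based variant scoring described above, the following minimal Python sketch compares a model's prediction for the reference sequence with its prediction for the same sequence carrying a single-nucleotide variant. It is not the ExPecto implementation: `toy_predict_expression` is a hypothetical stand-in (it simply returns GC fraction) for a trained tissue-specific expression model, and the function names `apply_snv` and `variant_effect` are assumptions made for this example.

```python
from typing import Callable


def toy_predict_expression(seq: str) -> float:
    """Hypothetical stand-in for a trained expression model.

    Returns the GC fraction of the sequence; a real model would return a
    predicted expression level for one tissue or cell type.
    """
    return sum(base in "GC" for base in seq) / len(seq)


def apply_snv(seq: str, pos: int, alt: str) -> str:
    """Return a copy of seq with the base at 0-based index pos replaced by alt."""
    if len(alt) != 1 or alt not in "ACGT":
        raise ValueError("alt must be one of A, C, G, T")
    if seq[pos] == alt:
        raise ValueError("alt allele equals the reference base")
    return seq[:pos] + alt + seq[pos + 1:]


def variant_effect(
    seq: str,
    pos: int,
    alt: str,
    predict: Callable[[str], float] = toy_predict_expression,
) -> float:
    """Predicted effect = prediction(alternate) - prediction(reference)."""
    return predict(apply_snv(seq, pos, alt)) - predict(seq)


if __name__ == "__main__":
    # Substituting A -> G at index 1 raises the toy score.
    print(variant_effect("AAAA", 1, "G"))
```

Because the predictor is passed in as a function, the same scoring routine could in principle be reused with one model per tissue or cell type; that design choice is an assumption of this sketch, not a description of the licensed software.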


The scalability of ExPecto allows one to systematically characterize the full predicted expression effect space of potential mutations for each gene, via profiling of over 140 million promoter-proximal mutations. The resulting distribution of predicted mutation effects is informative of gene-specific evolutionary constraints on expression, indicating whether a specific gene is under evolutionary pressure for low or high expression in a particular tissue. Understanding such constraints on human gene expression could provide systematic information on the deleterious impacts of gene transcription dysregulation, information that is otherwise difficult to obtain because of experimental limitations in humans, and could enable identification of novel potential disease-associated non-coding mutations.
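The profiling described above can be pictured as an in silico saturation mutagenesis: every position is substituted with each of the three alternative bases, and the predicted effects are collected into a distribution. The Python sketch below shows that enumeration on a toy sequence. It rests on stated assumptions: `toy_predict_expression` is a hypothetical placeholder for a trained model, and the summary statistics in `summarize_effects` are simple illustrative choices, not the constraint measure used by the researchers.

```python
from statistics import mean
from typing import Callable, Dict, Tuple

BASES = "ACGT"


def toy_predict_expression(seq: str) -> float:
    """Hypothetical stand-in for a trained model: returns GC fraction."""
    return sum(base in "GC" for base in seq) / len(seq)


def saturation_mutagenesis(
    seq: str, predict: Callable[[str], float]
) -> Dict[Tuple[int, str, str], float]:
    """Score every single-nucleotide substitution (3 per position).

    Keys are (0-based position, reference base, alternate base); values are
    prediction(mutated sequence) - prediction(reference sequence).
    """
    ref_score = predict(seq)
    effects = {}
    for pos, ref in enumerate(seq):
        for alt in BASES:
            if alt == ref:
                continue
            mutated = seq[:pos] + alt + seq[pos + 1:]
            effects[(pos, ref, alt)] = predict(mutated) - ref_score
    return effects


def summarize_effects(effects: Dict[Tuple[int, str, str], float]) -> dict:
    """Illustrative summary of the effect distribution (an assumption)."""
    values = list(effects.values())
    return {
        "n_mutations": len(values),
        "mean_effect": mean(values),
        "fraction_decreasing": sum(v < 0 for v in values) / len(values),
    }


if __name__ == "__main__":
    effects = saturation_mutagenesis("GGCC", toy_predict_expression)
    print(summarize_effects(effects))
```

In this toy setting, a distribution skewed toward decreasing effects means that most possible mutations would lower the predicted score. The paragraph above describes the analogous reasoning at genome scale, where the shape of the distribution is read as evidence of constraint toward high or low expression.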




Deep learning sequence-based ab initio prediction of variant effects on expression and disease risk. Jian Zhou, Chandra L. Theesfeld, Kevin Yao, Kathleen M. Chen, Aaron K. Wong and Olga G. Troyanskaya. Nature Genetics, Vol. 50, Aug 2018, 1171-1179.


Potential Applications


       Use in genetic counseling to assess the risk of genomic variations, with or without prior knowledge

       Identification of genomic variations that affect gene expression in patients

       Identification of disease-causing non-coding gene alterations




Advantages


       Prediction of expression-altering effects with high confidence

       Scalability and usage of sequence dependencies


Intellectual Property & Development Status


Patent protection is pending.

Princeton is currently seeking commercial partners for the further development and commercialization of this opportunity.


The Inventors


Olga Troyanskaya is a professor at the Lewis-Sigler Institute for Integrative Genomics and the Department of Computer Science at Princeton University, where she has been on the faculty since 2003. In 2014 she became the deputy director of Genomics at the Center for Computational Biology at the Flatiron Institute, a part of the Simons Foundation in NYC. She holds a Ph.D. in Biomedical Informatics from Stanford University, has been honored as one of the top young technology innovators by the MIT Technology Review, and is a recipient of the Sloan Research Fellowship, the National Science Foundation CAREER award, the Overton award from the International Society for Computational Biology, and the Ira Herskowitz award from the Genetics Society of America.


Jian Zhou is a Flatiron Fellow at the Flatiron Institute of the Simons Foundation, where he mainly works on understanding chromatin and genome variation. He received a B.S. from Peking University.


Chandra Theesfeld is a research scientist in the laboratory of Professor Olga Troyanskaya at Princeton University.





For Information, Contact:
Cortney Cavanaugh
New Ventures and Licensing Associate
Princeton University

Inventors:
Olga Troyanskaya
Jian Zhou
Chandra Theesfeld