DeepSEA and Seqweaver: Deep learning frameworks for the identification of de novo disease-causing mutations in noncoding sequences



Princeton Docket #s 18-3426, 18-3440


Researchers at Princeton University and Rockefeller University have collaborated to develop methods for identifying de novo mutations in the vast noncoding regions of the genome. These methods, DeepSEA 2.0 and Seqweaver, use a novel deep learning framework to compare large datasets of regulatory sequences compiled from in vivo analyses with datasets from families affected by complex human diseases, pinpointing contributing mutations at single-nucleotide resolution in transcriptional regulatory sequences and RNA regulatory sequences, respectively. Applied to a database of families affected by simplex Autism Spectrum Disorder (ASD), DeepSEA 2.0 and Seqweaver revealed point mutations in transcriptional regulatory sequences and RNA regulatory sequences that contribute to an estimated 14% and 12% of cases, respectively. This adds more than 50% to the cases accounted for by previously identified mutations in coding regions, which contribute to an estimated 30% of cases. DeepSEA 2.0 and Seqweaver can also be used to analyze other complex human diseases for which large datasets are available, identifying contributing point mutations in noncoding regions.
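This listing does not describe the internals of DeepSEA 2.0 or Seqweaver. As a rough illustration of what scoring a mutation at the single nucleotide level can look like, the minimal sketch below compares a model's prediction for a reference sequence with its prediction for the same sequence carrying one substituted base. Everything in it is an assumption made for illustration: `toy_model`, `TOY_WEIGHTS`, `apply_snv`, and `variant_effect` are hypothetical names, and the fixed motif scorer stands in for a trained deep network.

```python
# Toy sketch of reference-vs-alternate scoring for a single-nucleotide variant.
# The "model" is a hypothetical fixed motif scorer with made-up weights,
# NOT the DeepSEA 2.0 or Seqweaver network.
import math

BASES = "ACGT"


def one_hot(seq):
    """Encode a DNA sequence as rows of four 0/1 values (A, C, G, T)."""
    return [[1.0 if base == b else 0.0 for b in BASES] for base in seq.upper()]


# Hypothetical weights for a 4-bp motif (rows: motif position, columns: A, C, G, T).
TOY_WEIGHTS = [
    [1.2, -0.5, -0.5, -0.5],  # position 0 prefers A
    [-0.5, -0.5, 1.2, -0.5],  # position 1 prefers G
    [-0.5, 1.2, -0.5, -0.5],  # position 2 prefers C
    [-0.5, -0.5, -0.5, 1.2],  # position 3 prefers T
]


def toy_model(seq):
    """Return a pseudo-probability in (0, 1): best motif match, sigmoid-squashed."""
    encoded = one_hot(seq)
    width = len(TOY_WEIGHTS)
    best = max(
        sum(encoded[start + i][j] * TOY_WEIGHTS[i][j]
            for i in range(width) for j in range(len(BASES)))
        for start in range(len(encoded) - width + 1)
    )
    return 1.0 / (1.0 + math.exp(-best))


def apply_snv(seq, position, alt):
    """Return seq with the base at the 0-based position replaced by alt."""
    if alt not in BASES:
        raise ValueError(f"alt must be one of {BASES}")
    return seq[:position] + alt + seq[position + 1:]


def variant_effect(seq, position, alt, model=toy_model):
    """Predicted change in the feature score caused by the variant (alt minus ref)."""
    return model(apply_snv(seq, position, alt)) - model(seq)


reference = "TTAGCTAA"  # contains the toy motif AGCT at positions 2-5
effect = variant_effect(reference, 3, "T")  # G>T substitution inside the motif
print(f"predicted effect of G>T at position 3: {effect:.4f}")
```

In this toy setting, a substitution inside the motif lowers the score, while substituting a base with itself leaves it unchanged. A real framework would replace `toy_model` with a trained network and typically report effects across many regulatory features.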


The human genome is composed of gene coding regions, which code for the amino acid sequences of proteins, and noncoding regions, which include sequences that regulate gene expression and RNA processing at the DNA and RNA levels. While significant progress has been made in identifying disease-causing mutations in gene coding regions, the impact of mutations in noncoding regions, which make up the majority of the human genome, remains underappreciated. The lack of progress in this area reflects the difficulty of distinguishing rare disease-relevant mutations from the biological and technical variation that is common in noncoding sequences.


Applications

       Genetic testing: Identification of DNA mutations/variants that could cause disease

       Drug target identification: Predicts likely causal disease mutations

       Development of personalized medicine and clinical diagnosis products


Advantages

       Accurately identifies novel transcriptional and RNA processing mutations

       Accurately predicts RNA binding protein binding sites using RNA sequence features alone

       Prioritizes regulatory sequences in the noncoding genome


Intellectual Property & Development Status

Patent protection is pending.

Princeton is currently seeking commercial partners for the further development and commercialization of this opportunity.



Zhou J, Park C, Theesfeld C, Yuan Y, Sawicka K, Darnell J, Scheckel C, Fak J, Tajima Y, Darnell R, Troyanskaya O. 2018. Whole-genome deep learning analysis reveals causal role of noncoding mutations in autism. bioRxiv doi: 10.1101/319681


The Inventors

Olga Troyanskaya is a professor at the Lewis-Sigler Institute for Integrative Genomics and the Department of Computer Science at Princeton University, where she has been on the faculty since 2003. In 2014 she became the deputy director of Genomics at the Center for Computational Biology at the Flatiron Institute, a part of the Simons Foundation in New York City. She holds a Ph.D. in Biomedical Informatics from Stanford University, has been honored as one of the top young technology innovators by the MIT Technology Review, and is a recipient of the Sloan Research Fellowship, the National Science Foundation CAREER award, the Overton award from the International Society for Computational Biology, and the Ira Herskowitz award from the Genetics Society of America.


Robert Darnell is the Robert and Harriet Heilbrunn Professor of Cancer Biology at Rockefeller University, where he has been on the faculty since 1992. He has been a Howard Hughes Medical Institute Investigator since 2002 and the President, CEO, and Scientific Director of the New York Genome Center since 2012. He holds a Ph.D. in Molecular Biology and an M.D. from Washington University School of Medicine and is a recipient of the NINDS Outstanding Investigator award, the NIH Director’s Transformative Research award, the Burroughs Wellcome Fund award, the Derek Denny-Brown Young Neurological Scholar award, and the Irma T. Hirschl/Monique Weill-Caulier Trust Research award.


Jian Zhou is a Flatiron Fellow at the Flatiron Institute of the Simons Foundation, where he works mainly on understanding chromatin and genome variation. He received a B.S. from Peking University and a Ph.D. from Princeton University.


Chandra Theesfeld is a research scientist in the laboratory of Professor Olga Troyanskaya at Princeton University.


Christopher Park is a research scientist at the Simons Foundation.



Laurie Tzodikov

Princeton University Office of Technology Licensing

(609) 258-7256


Catherine Ruesch

Princeton University Office of Technology Licensing

University Administrative Fellow


For Information, Contact:
Cortney Cavanaugh
New Ventures and Licensing Associate
Princeton University