Aneuploidy Prediction and Tumor Classification with Heterogeneous Hidden Conditional Random Fields


Princeton University Invention # 09-2491



Cancer signatures are often confounded when looking at tumor morphology, but may be inferred through genetic aberration patterns. Array-based methods provide high-throughput data on genetic copy numbers, but determining the clinically relevant copy number changes and identifying potentially causative loci remains a challenge. Conventional statistical and machine learning methods for linking alterations to clinical outcome ignore critical features. On the other hand, existing sequence classification methods can only model aggregate copy number instability, and disregard what happens at genetic loci.


Researchers in the Computer Science Department and the Lewis-Sigler Institute for Integrative Genomics at Princeton University have developed an integrated method for jointly classifying tumors, inferring copy numbers, and identifying clinically relevant positions in recurrent alteration regions from high-throughput copy number data, such as array CGH.


By capturing sequential as well as local information, this integrated model, referred to as the "Heterogeneous Hidden Conditional Random Field", provides better noise reduction, and achieves more relevant gene retrieval and more accurate classification than existing methods. This new method notably selects a small set of candidate genes that can be statistically linked with high confidence to disease-specific genetic aberration patterns, and provides unbiased starting points in deciding which genomic regions and which genes to pursue for further examination.


Experiments on synthetic data and on cancer data show that this method is superior to existing methods in both prediction accuracy and relevant feature discovery. The utility of the Heterogeneous Hidden Conditional Random Field has been demonstrated by generating novel biological hypotheses for breast cancer, bladder cancer, and melanoma (see cited reference below).


The Heterogeneous Hidden Conditional Random Field could be used to discover genes involved in cancer, to develop molecular diagnostics, or to analyze patient data for potential molecular-based targeted treatments, treatment response, and prognosis.


Princeton is currently seeking industrial collaborators to further develop and commercialize this technology. Patent protection is pending.





Barutcuoglu Z., Airoldi E., Dumeaux V., Schapire R., Troyanskaya O., Aneuploidy Prediction and Tumor Classification with Heterogeneous Hidden Conditional Random Fields, Bioinformatics, Advance Access published December 4, 2008.




For more information on Princeton University invention # 09-2491 please contact:


                        Laurie Tzodikov

                        Office of Technology Licensing and Intellectual Property

                        Princeton University

                        4 New South Building

                        Princeton, NJ 08544-0036

                        (609) 258-7256

                        (609) 258-1159 fax


For Information, Contact:
Laurie Tzodikov
Licensing Associates
Princeton University
Inventors:
Robert Schapire
Olga Troyanskaya
Zafer Barutcuoglu
Edoardo Airoldi