A deep learning system accurately classifies primary and metastatic cancers using passenger mutation patterns. Nat Commun 2020 Feb 05;11(1):728
Date
02/07/2020
Pubmed ID
32024849
Pubmed Central ID
PMC7002586
DOI
10.1038/s41467-019-13825-8
Scopus ID
2-s2.0-85079062177 (requires institutional sign-in at Scopus site)
108 Citations
Abstract
In cancer, the primary tumour's organ of origin and histopathology are the strongest determinants of its clinical behaviour, but in 3% of cases a patient presents with a metastatic tumour and no obvious primary. Here, as part of the ICGC/TCGA Pan-Cancer Analysis of Whole Genomes (PCAWG) Consortium, we train a deep learning classifier to predict cancer type based on patterns of somatic passenger mutations detected in whole genome sequencing (WGS) of 2606 tumours representing 24 common cancer types produced by the PCAWG Consortium. Our classifier achieves an accuracy of 91% on held-out tumor samples and 88% and 83% respectively on independent primary and metastatic samples, roughly double the accuracy of trained pathologists when presented with a metastatic tumour without knowledge of the primary. Surprisingly, adding information on driver mutations reduced accuracy. Our results have clinical applicability, underscore how patterns of somatic passenger mutations encode the state of the cell of origin, and can inform future strategies to detect the source of circulating tumour DNA.
Author List
Jiao W, Atwal G, Polak P, Karlic R, Cuppen E, PCAWG Tumor Subtypes and Clinical Translation Working Group, Danyi A, de Ridder J, van Herpen C, Lolkema MP, Steeghs N, Getz G, Morris QD, Stein LD, PCAWG Consortium
Authors
Akinyemi Ojesina MD, PhD Assistant Professor in the Obstetrics and Gynecology department at Medical College of Wisconsin
Janet Sue Rader MD Chair, Professor in the Obstetrics and Gynecology department at Medical College of Wisconsin
MeSH terms used to index this publication
Computational Biology
Female
Genome, Human
Humans
Male
Mutation
Neoplasm Metastasis
Neoplasms
Reproducibility of Results
Whole Genome Sequencing