Medical College of Wisconsin

DriverML: a machine learning algorithm for identifying driver genes in cancer sequencing studies. Nucleic Acids Res 2019 05 07;47(8):e45

Date

02/19/2019

Pubmed ID

30773592

Pubmed Central ID

PMC6486576

DOI

10.1093/nar/gkz096

Scopus ID

2-s2.0-85068538022 (4 citations)

Abstract

Although rapid progress has been made in computational approaches for prioritizing cancer driver genes, research is far from achieving the ultimate goal of discovering a complete catalog of genes truly associated with cancer. Driver gene lists predicted from these computational tools lack consistency and are prone to false positives. Here, we developed an approach (DriverML) integrating Rao's score test and supervised machine learning to identify cancer driver genes. The weight parameters in the score statistics quantified the functional impacts of mutations on the protein. To obtain optimized weight parameters, the score statistics of prior driver genes were maximized on pan-cancer training data. We conducted rigorous and unbiased benchmark analysis and comparisons of DriverML with 20 other existing tools in 31 independent datasets from The Cancer Genome Atlas (TCGA). Our comprehensive evaluations demonstrated that DriverML was robust and powerful among various datasets and outperformed the other tools with a better balance of precision and sensitivity. In vitro cell-based assays further proved the validity of the DriverML prediction of novel driver genes. In summary, DriverML uses an innovative, machine learning-based approach to prioritize cancer driver genes and provides dramatic improvements over currently existing methods. Its source code is available at https://github.com/HelloYiHan/DriverML.
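The abstract mentions a Rao's score test whose weight parameters reflect the functional impact of mutations. As a rough illustration of that general idea only, and not the DriverML implementation, the sketch below computes a weighted score statistic for a single gene. It assumes a toy model in which the count for each mutation type is Poisson with mean `expected[t] * exp(weights[t] * eta)` and tests `eta = 0`. The function name, the model, and all numbers are hypothetical; how DriverML trains its weights and estimates background rates is not shown here.

```python
import math


def weighted_score_test(observed, expected, weights):
    """Rao score test of eta = 0 in a toy weighted Poisson model (an assumption).

    observed: mutation counts per mutation type for one gene
    expected: background (null) expected counts per mutation type
    weights:  per-type weights standing in for functional impact
    Returns (statistic, p_value) using a chi-square reference with 1 df.
    """
    if not (len(observed) == len(expected) == len(weights)):
        raise ValueError("observed, expected and weights must have equal length")

    # Score: derivative of the log-likelihood with respect to eta at eta = 0.
    score = sum(w * (n - e) for n, e, w in zip(observed, expected, weights))
    # Fisher information for eta at eta = 0.
    info = sum(w * w * e for e, w in zip(expected, weights))
    if info <= 0:
        raise ValueError("Fisher information must be positive")

    stat = score ** 2 / info
    # Upper tail of a chi-square with 1 df: P(X > s) = erfc(sqrt(s / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


if __name__ == "__main__":
    # Hypothetical toy inputs: three mutation types for one gene.
    observed = [3, 5, 2]
    expected = [2.5, 1.0, 0.2]
    weights = [0.1, 1.0, 2.0]
    stat, p = weighted_score_test(observed, expected, weights)
    print(f"score statistic = {stat:.3f}, p-value = {p:.3g}")
```

In this toy form, larger weights let an excess of high-impact mutation types dominate the statistic, which is the intuition behind weighting; the chi-square reference is two-sided, so a real driver-gene test would also need to check that the excess is positive.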

Author List

Han Y, Yang J, Qian X, Cheng WC, Liu SH, Hua X, Zhou L, Yang Y, Wu Q, Liu P, Lu Y

Author

Pengyuan Liu, PhD, Adjunct Professor in the Physiology department at the Medical College of Wisconsin




MeSH terms used to index this publication

Atlases as Topic
Cell Cycle Proteins
Cell Line, Tumor
Cell Movement
Cell Proliferation
Datasets as Topic
Gene Expression Regulation, Neoplastic
Humans
Machine Learning
Monte Carlo Method
Mutation
Neoplasm Proteins
Neoplasms
Nuclear Proteins
Oncogenes
Software