Medical College of Wisconsin

Variable selection and pattern recognition with gene expression data generated by the microarray technology. Math Biosci 2002 Mar;176(1):71-98 PMID: 11867085

Lack of adequate statistical methods for the analysis of microarray data remains the most critical deterrent to uncovering the true potential of these promising techniques in basic and translational biological studies. The popular practice of drawing important biological conclusions from just one replicate (slide) should be discouraged. In this paper, we discuss some modern trends in statistical analysis of microarray data with a special focus on statistical classification (pattern recognition) and variable selection. In addressing these issues we consider the utility of some distances between random vectors and their nonparametric estimates obtained from gene expression data. Performance of the proposed distances is tested by computer simulations and analysis of gene expression data on two different types of human leukemia. In experimental settings, the error rate is estimated by cross-validation, while a control sample is generated in computer simulation experiments aimed at testing the proposed gene selection procedures and associated classification rules.

Author List

Szabo A, Boucher K, Carroll WL, Klebanov LB, Tsodikov AD, Yakovlev AY


Aniko Szabo, PhD, Associate Professor, Institute for Health and Equity, Medical College of Wisconsin


Scopus EID: 2-s2.0-0036130619 (47 citations)

MeSH terms used to index this publication

Computer Simulation
Gene Expression Profiling
Leukemia, Myeloid, Acute
Oligonucleotide Array Sequence Analysis
Pattern Recognition, Automated
Precursor Cell Lymphoblastic Leukemia-Lymphoma