Medical College of Wisconsin

Variable selection and pattern recognition with gene expression data generated by the microarray technology. Math Biosci 2002 Mar;176(1):71-98



Pubmed ID




Scopus ID

2-s2.0-0036130619 (49 citations)


The lack of adequate statistical methods for the analysis of microarray data remains the most critical deterrent to uncovering the true potential of these promising techniques in basic and translational biological studies. The popular practice of drawing important biological conclusions from just one replicate (slide) should be discouraged. In this paper, we discuss some modern trends in the statistical analysis of microarray data, with a special focus on statistical classification (pattern recognition) and variable selection. In addressing these issues, we consider the utility of some distances between random vectors and their nonparametric estimates obtained from gene expression data. The performance of the proposed distances is tested by computer simulations and by analysis of gene expression data on two different types of human leukemia. In experimental settings, the error rate is estimated by cross-validation, while a control sample is generated in the computer simulation experiments aimed at testing the proposed gene selection procedures and associated classification rules.
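The abstract does not name the distance, the selection search, or the classifier, so the following is only a minimal sketch under stated assumptions, not the authors' method. It assumes the energy distance (the N-distance of Klebanov with the Euclidean kernel; whether the paper uses this exact form is an assumption), estimated nonparametrically from two groups of arrays; greedy forward gene selection that maximizes that estimate; a nearest-centroid rule as a swapped-in classifier; and leave-one-out cross-validation for the error rate. The helper `simulate`, which draws a Gaussian control sample, and every other function name here are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple

Sample = Sequence[float]  # one array (slide): expression values indexed by gene


def _dist(a: Sample, b: Sample, genes: Sequence[int]) -> float:
    """Euclidean distance between two arrays, restricted to the given genes.
    (Assumption: Euclidean metric on the selected genes.)"""
    return math.sqrt(sum((a[g] - b[g]) ** 2 for g in genes))


def energy_distance(x: List[Sample], y: List[Sample], genes: Sequence[int]) -> float:
    """Nonparametric (V-statistic) estimate of 2E|X-Y| - E|X-X'| - E|Y-Y'|
    between the distributions that generated class samples x and y."""
    xy = sum(_dist(a, b, genes) for a in x for b in y) / (len(x) * len(y))
    xx = sum(_dist(a, b, genes) for a in x for b in x) / len(x) ** 2
    yy = sum(_dist(a, b, genes) for a in y for b in y) / len(y) ** 2
    return 2 * xy - xx - yy


def select_genes(x: List[Sample], y: List[Sample], k: int) -> List[int]:
    """Hypothetical greedy forward selection: add, one gene at a time, the gene
    that gives the largest estimated distance between the two classes."""
    chosen: List[int] = []
    candidates = range(len(x[0]))
    while len(chosen) < k:
        best = max((g for g in candidates if g not in chosen),
                   key=lambda g: energy_distance(x, y, chosen + [g]))
        chosen.append(best)
    return chosen


def _centroid(samples: List[Sample]) -> List[float]:
    """Per-gene mean over a list of arrays."""
    return [sum(col) / len(samples) for col in zip(*samples)]


def nearest_centroid(x: List[Sample], y: List[Sample],
                     genes: Sequence[int], sample: Sample) -> int:
    """Stand-in classifier (assumption): 0 if `sample` is closer to the mean of
    class x on the selected genes, otherwise 1."""
    d0 = _dist(sample, _centroid(x), genes)
    d1 = _dist(sample, _centroid(y), genes)
    return 0 if d0 <= d1 else 1


def loocv_error(x: List[Sample], y: List[Sample], k: int) -> float:
    """Leave-one-out error rate. Genes are re-selected inside every fold, so the
    held-out array never influences which genes are used to classify it."""
    data = [(s, 0) for s in x] + [(s, 1) for s in y]
    errors = 0
    for i, (sample, label) in enumerate(data):
        train = data[:i] + data[i + 1:]
        tx = [s for s, c in train if c == 0]
        ty = [s for s, c in train if c == 1]
        genes = select_genes(tx, ty, k)
        if nearest_centroid(tx, ty, genes, sample) != label:
            errors += 1
    return errors / len(data)


def simulate(n_per_class: int, n_genes: int, n_informative: int,
             shift: float, seed: int = 0) -> Tuple[List[List[float]], List[List[float]]]:
    """Hypothetical control sample: N(0, 1) noise for every gene, with the first
    `n_informative` genes shifted by `shift` in the second class."""
    rng = random.Random(seed)
    x = [[rng.gauss(0.0, 1.0) for _ in range(n_genes)] for _ in range(n_per_class)]
    y = [[rng.gauss(shift if g < n_informative else 0.0, 1.0) for g in range(n_genes)]
         for _ in range(n_per_class)]
    return x, y


if __name__ == "__main__":
    x, y = simulate(n_per_class=8, n_genes=10, n_informative=3, shift=5.0)
    print("selected genes:", sorted(select_genes(x, y, 3)))
    print("LOOCV error rate:", loocv_error(x, y, 3))
```

Because the simulated data come with known informative genes, the selected list can be checked against the truth, which is the role a generated control sample plays in simulation experiments. Selection is repeated inside each cross-validation fold on purpose: choosing genes once on the full data and cross-validating only the classifier would let the held-out array shape the gene list and bias the error estimate downward.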

Author List

Szabo A, Boucher K, Carroll WL, Klebanov LB, Tsodikov AD, Yakovlev AY


Aniko Szabo, PhD, Professor in the Institute for Health and Equity at the Medical College of Wisconsin

MeSH terms used to index this publication:

Computer Simulation
Gene Expression Profiling
Leukemia, Myeloid, Acute
Oligonucleotide Array Sequence Analysis
Pattern Recognition, Automated
Precursor Cell Lymphoblastic Leukemia-Lymphoma