Medical College of Wisconsin

Joint Screening for Ultra-High Dimensional Multi-Omics Data. Bioengineering (Basel) 2024 Nov 25;11(12)

Date

01/08/2025

Pubmed ID

39768011

Pubmed Central ID

PMC11727280

DOI

10.3390/bioengineering11121193

Scopus ID

2-s2.0-85213292858

Abstract

Investigators often face ultra-high dimensional multi-omics data, where the goal is to identify significant genes and the significant omics within each gene. In such data, each gene forms a group consisting of its multiple omics. Moreover, some genes may be highly correlated with one another. This leads to tri-level hierarchically structured data: the cluster level, which is a group of correlated genes; the subgroup level, which is the group of omics belonging to the same gene; and the individual level, which consists of the omics themselves. Screening is widely used to remove unimportant variables so that the number of remaining variables becomes smaller than the sample size. Penalized regression is then applied to the variables that remain after screening to identify the important ones. To screen out unimportant genes, we propose to cluster genes and then conduct screening. We show that the proposed screening method possesses the sure screening property. Extensive simulations show that the proposed screening method outperforms competing methods. We apply the proposed variable selection method to the TCGA breast cancer dataset to identify genes and omics that are related to breast cancer.
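
The general screen-then-select workflow described in the abstract can be illustrated with a minimal sketch. This is not the authors' joint screening method: it is a generic group-level marginal correlation screen, and the function names (marginal_scores, screen_genes), the gene score (mean squared correlation over a gene's omics), and the simulated data are all assumptions made for illustration only.

```python
import numpy as np

def marginal_scores(X, y):
    """Absolute Pearson correlation between each column of X and y."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    denom = np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc)
    denom = np.where(denom == 0, np.inf, denom)  # constant columns score 0
    return np.abs(Xc.T @ yc) / denom

def screen_genes(X, y, gene_ids, n_keep):
    """Keep the n_keep genes with the largest mean squared marginal score.

    Hypothetical group-level rule: a gene's score is the average of the
    squared marginal correlations of its omics (an assumption, not the
    statistic used in the paper).
    """
    scores = marginal_scores(X, y) ** 2
    genes = np.unique(gene_ids)
    gene_score = np.array([scores[gene_ids == g].mean() for g in genes])
    top = genes[np.argsort(gene_score)[::-1][:n_keep]]
    return np.isin(gene_ids, top)  # boolean mask over the columns of X

# Simulated toy data (assumed): 50 samples, 200 genes, 3 omics per gene.
rng = np.random.default_rng(0)
n, n_genes, n_omics = 50, 200, 3
gene_ids = np.repeat(np.arange(n_genes), n_omics)
X = rng.standard_normal((n, n_genes * n_omics))
# One omic in gene 0 and one omic in gene 1 drive the response.
y = 3.0 * X[:, 0] + 3.0 * X[:, 4] + rng.standard_normal(n)

mask = screen_genes(X, y, gene_ids, n_keep=10)
X_screened = X[:, mask]  # 30 columns remain, fewer than the 50 samples
```

After a screen of this kind, the retained columns number fewer than the sample size, so a penalized regression could then be fitted to X_screened to pick out individual genes and omics.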

Author List

Kemmo Tsafack U, Lin CW, Ahn KW

Authors

Kwang Woo Ahn, PhD, Director, Professor in the Data Science Institute at the Medical College of Wisconsin
Chien-Wei Lin, PhD, Associate Professor in the Data Science Institute at the Medical College of Wisconsin