Medical College of Wisconsin

Genotype imputation for African Americans using data from HapMap phase II versus 1000 genomes projects. Genet Epidemiol 2012 Jul;36(5):508-16

Date

05/31/2012

Pubmed ID

22644746

Pubmed Central ID

PMC3703942

DOI

10.1002/gepi.21647

Scopus ID

2-s2.0-84862254279 (12 citations)

Abstract

Genotype imputation infers untyped single nucleotide polymorphisms (SNPs) that are present on a reference panel such as those from the HapMap Project. It is popular for increasing statistical power and for comparing results across studies that use different platforms. Imputation for African American populations is challenging because their linkage disequilibrium blocks are shorter and because, owing to admixture, no ideal reference panel is available. In this paper, we evaluated three imputation strategies for African Americans. The intersection strategy used a combined panel consisting of SNPs polymorphic in both CEU and YRI. The union strategy used a panel consisting of SNPs polymorphic in either CEU or YRI. The merge strategy merged results from two separate imputations, one using CEU and the other using YRI. Because investigators are increasingly using data from the 1000 Genomes (1KG) Project for genotype imputation, we evaluated both 1KG-based and HapMap-based imputations. We used 23,707 SNPs from chromosomes 21 and 22 on the Affymetrix SNP Array 6.0, genotyped for 1,075 HyperGEN African Americans. We found that 1KG-based imputations provided a substantially larger number of variants than HapMap-based imputations: about three times as many common variants and eight times as many rare and low-frequency variants. This higher yield is expected because the 1KG panel includes more SNPs. Accuracy rates using 1KG data were slightly lower than those using HapMap data before filtering, but slightly higher after filtering. The union strategy provided the highest imputation yield with the next-highest accuracy. The intersection strategy provided the lowest imputation yield but the highest accuracy. The merge strategy provided the lowest imputation accuracy. We observed that SNPs polymorphic only in CEU had much lower accuracy, reducing the accuracy of the union strategy.
Our findings suggest that 1KG-based imputations can facilitate discovery of significant associations for SNPs across the whole minor allele frequency (MAF) spectrum. Because the 1KG Project is still under way, we expect that later versions will provide better imputation performance.
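
As a rough illustration of how the intersection and union reference panels described above differ, the following Python sketch builds both from two collections of SNP identifiers. The function name `build_panels`, the example rsID strings, and the `ceu_only` key are hypothetical and do not come from the paper. The merge strategy is left out because the abstract does not state how calls from the two separate imputations were combined.

```python
def build_panels(ceu_polymorphic, yri_polymorphic):
    """Build candidate reference-panel SNP sets (illustrative sketch only).

    ceu_polymorphic, yri_polymorphic: iterables of SNP identifiers that are
    polymorphic in the CEU and YRI samples, respectively (assumed inputs).
    """
    ceu = set(ceu_polymorphic)
    yri = set(yri_polymorphic)
    return {
        # Intersection strategy: SNPs polymorphic in both CEU and YRI.
        "intersection": ceu & yri,
        # Union strategy: SNPs polymorphic in either CEU or YRI.
        "union": ceu | yri,
        # SNPs polymorphic only in CEU, the subset the abstract reports
        # as having much lower accuracy under the union strategy.
        "ceu_only": ceu - yri,
    }


# Hypothetical identifiers, for demonstration only.
panels = build_panels(["rs1", "rs2", "rs3"], ["rs2", "rs3", "rs4"])
print(sorted(panels["intersection"]))
print(sorted(panels["union"]))
print(sorted(panels["ceu_only"]))
```

By construction the intersection panel is a subset of the union panel, which is consistent with the abstract's report that the intersection strategy gave the lowest yield and the union strategy the highest.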

Author List

Sung YJ, Gu CC, Tiwari HK, Arnett DK, Broeckel U, Rao DC

Author

Ulrich Broeckel, MD, Chief, Center Associate Director, and Professor in the Pediatrics department at the Medical College of Wisconsin




MeSH terms used to index this publication

Algorithms
Chromosome Mapping
Genetic Linkage
Genome
Genome, Human
Genotype
Humans
Linkage Disequilibrium
Models, Genetic
Oligonucleotide Array Sequence Analysis
Polymorphism, Genetic
Polymorphism, Single Nucleotide
Reproducibility of Results
Software