Medical College of Wisconsin

LOcating non-unique matched tags (LONUT) to improve the detection of the enriched regions for ChIP-seq data. PLoS One 2013;8(6):e67788

Date

07/05/2013

Pubmed ID

23825685

Pubmed Central ID

PMC3692479

DOI

10.1371/journal.pone.0067788

Scopus ID

2-s2.0-84879382729 (10 Citations)

Abstract

A major limitation of computational tools for analyzing ChIP-seq data is that most of them ignore non-unique tags (NUTs), i.e., tags that match multiple locations in the human genome, even though NUTs comprise up to 60% of all raw tags in ChIP-seq data. Effectively using these NUTs would increase the sequencing depth and allow more accurate detection of enriched binding sites, which in turn could lead to more precise and significant biological interpretations. In this study, we developed a computational tool, LOcating Non-Unique matched Tags (LONUT), to improve the detection of enriched regions from ChIP-seq data. The LONUT algorithm applies linear and polynomial regression models to establish an empirical score (ES) formula based on two influential factors: the distance of each NUT to peaks identified using uniquely matched tags (UMTs), and the enrichment score of those peaks. The ES is then used to assign each NUT to a single location on the reference genome. The newly located tags from the set of NUTs are combined with the original UMTs to produce a final set of combined matched tags (CMTs). LONUT was tested on many datasets representing three types of biological data with different characteristics. The detected sites were validated using de novo motif discovery and ChIP-PCR. We demonstrate the specificity and accuracy of LONUT and show that it not only improves the detection of binding sites in ChIP-seq data but also identifies additional binding sites.
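To make the NUT-assignment idea in the abstract concrete, here is a minimal Python sketch under stated assumptions. It is not LONUT's published implementation: the abstract does not give the fitted ES formula, so the score below (peak enrichment minus a linear and a quadratic distance penalty) uses hypothetical placeholder coefficients `b1` and `b2`. The nearest-peak rule, the same-chromosome restriction, the `max_distance` cutoff, and all names (`Peak`, `Hit`, `assign_nuts`, `combine`) are likewise illustrative assumptions. Each NUT's candidate locations are scored against the nearest UMT-derived peak, the NUT is placed at its best-scoring location, and the placed tags are appended to the UMTs to form the CMTs.

```python
from bisect import bisect_left
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Peak:
    chrom: str
    center: int        # peak center called from uniquely matched tags (UMTs)
    enrichment: float  # enrichment score of that peak


@dataclass(frozen=True)
class Hit:
    chrom: str
    pos: int           # mapped tag position


def index_peaks(peaks: List[Peak]) -> Dict[str, Tuple[List[int], List[Peak]]]:
    """Group peaks by chromosome, sorted by center, for binary search."""
    by_chrom: Dict[str, List[Peak]] = {}
    for p in peaks:
        by_chrom.setdefault(p.chrom, []).append(p)
    index = {}
    for chrom, ps in by_chrom.items():
        ps.sort(key=lambda p: p.center)
        index[chrom] = ([p.center for p in ps], ps)
    return index


def nearest_peak(hit: Hit, index) -> Optional[Tuple[int, Peak]]:
    """Return (distance, peak) for the closest peak on the hit's chromosome, or None."""
    centers, ps = index.get(hit.chrom, ([], []))
    i = bisect_left(centers, hit.pos)
    # The closest center is either just left or just right of the insertion point.
    candidates = [(abs(centers[j] - hit.pos), ps[j])
                  for j in (i - 1, i) if 0 <= j < len(centers)]
    return min(candidates, key=lambda c: c[0], default=None)


def empirical_score(distance: int, enrichment: float,
                    b1: float = 0.01, b2: float = 1e-5) -> float:
    # HYPOTHETICAL form and coefficients: LONUT fits its ES by regression;
    # these values are placeholders, not the published fit.
    return enrichment - b1 * distance - b2 * distance ** 2


def assign_nuts(nuts: Dict[str, List[Hit]], peaks: List[Peak],
                max_distance: int = 1000) -> Dict[str, Hit]:
    """Assign each non-unique tag to its single best-scoring candidate location."""
    index = index_peaks(peaks)
    assigned = {}
    for read_id, hits in nuts.items():
        scored = []
        for hit in hits:
            near = nearest_peak(hit, index)
            # Only candidates near a UMT peak can be scored (assumed cutoff).
            if near is not None and near[0] <= max_distance:
                scored.append((empirical_score(near[0], near[1].enrichment), hit))
        if scored:  # NUTs with no scorable candidate are discarded
            assigned[read_id] = max(scored, key=lambda s: s[0])[1]
    return assigned


def combine(umts: List[Hit], assigned: Dict[str, Hit]) -> List[Hit]:
    """Combined matched tags (CMTs): original UMTs plus newly located NUTs."""
    return list(umts) + list(assigned.values())


peaks = [Peak("chr1", 1000, 10.0), Peak("chr2", 5000, 3.0)]
nuts = {"r1": [Hit("chr1", 1050), Hit("chr2", 5010)],
        "r2": [Hit("chr1", 9000), Hit("chr3", 100)]}
assigned = assign_nuts(nuts, peaks)
cmts = combine([Hit("chr1", 990)], assigned)
print(assigned)  # {'r1': Hit(chrom='chr1', pos=1050)}
```

In this toy run, `r1` lands on chr1 because the strongly enriched peak there outweighs the slightly closer but weaker chr2 peak, while `r2` has no candidate near any peak and is dropped rather than forced onto a location.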

Author List

Wang R, Hsu HK, Blattler A, Wang Y, Lan X, Wang Y, Hsu PY, Leu YW, Huang TH, Farnham PJ, Jin VX

Author

Victor X. Jin, PhD, Professor in the Institute for Health and Equity at Medical College of Wisconsin




MeSH terms used to index this publication

Algorithms
Base Sequence
Chromatin Immunoprecipitation
Humans
K562 Cells
Linear Models
MCF-7 Cells
Sequence Analysis
Statistics as Topic