Residential College: false
Status: Published
Simultaneously learning DNA motif along with its position and sequence rank preferences through EM algorithm
Zhang Z. (1); Chang C.W. (2); Hugo W. (1); Cheung E. (2); Sung W.-K. (1)
2012-05-15
Conference Name: 16th Annual International Conference on Research in Computational Molecular Biology, RECOMB 2012
Source Publication: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 7262 LNBI
Pages: 355-370
Conference Date: 21 April 2012 to 24 April 2012
Conference Place: Singapore
Abstract

Although de novo motifs can be discovered by mining over-represented sequence patterns, this approach misses some real motifs and generates many false positives. One way to improve accuracy is to consider additional binding features (i.e. position preference and sequence rank preference), but existing methods usually require the user to supply this information. This paper presents a de novo motif discovery algorithm called SEME, which uses a pure probabilistic mixture model to model the motif's binding features and uses expectation maximization (EM) algorithms to simultaneously learn the sequence motif and its position and sequence rank preferences without asking the user for any prior knowledge. SEME is both efficient and accurate thanks to two important techniques: variable motif length extension and importance sampling. Using 75 large-scale synthetic datasets, 32 metazoan compendium benchmark datasets and 164 ChIP-Seq libraries, we demonstrate the superior performance of SEME over existing programs in finding transcription factor (TF) binding sites. SEME is further applied to the more difficult problem of finding co-regulated TF (co-TF) motifs in 15 ChIP-Seq libraries. It identified significantly more correct co-TF motifs and, at the same time, predicted co-TF motifs that better match the known motifs. Finally, we show that the learned position and sequence rank preferences of each co-TF reveal potential interaction mechanisms between the primary TF and the co-TF within these sites. Some of these findings were further validated by ChIP-Seq experiments on the co-TFs. © 2012 Springer-Verlag Berlin Heidelberg.
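
To illustrate the general idea of learning a motif and its position preference jointly with EM, here is a minimal sketch. It is not the SEME algorithm: it assumes exactly one fixed-width site per equal-length sequence, and it omits sequence rank preference, variable motif length extension and importance sampling. The function names, parameters and toy data generator are hypothetical and chosen only for this illustration.

```python
import math
import random

ALPHABET = "ACGT"

def make_toy_sequences(n, length, site, start, seed=0):
    """Hypothetical toy data: random DNA with `site` planted at offset `start`."""
    rng = random.Random(seed)
    seqs = []
    for _ in range(n):
        bases = [rng.choice(ALPHABET) for _ in range(length)]
        bases[start:start + len(site)] = list(site)
        seqs.append("".join(bases))
    return seqs

def em_motif_with_position(seqs, width, n_iter=30, seed=0, pseudo=0.1):
    """Toy EM assuming exactly one motif site per equal-length sequence.

    Returns (pwm, pos): pwm[j][b] is P(base b at motif column j) and
    pos[p] is P(site starts at offset p), the learned position preference.
    """
    rng = random.Random(seed)
    enc = [[ALPHABET.index(c) for c in s] for s in seqs]
    n_start = len(enc[0]) - width + 1
    # Background base frequencies, smoothed so that no entry is zero.
    bg = [pseudo] * 4
    for row in enc:
        for b in row:
            bg[b] += 1.0
    total = sum(bg)
    bg = [v / total for v in bg]
    # Random strictly positive motif columns; uniform position preference.
    pwm = []
    for _ in range(width):
        raw = [rng.random() + 0.1 for _ in range(4)]
        raw_sum = sum(raw)
        pwm.append([v / raw_sum for v in raw])
    pos = [1.0 / n_start] * n_start
    for _ in range(n_iter):
        counts = [[pseudo] * 4 for _ in range(width)]
        pos_counts = [pseudo] * n_start
        log_odds = [[math.log(pwm[j][b] / bg[b]) for b in range(4)]
                    for j in range(width)]
        for row in enc:
            # E-step: posterior over the site start offset for this sequence,
            # combining motif-vs-background log-odds with the position prior.
            scores = [math.log(pos[p])
                      + sum(log_odds[j][row[p + j]] for j in range(width))
                      for p in range(n_start)]
            top = max(scores)
            weights = [math.exp(sc - top) for sc in scores]
            norm = sum(weights)
            # Accumulate expected counts for the M-step.
            for p, w in enumerate(weights):
                r = w / norm
                pos_counts[p] += r
                for j in range(width):
                    counts[j][row[p + j]] += r
        # M-step: renormalize expected counts into probabilities.
        pwm = [[c / sum(col) for c in col] for col in counts]
        pos_total = sum(pos_counts)
        pos = [c / pos_total for c in pos_counts]
    return pwm, pos

if __name__ == "__main__":
    toy = make_toy_sequences(n=100, length=30, site="TGACGT", start=10, seed=1)
    pwm, pos = em_motif_with_position(toy, width=6)
    print("most probable start offset:", max(range(len(pos)), key=pos.__getitem__))
```

Because EM only finds a local optimum, the result depends on the random initialization, so a practical tool would typically try several starting points. The abstract does not describe how SEME handles this, so nothing here should be read as its procedure.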

Keywords: Binding Preference; Expectation Maximization; Importance Sampling; Motif Finding
DOI: 10.1007/978-3-642-29627-7_37
Language: English
Scopus ID: 2-s2.0-84860820471
Document Type: Conference paper
Collection: Faculty of Health Sciences
Affiliation: 1. National University of Singapore
2. A-Star, Genome Institute of Singapore
Recommended Citation
GB/T 7714: Zhang Z., Chang C.W., Hugo W., et al. Simultaneously learning DNA motif along with its position and sequence rank preferences through EM algorithm[C], 2012, 355-370.
APA: Zhang Z., Chang C.W., Hugo W., Cheung E., & Sung W.-K. (2012). Simultaneously learning DNA motif along with its position and sequence rank preferences through EM algorithm. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 7262 LNBI, 355-370.