Alignment-free sequence comparison based on next-generation sequencing reads.

by: Kai Song, Jie Ren, Zhiyuan Zhai, Xuemei Liu, Minghua Deng, Fengzhu Sun
Journal of computational biology : a journal of computational molecular cell biology, Vol. 20, No. 2 (February 2013), pp. 64-79, doi:10.1089/cmb.2012.0228


Abstract

Next-generation sequencing (NGS) technologies have generated enormous amounts of shotgun read data, and assembly of the reads can be challenging, especially for organisms without template sequences. We study the power of genome comparison based on shotgun read data without assembly using three alignment-free sequence comparison statistics, D(2), [Formula: see text], and [Formula: see text], both theoretically and by simulations. Theoretical formulas for the power of detecting the relationship between two sequences related through a common motif model are derived. It is shown that both [Formula: see text] and [Formula: see text] outperform D(2) for detecting the relationship between two sequences based on NGS data. We then study the effects of length of the tuple, read length, coverage, and sequencing error on the power of [Formula: see text] and [Formula: see text]. Finally, variations of these statistics, d(2), [Formula: see text] and [Formula: see text], respectively, are used to first cluster five mammalian species with known phylogenetic relationships, and then cluster 13 tree species whose complete genome sequences are not available using NGS shotgun reads. The clustering results using [Formula: see text] are consistent with biological knowledge for the 5 mammalian and 13 tree species, respectively. Thus, the statistic [Formula: see text] provides a powerful alignment-free comparison tool to study the relationships among different organisms based on NGS read data without assembly.
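
The abstract names the D(2) statistic without defining it. In the wider alignment-free literature, D2 is commonly described as the inner product of the k-tuple (word) count vectors of two sequences. The Python sketch below illustrates that idea for two sets of unassembled reads. It assumes that common definition rather than the paper's exact formulation, it ignores reverse complements and sequencing errors, and the function names `count_tuples` and `d2` are hypothetical.

```python
from collections import Counter


def count_tuples(reads, k):
    """Count every length-k word over all reads, with no assembly step."""
    counts = Counter()
    for read in reads:
        # Slide a window of width k along each read.
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def d2(reads_a, reads_b, k):
    """Inner product of the two k-tuple count vectors (assumed D2 form)."""
    counts_a = count_tuples(reads_a, k)
    counts_b = count_tuples(reads_b, k)
    # Words absent from counts_b contribute zero to the sum.
    return sum(n * counts_b[word] for word, n in counts_a.items())
```

For example, `d2(["ACGT"], ["ACGA"], 2)` sums the products of counts for the shared words `AC` and `CG`. The other statistics mentioned in the abstract are not reconstructed here, since their formulas are not given in the text.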

