Gclust: trans-kingdom classification of proteins using automatic individual threshold setting
by: Naoki Sato
Abstract

Motivation: Trans-kingdom protein clustering has remained difficult because of the large sequence divergence between eukaryotes and prokaryotes and the presence of a transit sequence in organellar proteins. Large-scale protein clustering that includes such divergent organisms needs a heuristic that efficiently selects similar proteins by setting a proper threshold for the homologs of each protein. Here a method is described that uses two similarity measures and organism count.

Results: The Gclust software constructs minimal homolog groups from all-against-all BLASTP results by single-linkage clustering. Major points include (i) estimation of the domain structure of proteins; (ii) exclusion of multi-domain proteins; (iii) explicit consideration of transit peptides; and (iv) heuristic estimation of a similarity threshold for the homologs of each protein by an entropy-optimized organism count method. The resultant clusters were evaluated in light of the power law. The software was used to construct protein clusters for up to 95 organisms.

Availability: Software and data are available at http://gclust.c.u-tokyo.ac.jp/Gclust_Download.html.

Contact: naokisat@bio.c.u-tokyo.ac.jp

Supplementary information: Supplementary data are available at Bioinformatics online.

DOI: 10.1093/bioinformatics/btp047
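
To illustrate the general idea of single-linkage clustering over pairwise similarity hits with a per-protein threshold, the following is a minimal Python sketch, not the Gclust implementation. The function name `single_linkage_clusters`, the `(query, subject, evalue)` hit format, the `thresholds` dictionary, and the rule that a hit must satisfy the stricter of the two proteins' cutoffs are all assumptions made for illustration. The abstract does not specify these details, and the sketch omits domain estimation, multi-domain exclusion, transit peptide handling, and the entropy-optimized organism count.

```python
from collections import defaultdict


def single_linkage_clusters(hits, thresholds, default_threshold=1e-5):
    """Group proteins into clusters by single linkage over pairwise hits.

    hits: iterable of (query_id, subject_id, evalue) tuples.
    thresholds: dict mapping protein id -> maximum accepted E-value
                (hypothetical per-protein cutoffs, not Gclust's own values).
    """
    parent = {}

    def find(x):
        # Locate the cluster representative, shortening the path as we go.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for query, subject, evalue in hits:
        # Register both proteins so that unlinked ones become singletons.
        find(query)
        find(subject)
        # Assumed rule: the hit must pass the stricter of the two cutoffs.
        cutoff = min(thresholds.get(query, default_threshold),
                     thresholds.get(subject, default_threshold))
        if evalue <= cutoff:
            union(query, subject)

    clusters = defaultdict(set)
    for protein in parent:
        clusters[find(protein)].add(protein)
    return sorted(sorted(members) for members in clusters.values())


# Made-up example hits; protein C is given a stricter hypothetical cutoff.
example_hits = [
    ("A", "B", 1e-30),
    ("B", "C", 1e-20),
    ("C", "D", 1e-3),
    ("E", "E", 0.0),
]
example_clusters = single_linkage_clusters(example_hits, {"C": 1e-10})
```

Because linkage is single, A and C end up in the same cluster through B even though no direct A-to-C hit is listed, which is why the choice of threshold for each protein matters so much in this kind of clustering.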