<?xml version="1.0" encoding="UTF-8"?>

<rdf:RDF
   xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
   xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
   xmlns="http://purl.org/rss/1.0/"
   xmlns:dc="http://purl.org/dc/elements/1.1/"
   xmlns:prism="http://prismstandard.org/namespaces/1.2/basic/"
   xmlns:dcterms="http://purl.org/dc/terms/"

>
<channel rdf:about="http://www.citeulike.org/about">
<pubDate>Thu, 21 Aug 2008 15:23:12 BST</pubDate>


	<title>CiteULike: azazello's library [115 articles]</title>
	<description>CiteULike: azazello's library [115 articles]</description>


	<link>http://www.citeulike.org/user/azazello</link>
	<dc:publisher>CiteULike.org</dc:publisher>
	<dc:language>en-gb</dc:language>
	<dc:rights>Copyright &#169; 2004-2008 citeulike.org</dc:rights>
	<items>
    <rdf:Seq>
        <rdf:li rdf:resource="http://www.citeulike.org/user/azazello/article/1314150"/>
        <rdf:li rdf:resource="http://www.citeulike.org/user/azazello/article/1604373"/>
        <rdf:li rdf:resource="http://www.citeulike.org/user/azazello/article/3041531"/>
        <rdf:li rdf:resource="http://www.citeulike.org/user/azazello/article/1937011"/>
        <rdf:li rdf:resource="http://www.citeulike.org/user/azazello/article/1167867"/>
        <rdf:li rdf:resource="http://www.citeulike.org/user/azazello/article/2877844"/>
        <rdf:li rdf:resource="http://www.citeulike.org/user/azazello/article/2877841"/>
        <rdf:li rdf:resource="http://www.citeulike.org/user/azazello/article/2427288"/>
        <rdf:li rdf:resource="http://www.citeulike.org/user/azazello/article/2784425"/>
        <rdf:li rdf:resource="http://www.citeulike.org/user/azazello/article/2048011"/>
        <rdf:li rdf:resource="http://www.citeulike.org/user/azazello/article/79177"/>
        <rdf:li rdf:resource="http://www.citeulike.org/user/azazello/article/2730053"/>
        <rdf:li rdf:resource="http://www.citeulike.org/user/azazello/article/975256"/>
        <rdf:li rdf:resource="http://www.citeulike.org/user/azazello/article/1203362"/>
        <rdf:li rdf:resource="http://www.citeulike.org/user/azazello/article/2343216"/>
        <rdf:li rdf:resource="http://www.citeulike.org/user/azazello/article/392420"/>
        <rdf:li rdf:resource="http://www.citeulike.org/user/azazello/article/131325"/>
        <rdf:li rdf:resource="http://www.citeulike.org/user/azazello/article/2585828"/>
        <rdf:li rdf:resource="http://www.citeulike.org/user/azazello/article/681624"/>
        <rdf:li rdf:resource="http://www.citeulike.org/user/azazello/article/700422"/>
        <rdf:li rdf:resource="http://www.citeulike.org/user/azazello/article/1167473"/>
        <rdf:li rdf:resource="http://www.citeulike.org/user/azazello/article/2514966"/>
        <rdf:li rdf:resource="http://www.citeulike.org/user/azazello/article/1963651"/>
        <rdf:li rdf:resource="http://www.citeulike.org/user/azazello/article/1303460"/>
        <rdf:li rdf:resource="http://www.citeulike.org/user/azazello/article/2242249"/>
        <rdf:li rdf:resource="http://www.citeulike.org/user/azazello/article/270463"/>
        <rdf:li rdf:resource="http://www.citeulike.org/user/azazello/article/2382364"/>
        <rdf:li rdf:resource="http://www.citeulike.org/user/azazello/article/891657"/>
        <rdf:li rdf:resource="http://www.citeulike.org/user/azazello/article/2570446"/>
        <rdf:li rdf:resource="http://www.citeulike.org/user/azazello/article/1106952"/>
        <rdf:li rdf:resource="http://www.citeulike.org/user/azazello/article/2324215"/>
        <rdf:li rdf:resource="http://www.citeulike.org/user/azazello/article/79831"/>
        <rdf:li rdf:resource="http://www.citeulike.org/user/azazello/article/2242997"/>
        <rdf:li rdf:resource="http://www.citeulike.org/user/azazello/article/2563094"/>
        <rdf:li rdf:resource="http://www.citeulike.org/user/azazello/article/2547951"/>
        <rdf:li rdf:resource="http://www.citeulike.org/user/azazello/article/1873399"/>
        <rdf:li rdf:resource="http://www.citeulike.org/user/azazello/article/1543434"/>
        <rdf:li rdf:resource="http://www.citeulike.org/user/azazello/article/2547927"/>
        <rdf:li rdf:resource="http://www.citeulike.org/user/azazello/article/1125851"/>
        <rdf:li rdf:resource="http://www.citeulike.org/user/azazello/article/1122449"/>
        <rdf:li rdf:resource="http://www.citeulike.org/user/azazello/article/2311222"/>
        <rdf:li rdf:resource="http://www.citeulike.org/user/azazello/article/373647"/>
        <rdf:li rdf:resource="http://www.citeulike.org/user/azazello/article/1152377"/>
        <rdf:li rdf:resource="http://www.citeulike.org/user/azazello/article/392364"/>
        <rdf:li rdf:resource="http://www.citeulike.org/user/azazello/article/340715"/>
        <rdf:li rdf:resource="http://www.citeulike.org/user/azazello/article/2291023"/>
        <rdf:li rdf:resource="http://www.citeulike.org/user/azazello/article/2291022"/>
        <rdf:li rdf:resource="http://www.citeulike.org/user/azazello/article/1472428"/>
        <rdf:li rdf:resource="http://www.citeulike.org/user/azazello/article/2086009"/>
        <rdf:li rdf:resource="http://www.citeulike.org/user/azazello/article/238188"/>

	</rdf:Seq>
	</items>
	</channel>


<item rdf:about="http://www.citeulike.org/user/azazello/article/1314150">
    <title>An Introduction to Systems Biology: Design Principles of Biological Circuits (Chapman &#38; Hall/Crc Mathematical and Computational Biology Series)</title>
    <link>http://www.citeulike.org/user/azazello/article/1314150</link>
    <description>&lt;i&gt;(07 July 2006)&lt;/i&gt;</description>
    <dc:title>An Introduction to Systems Biology: Design Principles of Biological Circuits (Chapman &#38; Hall/Crc Mathematical and Computational Biology Series)</dc:title>

    <dc:creator>Uri Alon</dc:creator>
    <dc:source>(07 July 2006)</dc:source>
    <dc:date>2007-05-21T00:56:08-00:00</dc:date>
    <prism:publicationYear>2006</prism:publicationYear>
    <prism:publisher>Chapman &#38; Hall/CRC</prism:publisher>
    <prism:category>no-tag</prism:category>
</item>



<item rdf:about="http://www.citeulike.org/user/azazello/article/1604373">
    <title>The phusion assembler.</title>
    <link>http://www.citeulike.org/user/azazello/article/1604373</link>
    <description>&lt;i&gt;Genome Res, Vol. 13, No. 1. (2003), pp. 81-90.&lt;/i&gt;&lt;br /&gt;&lt;br /&gt;The Phusion assembler has assembled the mouse genome from the whole-genome shotgun (WGS) dataset collected by the Mouse Genome Sequencing Consortium, at ~7.5x sequence coverage, producing a high-quality draft assembly 2.6 gigabases in size, of which 90% of these bases are in 479 scaffolds. For the mouse genome, which is a large and repeat-rich genome, the input dataset was designed to include a high proportion of paired end sequences of various size selected inserts, from 2-200 kbp lengths, into various host vector templates. Phusion uses sequence data, called reads, and information about reads that share common templates, called read pairs, to drive the assembly of this large genome to highly accurate results. The preassembly stage, which clusters the reads into sensible groups, is a key element of the entire assembler, because it permits a simple approach to parallelization of the assembly stage, as each cluster can be treated independent of the others. In addition to the application of Phusion to the mouse genome, we will also present results from the WGS assembly of Caenorhabditis briggsae sequenced to about 11x coverage. The C. briggsae assembly was accessioned through EMBL, http://www.ebi.ac.uk/services/index.html, using the series CAAC01000001-CAAC01000578, however, the Phusion mouse assembly described here was not accessioned. The mouse data was generated by the Mouse Genome Sequencing Consortium. The C. briggsae sequence was generated at The Wellcome Trust Sanger Institute and the Genome Sequencing Center, Washington University School of Medicine.</description>
    <dc:title>The phusion assembler.</dc:title>

    <dc:creator>JC Mullikin</dc:creator>
    <dc:creator>Z Ning</dc:creator>
    <dc:identifier>doi:10.1101/gr.731003</dc:identifier>
    <dc:source>Genome Res, Vol. 13, No. 1. (2003), pp. 81-90.</dc:source>
    <dc:date>2007-08-29T09:26:30-00:00</dc:date>
    <prism:publicationYear>2003</prism:publicationYear>
    <prism:publicationName>Genome Res</prism:publicationName>
    <prism:volume>13</prism:volume>
    <prism:number>1</prism:number>
    <prism:startingPage>81</prism:startingPage>
    <prism:endingPage>90</prism:endingPage>
    <prism:category>cse</prism:category>
</item>



<item rdf:about="http://www.citeulike.org/user/azazello/article/3041531">
    <title>Large-scale Genome Sequence Processing</title>
    <link>http://www.citeulike.org/user/azazello/article/3041531</link>
    <description>&lt;i&gt;(07 July 2006)&lt;/i&gt;&lt;br /&gt;&lt;br /&gt;Univ. of Tokyo, Japan. Textbook emphasizes basic software implementation techniques for processing large-scale genome sequences and provides executable sample programs. Includes simple string search, sorting, lookup tables, suffix arrays, approximate string search, seeded alignments, whole genome shotgun sequencing, and more. For researchers. Expanded-outline format.</description>
    <dc:title>Large-scale Genome Sequence Processing</dc:title>

    <dc:creator>Masahiro Kasahara</dc:creator>
    <dc:creator>Shinichi Morishita</dc:creator>
    <dc:source>(07 July 2006)</dc:source>
    <dc:date>2008-07-25T01:28:29-00:00</dc:date>
    <prism:publicationYear>2006</prism:publicationYear>
    <prism:publisher>World Scientific Publishing Company</prism:publisher>
    <prism:category>cse</prism:category>
</item>



<item rdf:about="http://www.citeulike.org/user/azazello/article/1937011">
    <title>The BellKor solution to the Netflix Prize</title>
    <link>http://www.citeulike.org/user/azazello/article/1937011</link>
    <description>&lt;i&gt;&lt;/i&gt;</description>
    <dc:title>The BellKor solution to the Netflix Prize</dc:title>

    <dc:creator>Robert Bell</dc:creator>
    <dc:creator>Yehuda Koren</dc:creator>
    <dc:creator>Chris Volinsky</dc:creator>
    <dc:date>2007-11-19T08:44:50-00:00</dc:date>
    <prism:category>binning</prism:category>
</item>



<item rdf:about="http://www.citeulike.org/user/azazello/article/1167867">
    <title>A short introduction to boosting</title>
    <link>http://www.citeulike.org/user/azazello/article/1167867</link>
    <description>&lt;i&gt;(1999)&lt;/i&gt;&lt;br /&gt;&lt;br /&gt;Boosting is a general method for improving the accuracy of any given learning algorithm. This short overview paper introduces the boosting algorithm AdaBoost, and explains the underlying theory of boosting, including an explanation of why boosting often does not suffer from overfitting as well as boosting's relationship to support-vector machines. Some examples of recent applications of boosting are also described.</description>
    <dc:title>A short introduction to boosting</dc:title>

    <dc:creator>Y Freund</dc:creator>
    <dc:creator>R Schapire</dc:creator>
    <dc:source>(1999)</dc:source>
    <dc:date>2007-03-16T18:11:58-00:00</dc:date>
    <prism:publicationYear>1999</prism:publicationYear>
    <prism:category>binning</prism:category>
</item>



<item rdf:about="http://www.citeulike.org/user/azazello/article/2877844">
    <title>Stochastic gradient boosting</title>
    <link>http://www.citeulike.org/user/azazello/article/2877844</link>
    <description>&lt;i&gt;&lt;/i&gt;&lt;br /&gt;&lt;br /&gt;Gradient boosting constructs additive regression models by sequentially fitting a simple parameterized function (base learner) to current &#34;pseudo&#34;--residuals by least--squares at each iteration. The pseudo--residuals are the gradient of the loss functional being minimized, with respect to the model values at each training data point, evaluated at the current step. It is shown that both the approximation accuracy and execution speed of gradient boosting can be substantially improved by...</description>
    <dc:title>Stochastic gradient boosting</dc:title>

    <dc:creator>J Friedman</dc:creator>
    <dc:date>2008-06-09T18:30:52-00:00</dc:date>
    <prism:category>binning</prism:category>
</item>



<item rdf:about="http://www.citeulike.org/user/azazello/article/2877841">
    <title>k-means++: the advantages of careful seeding</title>
    <link>http://www.citeulike.org/user/azazello/article/2877841</link>
    <description>&lt;i&gt;(2007), pp. 1027-1035.&lt;/i&gt;</description>
    <dc:title>k-means++: the advantages of careful seeding</dc:title>

    <dc:creator>David Arthur</dc:creator>
    <dc:creator>Sergei Vassilvitskii</dc:creator>
    <dc:source>(2007), pp. 1027-1035.</dc:source>
    <dc:date>2008-06-09T18:29:14-00:00</dc:date>
    <prism:publicationYear>2007</prism:publicationYear>
    <prism:startingPage>1027</prism:startingPage>
    <prism:endingPage>1035</prism:endingPage>
    <prism:publisher>Society for Industrial and Applied Mathematics</prism:publisher>
    <prism:category>binning</prism:category>
</item>



<item rdf:about="http://www.citeulike.org/user/azazello/article/2427288">
    <title>Lessons from the Netflix prize challenge</title>
    <link>http://www.citeulike.org/user/azazello/article/2427288</link>
    <description>&lt;i&gt;SIGKDD Explor. Newsl., Vol. 9, No. 2. (December 2007), pp. 75-79.&lt;/i&gt;</description>
    <dc:title>Lessons from the Netflix prize challenge</dc:title>

    <dc:creator>Robert Bell</dc:creator>
    <dc:creator>Yehuda Koren</dc:creator>
    <dc:identifier>doi:10.1145/1345448.1345465</dc:identifier>
    <dc:source>SIGKDD Explor. Newsl., Vol. 9, No. 2. (December 2007), pp. 75-79.</dc:source>
    <dc:date>2008-02-25T22:23:38-00:00</dc:date>
    <prism:publicationYear>2007</prism:publicationYear>
    <prism:publicationName>SIGKDD Explor. Newsl.</prism:publicationName>
    <prism:issn>1931-0145</prism:issn>
    <prism:volume>9</prism:volume>
    <prism:number>2</prism:number>
    <prism:startingPage>75</prism:startingPage>
    <prism:endingPage>79</prism:endingPage>
    <prism:publisher>ACM</prism:publisher>
    <prism:category>binning</prism:category>
</item>



<item rdf:about="http://www.citeulike.org/user/azazello/article/2784425">
    <title>Computational identification of putative programmed translational frameshift sites</title>
    <link>http://www.citeulike.org/user/azazello/article/2784425</link>
    <description>&lt;i&gt;Bioinformatics, Vol. 18, No. 8. (1 August 2002), pp. 1046-1053.&lt;/i&gt;&lt;br /&gt;&lt;br /&gt;Motivation: In an effort to identify potential programmed frameshift sites by statistical analysis, we explore the hypothesis that selective pressure would have rendered such sites underabundant and underrepresented in protein-coding sequences. We developed a computer program to compare the frequencies of k-length subsequences of nucleotides with the frequencies predicted by a zero order Markov chain determined by the codon bias of the same set of sequences. The program was used to calculate and evaluate the distribution of 7-base oligonucleotides in the 6000+ putative protein-coding sequences of S. cerevisiae preliminary to the laboratory testing of the most highly underrepresented oligos for frameshifting efficiency. Results: Among the most significant results is the finding that the heptanucleotides CUU-AGG-C and CUU-AGU-U, sites of the programmed +1 translational frameshifts required for the production in yeast of actin filament-binding protein ABP140 and telomerase subunit EST3, respectively, rank among the least represented of phase I heptanucleotides in the coding sequences of S. cerevisiae. Laboratory experiments demonstrated that other underrepresented heptanucleotides identified by the program, for example GGU-CAG-A, are also prone to significant translational frameshifting, suggesting the possibility that genes containing other underrepresented heptamers may also encode transframe products. Availability: The program is available for download from http://www.gesteland.genetics.utah.edu/freqAnalysis Contact: ivaylo.ivanov@m.cc.utah.edu Supplementary Information: Complete results from the analysis of S. cerevisiae are available on http://www.gesteland.genetics.utah.edu/freqAnalysis</description>
    <dc:title>Computational identification of putative programmed translational frameshift sites</dc:title>

    <dc:creator>Atul Shah</dc:creator>
    <dc:creator>Michael Giddings</dc:creator>
    <dc:creator>Jasmin Parvaz</dc:creator>
    <dc:creator>Raymond Gesteland</dc:creator>
    <dc:creator>John Atkins</dc:creator>
    <dc:creator>Ivaylo Ivanov</dc:creator>
    <dc:identifier>doi:10.1093/bioinformatics/18.8.1046</dc:identifier>
    <dc:source>Bioinformatics, Vol. 18, No. 8. (1 August 2002), pp. 1046-1053.</dc:source>
    <dc:date>2008-05-11T14:04:53-00:00</dc:date>
    <prism:publicationYear>2002</prism:publicationYear>
    <prism:publicationName>Bioinformatics</prism:publicationName>
    <prism:volume>18</prism:volume>
    <prism:number>8</prism:number>
    <prism:startingPage>1046</prism:startingPage>
    <prism:endingPage>1053</prism:endingPage>
    <prism:category>frameshift</prism:category>
</item>



<item rdf:about="http://www.citeulike.org/user/azazello/article/2048011">
    <title>Heuristic approach to deriving models for gene finding.</title>
    <link>http://www.citeulike.org/user/azazello/article/2048011</link>
    <description>&lt;i&gt;Nucleic Acids Res, Vol. 27, No. 19. (1 October 1999), pp. 3911-3920.&lt;/i&gt;&lt;br /&gt;&lt;br /&gt;Computer methods of accurate gene finding in DNA sequences require models of protein coding and non-coding regions derived either from experimentally validated training sets or from large amounts of anonymous DNA sequence. Here we propose a new, heuristic method producing fairly accurate inhomogeneous Markov models of protein coding regions. The new method needs such a small amount of DNA sequence data that the model can be built 'on the fly' by a web server for any DNA sequence &#62;400 nt. Tests on 10 complete bacterial genomes performed with the GeneMark.hmm program demonstrated the ability of the new models to detect 93.1% of annotated genes on average, while models built by traditional training predict an average of 93.9% of genes. Models built by the heuristic approach could be used to find genes in small fragments of anonymous prokaryotic genomes and in genomes of organelles, viruses, phages and plasmids, as well as in highly inhomogeneous genomes where adjustment of models to local DNA composition is needed. The heuristic method also gives an insight into the mechanism of codon usage pattern evolution.</description>
    <dc:title>Heuristic approach to deriving models for gene finding.</dc:title>

    <dc:creator>J Besemer</dc:creator>
    <dc:creator>M Borodovsky</dc:creator>
    <dc:source>Nucleic Acids Res, Vol. 27, No. 19. (1 October 1999), pp. 3911-3920.</dc:source>
    <dc:date>2007-12-03T07:19:08-00:00</dc:date>
    <prism:publicationYear>1999</prism:publicationYear>
    <prism:publicationName>Nucleic Acids Res</prism:publicationName>
    <prism:issn>1362-4962</prism:issn>
    <prism:volume>27</prism:volume>
    <prism:number>19</prism:number>
    <prism:startingPage>3911</prism:startingPage>
    <prism:endingPage>3920</prism:endingPage>
    <prism:category>no-tag</prism:category>
</item>



<item rdf:about="http://www.citeulike.org/user/azazello/article/79177">
    <title>An Eulerian path approach to DNA fragment assembly.</title>
    <link>http://www.citeulike.org/user/azazello/article/79177</link>
    <description>&lt;i&gt;Proc Natl Acad Sci U S A, Vol. 98, No. 17. (14 August 2001), pp. 9748-9753.&lt;/i&gt;&lt;br /&gt;&lt;br /&gt;For the last 20 years, fragment assembly in DNA sequencing followed the &#34;overlap-layout-consensus&#34; paradigm that is used in all currently available assembly tools. Although this approach proved useful in assembling clones, it faces difficulties in genomic shotgun assembly. We abandon the classical &#34;overlap-layout-consensus&#34; approach in favor of a new EULER algorithm that, for the first time, resolves the 20-year-old &#34;repeat problem&#34; in fragment assembly. Our main result is the reduction of the fragment assembly to a variation of the classical Eulerian path problem that allows one to generate accurate solutions of large-scale sequencing problems. EULER, in contrast to the Celera assembler, does not mask such repeats but uses them instead as a powerful fragment assembly tool.</description>
    <dc:title>An Eulerian path approach to DNA fragment assembly.</dc:title>

    <dc:creator>PA Pevzner</dc:creator>
    <dc:creator>H Tang</dc:creator>
    <dc:creator>MS Waterman</dc:creator>
    <dc:identifier>doi:10.1073/pnas.171285098</dc:identifier>
    <dc:source>Proc Natl Acad Sci U S A, Vol. 98, No. 17. (14 August 2001), pp. 9748-9753.</dc:source>
    <dc:date>2005-01-17T22:02:54-00:00</dc:date>
    <prism:publicationYear>2001</prism:publicationYear>
    <prism:publicationName>Proc Natl Acad Sci U S A</prism:publicationName>
    <prism:issn>0027-8424</prism:issn>
    <prism:volume>98</prism:volume>
    <prism:number>17</prism:number>
    <prism:startingPage>9748</prism:startingPage>
    <prism:endingPage>9753</prism:endingPage>
    <prism:category>cse</prism:category>
</item>



<item rdf:about="http://www.citeulike.org/user/azazello/article/2730053">
    <title>A new approach to fragment assembly in DNA sequencing</title>
    <link>http://www.citeulike.org/user/azazello/article/2730053</link>
    <description>&lt;i&gt;(2001), pp. 256-267.&lt;/i&gt;</description>
    <dc:title>A new approach to fragment assembly in DNA sequencing</dc:title>

    <dc:creator>Pavel Pevzner</dc:creator>
    <dc:creator>Haixu Tang</dc:creator>
    <dc:creator>Michael Waterman</dc:creator>
    <dc:identifier>doi:10.1145/369133.369230</dc:identifier>
    <dc:source>(2001), pp. 256-267.</dc:source>
    <dc:date>2008-04-28T13:55:44-00:00</dc:date>
    <prism:publicationYear>2001</prism:publicationYear>
    <prism:startingPage>256</prism:startingPage>
    <prism:endingPage>267</prism:endingPage>
    <prism:publisher>ACM</prism:publisher>
    <prism:category>cse</prism:category>
</item>



<item rdf:about="http://www.citeulike.org/user/azazello/article/975256">
    <title>Whole-genome re-sequencing</title>
    <link>http://www.citeulike.org/user/azazello/article/975256</link>
    <description>&lt;i&gt;Current Opinion in Genetics &#38; Development, Vol. 16, No. 6. (December 2006), pp. 545-552.&lt;/i&gt;&lt;br /&gt;&lt;br /&gt;DNA sequencing can be used to gain important information on genes, genetic variation and gene function for biological and medical studies. The growing collection of publicly available reference genome sequences will underpin a new era of whole genome re-sequencing, but sequencing costs need to fall and throughput needs to rise by several orders of magnitude. Novel technologies are being developed to meet this need by generating massive amounts of sequence that can be aligned to the reference sequence. The challenge is to maintain the high standards of accuracy and completeness that are hallmarks of the previous genome projects. One or more new sequencing technologies are expected to become the mainstay of future research, and to make DNA sequencing centre stage as a routine tool in genetic research in the coming years.</description>
    <dc:title>Whole-genome re-sequencing</dc:title>

    <dc:creator>David Bentley</dc:creator>
    <dc:identifier>doi:10.1016/j.gde.2006.10.009</dc:identifier>
    <dc:source>Current Opinion in Genetics &#38; Development, Vol. 16, No. 6. (December 2006), pp. 545-552.</dc:source>
    <dc:date>2006-12-05T12:46:36-00:00</dc:date>
    <prism:publicationYear>2006</prism:publicationYear>
    <prism:publicationName>Current Opinion in Genetics &#38; Development</prism:publicationName>
    <prism:volume>16</prism:volume>
    <prism:number>6</prism:number>
    <prism:startingPage>545</prism:startingPage>
    <prism:endingPage>552</prism:endingPage>
    <prism:category>no-tag</prism:category>
</item>



<item rdf:about="http://www.citeulike.org/user/azazello/article/1203362">
    <title>ICDS database: interrupted CoDing sequences in prokaryotic genomes.</title>
    <link>http://www.citeulike.org/user/azazello/article/1203362</link>
    <description>&lt;i&gt;Nucleic Acids Res, Vol. 34, No. Database issue. (1 January 2006)&lt;/i&gt;&lt;br /&gt;&lt;br /&gt;Unrecognized frameshifts, in-frame stop codons and sequencing errors lead to Interrupted CoDing Sequence (ICDS) that can seriously affect all subsequent steps of functional characterization, from in silico analysis to high-throughput proteomic projects. Here, we describe the Interrupted CoDing Sequence database containing ICDS detected by a similarity-based approach in 80 complete prokaryotic genomes. ICDS can be retrieved by species browsing or similarity searches via a web interface (http://www-bio3d-igbmc.u-strasbg.fr/ICDS/). The definition of each interrupted gene is provided as well as the ICDS genomic localization with the surrounding sequence. Furthermore, to facilitate the experimental characterization of ICDS, we propose optimized primers for re-sequencing purposes. The database will be regularly updated with additional data from ongoing sequenced genomes. Our strategy has been validated by three independent tests: (i) ICDS prediction on a benchmark of artificially created frameshifts, (ii) comparison of predicted ICDS and results obtained from the comparison of the two genomic sequences of Bacillus licheniformis strain ATCC 14580 and (iii) re-sequencing of 25 predicted ICDS of the recently sequenced genome of Mycobacterium smegmatis. This allows us to estimate the specificity and sensitivity (95 and 82%, respectively) of our program and the efficiency of primer determination.</description>
    <dc:title>ICDS database: interrupted CoDing sequences in prokaryotic genomes.</dc:title>

    <dc:creator>E Perrodou</dc:creator>
    <dc:creator>C Deshayes</dc:creator>
    <dc:creator>J Muller</dc:creator>
    <dc:creator>C Schaeffer</dc:creator>
    <dc:creator>A Van Dorsselaer</dc:creator>
    <dc:creator>R Ripp</dc:creator>
    <dc:creator>O Poch</dc:creator>
    <dc:creator>JM Reyrat</dc:creator>
    <dc:creator>O Lecompte</dc:creator>
    <dc:source>Nucleic Acids Res, Vol. 34, No. Database issue. (1 January 2006)</dc:source>
    <dc:date>2007-04-02T20:47:01-00:00</dc:date>
    <prism:publicationYear>2006</prism:publicationYear>
    <prism:publicationName>Nucleic Acids Res</prism:publicationName>
    <prism:issn>1362-4962</prism:issn>
    <prism:volume>34</prism:volume>
    <prism:number>Database issue</prism:number>
    <prism:category>frameshift</prism:category>
</item>



<item rdf:about="http://www.citeulike.org/user/azazello/article/2343216">
    <title>Metagenomics: Read length matters</title>
    <link>http://www.citeulike.org/user/azazello/article/2343216</link>
    <description>&lt;i&gt;Appl. Environ. Microbiol. (11 January 2008), AEM.02181-07.&lt;/i&gt;&lt;br /&gt;&lt;br /&gt;Obtaining an unbiased view of the phylogenetic composition and functional diversity within a microbial community is one central objective of metagenomic analysis. New technologies, such as 454 pyrosequencing, have dramatically reduced sequencing costs to a level where metagenomic analysis may become a viable alternative to more focused assessments of the phylogenetic (e.g., 16S rDNA) and functional diversity of microbial communities. To determine whether the short (~100-200 bp) sequence reads obtained from pyrosequencing are appropriate for phylogenetic and functional characterization of microbial communities the results of BLAST and COG analyses were compared for long (~750 bp) and randomly derived short reads from each of two microbial and one virioplankton metagenome libraries. Overall, BLASTX searches against GenBank nr found far fewer homologs within the short sequence library. This was especially pronounced for a Chesapeake Bay virioplankton metagenome library. Increasing the short read sampling depth or the length of derived short reads (up to 400 bp) did not completely resolve the discrepancy in BLASTX homolog detection. Only in cases where the long read sequence had a close homolog (low BLAST E-score) did the derived short read sequence also find a significant homolog. Thus, more distant homologs of microbial and viral genes are not detected by short read sequences. Among COG hits, derived short reads sampled at a depth of two short reads per long read missed up to 72% of the COGs found using long reads. Noting the current limitation in computational approaches for analysis of short sequences, use of short read length libraries does not appear to be an appropriate tool for metagenomic characterization of microbial communities.</description>
    <dc:title>Metagenomics: Read length matters</dc:title>

    <dc:creator>Eric Wommack</dc:creator>
    <dc:creator>Jaysheel Bhavsar</dc:creator>
    <dc:creator>Jacques Ravel</dc:creator>
    <dc:identifier>doi:10.1128/AEM.02181-07</dc:identifier>
    <dc:source>Appl. Environ. Microbiol. (11 January 2008), AEM.02181-07.</dc:source>
    <dc:date>2008-02-06T19:53:32-00:00</dc:date>
    <prism:publicationYear>2008</prism:publicationYear>
    <prism:publicationName>Appl. Environ. Microbiol.</prism:publicationName>
    <prism:startingPage>AEM.02181-07</prism:startingPage>
    <prism:category>qual</prism:category>
</item>



<item rdf:about="http://www.citeulike.org/user/azazello/article/392420">
    <title>Mauve: multiple alignment of conserved genomic sequence with rearrangements.</title>
    <link>http://www.citeulike.org/user/azazello/article/392420</link>
    <description>&lt;i&gt;Genome Res, Vol. 14, No. 7. (July 2004), pp. 1394-1403.&lt;/i&gt;&lt;br /&gt;&lt;br /&gt;As genomes evolve, they undergo large-scale evolutionary processes that present a challenge to sequence comparison not posed by short sequences. Recombination causes frequent genome rearrangements, horizontal transfer introduces new sequences into bacterial chromosomes, and deletions remove segments of the genome. Consequently, each genome is a mosaic of unique lineage-specific segments, regions shared with a subset of other genomes and segments conserved among all the genomes under consideration. Furthermore, the linear order of these segments may be shuffled among genomes. We present methods for identification and alignment of conserved genomic DNA in the presence of rearrangements and horizontal transfer. Our methods have been implemented in a software package called Mauve. Mauve has been applied to align nine enterobacterial genomes and to determine global rearrangement structure in three mammalian genomes. We have evaluated the quality of Mauve alignments and drawn comparison to other methods through extensive simulations of genome evolution.</description>
    <dc:title>Mauve: multiple alignment of conserved genomic sequence with rearrangements.</dc:title>

    <dc:creator>AC Darling</dc:creator>
    <dc:creator>B Mau</dc:creator>
    <dc:creator>FR Blattner</dc:creator>
    <dc:creator>NT Perna</dc:creator>
    <dc:identifier>doi:10.1101/gr.2289704</dc:identifier>
    <dc:source>Genome Res, Vol. 14, No. 7. (July 2004), pp. 1394-1403.</dc:source>
    <dc:date>2005-11-14T16:09:29-00:00</dc:date>
    <prism:publicationYear>2004</prism:publicationYear>
    <prism:publicationName>Genome Res</prism:publicationName>
    <prism:issn>1088-9051</prism:issn>
    <prism:volume>14</prism:volume>
    <prism:number>7</prism:number>
    <prism:startingPage>1394</prism:startingPage>
    <prism:endingPage>1403</prism:endingPage>
    <prism:category>qual</prism:category>
</item>



<item rdf:about="http://www.citeulike.org/user/azazello/article/131325">
    <title>Environmental Genome Shotgun Sequencing of the Sargasso Sea</title>
    <link>http://www.citeulike.org/user/azazello/article/131325</link>
    <description>&lt;i&gt;Science, Vol. 304, No. 5667. (02 April 2004), pp. 66-74.&lt;/i&gt;&lt;br /&gt;&lt;br /&gt;We have applied &#34;whole-genome shotgun sequencing&#34; to microbial populations collected en masse on tangential flow and impact filters from seawater samples collected from the Sargasso Sea near Bermuda. A total of 1.045 billion base pairs of nonredundant sequence was generated, annotated, and analyzed to elucidate the gene content, diversity, and relative abundance of the organisms within these environmental samples. These data are estimated to derive from at least 1800 genomic species based on sequence relatedness, including 148 previously unknown bacterial phylotypes. We have identified over 1.2 million previously unknown genes represented in these samples, including more than 782 new rhodopsin-like photoreceptors. Variation in species present and stoichiometry suggests substantial oceanic microbial diversity.</description>
    <dc:title>Environmental Genome Shotgun Sequencing of the Sargasso Sea</dc:title>

    <dc:creator>Craig Venter</dc:creator>
    <dc:creator>Karin Remington</dc:creator>
    <dc:creator>John Heidelberg</dc:creator>
    <dc:creator>Aaron Halpern</dc:creator>
    <dc:creator>Doug Rusch</dc:creator>
    <dc:creator>Jonathan Eisen</dc:creator>
    <dc:creator>Dongying Wu</dc:creator>
    <dc:creator>Ian Paulsen</dc:creator>
    <dc:creator>Karen Nelson</dc:creator>
    <dc:creator>William Nelson</dc:creator>
    <dc:creator>Derrick Fouts</dc:creator>
    <dc:creator>Samuel Levy</dc:creator>
    <dc:creator>Anthony Knap</dc:creator>
    <dc:creator>Michael Lomas</dc:creator>
    <dc:creator>Ken Nealson</dc:creator>
    <dc:creator>Owen White</dc:creator>
    <dc:creator>Jeremy Peterson</dc:creator>
    <dc:creator>Jeff Hoffman</dc:creator>
    <dc:creator>Rachel Parsons</dc:creator>
    <dc:creator>Holly Baden-Tillson</dc:creator>
    <dc:creator>Cynthia Pfannkoch</dc:creator>
    <dc:creator>Yu-Hui Rogers</dc:creator>
    <dc:creator>Hamilton Smith</dc:creator>
    <dc:identifier>doi:10.1126/science.1093857</dc:identifier>
    <dc:source>Science, Vol. 304, No. 5667. (02 April 2004), pp. 66-74.</dc:source>
    <dc:date>2005-03-17T15:09:06-00:00</dc:date>
    <prism:publicationYear>2004</prism:publicationYear>
    <prism:publicationName>Science</prism:publicationName>
    <prism:volume>304</prism:volume>
    <prism:number>5667</prism:number>
    <prism:startingPage>66</prism:startingPage>
    <prism:endingPage>74</prism:endingPage>
    <prism:category>no-tag</prism:category>
</item>



<item rdf:about="http://www.citeulike.org/user/azazello/article/2585828">
    <title>Toward a census of bacteria in soil.</title>
    <link>http://www.citeulike.org/user/azazello/article/2585828</link>
    <description>&lt;i&gt;PLoS Comput Biol, Vol. 2, No. 7. (21 July 2006)&lt;/i&gt;&lt;br /&gt;&lt;br /&gt;For more than a century, microbiologists have sought to determine the species richness of bacteria in soil, but the extreme complexity and unknown structure of soil microbial communities have obscured the answer. We developed a statistical model that makes the problem of estimating richness statistically accessible by evaluating the characteristics of samples drawn from simulated communities with parametric community distributions. We identified simulated communities with rank-abundance distributions that followed a truncated lognormal distribution whose samples resembled the structure of 16S rRNA gene sequence collections made using Alaskan and Minnesotan soils. The simulated communities constructed based on the distribution of 16S rRNA gene sequences sampled from the Alaskan and Minnesotan soils had a richness of 5,000 and 2,000 operational taxonomic units (OTUs), respectively, where an OTU represents a collection of sequences not more than 3% distant from each other. To sample each of these OTUs in the Alaskan 16S rRNA gene library at least twice, 480,000 sequences would be required; however, to estimate the richness of the simulated communities using nonparametric richness estimators would require only 18,000 sequences. Quantifying the richness of complex environments such as soil is an important step in building an ecological framework. We have shown that generating sufficient sequence data to do so requires less sequencing effort than completely sequencing a bacterial genome.</description>
    <dc:title>Toward a census of bacteria in soil.</dc:title>

    <dc:creator>PD Schloss</dc:creator>
    <dc:creator>J Handelsman</dc:creator>
    <dc:identifier>doi:10.1371/journal.pcbi.0020092</dc:identifier>
    <dc:source>PLoS Comput Biol, Vol. 2, No. 7. (21 July 2006)</dc:source>
    <dc:date>2008-03-25T13:50:10-00:00</dc:date>
    <prism:publicationYear>2006</prism:publicationYear>
    <prism:publicationName>PLoS Comput Biol</prism:publicationName>
    <prism:issn>1553-7358</prism:issn>
    <prism:volume>2</prism:volume>
    <prism:number>7</prism:number>
    <prism:category>qual</prism:category>
</item>



<item rdf:about="http://www.citeulike.org/user/azazello/article/681624">
    <title>Metagenomic Analysis of the Human Distal Gut Microbiome</title>
    <link>http://www.citeulike.org/user/azazello/article/681624</link>
    <description>&lt;i&gt;Science, Vol. 312, No. 5778. (2 June 2006), pp. 1355-1359.&lt;/i&gt;&lt;br /&gt;&lt;br /&gt;The human intestinal microbiota is composed of 10&lt;sup&gt;13&lt;/sup&gt; to 10&lt;sup&gt;14&lt;/sup&gt; microorganisms whose collective genome (&#34;microbiome&#34;) contains at least 100 times as many genes as our own genome. We analyzed ~78 million base pairs of unique DNA sequence and 2062 polymerase chain reaction-amplified 16S ribosomal DNA sequences obtained from the fecal DNAs of two healthy adults. Using metabolic function analyses of identified genes, we compared our human genome with the average content of previously sequenced microbial genomes. Our microbiome has significantly enriched metabolism of glycans, amino acids, and xenobiotics; methanogenesis; and 2-methyl-D-erythritol 4-phosphate pathway-mediated biosynthesis of vitamins and isoprenoids. Thus, humans are superorganisms whose metabolism represents an amalgamation of microbial and human attributes.</description>
    <dc:title>Metagenomic Analysis of the Human Distal Gut Microbiome</dc:title>

    <dc:creator>Steven Gill</dc:creator>
    <dc:creator>Mihai Pop</dc:creator>
    <dc:creator>Robert Deboy</dc:creator>
    <dc:creator>Paul Eckburg</dc:creator>
    <dc:creator>Peter Turnbaugh</dc:creator>
    <dc:creator>Buck Samuel</dc:creator>
    <dc:creator>Jeffrey Gordon</dc:creator>
    <dc:creator>David Relman</dc:creator>
    <dc:creator>Claire Fraser-Liggett</dc:creator>
    <dc:creator>Karen Nelson</dc:creator>
    <dc:identifier>doi:10.1126/science.1124234</dc:identifier>
    <dc:source>Science, Vol. 312, No. 5778. (2 June 2006), pp. 1355-1359.</dc:source>
    <dc:date>2006-06-02T14:00:57-00:00</dc:date>
    <prism:publicationYear>2006</prism:publicationYear>
    <prism:publicationName>Science</prism:publicationName>
    <prism:volume>312</prism:volume>
    <prism:number>5778</prism:number>
    <prism:startingPage>1355</prism:startingPage>
    <prism:endingPage>1359</prism:endingPage>
    <prism:category>qual</prism:category>
</item>



<item rdf:about="http://www.citeulike.org/user/azazello/article/700422">
    <title>Comparative Metagenomics of Microbial Communities</title>
    <link>http://www.citeulike.org/user/azazello/article/700422</link>
    <description>&lt;i&gt;Science, Vol. 308, No. 5721. (22 April 2005), pp. 554-557.&lt;/i&gt;&lt;br /&gt;&lt;br /&gt;The species complexity of microbial communities and challenges in culturing representative isolates make it difficult to obtain assembled genomes. Here we characterize and compare the metabolic capabilities of terrestrial and marine microbial communities using largely unassembled sequence data obtained by shotgun sequencing DNA isolated from the various environments. Quantitative gene content analysis reveals habitat-specific fingerprints that reflect known characteristics of the sampled environments. The identification of environment-specific genes through a gene-centric comparative analysis presents new opportunities for interpreting and diagnosing environments.</description>
    <dc:title>Comparative Metagenomics of Microbial Communities</dc:title>

    <dc:creator>Susannah Tringe</dc:creator>
    <dc:creator>Christian von Mering</dc:creator>
    <dc:creator>Arthur Kobayashi</dc:creator>
    <dc:creator>Asaf Salamov</dc:creator>
    <dc:creator>Kevin Chen</dc:creator>
    <dc:creator>Hwai Chang</dc:creator>
    <dc:creator>Mircea Podar</dc:creator>
    <dc:creator>Jay Short</dc:creator>
    <dc:creator>Eric Mathur</dc:creator>
    <dc:creator>John Detter</dc:creator>
    <dc:creator>Peer Bork</dc:creator>
    <dc:creator>Philip Hugenholtz</dc:creator>
    <dc:creator>Edward Rubin</dc:creator>
    <dc:identifier>doi:10.1126/science.1107851</dc:identifier>
    <dc:source>Science, Vol. 308, No. 5721. (22 April 2005), pp. 554-557.</dc:source>
    <dc:date>2006-06-19T04:39:47-00:00</dc:date>
    <prism:publicationYear>2005</prism:publicationYear>
    <prism:publicationName>Science</prism:publicationName>
    <prism:volume>308</prism:volume>
    <prism:number>5721</prism:number>
    <prism:startingPage>554</prism:startingPage>
    <prism:endingPage>557</prism:endingPage>
    <prism:category>qual</prism:category>
</item>



<item rdf:about="http://www.citeulike.org/user/azazello/article/1167473">
    <title>Environmental Shotgun Sequencing: Its Potential and Challenges for Studying the Hidden World of Microbes</title>
    <link>http://www.citeulike.org/user/azazello/article/1167473</link>
    <description>&lt;i&gt;PLoS Biology, Vol. 5, No. 3. (1 March 2007), e82.&lt;/i&gt;</description>
    <dc:title>Environmental Shotgun Sequencing: Its Potential and Challenges for Studying the Hidden World of Microbes</dc:title>

    <dc:creator>Jonathan Eisen</dc:creator>
    <dc:identifier>doi:10.1371/journal.pbio.0050082</dc:identifier>
    <dc:source>PLoS Biology, Vol. 5, No. 3. (1 March 2007), e82.</dc:source>
    <dc:date>2007-03-16T09:31:59-00:00</dc:date>
    <prism:publicationYear>2007</prism:publicationYear>
    <prism:publicationName>PLoS Biology</prism:publicationName>
    <prism:volume>5</prism:volume>
    <prism:number>3</prism:number>
    <prism:startingPage>e82</prism:startingPage>
    <prism:category>qual</prism:category>
</item>



<item rdf:about="http://www.citeulike.org/user/azazello/article/2514966">
    <title>Whole-genome comparison of disease and carriage strains provides insights into virulence evolution in Neisseria meningitidis.</title>
    <link>http://www.citeulike.org/user/azazello/article/2514966</link>
    <description>&lt;i&gt;Proc Natl Acad Sci U S A, Vol. 105, No. 9. (4 March 2008), pp. 3473-3478.&lt;/i&gt;&lt;br /&gt;&lt;br /&gt;Neisseria meningitidis is a leading cause of infectious childhood mortality worldwide. Most research efforts have hitherto focused on disease isolates belonging to only a few hypervirulent clonal lineages. However, up to 10% of the healthy human population is temporarily colonized by genetically diverse strains mostly with little or no pathogenic potential. Currently, little is known about the biology of carriage strains and their evolutionary relationship with disease isolates. The expression of a polysaccharide capsule is the only trait that has been convincingly linked to the pathogenic potential of N. meningitidis. To gain insight into the evolution of virulence traits in this species, whole-genome sequences of three meningococcal carriage isolates were obtained. Gene content comparisons with the available genome sequences from three disease isolates indicate that there is no core pathogenome in N. meningitidis. A comparison of the chromosome structure suggests that a filamentous prophage has mediated large chromosomal rearrangements and the translocation of some candidate virulence genes. Interspecific comparison of the available Neisseria genome sequences and dot blot hybridizations further indicate that the insertion sequence IS1655 is restricted only to N. meningitidis; its low sequence diversity is an indicator of an evolutionarily recent population bottleneck. A genome-based phylogenetic reconstruction provides evidence that N. meningitidis has emerged as an unencapsulated human commensal from a common ancestor with Neisseria gonorrhoeae and Neisseria lactamica and consecutively acquired the genes responsible for capsule synthesis via horizontal gene transfer.</description>
    <dc:title>Whole-genome comparison of disease and carriage strains provides insights into virulence evolution in Neisseria meningitidis.</dc:title>

    <dc:creator>C Schoen</dc:creator>
    <dc:creator>J Blom</dc:creator>
    <dc:creator>H Claus</dc:creator>
    <dc:creator>A Schramm-Glück</dc:creator>
    <dc:creator>P Brandt</dc:creator>
    <dc:creator>T Müller</dc:creator>
    <dc:creator>A Goesmann</dc:creator>
    <dc:creator>B Joseph</dc:creator>
    <dc:creator>S Konietzny</dc:creator>
    <dc:creator>O Kurzai</dc:creator>
    <dc:creator>C Schmitt</dc:creator>
    <dc:creator>T Friedrich</dc:creator>
    <dc:creator>B Linke</dc:creator>
    <dc:creator>U Vogel</dc:creator>
    <dc:creator>M Frosch</dc:creator>
    <dc:identifier>doi:10.1073/pnas.0800151105</dc:identifier>
    <dc:source>Proc Natl Acad Sci U S A, Vol. 105, No. 9. (4 March 2008), pp. 3473-3478.</dc:source>
    <dc:date>2008-03-11T13:48:44-00:00</dc:date>
    <prism:publicationYear>2008</prism:publicationYear>
    <prism:publicationName>Proc Natl Acad Sci U S A</prism:publicationName>
    <prism:issn>1091-6490</prism:issn>
    <prism:volume>105</prism:volume>
    <prism:number>9</prism:number>
    <prism:startingPage>3473</prism:startingPage>
    <prism:endingPage>3478</prism:endingPage>
    <prism:category>qual</prism:category>
</item>



<item rdf:about="http://www.citeulike.org/user/azazello/article/1963651">
    <title>Metagenomic and functional analysis of hindgut microbiota of a wood-feeding higher termite</title>
    <link>http://www.citeulike.org/user/azazello/article/1963651</link>
    <description>&lt;i&gt;Nature, Vol. 450, No. 7169., pp. 560-565.&lt;/i&gt;</description>
    <dc:title>Metagenomic and functional analysis of hindgut microbiota of a wood-feeding higher termite</dc:title>

    <dc:creator>Falk Warnecke</dc:creator>
    <dc:creator>Peter Luginbühl</dc:creator>
    <dc:creator>Natalia Ivanova</dc:creator>
    <dc:creator>Majid Ghassemian</dc:creator>
    <dc:creator>Toby Richardson</dc:creator>
    <dc:creator>Justin Stege</dc:creator>
    <dc:creator>Michelle Cayouette</dc:creator>
    <dc:creator>Alice McHardy</dc:creator>
    <dc:creator>Gordana Djordjevic</dc:creator>
    <dc:creator>Nahla Aboushadi</dc:creator>
    <dc:creator>Rotem Sorek</dc:creator>
    <dc:creator>Susannah Tringe</dc:creator>
    <dc:creator>Mircea Podar</dc:creator>
    <dc:creator>Hector Martin</dc:creator>
    <dc:creator>Victor Kunin</dc:creator>
    <dc:creator>Daniel Dalevi</dc:creator>
    <dc:creator>Julita Madejska</dc:creator>
    <dc:creator>Edward Kirton</dc:creator>
    <dc:creator>Darren Platt</dc:creator>
    <dc:creator>Ernest Szeto</dc:creator>
    <dc:creator>Asaf Salamov</dc:creator>
    <dc:creator>Kerrie Barry</dc:creator>
    <dc:creator>Natalia Mikhailova</dc:creator>
    <dc:creator>Nikos Kyrpides</dc:creator>
    <dc:creator>Eric Matson</dc:creator>
    <dc:creator>Elizabeth Ottesen</dc:creator>
    <dc:creator>Xinning Zhang</dc:creator>
    <dc:creator>Myriam Hernández</dc:creator>
    <dc:creator>Catalina Murillo</dc:creator>
    <dc:creator>Luis Acosta</dc:creator>
    <dc:creator>Isidore Rigoutsos</dc:creator>
    <dc:creator>Giselle Tamayo</dc:creator>
    <dc:creator>Brian Green</dc:creator>
    <dc:creator>Cathy Chang</dc:creator>
    <dc:creator>Edward Rubin</dc:creator>
    <dc:creator>Eric Mathur</dc:creator>
    <dc:creator>Dan Robertson</dc:creator>
    <dc:creator>Philip Hugenholtz</dc:creator>
    <dc:creator>Jared Leadbetter</dc:creator>
    <dc:identifier>doi:10.1038/nature06269</dc:identifier>
    <dc:source>Nature, Vol. 450, No. 7169., pp. 560-565.</dc:source>
    <dc:date>2007-11-23T05:36:59-00:00</dc:date>
    <prism:publicationName>Nature</prism:publicationName>
    <prism:issn>0028-0836</prism:issn>
    <prism:volume>450</prism:volume>
    <prism:number>7169</prism:number>
    <prism:startingPage>560</prism:startingPage>
    <prism:endingPage>565</prism:endingPage>
    <prism:publisher>Nature Publishing Group</prism:publisher>
    <prism:category>qual</prism:category>
</item>



<item rdf:about="http://www.citeulike.org/user/azazello/article/1303460">
    <title>Use of simulated data sets to evaluate the fidelity of metagenomic processing methods.</title>
    <link>http://www.citeulike.org/user/azazello/article/1303460</link>
    <description>&lt;i&gt;Nat Methods (29 April 2007)&lt;/i&gt;&lt;br /&gt;&lt;br /&gt;Metagenomics is a rapidly emerging field of research for studying microbial communities. To evaluate methods presently used to process metagenomic sequences, we constructed three simulated data sets of varying complexity by combining sequencing reads randomly selected from 113 isolate genomes. These data sets were designed to model real metagenomes in terms of complexity and phylogenetic composition. We assembled sampled reads using three commonly used genome assemblers (Phrap, Arachne and JAZZ), and predicted genes using two popular gene-finding pipelines (fgenesb and CRITICA/GLIMMER). The phylogenetic origins of the assembled contigs were predicted using one sequence similarity-based (blast hit distribution) and two sequence composition-based (PhyloPythia, oligonucleotide frequencies) binning methods. We explored the effects of the simulated community structure and method combinations on the fidelity of each processing step by comparison to the corresponding isolate genomes. The simulated data sets are available online to facilitate standardized benchmarking of tools for metagenomic analysis.</description>
    <dc:title>Use of simulated data sets to evaluate the fidelity of metagenomic processing methods.</dc:title>

    <dc:creator>Konstantinos Mavromatis</dc:creator>
    <dc:creator>Natalia Ivanova</dc:creator>
    <dc:creator>Kerrie Barry</dc:creator>
    <dc:creator>Harris Shapiro</dc:creator>
    <dc:creator>Eugene Goltsman</dc:creator>
    <dc:creator>Alice C McHardy</dc:creator>
    <dc:creator>Isidore Rigoutsos</dc:creator>
    <dc:creator>Asaf Salamov</dc:creator>
    <dc:creator>Frank Korzeniewski</dc:creator>
    <dc:creator>Miriam Land</dc:creator>
    <dc:creator>Alla Lapidus</dc:creator>
    <dc:creator>Igor Grigoriev</dc:creator>
    <dc:creator>Paul Richardson</dc:creator>
    <dc:creator>Philip Hugenholtz</dc:creator>
    <dc:creator>Nikos C Kyrpides</dc:creator>
    <dc:identifier>doi:10.1038/nmeth1043</dc:identifier>
    <dc:source>Nat Methods (29 April 2007)</dc:source>
    <dc:date>2007-05-17T17:08:06-00:00</dc:date>
    <prism:publicationYear>2007</prism:publicationYear>
    <prism:publicationName>Nat Methods</prism:publicationName>
    <prism:issn>1548-7091</prism:issn>
    <prism:category>qual</prism:category>
</item>



<item rdf:about="http://www.citeulike.org/user/azazello/article/2242249">
    <title>The greedy path-merging algorithm for contig scaffolding</title>
    <link>http://www.citeulike.org/user/azazello/article/2242249</link>
    <description>&lt;i&gt;J. ACM, Vol. 49, No. 5. (September 2002), pp. 603-615.&lt;/i&gt;</description>
    <dc:title>The greedy path-merging algorithm for contig scaffolding</dc:title>

    <dc:creator>Daniel Huson</dc:creator>
    <dc:creator>Knut Reinert</dc:creator>
    <dc:creator>Eugene Myers</dc:creator>
    <dc:identifier>doi:10.1145/585265.585267</dc:identifier>
    <dc:source>J. ACM, Vol. 49, No. 5. (September 2002), pp. 603-615.</dc:source>
    <dc:date>2008-01-17T01:41:14-00:00</dc:date>
    <prism:publicationYear>2002</prism:publicationYear>
    <prism:publicationName>J. ACM</prism:publicationName>
    <prism:issn>0004-5411</prism:issn>
    <prism:volume>49</prism:volume>
    <prism:number>5</prism:number>
    <prism:startingPage>603</prism:startingPage>
    <prism:endingPage>615</prism:endingPage>
    <prism:publisher>ACM</prism:publisher>
    <prism:category>qual</prism:category>
</item>



<item rdf:about="http://www.citeulike.org/user/azazello/article/270463">
    <title>Genome sequencing in microfabricated high-density picolitre reactors</title>
    <link>http://www.citeulike.org/user/azazello/article/270463</link>
    <description>&lt;i&gt;Nature (31 July 2005)&lt;/i&gt;</description>
    <dc:title>Genome sequencing in microfabricated high-density picolitre reactors</dc:title>

    <dc:creator>Marcel Margulies</dc:creator>
    <dc:creator>Michael Egholm</dc:creator>
    <dc:creator>William Altman</dc:creator>
    <dc:creator>Said Attiya</dc:creator>
    <dc:creator>Joel Bader</dc:creator>
    <dc:creator>Lisa Bemben</dc:creator>
    <dc:creator>Jan Berka</dc:creator>
    <dc:creator>Michael Braverman</dc:creator>
    <dc:creator>Yi-Ju Chen</dc:creator>
    <dc:creator>Zhoutao Chen</dc:creator>
    <dc:creator>Scott Dewell</dc:creator>
    <dc:creator>Lei Du</dc:creator>
    <dc:creator>Joseph Fierro</dc:creator>
    <dc:creator>Xavier Gomes</dc:creator>
    <dc:creator>Brian Godwin</dc:creator>
    <dc:creator>Wen He</dc:creator>
    <dc:creator>Scott Helgesen</dc:creator>
    <dc:creator>Chun Ho</dc:creator>
    <dc:creator>Gerard Irzyk</dc:creator>
    <dc:creator>Szilveszter Jando</dc:creator>
    <dc:creator>Maria Alenquer</dc:creator>
    <dc:creator>Thomas Jarvie</dc:creator>
    <dc:creator>Kshama Jirage</dc:creator>
    <dc:creator>Jong-Bum Kim</dc:creator>
    <dc:creator>James Knight</dc:creator>
    <dc:creator>Janna Lanza</dc:creator>
    <dc:creator>John Leamon</dc:creator>
    <dc:creator>Steven Lefkowitz</dc:creator>
    <dc:creator>Ming Lei</dc:creator>
    <dc:creator>Jing Li</dc:creator>
    <dc:creator>Kenton Lohman</dc:creator>
    <dc:creator>Hong Lu</dc:creator>
    <dc:creator>Vinod Makhijani</dc:creator>
    <dc:creator>Keith McDade</dc:creator>
    <dc:creator>Michael McKenna</dc:creator>
    <dc:creator>Eugene Myers</dc:creator>
    <dc:creator>Elizabeth Nickerson</dc:creator>
    <dc:creator>John Nobile</dc:creator>
    <dc:creator>Ramona Plant</dc:creator>
    <dc:creator>Bernard Puc</dc:creator>
    <dc:creator>Michael Ronan</dc:creator>
    <dc:creator>George Roth</dc:creator>
    <dc:creator>Gary Sarkis</dc:creator>
    <dc:creator>Jan Simons</dc:creator>
    <dc:creator>John Simpson</dc:creator>
    <dc:creator>Maithreyan Srinivasan</dc:creator>
    <dc:creator>Karrie Tartaro</dc:creator>
    <dc:creator>Alexander Tomasz</dc:creator>
    <dc:creator>Kari Vogt</dc:creator>
    <dc:creator>Greg Volkmer</dc:creator>
    <dc:creator>Shally Wang</dc:creator>
    <dc:creator>Yong Wang</dc:creator>
    <dc:creator>Michael Weiner</dc:creator>
    <dc:creator>Pengguang Yu</dc:creator>
    <dc:creator>Richard Begley</dc:creator>
    <dc:creator>Jonathan Rothberg</dc:creator>
    <dc:identifier>doi:10.1038/nature03959</dc:identifier>
    <dc:source>Nature (31 July 2005)</dc:source>
    <dc:date>2005-07-31T21:14:04-00:00</dc:date>
    <prism:publicationYear>2005</prism:publicationYear>
    <prism:publicationName>Nature</prism:publicationName>
    <prism:issn>0028-0836</prism:issn>
    <prism:publisher>Nature Publishing Group</prism:publisher>
    <prism:category>qual</prism:category>
</item>



<item rdf:about="http://www.citeulike.org/user/azazello/article/2382364">
    <title>MEGAN analysis of metagenomic data.</title>
    <link>http://www.citeulike.org/user/azazello/article/2382364</link>
    <description>&lt;i&gt;Genome Res, Vol. 17, No. 3. (March 2007), pp. 377-386.&lt;/i&gt;&lt;br /&gt;&lt;br /&gt;Metagenomics is the study of the genomic content of a sample of organisms obtained from a common habitat using targeted or random sequencing. Goals include understanding the extent and role of microbial diversity. The taxonomical content of such a sample is usually estimated by comparison against sequence databases of known sequences. Most published studies use the analysis of paired-end reads, complete sequences of environmental fosmid and BAC clones, or environmental assemblies. Emerging sequencing-by-synthesis technologies with very high throughput are paving the way to low-cost random &#34;shotgun&#34; approaches. This paper introduces MEGAN, a new computer program that allows laptop analysis of large metagenomic data sets. In a preprocessing step, the set of DNA sequences is compared against databases of known sequences using BLAST or another comparison tool. MEGAN is then used to compute and explore the taxonomical content of the data set, employing the NCBI taxonomy to summarize and order the results. A simple lowest common ancestor algorithm assigns reads to taxa such that the taxonomical level of the assigned taxon reflects the level of conservation of the sequence. The software allows large data sets to be dissected without the need for assembly or the targeting of specific phylogenetic markers. It provides graphical and statistical output for comparing different data sets. The approach is applied to several data sets, including the Sargasso Sea data set, a recently published metagenomic data set sampled from a mammoth bone, and several complete microbial genomes. Also, simulations that evaluate the performance of the approach for different read lengths are presented.</description>
    <dc:title>MEGAN analysis of metagenomic data.</dc:title>

    <dc:creator>DH Huson</dc:creator>
    <dc:creator>AF Auch</dc:creator>
    <dc:creator>J Qi</dc:creator>
    <dc:creator>SC Schuster</dc:creator>
    <dc:identifier>doi:10.1101/gr.5969107</dc:identifier>
    <dc:source>Genome Res, Vol. 17, No. 3. (March 2007), pp. 377-386.</dc:source>
    <dc:date>2008-02-14T19:21:34-00:00</dc:date>
    <prism:publicationYear>2007</prism:publicationYear>
    <prism:publicationName>Genome Res</prism:publicationName>
    <prism:issn>1088-9051</prism:issn>
    <prism:volume>17</prism:volume>
    <prism:number>3</prism:number>
    <prism:startingPage>377</prism:startingPage>
    <prism:endingPage>386</prism:endingPage>
    <prism:category>qual</prism:category>
</item>



<item rdf:about="http://www.citeulike.org/user/azazello/article/891657">
    <title>Metagenomic analysis of two enhanced biological phosphorus removal (EBPR) sludge communities</title>
    <link>http://www.citeulike.org/user/azazello/article/891657</link>
    <description>&lt;i&gt;Nature Biotechnology, Vol. 24, No. 10. (24 September 2006), pp. 1263-1269.&lt;/i&gt;</description>
    <dc:title>Metagenomic analysis of two enhanced biological phosphorus removal (EBPR) sludge communities</dc:title>

    <dc:creator>Héctor Martín</dc:creator>
    <dc:creator>Natalia Ivanova</dc:creator>
    <dc:creator>Victor Kunin</dc:creator>
    <dc:creator>Falk Warnecke</dc:creator>
    <dc:creator>Kerrie Barry</dc:creator>
    <dc:creator>Alice McHardy</dc:creator>
    <dc:creator>Christine Yeates</dc:creator>
    <dc:creator>Shaomei He</dc:creator>
    <dc:creator>Asaf Salamov</dc:creator>
    <dc:creator>Ernest Szeto</dc:creator>
    <dc:creator>Eileen Dalin</dc:creator>
    <dc:creator>Nik Putnam</dc:creator>
    <dc:creator>Harris Shapiro</dc:creator>
    <dc:creator>Jasmyn Pangilinan</dc:creator>
    <dc:creator>Isidore Rigoutsos</dc:creator>
    <dc:creator>Nikos Kyrpides</dc:creator>
    <dc:creator>Linda Blackall</dc:creator>
    <dc:creator>Katherine McMahon</dc:creator>
    <dc:creator>Philip Hugenholtz</dc:creator>
    <dc:identifier>doi:10.1038/nbt1247</dc:identifier>
    <dc:source>Nature Biotechnology, Vol. 24, No. 10. (24 September 2006), pp. 1263-1269.</dc:source>
    <dc:date>2006-10-10T18:53:33-00:00</dc:date>
    <prism:publicationYear>2006</prism:publicationYear>
    <prism:publicationName>Nature Biotechnology</prism:publicationName>
    <prism:issn>1087-0156</prism:issn>
    <prism:volume>24</prism:volume>
    <prism:number>10</prism:number>
    <prism:startingPage>1263</prism:startingPage>
    <prism:endingPage>1269</prism:endingPage>
    <prism:publisher>Nature Publishing Group</prism:publisher>
    <prism:category>qual</prism:category>
</item>



<item rdf:about="http://www.citeulike.org/user/azazello/article/2570446">
    <title>CompostBin: A DNA composition-based algorithm for binning environmental shotgun reads</title>
    <link>http://www.citeulike.org/user/azazello/article/2570446</link>
    <description>&lt;i&gt;ArXiv e-prints, Vol. 708 (August 2007)&lt;/i&gt;&lt;br /&gt;&lt;br /&gt;A major hindrance to studies of microbial diversity has been that the vast majority of microbes cannot be cultured in the laboratory and thus are not amenable to traditional methods of characterization. Environmental shotgun sequencing (ESS) overcomes this hurdle by sequencing the DNA from the organisms present in a microbial community. The interpretation of this metagenomic data can be greatly facilitated by associating every sequence read with its source organism. We report the development of CompostBin, a DNA composition-based algorithm for analyzing metagenomic sequence reads and distributing them into taxon-specific bins. Unlike previous methods that seek to bin assembled contigs and often require training on known reference genomes, CompostBin has the ability to accurately bin raw sequence reads without need for assembly or training. It applies principal component analysis to project the data into an informative lower-dimensional space, and then uses the normalized cut clustering algorithm on this filtered data set to classify sequences into taxon-specific bins. We demonstrate the algorithm's accuracy on a variety of simulated data sets and on one metagenomic data set with known species assignments. CompostBin is a work in progress, with several refinements of the algorithm planned for the future.</description>
    <dc:title>CompostBin: A DNA composition-based algorithm for binning environmental shotgun reads</dc:title>

    <dc:creator>S Chatterji</dc:creator>
    <dc:creator>I Yamazaki</dc:creator>
    <dc:creator>Z Bai</dc:creator>
    <dc:creator>J Eisen</dc:creator>
    <dc:source>ArXiv e-prints, Vol. 708 (August 2007)</dc:source>
    <dc:date>2008-03-21T17:56:08-00:00</dc:date>
    <prism:publicationYear>2007</prism:publicationYear>
    <prism:publicationName>ArXiv e-prints</prism:publicationName>
    <prism:volume>708</prism:volume>
    <prism:category>qual</prism:category>
</item>



<item rdf:about="http://www.citeulike.org/user/azazello/article/1106952">
    <title>A general approach to single-nucleotide polymorphism discovery.</title>
    <link>http://www.citeulike.org/user/azazello/article/1106952</link>
    <description>&lt;i&gt;Nat Genet, Vol. 23, No. 4. (December 1999), pp. 452-456.&lt;/i&gt;&lt;br /&gt;&lt;br /&gt;Single-nucleotide polymorphisms (SNPs) are the most abundant form of human genetic variation and a resource for mapping complex genetic traits. The large volume of data produced by high-throughput sequencing projects is a rich and largely untapped source of SNPs (refs 2, 3, 4, 5). We present here a unified approach to the discovery of variations in genetic sequence data of arbitrary DNA sources. We propose to use the rapidly emerging genomic sequence as a template on which to layer often unmapped, fragmentary sequence data and to use base quality values to discern true allelic variations from sequencing errors. By taking advantage of the genomic sequence we are able to use simpler yet more accurate methods for sequence organization: fragment clustering, paralogue identification and multiple alignment. We analyse these sequences with a novel, Bayesian inference engine, POLYBAYES, to calculate the probability that a given site is polymorphic. Rigorous treatment of base quality permits completely automated evaluation of the full length of all sequences, without limitations on alignment depth. We demonstrate this approach by accurate SNP predictions in human ESTs aligned to finished and working-draft quality genomic sequences, a data set representative of the typical challenges of sequence-based SNP discovery.</description>
    <dc:title>A general approach to single-nucleotide polymorphism discovery.</dc:title>

    <dc:creator>GT Marth</dc:creator>
    <dc:creator>I Korf</dc:creator>
    <dc:creator>MD Yandell</dc:creator>
    <dc:creator>RT Yeh</dc:creator>
    <dc:creator>Z Gu</dc:creator>
    <dc:creator>H Zakeri</dc:creator>
    <dc:creator>NO Stitziel</dc:creator>
    <dc:creator>L Hillier</dc:creator>
    <dc:creator>PY Kwok</dc:creator>
    <dc:creator>WR Gish</dc:creator>
    <dc:identifier>doi:10.1038/70570</dc:identifier>
    <dc:source>Nat Genet, Vol. 23, No. 4. (December 1999), pp. 452-456.</dc:source>
    <dc:date>2007-02-14T15:38:23-00:00</dc:date>
    <prism:publicationYear>1999</prism:publicationYear>
    <prism:publicationName>Nat Genet</prism:publicationName>
    <prism:issn>1061-4036</prism:issn>
    <prism:volume>23</prism:volume>
    <prism:number>4</prism:number>
    <prism:startingPage>452</prism:startingPage>
    <prism:endingPage>456</prism:endingPage>
    <prism:category>qual</prism:category>
</item>



<item rdf:about="http://www.citeulike.org/user/azazello/article/2324215">
    <title>Short read fragment assembly of bacterial genomes</title>
    <link>http://www.citeulike.org/user/azazello/article/2324215</link>
    <description>&lt;i&gt;Genome Res., Vol. 18, No. 2. (1 February 2008), pp. 324-330.&lt;/i&gt;&lt;br /&gt;&lt;br /&gt;In the last year, high-throughput sequencing technologies have progressed from proof-of-concept to production quality. While these methods produce high-quality reads, they have yet to produce reads comparable in length to Sanger-based sequencing. Current fragment assembly algorithms have been implemented and optimized for mate-paired Sanger-based reads, and thus do not perform well on short reads produced by short read technologies. We present a new Eulerian assembler that generates nearly optimal short read assemblies of bacterial genomes and describe an approach to assemble reads in the case of the popular hybrid protocol when short and long Sanger-based reads are combined.</description>
    <dc:title>Short read fragment assembly of bacterial genomes</dc:title>

    <dc:creator>Mark Chaisson</dc:creator>
    <dc:creator>Pavel Pevzner</dc:creator>
    <dc:identifier>doi:10.1101/gr.7088808</dc:identifier>
    <dc:source>Genome Res., Vol. 18, No. 2. (1 February 2008), pp. 324-330.</dc:source>
    <dc:date>2008-02-02T22:44:54-00:00</dc:date>
    <prism:publicationYear>2008</prism:publicationYear>
    <prism:publicationName>Genome Res.</prism:publicationName>
    <prism:volume>18</prism:volume>
    <prism:number>2</prism:number>
    <prism:startingPage>324</prism:startingPage>
    <prism:endingPage>330</prism:endingPage>
    <prism:category>qual</prism:category>
</item>



<item rdf:about="http://www.citeulike.org/user/azazello/article/79831">
    <title>De novo repeat classification and fragment assembly.</title>
    <link>http://www.citeulike.org/user/azazello/article/79831</link>
    <description>&lt;i&gt;Genome Res, Vol. 14, No. 9. (September 2004), pp. 1786-1796.&lt;/i&gt;&lt;br /&gt;&lt;br /&gt;Repetitive sequences make up a significant fraction of almost any genome, and an important and still open question in bioinformatics is how to represent all repeats in DNA sequences. We propose a new approach to repeat classification that represents all repeats in a genome as a mosaic of sub-repeats. Our key algorithmic idea also leads to new approaches to multiple alignment and fragment assembly. In particular, we show that our FragmentGluer assembler improves on Phrap and ARACHNE in assembly of BACs and bacterial genomes.</description>
    <dc:title>De novo repeat classification and fragment assembly.</dc:title>

    <dc:creator>PA Pevzner</dc:creator>
    <dc:creator>H Tang</dc:creator>
    <dc:creator>G Tesler</dc:creator>
    <dc:identifier>doi:10.1101/gr.2395204</dc:identifier>
    <dc:source>Genome Res, Vol. 14, No. 9. (September 2004), pp. 1786-1796.</dc:source>
    <dc:date>2005-01-18T20:02:34-00:00</dc:date>
    <prism:publicationYear>2004</prism:publicationYear>
    <prism:publicationName>Genome Res</prism:publicationName>
    <prism:issn>1088-9051</prism:issn>
    <prism:volume>14</prism:volume>
    <prism:number>9</prism:number>
    <prism:startingPage>1786</prism:startingPage>
    <prism:endingPage>1796</prism:endingPage>
    <prism:category>qual</prism:category>
</item>



<item rdf:about="http://www.citeulike.org/user/azazello/article/2242997">
    <title>Extending assembly of short DNA sequences to handle error.</title>
    <link>http://www.citeulike.org/user/azazello/article/2242997</link>
    <description>&lt;i&gt;Bioinformatics, Vol. 23, No. 21. (1 November 2007), pp. 2942-2944.&lt;/i&gt;&lt;br /&gt;&lt;br /&gt;Inexpensive de novo genome sequencing, particularly in organisms with small genomes, is now possible using several new sequencing technologies. Some of these technologies, such as that from Illumina's Solexa Sequencing, produce high genomic coverage by generating a very large number of small reads (approximately 30 bp). While prior work shows that partial assembly can be performed by k-mer extension in error-free reads, this algorithm is unsuccessful with the sequencing error rates found in practice. We present VCAKE (Verified Consensus Assembly by K-mer Extension), a modification of simple k-mer extension that overcomes error by using high depth coverage. Though it is a simple modification of a previous approach, we show significant improvements in assembly results on simulated and experimental datasets that include error. AVAILABILITY: http://152.2.15.114/~labweb/VCAKE</description>
    <dc:title>Extending assembly of short DNA sequences to handle error.</dc:title>

    <dc:creator>WR Jeck</dc:creator>
    <dc:creator>JA Reinhardt</dc:creator>
    <dc:creator>DA Baltrus</dc:creator>
    <dc:creator>MT Hickenbotham</dc:creator>
    <dc:creator>V Magrini</dc:creator>
    <dc:creator>ER Mardis</dc:creator>
    <dc:creator>JL Dangl</dc:creator>
    <dc:creator>CD Jones</dc:creator>
    <dc:identifier>doi:10.1093/bioinformatics/btm451</dc:identifier>
    <dc:source>Bioinformatics, Vol. 23, No. 21. (1 November 2007), pp. 2942-2944.</dc:source>
    <dc:date>2008-01-17T05:36:23-00:00</dc:date>
    <prism:publicationYear>2007</prism:publicationYear>
    <prism:publicationName>Bioinformatics</prism:publicationName>
    <prism:issn>1460-2059</prism:issn>
    <prism:volume>23</prism:volume>
    <prism:number>21</prism:number>
    <prism:startingPage>2942</prism:startingPage>
    <prism:endingPage>2944</prism:endingPage>
    <prism:category>qual</prism:category>
</item>



<item rdf:about="http://www.citeulike.org/user/azazello/article/2563094">
    <title>The fragment assembly string graph.</title>
    <link>http://www.citeulike.org/user/azazello/article/2563094</link>
    <description>&lt;i&gt;Bioinformatics, Vol. 21 Suppl 2 (1 September 2005)&lt;/i&gt;&lt;br /&gt;&lt;br /&gt;We present a concept and formalism, the string graph, which represents all that is inferable about a DNA sequence from a collection of shotgun sequencing reads collected from it. We give time and space efficient algorithms for constructing a string graph given the collection of overlaps between the reads and, in particular, present a novel linear expected time algorithm for transitive reduction in this context. The result demonstrates that the decomposition of reads into kmers employed in the de Bruijn graph approach described earlier is not essential, and exposes its close connection to the unitig approach we developed at Celera. This paper is a preliminary piece giving the basic algorithm and results that demonstrate the efficiency and scalability of the method. These ideas are being used to build a next-generation whole genome assembler called BOA (Berkeley Open Assembler) that will easily scale to mammalian genomes.</description>
    <dc:title>The fragment assembly string graph.</dc:title>

    <dc:creator>EW Myers</dc:creator>
    <dc:source>Bioinformatics, Vol. 21 Suppl 2 (1 September 2005)</dc:source>
    <dc:date>2008-03-19T17:02:58-00:00</dc:date>
    <prism:publicationYear>2005</prism:publicationYear>
    <prism:publicationName>Bioinformatics</prism:publicationName>
    <prism:issn>1460-2059</prism:issn>
    <prism:volume>21 Suppl 2</prism:volume>
    <prism:category>qual</prism:category>
</item>



<item rdf:about="http://www.citeulike.org/user/azazello/article/2547951">
    <title>ALLPATHS: De novo assembly of whole-genome shotgun microreads.</title>
    <link>http://www.citeulike.org/user/azazello/article/2547951</link>
    <description>&lt;i&gt;Genome Res (13 March 2008)&lt;/i&gt;&lt;br /&gt;&lt;br /&gt;New DNA sequencing technologies deliver data at dramatically lower costs but demand new analytical methods to take full advantage of the very short reads that they produce. We provide an initial, theoretical solution to the challenge of de novo assembly from whole-genome shotgun &#34;microreads.&#34; For 11 genomes of sizes up to 39 Mb, we generated high-quality assemblies from 80x coverage by paired 30-base simulated reads modeled after real Illumina-Solexa reads. The bacterial genomes of Campylobacter jejuni and Escherichia coli assemble optimally, yielding single perfect contigs, and larger genomes yield assemblies that are highly connected and accurate. Assemblies are presented in a graph form that retains intrinsic ambiguities such as those arising from polymorphism, thereby providing information that has been absent from previous genome assemblies. For both C. jejuni and E. coli, this assembly graph is a single edge encompassing the entire genome. Larger genomes produce more complicated graphs, but the vast majority of the bases in their assemblies are present in long edges that are nearly always perfect. We describe a general method for genome assembly that can be applied to all types of DNA sequence data, not only short read data, but also conventional sequence reads.</description>
    <dc:title>ALLPATHS: De novo assembly of whole-genome shotgun microreads.</dc:title>

    <dc:creator>Jonathan Butler</dc:creator>
    <dc:creator>Iain MacCallum</dc:creator>
    <dc:creator>Michael Kleber</dc:creator>
    <dc:creator>Ilya A Shlyakhter</dc:creator>
    <dc:creator>Matthew K Belmonte</dc:creator>
    <dc:creator>Eric S Lander</dc:creator>
    <dc:creator>Chad Nusbaum</dc:creator>
    <dc:creator>David B Jaffe</dc:creator>
    <dc:identifier>doi:10.1101/gr.7337908</dc:identifier>
    <dc:source>Genome Res (13 March 2008)</dc:source>
    <dc:date>2008-03-18T00:16:38-00:00</dc:date>
    <prism:publicationYear>2008</prism:publicationYear>
    <prism:publicationName>Genome Res</prism:publicationName>
    <prism:issn>1088-9051</prism:issn>
    <prism:category>qual</prism:category>
</item>



<item rdf:about="http://www.citeulike.org/user/azazello/article/1873399">
    <title>SHARCGS, a fast and highly accurate short-read assembly algorithm for de novo genomic sequencing.</title>
    <link>http://www.citeulike.org/user/azazello/article/1873399</link>
    <description>&lt;i&gt;Genome Res, Vol. 17, No. 11. (November 2007), pp. 1697-1706.&lt;/i&gt;&lt;br /&gt;&lt;br /&gt;The latest revolution in the DNA sequencing field has been brought about by the development of automated sequencers that are capable of generating giga base pair data sets quickly and at low cost. Applications of such technologies seem to be limited to resequencing and transcript discovery, due to the shortness of the generated reads. In order to extend the fields of application to de novo sequencing, we developed the SHARCGS algorithm to assemble short-read (25-40-mer) data with high accuracy and speed. The efficiency of SHARCGS was tested on BAC inserts from three eukaryotic species, on two yeast chromosomes, and on two bacterial genomes (Haemophilus influenzae, Escherichia coli). We show that 30-mer-based BAC assemblies have N50 sizes &#62;20 kbp for Drosophila and Arabidopsis and &#62;4 kbp for human in simulations taking missing reads and wrong base calls into account. We assembled 949,974 contigs with length &#62;50 bp, and only one single contig could not be aligned error-free against the reference sequences. We generated 36-mer reads for the genome of Helicobacter acinonychis on the Illumina 1G sequencing instrument and assembled 937 contigs covering 98% of the genome with an N50 size of 3.7 kbp. With the exception of five contigs that differ in 1-4 positions relative to the reference sequence, all contigs matched the genome error-free. Thus, SHARCGS is a suitable tool for fully exploiting novel sequencing technologies by assembling sequence contigs de novo with high confidence and by outperforming existing assembly algorithms in terms of speed and accuracy.</description>
    <dc:title>SHARCGS, a fast and highly accurate short-read assembly algorithm for de novo genomic sequencing.</dc:title>

    <dc:creator>Juliane C Dohm</dc:creator>
    <dc:creator>Claudio Lottaz</dc:creator>
    <dc:creator>Tatiana Borodina</dc:creator>
    <dc:creator>Heinz Himmelbauer</dc:creator>
    <dc:identifier>doi:10.1101/gr.6435207</dc:identifier>
    <dc:source>Genome Res, Vol. 17, No. 11. (November 2007), pp. 1697-1706.</dc:source>
    <dc:date>2007-11-06T09:59:44-00:00</dc:date>
    <prism:publicationYear>2007</prism:publicationYear>
    <prism:publicationName>Genome Res</prism:publicationName>
    <prism:issn>1088-9051</prism:issn>
    <prism:volume>17</prism:volume>
    <prism:number>11</prism:number>
    <prism:startingPage>1697</prism:startingPage>
    <prism:endingPage>1706</prism:endingPage>
    <prism:category>qual</prism:category>
</item>



<item rdf:about="http://www.citeulike.org/user/azazello/article/1543434">
    <title>Whole-genome sequencing and assembly with high-throughput, short-read technologies.</title>
    <link>http://www.citeulike.org/user/azazello/article/1543434</link>
    <description>&lt;i&gt;PLoS ONE, Vol. 2 (2007)&lt;/i&gt;&lt;br /&gt;&lt;br /&gt;While recently developed short-read sequencing technologies may dramatically reduce the sequencing cost and eventually achieve the $1000 goal for re-sequencing, their limitations prevent the de novo sequencing of eukaryotic genomes with the standard shotgun sequencing protocol. We present SHRAP (SHort Read Assembly Protocol), a sequencing protocol and assembly methodology that utilizes high-throughput short-read technologies. We describe a variation on hierarchical sequencing with two crucial differences: (1) we select a clone library from the genome randomly rather than as a tiling path and (2) we sample clones from the genome at high coverage and reads from the clones at low coverage. We assume that 200 bp read lengths with a 1% error rate and inexpensive random fragment cloning on whole mammalian genomes is feasible. Our assembly methodology is based on first ordering the clones and subsequently performing read assembly in three stages: (1) local assemblies of regions significantly smaller than a clone size, (2) clone-sized assemblies of the results of stage 1, and (3) chromosome-sized assemblies. By aggressively localizing the assembly problem during the first stage, our method succeeds in assembling short, unpaired reads sampled from repetitive genomes. We tested our assembler using simulated reads from D. melanogaster and human chromosomes 1, 11, and 21, and produced assemblies with large sets of contiguous sequence and a misassembly rate comparable to other draft assemblies. Tested on D. melanogaster and the entire human genome, our clone-ordering method produces accurate maps, thereby localizing fragment assembly and enabling the parallelization of the subsequent steps of our pipeline. Thus, we have demonstrated that truly inexpensive de novo sequencing of mammalian genomes will soon be possible with high-throughput, short-read technologies using our methodology.</description>
    <dc:title>Whole-genome sequencing and assembly with high-throughput, short-read technologies.</dc:title>

    <dc:creator>A Sundquist</dc:creator>
    <dc:creator>M Ronaghi</dc:creator>
    <dc:creator>H Tang</dc:creator>
    <dc:creator>P Pevzner</dc:creator>
    <dc:creator>S Batzoglou</dc:creator>
    <dc:identifier>doi:10.1371/journal.pone.0000484</dc:identifier>
    <dc:source>PLoS ONE, Vol. 2 (2007)</dc:source>
    <dc:date>2007-08-08T14:23:56-00:00</dc:date>
    <prism:publicationYear>2007</prism:publicationYear>
    <prism:publicationName>PLoS ONE</prism:publicationName>
    <prism:issn>1932-6203</prism:issn>
    <prism:volume>2</prism:volume>
    <prism:category>qual</prism:category>
</item>



<item rdf:about="http://www.citeulike.org/user/azazello/article/2547927">
    <title>De novo bacterial genome sequencing: millions of very short reads assembled on a desktop computer.</title>
    <link>http://www.citeulike.org/user/azazello/article/2547927</link>
    <description>&lt;i&gt;Genome Res (10 March 2008)&lt;/i&gt;&lt;br /&gt;&lt;br /&gt;Novel high-throughput DNA sequencing technologies allow researchers to characterize a bacterial genome during a single experiment and at a moderate cost. However, the increase in sequencing throughput that is allowed by using such platforms is obtained at the expense of individual sequence read length, which must be assembled into longer contigs to be exploitable. This study focuses on the Illumina sequencing platform that produces millions of very short sequences that are 35 bases in length. We propose a de novo assembler software that is dedicated to process such data. Based on a classical overlap graph representation and on the detection of potentially spurious reads, our software generates a set of accurate contigs of several kilobases that cover most of the bacterial genome. The assembly results were validated by comparing datasets that were obtained experimentally for Staphylococcus aureus strain MW2 and Helicobacter acinonychis strain Sheeba with that of their published genomes acquired by conventional sequencing of 1.5 - 3.0 kb fragments. We also provide indications that the broad coverage achieved by high throughput sequencing might allow for the detection of clonal polymorphisms in the set of DNA molecules being sequenced.</description>
    <dc:title>De novo bacterial genome sequencing: millions of very short reads assembled on a desktop computer.</dc:title>

    <dc:creator>David Hernandez</dc:creator>
    <dc:creator>Patrice Francois</dc:creator>
    <dc:creator>Laurent Farinelli</dc:creator>
    <dc:creator>Magne Osteras</dc:creator>
    <dc:creator>Jacques Schrenzel</dc:creator>
    <dc:identifier>doi:10.1101/gr.072033.107</dc:identifier>
    <dc:source>Genome Res (10 March 2008)</dc:source>
    <dc:date>2008-03-17T23:46:00-00:00</dc:date>
    <prism:publicationYear>2008</prism:publicationYear>
    <prism:publicationName>Genome Res</prism:publicationName>
    <prism:issn>1088-9051</prism:issn>
    <prism:category>qual</prism:category>
</item>



<item rdf:about="http://www.citeulike.org/user/azazello/article/1125851">
    <title>Assembling millions of short DNA sequences using SSAKE</title>
    <link>http://www.citeulike.org/user/azazello/article/1125851</link>
    <description>&lt;i&gt;Bioinformatics, Vol. 23, No. 4. (15 February 2007), pp. 500-501.&lt;/i&gt;</description>
    <dc:title>Assembling millions of short DNA sequences using SSAKE</dc:title>

    <dc:creator>Rene L Warren</dc:creator>
    <dc:creator>Granger G Sutton</dc:creator>
    <dc:creator>Steven JM Jones</dc:creator>
    <dc:creator>Robert A Holt</dc:creator>
    <dc:identifier>doi:10.1093/bioinformatics/btl629</dc:identifier>
    <dc:source>Bioinformatics, Vol. 23, No. 4. (15 February 2007), pp. 500-501.</dc:source>
    <dc:date>2007-02-27T07:55:18-00:00</dc:date>
    <prism:publicationYear>2007</prism:publicationYear>
    <prism:publicationName>Bioinformatics</prism:publicationName>
    <prism:issn>1367-4803</prism:issn>
    <prism:volume>23</prism:volume>
    <prism:number>4</prism:number>
    <prism:startingPage>500</prism:startingPage>
    <prism:endingPage>501</prism:endingPage>
    <prism:publisher>Oxford University Press</prism:publisher>
    <prism:category>qual</prism:category>
</item>



<item rdf:about="http://www.citeulike.org/user/azazello/article/1122449">
    <title>Quantitative Phylogenetic Assessment of Microbial Communities in Diverse Environments</title>
    <link>http://www.citeulike.org/user/azazello/article/1122449</link>
    <description>&lt;i&gt;Science, Vol. 315, No. 5815. (23 February 2007), pp. 1126-1130.&lt;/i&gt;&lt;br /&gt;&lt;br /&gt;The taxonomic composition of environmental communities is an important indicator of their ecology and function. We used a set of protein-coding marker genes, extracted from large-scale environmental shotgun sequencing data, to provide a more direct, quantitative, and accurate picture of community composition than that provided by traditional ribosomal RNA-based approaches depending on the polymerase chain reaction. Mapping marker genes from four diverse environmental data sets onto a reference species phylogeny shows that certain communities evolve faster than others. The method also enables determination of preferred habitats for entire microbial clades and provides evidence that such habitat preferences are often remarkably stable over time.</description>
    <dc:title>Quantitative Phylogenetic Assessment of Microbial Communities in Diverse Environments</dc:title>

    <dc:creator>C von Mering</dc:creator>
    <dc:creator>P Hugenholtz</dc:creator>
    <dc:creator>J Raes</dc:creator>
    <dc:creator>SG Tringe</dc:creator>
    <dc:creator>T Doerks</dc:creator>
    <dc:creator>LJ Jensen</dc:creator>
    <dc:creator>N Ward</dc:creator>
    <dc:creator>P Bork</dc:creator>
    <dc:identifier>doi:10.1126/science.1133420</dc:identifier>
    <dc:source>Science, Vol. 315, No. 5815. (23 February 2007), pp. 1126-1130.</dc:source>
    <dc:date>2007-02-26T09:46:57-00:00</dc:date>
    <prism:publicationYear>2007</prism:publicationYear>
    <prism:publicationName>Science</prism:publicationName>
    <prism:volume>315</prism:volume>
    <prism:number>5815</prism:number>
    <prism:startingPage>1126</prism:startingPage>
    <prism:endingPage>1130</prism:endingPage>
    <prism:category>qual</prism:category>
</item>



<item rdf:about="http://www.citeulike.org/user/azazello/article/2311222">
    <title>Pyrobayes: an improved base caller for SNP discovery in pyrosequences</title>
    <link>http://www.citeulike.org/user/azazello/article/2311222</link>
    <description>&lt;i&gt;Nature Methods, Vol. 5, No. 2. (13 January 2008), pp. 179-181.&lt;/i&gt;</description>
    <dc:title>Pyrobayes: an improved base caller for SNP discovery in pyrosequences</dc:title>

    <dc:creator>Aaron Quinlan</dc:creator>
    <dc:creator>Donald Stewart</dc:creator>
    <dc:creator>Michael Strömberg</dc:creator>
    <dc:creator>Gábor Marth</dc:creator>
    <dc:identifier>doi:10.1038/nmeth.1172</dc:identifier>
    <dc:source>Nature Methods, Vol. 5, No. 2. (13 January 2008), pp. 179-181.</dc:source>
    <dc:date>2008-01-31T11:58:19-00:00</dc:date>
    <prism:publicationYear>2008</prism:publicationYear>
    <prism:publicationName>Nature Methods</prism:publicationName>
    <prism:issn>1548-7091</prism:issn>
    <prism:volume>5</prism:volume>
    <prism:number>2</prism:number>
    <prism:startingPage>179</prism:startingPage>
    <prism:endingPage>181</prism:endingPage>
    <prism:publisher>Nature Publishing Group</prism:publisher>
    <prism:category>qual</prism:category>
</item>



<item rdf:about="http://www.citeulike.org/user/azazello/article/373647">
    <title>Estimating Continuous Distributions in Bayesian Classifiers</title>
    <link>http://www.citeulike.org/user/azazello/article/373647</link>
    <description>&lt;i&gt;pp. 338-345.&lt;/i&gt;&lt;br /&gt;&lt;br /&gt;When modeling a probability distribution with a Bayesian network, we are faced with the problem of how to handle continuous variables. Most previous work has either solved the problem by discretizing, or assumed that the data are generated by a single Gaussian. In this paper we abandon the normality assumption and instead use statistical methods for nonparametric density estimation. For a naive Bayesian classifier, we present experimental results on a variety of natural and artificial domains,...</description>
    <dc:title>Estimating Continuous Distributions in Bayesian Classifiers</dc:title>

    <dc:creator>George John</dc:creator>
    <dc:creator>Pat Langley</dc:creator>
    <dc:source>pp. 338-345.</dc:source>
    <dc:date>2005-10-31T15:29:18-00:00</dc:date>
    <prism:startingPage>338</prism:startingPage>
    <prism:endingPage>345</prism:endingPage>
    <prism:category>cs</prism:category>
    <prism:category>frameshift</prism:category>
</item>



<item rdf:about="http://www.citeulike.org/user/azazello/article/1152377">
    <title>Hawkeye: a visual analytics tool for genome assemblies</title>
    <link>http://www.citeulike.org/user/azazello/article/1152377</link>
    <description>&lt;i&gt;Genome Biology, Vol. 8 (09 March 2007), R34.&lt;/i&gt;</description>
    <dc:title>Hawkeye: a visual analytics tool for genome assemblies</dc:title>

    <dc:creator>Michael Schatz</dc:creator>
    <dc:creator>Adam Phillippy</dc:creator>
    <dc:creator>Ben Shneiderman</dc:creator>
    <dc:creator>Steven Salzberg</dc:creator>
    <dc:identifier>doi:10.1186/gb-2007-8-3-r34</dc:identifier>
    <dc:source>Genome Biology, Vol. 8 (09 March 2007), R34.</dc:source>
    <dc:date>2007-03-10T01:45:11-00:00</dc:date>
    <prism:publicationYear>2007</prism:publicationYear>
    <prism:publicationName>Genome Biology</prism:publicationName>
    <prism:issn>1465-6906</prism:issn>
    <prism:volume>8</prism:volume>
    <prism:startingPage>R34</prism:startingPage>
    <prism:category>qual</prism:category>
</item>



<item rdf:about="http://www.citeulike.org/user/azazello/article/392364">
    <title>The SWISS-PROT protein knowledgebase and its supplement TrEMBL in 2003.</title>
    <link>http://www.citeulike.org/user/azazello/article/392364</link>
    <description>&lt;i&gt;Nucleic Acids Res, Vol. 31, No. 1. (1 January 2003), pp. 365-370.&lt;/i&gt;&lt;br /&gt;&lt;br /&gt;The SWISS-PROT protein knowledgebase (http://www.expasy.org/sprot/ and http://www.ebi.ac.uk/swissprot/) connects amino acid sequences with the current knowledge in the Life Sciences. Each protein entry provides an interdisciplinary overview of relevant information by bringing together experimental results, computed features and sometimes even contradictory conclusions. Detailed expertise that goes beyond the scope of SWISS-PROT is made available via direct links to specialised databases. SWISS-PROT provides annotated entries for all species, but concentrates on the annotation of entries from human (the HPI project) and other model organisms to ensure the presence of high quality annotation for representative members of all protein families. Part of the annotation can be transferred to other family members, as is already done for microbes by the High-quality Automated and Manual Annotation of microbial Proteomes (HAMAP) project. Protein families and groups of proteins are regularly reviewed to keep up with current scientific findings. Complementarily, TrEMBL strives to comprise all protein sequences that are not yet represented in SWISS-PROT, by incorporating a perpetually increasing level of mostly automated annotation. Researchers are welcome to contribute their knowledge to the scientific community by submitting relevant findings to SWISS-PROT at swiss-prot@expasy.org.</description>
    <dc:title>The SWISS-PROT protein knowledgebase and its supplement TrEMBL in 2003.</dc:title>

    <dc:creator>B Boeckmann</dc:creator>
    <dc:creator>A Bairoch</dc:creator>
    <dc:creator>R Apweiler</dc:creator>
    <dc:creator>MC Blatter</dc:creator>
    <dc:creator>A Estreicher</dc:creator>
    <dc:creator>E Gasteiger</dc:creator>
    <dc:creator>MJ Martin</dc:creator>
    <dc:creator>K Michoud</dc:creator>
    <dc:creator>C O'Donovan</dc:creator>
    <dc:creator>I Phan</dc:creator>
    <dc:creator>S Pilbout</dc:creator>
    <dc:creator>M Schneider</dc:creator>
    <dc:identifier>doi:10.1093/nar/gkg095</dc:identifier>
    <dc:source>Nucleic Acids Res, Vol. 31, No. 1. (1 January 2003), pp. 365-370.</dc:source>
    <dc:date>2005-11-14T15:37:40-00:00</dc:date>
    <prism:publicationYear>2003</prism:publicationYear>
    <prism:publicationName>Nucleic Acids Res</prism:publicationName>
    <prism:issn>1362-4962</prism:issn>
    <prism:volume>31</prism:volume>
    <prism:number>1</prism:number>
    <prism:startingPage>365</prism:startingPage>
    <prism:endingPage>370</prism:endingPage>
    <prism:category>no-tag</prism:category>
</item>



<item rdf:about="http://www.citeulike.org/user/azazello/article/340715">
    <title>Data Mining: Practical Machine Learning Tools and Techniques, Second Edition (Morgan Kaufmann Series in Data Management Systems)</title>
    <link>http://www.citeulike.org/user/azazello/article/340715</link>
    <description>&lt;i&gt;(08 June 2005)&lt;/i&gt;&lt;br /&gt;&lt;br /&gt;As with any burgeoning technology that enjoys commercial attention, the use of data mining is surrounded by a great deal of hype. Exaggerated reports tell of secrets that can be uncovered by setting algorithms loose on oceans of data. But there is no magic in machine learning, no hidden power, no alchemy. Instead there is an identifiable body of practical techniques that can extract useful information from raw data. This book describes these techniques and shows how they work.&lt;br /&gt;&lt;br /&gt;The book is a major revision of the first edition that appeared in 1999. While the basic core remains the same, it has been updated to reflect the changes that have taken place over five years, and now has nearly double the references. The highlights for the new edition include thirty new technique sections; an enhanced Weka machine learning workbench, which now features an interactive interface; comprehensive information on neural networks; a new section on Bayesian networks; plus much more.&lt;br /&gt;&lt;br /&gt;+ Authors, Ian Witten and Eibe Frank, recipients of the 2005 ACM SIGKDD Service Award.&lt;br /&gt;+ Algorithmic methods at the heart of successful data mining, including tried and true techniques as well as leading edge methods;&lt;br /&gt;+ Performance improvement techniques that work by transforming the input or output;&lt;br /&gt;+ Downloadable Weka, a collection of machine learning algorithms for data mining tasks, including tools for data pre-processing, classification, regression, clustering, association rules, and visualization, in a new, interactive interface.</description>
    <dc:title>Data Mining: Practical Machine Learning Tools and Techniques, Second Edition (Morgan Kaufmann Series in Data Management Systems)</dc:title>

    <dc:creator>Ian Witten</dc:creator>
    <dc:creator>Eibe Frank</dc:creator>
    <dc:source>(08 June 2005)</dc:source>
    <dc:date>2005-10-04T14:35:45-00:00</dc:date>
    <prism:publicationYear>2005</prism:publicationYear>
    <prism:publisher>Morgan Kaufmann</prism:publisher>
    <prism:category>no-tag</prism:category>
</item>



<item rdf:about="http://www.citeulike.org/user/azazello/article/2291023">
    <title>Projector: automatic contig mapping for gap closure purposes.</title>
    <link>http://www.citeulike.org/user/azazello/article/2291023</link>
    <description>&lt;i&gt;Nucleic Acids Res, Vol. 31, No. 22. (15 November 2003)&lt;/i&gt;&lt;br /&gt;&lt;br /&gt;Projector was designed for automatic positioning of contigs from an unfinished prokaryotic genome onto a template genome of a closely related strain or species. Projector mapped 84 contigs of Lactococcus lactis MG1363 (corresponding to 81% of the assembly nucleotides) against the genome of L.lactis IL1403. Ninety three percent of subsequent gap closure PCRs were successful. Moreover, a significant improvement in the N50 and N80 values (describing the assembly quality) was observed after the use of Projector. Because increasing numbers of bacterial genomes are being sequenced, Projector provides an efficient method to close a significant number of remaining gaps in the late stages of a genome sequencing project.</description>
    <dc:title>Projector: automatic contig mapping for gap closure purposes.</dc:title>

    <dc:creator>SA van Hijum</dc:creator>
    <dc:creator>AL Zomer</dc:creator>
    <dc:creator>OP Kuipers</dc:creator>
    <dc:creator>J Kok</dc:creator>
    <dc:source>Nucleic Acids Res, Vol. 31, No. 22. (15 November 2003)</dc:source>
    <dc:date>2008-01-25T17:13:10-00:00</dc:date>
    <prism:publicationYear>2003</prism:publicationYear>
    <prism:publicationName>Nucleic Acids Res</prism:publicationName>
    <prism:issn>1362-4962</prism:issn>
    <prism:volume>31</prism:volume>
    <prism:number>22</prism:number>
    <prism:category>scaffolding</prism:category>
</item>



<item rdf:about="http://www.citeulike.org/user/azazello/article/2291022">
    <title>Projector 2: contig mapping for efficient gap-closure of prokaryotic genome sequence assemblies.</title>
    <link>http://www.citeulike.org/user/azazello/article/2291022</link>
    <description>&lt;i&gt;Nucleic Acids Res, Vol. 33, No. Web Server issue. (1 July 2005)&lt;/i&gt;&lt;br /&gt;&lt;br /&gt;With genome sequencing efforts increasing exponentially, valuable information accumulates on genomic content of the various organisms sequenced. Projector 2 uses (un)finished genomic sequences of an organism as a template to infer linkage information for a genome sequence assembly of a related organism being sequenced. The remaining gaps between contigs for which no linkage information is present can subsequently be closed with direct PCR strategies. Compared with other implementations, Projector 2 has several distinctive features: a user-friendly web interface, automatic removal of repetitive elements (repeat-masking) and automated primer design for gap-closure purposes. Moreover, when using multiple fragments of a template genome, primers for multiplex PCR strategies can also be designed. Primer design takes into account that, in many cases, contig ends contain unreliable DNA sequences and repetitive sequences. Closing the remaining gaps in prokaryotic genome sequence assemblies is thereby made very efficient and virtually effortless. We demonstrate that the use of single or multiple fragments of a template genome (i.e. unfinished genome sequences) in combination with repeat-masking results in mapping success rates close to 100%. The web interface is freely accessible at http://molgen.biol.rug.nl/websoftware/projector2.</description>
    <dc:title>Projector 2: contig mapping for efficient gap-closure of prokaryotic genome sequence assemblies.</dc:title>

    <dc:creator>SA van Hijum</dc:creator>
    <dc:creator>AL Zomer</dc:creator>
    <dc:creator>OP Kuipers</dc:creator>
    <dc:creator>J Kok</dc:creator>
    <dc:source>Nucleic Acids Res, Vol. 33, No. Web Server issue. (1 July 2005)</dc:source>
    <dc:date>2008-01-25T17:12:37-00:00</dc:date>
    <prism:publicationYear>2005</prism:publicationYear>
    <prism:publicationName>Nucleic Acids Res</prism:publicationName>
    <prism:issn>1362-4962</prism:issn>
    <prism:volume>33</prism:volume>
    <prism:number>Web Server issue</prism:number>
    <prism:category>qual</prism:category>
    <prism:category>scaffolding</prism:category>
</item>



<item rdf:about="http://www.citeulike.org/user/azazello/article/1472428">
    <title>Accuracy and quality of massively-parallel DNA pyrosequencing</title>
    <link>http://www.citeulike.org/user/azazello/article/1472428</link>
    <description>&lt;i&gt;Genome Biology, Vol. 8 (20 July 2007), R143.&lt;/i&gt;</description>
    <dc:title>Accuracy and quality of massively-parallel DNA pyrosequencing</dc:title>

    <dc:creator>Susan Huse</dc:creator>
    <dc:creator>Julie Huber</dc:creator>
    <dc:creator>Hilary Morrison</dc:creator>
    <dc:creator>Mitchell Sogin</dc:creator>
    <dc:creator>David Welch</dc:creator>
    <dc:identifier>doi:10.1186/gb-2007-8-7-r143</dc:identifier>
    <dc:source>Genome Biology, Vol. 8 (20 July 2007), R143.</dc:source>
    <dc:date>2007-07-22T01:21:27-00:00</dc:date>
    <prism:publicationYear>2007</prism:publicationYear>
    <prism:publicationName>Genome Biology</prism:publicationName>
    <prism:issn>1465-6906</prism:issn>
    <prism:volume>8</prism:volume>
    <prism:startingPage>R143</prism:startingPage>
    <prism:category>454</prism:category>
</item>



<item rdf:about="http://www.citeulike.org/user/azazello/article/2086009">
    <title>Conservation patterns in different functional sequence categories of divergent Drosophila species.</title>
    <link>http://www.citeulike.org/user/azazello/article/2086009</link>
    <description>&lt;i&gt;Genomics, Vol. 88, No. 4. (October 2006), pp. 431-442.&lt;/i&gt;&lt;br /&gt;&lt;br /&gt;We have explored the distributions of fully conserved ungapped blocks in genome-wide pair-wise alignments of recently completed species of Drosophila: D. melanogaster, D. yakuba, D. ananassae, D. pseudoobscura, D. virilis, and D. mojavensis. Based on these distributions we have found that nearly every functional sequence category possesses its own distinctive conservation pattern, sometimes independent of the overall sequence conservation level. In the coding and regulatory regions, the ungapped blocks were longer than in introns, UTRs, and nonfunctional sequences. At the same time, the blocks in the coding regions carried a 3N + 2 signature characteristic of synonymous substitutions in the third-codon position. Larger block sizes in transcription regulatory regions can be explained by the presence of conserved arrays of binding sites for transcription factors. We also have shown that the longest ungapped blocks, or &#34;ultraconserved&#34; sequences, are associated with specific gene groups, including those encoding ion channels and components of the cytoskeleton. We discuss how restraining conservation patterns may help in mapping functional sequence categories and improve genome annotation.</description>
    <dc:title>Conservation patterns in different functional sequence categories of divergent Drosophila species.</dc:title>

    <dc:creator>D Papatsenko</dc:creator>
    <dc:creator>A Kislyuk</dc:creator>
    <dc:creator>M Levine</dc:creator>
    <dc:creator>I Dubchak</dc:creator>
    <dc:identifier>doi:10.1016/j.ygeno.2006.03.012</dc:identifier>
    <dc:source>Genomics, Vol. 88, No. 4. (October 2006), pp. 431-442.</dc:source>
    <dc:date>2007-12-10T15:18:15-00:00</dc:date>
    <prism:publicationYear>2006</prism:publicationYear>
    <prism:publicationName>Genomics</prism:publicationName>
    <prism:issn>0888-7543</prism:issn>
    <prism:volume>88</prism:volume>
    <prism:number>4</prism:number>
    <prism:startingPage>431</prism:startingPage>
    <prism:endingPage>442</prism:endingPage>
    <prism:category>no-tag</prism:category>
</item>



<item rdf:about="http://www.citeulike.org/user/azazello/article/238188">
    <title>Gapped BLAST and PSI-BLAST: a new generation of protein database search programs.</title>
    <link>http://www.citeulike.org/user/azazello/article/238188</link>
    <description>&lt;i&gt;Nucleic Acids Res, Vol. 25, No. 17. (1 September 1997), pp. 3389-3402.&lt;/i&gt;&lt;br /&gt;&lt;br /&gt;The BLAST programs are widely used tools for searching protein and DNA databases for sequence similarities. For protein comparisons, a variety of definitional, algorithmic and statistical refinements described here permits the execution time of the BLAST programs to be decreased substantially while enhancing their sensitivity to weak similarities. A new criterion for triggering the extension of word hits, combined with a new heuristic for generating gapped alignments, yields a gapped BLAST program that runs at approximately three times the speed of the original. In addition, a method is introduced for automatically combining statistically significant alignments produced by BLAST into a position-specific score matrix, and searching the database using this matrix. The resulting Position-Specific Iterated BLAST (PSI-BLAST) program runs at approximately the same speed per iteration as gapped BLAST, but in many cases is much more sensitive to weak but biologically relevant sequence similarities. PSI-BLAST is used to uncover several new and interesting members of the BRCT superfamily.</description>
    <dc:title>Gapped BLAST and PSI-BLAST: a new generation of protein database search programs.</dc:title>

    <dc:creator>SF Altschul</dc:creator>
    <dc:creator>TL Madden</dc:creator>
    <dc:creator>AA Schäffer</dc:creator>
    <dc:creator>J Zhang</dc:creator>
    <dc:creator>Z Zhang</dc:creator>
    <dc:creator>W Miller</dc:creator>
    <dc:creator>DJ Lipman</dc:creator>
    <dc:identifier>doi:10.1093/nar/25.17.3389</dc:identifier>
    <dc:source>Nucleic Acids Res, Vol. 25, No. 17. (1 September 1997), pp. 3389-3402.</dc:source>
    <dc:date>2005-06-26T00:48:58-00:00</dc:date>
    <prism:publicationYear>1997</prism:publicationYear>
    <prism:publicationName>Nucleic Acids Res</prism:publicationName>
    <prism:issn>0305-1048</prism:issn>
    <prism:volume>25</prism:volume>
    <prism:number>17</prism:number>
    <prism:startingPage>3389</prism:startingPage>
    <prism:endingPage>3402</prism:endingPage>
    <prism:category>no-tag</prism:category>
</item>



</rdf:RDF>

