<?xml version="1.0" encoding="UTF-8"?>

<rdf:RDF
   xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
   xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
   xmlns="http://purl.org/rss/1.0/"
   xmlns:dc="http://purl.org/dc/elements/1.1/"
   xmlns:prism="http://prismstandard.org/namespaces/1.2/basic/"
   xmlns:dcterms="http://purl.org/dc/terms/"

>
<channel rdf:about="http://www.citeulike.org/about">
<pubDate>Thu, 21 Aug 2008 07:03:07 BST</pubDate>


	<title>CiteULike: indigoviolet's Abril</title>
	<description>CiteULike: indigoviolet's Abril</description>


	<link>http://www.citeulike.org/user/indigoviolet/author/Abril</link>
	<dc:publisher>CiteULike.org</dc:publisher>
	<dc:language>en-gb</dc:language>
	<dc:rights>Copyright &#169; 2004-2008 citeulike.org</dc:rights>
	<items>
    <rdf:Seq>
        <rdf:li rdf:resource="http://www.citeulike.org/user/indigoviolet/article/936036"/>
        <rdf:li rdf:resource="http://www.citeulike.org/user/indigoviolet/article/229"/>
        <rdf:li rdf:resource="http://www.citeulike.org/user/indigoviolet/article/1145104"/>
        <rdf:li rdf:resource="http://www.citeulike.org/user/indigoviolet/article/972746"/>
        <rdf:li rdf:resource="http://www.citeulike.org/user/indigoviolet/article/392827"/>

	</rdf:Seq>
	</items>
	</channel>


<item rdf:about="http://www.citeulike.org/user/indigoviolet/article/936036">
    <title>EGASP: the human ENCODE Genome Annotation Assessment Project.</title>
    <link>http://www.citeulike.org/user/indigoviolet/article/936036</link>
    <description>&lt;i&gt;Genome Biol, Vol. 7 Suppl 1 (2006)&lt;/i&gt;&lt;br /&gt;&lt;br /&gt;BACKGROUND: We present the results of EGASP, a community experiment to assess the state-of-the-art in genome annotation within the ENCODE regions, which span 1% of the human genome sequence. The experiment had two major goals: the assessment of the accuracy of computational methods to predict protein coding genes; and the overall assessment of the completeness of the current human genome annotations as represented in the ENCODE regions. For the computational prediction assessment, eighteen groups contributed gene predictions. We evaluated these submissions against each other based on a 'reference set' of annotations generated as part of the GENCODE project. These annotations were not available to the prediction groups prior to the submission deadline, so that their predictions were blind and an external advisory committee could perform a fair assessment. RESULTS: The best methods had at least one gene transcript correctly predicted for close to 70% of the annotated genes. Nevertheless, the multiple transcript accuracy, taking into account alternative splicing, reached only approximately 40% to 50% accuracy. At the coding nucleotide level, the best programs reached an accuracy of 90% in both sensitivity and specificity. Programs relying on mRNA and protein sequences were the most accurate in reproducing the manually curated annotations. Experimental validation shows that only a very small percentage (3.2%) of the selected 221 computationally predicted exons outside of the existing annotation could be verified. CONCLUSION: This is the first such experiment in human DNA, and we have followed the standards established in a similar experiment, GASP1, in Drosophila melanogaster. 
We believe the results presented here contribute to the value of ongoing large-scale annotation projects and should guide further experimental methods when being scaled up to the entire human genome sequence.</description>
    <dc:title>EGASP: the human ENCODE Genome Annotation Assessment Project.</dc:title>

    <dc:creator>R Guigó</dc:creator>
    <dc:creator>P Flicek</dc:creator>
    <dc:creator>JF Abril</dc:creator>
    <dc:creator>A Reymond</dc:creator>
    <dc:creator>J Lagarde</dc:creator>
    <dc:creator>F Denoeud</dc:creator>
    <dc:creator>S Antonarakis</dc:creator>
    <dc:creator>M Ashburner</dc:creator>
    <dc:creator>VB Bajic</dc:creator>
    <dc:creator>E Birney</dc:creator>
    <dc:creator>R Castelo</dc:creator>
    <dc:creator>E Eyras</dc:creator>
    <dc:creator>C Ucla</dc:creator>
    <dc:creator>TR Gingeras</dc:creator>
    <dc:creator>J Harrow</dc:creator>
    <dc:creator>T Hubbard</dc:creator>
    <dc:creator>SE Lewis</dc:creator>
    <dc:creator>MG Reese</dc:creator>
    <dc:identifier>doi:10.1186/gb-2006-7-s1-s2</dc:identifier>
    <dc:source>Genome Biol, Vol. 7 Suppl 1 (2006)</dc:source>
    <dc:date>2006-11-08T10:04:37-00:00</dc:date>
    <prism:publicationYear>2006</prism:publicationYear>
    <prism:publicationName>Genome Biol</prism:publicationName>
    <prism:issn>1465-6914</prism:issn>
    <prism:volume>7 Suppl 1</prism:volume>
    <prism:category>evaluation</prism:category>
    <prism:category>gene-prediction</prism:category>
</item>



<item rdf:about="http://www.citeulike.org/user/indigoviolet/article/229">
    <title>Initial sequencing and comparative analysis of the mouse genome.</title>
    <link>http://www.citeulike.org/user/indigoviolet/article/229</link>
    <description>&lt;i&gt;Nature, Vol. 420, No. 6915. (5 December 2002), pp. 520-562.&lt;/i&gt;&lt;br /&gt;&lt;br /&gt;The sequence of the mouse genome is a key informational tool for understanding the contents of the human genome and a key experimental tool for biomedical research. Here, we report the results of an international collaboration to produce a high-quality draft sequence of the mouse genome. We also present an initial comparative analysis of the mouse and human genomes, describing some of the insights that can be gleaned from the two sequences. We discuss topics including the analysis of the evolutionary forces shaping the size, structure and sequence of the genomes; the conservation of large-scale synteny across most of the genomes; the much lower extent of sequence orthology covering less than half of the genomes; the proportions of the genomes under selection; the number of protein-coding genes; the expansion of gene families related to reproduction and immunity; the evolution of proteins; and the identification of intraspecies polymorphism.</description>
    <dc:title>Initial sequencing and comparative analysis of the mouse genome.</dc:title>

    <dc:creator>RH Waterston</dc:creator>
    <dc:creator>K Lindblad-Toh</dc:creator>
    <dc:creator>E Birney</dc:creator>
    <dc:creator>J Rogers</dc:creator>
    <dc:creator>JF Abril</dc:creator>
    <dc:creator>P Agarwal</dc:creator>
    <dc:creator>R Agarwala</dc:creator>
    <dc:creator>R Ainscough</dc:creator>
    <dc:creator>M Alexandersson</dc:creator>
    <dc:creator>P An</dc:creator>
    <dc:creator>SE Antonarakis</dc:creator>
    <dc:creator>J Attwood</dc:creator>
    <dc:creator>R Baertsch</dc:creator>
    <dc:creator>J Bailey</dc:creator>
    <dc:creator>K Barlow</dc:creator>
    <dc:creator>S Beck</dc:creator>
    <dc:creator>E Berry</dc:creator>
    <dc:creator>B Birren</dc:creator>
    <dc:creator>T Bloom</dc:creator>
    <dc:creator>P Bork</dc:creator>
    <dc:creator>M Botcherby</dc:creator>
    <dc:creator>N Bray</dc:creator>
    <dc:creator>MR Brent</dc:creator>
    <dc:creator>DG Brown</dc:creator>
    <dc:creator>SD Brown</dc:creator>
    <dc:creator>C Bult</dc:creator>
    <dc:creator>J Burton</dc:creator>
    <dc:creator>J Butler</dc:creator>
    <dc:creator>RD Campbell</dc:creator>
    <dc:creator>P Carninci</dc:creator>
    <dc:creator>S Cawley</dc:creator>
    <dc:creator>F Chiaromonte</dc:creator>
    <dc:creator>AT Chinwalla</dc:creator>
    <dc:creator>DM Church</dc:creator>
    <dc:creator>M Clamp</dc:creator>
    <dc:creator>C Clee</dc:creator>
    <dc:creator>FS Collins</dc:creator>
    <dc:creator>LL Cook</dc:creator>
    <dc:creator>RR Copley</dc:creator>
    <dc:creator>A Coulson</dc:creator>
    <dc:creator>O Couronne</dc:creator>
    <dc:creator>J Cuff</dc:creator>
    <dc:creator>V Curwen</dc:creator>
    <dc:creator>T Cutts</dc:creator>
    <dc:creator>M Daly</dc:creator>
    <dc:creator>R David</dc:creator>
    <dc:creator>J Davies</dc:creator>
    <dc:creator>KD Delehaunty</dc:creator>
    <dc:creator>J Deri</dc:creator>
    <dc:creator>ET Dermitzakis</dc:creator>
    <dc:creator>C Dewey</dc:creator>
    <dc:creator>NJ Dickens</dc:creator>
    <dc:creator>M Diekhans</dc:creator>
    <dc:creator>S Dodge</dc:creator>
    <dc:creator>I Dubchak</dc:creator>
    <dc:creator>DM Dunn</dc:creator>
    <dc:creator>SR Eddy</dc:creator>
    <dc:creator>L Elnitski</dc:creator>
    <dc:creator>RD Emes</dc:creator>
    <dc:creator>P Eswara</dc:creator>
    <dc:creator>E Eyras</dc:creator>
    <dc:creator>A Felsenfeld</dc:creator>
    <dc:creator>GA Fewell</dc:creator>
    <dc:creator>P Flicek</dc:creator>
    <dc:creator>K Foley</dc:creator>
    <dc:creator>WN Frankel</dc:creator>
    <dc:creator>LA Fulton</dc:creator>
    <dc:creator>RS Fulton</dc:creator>
    <dc:creator>TS Furey</dc:creator>
    <dc:creator>D Gage</dc:creator>
    <dc:creator>RA Gibbs</dc:creator>
    <dc:creator>G Glusman</dc:creator>
    <dc:creator>S Gnerre</dc:creator>
    <dc:creator>N Goldman</dc:creator>
    <dc:creator>L Goodstadt</dc:creator>
    <dc:creator>D Grafham</dc:creator>
    <dc:creator>TA Graves</dc:creator>
    <dc:creator>ED Green</dc:creator>
    <dc:creator>S Gregory</dc:creator>
    <dc:creator>R Guigó</dc:creator>
    <dc:creator>M Guyer</dc:creator>
    <dc:creator>RC Hardison</dc:creator>
    <dc:creator>D Haussler</dc:creator>
    <dc:creator>Y Hayashizaki</dc:creator>
    <dc:creator>LW Hillier</dc:creator>
    <dc:creator>A Hinrichs</dc:creator>
    <dc:creator>W Hlavina</dc:creator>
    <dc:creator>T Holzer</dc:creator>
    <dc:creator>F Hsu</dc:creator>
    <dc:creator>A Hua</dc:creator>
    <dc:creator>T Hubbard</dc:creator>
    <dc:creator>A Hunt</dc:creator>
    <dc:creator>I Jackson</dc:creator>
    <dc:creator>DB Jaffe</dc:creator>
    <dc:creator>LS Johnson</dc:creator>
    <dc:creator>M Jones</dc:creator>
    <dc:creator>TA Jones</dc:creator>
    <dc:creator>A Joy</dc:creator>
    <dc:creator>M Kamal</dc:creator>
    <dc:creator>EK Karlsson</dc:creator>
    <dc:creator>D Karolchik</dc:creator>
    <dc:creator>A Kasprzyk</dc:creator>
    <dc:creator>J Kawai</dc:creator>
    <dc:creator>E Keibler</dc:creator>
    <dc:creator>C Kells</dc:creator>
    <dc:creator>WJ Kent</dc:creator>
    <dc:creator>A Kirby</dc:creator>
    <dc:creator>DL Kolbe</dc:creator>
    <dc:creator>I Korf</dc:creator>
    <dc:creator>RS Kucherlapati</dc:creator>
    <dc:creator>EJ Kulbokas</dc:creator>
    <dc:creator>D Kulp</dc:creator>
    <dc:creator>T Landers</dc:creator>
    <dc:creator>JP Leger</dc:creator>
    <dc:creator>S Leonard</dc:creator>
    <dc:creator>I Letunic</dc:creator>
    <dc:creator>R Levine</dc:creator>
    <dc:creator>J Li</dc:creator>
    <dc:creator>M Li</dc:creator>
    <dc:creator>C Lloyd</dc:creator>
    <dc:creator>S Lucas</dc:creator>
    <dc:creator>B Ma</dc:creator>
    <dc:creator>DR Maglott</dc:creator>
    <dc:creator>ER Mardis</dc:creator>
    <dc:creator>L Matthews</dc:creator>
    <dc:creator>E Mauceli</dc:creator>
    <dc:creator>JH Mayer</dc:creator>
    <dc:creator>M McCarthy</dc:creator>
    <dc:creator>WR McCombie</dc:creator>
    <dc:creator>S McLaren</dc:creator>
    <dc:creator>K McLay</dc:creator>
    <dc:creator>JD McPherson</dc:creator>
    <dc:creator>J Meldrim</dc:creator>
    <dc:creator>B Meredith</dc:creator>
    <dc:creator>JP Mesirov</dc:creator>
    <dc:creator>W Miller</dc:creator>
    <dc:creator>TL Miner</dc:creator>
    <dc:creator>E Mongin</dc:creator>
    <dc:creator>KT Montgomery</dc:creator>
    <dc:creator>M Morgan</dc:creator>
    <dc:creator>R Mott</dc:creator>
    <dc:creator>JC Mullikin</dc:creator>
    <dc:creator>DM Muzny</dc:creator>
    <dc:creator>WE Nash</dc:creator>
    <dc:creator>JO Nelson</dc:creator>
    <dc:creator>MN Nhan</dc:creator>
    <dc:creator>R Nicol</dc:creator>
    <dc:creator>Z Ning</dc:creator>
    <dc:creator>C Nusbaum</dc:creator>
    <dc:creator>MJ O'Connor</dc:creator>
    <dc:creator>Y Okazaki</dc:creator>
    <dc:creator>K Oliver</dc:creator>
    <dc:creator>E Overton-Larty</dc:creator>
    <dc:creator>L Pachter</dc:creator>
    <dc:creator>G Parra</dc:creator>
    <dc:creator>KH Pepin</dc:creator>
    <dc:creator>J Peterson</dc:creator>
    <dc:creator>P Pevzner</dc:creator>
    <dc:creator>R Plumb</dc:creator>
    <dc:creator>CS Pohl</dc:creator>
    <dc:creator>A Poliakov</dc:creator>
    <dc:creator>TC Ponce</dc:creator>
    <dc:creator>CP Ponting</dc:creator>
    <dc:creator>S Potter</dc:creator>
    <dc:creator>M Quail</dc:creator>
    <dc:creator>A Reymond</dc:creator>
    <dc:creator>BA Roe</dc:creator>
    <dc:creator>KM Roskin</dc:creator>
    <dc:creator>EM Rubin</dc:creator>
    <dc:creator>AG Rust</dc:creator>
    <dc:creator>R Santos</dc:creator>
    <dc:creator>V Sapojnikov</dc:creator>
    <dc:creator>B Schultz</dc:creator>
    <dc:creator>J Schultz</dc:creator>
    <dc:creator>MS Schwartz</dc:creator>
    <dc:creator>S Schwartz</dc:creator>
    <dc:creator>C Scott</dc:creator>
    <dc:creator>S Seaman</dc:creator>
    <dc:creator>S Searle</dc:creator>
    <dc:creator>T Sharpe</dc:creator>
    <dc:creator>A Sheridan</dc:creator>
    <dc:creator>R Shownkeen</dc:creator>
    <dc:creator>S Sims</dc:creator>
    <dc:creator>JB Singer</dc:creator>
    <dc:creator>G Slater</dc:creator>
    <dc:creator>A Smit</dc:creator>
    <dc:creator>DR Smith</dc:creator>
    <dc:creator>B Spencer</dc:creator>
    <dc:creator>A Stabenau</dc:creator>
    <dc:creator>N Stange-Thomann</dc:creator>
    <dc:creator>C Sugnet</dc:creator>
    <dc:creator>M Suyama</dc:creator>
    <dc:creator>G Tesler</dc:creator>
    <dc:creator>J Thompson</dc:creator>
    <dc:creator>D Torrents</dc:creator>
    <dc:creator>E Trevaskis</dc:creator>
    <dc:creator>J Tromp</dc:creator>
    <dc:creator>C Ucla</dc:creator>
    <dc:creator>A Ureta-Vidal</dc:creator>
    <dc:creator>JP Vinson</dc:creator>
    <dc:creator>AC Von Niederhausern</dc:creator>
    <dc:creator>CM Wade</dc:creator>
    <dc:creator>M Wall</dc:creator>
    <dc:creator>RJ Weber</dc:creator>
    <dc:creator>RB Weiss</dc:creator>
    <dc:creator>MC Wendl</dc:creator>
    <dc:creator>AP West</dc:creator>
    <dc:creator>K Wetterstrand</dc:creator>
    <dc:creator>R Wheeler</dc:creator>
    <dc:creator>S Whelan</dc:creator>
    <dc:creator>J Wierzbowski</dc:creator>
    <dc:creator>D Willey</dc:creator>
    <dc:creator>S Williams</dc:creator>
    <dc:creator>RK Wilson</dc:creator>
    <dc:creator>E Winter</dc:creator>
    <dc:creator>KC Worley</dc:creator>
    <dc:creator>D Wyman</dc:creator>
    <dc:creator>S Yang</dc:creator>
    <dc:creator>SP Yang</dc:creator>
    <dc:creator>EM Zdobnov</dc:creator>
    <dc:creator>MC Zody</dc:creator>
    <dc:creator>ES Lander</dc:creator>
    <dc:identifier>doi:10.1038/nature01262</dc:identifier>
    <dc:source>Nature, Vol. 420, No. 6915. (5 December 2002), pp. 520-562.</dc:source>
    <dc:date>2004-11-22T00:17:30-00:00</dc:date>
    <prism:publicationYear>2002</prism:publicationYear>
    <prism:publicationName>Nature</prism:publicationName>
    <prism:issn>0028-0836</prism:issn>
    <prism:volume>420</prism:volume>
    <prism:number>6915</prism:number>
    <prism:startingPage>520</prism:startingPage>
    <prism:endingPage>562</prism:endingPage>
    <prism:category>genome</prism:category>
</item>



<item rdf:about="http://www.citeulike.org/user/indigoviolet/article/1145104">
    <title>Comparative gene prediction in human and mouse.</title>
    <link>http://www.citeulike.org/user/indigoviolet/article/1145104</link>
    <description>&lt;i&gt;Genome Res, Vol. 13, No. 1. (January 2003), pp. 108-117.&lt;/i&gt;&lt;br /&gt;&lt;br /&gt;The completion of the sequencing of the mouse genome promises to help predict human genes with greater accuracy. While current ab initio gene prediction programs are remarkably sensitive (i.e., they predict at least a fragment of most genes), their specificity is often low, predicting a large number of false-positive genes in the human genome. Sequence conservation at the protein level with the mouse genome can help eliminate some of those false positives. Here we describe SGP2, a gene prediction program that combines ab initio gene prediction with TBLASTX searches between two genome sequences to provide both sensitive and specific gene predictions. The accuracy of SGP2 when used to predict genes by comparing the human and mouse genomes is assessed on a number of data sets, including single-gene data sets, the highly curated human chromosome 22 predictions, and entire genome predictions from ENSEMBL. Results indicate that SGP2 outperforms purely ab initio gene prediction methods. Results also indicate that SGP2 works about as well with 3x shotgun data as it does with fully assembled genomes. SGP2 provides a high enough specificity that its predictions can be experimentally verified at a reasonable cost. SGP2 was used to generate a complete set of gene predictions on both the human and mouse by comparing the genomes of these two species. Our results suggest that another few thousand human and mouse genes currently not in ENSEMBL are worth verifying experimentally.</description>
    <dc:title>Comparative gene prediction in human and mouse.</dc:title>

    <dc:creator>G Parra</dc:creator>
    <dc:creator>P Agarwal</dc:creator>
    <dc:creator>JF Abril</dc:creator>
    <dc:creator>T Wiehe</dc:creator>
    <dc:creator>JW Fickett</dc:creator>
    <dc:creator>R Guigó</dc:creator>
    <dc:identifier>doi:10.1101/gr.871403</dc:identifier>
    <dc:source>Genome Res, Vol. 13, No. 1. (January 2003), pp. 108-117.</dc:source>
    <dc:date>2007-03-07T04:20:50-00:00</dc:date>
    <prism:publicationYear>2003</prism:publicationYear>
    <prism:publicationName>Genome Res</prism:publicationName>
    <prism:issn>1088-9051</prism:issn>
    <prism:volume>13</prism:volume>
    <prism:number>1</prism:number>
    <prism:startingPage>108</prism:startingPage>
    <prism:endingPage>117</prism:endingPage>
    <prism:category>algorithm</prism:category>
    <prism:category>gene-prediction</prism:category>
</item>



<item rdf:about="http://www.citeulike.org/user/indigoviolet/article/972746">
    <title>Genome annotation assessment in Drosophila melanogaster.</title>
    <link>http://www.citeulike.org/user/indigoviolet/article/972746</link>
    <description>&lt;i&gt;Genome Res, Vol. 10, No. 4. (April 2000), pp. 483-501.&lt;/i&gt;&lt;br /&gt;&lt;br /&gt;Computational methods for automated genome annotation are critical to our community's ability to make full use of the large volume of genomic sequence being generated and released. To explore the accuracy of these automated feature prediction tools in the genomes of higher organisms, we evaluated their performance on a large, well-characterized sequence contig from the Adh region of Drosophila melanogaster. This experiment, known as the Genome Annotation Assessment Project (GASP), was launched in May 1999. Twelve groups, applying state-of-the-art tools, contributed predictions for features including gene structure, protein homologies, promoter sites, and repeat elements. We evaluated these predictions using two standards, one based on previously unreleased high-quality full-length cDNA sequences and a second based on the set of annotations generated as part of an in-depth study of the region by a group of Drosophila experts. Although these standard sets only approximate the unknown distribution of features in this region, we believe that when taken in context the results of an evaluation based on them are meaningful. The results were presented as a tutorial at the conference on Intelligent Systems in Molecular Biology (ISMB-99) in August 1999. Over 95% of the coding nucleotides in the region were correctly identified by the majority of the gene finders, and the correct intron/exon structures were predicted for &#62;40% of the genes. Homology-based annotation techniques recognized and associated functions with almost half of the genes in the region; the remainder were only identified by the ab initio techniques. This experiment also presents the first assessment of promoter prediction techniques for a significant number of genes in a large contiguous region. 
We discovered that the promoter predictors' high false-positive rates make their predictions difficult to use. Integrating gene finding and cDNA/EST alignments with promoter predictions decreases the number of false-positive classifications but discovers less than one-third of the promoters in the region. We believe that by establishing standards for evaluating genomic annotations and by assessing the performance of existing automated genome annotation tools, this experiment establishes a baseline that contributes to the value of ongoing large-scale annotation projects and should guide further research in genome informatics.</description>
    <dc:title>Genome annotation assessment in Drosophila melanogaster.</dc:title>

    <dc:creator>MG Reese</dc:creator>
    <dc:creator>G Hartzell</dc:creator>
    <dc:creator>NL Harris</dc:creator>
    <dc:creator>U Ohler</dc:creator>
    <dc:creator>JF Abril</dc:creator>
    <dc:creator>SE Lewis</dc:creator>
    <dc:identifier>doi:10.1101/gr.10.4.483</dc:identifier>
    <dc:source>Genome Res, Vol. 10, No. 4. (April 2000), pp. 483-501.</dc:source>
    <dc:date>2006-12-04T03:44:36-00:00</dc:date>
    <prism:publicationYear>2000</prism:publicationYear>
    <prism:publicationName>Genome Res</prism:publicationName>
    <prism:issn>1088-9051</prism:issn>
    <prism:volume>10</prism:volume>
    <prism:number>4</prism:number>
    <prism:startingPage>483</prism:startingPage>
    <prism:endingPage>501</prism:endingPage>
    <prism:category>annotation</prism:category>
    <prism:category>evaluation</prism:category>
    <prism:category>gene-prediction</prism:category>
</item>



<item rdf:about="http://www.citeulike.org/user/indigoviolet/article/392827">
    <title>An assessment of gene prediction accuracy in large DNA sequences.</title>
    <link>http://www.citeulike.org/user/indigoviolet/article/392827</link>
    <description>&lt;i&gt;Genome Res, Vol. 10, No. 10. (October 2000), pp. 1631-1642.&lt;/i&gt;&lt;br /&gt;&lt;br /&gt;One of the first useful products from the human genome will be a set of predicted genes. Besides its intrinsic scientific interest, the accuracy and completeness of this data set is of considerable importance for human health and medicine. Though progress has been made on computational gene identification in terms of both methods and accuracy evaluation measures, most of the sequence sets in which the programs are tested are short genomic sequences, and there is concern that these accuracy measures may not extrapolate well to larger, more challenging data sets. Given the absence of experimentally verified large genomic data sets, we constructed a semiartificial test set comprising a number of short single-gene genomic sequences with randomly generated intergenic regions. This test set, which should still present an easier problem than real human genomic sequence, mimics the approximately 200kb long BACs being sequenced. In our experiments with these longer genomic sequences, the accuracy of GENSCAN, one of the most accurate ab initio gene prediction programs, dropped significantly, although its sensitivity remained high. Conversely, the accuracy of similarity-based programs, such as GENEWISE, PROCRUSTES, and BLASTX was not affected significantly by the presence of random intergenic sequence, but depended on the strength of the similarity to the protein homolog. As expected, the accuracy dropped if the models were built using more distant homologs, and we were able to quantitatively estimate this decline. However, the specificities of these techniques are still rather good even when the similarity is weak, which is a desirable characteristic for driving expensive follow-up experiments. 
Our experiments suggest that though gene prediction will improve with every new protein that is discovered and through improvements in the current set of tools, we still have a long way to go before we can decipher the precise exonic structure of every gene in the human genome using purely computational methodology.</description>
    <dc:title>An assessment of gene prediction accuracy in large DNA sequences.</dc:title>

    <dc:creator>R Guigó</dc:creator>
    <dc:creator>P Agarwal</dc:creator>
    <dc:creator>JF Abril</dc:creator>
    <dc:creator>M Burset</dc:creator>
    <dc:creator>JW Fickett</dc:creator>
    <dc:identifier>doi:10.1101/gr.122800</dc:identifier>
    <dc:source>Genome Res, Vol. 10, No. 10. (October 2000), pp. 1631-1642.</dc:source>
    <dc:date>2005-11-15T00:45:58-00:00</dc:date>
    <prism:publicationYear>2000</prism:publicationYear>
    <prism:publicationName>Genome Res</prism:publicationName>
    <prism:issn>1088-9051</prism:issn>
    <prism:volume>10</prism:volume>
    <prism:number>10</prism:number>
    <prism:startingPage>1631</prism:startingPage>
    <prism:endingPage>1642</prism:endingPage>
    <prism:category>evaluation</prism:category>
    <prism:category>gene-prediction</prism:category>
</item>



</rdf:RDF>

