<?xml version="1.0" encoding="UTF-8"?>

<rdf:RDF
   xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
   xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
   xmlns="http://purl.org/rss/1.0/"
   xmlns:dc="http://purl.org/dc/elements/1.1/"
   xmlns:prism="http://prismstandard.org/namespaces/1.2/basic/"
   xmlns:dcterms="http://purl.org/dc/terms/"

>
<channel rdf:about="http://www.citeulike.org/about">
<pubDate>Sat, 26 Jul 2008 08:12:06 BST</pubDate>


	<title>CiteULike: vlachmore's normalization</title>
	<description>CiteULike: vlachmore's normalization</description>


	<link>http://www.citeulike.org/user/vlachmore/tag/normalization</link>
	<dc:publisher>CiteULike.org</dc:publisher>
	<dc:language>en-gb</dc:language>
	<dc:rights>Copyright &#169; 2004-2008 citeulike.org</dc:rights>
	<items>
    <rdf:Seq>
        <rdf:li rdf:resource="http://www.citeulike.org/user/vlachmore/article/2670160"/>
        <rdf:li rdf:resource="http://www.citeulike.org/user/vlachmore/article/2146646"/>
        <rdf:li rdf:resource="http://www.citeulike.org/user/vlachmore/article/2137818"/>
        <rdf:li rdf:resource="http://www.citeulike.org/user/vlachmore/article/2137810"/>
        <rdf:li rdf:resource="http://www.citeulike.org/user/vlachmore/article/892146"/>
        <rdf:li rdf:resource="http://www.citeulike.org/user/vlachmore/article/2137707"/>
        <rdf:li rdf:resource="http://www.citeulike.org/user/vlachmore/article/493764"/>
        <rdf:li rdf:resource="http://www.citeulike.org/user/vlachmore/article/2137573"/>
        <rdf:li rdf:resource="http://www.citeulike.org/user/vlachmore/article/604600"/>
        <rdf:li rdf:resource="http://www.citeulike.org/user/vlachmore/article/1626506"/>
        <rdf:li rdf:resource="http://www.citeulike.org/user/vlachmore/article/594823"/>

	</rdf:Seq>
	</items>
	</channel>


<item rdf:about="http://www.citeulike.org/user/vlachmore/article/2670160">
    <title>Normalizing biomedical terms by minimizing ambiguity and variability</title>
    <link>http://www.citeulike.org/user/vlachmore/article/2670160</link>
    <description>&lt;i&gt;BMC Bioinformatics, Vol. 9, No. Suppl 3. (2008)&lt;/i&gt;&lt;br /&gt;&lt;br /&gt;BACKGROUND: One of the difficulties in mapping biomedical named entities, e.g. genes, proteins, chemicals and diseases, to their concept identifiers stems from the potential variability of the terms. Soft string matching is a possible solution to the problem, but its inherent heavy computational cost discourages its use when the dictionaries are large or when real time processing is required. A less computationally demanding approach is to normalize the terms by using heuristic rules, which enables us to look up a dictionary in a constant time regardless of its size. The development of good heuristic rules, however, requires extensive knowledge of the terminology in question and thus is the bottleneck of the normalization approach. RESULTS: We present a novel framework for discovering a list of normalization rules from a dictionary in a fully automated manner. The rules are discovered in such a way that they minimize the ambiguity and variability of the terms in the dictionary. We evaluated our algorithm using two large dictionaries: a human gene/protein name dictionary built from BioThesaurus and a disease name dictionary built from UMLS. CONCLUSIONS: The experimental results showed that automatically discovered rules can perform comparably to carefully crafted heuristic rules in term mapping tasks, and the computational overhead of rule application is small enough that a very fast implementation is possible. This work will help improve the performance of term-concept mapping tasks in biomedical information extraction especially when good normalization heuristics for the target terminology are not fully known.</description>
    <dc:title>Normalizing biomedical terms by minimizing ambiguity and variability</dc:title>

    <dc:creator>Yoshimasa Tsuruoka</dc:creator>
    <dc:creator>John McNaught</dc:creator>
    <dc:creator>Sophia Ananiadou</dc:creator>
    <dc:identifier>doi:10.1186/1471-2105-9-S3-S2</dc:identifier>
    <dc:source>BMC Bioinformatics, Vol. 9, No. Suppl 3. (2008)</dc:source>
    <dc:date>2008-04-14T17:56:30-00:00</dc:date>
    <prism:publicationYear>2008</prism:publicationYear>
    <prism:publicationName>BMC Bioinformatics</prism:publicationName>
    <prism:volume>9</prism:volume>
    <prism:number>Suppl 3</prism:number>
    <prism:category>bionlp</prism:category>
    <prism:category>gene</prism:category>
    <prism:category>normalization</prism:category>
    <prism:category>rules</prism:category>
</item>



<item rdf:about="http://www.citeulike.org/user/vlachmore/article/2146646">
    <title>Me and my friends: gene mention normalization with background knowledge</title>
    <link>http://www.citeulike.org/user/vlachmore/article/2146646</link>
    <description>&lt;i&gt;&lt;/i&gt;</description>
    <dc:title>Me and my friends: gene mention normalization with background knowledge</dc:title>

    <dc:creator>Jörg Hakenberg</dc:creator>
    <dc:creator>Loic Royer</dc:creator>
    <dc:creator>Conrad Plake</dc:creator>
    <dc:creator>Hendrik Strobelt</dc:creator>
    <dc:creator>Michael Schroeder</dc:creator>
    <dc:date>2007-12-19T13:48:23-00:00</dc:date>
    <prism:category>bionlp</prism:category>
    <prism:category>context</prism:category>
    <prism:category>gene</prism:category>
    <prism:category>normalization</prism:category>
    <prism:category>ranking</prism:category>
</item>



<item rdf:about="http://www.citeulike.org/user/vlachmore/article/2137818">
    <title>Human gene name normalization using text matching with automatically extracted synonym dictionaries</title>
    <link>http://www.citeulike.org/user/vlachmore/article/2137818</link>
    <description>&lt;i&gt;(2006), pp. 41-48.&lt;/i&gt;</description>
    <dc:title>Human gene name normalization using text matching with automatically extracted synonym dictionaries</dc:title>

    <dc:creator>Haw-Ren Fang</dc:creator>
    <dc:creator>Kevin Murphy</dc:creator>
    <dc:creator>Yang Jin</dc:creator>
    <dc:creator>Jessica Kim</dc:creator>
    <dc:creator>Peter White</dc:creator>
    <dc:source>(2006), pp. 41-48.</dc:source>
    <dc:date>2007-12-17T18:34:16-00:00</dc:date>
    <prism:publicationYear>2006</prism:publicationYear>
    <prism:startingPage>41</prism:startingPage>
    <prism:endingPage>48</prism:endingPage>
    <prism:category>bionlp</prism:category>
    <prism:category>gene</prism:category>
    <prism:category>name</prism:category>
    <prism:category>normalization</prism:category>
</item>



<item rdf:about="http://www.citeulike.org/user/vlachmore/article/2137810">
    <title>A Graph-Search Framework for GeneId Ranking (Extended Abstract)</title>
    <link>http://www.citeulike.org/user/vlachmore/article/2137810</link>
    <description>&lt;i&gt;(2006)&lt;/i&gt;</description>
    <dc:title>A Graph-Search Framework for GeneId Ranking (Extended Abstract)</dc:title>

    <dc:creator>William Cohen</dc:creator>
    <dc:source>(2006)</dc:source>
    <dc:date>2007-12-17T18:31:47-00:00</dc:date>
    <prism:publicationYear>2006</prism:publicationYear>
    <prism:category>bionlp</prism:category>
    <prism:category>gene</prism:category>
    <prism:category>graph</prism:category>
    <prism:category>normalization</prism:category>
</item>



<item rdf:about="http://www.citeulike.org/user/vlachmore/article/892146">
    <title>A graph-search framework for associating gene identifiers with documents</title>
    <link>http://www.citeulike.org/user/vlachmore/article/892146</link>
    <description>&lt;i&gt;BMC Bioinformatics, Vol. 7 (10 October 2006), 440.&lt;/i&gt;</description>
    <dc:title>A graph-search framework for associating gene identifiers with documents</dc:title>

    <dc:creator>William Cohen</dc:creator>
    <dc:creator>Einat Minkov</dc:creator>
    <dc:identifier>doi:10.1186/1471-2105-7-440</dc:identifier>
    <dc:source>BMC Bioinformatics, Vol. 7 (10 October 2006), 440.</dc:source>
    <dc:date>2006-10-10T23:31:19-00:00</dc:date>
    <prism:publicationYear>2006</prism:publicationYear>
    <prism:publicationName>BMC Bioinformatics</prism:publicationName>
    <prism:issn>1471-2105</prism:issn>
    <prism:volume>7</prism:volume>
    <prism:startingPage>440</prism:startingPage>
    <prism:category>algorithm</prism:category>
    <prism:category>bionlp</prism:category>
    <prism:category>gene</prism:category>
    <prism:category>graph</prism:category>
    <prism:category>normalization</prism:category>
</item>



<item rdf:about="http://www.citeulike.org/user/vlachmore/article/2137707">
    <title>Weakly Supervised Learning Methods for Improving the Quality of Gene Name Normalization Data</title>
    <link>http://www.citeulike.org/user/vlachmore/article/2137707</link>
    <description>&lt;i&gt;(2005)&lt;/i&gt;&lt;br /&gt;&lt;br /&gt;A pervasive problem facing many biomedical text mining applications is that of correctly associating mentions of entities in the literature with corresponding concepts in a database or ontology. Attempts to build systems for automating this process have shown promise as demonstrated by the recent BioCreAtIvE Task 1B evaluation. A significant obstacle to improved performance for this task, however, is a lack of high quality training data. In this work, we explore methods for...</description>
    <dc:title>Weakly Supervised Learning Methods for Improving the Quality of Gene Name Normalization Data</dc:title>

    <dc:creator>Ben Wellner</dc:creator>
    <dc:source>(2005)</dc:source>
    <dc:date>2007-12-17T17:50:39-00:00</dc:date>
    <prism:publicationYear>2005</prism:publicationYear>
    <prism:category>bionlp</prism:category>
    <prism:category>gene</prism:category>
    <prism:category>normalization</prism:category>
    <prism:category>semi-supervised</prism:category>
</item>



<item rdf:about="http://www.citeulike.org/user/vlachmore/article/493764">
    <title>Background and overview for KDD Cup 2002 task 1: information extraction from biomedical articles</title>
    <link>http://www.citeulike.org/user/vlachmore/article/493764</link>
    <description>&lt;i&gt;SIGKDD Explor. Newsl., Vol. 4, No. 2. (December 2002), pp. 87-89.&lt;/i&gt;</description>
    <dc:title>Background and overview for KDD Cup 2002 task 1: information extraction from biomedical articles</dc:title>

    <dc:creator>Alexander Yeh</dc:creator>
    <dc:creator>Lynette Hirschman</dc:creator>
    <dc:creator>Alexander Morgan</dc:creator>
    <dc:identifier>doi:10.1145/772862.772873</dc:identifier>
    <dc:source>SIGKDD Explor. Newsl., Vol. 4, No. 2. (December 2002), pp. 87-89.</dc:source>
    <dc:date>2006-02-04T09:52:35-00:00</dc:date>
    <prism:publicationYear>2002</prism:publicationYear>
    <prism:publicationName>SIGKDD Explor. Newsl.</prism:publicationName>
    <prism:volume>4</prism:volume>
    <prism:number>2</prism:number>
    <prism:startingPage>87</prism:startingPage>
    <prism:endingPage>89</prism:endingPage>
    <prism:publisher>ACM Press</prism:publisher>
    <prism:category>bionlp</prism:category>
    <prism:category>gene</prism:category>
    <prism:category>kdd</prism:category>
    <prism:category>normalization</prism:category>
    <prism:category>shared</prism:category>
    <prism:category>task</prism:category>
</item>



<item rdf:about="http://www.citeulike.org/user/vlachmore/article/2137573">
    <title>Evaluating the automatic mapping of human gene and protein mentions to unique identifiers.</title>
    <link>http://www.citeulike.org/user/vlachmore/article/2137573</link>
    <description>&lt;i&gt;(2007)&lt;/i&gt;</description>
    <dc:title>Evaluating the automatic mapping of human gene and protein mentions to unique identifiers.</dc:title>

    <dc:creator>AA Morgan</dc:creator>
    <dc:creator>B Wellner</dc:creator>
    <dc:creator>JB Colombe</dc:creator>
    <dc:creator>R Arens</dc:creator>
    <dc:creator>ME Colosimo</dc:creator>
    <dc:creator>L Hirschman</dc:creator>
    <dc:source>(2007)</dc:source>
    <dc:date>2007-12-17T17:03:15-00:00</dc:date>
    <prism:publicationYear>2007</prism:publicationYear>
    <prism:category>bionlp</prism:category>
    <prism:category>gene</prism:category>
    <prism:category>normalization</prism:category>
</item>



<item rdf:about="http://www.citeulike.org/user/vlachmore/article/604600">
    <title>Data preparation and interannotator agreement: BioCreAtIvE task 1B.</title>
    <link>http://www.citeulike.org/user/vlachmore/article/604600</link>
    <description>&lt;i&gt;BMC Bioinformatics, Vol. 6 Suppl 1 (2005)&lt;/i&gt;&lt;br /&gt;&lt;br /&gt;BACKGROUND: We prepared and evaluated training and test materials for an assessment of text mining methods in molecular biology. The goal of the assessment was to evaluate the ability of automated systems to generate a list of unique gene identifiers from PubMed abstracts for the three model organisms Fly, Mouse, and Yeast. This paper describes the preparation and evaluation of answer keys for training and testing. These consisted of lists of normalized gene names found in the abstracts, generated by adapting the gene list for the full journal articles found in the model organism databases. For the training dataset, the gene list was pruned automatically to remove gene names not found in the abstract; for the testing dataset, it was further refined by manual annotation by annotators provided with guidelines. A critical step in interpreting the results of an assessment is to evaluate the quality of the data preparation. We did this by careful assessment of interannotator agreement and the use of answer pooling of participant results to improve the quality of the final testing dataset. RESULTS: Interannotator analysis on a small dataset showed that our gene lists for Fly and Yeast were good (87% and 91% three-way agreement) but the Mouse gene list had many conflicts (mostly omissions), which resulted in errors (69% interannotator agreement). By comparing and pooling answers from the participant systems, we were able to add an additional check on the test data; this allowed us to find additional errors, especially in Mouse. This led to 1% change in the Yeast and Fly &#34;gold standard&#34; answer keys, but to an 8% change in the mouse answer key. CONCLUSION: We found that clear annotation guidelines are important, along with careful interannotator experiments, to validate the generated gene lists. 
Also, abstracts alone are a poor resource for identifying genes in paper, containing only a fraction of genes mentioned in the full text (25% for Fly, 36% for Mouse). We found that there are intrinsic differences between the model organism databases related to the number of synonymous terms and also to curation criteria. Finally, we found that answer pooling was much faster and allowed us to identify more conflicting genes than interannotator analysis.</description>
    <dc:title>Data preparation and interannotator agreement: BioCreAtIvE task 1B.</dc:title>

    <dc:creator>ME Colosimo</dc:creator>
    <dc:creator>AA Morgan</dc:creator>
    <dc:creator>AS Yeh</dc:creator>
    <dc:creator>JB Colombe</dc:creator>
    <dc:creator>L Hirschman</dc:creator>
    <dc:identifier>doi:10.1186/1471-2105-6-S1-S12</dc:identifier>
    <dc:source>BMC Bioinformatics, Vol. 6 Suppl 1 (2005)</dc:source>
    <dc:date>2006-04-27T15:58:22-00:00</dc:date>
    <prism:publicationYear>2005</prism:publicationYear>
    <prism:publicationName>BMC Bioinformatics</prism:publicationName>
    <prism:issn>1471-2105</prism:issn>
    <prism:volume>6 Suppl 1</prism:volume>
    <prism:category>bionlp</prism:category>
    <prism:category>gene</prism:category>
    <prism:category>identifier</prism:category>
    <prism:category>normalization</prism:category>
    <prism:category>shared</prism:category>
    <prism:category>task</prism:category>
</item>



<item rdf:about="http://www.citeulike.org/user/vlachmore/article/1626506">
    <title>Learning string similarity measures for gene/protein name dictionary look-up using logistic regression.</title>
    <link>http://www.citeulike.org/user/vlachmore/article/1626506</link>
    <description>&lt;i&gt;Bioinformatics (12 August 2007)&lt;/i&gt;&lt;br /&gt;&lt;br /&gt;MOTIVATION: One of the bottlenecks of biomedical data integration is variation of terms. Exact string matching often fails to associate a name with its biological concept, i.e. ID or accession number in the database, due to seemingly small differences of names. Soft string matching potentially enables us to find the relevant ID by considering the similarity between the names. However, the accuracy of soft matching highly depends on the similarity measure employed. RESULTS: We used logistic regression for learning a string similarity measure from a dictionary. Experiments using several large-scale gene/protein name dictionaries showed that the logistic regression-based similarity measure outperforms existing similarity measures in dictionary look-up tasks. AVAILABILITY: A dictionary look-up system using the similarity measures described in this paper is available at http://text0.mib.man.ac.uk/software/mldic/ CONTACT: yoshimasa.tsuruoka@manchester.ac.uk.</description>
    <dc:title>Learning string similarity measures for gene/protein name dictionary look-up using logistic regression.</dc:title>

    <dc:creator>Yoshimasa Tsuruoka</dc:creator>
    <dc:creator>John McNaught</dc:creator>
    <dc:creator>Jun'ichi Tsujii</dc:creator>
    <dc:creator>Sophia Ananiadou</dc:creator>
    <dc:identifier>doi:10.1093/bioinformatics/btm393</dc:identifier>
    <dc:source>Bioinformatics (12 August 2007)</dc:source>
    <dc:date>2007-09-06T11:54:19-00:00</dc:date>
    <prism:publicationYear>2007</prism:publicationYear>
    <prism:publicationName>Bioinformatics</prism:publicationName>
    <prism:issn>1460-2059</prism:issn>
    <prism:category>bionlp</prism:category>
    <prism:category>gene</prism:category>
    <prism:category>name</prism:category>
    <prism:category>normalization</prism:category>
</item>



<item rdf:about="http://www.citeulike.org/user/vlachmore/article/594823">
    <title>Overview of BioCreAtIvE task 1B: normalized gene lists.</title>
    <link>http://www.citeulike.org/user/vlachmore/article/594823</link>
    <description>&lt;i&gt;BMC Bioinformatics, Vol. 6 Suppl 1 (2005)&lt;/i&gt;&lt;br /&gt;&lt;br /&gt;BACKGROUND: Our goal in BioCreAtIvE has been to assess the state of the art in text mining, with emphasis on applications that reflect real biological applications, e.g., the curation process for model organism databases. This paper summarizes the BioCreAtIvE task 1B, the &#34;Normalized Gene List&#34; task, which was inspired by the gene list supplied for each curated paper in a model organism database. The task was to produce the correct list of unique gene identifiers for the genes and gene products mentioned in sets of abstracts from three model organisms (Yeast, Fly, and Mouse). RESULTS: Eight groups fielded systems for three data sets (Yeast, Fly, and Mouse). For Yeast, the top scoring system (out of 15) achieved 0.92 F-measure (harmonic mean of precision and recall); for Mouse and Fly, the task was more difficult, due to larger numbers of genes, more ambiguity in the gene naming conventions (particularly for Fly), and complex gene names (for Mouse). For Fly, the top F-measure was 0.82 out of 11 systems and for Mouse, it was 0.79 out of 16 systems. CONCLUSION: This assessment demonstrates that multiple groups were able to perform a real biological task across a range of organisms. The performance was dependent on the organism, and specifically on the naming conventions associated with each organism. These results hold out promise that the technology can provide partial automation of the curation process in the near future.</description>
    <dc:title>Overview of BioCreAtIvE task 1B: normalized gene lists.</dc:title>

    <dc:creator>L Hirschman</dc:creator>
    <dc:creator>M Colosimo</dc:creator>
    <dc:creator>A Morgan</dc:creator>
    <dc:creator>A Yeh</dc:creator>
    <dc:identifier>doi:10.1186/1471-2105-6-S1-S11</dc:identifier>
    <dc:source>BMC Bioinformatics, Vol. 6 Suppl 1 (2005)</dc:source>
    <dc:date>2006-04-21T22:27:54-00:00</dc:date>
    <prism:publicationYear>2005</prism:publicationYear>
    <prism:publicationName>BMC Bioinformatics</prism:publicationName>
    <prism:issn>1471-2105</prism:issn>
    <prism:volume>6 Suppl 1</prism:volume>
    <prism:category>bionlp</prism:category>
    <prism:category>gene</prism:category>
    <prism:category>name</prism:category>
    <prism:category>normalization</prism:category>
</item>



</rdf:RDF>

