<?xml version="1.0" encoding="UTF-8"?>

<rdf:RDF
   xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
   xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
   xmlns="http://purl.org/rss/1.0/"
   xmlns:dc="http://purl.org/dc/elements/1.1/"
   xmlns:prism="http://prismstandard.org/namespaces/1.2/basic/"
   xmlns:dcterms="http://purl.org/dc/terms/"

>
<channel rdf:about="http://www.citeulike.org/about">
<pubDate>Thu, 21 Aug 2008 09:54:38 BST</pubDate>


	<title>CiteULike: jyuh's Rebholz-Schuhmann</title>
	<description>CiteULike: jyuh's Rebholz-Schuhmann</description>


	<link>http://www.citeulike.org/user/jyuh/author/Rebholz-Schuhmann</link>
	<dc:publisher>CiteULike.org</dc:publisher>
	<dc:language>en-gb</dc:language>
	<dc:rights>Copyright &#169; 2004-2008 citeulike.org</dc:rights>
	<items>
    <rdf:Seq>
        <rdf:li rdf:resource="http://www.citeulike.org/user/jyuh/article/2775887"/>
        <rdf:li rdf:resource="http://www.citeulike.org/user/jyuh/article/2775897"/>
        <rdf:li rdf:resource="http://www.citeulike.org/user/jyuh/article/2653988"/>
        <rdf:li rdf:resource="http://www.citeulike.org/user/jyuh/article/1922911"/>
        <rdf:li rdf:resource="http://www.citeulike.org/user/jyuh/article/1969823"/>

	</rdf:Seq>
	</items>
	</channel>


<item rdf:about="http://www.citeulike.org/user/jyuh/article/2775887">
    <title>Facilitating the development of controlled vocabularies for metabolomics technologies with text mining.</title>
    <link>http://www.citeulike.org/user/jyuh/article/2775887</link>
    <description>&lt;i&gt;BMC bioinformatics, Vol. 9 Suppl 5 (2008)&lt;/i&gt;&lt;br /&gt;&lt;br /&gt;BACKGROUND: Many bioinformatics applications rely on controlled vocabularies or ontologies to consistently interpret and seamlessly integrate information scattered across public resources. Experimental data sets from metabolomics studies need to be integrated with one another, but also with data produced by other types of omics studies in the spirit of systems biology, hence the pressing need for vocabularies and ontologies in metabolomics. However, it is time-consuming and non-trivial to construct these resources manually. RESULTS: We describe a methodology for rapid development of controlled vocabularies, a study originally motivated by the need for vocabularies describing metabolomics technologies. We present case studies involving two controlled vocabularies (for nuclear magnetic resonance spectroscopy and gas chromatography) whose development is currently underway as part of the Metabolomics Standards Initiative. The initial vocabularies were compiled manually, providing a total of 243 and 152 terms. A total of 5,699 and 2,612 new terms were acquired automatically from the literature. The analysis of the results showed that full-text articles (especially the Materials and Methods sections) are the major source of technology-specific terms as opposed to paper abstracts. CONCLUSIONS: We suggest a text mining method for efficient corpus-based term acquisition as a way of rapidly expanding a set of controlled vocabularies with the terms used in the scientific literature. We adopted an integrative approach, combining relatively generic software and data resources for time- and cost-effective development of a text mining tool for expansion of controlled vocabularies across various domains, as a practical alternative to both manual term collection and tailor-made named entity recognition methods.</description>
    <dc:title>Facilitating the development of controlled vocabularies for metabolomics technologies with text mining.</dc:title>

    <dc:creator>I Spasić</dc:creator>
    <dc:creator>D Schober</dc:creator>
    <dc:creator>SA Sansone</dc:creator>
    <dc:creator>D Rebholz-Schuhmann</dc:creator>
    <dc:creator>DB Kell</dc:creator>
    <dc:creator>NW Paton</dc:creator>
    <dc:identifier>doi:10.1186/1471-2105-9-S5-S5</dc:identifier>
    <dc:source>BMC bioinformatics, Vol. 9 Suppl 5 (2008)</dc:source>
    <dc:date>2008-05-09T13:43:28-00:00</dc:date>
    <prism:publicationYear>2008</prism:publicationYear>
    <prism:publicationName>BMC bioinformatics</prism:publicationName>
    <prism:issn>1471-2105</prism:issn>
    <prism:volume>9 Suppl 5</prism:volume>
    <prism:category>metabolomics</prism:category>
    <prism:category>text-mining</prism:category>
</item>



<item rdf:about="http://www.citeulike.org/user/jyuh/article/2775897">
    <title>Assessment of disease named entity recognition on a corpus of annotated sentences.</title>
    <link>http://www.citeulike.org/user/jyuh/article/2775897</link>
    <description>&lt;i&gt;BMC bioinformatics, Vol. 9 Suppl 3 (2008)&lt;/i&gt;&lt;br /&gt;&lt;br /&gt;BACKGROUND: In recent years, the recognition of semantic types from the biomedical scientific literature has been focused on named entities like protein and gene names (PGNs) and gene ontology terms (GO terms). Other semantic types like diseases have not received the same level of attention. Different solutions have been proposed to identify disease named entities in the scientific literature. While matching the terminology with language patterns suffers from low recall (e.g., Whatizit) other solutions make use of morpho-syntactic features to better cover the full scope of terminological variability (e.g., MetaMap). Currently, MetaMap that is provided from the National Library of Medicine (NLM) is the state of the art solution for the annotation of concepts from UMLS (Unified Medical Language System) in the literature. Nonetheless, its performance has not yet been assessed on an annotated corpus. In addition, little effort has been invested so far to generate an annotated dataset that links disease entities in text to disease entries in a database, thesaurus or ontology and that could serve as a gold standard to benchmark text mining solutions. RESULTS: As part of our research work, we have taken a corpus that has been delivered in the past for the identification of associations of genes to diseases based on the UMLS Metathesaurus and we have reprocessed and re-annotated the corpus. We have gathered annotations for disease entities from two curators, analyzed their disagreement (0.51 in the kappa-statistic) and composed a single annotated corpus for public use. Thereafter, three solutions for disease named entity recognition including MetaMap have been applied to the corpus to automatically annotate it with UMLS Metathesaurus concepts. The resulting annotations have been benchmarked to compare their performance. 
CONCLUSIONS: The annotated corpus is publicly available at ftp://ftp.ebi.ac.uk/pub/software/textmining/corpora/diseases and can serve as a benchmark to other systems. In addition, we found that dictionary look-up already provides competitive results, indicating that the use of disease terminology is highly standardized throughout the terminologies and the literature. MetaMap generates precise results at the expense of insufficient recall, while our statistical method obtains better recall at a lower precision rate. Even better results in terms of precision are achieved by combining at least two of the three methods, but this approach again lowers recall. Altogether, our analysis gives a better understanding of the complexity of disease annotations in the literature. MetaMap and the dictionary-based approach are available through the Whatizit web service infrastructure (Rebholz-Schuhmann D, Arregui M, Gaudan S, Kirsch H, Jimeno A: Text processing through Web services: Calling Whatizit. Bioinformatics 2008, 24:296-298).</description>
    <dc:title>Assessment of disease named entity recognition on a corpus of annotated sentences.</dc:title>

    <dc:creator>A Jimeno</dc:creator>
    <dc:creator>E Jimenez-Ruiz</dc:creator>
    <dc:creator>V Lee</dc:creator>
    <dc:creator>S Gaudan</dc:creator>
    <dc:creator>R Berlanga</dc:creator>
    <dc:creator>D Rebholz-Schuhmann</dc:creator>
    <dc:identifier>doi:10.1186/1471-2105-9-S3-S3</dc:identifier>
    <dc:source>BMC bioinformatics, Vol. 9 Suppl 3 (2008)</dc:source>
    <dc:date>2008-05-09T13:44:46-00:00</dc:date>
    <prism:publicationYear>2008</prism:publicationYear>
    <prism:publicationName>BMC bioinformatics</prism:publicationName>
    <prism:issn>1471-2105</prism:issn>
    <prism:volume>9 Suppl 3</prism:volume>
    <prism:category>text-mining</prism:category>
</item>



<item rdf:about="http://www.citeulike.org/user/jyuh/article/2653988">
    <title>MedEvi: Retrieving textual evidence of relations between biomedical concepts from Medline.</title>
    <link>http://www.citeulike.org/user/jyuh/article/2653988</link>
    <description>&lt;i&gt;Bioinformatics (Oxford, England) (9 April 2008)&lt;/i&gt;&lt;br /&gt;&lt;br /&gt;SUMMARY: Search engines running on MEDLINE abstracts have been widely used by biologists to find publications that are related to their research. The existing search engines such as PubMed, however, have limitations when applied for the task of seeking textual evidence of relations between given concepts. The limitations are mainly due to the problem that the search engines do not effectively deal with multi-term queries which may imply semantic relations between the terms. To address this problem, we present MedEvi, a novel search engine that imposes positional restriction on occurrences matching multi-term queries, based on the observation that terms with semantic relations which are explicitly stated in text are not found too far from each other. MedEvi further identifies additional keywords of biological and statistical significance from local context of matching occurrences in order to help users reformulate their queries for better results. AVAILABILITY: http://www.ebi.ac.uk/tc-test/textmining/medevi/ CONTACT: kim@ebi.ac.uk, pezik@ebi.ac.uk.</description>
    <dc:title>MedEvi: Retrieving textual evidence of relations between biomedical concepts from Medline.</dc:title>

    <dc:creator>Jung-Jae Kim</dc:creator>
    <dc:creator>Piotr Pezik</dc:creator>
    <dc:creator>Dietrich Rebholz-Schuhmann</dc:creator>
    <dc:identifier>doi:10.1093/bioinformatics/btn117</dc:identifier>
    <dc:source>Bioinformatics (Oxford, England) (9 April 2008)</dc:source>
    <dc:date>2008-04-11T13:37:51-00:00</dc:date>
    <prism:publicationYear>2008</prism:publicationYear>
    <prism:publicationName>Bioinformatics (Oxford, England)</prism:publicationName>
    <prism:issn>1460-2059</prism:issn>
    <prism:category>pubmed</prism:category>
</item>



<item rdf:about="http://www.citeulike.org/user/jyuh/article/1922911">
    <title>Text processing through Web services: Calling Whatizit</title>
    <link>http://www.citeulike.org/user/jyuh/article/1922911</link>
    <description>&lt;i&gt;Bioinformatics (15 November 2007), btm557.&lt;/i&gt;&lt;br /&gt;&lt;br /&gt;Motivation: Text-mining (TM) solutions are developing into efficient services to researchers in the biomedical research community. Such solutions have to scale with the growing number and size of resources (e.g., available controlled vocabularies), with the amount of literature to be processed (e.g., about 17 million documents in PubMed) and with the demands of the user community (e.g., different methods for fact extraction). These demands induce the development of server-based solutions that can be accessed programmatically. Whatizit is a suite of modules that analyse text for contained information, e.g. a user's own text documents, scientific publications or Medline abstracts. Each module identifies terms and then links them to the corresponding entries in bioinformatics databases such as UniProtKb/Swiss-Prot data entries and gene ontology concepts. Other modules identify a set of selected annotation types like the set produced by the EBIMed analysis pipeline for proteins. In the case of Medline abstracts, Whatizit offers access to EBI's in-house installation via PMID or term query. For large quantities of the user's own text, the server can be operated in a streaming mode. (http://www.ebi.ac.uk/webservices/whatizit)</description>
    <dc:title>Text processing through Web services: Calling Whatizit</dc:title>

    <dc:creator>Dietrich Rebholz-Schuhmann</dc:creator>
    <dc:creator>Miguel Arregui</dc:creator>
    <dc:creator>Sylvain Gaudan</dc:creator>
    <dc:creator>Harald Kirsch</dc:creator>
    <dc:creator>Antonio Yepes</dc:creator>
    <dc:identifier>doi:10.1093/bioinformatics/btm557</dc:identifier>
    <dc:source>Bioinformatics (15 November 2007), btm557.</dc:source>
    <dc:date>2007-11-15T15:47:04-00:00</dc:date>
    <prism:publicationYear>2007</prism:publicationYear>
    <prism:publicationName>Bioinformatics</prism:publicationName>
    <prism:startingPage>btm557</prism:startingPage>
    <prism:category>pubmed</prism:category>
</item>



<item rdf:about="http://www.citeulike.org/user/jyuh/article/1969823">
    <title>EBIMed--text crunching to gather facts for proteins from Medline.</title>
    <link>http://www.citeulike.org/user/jyuh/article/1969823</link>
    <description>&lt;i&gt;Bioinformatics, Vol. 23, No. 2. (15 January 2007)&lt;/i&gt;&lt;br /&gt;&lt;br /&gt;To allow efficient and systematic retrieval of statements from Medline we have developed EBIMed, a service that combines document retrieval with co-occurrence-based analysis of Medline abstracts. Upon keyword query, EBIMed retrieves the abstracts from EMBL-EBI's installation of Medline and filters for sentences that contain biomedical terminology maintained in public bioinformatics resources. The extracted sentences and terminology are used to generate an overview table on proteins, Gene Ontology (GO) annotations, drugs and species used in the same biological context. All terms in retrieved abstracts and extracted sentences are linked to their entries in biomedical databases. We assessed the quality of the identification of terms and relations in the retrieved sentences. More than 90% of the protein names found indeed represented a protein. According to the analysis of four protein-protein pairs from the Wnt pathway we estimated that 37% of the statements containing such a pair mentioned a meaningful interaction and clarified the interaction of Dkk with LRP. We conclude that EBIMed improves access to information where proteins and drugs are involved in the same biological process, e.g. statements with GO annotations of proteins, protein-protein interactions and effects of drugs on proteins. AVAILABILITY: Available at http://www.ebi.ac.uk/Rebholz-srv/ebimed</description>
    <dc:title>EBIMed--text crunching to gather facts for proteins from Medline.</dc:title>

    <dc:creator>D Rebholz-Schuhmann</dc:creator>
    <dc:creator>H Kirsch</dc:creator>
    <dc:creator>M Arregui</dc:creator>
    <dc:creator>S Gaudan</dc:creator>
    <dc:creator>M Riethoven</dc:creator>
    <dc:creator>P Stoehr</dc:creator>
    <dc:source>Bioinformatics, Vol. 23, No. 2. (15 January 2007)</dc:source>
    <dc:date>2007-11-24T03:40:34-00:00</dc:date>
    <prism:publicationYear>2007</prism:publicationYear>
    <prism:publicationName>Bioinformatics</prism:publicationName>
    <prism:issn>1460-2059</prism:issn>
    <prism:volume>23</prism:volume>
    <prism:number>2</prism:number>
    <prism:category>no-tag</prism:category>
</item>



</rdf:RDF>

