| |
Nucl. Acids Res., Vol. 33, No. suppl_2. (1 July 2005), pp. W783-786.
Abstract
The biomedical literature grows at a tremendous rate and PubMed comprises already over 15 000 000 abstracts. Finding relevant literature is an important and difficult problem. We introduce GoPubMed, a web server which allows users to explore PubMed search results with the Gene Ontology (GO), a hierarchically structured vocabulary for molecular biology. GoPubMed provides the following benefits: first, it gives an overview of the literature abstracts by categorizing abstracts according to the GO and thus allowing users to quickly navigate through ...
|
| |
In AAAI/IAAI, Vol. 2 (1996), pp. 1044-1049.
Abstract
Many corpus-based natural language processing systems rely on text corpora that have been manually annotated with syntactic or semantic tags. In particular, all previous dictionary construction systems for information extraction have used an annotated training corpus or some form of annotated input. We have developed a system called AutoSlog-TS that creates dictionaries of extraction patterns using only untagged text. AutoSlog-TS is based on the AutoSlog system, which generated extraction ...
|
| |
Genome Biology, Vol. 9, No. Suppl 2. (2008), S6.
by Florian Leitner, Martin Krallinger, Carlos R. Penagos, Jorg Hakenberg, Conrad Plake, Cheng J. Kuo, Chun N. Hsu, Richard Tsai, Hsi C. Hung, William Lau, Calvin Johnson, Rune Saetre, Kazuhiro Yoshida, Yan Chen, Sun Kim, Soo Y. Shin, Byoung T. Zhang, William Baumgartner, Lawrence Hunter, Barry Haddow, Michael Matthews, Xinglong Wang, Patrick Ruch, Frederic Ehrler, Arzucan Ozgur, Gunes Erkan, Dragomir Radev, Michael Krauthammer, ThaiBinh Luong, Robert Hoffmann, Chris Sander, Alfonso Valencia
Abstract
We introduce the first meta-service for information extraction in molecular biology, the BioCreative MetaServer (BCMS; http://bcms.bioinfo.cnio.es/). This prototype platform is a joint effort of 13 research groups and provides automatically generated annotations for PubMed/Medline abstracts. Annotation types cover gene names, gene IDs, species, and protein-protein interactions. The annotations are distributed by the meta-server in both human and machine readable formats (HTML/XML). This service is intended to be used by biomedical researchers and database annotators, and in biomedical language processing. The platform ...
|
| |
Nucl. Acids Res., Vol. 38, No. suppl_1. (1 January 2010), pp. D586-592.
Abstract
The Mouse Genome Database (MGD) is a major component of the Mouse Genome Informatics (MGI, http://www.informatics.jax.org/) database resource and serves as the primary community model organism database for the laboratory mouse. MGD is the authoritative source for mouse gene, allele and strain nomenclature and for phenotype and functional annotations of mouse genes. MGD contains comprehensive data and information related to mouse genes and their functions, standardized descriptions of mouse phenotypes, extensive integration of DNA and protein sequence data, normalized representation of ...
|
| |
Nucl. Acids Res., Vol. 34, No. suppl_1. (1 January 2006), pp. D562-567.
Abstract
The Mouse Genome Database (MGD) integrates genetic and genomic data for the mouse in order to facilitate the use of the mouse as a model system for understanding human biology and disease processes. A core component of the MGD effort is the acquisition and integration of genomic, genetic, functional and phenotypic information about mouse genes and gene products. MGD works within the broader bioinformatics community to define referential and semantic standards to facilitate data exchange between resources including the incorporation of ...
|
| |
PLoS Biol, Vol. 2, No. 11. (21 September 2004), e309.
Abstract
We have developed Textpresso, a new text-mining system for scientific literature whose capabilities go far beyond those of a simple keyword search engine. Textpresso's two major elements are a collection of the full text of scientific articles split into individual sentences, and the implementation of categories of terms for which a database of articles and individual sentences can be searched. The categories are classes of biological concepts (e.g., gene, allele, cell or cell group, phenotype, etc.) and classes that relate two ...
|
| |
Bioinformatics, Vol. 24, No. 16. (15 August 2008), pp. i126-132.
Abstract
Motivation: Text mining in the biomedical domain aims at helping researchers to access information contained in scientific publications in a faster, easier and more complete way. One step towards this aim is the recognition of named entities and their subsequent normalization to database identifiers. Normalization helps to link objects of potential interest, such as genes, to detailed information not contained in a publication; it is also key for integrating different knowledge sources. From an information retrieval perspective, normalization facilitates indexing and ...
|
| |
Comput. Appl. Biosci., Vol. 17, No. suppl_1. (1 June 2001), pp. S74-82.
Abstract
Systems that extract structured information from natural language passages have been highly successful in specialized domains. The time is opportune for developing analogous applications for molecular biology and genomics. We present a system, GENIES, that extracts and structures information about cellular pathways from the biological literature in accordance with a knowledge model that we developed earlier. We implemented GENIES by modifying an existing medical natural language processing system, MedLEE, and performed a preliminary evaluation study. Our results demonstrate the value ...
|
| |
CoRR, Vol. cs.CL/9907013 (1999)
|
| |
Abstract
We describe a recently developed corpus annotation scheme for evaluating parsers that avoids shortcomings of current methods. The scheme encodes grammatical relations between heads and dependents, and has been used to mark up a new public-domain corpus of naturally occurring English text. We show how the corpus can be used to evaluate the accuracy of a robust parser, and relate the corpus to extant resources.
|
| |
In Proceedings of the 19th International Joint Conference on Artificial Intelligence (2005)
|
| |
In NAACL '03: Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology (2003), pp. 40-47.
|
| |
In Proceedings of the first IEEE International Conference on Semantic Computing (ICSC 2007) (2007)
|
| |
In Advances in Neural Information Processing Systems 17 (2005), pp. 1097-1104.
Abstract
We present a discriminative part-based approach for the recognition of object classes from unsegmented cluttered scenes. Objects are modeled as flexible constellations of parts conditioned on local observations found by an interest operator. For each object class the probability of a given assignment of parts to local features is modeled by a Conditional Random Field (CRF). We propose an extension of the CRF framework that incorporates hidden variables and combines class conditional CRFs into a unified framework for part-based object recognition. ...
|
| |
In Advances in Neural Information Processing Systems (NIPS 2006) (2007)
|
| |
IEEE Internet Computing, Vol. 11, No. 4. (2007), pp. 77-81.
|
| |
(September 2006)
Abstract
In the field of machine learning, semi-supervised learning (SSL) occupies the middle ground, between supervised learning (in which all training examples are labeled) and unsupervised learning (in which no label data are given). Interest in SSL has increased in recent years, particularly because of application domains in which unlabeled data are plentiful, such as images, text, and bioinformatics. This first comprehensive overview of SSL presents state-of-the-art algorithms, a taxonomy of the field, selected applications, benchmark experiments, and perspectives on ongoing and ...
|
| |
In Human Language Technologies 2007: The Conference of the North American Chapter of the Association for Computational Linguistics; Companion Volume, Short Papers (April 2007), pp. 109-112.
|
| |
In ACL '06: Proceedings of the 21st International Conference on Computational Linguistics and the 44th annual meeting of the ACL (2006), pp. 209-216.
|
| |
BMC Bioinformatics 2006, Vol. 7, No. 5. (18 December 2006)
|
| |
BMC Bioinformatics 2006, Vol. 7, No. Suppl 5.
|
| |
In Proceedings of the Ninth Conference of the European Chapter of the Association for Computational Linguistics (EACL), Bergen, Norway, 1999 (1999), pp. 1-8.
Abstract
It is often claimed that Named Entity recognition systems need extensive gazetteers--lists of names of people, organisations, locations, and other named entities. Indeed, the compilation of such gazetteers is sometimes mentioned as a bottleneck in the design of Named Entity recognition systems. We report on a Named Entity recognition system which combines rule-based grammars with statistical (maximum entropy) models. We report on the system's performance with gazetteers of different types and ...
|
| |
In Nineteenth Conference on Uncertainty in Artificial Intelligence (UAI03) (2003)
Abstract
Conditional Random Fields (CRFs) are undirected graphical models, a special case of which correspond to conditionally-trained finite state machines. A key advantage of CRFs is their great flexibility to include a wide variety of arbitrary, non-independent features of the input. ...
|
| |
In Proceedings of the 2001 Conference on Empirical Methods in Natural Language Processing (2001), pp. 82-90.
|
| |
In Proceedings of the 5th International Workshop on Computational Semantics (2003)
|
| |
In HLT '05: Proceedings of the conference on Human Language Technology and Empirical Methods in Natural Language Processing (2005), pp. 443-450.
|
| |
In Proceedings of the seventh conference on Natural language learning at HLT-NAACL 2003 (2003), pp. 188-191.
|
| |
In ICML '01: Proceedings of the Eighteenth International Conference on Machine Learning (2001), pp. 282-289.
|
| |
In Proceedings of the Ninth Conference on Computational Natural Language Learning (CoNLL-2005) (June 2005), pp. 181-184.
|
| |
Bioinformatics (April 2005)
Abstract
SUMMARY: POSBIOTM-NER is a trainable biomedical named entity recognition system. POSBIOTM-NER can be automatically trained and adapted to new data sets without performance degradation, using CRF (Conditional Random Field) machine learning techniques and automatic linguistic feature analysis. Currently we have trained our system on three different datasets. GENIA-NER was trained based on GENIA Corpus (Kim et al., 2003), GENE-NER was trained based on BioCreative (Blaschke et al., 2004) data and GPCR-NER was trained based on our own POSBIOTM/NE corpus, which would ...
|
| |
In Tenth Conference on Computational Natural Language Learning (CoNLL-X) (2006)
|
| |
Bioinformatics, Vol. 20 (June 2004)
|
| |
In ACL '05: Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics (2005), pp. 363-370.
|
| |
In KDD '04: Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining (2004), pp. 216-225.
|
| |
In HLT '05: Proceedings of the conference on Human Language Technology and Empirical Methods in Natural Language Processing (2005), pp. 724-731.
|
| |
In COLING '04: Proceedings of the 20th international conference on Computational Linguistics (2004)
|
| |
In VLDB '05: Proceedings of the 31st international conference on Very large data bases (2005), pp. 1216-1227.
|
| |
Comput. Linguist., Vol. 20, No. 4. (1994), pp. 535-561.
|
| |
Comput. Linguist., Vol. 27, No. 4. (2001), pp. 521-544.
|
| |
In Proceedings of the ninth conference on European chapter of the Association for Computational Linguistics (1999), pp. 1-8.
|
| |
In SIGMOD '05: Proceedings of the 2005 ACM SIGMOD international conference on Management of data (2005), pp. 85-96.
|
| |
In SIGIR '06: Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval (2006), pp. 260-267.
|
| |
Commun. ACM, Vol. 38, No. 11. (1995), pp. 33-38.
|