Assigning Roles to Protein Mentions: the Case of Transcription Factors

Journal of Biomedical Informatics, Vol. 42, No. 5. (10 October 2009), pp. 887-894.

henk-cul's tags for this article

abner biomedical context-features crf crfplusplus curation dependencies genia-tagger information-extraction keywords lingpipe local-features long-distance machine-learning named-entities parsing pattern-learning phrasal-representation phrases phrase-window semantic-features semantic-roles sentence-classification shallow-parsing syntactic-features transcription-factor

Abstract

Transcription factors (TFs) play a crucial role in gene regulation, and providing structured and curated information about them is important for genome biology. Manual curation of TF-related data is time-consuming and always lags behind the actual knowledge available in the biomedical literature. Here we present a machine-learning text mining approach for identification and tagging of protein mentions that play a TF role in a given context to support the curation process. More precisely, the method explicitly identifies those protein mentions in text that refer to their potential TF functions. The prediction features are engineered from the results of shallow parsing and domain-specific processing (recognition of relevant appearing in phrases), and a phrase-based Conditional Random Fields (CRF) model is used to capture the content and context information of candidate entities. The proposed approach for the identification of TF mentions has been tested on a set of evidence sentences from the TRANSFAC and FlyTF databases. It achieved an F-measure of around 51.5% with a precision of 62.5% using 5-fold cross-validation evaluation. The experimental results suggest that the phrase-based CRF model benefits from the flexibility to use correlated domain-specific features that describe the dependencies between TFs and other entities. To the best of our knowledge, this work is one of the first attempts to apply text-mining techniques to the task of assigning semantic roles to protein mentions.
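
The abstract reports an F-measure and a precision but no recall. As a rough illustration only, and assuming the reported F-measure is the balanced F1 score (the abstract does not say which variant was used), the recall those two figures imply can be back-calculated. The function names below are hypothetical and are not taken from the paper.

```python
def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure (F1): harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def implied_recall(f1: float, precision: float) -> float:
    """Solve F1 = 2PR / (P + R) for R, given F1 and P."""
    denominator = 2 * precision - f1
    if denominator <= 0:
        raise ValueError("no valid recall: F1 must be below 2 * precision")
    return f1 * precision / denominator


# Figures quoted in the abstract; treating the F-measure as F1 is an assumption.
reported_precision = 0.625
reported_f1 = 0.515

recall = implied_recall(reported_f1, reported_precision)
print(f"implied recall: {recall:.3f}")
```

Under that assumption the implied recall is a little under 44%, so the model would be noticeably more precise than it is complete. Because the abstract gives the F-measure only as "around 51.5%", the derived value is approximate.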

