Exploring hedge identification in biomedical literature
We investigate automatic identification of speculative language, or ‘hedging’, in scientific literature from the biomedical domain. Our contributions include a precise description of the task including annotation guidelines, theoretical analysis and discussion. We show that good agreement can be achieved using our guidelines and present a publicly available benchmark dataset for the task. We argue for separation of the acquisition and classification phases in semi-supervised machine learning, and present a probabilistic acquisition model which is evaluated both theoretically and experimentally. We explore the impact of different sample representations on classification accuracy across the learning curve and demonstrate the effectiveness of using machine learning for the hedge identification task. Finally, we examine the errors made by our approach and point toward avenues for future research.