Discovering Subsumption Hierarchies of Ontology Concepts from Text Corpora
This paper proposes a method for learning ontologies from a corpus of text documents. The method identifies concepts in the documents and organizes them into a subsumption hierarchy, without presupposing the existence of a seed ontology. It uncovers the latent topics from which the document text is assumed to be generated, and these topics form the concepts of the new ontology. Topic discovery is language-neutral: it applies probabilistic space reduction techniques to the original term space of the corpus. Given the multiple sets of concepts (latent topics) discovered, the method constructs a subsumption hierarchy by performing conditional independence tests among pairs of latent topics, given a third one. The paper provides experimental results on the GENIA corpus from the biomedical domain.
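
To make the conditional independence step concrete, the following is a minimal sketch rather than the procedure used in the paper. It assumes each latent topic is represented by a vector of per-document weights, and it uses partial correlation as a stand-in for the conditional independence test, which the abstract does not specify. The function names, the threshold of 0.1, and the toy weight vectors are all hypothetical, chosen for illustration only.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def partial_corr(x, y, z):
    """Correlation of x and y after removing the linear effect of z."""
    rxy = pearson(x, y)
    rxz = pearson(x, z)
    ryz = pearson(y, z)
    denom = math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))
    if denom == 0:
        return 0.0
    return (rxy - rxz * ryz) / denom


def conditionally_independent(x, y, z, threshold=0.1):
    """Assumed decision rule: treat x and y as independent given z
    when the absolute partial correlation falls below the threshold."""
    return abs(partial_corr(x, y, z)) < threshold


# Hypothetical per-document weights for three latent topics.
# parent is the candidate subsuming topic; child_a and child_b
# both follow parent plus unrelated variation.
parent = [1, 2, 3, 4, 5, 6, 7, 8]
noise_a = [1, -1, -1, 1, 1, -1, -1, 1]
noise_b = [1, 1, -1, -1, -1, -1, 1, 1]
child_a = [p + e for p, e in zip(parent, noise_a)]
child_b = [p + e for p, e in zip(parent, noise_b)]

marginal = pearson(child_a, child_b)
given_parent = partial_corr(child_a, child_b, parent)
is_subsumer = conditionally_independent(child_a, child_b, parent)
```

In this toy data the two child topics are strongly correlated on their own, but the correlation vanishes once the parent topic is conditioned on. Under the assumptions above, that is the pattern that would place the third topic above the pair in the hierarchy.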