Semantically enhanced Information Retrieval: An ontology-based approach
Currently, techniques for content description and query processing in Information Retrieval (IR) are based on keywords, and therefore provide limited capabilities to capture the conceptualizations associated with user needs and contents. Aiming to solve the limitations of keyword-based models, the idea of conceptual search, understood as searching by meanings rather than literal strings, has been the focus of a wide body of research in the IR field. More recently, it has been used as a prototypical scenario (or even envisioned as a potential “killer app”) in the Semantic Web (SW) vision, since its emergence in the late nineties. However, current approaches to semantic search developed in the SW area have not yet taken full advantage of the acquired knowledge, accumulated experience, and technological sophistication achieved through several decades of work in the IR field. Starting from this position, this work investigates the definition of an ontology-based IR model, oriented to the exploitation of domain Knowledge Bases to support semantic search capabilities in large document repositories, stressing on the one hand the use of fully fledged ontologies in the semantic-based perspective, and on the other hand the consideration of unstructured content as the target search space. The major contribution of this work is an innovative, comprehensive semantic search model, which extends the classic IR model, addresses the challenges of the massive and heterogeneous Web environment, and integrates the benefits of both keyword and semantic-based search. Additional contributions include: an innovative rank fusion technique that minimizes the undesired effects of knowledge sparseness on the yet juvenile SW, and the creation of a large-scale evaluation benchmark, based on TREC IR evaluation standards, which allows a rigorous comparison between IR and SW approaches. Conducted experiments show that our semantic search model obtained comparable and better performance results (in terms of MAP and P@10 values) than the best TREC automatic system.