YAWN: A Semantically Annotated Wikipedia XML Corpus
In 12th GI Conference on Databases in Business, Technology and Web (BTW 2007) (8 March 2007)
Abstract: The paper presents YAWN, a system to convert the well-known and widely used Wikipedia collection into an XML corpus with semantically rich, self-explaining tags. We introduce algorithms to annotate pages and links with concepts from the WordNet thesaurus. This annotation process exploits categorical information in Wikipedia, which is a high-quality, manually assigned source of information, extracts additional information from lists, and utilizes the invocations of templates with named parameters. We give examples of how such annotations can be exploited for high-precision queries.
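
The category-based annotation idea in the abstract can be illustrated with a minimal sketch. It is not the paper's algorithm: the `CONCEPTS` table is a hypothetical stand-in for a WordNet lookup, the "last word of the category name" heuristic is an assumption made for illustration, and the element names `title` and `body` are invented here.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for a WordNet lookup: plural category head -> concept tag.
CONCEPTS = {"physicists": "physicist", "cities": "city", "scientists": "scientist"}


def concept_for(categories):
    """Return the concept for the first category whose last word is known.

    Assumption: the last word of a category name acts as its head noun.
    Falls back to the generic tag "article" when nothing matches.
    """
    for category in categories:
        head = category.lower().split()[-1]
        if head in CONCEPTS:
            return CONCEPTS[head]
    return "article"


def annotate(title, categories, text):
    """Wrap a page in an XML element named after its inferred concept."""
    root = ET.Element(concept_for(categories))
    ET.SubElement(root, "title").text = title
    ET.SubElement(root, "body").text = text
    return ET.tostring(root, encoding="unicode")


xml_page = annotate(
    "Albert Einstein",
    ["German physicists", "1879 births"],
    "Albert Einstein was a physicist.",
)
print(xml_page)
```

With self-explaining tags of this kind, a structural query can select pages by concept (for example, every `physicist` element) rather than by keyword match alone, which is the sort of high-precision querying the abstract refers to.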