Journées Internationales d'Analyse Statistique des Données Textuelles
Using Information Extraction to Classify Newspapers Advertisements


wryun's tags for this article

classification classifieds extraction information


Abstract

This paper presents a text classification procedure developed in the context of an information extraction project. In the prototype built for this project, newspaper advertisements are processed by three main modules. First, a classification module assigns a category to the advertisement. Then, a tagging module identifies the textual information units related to that category. Finally, a predefined form for that category is filled with the tagged text. The classification module, which is the main focus of this paper, applies a naive Bayes classifier and, in parallel, attempts to fill the predefined forms of all categories. The results of both methods (classification probabilities and filling scores) are then combined to produce the final classification decision. This mixed classification method is described and evaluated in experiments carried out on real data. The purpose of these experiments is to measure precisely the impact of the information extraction step on classification accuracy. As one could reasonably expect, classification relying on information extraction alone does not perform very well, but when used as a complement to the statistical approach it significantly improves the classification results.
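The mixed method described in the abstract could be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not say how probabilities and filling scores are combined, so the sketch assumes a simple weighted sum. The training data, the regex-based forms, the `weight` parameter, and all function names are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical toy training data: (advertisement text, category).
TRAIN = [
    ("flat to rent 3 rooms balcony 1200 per month", "real_estate"),
    ("house for sale garden 4 rooms garage", "real_estate"),
    ("car for sale 2001 diesel 90000 km", "vehicles"),
    ("motorbike for sale 600 cc 12000 km", "vehicles"),
]

# Hypothetical forms: one regex per slot, per category.
FORMS = {
    "real_estate": {"rooms": r"\d+\s+rooms", "rent": r"\d+\s+per\s+month"},
    "vehicles": {"mileage": r"\d+\s+km", "engine": r"\d+\s+cc|diesel|petrol"},
}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def train_naive_bayes(examples):
    """Return class counts, per-class word counts, and the vocabulary."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in examples:
        class_counts[label] += 1
        for tok in tokenize(text):
            word_counts[label][tok] += 1
            vocab.add(tok)
    return class_counts, word_counts, vocab


def naive_bayes_probs(text, model):
    """Posterior P(category | text) with add-one smoothing."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    log_scores = {}
    for label, n_docs in class_counts.items():
        total_words = sum(word_counts[label].values())
        score = math.log(n_docs / total_docs)
        for tok in tokenize(text):
            if tok not in vocab:
                continue  # ignore words never seen in training
            score += math.log(
                (word_counts[label][tok] + 1) / (total_words + len(vocab))
            )
        log_scores[label] = score
    # Normalise in log space for numerical stability.
    top = max(log_scores.values())
    exp = {c: math.exp(s - top) for c, s in log_scores.items()}
    norm = sum(exp.values())
    return {c: v / norm for c, v in exp.items()}


def filling_scores(text, forms):
    """Fraction of each category's form slots that match the text."""
    lowered = text.lower()
    return {
        cat: sum(bool(re.search(p, lowered)) for p in slots.values()) / len(slots)
        for cat, slots in forms.items()
    }


def classify(text, model, forms, weight=0.5):
    """Combine both scores with an assumed weighted sum; return the best category."""
    probs = naive_bayes_probs(text, model)
    fills = filling_scores(text, forms)
    combined = {
        c: weight * probs[c] + (1 - weight) * fills.get(c, 0.0) for c in probs
    }
    return max(combined, key=combined.get), combined


if __name__ == "__main__":
    model = train_naive_bayes(TRAIN)
    print(classify("van for sale diesel 150000 km", model, FORMS))
```

Setting `weight=1.0` reduces the sketch to the purely statistical classifier and `weight=0.0` to extraction alone, which mirrors the comparison the abstract says the experiments are designed to make.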


