A uniform framework for integration of information from the web

Information Systems, Vol. 29, No. 1. (March 2004), pp. 59-91.

Abstract

We discuss a system that implements an integrated framework for Web exploration, wrapping, data integration, and querying. Here, the "integration" applies in three aspects: the data model, the functionality, and the architecture. The core of the approach is a unified framework (i.e., data model and language) in which all tasks are performed. We regard the Web and its contents as a unit, represented in a semi-structured, object-oriented data model: the Web structure, given by its hyperlinks, the parse-trees of Web pages, and its contents are all included in the internal world model of the system. Additionally, the application-level model is immediately generated as an overlay of this source-level model. The model is complemented by a rule-based object-oriented language which is extended by Web accessing capabilities and structured document analysis. This language is implemented by a central reasoning engine. The advantage of our unified approach is that the same data manipulation and query language can be used for all tasks, i.e., accessing Web pages, wrapping, data integration, and querying information. Thus, these tasks are not necessarily separated, but can be closely intertwined. Additionally, by reusing the source-level model for generating the application-level model, there is no overhead for communication and mapping between different data formats. In particular, we present a methodology for reusing generic rule patterns for typical extraction, integration, and restructuring tasks. In an abstract sense, the system contains a universal wrapper, which can be applied to arbitrary Web pages that the system considers during information processing. Equipped with suitably intelligent rules, the system can potentially explore initially unknown parts of the Web, thus coping with the steady growth of the Web.
We show the practicability of our approach by using the F system (Proceedings of the Workshop on Deductive Databases and Logic Programming (DDLP'98) 30 (1998) 57).

