HMM-Based Alignment of Inaccurate Transcriptions for Historical Documents

by: A. Fischer, E. Indermuhle, V. Frinken, H. Bunke
In Document Analysis and Recognition (ICDAR), 2011 International Conference on (2011), pp. 53-57, doi:10.1109/icdar.2011.20

Abstract

For historical documents, available transcriptions are typically inaccurate when compared with the scanned document images. Not only are the positions of the words and sentences unknown, but the transcription may also fail to match the text in the image exactly. An error-tolerant alignment is needed to make the document images amenable to browsing and searching in digital libraries. In this paper, we propose a novel multi-pass alignment method based on Hidden Markov Models (HMM) that combines text line recognition, string alignment, and keyword spotting to cope with word substitutions, deletions, and insertions in the transcription. In a segmentation-free approach, transcriptions of complete pages are aligned with sequences of text line images. On the Parzival data set, results are reported for several degrees of artificial distortion. Both the accuracy and the efficiency of the proposed system are promising for real-world applications.
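
To make the string alignment idea concrete, here is a minimal sketch of a word-level edit-distance alignment between a recognizer's output and an inaccurate transcription. It is not the authors' system: the abstract does not specify their alignment procedure, and the HMM recognition and keyword spotting passes are omitted entirely. The function name `align_words`, the unit costs, and the sample sentences are all assumptions made for illustration.

```python
from typing import List, Optional, Tuple

# One aligned position: (recognized word or None, transcription word or None).
Pair = Tuple[Optional[str], Optional[str]]


def align_words(recognized: List[str], transcription: List[str]) -> List[Pair]:
    """Align two word sequences with standard Levenshtein dynamic programming.

    Hypothetical helper with unit costs for substitution, deletion and
    insertion. A None on the left marks a transcription word with no
    recognized counterpart; a None on the right marks the opposite case.
    """
    n, m = len(recognized), len(transcription)

    # cost[i][j] = edit distance between recognized[:i] and transcription[:j]
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i
    for j in range(1, m + 1):
        cost[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 0 if recognized[i - 1] == transcription[j - 1] else 1
            cost[i][j] = min(
                cost[i - 1][j - 1] + sub,  # match or substitution
                cost[i - 1][j] + 1,        # word only in the recognition
                cost[i][j - 1] + 1,        # word only in the transcription
            )

    # Trace back from the bottom-right corner to recover one optimal alignment.
    pairs: List[Pair] = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            sub = 0 if recognized[i - 1] == transcription[j - 1] else 1
            if cost[i][j] == cost[i - 1][j - 1] + sub:
                pairs.append((recognized[i - 1], transcription[j - 1]))
                i -= 1
                j -= 1
                continue
        if i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            pairs.append((recognized[i - 1], None))
            i -= 1
        else:
            pairs.append((None, transcription[j - 1]))
            j -= 1
    pairs.reverse()
    return pairs


# Made-up example: the transcription has one extra word and one changed word.
recognized = ["the", "knight", "rode", "away"]
transcription = ["the", "brave", "knight", "ran", "away"]
alignment = align_words(recognized, transcription)
for rec_word, trans_word in alignment:
    print(rec_word, "<->", trans_word)
```

In a fuller pipeline, the pairs where both sides agree could serve as anchors, with the mismatched regions handed to a more expensive pass; that division of labour is an assumption here, not something the abstract states.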

