Applying the OCRopus OCR System to Scholarly Sanskrit Literature

Sanskrit Computational Linguistics (2009), pp. 391-402.

Notes for this article

yaroslavvb has 0 private notes and 1 public note for this article.
  • Ocropus overview
yaroslavvb (public note) - 2009-04-22 02:30:20

Abstract

OCRopus is an open source OCR system currently being developed, intended to be omni-lingual and omni-script. In addition to modern digital library applications, applications of the system include capturing and recognizing classical literature, as well as the large body of research literature about classics. OCRopus advances the state of the art in a number of ways, including the ability easily to plug in new text recognition and layout analysis modules, the use of adaptive and user extensible character recognition, and statistical and trainable layout analysis. Of particular interest for computational linguistics applications is the consistent use of probability estimates throughout the system and the use of weighted finite state transducers to represent both alternative recognition hypotheses and statistical language models. In this paper, I first give an overview of these technologies and their relevance to digital library applications in the humanities, and then focus on the use of statistical language models and their use for the integration of OCR output with subsequent computational linguistic and information extraction modules.

