An algorithm for finding maximal whitespace rectangles at arbitrary orientations for document layout analysis
The analysis of the background structure (whitespace) of page images has become an important technique for physical document layout analysis. Globally maximal whites-pace rectangles have been previously demonstrated to constitute a concise representation of the major layout features of documents. However, previous methods for computing maximal whitespace rectangles were limited to axis-aligned rectangles. This paper presents an algorithm that finds globally maximal whitespace rectangles on page images at arbitrary orientations. The new algorithm eliminates the need for page rotation correction prior to background analysis and can be applied to considerably more complex page layouts than previously possible. The algorithm is resolution independent and takes as input a list of foreground shapes (e.g., character or word bounding boxes or polygons) and a set of parameter ranges; it outputs the N largest non-overlapping maximal whitespace rectangles whose parameters (location, width, height, orientation) fall within the required parameter ranges. Examples of applications of the method to severely skewed documents, as well as the UW3 database, are presented.