Separating Lines of Text in Free-Form Handwritten Historical Documents
Top 10% of 2006 papers
Abstract
We present an approach to finding and separating lines of text in free-form handwritten historical document images. After preprocessing, our method uses the count of foreground/background transitions in a binarized image to determine areas of the document that are likely to be text lines. Alternatively, an adaptive local connectivity map (ALCM) from the literature can be used for this step. We then use a min-cut/max-flow graph cut algorithm to split text areas that appear to encompass more than one line of text. After removing text lines that contain relatively little text information (or merging them with nearby text lines), we create output images for each line: a grayscale image and a special mask image containing both the foreground and information flagging ambiguous pixels. Foreground pixels that belong to other text lines are removed from the output images to provide cleaner line images for further processing. While some refinement is still necessary, the results of early experimentation with our method are encouraging.
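The transition-counting step can be illustrated with a minimal sketch. The abstract does not state the window size, threshold, or how counts are aggregated, so everything below is an assumption made for illustration: the function names `row_transition_counts` and `candidate_line_bands` are hypothetical, counts are taken over whole rows (which presumes roughly horizontal lines, unlike true free-form text), and the threshold `min_transitions` is arbitrary.

```python
from typing import List, Tuple

# Assumed representation: 1 = foreground (ink), 0 = background.
Image = List[List[int]]


def row_transition_counts(binary: Image) -> List[int]:
    """Count foreground/background changes between adjacent pixels in each row."""
    return [
        sum(1 for a, b in zip(row, row[1:]) if a != b)
        for row in binary
    ]


def candidate_line_bands(counts: List[int], min_transitions: int) -> List[Tuple[int, int]]:
    """Group consecutive rows whose count reaches the threshold.

    Returns (start, end) row ranges with end exclusive.
    """
    bands: List[Tuple[int, int]] = []
    start = None
    for y, count in enumerate(counts):
        if count >= min_transitions:
            if start is None:
                start = y  # a new candidate band begins here
        elif start is not None:
            bands.append((start, y))  # band ended on the previous row
            start = None
    if start is not None:
        bands.append((start, len(counts)))  # band runs to the last row
    return bands


# Toy binarized page: two ink-dense bands separated by blank rows.
page = [
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 1, 0, 1, 1, 0, 1, 0],
    [0, 1, 1, 0, 1, 0, 1, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
    [1, 0, 1, 0, 0, 1, 1, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
]
counts = row_transition_counts(page)
bands = candidate_line_bands(counts, min_transitions=2)
```

Rows crossing many strokes produce high counts, while blank rows or rows with a single long smear produce low ones, which is why transition counts can be a more selective cue than raw ink density. A band that is unusually tall would be a candidate for the splitting step described in the abstract; that step is not sketched here.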