As the digitisation of printed books and handwritten manuscripts develops within the Digital Humanities, the alignment of transcribed text with the corresponding images is emerging as a subject in its own right.
Considerable effort has already been spent on manually transcribing large collections of manuscripts. However, for the transcriptions to be truly useful, there must be a link between the text and the words in the manuscript image, so that systematic analyses can be performed on the database. In addition, a direct alignment between the words in a manuscript image and the corresponding transcription could be used to improve machine learning for the automatic recognition of words.
The process of creating links between the words in the image and the corresponding text in the transcription is known as text alignment. I will present three articles that describe different systems for achieving it.
The first paper is “Text Line Detection and Transcription Alignment: a Case Study on the Statuti del Doge Tiepolo”. In this article, the authors propose a new method for text alignment that starts from a manuscript page and its corresponding transcription (Fig. 1). The reference manuscript, the “Statuti del Doge Tiepolo”, is written in Latin. The proposed process uses Hidden Markov Models (HMMs) for text recognition, which allow the modeling of character shape variations at the word level. The major advantage of this system is that it does not require words to be segmented into single characters; as a consequence, the process is considerably faster. However, it assumes that the transcription is already line-segmented, meaning that each line of the transcription corresponds to exactly one line on the page.
The procedure can be divided into four main steps. In the first, the manuscript page is selected, cropping out the adjacent pages and everything else outside the page (Fig. 2). The second step is the detection of the individual text lines (Fig. 3), followed by the recognition of words, which are assumed to be continuous sets of characters. The words are outlined with polygons, which are then enlarged one pixel at a time until every polygon touches at least one other. At this point the polygons are large enough for the alignment step. The system is provided with 53 character models that describe the handwriting in the manuscript. These models are used to build HMMs of the transcribed words, which are compared with the HMMs of the words in the manuscript. If the models match, the word is aligned.
This process was applied to 72 pages of the manuscript “Statuti del Doge Tiepolo” and was observed to be successful, with 98.44% of the words recognized and aligned properly.
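The polygon enlargement step of the first method can be illustrated with a minimal sketch. It makes several simplifying assumptions that are not in the paper: word outlines are reduced to axis-aligned rectangles, only the rectangles that do not yet touch a neighbour are grown, and the names `touches` and `grow_until_touching` are hypothetical.

```python
from typing import List, Tuple

# (left, top, right, bottom) in inclusive pixel coordinates.
# Assumption: rectangles stand in for the word polygons of the paper.
Box = Tuple[int, int, int, int]


def touches(a: Box, b: Box) -> bool:
    """True if the boxes overlap or are directly adjacent (no free pixel between)."""
    return (a[0] <= b[2] + 1 and b[0] <= a[2] + 1 and
            a[1] <= b[3] + 1 and b[1] <= a[3] + 1)


def grow_until_touching(boxes: List[Box], max_steps: int = 1000) -> List[Box]:
    """Grow isolated boxes by one pixel per step until each touches another box."""
    boxes = list(boxes)
    if len(boxes) < 2:
        return boxes  # a single box has nothing to touch
    for _ in range(max_steps):
        isolated = [
            i for i, a in enumerate(boxes)
            if not any(touches(a, b) for j, b in enumerate(boxes) if j != i)
        ]
        if not isolated:
            break
        for i in isolated:
            left, top, right, bottom = boxes[i]
            # Enlarge by one pixel on every side.
            boxes[i] = (left - 1, top - 1, right + 1, bottom + 1)
    return boxes


grown = grow_until_touching([(0, 0, 4, 4), (10, 0, 14, 4)])
```

Growing only the isolated boxes keeps already-adjacent words from swallowing each other; whether the original system grows all polygons or only the isolated ones is not stated in the summary above.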
In the second article, “From Text and Image to Historical Resource: Text-Image Alignment for Digital Humanists”, Stutzmann et al. propose a similar method that uses a Handwritten Character Recognition system based on Hidden Markov Models. This procedure also assumes that the transcription is line-segmented, which allows the use of forced alignment: the system takes the exact number of words in a line directly from the transcription and “forces” the number of words detected in the corresponding manuscript line to match it.
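The idea behind forced alignment can be sketched with a small dynamic program. This is a generic Viterbi-style illustration, not the authors' HMM implementation: the input `scores[t][k]` (how well image column or frame `t` fits word `k`) is a hypothetical stand-in for the likelihoods a trained recognizer would provide, and the function name `forced_align` is likewise an assumption.

```python
from typing import List


def forced_align(scores: List[List[float]]) -> List[int]:
    """Assign each frame to a word so that words appear in order, none is skipped,
    and the summed score is maximal. scores[t][k] is the (hypothetical) score
    of frame t under word k; higher is better."""
    n_frames, n_words = len(scores), len(scores[0])
    if n_frames < n_words:
        raise ValueError("need at least one frame per word")
    neg_inf = float("-inf")
    best = [[neg_inf] * n_words for _ in range(n_frames)]
    back = [[0] * n_words for _ in range(n_frames)]
    best[0][0] = scores[0][0]  # the line must start with the first word
    for t in range(1, n_frames):
        for k in range(min(t, n_words - 1) + 1):
            stay = best[t - 1][k]                               # remain in word k
            advance = best[t - 1][k - 1] if k > 0 else neg_inf  # move on from word k-1
            if stay >= advance:
                best[t][k], back[t][k] = stay + scores[t][k], k
            else:
                best[t][k], back[t][k] = advance + scores[t][k], k - 1
    # Trace back from the last frame, which must belong to the last word.
    path = [n_words - 1]
    for t in range(n_frames - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path


# Five frames, two words: the first three frames fit word 0, the last two fit word 1.
demo = [[0, -5], [0, -5], [0, -5], [-5, 0], [-5, 0]]
alignment = forced_align(demo)  # [0, 0, 0, 1, 1]
```

Because the number of words is fixed by the transcription, the search only has to decide where each word begins and ends, which is what makes the alignment "forced".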
The alignment process consists of a few steps, which I will briefly introduce: page recognition is followed by line segmentation. The transcribed lines are used to train Hidden Markov Models that align the detected text lines. After this step, new HMMs are trained for word recognition and, with the help of forced alignment, words are aligned to the transcription (Fig. 4). A new recognizer is then trained on the result. Because the procedure is iterative, the result can improve with each pass. The system was tested on two different manuscripts, Graal and Fontenay, and obtained good results, with success rates of around 80-90%.
The last paper, “TILT 2: Text to Image Linking Tool”, presents TILT 2, a web-based service that aligns images of text with the corresponding transcriptions. Instead of recognizing single characters, this service uses the shapes of the words for the alignment process. The transcription is assumed to be line-segmented (Fig. 5). Page recognition follows the same procedure as in the other two systems. Line recognition, put simply, is achieved by drawing lines that cross all the words without touching each other (Fig. 6). Next, the word shapes are recognized, and the gaps between horizontally adjacent polygons are measured. At this point, knowing the number of gaps (N) that a line is supposed to contain (thanks to the line segmentation of the transcription), the N largest gaps in the line are assumed to be the correct ones; the polygons separated by the remaining gaps are merged to form word-shapes. This technique can, of course, only be used for languages that separate words. The last part of the process is the most reliable, as it can correct the previously obtained results: knowing the number of word-shapes and words, TILT 2 can calculate the approximate width of a single character and thus estimate the expected length of each word.
The algorithm then compares the previously obtained word-shapes with the expected word lengths and modifies the shapes, merging or splitting them to match the calculated lengths. An editing interface allows the recalculation of entire paragraphs between two selected words, or the manual splitting and merging of polygons (Fig. 7). For printed text, a success rate of 98-100% is reported; however, because spacing varies frequently in handwritten text, this method has noticeable limitations for non-printed material. An alternative method is under development.
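The gap-based word detection and the width estimate described for TILT 2 can be sketched as follows. This is a simplified illustration under stated assumptions, not the tool's actual code: shapes are reduced to horizontal pixel extents on a single line, all characters are assumed to have the same width, and the names `merge_into_words` and `expected_widths` are hypothetical.

```python
from typing import List, Tuple

# (left, right) horizontal extent of a shape in pixels.
# Assumption: shapes on one line do not overlap horizontally.
Span = Tuple[int, int]


def merge_into_words(shapes: List[Span], n_words: int) -> List[Span]:
    """Keep the n_words - 1 widest gaps as word boundaries and merge across the rest."""
    shapes = sorted(shapes)
    if not 1 <= n_words <= len(shapes):
        raise ValueError("n_words must be between 1 and the number of shapes")
    gaps = [shapes[i + 1][0] - shapes[i][1] for i in range(len(shapes) - 1)]
    by_width = sorted(range(len(gaps)), key=lambda i: gaps[i], reverse=True)
    breaks = set(by_width[:n_words - 1])  # indices of the gaps that separate words
    words: List[Span] = []
    start = shapes[0][0]
    for i in range(len(shapes) - 1):
        if i in breaks:
            words.append((start, shapes[i][1]))
            start = shapes[i + 1][0]
    words.append((start, shapes[-1][1]))
    return words


def expected_widths(words_text: List[str], word_shapes: List[Span]) -> List[float]:
    """Estimate each word's width from an average pixel width per character."""
    total_px = sum(right - left for left, right in word_shapes)
    total_chars = sum(len(word) for word in words_text)
    px_per_char = total_px / total_chars
    return [len(word) * px_per_char for word in words_text]


shapes = [(0, 10), (12, 20), (30, 40), (41, 50), (65, 80)]
word_shapes = merge_into_words(shapes, n_words=3)
# word_shapes == [(0, 20), (30, 50), (65, 80)]
widths = expected_widths(["in", "nomine", "dei"], word_shapes)
# widths == [10.0, 30.0, 15.0]
```

In this example the detected word-shapes are 20, 20 and 15 pixels wide, while the expected widths are 10, 30 and 15 pixels; a mismatch of this kind is what would prompt the merging or splitting of shapes in the correction step.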
Considering the reported processes, important improvements have been made with respect to the methods previously used for text alignment. They have evolved from past systems, where polygons had to be drawn manually around each single word, to automatic processes, allowing a change of scale in the amount of work achievable.
The objective of the first two projects is markedly different from that of the last one. The latter presents a service available to the public, mostly for simple text recognition, while the former have a more complex aim: aligning transcribed text with large ancient manuscripts and implementing this alignment technique directly in handwritten text recognition.
While the first method showed a very high success rate, it can only be used for manuscripts with the same handwriting, since it relies on character models that have to be developed for each type of handwriting. The second method is less precise, but it can be applied to different manuscripts, albeit with varying accuracy. In addition, the second method includes a self-training process, which can raise the precision as more analyses are carried out. Furthermore, this system can be employed for machine learning, to improve the automatic recognition of words.
However, all three methods assume that the transcribed text is line-segmented, which limits their use to diplomatic transcriptions. A diplomatic transcription is one where the text is transcribed exactly as it is written on the manuscript page, including abbreviations and line breaks (“return” characters).
To conclude, the presented systems advance the field of text alignment and show high success rates. Further development is clearly needed, especially to be able to align any kind of transcription, but this is already an important step. Further studies will be carried out to achieve complete automation and improve the recognition techniques.
- Slimane, F.; Mazzei, A.; Tomasin, L.; Kaplan, F. (2015). Text Line Detection and Transcription Alignment: a Case Study on the Statuti del Doge Tiepolo. Web: http://dh2015.org/abstracts/xml/MAZZEI_Andrea_Text_Line_Detection_and_Transcripti/MAZZEI_Andrea_Text_Line_Detection_and_Transcription_Ali.html – Accessed: 2/Oct/2015
- Stutzmann, D.; Bluche, T.; Lavrentev, A.; Leydier, Y.; Kermorvant, C. (2015). From Text and Image to Historical Resource: Text-Image Alignment for Digital Humanists. Web: http://dh2015.org/abstracts/xml/STUTZMANN_Dominique_From_Text_and_Image_to_Histor/STUTZMANN_Dominique_From_Text_and_Image_to_Historical_R.html – Accessed: 2/Oct/2015
- Schmidt, D. (2015). TILT 2: Text to Image Linking Tool. Web: http://dh2015.org/abstracts/xml/SCHMIDT_Desmond_TILT_2__Text_to_Image_Linking_Too/SCHMIDT_Desmond_TILT_2__Text_to_Image_Linking_Tool.html – Accessed: 2/Oct/2015