
Nowadays, many readers prefer electronic books to printed ones, for several reasons. One is that it is easier to search for a specific book on the Internet and then read it on a computer, iPad, e-reader or other electronic device. Another is that travellers do not have to carry heavy books with them, only their laptop or other device. On the other hand, some people prefer to read a printed book because they consider it more practical or because they want to protect their eyesight.


To make books available in electronic format, they must first be digitized by scanning. A large international repository of digital books already exists: HathiTrust, which includes books from Google, the Internet Archive and a host of other digitization programs (York 2010). The main problem with digitizing both published books and unpublished manuscripts is that scanning can introduce digital image errors, which impair the transmission of the information.


In the research domain, a study supported by the US Institute of Museum and Library Services and The Andrew W. Mellon Foundation examined this problem [1]. It designed a model of error to illustrate the gap between the digitization ideal and what repositories actually accept as digitized content.

The research focused on digitizing flawed old source volumes, using either manual scanning or automated scanning procedures. The digital surrogate remains very close to the source book. However, some traces are inevitably introduced, depending on the scanning method, the quality of the device and factors related to the preservation of the book.

To answer the question “Is the error acceptable?”, the researchers used four sets of 1,000 volumes from more than 20 research libraries, approximately 350,000 page images in total. A group of trained coders visually evaluated the images and recorded an error severity score for each image in a database.

Statistical analysis of this dataset showed that only a small proportion of the digital surrogates are error-free, that is, have a severity level so low that it does not affect the readability of the text. The error rate might, however, be lower if the source books were scanned while still in good condition rather than already flawed.
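
To make the statistical step more concrete, here is a minimal Python sketch of how per-image severity scores might be aggregated into a per-volume figure. Several details are assumptions for illustration, not the study's actual coding scheme:

- the 0 to 4 severity scale;
- the threshold below which readability is considered unaffected;
- the rule that a volume is only as good as its worst page;
- the names `PageScore`, `worst_severity_per_volume` and `share_acceptable`, which are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical coding scheme: 0 = no error, 4 = text unreadable.
READABLE_THRESHOLD = 1  # assumption: scores <= 1 do not affect readability


@dataclass
class PageScore:
    volume_id: str
    page: int
    severity: int  # score assigned by a trained coder


def worst_severity_per_volume(scores):
    """Return {volume_id: highest severity found on any of its pages}."""
    worst = defaultdict(int)
    for s in scores:
        worst[s.volume_id] = max(worst[s.volume_id], s.severity)
    return dict(worst)


def share_acceptable(scores, threshold=READABLE_THRESHOLD):
    """Fraction of volumes whose worst page stays at or below the threshold."""
    worst = worst_severity_per_volume(scores)
    if not worst:
        return 0.0
    ok = sum(1 for sev in worst.values() if sev <= threshold)
    return ok / len(worst)


if __name__ == "__main__":
    # Invented toy data, not results from the study.
    sample = [
        PageScore("vol-A", 1, 0), PageScore("vol-A", 2, 1),
        PageScore("vol-B", 1, 0), PageScore("vol-B", 2, 3),
        PageScore("vol-C", 1, 4),
    ]
    print(worst_severity_per_volume(sample))  # {'vol-A': 1, 'vol-B': 3, 'vol-C': 4}
    print(share_acceptable(sample))           # 0.3333333333333333
```

Judging a volume by its worst page is a strict choice. A per-page average would hide a single unreadable page, which matters for a reader even if the rest of the book is clean.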

Unpublished manuscripts raise another interesting issue in digitization. The Digital Orationes Project, funded by the Academy of Finland, aimed to bring an important unpublished manuscript into the academic arena [2]. The texts include a substantial amount of unpublished source material for English School Drama, written in English, Greek and Latin. The project digitized the manuscript so that scholars could access it.

This also involved creating a tool for viewing the content (handwritten text and images) through an intuitive interface with several useful functions, such as:

- searching for a word in the handwritten text by visually recognizing letter forms;
- editing the manuscript;
- translating a passage into another language (English, Greek or Latin).

The tool was designed generically, so that it can also be used to digitize other manuscripts.
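
The project's actual recognition method is not described here. The Python sketch below therefore uses a deliberately simple stand-in, naive template matching ("word spotting") on binarized images. It scores each window by the Jaccard overlap of ink pixels.

The bitmap representation, the function names `match_score` and `spot_word`, and the 0.9 threshold are all hypothetical. Real handwriting varies too much for exact templates, so practical systems rely on more robust features.

```python
from typing import List, Tuple

# Binarized scan: 1 = ink, 0 = background (assumed pre-processing step).
Bitmap = List[List[int]]


def match_score(image: Bitmap, template: Bitmap, top: int, left: int) -> float:
    """Jaccard overlap of ink pixels between the template and one image window."""
    both = either = 0
    for r, row in enumerate(template):
        for c, t in enumerate(row):
            i = image[top + r][left + c]
            both += i & t    # ink in both the window and the template
            either += i | t  # ink in at least one of them
    return both / either if either else 0.0


def spot_word(image: Bitmap, template: Bitmap,
              threshold: float = 0.9) -> List[Tuple[int, int, float]]:
    """Slide the word template over the image; return (row, col, score) hits."""
    th, tw = len(template), len(template[0])
    hits = []
    for top in range(len(image) - th + 1):
        for left in range(len(image[0]) - tw + 1):
            score = match_score(image, template, top, left)
            if score >= threshold:
                hits.append((top, left, score))
    return hits


if __name__ == "__main__":
    # Toy 2 x 8 "line" containing one occurrence of a 2 x 3 letter form.
    template = [[1, 0, 1],
                [1, 1, 1]]
    line = [[0, 0, 1, 0, 1, 0, 0, 0],
            [0, 0, 1, 1, 1, 0, 0, 0]]
    print(spot_word(line, template))  # [(0, 2, 1.0)]
```

Scoring only ink pixels, rather than counting every matching pixel, keeps blank stretches of the page from being reported as matches.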


Linking passages of a transcription to areas of the page image is very useful, for example, for a manuscript that has been heavily corrected, or that was written long ago and is hard to decipher. Without text-image links, the reader wastes time trying to work out a passage or even searching for it in the original document.

To provide this facility, the AustESE (Australian electronic scholarly editing) project implemented TILT, the Text to Image Linking Tool, which links the text to the image [3]. The main problems it solved were:

- displaying text-image links at line or word level over the web, by overlaying a polygon around a word or passage and attaching a link to it;
- editing links and creating them automatically through auto-detection;
- storing the links in a reusable and efficient form, as XML files.
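
TILT's own storage format is not reproduced here. The Python sketch below only illustrates the idea described above, under the following assumptions:

- each link pairs a polygon on the page image with a character range in the transcription;
- links are serialized to a simple XML document;
- a ray-casting point-in-polygon test finds the link under a reader's click.

The element and attribute names (`links`, `link`, `points`, `start`, `end`) and the helper functions are hypothetical, not TILT's schema or API.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import List, Optional, Tuple

Point = Tuple[float, float]


@dataclass
class Link:
    """Hypothetical link: a polygon on the page image tied to a text span."""
    link_id: str
    polygon: List[Point]  # vertices in image pixel coordinates
    start: int            # character offset where the span starts in the transcription
    end: int              # exclusive end offset


def links_to_xml(links: List[Link], image: str) -> str:
    """Serialize links for one page image to an XML string."""
    root = ET.Element("links", image=image)
    for link in links:
        el = ET.SubElement(root, "link", id=link.link_id,
                           start=str(link.start), end=str(link.end))
        el.set("points", " ".join(f"{x},{y}" for x, y in link.polygon))
    return ET.tostring(root, encoding="unicode")


def links_from_xml(text: str) -> List[Link]:
    """Parse links back from the XML produced by links_to_xml."""
    result = []
    for el in ET.fromstring(text).findall("link"):
        pts = [tuple(float(v) for v in p.split(","))
               for p in el.get("points").split()]
        result.append(Link(el.get("id"), pts,
                           int(el.get("start")), int(el.get("end"))))
    return result


def contains(polygon: List[Point], x: float, y: float) -> bool:
    """Ray-casting point-in-polygon test (odd number of crossings = inside)."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal ray
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def link_at(links: List[Link], x: float, y: float) -> Optional[Link]:
    """Return the first link whose polygon contains the clicked point."""
    return next((lk for lk in links if contains(lk.polygon, x, y)), None)
```

Storing the vertices as a space-separated `points` list of `x,y` pairs mirrors how SVG encodes polygons. This keeps the overlay straightforward to draw in a browser.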

In conclusion, because errors can hinder the understanding of a digital book, useful tools such as TILT have been created to help when something is unclear. Moreover, there is a tool with a search function that makes it easier to find a given word, a function that can be applied to both printed and handwritten texts. All in all, the digital format of a book is valuable and can serve various purposes, from reading to processing and preservation.


[1] Surrogacy and Image Error: Transformations in the Value of Digitized Books (http://dh2013.unl.edu/abstracts/ab-363.html)

[2] The Digital Orationes Project: Interfacing a Restoration Manuscript (http://dh2013.unl.edu/abstracts/ab-350.html)

[3] Text to Image Linking Tool (TILT) (http://dh2013.unl.edu/abstracts/ab-112.html)