Images are of paramount importance in the Digital Humanities. They form a large portion of digital archives, whether as scanned documents or as images of objects such as sculptures and paintings. However, managing and processing such images, and extracting meaningful information from what is ultimately an array of pixels, poses numerous challenges. In particular, images of digitized artifacts differ considerably from images of ordinary scenes in both content and scope, so they require special treatment, and traditional image processing and computer vision algorithms might not be applicable right off the shelf. In this post we are going to talk about some of the challenges faced in processing and managing such images and briefly go through the solutions proposed in three selected publications.
Consider an archive consisting of high-definition images of certain artifacts (prints, paintings, etc.). While such high-definition images provide a great deal of clarity and detail, their size makes them cumbersome to navigate and manipulate. Hankinson et al. propose a novel framework for fast and efficient interaction with archives of such large images. The framework provides an alternative to the traditional image gallery mode of interaction and makes use of asynchronous web browser technology to make interaction with high-definition images much faster and more efficient. The proposed image viewer (Diva.js) breaks large images down into smaller tiles and displays only the tiles that fall inside the current viewport instead of the entire image, which saves a lot of data download. Working with tiles also makes zooming in and out more efficient: only the portion of the zoomed image that is in the current viewport (or viewing window) needs to be downloaded again, rather than the entire zoomed, and hence larger, document. Likewise, as the user scrolls up or down the document, new tiles are downloaded on demand. To make sure that the framework works on low-memory devices such as tablets or smartphones, only 3 pages are kept in memory at any given time, so pages (or tiles) are dynamically added and removed as the user navigates through the document. The framework also offers various ways of interacting with a document, such as brightness and contrast adjustment (to enhance faded ink against the background for improved legibility), rotation (to read lines written perpendicular to the page), highlighting, annotations, etc. These interactions are implemented using HTML to make sure they are supported in web browsers. The viewer has also been used as a presentation layer for image search systems. However, the framework simplifies the client side at the cost of adding some requirements on the server side.
Specifically, the framework requires a gigapixel image server to serve image tiles, in addition to a standard web server (e.g., Apache). It also supports a wide range of image encodings. The asynchronous interaction between the web browser (on the client side), the web server and the image server improves interactivity and reduces latency. The framework also supports integrating image documents with optical character recognition software, so that image search systems can search the images for certain keywords.
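To make the tiling idea concrete, here is a minimal sketch of how a viewer could work out which tiles intersect the current viewport. This is not Diva.js code (Diva.js itself is JavaScript); the function names, the 256-pixel tile size and the pixel-based viewport coordinates are all assumptions made for illustration.

```python
import math
from typing import List, Tuple


def visible_tiles(view_x: int, view_y: int, view_w: int, view_h: int,
                  img_w: int, img_h: int,
                  tile_size: int = 256) -> List[Tuple[int, int]]:
    """Return (column, row) for every tile that intersects the viewport.

    All coordinates are in pixels of the image at the current zoom level.
    The tile size of 256 is an assumed default, not a Diva.js setting.
    """
    # Clamp the viewport to the image so we never request tiles outside it.
    left = max(view_x, 0)
    top = max(view_y, 0)
    right = min(view_x + view_w, img_w)
    bottom = min(view_y + view_h, img_h)
    if left >= right or top >= bottom:
        return []  # viewport is empty or lies completely outside the image

    first_col, last_col = left // tile_size, (right - 1) // tile_size
    first_row, last_row = top // tile_size, (bottom - 1) // tile_size
    return [(col, row)
            for row in range(first_row, last_row + 1)
            for col in range(first_col, last_col + 1)]


def total_tiles(img_w: int, img_h: int, tile_size: int = 256) -> int:
    """Number of tiles needed to cover the whole image at this zoom level."""
    return math.ceil(img_w / tile_size) * math.ceil(img_h / tile_size)
```

For a hypothetical 10000 x 15000 pixel page and a 1024 x 768 viewport at the top-left corner, only 12 of the 2360 tiles need to be fetched, which is where the savings in data download come from.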
Now consider a situation where you have a huge archive of digitized print material. Given a sample image, it might be of interest to find out which other images in the archive are visually similar. Given the advances in image classification and object recognition algorithms in recent years, it might be tempting to approach this problem from an object detection point of view, but there is a caveat. Object detection in ordinary images depends on features extracted from the images, and the same technique cannot be applied directly to digitized images of print materials, because such materials can be rich in features that are not representative of the objects (text or figures) in the image. This is because the medium itself contains different textures, which give rise to various features. Such features amount to nothing more than noise and are therefore undesirable in image-based search of archives. To mitigate this problem, Stahmer et al. come up with a novel approach that augments ordinary feature extraction with contour detection to get rid of the noise. The modified features are then used as indexable markers of the objects in the image. In the proposed approach the image is first normalized (color and b/w images are converted to a common format). Normalization is followed by the feature extraction step, where traditional features are combined with the contours of objects in the image to produce modified features that represent the objects of interest rather than the texture of the medium. Once the images have been transformed into sets of features, a visual dictionary of features is created, which is then used for indexing and image retrieval.
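The two ideas in this pipeline, discarding features that do not lie on object contours and indexing images through a visual dictionary, can be sketched roughly as follows. This is only an illustration under stated assumptions and not the authors' implementation: keypoints, descriptors, contour pixels and the dictionary of "visual words" are all assumed to come from some earlier step, and every function name here is hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Sequence, Set, Tuple

Point = Tuple[int, int]
Descriptor = Sequence[float]


def keep_near_contours(keypoints: List[Tuple[Point, Descriptor]],
                       contour: Set[Point],
                       radius: int = 2) -> List[Tuple[Point, Descriptor]]:
    """Keep only keypoints within `radius` pixels of a contour pixel.

    Assumption: features far from any contour come from the texture of
    the medium (paper grain, stains) and are treated as noise.
    """
    offsets = range(-radius, radius + 1)
    return [((x, y), desc) for (x, y), desc in keypoints
            if any((x + dx, y + dy) in contour
                   for dx in offsets for dy in offsets)]


def nearest_word(desc: Descriptor, dictionary: List[Descriptor]) -> int:
    """Index of the visual word closest to `desc` (squared Euclidean)."""
    def sq_dist(a: Descriptor, b: Descriptor) -> float:
        return sum((p - q) ** 2 for p, q in zip(a, b))
    return min(range(len(dictionary)),
               key=lambda i: sq_dist(desc, dictionary[i]))


def build_index(images: Dict[str, List[Descriptor]],
                dictionary: List[Descriptor]) -> Dict[int, Set[str]]:
    """Inverted index: visual word -> ids of images containing it."""
    index: Dict[int, Set[str]] = defaultdict(set)
    for image_id, descriptors in images.items():
        for desc in descriptors:
            index[nearest_word(desc, dictionary)].add(image_id)
    return index


def query(descriptors: List[Descriptor], dictionary: List[Descriptor],
          index: Dict[int, Set[str]]) -> List[Tuple[str, int]]:
    """Rank images by how many distinct query words they share."""
    votes: Counter = Counter()
    for word in {nearest_word(d, dictionary) for d in descriptors}:
        for image_id in index.get(word, ()):
            votes[image_id] += 1
    return votes.most_common()
```

A real system would use a proper feature detector and a dictionary learned by clustering many descriptors; the toy distance search above only shows how a dictionary turns similarity search into a lookup in an index.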
As mentioned earlier, making sense of these images is a challenging problem. Manually analyzing them one by one to look for certain information is a daunting task and is not feasible at scale. Hence the prospect of being able to analyze these large volumes of images in an automated way sounds quite promising. Lorang et al. address the problem of detecting poetic content in digitized pieces (images) of historic newspapers. Detecting poetic content in historic newspapers is of immense importance given the valuable information it provides about the culture, ways of life and development of literature during different periods in history. Lorang et al. propose using machine learning classifiers to reliably detect poetic content in images. They exploit the fact that poetic content follows certain visual patterns that differ from regular text, such as the left margin (usually smooth), the right margin (usually jagged), the spacing between lines of text and so on.
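As a rough illustration of the margin cue, the following sketch measures how much the left and right text edges vary from row to row in a binary image snippet. The representation (a list of rows with 1 for a dark pixel) and the use of a standard deviation as the "jaggedness" measure are assumptions made here, not necessarily the metric used in the paper.

```python
from statistics import pstdev
from typing import List, Tuple

# Binary image: each row is a list of 0/1 values, 1 = dark (ink) pixel.
Image = List[List[int]]


def margin_jaggedness(image: Image) -> Tuple[float, float]:
    """Return (left_spread, right_spread) over all rows containing ink.

    Each value is the population standard deviation of the first (left)
    or last (right) ink column. A smooth margin gives a value near 0,
    a jagged margin gives a larger value.
    """
    lefts, rights = [], []
    for row in image:
        ink_columns = [i for i, pixel in enumerate(row) if pixel]
        if ink_columns:  # skip blank rows between lines of text
            lefts.append(ink_columns[0])
            rights.append(ink_columns[-1])
    if not lefts:
        return (0.0, 0.0)  # no ink at all in this snippet
    return (pstdev(lefts), pstdev(rights))
```

On a snippet of justified prose both values should be close to zero, while a poem would typically show a small left value and a noticeably larger right value.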
In order to train the classifier (the paper proposes an Artificial Neural Network as the classification algorithm), training data is manually extracted from newspapers. It consists of three kinds of image snippets: snippets containing no poetic content, snippets where a small portion contains poetic content, and snippets containing text with visual cues similar to those of poetic content. In the next step the image snippets are blurred, since digitized images of historic newspapers can be noisy or of low quality. After blurring, the images are converted to binary images, from which various features are extracted and quantified. This quantification step computes metrics for each feature, for example the mean and standard deviation of the vertical white space between adjacent lines of text, the number of columns of dark pixels defining the left and right borders, etc. Once the images have been transformed into vectors of such feature metrics, the classifier is trained and evaluated using 10-fold cross-validation. Finally the classifier is used to classify image snippets as containing or not containing poetic content.
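Two of the steps above are easy to sketch in a few lines: turning a binarized snippet into line-spacing metrics, and splitting the data for 10-fold cross-validation. The code below assumes blurring and binarization have already happened and that the image is a list of rows with 1 for a dark pixel; the function names and the shuffling seed are hypothetical, and the neural network itself is left out.

```python
import random
from statistics import mean, pstdev
from typing import Iterator, List, Tuple

Image = List[List[int]]  # 1 = dark (ink) pixel, 0 = background


def line_gap_heights(image: Image) -> List[int]:
    """Heights (in rows) of the blank gaps between lines of text."""
    gaps, run, seen_ink = [], 0, False
    for row in image:
        if any(row):
            if seen_ink and run > 0:
                gaps.append(run)  # a gap just ended at this text row
            seen_ink, run = True, 0
        elif seen_ink:
            run += 1  # blank row after some text: the gap grows
    return gaps  # leading and trailing blank rows are ignored


def spacing_features(image: Image) -> Tuple[float, float]:
    """Mean and standard deviation of the vertical white space."""
    gaps = line_gap_heights(image)
    if not gaps:
        return (0.0, 0.0)
    return (mean(gaps), pstdev(gaps))


def k_fold_indices(n: int, k: int = 10,
                   seed: int = 0) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train, test) index lists; each sample is tested once."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [j for fold in folds[:i] + folds[i + 1:] for j in fold]
        yield train, test
```

Each snippet would contribute one feature vector (these spacing metrics plus margin metrics and so on), and the classifier would be trained on the `train` indices and scored on the `test` indices of every fold in turn.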
Now let's wrap up by comparing the problems addressed above and the proposed solutions. As we saw, Hankinson et al. propose a framework for efficient interaction with high-definition images, while Stahmer et al. address a different problem, that of converting digital images of print artifacts into searchable archives. Lorang et al., in turn, suggest a machine learning approach to classifying images in order to extract useful information from them in an automated way. Even though these three publications address three different problems, they all revolve around the central theme of managing and extracting useful information from images in digital archives. The first two belong to a somewhat similar category in that both address the presentation and management of images. We also saw that the framework proposed by Hankinson et al. can be used as a presentation layer for image retrieval systems. The third publication, however, focuses more on “image mining” and on applying machine learning techniques to image classification problems. Whatever the approach, the ultimate goal of such image-based systems should be to make images more expressive of the content they carry. There is no doubt that there’s huge potential for research in this field, for what we have achieved so far in terms of image processing and computer vision capabilities is just the tip of the iceberg.