After briefly going through the whole programme of the DH2012 conference, I noticed that many of the presentations focus on how automated tools can help people understand ancient manuscripts. This preliminary processing saves scholars a great deal of time, since they can analyze ancient handwriting without repeatedly recognizing every character by eye.

The first one is Modeling Medieval Handwriting: A New Approach to Digital Palaeography, with an online video here. This programme mainly focuses on finding examples of particular ways of writing a given letter. In other words, if a letter “q” is written with a long tail, or “descender” in more professional terms, the programme helps scholars locate other instances of “q” with similar characteristics. Examples with the same or similar writing characteristics can be aggregated, searched and displayed according to specified queries. All the letters in an ancient manuscript would be tagged with their writing characteristics, for example how the ascenders (in “l, b, h”) or descenders (in “p, q”) run, or what the bowls (in “a, p, b”) look like. These characteristics can then be searched for and located in the original manuscripts to support further analysis. More advanced uses are also possible, such as comparisons within one manuscript or across different manuscripts: for example, finding which features tend to be written together, or which characteristics are common throughout a manuscript.
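To make the idea of tagging and querying letters more concrete, here is a minimal sketch in Python. It is not the project's actual data model: the LetterInstance record, the feature labels, the manuscript names and both helper functions are assumptions made up for illustration. It only shows how tagged letters could be filtered by a feature and how often pairs of features appear together.

```python
from collections import Counter
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class LetterInstance:
    """One tagged letter in a manuscript (hypothetical record layout)."""
    manuscript: str
    position: int
    letter: str
    features: frozenset


def find_letters(instances, letter, required):
    """Return instances of `letter` that carry every feature in `required`."""
    required = frozenset(required)
    return [i for i in instances if i.letter == letter and required <= i.features]


def feature_cooccurrence(instances):
    """Count how often each pair of features is tagged on the same letter."""
    counts = Counter()
    for inst in instances:
        for pair in combinations(sorted(inst.features), 2):
            counts[pair] += 1
    return counts


# Invented sample data, only to exercise the functions above.
SAMPLE = [
    LetterInstance("MS-A", 12, "q", frozenset({"long descender", "round bowl"})),
    LetterInstance("MS-A", 40, "q", frozenset({"short descender", "round bowl"})),
    LetterInstance("MS-B", 7, "q", frozenset({"long descender", "angular bowl"})),
    LetterInstance("MS-B", 19, "b", frozenset({"looped ascender", "round bowl"})),
]

hits = find_letters(SAMPLE, "q", {"long descender"})
pairs = feature_cooccurrence(SAMPLE)
```

With this sample data, hits contains the two “q” instances with a long descender (one from each manuscript), and pairs records which feature labels were tagged on the same letter.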

The second one is Retrieving Writing Patterns From Historical Manuscripts Using Local Descriptors, with an online video here. Unlike the previous one, which mostly studies Western scripts, this research runs its experiments on Chinese and Arabic handwriting. Another novel point is that the researchers try to have computers recognize the characters entirely from their visual features. For example, they mark a Chinese character with several “interest points”, which are likely the corner-like parts (the “cornerness”) of the character.

Example of a character's interest points.
Take this Chinese character as an example: 31 interest points are marked on it as its crucial recognition features.

After calculating and locating all the interest points in a manuscript, the method tries to find characters with a similar visual distribution of interest points by applying a probability model to them; such characters are then identified as the same character.
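The following Python sketch illustrates the general idea of comparing characters by the spatial distribution of their interest points. It is not the method from the paper: a coarse grid histogram and histogram intersection stand in for the local descriptors and the probability model, and all function names, coordinates, the grid size and the threshold are assumptions chosen for illustration.

```python
def grid_histogram(points, width, height, bins=4):
    """Normalized bins x bins histogram of interest-point positions."""
    hist = [0.0] * (bins * bins)
    for x, y in points:
        # Clamp to the last bin so points on the right/bottom edge still count.
        col = min(int(x / width * bins), bins - 1)
        row = min(int(y / height * bins), bins - 1)
        hist[row * bins + col] += 1.0
    total = sum(hist)
    return [h / total for h in hist] if total else hist


def similarity(h1, h2):
    """Histogram intersection: 1.0 for identical distributions, 0.0 for disjoint ones."""
    return sum(min(a, b) for a, b in zip(h1, h2))


def retrieve(query_points, candidates, size, threshold=0.7):
    """Return names of candidates whose point distribution resembles the query."""
    width, height = size
    query_hist = grid_histogram(query_points, width, height)
    return [
        name
        for name, points in candidates.items()
        if similarity(query_hist, grid_histogram(points, width, height)) >= threshold
    ]


# Invented coordinates inside a 100 x 100 character box.
QUERY = [(10, 10), (90, 10), (50, 50), (10, 90)]
CANDIDATES = {
    "same": [(12, 8), (88, 12), (52, 53), (9, 91)],      # slightly shifted copy
    "other": [(50, 10), (50, 30), (50, 70), (50, 90)],   # vertical stroke only
}

matches = retrieve(QUERY, CANDIDATES, size=(100, 100))
```

In this toy data the slightly shifted copy falls into the same grid cells as the query and is retrieved, while the vertical-stroke shape shares only one cell and is rejected. A real system would need descriptors that tolerate scale, stroke-width and writing-style variation, which a fixed grid does not.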

All of this work is done by computer in a reasonable time, 120 seconds for 1036 Chinese characters in a sample manuscript, which frees humans from endless searching and recognition work. However, the method does make some mistakes, as an example illustrated in the abstract shows.

Experimental results

In this case, there are no false negatives, which means no occurrence in the manuscript was missed, and there is one false positive, which means one retrieved character is different from the others. (Although I am a native Chinese speaker, I still cannot tell which character the false positive corresponds to. It is certainly not the same character as the “故” analyzed above.)
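False positives and false negatives are usually summarized as precision and recall. The short Python sketch below shows the arithmetic. The number of correctly retrieved characters is not stated in this post, so the value 9 is purely a placeholder; only the one false positive and zero false negatives come from the example above.

```python
def precision_recall(true_positives, false_positives, false_negatives):
    """Return (precision, recall) for a retrieval result."""
    retrieved = true_positives + false_positives   # everything the system returned
    relevant = true_positives + false_negatives    # everything it should have returned
    precision = true_positives / retrieved if retrieved else 0.0
    recall = true_positives / relevant if relevant else 0.0
    return precision, recall


# true_positives=9 is an assumed placeholder, not a figure from the paper.
precision, recall = precision_recall(true_positives=9, false_positives=1, false_negatives=0)
```

Whatever the real number of correct hits is, zero false negatives gives a recall of 1.0, while the single false positive keeps precision slightly below 1.0.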

The previous two programmes are both based on visual recognition, while the third one extends the work to semantics. Since humans sometimes use context to guess or disambiguate an unclear letter, semantics clearly plays an important role in manuscript recognition. Formal Semantic Modeling for Human and Machine-based Decoding of Medieval Manuscripts, presented by Marianne Petra Ritsema van Eck and Lambert Schomaker (both of Rijksuniversiteit Groningen), is part of the Monk project, in which a framework for disclosing the semantics contained in the digital image of a manuscript page was attempted. Both geometrical and logical representations are considered in building a semantic model that distinguishes different kinds of content.

These three programmes all focus on how to distinguish characters in ancient manuscripts, even though they apply different methodologies. The first two concentrate on visual recognition and aggregation, which will greatly help scholars perform further analysis, while the last one combines different approaches to perform semantic processing. In any case, these methodologies, which aim to free scholars from boring and repetitive character recognition so that they can move on to more advanced research, are good news for palaeographers.