David Cooper, independent scholar; Athanasios Velios and Nicholas Pickwoad, Ligatus, University of the Arts London

An online catalogue of the manuscripts of the monastery of Saint Catherine, Sinai

The library of Saint Catherine's Monastery is well known as one of the world's most important repositories of manuscripts, chiefly relating to the Christian religion, and its collections are especially rich in manuscripts of the first millennium. The manuscripts fall into three classes: the Old Collection, containing c. 4300 items; the New Finds, containing c. 1500 items discovered during building works at the monastery in the 1970s; and items formerly held in the monastery but now dispersed among numerous repositories across the world, believed to amount to several hundred fragments and volumes.

The combined collections contain items written in Greek, Arabic, Syriac, Slavonic, Georgian, Latin, Ethiopian, Polish, Armenian, Caucasian Albanian and Coptic.

All but a very few of these items have a catalogue record of some kind, although in some cases the description is only very rudimentary. These records have been written over the last 140 years and, for the manuscripts still held in the monastery, are in a variety of languages (including English, Greek, French, Russian, Latin, Arabic and Georgian). The dispersed items will in most cases have records in the language of the country where the holding repository is located.

At a meeting held on 9 September 2005, the Advisory Panel of the St. Catherine Foundation suggested that a comprehensive catalogue of the manuscripts was desirable. Many of the existing printed catalogues of the manuscripts are now collectors' items, and there are very few locations where more than a small selection of them can be consulted.

The purpose of this project is to make all existing catalogue information available as an online resource, which can in some cases be enriched with images and which can serve as the basis for an improved, crowd-sourced and moderated catalogue.

Where possible, the printed records are scanned and processed by OCR; otherwise the text is entered manually. The corrected text is then marked up in XML according to the TEI P5 Guidelines. The resource is intended to be available in English at first, with Greek and other language versions to follow in due course. One record is constructed for each separate item, and where several items were originally parts of a single volume, the relationships are recorded in each record. Records are transformed by XSLT into HTML5 for display on the Web.
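The record pipeline described above (one TEI P5 record per item, with links to other parts of the same original volume, then rendered as HTML5) can be sketched as follows. This is a minimal illustration, not the project's encoding: the choice of elements, the placement of the part-to-part links in `<history>/<provenance>` as `<ref target="...">`, and the shelfmark used below are all hypothetical. Because the Python standard library has no XSLT processor, the HTML step here is done with `xml.etree.ElementTree` as a stand-in for the XSLT transformation the project actually uses.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
ET.register_namespace("", TEI_NS)  # serialise TEI elements in the default namespace
NS = {"tei": TEI_NS}


def tei(tag: str) -> str:
    """Return a TEI-namespaced tag name in ElementTree's {uri}local form."""
    return f"{{{TEI_NS}}}{tag}"


def build_ms_desc(idno: str, title: str, lang: str, related: list[str]) -> ET.Element:
    """Build a minimal TEI P5 <msDesc> for one catalogue item (hypothetical layout)."""
    ms_desc = ET.Element(tei("msDesc"))
    ident = ET.SubElement(ms_desc, tei("msIdentifier"))
    ET.SubElement(ident, tei("settlement")).text = "Sinai"
    ET.SubElement(ident, tei("repository")).text = "Saint Catherine's Monastery"
    ET.SubElement(ident, tei("idno")).text = idno
    item = ET.SubElement(ET.SubElement(ms_desc, tei("msContents")), tei("msItem"))
    ET.SubElement(item, tei("title")).text = title
    ET.SubElement(item, tei("textLang"), {"mainLang": lang})
    # Assumed encoding: links to other parts of the same original volume.
    if related:
        prov = ET.SubElement(ET.SubElement(ms_desc, tei("history")), tei("provenance"))
        prov.text = "Formerly part of the same volume as: "
        for target in related:
            ref = ET.SubElement(prov, tei("ref"), {"target": target})
            ref.text = target
    return ms_desc


def to_html(ms_desc: ET.Element) -> str:
    """Render an HTML5 fragment; a Python stand-in for the project's XSLT step."""
    article = ET.Element("article")
    ET.SubElement(article, "h1").text = ms_desc.findtext(
        "tei:msIdentifier/tei:idno", namespaces=NS
    )
    ET.SubElement(article, "p").text = ms_desc.findtext(
        "tei:msContents/tei:msItem/tei:title", namespaces=NS
    )
    for ref in ms_desc.iterfind("tei:history/tei:provenance/tei:ref", NS):
        ET.SubElement(article, "a", {"href": ref.get("target")}).text = ref.text
    return ET.tostring(article, encoding="unicode", method="html")
```

For example, `to_html(build_ms_desc("Sinai Gr. 0000", "Example title", "grc", ["#part-b"]))` yields a small `<article>` containing the shelfmark, the title and a link to the related part. Keeping the TEI record as the single source and generating HTML from it is what lets the same data later feed other language versions or display formats.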

The complexity of these records can be quite daunting, so it was decided from the beginning that, to keep control over the data and to assist search and retrieval in the eventual catalogue, authority files would be made for as many of the data elements as possible (for example personal names, places, organisations and bibliographic citations). These files cover as many as possible of the languages of the catalogues, which is especially important given the many, and sometimes conflicting, transliterations from the various scripts of the original texts. This is done using the methods pioneered by Matthew Driscoll at the University of Copenhagen for the catalogues of the Nordic manuscripts.

Because the record is divided into separate elements that take their entries from authority files, searching can be customised using these entries through select and autocomplete fields on an online form. Each field can be treated as a facet, so faceted searching can be offered as well as keyword searching.

Initially, a small number of records will also have sets of page images attached, with a navigation system for browsing them. There will thus be scope for any competent person, anywhere, to suggest additional or corrected information for inclusion in the catalogue. These suggestions are to be moderated by a panel of specialists selected by the Advisory Panel and incorporated, with attribution of responsibility, wherever possible.

Because of the size of the project, and in order to collect data on the time, expertise and cost involved for the average record in each of the defined classes, the initial plan is to work on just a few hundred records, spread across all types of printed input material. This will allow the resources required for the complete catalogue to be estimated more accurately. The results of this pilot project are the subject of this paper.
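The extrapolation that the pilot is designed to support can be written down explicitly: for each class of input material, the mean effort per pilot record (hours spent divided by records completed) is multiplied by the number of records of that class in the full catalogue, and the per-class figures are summed. The sketch below assumes simple linear scaling and uses hypothetical field names; it contains no pilot data, and the same calculation applies equally to cost instead of hours.

```python
from dataclasses import dataclass


@dataclass
class PilotClass:
    """Pilot measurements for one class of printed input material (hypothetical fields)."""
    name: str
    records_done: int     # pilot records completed in this class
    hours_spent: float    # total staff hours spent on those records
    total_records: int    # records of this class in the complete catalogue


def estimate_hours(classes: list[PilotClass]) -> dict[str, float]:
    """Scale the mean hours per pilot record up to each class's full size."""
    estimate: dict[str, float] = {}
    for c in classes:
        if c.records_done == 0:
            raise ValueError(f"no pilot data for class {c.name!r}")
        estimate[c.name] = c.hours_spent / c.records_done * c.total_records
    return estimate


def total_hours(classes: list[PilotClass]) -> float:
    """Sum the per-class estimates into one figure for the whole catalogue."""
    return sum(estimate_hours(classes).values())
```

Splitting the estimate by class, rather than averaging over all pilot records, reflects why the pilot is spread over every type of printed source: a class whose records are slow to process (for example, those that must be keyed in manually rather than read by OCR) would otherwise be under- or over-represented in the total.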