Digitization of the collections at Ømålsordbogen – the Dictionary of Danish Insular Dialects: challenges and opportunities

Ømålsordbogen (the Dictionary of Danish Insular Dialects, henceforth DID) is a historical dictionary that gives thorough descriptions of the dialects, i.e. the spoken vernacular of peasants and fishermen, on the Danish islands of Zealand, Funen and the surrounding islands. It covers the period from 1750 to 1950, the core period being 1850 to 1920. Publication began in 1992, and the latest volume (11, kurv-lindorm) appeared in 2013, but the project was initiated in 1909 and data collection dates back to the 1920s and 1930s. The project is currently undergoing an extensive process of digitization: old, outdated editing tools have been replaced with modern ones (database, XML, Unicode), and the old printed volumes have been extracted to XML as well and are now searchable as a single XML file. Furthermore, the underlying physical data collections are being digitized.
In the following we give a brief account of the digitization process and discuss a number of questions and dilemmas that it gives rise to. The collections underlying the DID project comprise a variety of subcollections that are highly heterogeneous in both form and content. The information on the paper slips is usually condensed, often idiosyncratic, and normally difficult to decode, even for other specialists. The digitization process naturally points towards web publication of the collections, either alone or in combination with the edited data, but it also raises a number of questions. Since the current digitization process is very basic and adds only a few metadata items (one to three), we point to the obvious fact that web publication of the collections presupposes the addition of further, carefully selected metadata that take different user needs and qualifications into account. We also discuss the relationship between edited and non-edited data from a publication perspective. Some of the paper slips are very difficult to decipher because of handwriting or idiosyncratic condensation, and we point out that web publication in a raw, i.e. non-edited or non-annotated, form might be more misleading than helpful for a number of users.
Journal: CEUR Workshop Proceedings
Pages (from-to): 341-348
Number of pages: 8
Status: Published - 3 Apr 2018
Event: Digital Humanities in the Nordic Countries 3rd Conference - HELDIG – the Helsinki Centre for Digital Humanities at the University of Helsinki, the Faculty of Arts, Helsinki, Finland
Duration: 7 Mar 2018 to 9 Mar 2018
Conference number: 3




