Linking Corpus Data to an Excerpt-based Historical Dictionary

Institut for Nordiske Studier og Sprogvidenskab (NorS)

Publication: Contribution to book/anthology/report › Conference contribution in proceedings › Research › peer-reviewed

Standard

Linking Corpus Data to an Excerpt-based Historical Dictionary. / Wills, Tarrin Jon; Jóhannsson, Ellert Þór; Battista, Simonetta.

Proceedings of the XVIII EURALEX International Congress: Lexicography in Global Contexts 17-21 July 2018, Ljubljana. ed. / Jaka Čibej; Vojko Gorjanc; Iztok Kosem. EURALEX, 2018. pp. 979-988.

Harvard

Wills, TJ, Jóhannsson, EÞ & Battista, S 2018, Linking Corpus Data to an Excerpt-based Historical Dictionary. in J Čibej, V Gorjanc & I Kosem (eds), Proceedings of the XVIII EURALEX International Congress: Lexicography in Global Contexts 17-21 July 2018, Ljubljana. EURALEX, pp. 979-988, EURALEX International Congress, Ljubljana, Slovenia, 16/07/2018. <https://e-knjige.ff.uni-lj.si/znanstvena-zalozba/catalog/book/118>

APA

Wills, T. J., Jóhannsson, E. Þ., & Battista, S. (2018). Linking Corpus Data to an Excerpt-based Historical Dictionary. In J. Čibej, V. Gorjanc, & I. Kosem (Eds.), Proceedings of the XVIII EURALEX International Congress: Lexicography in Global Contexts 17-21 July 2018, Ljubljana (pp. 979-988). EURALEX. https://e-knjige.ff.uni-lj.si/znanstvena-zalozba/catalog/book/118

Vancouver

Wills TJ, Jóhannsson EÞ, Battista S. Linking Corpus Data to an Excerpt-based Historical Dictionary. In Čibej J, Gorjanc V, Kosem I, editors, Proceedings of the XVIII EURALEX International Congress: Lexicography in Global Contexts 17-21 July 2018, Ljubljana. EURALEX. 2018. p. 979-988

Author

Wills, Tarrin Jon ; Jóhannsson, Ellert Þór ; Battista, Simonetta. / Linking Corpus Data to an Excerpt-based Historical Dictionary. Proceedings of the XVIII EURALEX International Congress: Lexicography in Global Contexts 17-21 July 2018, Ljubljana. ed. / Jaka Čibej ; Vojko Gorjanc ; Iztok Kosem. EURALEX, 2018. pp. 979-988

Bibtex

@inproceedings{b88f03619d644edda3fad81eb27e5c1e,

title = "Linking Corpus Data to an Excerpt-based Historical Dictionary",

abstract = "A Dictionary of Old Norse Prose (ONP) is a digital dictionary that derives originally from an excerpt-based index of around 750,000 citations. This paper describes recent attempts to create two-way links between the growing body of digital texts encoded using TEI XML and the dictionary{\textquoteright}s word list, which forms the basis of the published dictionary. The process involves design challenges in bringing together very different digital structures, namely the text in an XML tree structure, and the dictionary in a relational database structure. Because of the very high levels of accuracy demanded by the end-users of the dictionary (particularly researchers in Old Norse studies), the linking process can only be automated for unambiguous cases, with remaining links entered manually. The application and interface that assists this process attempts to minimize the trade-off between automation and accuracy, and adds a range of tools to assist with the human lemmatizing process. We were able to achieve linking of lemmas in 90.4\% of instances where the lemma was recorded in the TEI text, with very high levels of accuracy. Where no lemma was recorded, the application allowed an Old Norse scholar to link lemmas to previously unlemmatized words at an average rate of 4-7 seconds per word.",

author = "Wills, {Tarrin Jon} and J{\'o}hannsson, {Ellert {\TH}{\'o}r} and Simonetta Battista",

year = "2018",

month = jul,

language = "English",

pages = "979--988",

editor = "Jaka {\v C}ibej and Vojko Gorjanc and Iztok Kosem",

booktitle = "Proceedings of the XVIII EURALEX International Congress",

publisher = "EURALEX",

note = "Conference date: 16-07-2018 Through 21-07-2018",

}

RIS

TY - GEN

T1 - Linking Corpus Data to an Excerpt-based Historical Dictionary

AU - Wills, Tarrin Jon

AU - Jóhannsson, Ellert Þór

AU - Battista, Simonetta

N1 - Conference code: XVIII

PY - 2018/7

Y1 - 2018/7

N2 - A Dictionary of Old Norse Prose (ONP) is a digital dictionary that derives originally from an excerpt-based index of around 750,000 citations. This paper describes recent attempts to create two-way links between the growing body of digital texts encoded using TEI XML and the dictionary’s word list, which forms the basis of the published dictionary. The process involves design challenges in bringing together very different digital structures, namely the text in an XML tree structure, and the dictionary in a relational database structure. Because of the very high levels of accuracy demanded by the end-users of the dictionary (particularly researchers in Old Norse studies), the linking process can only be automated for unambiguous cases, with remaining links entered manually. The application and interface that assists this process attempts to minimize the trade-off between automation and accuracy, and adds a range of tools to assist with the human lemmatizing process. We were able to achieve linking of lemmas in 90.4% of instances where the lemma was recorded in the TEI text, with very high levels of accuracy. Where no lemma was recorded, the application allowed an Old Norse scholar to link lemmas to previously unlemmatized words at an average rate of 4-7 seconds per word.

M3 - Article in proceedings

SP - 979

EP - 988

BT - Proceedings of the XVIII EURALEX International Congress

A2 - Čibej, Jaka

A2 - Gorjanc, Vojko

A2 - Kosem, Iztok

PB - EURALEX

Y2 - 16 July 2018 through 21 July 2018

ER -
