The Danish Gigaword Corpus

Publication: Contribution to book/anthology/report > Conference contribution in proceedings > Research > Peer-reviewed

Standard

The Danish Gigaword Corpus. / Strømberg-Derczynski, Leon; Ciosici, Manuel Rafael; Christiansen, Morten H.; Baglini, Rebekah Brita; Dalsgaard, Jacob Aarup; Fusaroli, Riccardo; Henrichsen, Peter Juel; Hvingelby, Rasmus; Kirkedal, Andreas; Kjeldsen, Alex Speed; Ladefoged, Claus; Nielsen, Finn Arup; Madsen, Jens; Petersen, Malte Lau; Rystrøm, Jonathan Hvithamar; Varab, Daniel.

Proceedings of the 23rd Nordic Conference on Computational Linguistics (NoDaLiDa). Linköping University Electronic Press, 2021. pp. 413-421.

Harvard

Strømberg-Derczynski, L, Ciosici, MR, Christiansen, MH, Baglini, RB, Dalsgaard, JA, Fusaroli, R, Henrichsen, PJ, Hvingelby, R, Kirkedal, A, Kjeldsen, AS, Ladefoged, C, Nielsen, FA, Madsen, J, Petersen, ML, Rystrøm, JH & Varab, D 2021, The Danish Gigaword Corpus. in Proceedings of the 23rd Nordic Conference on Computational Linguistics (NoDaLiDa). Linköping University Electronic Press, pp. 413-421. <https://www.aclweb.org/anthology/2021.nodalida-main.46>

APA

Strømberg-Derczynski, L., Ciosici, M. R., Christiansen, M. H., Baglini, R. B., Dalsgaard, J. A., Fusaroli, R., Henrichsen, P. J., Hvingelby, R., Kirkedal, A., Kjeldsen, A. S., Ladefoged, C., Nielsen, F. A., Madsen, J., Petersen, M. L., Rystrøm, J. H., & Varab, D. (2021). The Danish Gigaword Corpus. In Proceedings of the 23rd Nordic Conference on Computational Linguistics (NoDaLiDa) (pp. 413-421). Linköping University Electronic Press. https://www.aclweb.org/anthology/2021.nodalida-main.46

Vancouver

Strømberg-Derczynski L, Ciosici MR, Christiansen MH, Baglini RB, Dalsgaard JA, Fusaroli R et al. The Danish Gigaword Corpus. In Proceedings of the 23rd Nordic Conference on Computational Linguistics (NoDaLiDa). Linköping University Electronic Press. 2021. pp. 413-421

Author

Strømberg-Derczynski, Leon ; Ciosici, Manuel Rafael ; Christiansen, Morten H. ; Baglini, Rebekah Brita ; Dalsgaard, Jacob Aarup ; Fusaroli, Riccardo ; Henrichsen, Peter Juel ; Hvingelby, Rasmus ; Kirkedal, Andreas ; Kjeldsen, Alex Speed ; Ladefoged, Claus ; Nielsen, Finn Arup ; Madsen, Jens ; Petersen, Malte Lau ; Rystrøm, Jonathan Hvithamar ; Varab, Daniel. / The Danish Gigaword Corpus. Proceedings of the 23rd Nordic Conference on Computational Linguistics (NoDaLiDa). Linköping University Electronic Press, 2021. pp. 413-421

Bibtex

@inproceedings{da3cde90ac0d4296b8da1a51f43c2351,
title = "The Danish Gigaword Corpus",
abstract = "Danish language technology has been hindered by a lack of broad-coverage corpora at the scale modern NLP prefers. This paper describes the Danish Gigaword Corpus, the result of a focused effort to provide a diverse and freely-available one billion word corpus of Danish text. The Danish Gigaword corpus covers a wide array of time periods, domains, speakers{\textquoteright} socio-economic status, and Danish dialects.",
author = "Leon Str{\o}mberg-Derczynski and Ciosici, {Manuel Rafael} and Christiansen, {Morten H.} and Baglini, {Rebekah Brita} and Dalsgaard, {Jacob Aarup} and Riccardo Fusaroli and Henrichsen, {Peter Juel} and Rasmus Hvingelby and Andreas Kirkedal and Kjeldsen, {Alex Speed} and Claus Ladefoged and Nielsen, {Finn Arup} and Jens Madsen and Petersen, {Malte Lau} and Rystr{\o}m, {Jonathan Hvithamar} and Daniel Varab",
year = "2021",
language = "English",
pages = "413--421",
booktitle = "Proceedings of the 23rd Nordic Conference on Computational Linguistics (NoDaLiDa)",
publisher = "Link{\"o}ping University Electronic Press",

}

RIS

TY - GEN

T1 - The Danish Gigaword Corpus

AU - Strømberg-Derczynski, Leon

AU - Ciosici, Manuel Rafael

AU - Christiansen, Morten H.

AU - Baglini, Rebekah Brita

AU - Dalsgaard, Jacob Aarup

AU - Fusaroli, Riccardo

AU - Henrichsen, Peter Juel

AU - Hvingelby, Rasmus

AU - Kirkedal, Andreas

AU - Kjeldsen, Alex Speed

AU - Ladefoged, Claus

AU - Nielsen, Finn Arup

AU - Madsen, Jens

AU - Petersen, Malte Lau

AU - Rystrøm, Jonathan Hvithamar

AU - Varab, Daniel

PY - 2021

Y1 - 2021

N2 - Danish language technology has been hindered by a lack of broad-coverage corpora at the scale modern NLP prefers. This paper describes the Danish Gigaword Corpus, the result of a focused effort to provide a diverse and freely-available one billion word corpus of Danish text. The Danish Gigaword corpus covers a wide array of time periods, domains, speakers’ socio-economic status, and Danish dialects.

AB - Danish language technology has been hindered by a lack of broad-coverage corpora at the scale modern NLP prefers. This paper describes the Danish Gigaword Corpus, the result of a focused effort to provide a diverse and freely-available one billion word corpus of Danish text. The Danish Gigaword corpus covers a wide array of time periods, domains, speakers’ socio-economic status, and Danish dialects.

M3 - Article in proceedings

SP - 413

EP - 421

BT - Proceedings of the 23rd Nordic Conference on Computational Linguistics (NoDaLiDa)

PB - Linköping University Electronic Press

ER -

ID: 270555110