The Danish Gigaword Corpus
Research output: Chapter in Book/Report/Conference proceeding › Article in proceedings › Research › peer-review
Danish language technology has been hindered by a lack of broad-coverage corpora at the scale modern NLP prefers. This paper describes the Danish Gigaword Corpus, the result of a focused effort to provide a diverse and freely available one-billion-word corpus of Danish text. The Danish Gigaword Corpus covers a wide array of time periods, domains, speakers’ socio-economic status, and Danish dialects.
Original language | English |
---|---|
Title of host publication | Proceedings of the 23rd Nordic Conference on Computational Linguistics (NoDaLiDa) |
Number of pages | 9 |
Publisher | Linköping University Electronic Press |
Publication date | 2021 |
Pages | 413-421 |
Publication status | Published - 2021 |
Links
- https://www.aclweb.org/anthology/2021.nodalida-main.46
Final published version