The Danish Gigaword Corpus
Publication: Contribution to book/anthology/report › Article in proceedings › Research › peer-reviewed
Documents
- 2021.nodalida-main.46v2
Final published version, 168 KB, PDF document
Danish language technology has been hindered by a lack of broad-coverage corpora at the scale modern NLP prefers. This paper describes the Danish Gigaword Corpus, the result of a focused effort to provide a diverse and freely-available one billion word corpus of Danish text. The Danish Gigaword corpus covers a wide array of time periods, domains, speakers' socio-economic status, and Danish dialects.
Original language | English |
---|---|
Title | Proceedings of the 23rd Nordic Conference on Computational Linguistics (NoDaLiDa) |
Number of pages | 9 |
Publisher | Linköping University Electronic Press |
Publication date | 2021 |
Pages | 413-421 |
Status | Published - 2021 |
Links
- https://www.aclweb.org/anthology/2021.nodalida-main.46
Final published version
ID: 270555110