Towards a Gold Standard for Evaluating Danish Word Embeddings
Research output: Chapter in Book/Report/Conference proceeding › Article in proceedings › Research › peer-review
Standard
Towards a Gold Standard for Evaluating Danish Word Embeddings. / Schneidermann, Nina; Hvingelby, Rasmus; Pedersen, Bolette Sandford.
Proceedings of the 12th Language Resources and Evaluation Conference. Marseille, France : European Language Resources Association, 2020. p. 4756-4765.
RIS
TY - GEN
T1 - Towards a Gold Standard for Evaluating Danish Word Embeddings
AU - Schneidermann, Nina
AU - Hvingelby, Rasmus
AU - Pedersen, Bolette Sandford
PY - 2020
Y1 - 2020
AB - This paper presents the process of compiling a model-agnostic similarity gold standard for evaluating Danish word embeddings based on human judgements made by 42 native speakers of Danish. Word embeddings model semantic similarity solely through distribution (meaning that word vectors do not reflect relatedness as distinct from similarity), and we argue that this generalisation poses a problem in most intrinsic evaluation scenarios. In order to evaluate on both dimensions, our human-generated dataset is therefore designed to reflect the distinction between relatedness and similarity. The gold standard is applied to evaluate the "goodness" of six existing word embedding models for Danish, and we discuss how a relatively low correlation can be explained by the fact that semantic similarity is substantially more challenging to model than relatedness, and that there seems to be a need for future human judgements to measure similarity in full context and along more than a single spectrum.
M3 - Article in proceedings
SP - 4756
EP - 4765
BT - Proceedings of the 12th Language Resources and Evaluation Conference
PB - European Language Resources Association
CY - Marseille, France
Y2 - 13 May 2020 through 15 May 2020
ER -
ID: 241358594
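The abstract describes intrinsic evaluation, in which a model's word-pair similarities are correlated with human ratings from a gold standard. The following is a minimal sketch of that procedure, and it is not the authors' code. It assumes cosine similarity between word vectors and Spearman's rank correlation, which is a common choice for such benchmarks. Pairs with out-of-vocabulary words are skipped. The names `evaluate`, `TOY_EMBEDDINGS` and `TOY_GOLD` are hypothetical. The vectors and ratings are invented for illustration and are not taken from the dataset.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (assumes non-zero vectors)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def rank(values):
    """1-based ranks; tied values receive the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation (assumes neither list is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def spearman(x, y):
    """Spearman's rho: Pearson correlation of the rank vectors."""
    return pearson(rank(x), rank(y))

def evaluate(embeddings, gold):
    """Correlate model cosine similarities with human scores.

    embeddings: dict word -> vector; gold: list of (word1, word2, human_score).
    Returns (rho, number_of_pairs_used); OOV pairs are skipped.
    """
    model_scores, human_scores = [], []
    for w1, w2, score in gold:
        if w1 in embeddings and w2 in embeddings:
            model_scores.append(cosine(embeddings[w1], embeddings[w2]))
            human_scores.append(score)
    return spearman(model_scores, human_scores), len(model_scores)

# Hypothetical toy data, for illustration only (not from the gold standard).
TOY_EMBEDDINGS = {
    "hund": [1.0, 0.0, 0.0],
    "kat": [0.9, 0.1, 0.0],
    "bil": [0.0, 1.0, 0.0],
    "vej": [0.1, 0.9, 0.2],
    "æble": [0.0, 0.0, 1.0],
}
TOY_GOLD = [
    ("hund", "kat", 8.5),
    ("bil", "vej", 6.0),
    ("hund", "æble", 0.5),
    ("hund", "fly", 3.0),  # "fly" is out of vocabulary, so this pair is skipped
]

if __name__ == "__main__":
    rho, n = evaluate(TOY_EMBEDDINGS, TOY_GOLD)
    print(f"Spearman rho = {rho:.3f} over {n} pairs")
```

The gold standard separates similarity from relatedness. Under that design, the same `evaluate` function could be run once on similarity-rated pairs and once on relatedness-rated pairs, so that the two correlations can be compared. This split is an assumption about how one might use such a dataset, not a description of the paper's exact protocol.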