Lexical and morpho-syntactic features in word embeddings: a case study of nouns in Swedish

Publication: Contribution to book/anthology/report > Conference contribution in proceedings > Research > peer-reviewed

We combine real-valued word vectors with two types of classifiers (linear discriminant analysis and a feed-forward neural network) to examine whether basic nominal categories can be captured by simple word embedding models. We also provide a linguistic analysis of the errors generated by the classifiers. The target language is Swedish, in which we investigate three nominal distinctions: uter/neuter, common/proper, and count/mass. These represent, respectively, grammatical, semantic, and mixed types of nominal classification within languages. Our results show that word embeddings can capture typical grammatical and semantic features such as uter/neuter and common/proper nouns. Nevertheless, the model has difficulty identifying classes such as count/mass, which not only combine grammatical and semantic properties but are also subject to conversion and shift. Hence, we answer the call of the Special Session on Natural Language Processing in Artificial Intelligence by approaching the interfaces between morphology, lexicon, semantics, and syntax with interdisciplinary methods that combine machine learning of language and general linguistics.
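To make the classification setup concrete, here is a minimal sketch of a two-class linear discriminant analysis applied to toy "embedding" vectors. Everything in it is an assumption for illustration: the vectors are synthetic 2-D points rather than real Swedish word embeddings, the class centres and the helper names (`fit_lda_2d`, `predict`) are hypothetical, and the paper's actual embedding dimensions, data, and classifier settings are not stated in this record.

```python
import random

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def mean(vectors):
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

def scatter_2d(vectors, mu):
    # Entries of the (symmetric) 2x2 scatter matrix of one class.
    sxx = sum((v[0] - mu[0]) ** 2 for v in vectors)
    sxy = sum((v[0] - mu[0]) * (v[1] - mu[1]) for v in vectors)
    syy = sum((v[1] - mu[1]) ** 2 for v in vectors)
    return sxx, sxy, syy

def fit_lda_2d(class_a, class_b):
    # Fisher's discriminant for two classes: w = S_w^{-1} (mu_a - mu_b).
    mu_a, mu_b = mean(class_a), mean(class_b)
    axx, axy, ayy = scatter_2d(class_a, mu_a)
    bxx, bxy, byy = scatter_2d(class_b, mu_b)
    sxx, sxy, syy = axx + bxx, axy + bxy, ayy + byy
    det = sxx * syy - sxy * sxy
    dx, dy = mu_a[0] - mu_b[0], mu_a[1] - mu_b[1]
    # Closed-form inverse of the 2x2 within-class scatter matrix.
    w = [(syy * dx - sxy * dy) / det, (-sxy * dx + sxx * dy) / det]
    # Decision threshold halfway between the projected class means
    # (this assumes equal class priors).
    threshold = 0.5 * (dot(w, mu_a) + dot(w, mu_b))
    return w, threshold

def predict(w, threshold, vector):
    return "uter" if dot(w, vector) > threshold else "neuter"

def sample(rng, centre, n, sd=0.3):
    # Synthetic stand-ins for noun vectors; NOT real embeddings.
    return [[rng.gauss(centre[0], sd), rng.gauss(centre[1], sd)] for _ in range(n)]

rng = random.Random(0)
train_uter = sample(rng, (1.0, 1.0), 50)
train_neuter = sample(rng, (-1.0, -1.0), 50)
test_set = [(v, "uter") for v in sample(rng, (1.0, 1.0), 20)]
test_set += [(v, "neuter") for v in sample(rng, (-1.0, -1.0), 20)]

w, threshold = fit_lda_2d(train_uter, train_neuter)
correct = sum(predict(w, threshold, v) == label for v, label in test_set)
accuracy = correct / len(test_set)
print(f"toy accuracy: {accuracy:.2f}")
```

With real embeddings the vectors would have many more dimensions and the classes would overlap, so a library implementation of the classifier would be the more practical choice. The sketch only shows the principle of separating two noun classes with a linear projection.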

Original language: English
Title: ICAART 2018 - Proceedings of the 10th International Conference on Agents and Artificial Intelligence
Editors: Ana Paula Rocha, Jaap van den Herik
Number of pages: 12
Publisher: SCITEPRESS (Science and Technology Publications, Lda.)
Publication date: 2018
Pages: 663-674
ISBN (Electronic): 9789897582752
DOI
Status: Published - 2018
Event: 10th International Conference on Agents and Artificial Intelligence, ICAART 2018 - Funchal, Madeira, Portugal
Duration: 16 Jan 2018 to 18 Jan 2018

Conference

Conference: 10th International Conference on Agents and Artificial Intelligence, ICAART 2018
Country: Portugal
City: Funchal, Madeira
Period: 16/01/2018 to 18/01/2018
Sponsor: Institute for Systems and Technologies of Information, Control and Communication (INSTICC)
Name: ICAART 2018 - Proceedings of the 10th International Conference on Agents and Artificial Intelligence
Volume: 2

Bibliographical note

Publisher Copyright:
Copyright © 2018 by SCITEPRESS – Science and Technology Publications, Lda. All rights reserved

ID: 366046241