Linguistic information in word embeddings

Publication: Contribution to book/anthology/report; Conference contribution in proceedings; Research; Peer-reviewed

We study the presence of linguistically motivated information in word embeddings generated with statistical methods. The nominal distinctions of uter/neuter, common/proper, and count/mass in Swedish are selected to represent, respectively, grammatical, semantic, and mixed types of nominal categories within languages. Our results indicate that typical grammatical and semantic features are easily captured by word embeddings. In our experiments, based on a single-layer feed-forward neural network, classifying semantic features required significantly fewer neurons than classifying grammatical features. However, semantic features also generated higher entropy in the classification output despite their high accuracy. Furthermore, the count/mass distinction proved difficult for the model, even when the number of neurons was tuned almost to its maximum.
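The contrast between accuracy and output entropy mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' code: the function names (prediction_entropy, accuracy, mean_entropy) and the probability values are made up for illustration, and Shannon entropy in bits is assumed as the entropy measure, which the abstract does not specify.

```python
import math

def prediction_entropy(probs):
    """Shannon entropy (in bits) of one vector of class probabilities."""
    return -sum(p * math.log2(p) for p in probs if p > 0.0)

def accuracy(outputs, labels):
    """Fraction of items whose most probable class matches the label."""
    hits = sum(1 for probs, y in zip(outputs, labels)
               if probs.index(max(probs)) == y)
    return hits / len(labels)

def mean_entropy(outputs):
    """Average output entropy over all classified items."""
    return sum(prediction_entropy(p) for p in outputs) / len(outputs)

# Hypothetical classifier outputs for four nouns with binary labels.
labels = [0, 1, 0, 1]
confident = [[0.98, 0.02], [0.03, 0.97], [0.99, 0.01], [0.02, 0.98]]
hesitant = [[0.65, 0.35], [0.40, 0.60], [0.70, 0.30], [0.35, 0.65]]

# Both sets of outputs pick the right class every time,
# but the second spreads its probability mass more evenly.
print(accuracy(confident, labels), mean_entropy(confident))
print(accuracy(hesitant, labels), mean_entropy(hesitant))
```

Both hypothetical classifiers reach the same accuracy, yet the second has a higher mean entropy. This is the kind of pattern the abstract reports for semantic features: correct decisions made with less certainty.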

Original language: English
Title: Agents and Artificial Intelligence - 10th International Conference, ICAART 2018, Revised Selected Papers
Editors: Jaap van den Herik, Ana Paula Rocha
Number of pages: 22
Place of publication: Cham
Publisher: Springer Verlag
Publication date: 2019
Pages: 492-513
ISBN (Print): 9783030054526
DOI
Status: Published - 2019
Published externally: Yes
Event: 10th International Conference on Agents and Artificial Intelligence, ICAART 2018 - Funchal, Madeira, Portugal
Duration: 16 Jan 2018 to 18 Jan 2018

Conference

Conference: 10th International Conference on Agents and Artificial Intelligence, ICAART 2018
Country: Portugal
City: Funchal, Madeira
Period: 16/01/2018 to 18/01/2018
Sponsor: Institute for Systems and Technologies of Information, Control and Communication (INSTICC)
Series name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 11352 LNAI
ISSN: 0302-9743

Bibliographic note

Publisher Copyright:
© Springer Nature Switzerland AG 2019.

ID: 366048642