Linguistic information in word embeddings

Department of Nordic Studies and Linguistics (Institut for Nordiske Studier og Sprogvidenskab, NorS)

Publication: Contribution to book/anthology/report › Conference contribution in proceedings › Research › Peer-reviewed

Ali Basirat
Marc Tang

We study the presence of linguistically motivated information in word embeddings generated with statistical methods. The nominal aspects of uter/neuter, common/proper, and count/mass in Swedish are selected to represent grammatical, semantic, and mixed types of nominal categories within languages, respectively. Our results indicate that typical grammatical and semantic features are easily captured by word embeddings. In our experiments, which are based on a single-layer feed-forward neural network, the classification of semantic features required significantly fewer neurons than that of grammatical features. However, semantic features also produced higher entropy in the classification output despite their high accuracy. Furthermore, the count/mass distinction posed difficulties for the model, even when the number of neurons was tuned almost to its maximum.
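The abstract notes that a classifier can reach high accuracy while still producing high-entropy output. The sketch below illustrates how those two measures can diverge. It assumes that "entropy of the classification output" means the Shannon entropy (in bits) of each predicted class distribution, averaged over test items; the record does not state the paper's exact definition. The function names and the probability values are hypothetical and are not taken from the paper.

```python
import math
from typing import List, Sequence, Tuple


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy in bits of one predicted class distribution (assumed definition)."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def accuracy_and_mean_entropy(
    outputs: List[Sequence[float]], labels: List[int]
) -> Tuple[float, float]:
    """Return (accuracy, mean entropy) for a batch of classifier output distributions."""
    correct = 0
    total_entropy = 0.0
    for probs, gold in zip(outputs, labels):
        # The predicted class is the index with the highest probability.
        pred = max(range(len(probs)), key=lambda i: probs[i])
        correct += int(pred == gold)
        total_entropy += entropy(probs)
    n = len(labels)
    return correct / n, total_entropy / n


# Hypothetical outputs of two binary classifiers (e.g. uter vs. neuter) on the same four nouns.
labels = [0, 1, 0, 1]
confident = [[0.95, 0.05], [0.10, 0.90], [0.90, 0.10], [0.05, 0.95]]
hesitant = [[0.60, 0.40], [0.45, 0.55], [0.55, 0.45], [0.40, 0.60]]

for name, outs in [("confident", confident), ("hesitant", hesitant)]:
    acc, h = accuracy_and_mean_entropy(outs, labels)
    print(f"{name}: accuracy={acc:.2f}, mean entropy={h:.3f} bits")
```

Both hypothetical classifiers label every noun correctly, but the second spreads its probability mass more evenly and therefore has a higher mean entropy. This is the kind of pattern the abstract reports for semantic features.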

Original language	English
Title	Agents and Artificial Intelligence - 10th International Conference, ICAART 2018, Revised Selected Papers
Editors	Jaap van den Herik, Ana Paula Rocha
Number of pages	22
Place of publication	Cham
Publisher	Springer Verlag
Publication date	2019
Pages	492-513
ISBN (Print)	9783030054526
DOI	https://doi.org/10.1007/978-3-030-05453-3_23
Status	Published - 2019
Externally published	Yes
Event	10th International Conference on Agents and Artificial Intelligence, ICAART 2018 - Funchal, Madeira, Portugal. Duration: 16 Jan 2018 → 18 Jan 2018

Conference

Conference	10th International Conference on Agents and Artificial Intelligence, ICAART 2018
Country	Portugal
City	Funchal, Madeira
Period	16/01/2018 → 18/01/2018
Sponsor	Institute for Systems and Technologies of Information, Control and Communication (INSTICC)

Name	Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume	11352 LNAI
ISSN	0302-9743

Bibliographic note

ID: 366048642