Linguistic information in word embeddings

Institut for Nordiske Studier og Sprogvidenskab (NorS)

Research output: Chapter in Book/Report/Conference proceeding › Article in proceedings › Research › peer-review

Standard

Linguistic information in word embeddings. / Basirat, Ali; Tang, Marc.

Agents and Artificial Intelligence - 10th International Conference, ICAART 2018, Revised Selected Papers. ed. / Jaap van den Herik; Ana Paula Rocha. Cham : Springer Verlag, 2019. p. 492-513 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), Vol. 11352 LNAI).

Harvard

Basirat, A & Tang, M 2019, Linguistic information in word embeddings. in J van den Herik & AP Rocha (eds), Agents and Artificial Intelligence - 10th International Conference, ICAART 2018, Revised Selected Papers. Springer Verlag, Cham, Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 11352 LNAI, pp. 492-513, 10th International Conference on Agents and Artificial Intelligence, ICAART 2018, Funchal, Madeira, Portugal, 16/01/2018. https://doi.org/10.1007/978-3-030-05453-3_23

APA

Basirat, A., & Tang, M. (2019). Linguistic information in word embeddings. In J. van den Herik, & A. P. Rocha (Eds.), Agents and Artificial Intelligence - 10th International Conference, ICAART 2018, Revised Selected Papers (pp. 492-513). Springer Verlag. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) Vol. 11352 LNAI https://doi.org/10.1007/978-3-030-05453-3_23

Vancouver

Basirat A, Tang M. Linguistic information in word embeddings. In van den Herik J, Rocha AP, editors, Agents and Artificial Intelligence - 10th International Conference, ICAART 2018, Revised Selected Papers. Cham: Springer Verlag. 2019. p. 492-513. (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), Vol. 11352 LNAI). https://doi.org/10.1007/978-3-030-05453-3_23

Author

Basirat, Ali ; Tang, Marc. / Linguistic information in word embeddings. Agents and Artificial Intelligence - 10th International Conference, ICAART 2018, Revised Selected Papers. editor / Jaap van den Herik ; Ana Paula Rocha. Cham : Springer Verlag, 2019. pp. 492-513 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), Vol. 11352 LNAI).

Bibtex

@inproceedings{d7958b465036498ab81ccc842e364def,

title = "Linguistic information in word embeddings",

abstract = "We study the presence of linguistically motivated information in the word embeddings generated with statistical methods. The nominal aspects of uter/neuter, common/proper, and count/mass in Swedish are selected to represent respectively grammatical, semantic, and mixed types of nominal categories within languages. Our results indicate that typical grammatical and semantic features are easily captured by word embeddings. The classification of semantic features required significantly less neurons than grammatical features in our experiments based on a single layer feed-forward neural network. However, semantic features also generated higher entropy in the classification output despite its high accuracy. Furthermore, the count/mass distinction resulted in difficulties to the model, even though the quantity of neurons was almost tuned to its maximum.",

keywords = "Neural network, Nominal classification, Swedish, Word embedding",

author = "Ali Basirat and Marc Tang",

note = "Publisher Copyright: {\textcopyright} Springer Nature Switzerland AG 2019.; 10th International Conference on Agents and Artificial Intelligence, ICAART 2018 ; Conference date: 16-01-2018 Through 18-01-2018",

year = "2019",

doi = "10.1007/978-3-030-05453-3_23",

language = "English",

isbn = "9783030054526",

series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",

volume = "11352",

publisher = "Springer Verlag",

pages = "492--513",

editor = "{van den Herik}, Jaap and Rocha, {Ana Paula}",

booktitle = "Agents and Artificial Intelligence - 10th International Conference, ICAART 2018, Revised Selected Papers",

address = "Cham",

}

RIS

TY - GEN

T1 - Linguistic information in word embeddings

AU - Basirat, Ali

AU - Tang, Marc

N1 - Publisher Copyright: © Springer Nature Switzerland AG 2019.

PY - 2019

Y1 - 2019

N2 - We study the presence of linguistically motivated information in the word embeddings generated with statistical methods. The nominal aspects of uter/neuter, common/proper, and count/mass in Swedish are selected to represent respectively grammatical, semantic, and mixed types of nominal categories within languages. Our results indicate that typical grammatical and semantic features are easily captured by word embeddings. The classification of semantic features required significantly less neurons than grammatical features in our experiments based on a single layer feed-forward neural network. However, semantic features also generated higher entropy in the classification output despite its high accuracy. Furthermore, the count/mass distinction resulted in difficulties to the model, even though the quantity of neurons was almost tuned to its maximum.

KW - Neural network

KW - Nominal classification

KW - Swedish

KW - Word embedding

UR - http://www.scopus.com/inward/record.url?scp=85059677023&partnerID=8YFLogxK

U2 - 10.1007/978-3-030-05453-3_23

DO - 10.1007/978-3-030-05453-3_23

M3 - Article in proceedings

AN - SCOPUS:85059677023

SN - 9783030054526

T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

SP - 492

EP - 513

BT - Agents and Artificial Intelligence - 10th International Conference, ICAART 2018, Revised Selected Papers

A2 - van den Herik, Jaap

A2 - Rocha, Ana Paula

PB - Springer Verlag

CY - Cham

T2 - 10th International Conference on Agents and Artificial Intelligence, ICAART 2018

Y2 - 16 January 2018 through 18 January 2018

ER -
