Evaluating Word Expansion for Multilingual Sentiment Analysis of Parliamentary Speech

Department of Nordic Studies and Linguistics (NorS)

Research output: Chapter in Book/Report/Conference proceeding › Article in proceedings › Research › peer-review

Nikolova, Yana
Navarretta, Costanza

This paper replicates and evaluates the word expansion (WE) method for sentiment lexicon generation from Rheault et al. (2016), applying it to two novel corpora of parliamentary speech from Denmark and Bulgaria. GloVe embeddings and vector similarity are leveraged to expand synonym seed lists with domain-specific terms from the speech corpora. The resulting Danish and Bulgarian lexica are compared to other multilingual lexica by analyzing a gold standard of speech excerpts annotated for sentiment. WE correlates best with hand-coded annotations for Danish, while a machine-translated Lexicoder dictionary does best for Bulgarian. WE performance is also found to be very sensitive to processing and scoring techniques, though this is also an issue with the other lexica. Overall, automatic lexicon translation best balances computational complexity and accuracy across both languages, but robust language-agnosticism remains elusive. Theoretical and practical problems of WE are discussed.
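The expansion step described in the abstract can be illustrated with a minimal sketch. It assumes a common scoring rule in which each candidate word is scored by its summed cosine similarity to the positive seeds minus its summed cosine similarity to the negative seeds; the paper's exact preprocessing and scoring may differ. The function names (`cosine`, `expand_seeds`) and the two-dimensional toy vectors are hypothetical stand-ins for GloVe embeddings trained on a speech corpus.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def expand_seeds(embeddings, pos_seeds, neg_seeds, top_n=1):
    """Score every non-seed word by similarity to the seed lists.

    Assumed scoring rule (not taken from the paper):
    score = sum of cosines to positive seeds - sum of cosines to negative seeds.
    """
    pos_vecs = [embeddings[w] for w in pos_seeds if w in embeddings]
    neg_vecs = [embeddings[w] for w in neg_seeds if w in embeddings]
    seeds = set(pos_seeds) | set(neg_seeds)

    scores = {}
    for word, vec in embeddings.items():
        if word in seeds:
            continue  # seeds are already in the lexicon
        pos_score = sum(cosine(vec, p) for p in pos_vecs)
        neg_score = sum(cosine(vec, n) for n in neg_vecs)
        scores[word] = pos_score - neg_score

    # Highest scores become positive entries, lowest become negative entries.
    ranked = sorted(scores, key=scores.get, reverse=True)
    return {
        "positive": ranked[:top_n],
        "negative": ranked[-top_n:][::-1],
        "scores": scores,
    }


# Hypothetical toy vectors standing in for corpus-trained embeddings.
toy_embeddings = {
    "good": [1.0, 0.1],
    "bad": [-1.0, 0.1],
    "excellent": [0.9, 0.2],
    "terrible": [-0.9, 0.2],
    "committee": [0.0, 1.0],
}

result = expand_seeds(toy_embeddings, pos_seeds=["good"], neg_seeds=["bad"])
print(result["positive"], result["negative"])
```

In this toy setup a neutral, domain-specific word such as "committee" is equally similar to both seeds and receives a score of zero, which is why a score threshold or a top-n cutoff is needed before a word enters the lexicon.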

Original language	English
Title of host publication	Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
Place of Publication	ACL Anthology
Publisher	European Language Resources Association
Publication date	2024
Pages	6557–6563
Publication status	Published - 2024
