Identifying Parties in Manifestos and Parliament Speeches

Publication: Contribution to book/anthology/report › Conference contribution in proceedings › Research › peer-reviewed


This paper examines differences in word use between two left-wing and two right-wing Danish parties, and how these differences,
which reflect some of the parties' basic stances, can be used to automatically identify a politician's party from their speeches.
In the first study, we analyse the most frequent and most characteristic lemmas in the manifestos of the political parties, as well as
their language complexity. The analysis shows, inter alia, that the most frequently occurring lemmas in the manifestos reflect either
the ideology of the parties or their position on specific subjects, confirming for Danish the findings of earlier studies of English
and German manifestos. Subsequently, we scaled up the analysis by applying NLP methods to the transcribed speeches of members of
the same parties in the Parliament (Hansards), and trained machine learning algorithms to determine to what extent the party of a politician can be predicted from the speeches. The speeches are a subset of the Danish Parliament corpus 2009–2017. The best classification experiments gave a weighted F1-score of 0.57. This is significantly better than both the majority classifier (weighted F1-score = 0.11) and chance. The results show that the party of the politicians can be distinguished from their speeches in nearly 60% of the cases, even when they debate the same subjects and thus often use the same terminology. In future work, we will include the subject of the speeches in the prediction experiments.
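
As an illustration of the evaluation measure named in the abstract, the following minimal sketch computes a weighted F1-score for a majority-class baseline. It is not the authors' code or data: the party labels `A` to `D`, the class sizes, and the helper names `weighted_f1` and `majority_baseline` are hypothetical, and the resulting score applies to this toy example only.

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Average per-class F1, weighting each class by its share of y_true."""
    support = Counter(y_true)
    total = len(y_true)
    score = 0.0
    for label, n in support.items():
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = n - tp
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        score += (n / total) * f1
    return score


def majority_baseline(y_train, n_predictions):
    """Predict the most frequent training label for every item."""
    label = Counter(y_train).most_common(1)[0][0]
    return [label] * n_predictions


# Hypothetical toy labels for four parties (not the corpus from the paper).
y_true = ["A"] * 4 + ["B"] * 3 + ["C"] * 2 + ["D"] * 1
y_pred = majority_baseline(y_true, len(y_true))
baseline = weighted_f1(y_true, y_pred)
print(f"majority baseline weighted F1: {baseline:.3f}")
```

Because the baseline never predicts the minority classes, their F1 is zero and only the majority class contributes, which is why a majority classifier scores low on this measure when classes are reasonably balanced.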
Title: Creating, Using and Linking of Parliamentary Corpora with Other Types of Political Discourse (ParlaCLARIN II): LREC2020 Workshop PARLACLARIN 2
Editors: Darja Fiser, Maria Eskevich, Franciska de Jong
Publisher: European Language Resources Association
ISBN (Print): 9791095546474
ISBN (Electronic): 9791095546474
Status: Published - 2020

ID: 241213825