Identifying Temporal Trends Based on Perplexity and Clustering: Are We Looking at Language Change?

Department of Nordic Studies and Linguistics (NorS)

Research output: Chapter in Book/Report/Conference proceeding › Article in proceedings › Research › peer-review


In this work we propose a data-driven methodology for identifying temporal trends in a corpus of medieval charters. We have used perplexities derived from RNN language models as a distance measure between documents and then performed clustering on those distances. We argue that perplexities calculated by such language models are representative of temporal trends. The clusters produced using the K-Means algorithm give insight into the differences in language across time periods, which are at least partly due to language change. We suggest that the temporal distribution of the individual clusters might provide a more nuanced picture of temporal trends than discrete bins, thus providing better results when used in a classification task.
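
The pipeline described in the abstract (perplexity as a document distance, followed by K-Means) can be illustrated with a minimal sketch. This is not the authors' implementation: the perplexity matrix `ppl` holds made-up placeholder values rather than RNN outputs, symmetrizing by averaging and clustering the rows of the distance matrix are assumptions made here for illustration, and the helper names `perplexity`, `sq_dist`, and `kmeans` are hypothetical.

```python
import math

def perplexity(token_log_probs):
    """Perplexity = exp of the negative mean natural-log probability per token."""
    return math.exp(-sum(token_log_probs) / len(token_log_probs))

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iters=20):
    """Plain Lloyd's algorithm with deterministic farthest-first seeding."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        # Next seed: the point farthest from all centroids chosen so far.
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(far))
    labels = []
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points]
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

# Placeholder values (assumption): ppl[i][j] stands for the perplexity of
# document j under a language model associated with document i.
ppl = [
    [1.0, 1.2, 6.0, 6.5],
    [1.1, 1.0, 5.8, 6.2],
    [6.1, 5.9, 1.0, 1.3],
    [6.4, 6.0, 1.2, 1.0],
]
n = len(ppl)

# Perplexity is not symmetric, so average both directions (assumption).
dist = [[(ppl[i][j] + ppl[j][i]) / 2 for j in range(n)] for i in range(n)]

# Cluster documents by their rows of distances to all other documents.
labels = kmeans(dist, k=2)
print(labels)
```

In a real setting the cluster labels would then be compared against the known dates of the charters to see how each cluster is distributed over time.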

Original language	English
Title of host publication	Proceedings of the 1st International Workshop on Computational Approaches to Historical Language Change
Publisher	Association for Computational Linguistics
Publication date	2019
Pages	86-91
DOIs	https://doi.org/10.18653/v1/W19-4711
Publication status	Published - 2019
Event	Computational Approaches to Historical Language Change 2019: Workshop co-located with ACL 2019, Florence, Italy, 2 Aug 2019, https://languagechange.org/events/2019-acl-lcworkshop/

Workshop

Workshop	Computational Approaches to Historical Language Change 2019
Country	Italy
City	Florence
Period	2 Aug 2019 → 2 Aug 2019
Web address	https://languagechange.org/events/2019-acl-lcworkshop/


ID: 227472498