Improving natural language processing with human data: Eye tracking and other data sources reflecting cognitive text processing
Public Defence of PhD thesis by Maria Barrett.
When humans perform everyday tasks like reading, speaking, and writing, they also cognitively complete many of the tasks that natural language processing strives to have computers replicate. Traces of this cognitive processing can be collected from data sources in which milliseconds matter, such as eye tracking during reading, keystroke logs from typing, and acoustic cues.
This thesis shows that eye tracking and other data sources reflecting human cognitive processing of text hold untapped potential for natural language processing.
This thesis presents several studies in which traces of human text processing are used to improve a broad range of established natural language processing tasks. The tasks span part-of-speech induction, syntactic parsing, sentiment classification, grammatical error detection, and detection of abusive language. The thesis furthermore demonstrates some transfer across related languages by using English eye-tracking recordings to improve French part-of-speech induction.
Technology for recording keystroke logs and prosodic features is already common, and recent advances in low-cost eye-tracking technology promise to make eye-tracking data available in larger quantities, also for low-resource languages. Real-world eye-tracking data poses new challenges compared to laboratory data. One study in this thesis presents first evidence that, despite the noise and idiosyncrasies, real-world reading data recorded with a consumer-grade eye tracker can be modelled with machine learning.
Assessment Committee
- Senior Researcher Patrizia Paggio, chair (University of Copenhagen)
- Professor Jonas Kuhn (Universität Stuttgart)
- Lecturer Aline Villavicencio (University of Essex)
Moderator of the defence
- Deputy Head of Department Bolette Sandford Pedersen (University of Copenhagen)
Copies of the thesis will be available for consultation at the following three places:
- At the Information Desk of the Library of the Faculty of Humanities
- In Reading Room East of the Royal Library (the Black Diamond)
- At the Department of Nordic Studies and Linguistics, Njalsgade 136