The ParlaMint corpora of parliamentary proceedings

Research output: Contribution to journal; Journal article; Research; peer-review

  • Tomaž Erjavec
  • Maciej Ogrodniczuk
  • Petya Osenova
  • Nikola Ljubešić
  • Kiril Simov
  • Andrej Pančur
  • Michał Rudolf
  • Matyáš Kopp
  • Starkaður Barkarson
  • Steinþór Steingrímsson
  • Çağrı Çöltekin
  • Jesse de Does
  • Katrien Depuydt
  • Tommaso Agnoloni
  • Giulia Venturi
  • María Calzada Pérez
  • Luciana D. de Macedo
  • Giancarlo Luxardo
  • Matthew Coole
  • Paul Rayson
  • Vaidas Morkevičius
  • Tomas Krilavičius
  • Roberts Darģis
  • Orsolya Ring
  • Ruben van Heusden
  • Maarten Marx
  • Darja Fišer
This paper presents the ParlaMint corpora, which contain transcriptions of the sessions of 17 European national parliaments and total half a billion words. The corpora are uniformly encoded, contain rich metadata about 11 thousand speakers, and are linguistically annotated according to the Universal Dependencies formalism and with named entities. Samples of the corpora and conversion scripts are available from the project’s GitHub repository, and the complete corpora are openly available via the CLARIN.SI repository for download, as well as through the NoSketch Engine and KonText concordancers and the Parlameter interface for online exploration and analysis.
Original language: English
Journal: Language Resources and Evaluation
Volume: 57
Pages (from-to): 415-448
ISSN: 1574-020X
Publication status: Published - 2023