From raw text to universal dependencies – Look, no tags!

Research output: Chapter in Book/Report/Conference proceeding › Article in proceedings › Research › peer-review

  • Miryam de Lhoneux
  • Yan Shao
  • Ali Basirat
  • Eliyahu Kiperwasser
  • Sara Stymne
  • Yoav Goldberg
  • Joakim Nivre

We present the Uppsala submission to the CoNLL 2017 shared task on parsing from raw text to universal dependencies. Our system is a simple pipeline consisting of two components. The first performs joint word and sentence segmentation on raw text; the second predicts dependency trees from raw words. The parser bypasses the need for part-of-speech tagging, but uses word embeddings based on universal tag distributions. We achieved a macro-averaged LAS F1 of 65.11 in the official test run and obtained the 2nd best result for sentence segmentation with a score of 89.03. After fixing two bugs, we obtained an unofficial LAS F1 of 70.49.
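
For readers unfamiliar with the metric, the macro-averaged LAS F1 quoted above can be illustrated with a minimal sketch. It assumes that the per-treebank counts (correctly attached and labelled words, system words, gold words) have already been obtained by aligning system output with the gold standard; the official evaluation script performs that alignment itself, and it is not reproduced here. The names `Counts`, `las_f1` and `macro_las_f1` and the toy numbers are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import Mapping


@dataclass
class Counts:
    """Per-treebank counts (hypothetical container, not from the paper)."""
    correct: int       # aligned words with correct head and dependency label
    system_words: int  # words produced by the system's segmentation
    gold_words: int    # words in the gold-standard treebank


def las_f1(c: Counts) -> float:
    """Harmonic mean of LAS precision and recall, as a fraction in [0, 1]."""
    if c.system_words == 0 or c.gold_words == 0:
        return 0.0
    precision = c.correct / c.system_words
    recall = c.correct / c.gold_words
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def macro_las_f1(per_treebank: Mapping[str, Counts]) -> float:
    """Unweighted mean of per-treebank LAS F1, reported as a percentage."""
    scores = [las_f1(c) for c in per_treebank.values()]
    if not scores:
        return 0.0
    return 100 * sum(scores) / len(scores)


# Toy, made-up counts for two treebanks (illustration only).
toy = {
    "treebank_a": Counts(correct=80, system_words=100, gold_words=100),
    "treebank_b": Counts(correct=50, system_words=100, gold_words=50),
}
print(round(macro_las_f1(toy), 2))
```

Because precision is computed over system words and recall over gold words, segmentation errors in the first pipeline component lower the parsing score even when the predicted attachments are otherwise correct. The macro average weights every treebank equally, regardless of its size.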

Original language: English
Title of host publication: CoNLL 2017 - SIGNLL Conference on Computational Natural Language Learning, Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies
Number of pages: 11
Publisher: Association for Computational Linguistics (ACL)
Publication date: 2017
Pages: 207-217
Publication status: Published - 2017
Externally published: Yes
Event: 2017 SIGNLL Conference on Computational Natural Language Learning - CoNLL Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies, CoNLL 2017 - Vancouver, Canada
Duration: 3 Aug 2017 to 4 Aug 2017

Conference

Conference: 2017 SIGNLL Conference on Computational Natural Language Learning - CoNLL Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies, CoNLL 2017
Country: Canada
City: Vancouver
Period: 03/08/2017 to 04/08/2017
Sponsor: CRACKER project, DFKI Berlin, et al., Google, Inc., text and form, UFAL

Bibliographical note

Funding Information:
We are grateful to the shared task organizers and to Dan Zeman in particular, and we acknowledge the computational resources provided by CSC in Helsinki and Sigma2 in Oslo through NeIC-NLPL (www.nlpl.eu). Our parser will be made available in the NLPL dependency parsing laboratory.

Publisher Copyright:
© 2017 Association for Computational Linguistics.

ID: 379728064