Nucleus Composition in Transition-based Dependency Parsing

Institut for Nordiske Studier og Sprogvidenskab (NorS)


Research output: Contribution to journal › Journal article › Research › peer-review

Standard

Nucleus Composition in Transition-based Dependency Parsing. / Nivre, Joakim; Basirat, Ali; Dürlich, Luise; Moss, Adam.

In: Computational Linguistics, Vol. 48, No. 4, 12.2022, p. 849-886.


Harvard

Nivre, J, Basirat, A, Dürlich, L & Moss, A 2022, 'Nucleus Composition in Transition-based Dependency Parsing', Computational Linguistics, vol. 48, no. 4, pp. 849-886. https://doi.org/10.1162/coli_a_00450

APA

Nivre, J., Basirat, A., Dürlich, L., & Moss, A. (2022). Nucleus Composition in Transition-based Dependency Parsing. Computational Linguistics, 48(4), 849-886. https://doi.org/10.1162/coli_a_00450

Vancouver

Nivre J, Basirat A, Dürlich L, Moss A. Nucleus Composition in Transition-based Dependency Parsing. Computational Linguistics. 2022 Dec;48(4):849-886. https://doi.org/10.1162/coli_a_00450

Author

Nivre, Joakim ; Basirat, Ali ; Dürlich, Luise ; Moss, Adam. / Nucleus Composition in Transition-based Dependency Parsing. In: Computational Linguistics. 2022 ; Vol. 48, No. 4. pp. 849-886.

Bibtex

@article{951e70380985448fb62d01959c70f6bd,
  title = "Nucleus Composition in Transition-based Dependency Parsing",
  abstract = "Dependency-based approaches to syntactic analysis assume that syntactic structure can be analyzed in terms of binary asymmetric dependency relations holding between elementary syntactic units. Computational models for dependency parsing almost universally assume that an elementary syntactic unit is a word, while the influential theory of Lucien Tesni{\`e}re instead posits a more abstract notion of nucleus, which may be realized as one or more words. In this article, we investigate the effect of enriching computational parsing models with a concept of nucleus inspired by Tesni{\`e}re. We begin by reviewing how the concept of nucleus can be defined in the framework of Universal Dependencies, which has become the de facto standard for training and evaluating supervised dependency parsers, and explaining how composition functions can be used to make neural transition-based dependency parsers aware of the nuclei thus defined. We then perform an extensive experimental study, using data from 20 languages to assess the impact of nucleus composition across languages with different typological characteristics, and utilizing a variety of analytical tools including ablation, linear mixed-effects models, diagnostic classifiers, and dimensionality reduction. The analysis reveals that nucleus composition gives small but consistent improvements in parsing accuracy for most languages, and that the improvement mainly concerns the analysis of main predicates, nominal dependents, clausal dependents, and coordination structures. Significant factors explaining the rate of improvement across languages include entropy in coordination structures and frequency of certain function words, in particular determiners. Analysis using dimensionality reduction and diagnostic classifiers suggests that nucleus composition increases the similarity of vectors representing nuclei of the same syntactic type.",
  author = "Joakim Nivre and Ali Basirat and Luise D{\"u}rlich and Adam Moss",
  note = "Funding Information: We are grateful to Miryam de Lhoneux, Artur Kulmizev, and Sara Stymne for valuable comments and suggestions. We thank the action editor and the three reviewers for constructive comments that helped us improve the final version. The research presented in this article was supported by the Swedish Research Council (grant 2016-01817). Publisher Copyright: {\textcopyright} 2022 Association for Computational Linguistics.",
  year = "2022",
  month = dec,
  doi = "10.1162/coli_a_00450",
  language = "English",
  volume = "48",
  pages = "849--886",
  journal = "Computational Linguistics",
  issn = "1530-9312",
  publisher = "MIT Press",
  number = "4",
}
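The abstract describes two steps: defining nuclei over Universal Dependencies trees, and using composition functions so that a neural transition-based parser's representation of a head reflects the function words in its nucleus. The Python sketch below illustrates both ideas under stated assumptions; it is not the authors' implementation. The set of functional relations, the names `group_nuclei`, `compose`, and `attach`, and the single-layer tanh composition over the concatenated head and dependent vectors are all hypothetical choices made for illustration.

```python
import math
from typing import Dict, List

# Hypothetical choice: UD relations that attach function words to their head.
# Treat this set as an illustrative assumption, not the article's definition.
FUNCTIONAL_RELATIONS = {"aux", "cop", "mark", "det", "clf", "case", "cc"}


def group_nuclei(heads: List[int], rels: List[str]) -> Dict[int, List[int]]:
    """Group 1-based token indices into nuclei keyed by their head token.

    heads[i - 1] is the head of token i (0 = root) and rels[i - 1] its UD
    relation. A token attached by a functional relation joins its head's
    nucleus; every other token starts its own. Chains of function words
    are not resolved in this simplified version.
    """
    nuclei: Dict[int, List[int]] = {}
    for i, (h, r) in enumerate(zip(heads, rels), start=1):
        if r in FUNCTIONAL_RELATIONS and h > 0:
            nuclei.setdefault(h, []).append(i)
        else:
            nuclei.setdefault(i, []).append(i)
    return {k: sorted(v) for k, v in nuclei.items()}


def compose(head: List[float], dep: List[float],
            W: List[List[float]], b: List[float]) -> List[float]:
    """Hypothetical composition function: tanh(W [head; dep] + b).

    W needs len(b) rows and len(head) + len(dep) columns; with
    len(b) == len(head) the output can replace the head vector.
    """
    x = head + dep  # list concatenation = vector concatenation [head; dep]
    return [math.tanh(sum(w * xi for w, xi in zip(row, x)) + bi)
            for row, bi in zip(W, b)]


def attach(vectors: Dict[int, List[float]], head: int, dep: int, rel: str,
           W: List[List[float]], b: List[float]) -> None:
    """Call when a transition adds the arc head -> dep with label rel.

    Only functional arcs trigger composition, so the head's vector comes
    to encode its nucleus rather than the content word alone.
    """
    if rel in FUNCTIONAL_RELATIONS:
        vectors[head] = compose(vectors[head], vectors[dep], W, b)


if __name__ == "__main__":
    # "the dog barked": det(dog, the), nsubj(barked, dog), root(barked)
    print(group_nuclei([2, 3, 0], ["det", "nsubj", "root"]))  # prints {2: [1, 2], 3: [3]}
```

In a trained parser the vectors would come from a contextual encoder and W and b would be learned; this sketch simply overwrites the head's vector on each functional arc, which is only one of several possible design choices.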

RIS

TY  - JOUR
T1  - Nucleus Composition in Transition-based Dependency Parsing
AU  - Nivre, Joakim
AU  - Basirat, Ali
AU  - Dürlich, Luise
AU  - Moss, Adam
N1  - Funding Information: We are grateful to Miryam de Lhoneux, Artur Kulmizev, and Sara Stymne for valuable comments and suggestions. We thank the action editor and the three reviewers for constructive comments that helped us improve the final version. The research presented in this article was supported by the Swedish Research Council (grant 2016-01817). Publisher Copyright: © 2022 Association for Computational Linguistics.
PY  - 2022/12
Y1  - 2022/12
N2  - Dependency-based approaches to syntactic analysis assume that syntactic structure can be analyzed in terms of binary asymmetric dependency relations holding between elementary syntactic units. Computational models for dependency parsing almost universally assume that an elementary syntactic unit is a word, while the influential theory of Lucien Tesnière instead posits a more abstract notion of nucleus, which may be realized as one or more words. In this article, we investigate the effect of enriching computational parsing models with a concept of nucleus inspired by Tesnière. We begin by reviewing how the concept of nucleus can be defined in the framework of Universal Dependencies, which has become the de facto standard for training and evaluating supervised dependency parsers, and explaining how composition functions can be used to make neural transition-based dependency parsers aware of the nuclei thus defined. We then perform an extensive experimental study, using data from 20 languages to assess the impact of nucleus composition across languages with different typological characteristics, and utilizing a variety of analytical tools including ablation, linear mixed-effects models, diagnostic classifiers, and dimensionality reduction. The analysis reveals that nucleus composition gives small but consistent improvements in parsing accuracy for most languages, and that the improvement mainly concerns the analysis of main predicates, nominal dependents, clausal dependents, and coordination structures. Significant factors explaining the rate of improvement across languages include entropy in coordination structures and frequency of certain function words, in particular determiners. Analysis using dimensionality reduction and diagnostic classifiers suggests that nucleus composition increases the similarity of vectors representing nuclei of the same syntactic type.
AB  - Dependency-based approaches to syntactic analysis assume that syntactic structure can be analyzed in terms of binary asymmetric dependency relations holding between elementary syntactic units. Computational models for dependency parsing almost universally assume that an elementary syntactic unit is a word, while the influential theory of Lucien Tesnière instead posits a more abstract notion of nucleus, which may be realized as one or more words. In this article, we investigate the effect of enriching computational parsing models with a concept of nucleus inspired by Tesnière. We begin by reviewing how the concept of nucleus can be defined in the framework of Universal Dependencies, which has become the de facto standard for training and evaluating supervised dependency parsers, and explaining how composition functions can be used to make neural transition-based dependency parsers aware of the nuclei thus defined. We then perform an extensive experimental study, using data from 20 languages to assess the impact of nucleus composition across languages with different typological characteristics, and utilizing a variety of analytical tools including ablation, linear mixed-effects models, diagnostic classifiers, and dimensionality reduction. The analysis reveals that nucleus composition gives small but consistent improvements in parsing accuracy for most languages, and that the improvement mainly concerns the analysis of main predicates, nominal dependents, clausal dependents, and coordination structures. Significant factors explaining the rate of improvement across languages include entropy in coordination structures and frequency of certain function words, in particular determiners. Analysis using dimensionality reduction and diagnostic classifiers suggests that nucleus composition increases the similarity of vectors representing nuclei of the same syntactic type.
UR  - http://www.scopus.com/inward/record.url?scp=85143253082&partnerID=8YFLogxK
U2  - 10.1162/coli_a_00450
DO  - 10.1162/coli_a_00450
M3  - Journal article
AN  - SCOPUS:85143253082
VL  - 48
SP  - 849
EP  - 886
JO  - Computational Linguistics
JF  - Computational Linguistics
SN  - 1530-9312
IS  - 4
ER  - 
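The RIS record consists of two-character tags, each followed by a hyphen and a value, and a repeated tag such as AU carries one author per line. A minimal Python sketch of reading such a record into a tag-to-values mapping follows; the function name `parse_ris` is hypothetical, and the sketch handles only single-line values (no continuation lines). The RIS convention is two spaces before the hyphen, but the pattern here also accepts one, so it tolerates records whose whitespace has been collapsed.

```python
import re
from typing import Dict, List

# Tag line: two-character tag, whitespace, hyphen, optional space + value.
TAG_LINE = re.compile(r"^([A-Z][A-Z0-9])\s+-(?:\s(.*))?$")


def parse_ris(text: str) -> List[Dict[str, List[str]]]:
    """Parse RIS text into records, each mapping a tag to its list of values."""
    records: List[Dict[str, List[str]]] = []
    current: Dict[str, List[str]] = {}
    for line in text.splitlines():
        m = TAG_LINE.match(line.strip())
        if not m:
            continue  # skip blank lines and anything that is not a tag line
        tag, value = m.group(1), (m.group(2) or "").strip()
        if tag == "ER":
            # ER closes the current record
            records.append(current)
            current = {}
        else:
            # Repeated tags (e.g. several AU lines) accumulate in order
            current.setdefault(tag, []).append(value)
    return records


if __name__ == "__main__":
    sample = "TY  - JOUR\nAU  - Nivre, Joakim\nAU  - Basirat, Ali\nSP  - 849\nEP  - 886\nER  - \n"
    rec = parse_ris(sample)[0]
    print(rec["AU"], rec["SP"][0] + "-" + rec["EP"][0])
```

Keeping every value in a list avoids special-casing tags that may repeat, at the cost of indexing `[0]` for tags that normally occur once, such as SP and EP.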
