A bootstrapping method for development of Treebank

Research output: Contribution to journal, Journal article, Research, peer-review

Standard

A bootstrapping method for development of Treebank. / Zarei, F.; Basirat, A.; Faili, H.; Mirain, M.

In: Journal of Experimental and Theoretical Artificial Intelligence, Vol. 29, No. 1, 02.01.2017, p. 19-42.

Harvard

Zarei, F, Basirat, A, Faili, H & Mirain, M 2017, 'A bootstrapping method for development of Treebank', Journal of Experimental and Theoretical Artificial Intelligence, vol. 29, no. 1, pp. 19-42. https://doi.org/10.1080/0952813X.2015.1057239

APA

Zarei, F., Basirat, A., Faili, H., & Mirain, M. (2017). A bootstrapping method for development of Treebank. Journal of Experimental and Theoretical Artificial Intelligence, 29(1), 19-42. https://doi.org/10.1080/0952813X.2015.1057239

Vancouver

Zarei F, Basirat A, Faili H, Mirain M. A bootstrapping method for development of Treebank. Journal of Experimental and Theoretical Artificial Intelligence. 2017 Jan 2;29(1):19-42. https://doi.org/10.1080/0952813X.2015.1057239

Author

Zarei, F. ; Basirat, A. ; Faili, H. ; Mirain, M. / A bootstrapping method for development of Treebank. In: Journal of Experimental and Theoretical Artificial Intelligence. 2017 ; Vol. 29, No. 1. pp. 19-42.

Bibtex

@article{a5eec1657eb045029f68a68907e78747,
title = "A bootstrapping method for development of Treebank",
abstract = "Using statistical approaches alongside the traditional methods of natural language processing (NLP) could significantly improve both the quality and performance of several NLP tasks. The effective use of these approaches depends on the availability of informative, accurate and detailed corpora on which the learners are trained. This article introduces a bootstrapping method for developing annotated corpora based on a complex and rich linguistically motivated elementary structure called a supertag. To this end, a hybrid method for supertagging is proposed that combines the generative and discriminative methods of supertagging. The method was applied to a subset of the Wall Street Journal (WSJ) in order to annotate its sentences with a set of linguistically motivated elementary structures of the English XTAG grammar, which uses a lexicalised tree-adjoining grammar formalism. The empirical results confirm that the bootstrapping method provides a satisfactory way of annotating English sentences with these structures. The experiments show that the method could automatically annotate about 20\% of WSJ with an F-measure of about 80\%, which is 12\% higher than the F-measure of the XTAG Treebank automatically generated by the approach proposed by Basirat and Faili [(2013). Bridge the gap between statistical and hand-crafted grammars. Computer Speech and Language, 27, 1085–1104].",
keywords = "annotated corpus, bootstrapping, parser, semi-supervised, supertagging, Treebank",
author = "F. Zarei and A. Basirat and H. Faili and M. Mirain",
note = "Funding Information: This research was in part supported by a grant from IPM [grant number CS1393-4-42]. Publisher Copyright: {\textcopyright} 2015 Taylor \& Francis.",
year = "2017",
month = jan,
day = "2",
doi = "10.1080/0952813X.2015.1057239",
language = "English",
volume = "29",
pages = "19--42",
journal = "Journal of Experimental and Theoretical Artificial Intelligence",
issn = "0952-813X",
publisher = "Taylor \& Francis",
number = "1",

}

RIS

TY - JOUR

T1 - A bootstrapping method for development of Treebank

AU - Zarei, F.

AU - Basirat, A.

AU - Faili, H.

AU - Mirain, M.

N1 - Funding Information: This research was in part supported by a grant from IPM [grant number CS1393-4-42]. Publisher Copyright: © 2015 Taylor & Francis.

PY - 2017/1/2

Y1 - 2017/1/2

AB - Using statistical approaches alongside the traditional methods of natural language processing (NLP) could significantly improve both the quality and performance of several NLP tasks. The effective use of these approaches depends on the availability of informative, accurate and detailed corpora on which the learners are trained. This article introduces a bootstrapping method for developing annotated corpora based on a complex and rich linguistically motivated elementary structure called a supertag. To this end, a hybrid method for supertagging is proposed that combines the generative and discriminative methods of supertagging. The method was applied to a subset of the Wall Street Journal (WSJ) in order to annotate its sentences with a set of linguistically motivated elementary structures of the English XTAG grammar, which uses a lexicalised tree-adjoining grammar formalism. The empirical results confirm that the bootstrapping method provides a satisfactory way of annotating English sentences with these structures. The experiments show that the method could automatically annotate about 20% of WSJ with an F-measure of about 80%, which is 12% higher than the F-measure of the XTAG Treebank automatically generated by the approach proposed by Basirat and Faili [(2013). Bridge the gap between statistical and hand-crafted grammars. Computer Speech and Language, 27, 1085–1104].

KW - annotated corpus

KW - bootstrapping

KW - parser

KW - semi-supervised

KW - supertagging

KW - Treebank

UR - http://www.scopus.com/inward/record.url?scp=84939191382&partnerID=8YFLogxK

U2 - 10.1080/0952813X.2015.1057239

DO - 10.1080/0952813X.2015.1057239

M3 - Journal article

AN - SCOPUS:84939191382

VL - 29

SP - 19

EP - 42

JO - Journal of Experimental and Theoretical Artificial Intelligence

JF - Journal of Experimental and Theoretical Artificial Intelligence

SN - 0952-813X

IS - 1

ER -
