Bridge the gap between statistical and hand-crafted grammars

Research output: Contribution to journal › Journal article › Research › peer-review

Standard

Bridge the gap between statistical and hand-crafted grammars. / Basirat, Ali; Faili, Heshaam.

In: Computer Speech and Language, Vol. 27, No. 5, 08.2013, p. 1085-1104.


Harvard

Basirat, A & Faili, H 2013, 'Bridge the gap between statistical and hand-crafted grammars', Computer Speech and Language, vol. 27, no. 5, pp. 1085-1104. https://doi.org/10.1016/j.csl.2013.02.001

APA

Basirat, A., & Faili, H. (2013). Bridge the gap between statistical and hand-crafted grammars. Computer Speech and Language, 27(5), 1085-1104. https://doi.org/10.1016/j.csl.2013.02.001

Vancouver

Basirat A, Faili H. Bridge the gap between statistical and hand-crafted grammars. Computer Speech and Language. 2013 Aug;27(5):1085-1104. https://doi.org/10.1016/j.csl.2013.02.001

Author

Basirat, Ali ; Faili, Heshaam. / Bridge the gap between statistical and hand-crafted grammars. In: Computer Speech and Language. 2013 ; Vol. 27, No. 5. pp. 1085-1104.

Bibtex

@article{769e4aa532fb4318a60091e039aa5045,
title = "Bridge the gap between statistical and hand-crafted grammars",
abstract = "LTAG is a rich formalism for performing NLP tasks such as semantic interpretation, parsing, machine translation and information retrieval. Depending on the specific NLP task, different kinds of LTAGs for a language may be developed. Each of these LTAGs is enriched with specific features, such as semantic representations or statistical information, that make it suitable for that task. The distribution of these capabilities among the LTAGs makes it difficult to benefit from all of them in NLP applications. This paper discusses a statistical model that bridges two kinds of LTAGs for a natural language in order to benefit from the capabilities of both. To do so, an HMM was trained that links an elementary tree sequence of a source LTAG to an elementary tree sequence of a target LTAG. Training was performed with the standard HMM training algorithm, Baum-Welch. To lead the training algorithm to a better solution, the initial state of the HMM was also trained by a novel EM-based semi-supervised bootstrapping algorithm. The model was tested on two English LTAGs, XTAG (XTAG-Group, 2001) and MICA's grammar (Bangalore et al., 2009), as the target and source LTAGs, respectively. The empirical results confirm that the model provides a satisfactory way of linking these LTAGs so that they can share their capabilities.",
keywords = "Hidden Markov model, LTAG, MICA, Tree adjoining grammar, XTAG",
author = "Ali Basirat and Heshaam Faili",
year = "2013",
month = aug,
doi = "10.1016/j.csl.2013.02.001",
language = "English",
volume = "27",
pages = "1085--1104",
journal = "Computer Speech and Language",
issn = "0885-2308",
publisher = "Academic Press",
number = "5",
}
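The core technique named in the abstract is Baum-Welch (EM) training of a discrete HMM whose hidden states stand for target-grammar elementary trees and whose observations stand for source-grammar elementary trees. The following is a minimal sketch of that technique, not the authors' code: the toy state and symbol counts, the initial parameters, and the observation sequence are illustrative assumptions only.

```python
import math

def baum_welch(obs, pi, A, B, iters=5):
    """Baum-Welch (EM) on one discrete observation sequence.
    pi: initial state probs, A: transition matrix, B: emission matrix
    (plain lists of lists). Returns updated parameters and the
    log-likelihood before each update, which EM never decreases."""
    N, T = len(pi), len(obs)
    logliks = []
    for _ in range(iters):
        # E step: scaled forward pass
        alpha, scale = [[0.0] * N for _ in range(T)], [0.0] * T
        for i in range(N):
            alpha[0][i] = pi[i] * B[i][obs[0]]
        scale[0] = sum(alpha[0])
        alpha[0] = [a / scale[0] for a in alpha[0]]
        for t in range(1, T):
            for i in range(N):
                alpha[t][i] = B[i][obs[t]] * sum(
                    alpha[t - 1][j] * A[j][i] for j in range(N))
            scale[t] = sum(alpha[t])
            alpha[t] = [a / scale[t] for a in alpha[t]]
        logliks.append(sum(math.log(c) for c in scale))
        # E step: scaled backward pass
        beta = [[1.0] * N for _ in range(T)]
        for t in range(T - 2, -1, -1):
            for i in range(N):
                beta[t][i] = sum(A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j]
                                 for j in range(N)) / scale[t + 1]
        # Posterior state occupancies (gamma) and transitions (xi)
        gamma = []
        for t in range(T):
            row = [alpha[t][i] * beta[t][i] for i in range(N)]
            s = sum(row)
            gamma.append([r / s for r in row])
        xi = [[[0.0] * N for _ in range(N)] for _ in range(T - 1)]
        for t in range(T - 1):
            s = 0.0
            for i in range(N):
                for j in range(N):
                    xi[t][i][j] = (alpha[t][i] * A[i][j]
                                   * B[j][obs[t + 1]] * beta[t + 1][j])
                    s += xi[t][i][j]
            for i in range(N):
                for j in range(N):
                    xi[t][i][j] /= s
        # M step: re-estimate pi, A, B from the posteriors
        pi = gamma[0][:]
        for i in range(N):
            denom = sum(gamma[t][i] for t in range(T - 1))
            for j in range(N):
                A[i][j] = sum(xi[t][i][j] for t in range(T - 1)) / denom
            total = sum(gamma[t][i] for t in range(T))
            for k in range(len(B[i])):
                B[i][k] = sum(gamma[t][i]
                              for t in range(T) if obs[t] == k) / total
    return pi, A, B, logliks

# Toy run: 2 hidden "target" trees, 3 observable "source" tree symbols.
obs = [0, 1, 0, 2, 1, 0, 0, 2]
pi = [0.6, 0.4]
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.5, 0.3, 0.2], [0.2, 0.3, 0.5]]
pi, A, B, logliks = baum_welch(obs, pi, A, B, iters=5)
```

The paper's semi-supervised bootstrapping of the initial state is not shown here; this sketch only illustrates the plain Baum-Welch re-estimation loop that the abstract refers to.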

RIS

TY - JOUR

T1 - Bridge the gap between statistical and hand-crafted grammars

AU - Basirat, Ali

AU - Faili, Heshaam

PY - 2013/8

Y1 - 2013/8

N2 - LTAG is a rich formalism for performing NLP tasks such as semantic interpretation, parsing, machine translation and information retrieval. Depending on the specific NLP task, different kinds of LTAGs for a language may be developed. Each of these LTAGs is enriched with specific features, such as semantic representations or statistical information, that make it suitable for that task. The distribution of these capabilities among the LTAGs makes it difficult to benefit from all of them in NLP applications. This paper discusses a statistical model that bridges two kinds of LTAGs for a natural language in order to benefit from the capabilities of both. To do so, an HMM was trained that links an elementary tree sequence of a source LTAG to an elementary tree sequence of a target LTAG. Training was performed with the standard HMM training algorithm, Baum-Welch. To lead the training algorithm to a better solution, the initial state of the HMM was also trained by a novel EM-based semi-supervised bootstrapping algorithm. The model was tested on two English LTAGs, XTAG (XTAG-Group, 2001) and MICA's grammar (Bangalore et al., 2009), as the target and source LTAGs, respectively. The empirical results confirm that the model provides a satisfactory way of linking these LTAGs so that they can share their capabilities.

AB - LTAG is a rich formalism for performing NLP tasks such as semantic interpretation, parsing, machine translation and information retrieval. Depending on the specific NLP task, different kinds of LTAGs for a language may be developed. Each of these LTAGs is enriched with specific features, such as semantic representations or statistical information, that make it suitable for that task. The distribution of these capabilities among the LTAGs makes it difficult to benefit from all of them in NLP applications. This paper discusses a statistical model that bridges two kinds of LTAGs for a natural language in order to benefit from the capabilities of both. To do so, an HMM was trained that links an elementary tree sequence of a source LTAG to an elementary tree sequence of a target LTAG. Training was performed with the standard HMM training algorithm, Baum-Welch. To lead the training algorithm to a better solution, the initial state of the HMM was also trained by a novel EM-based semi-supervised bootstrapping algorithm. The model was tested on two English LTAGs, XTAG (XTAG-Group, 2001) and MICA's grammar (Bangalore et al., 2009), as the target and source LTAGs, respectively. The empirical results confirm that the model provides a satisfactory way of linking these LTAGs so that they can share their capabilities.

KW - Hidden Markov model

KW - LTAG

KW - MICA

KW - Tree adjoining grammar

KW - XTAG

UR - http://www.scopus.com/inward/record.url?scp=84891902884&partnerID=8YFLogxK

U2 - 10.1016/j.csl.2013.02.001

DO - 10.1016/j.csl.2013.02.001

M3 - Journal article

AN - SCOPUS:84891902884

VL - 27

SP - 1085

EP - 1104

JO - Computer Speech and Language

JF - Computer Speech and Language

SN - 0885-2308

IS - 5

ER -

ID: 366048786