Status | Published |
Title | Toward better Chinese word segmentation for SMT via bilingual constraints |
Author | Zeng X. (2); Chao L.S. (2); Wong D.F. (2); Trancoso I. (1); Tian L. (2) |
Year | 2014 |
Conference Name | the 52nd Annual Meeting of the Association for Computational Linguistics |
Source Publication | Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) |
Volume | 1 |
Pages | 1360–1369 |
Conference Date | 2014 June |
Conference Place | Baltimore, Maryland |
Publisher | Association for Computational Linguistics |
Abstract | This study investigates building a better Chinese word segmentation model for statistical machine translation. It aims at leveraging word boundary information, automatically learned by bilingual character-based alignments, to induce a preferable segmentation model. We propose treating the induced word boundaries as soft constraints to bias the continued learning of a supervised CRF model, trained on the treebank data (labeled), on the bilingual data (unlabeled). The induced word boundary information is encoded as a graph propagation constraint. The constrained model induction is accomplished using the posterior regularization algorithm. The experiments on a Chinese-to-English machine translation task reveal that the proposed model can bring positive segmentation effects to translation quality. © 2014 Association for Computational Linguistics. |
DOI | 10.3115/v1/P14-1128 |
Indexed By | Other |
Language | English |
WOS ID | WOS:000493814100128 |
Scopus ID | 2-s2.0-84906930327 |
Document Type | Conference paper |
Collection | DEPARTMENT OF COMPUTER AND INFORMATION SCIENCE |
Affiliation | 1. Instituto Superior Técnico; 2. Universidade de Macau |
First Author Affiliation | University of Macau |
Recommended Citation GB/T 7714 | Zeng X., Chao L.S., Wong D.F., et al. Toward better Chinese word segmentation for SMT via bilingual constraints[C]: Association for Computational Linguistics, 2014, 1360–1369. |
APA | Zeng X., Chao L.S., Wong D.F., Trancoso I., & Tian L. (2014). Toward better Chinese word segmentation for SMT via bilingual constraints. Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 1, 1360–1369. |