Status: Published
Toward better Chinese word segmentation for SMT via bilingual constraints
Zeng X.²; Chao L.S.²; Wong D.F.²; Trancoso I.¹; Tian L.²
2014
Conference Name: The 52nd Annual Meeting of the Association for Computational Linguistics
Source Publication: Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Volume: 1
Pages: 1360–1369
Conference Date: June 2014
Conference Place: Baltimore, Maryland
Publisher: Association for Computational Linguistics
Abstract

This study investigates how to build a better Chinese word segmentation model for statistical machine translation. It aims to leverage word boundary information, learned automatically from bilingual character-based alignments, to induce a preferable segmentation model. We propose treating the induced word boundaries as soft constraints that bias the continuous learning of a supervised CRF model, trained on the (labeled) treebank data, on the (unlabeled) bilingual data. The induced word boundary information is encoded as a graph propagation constraint. The constrained model induction is carried out with the posterior regularization algorithm. Experiments on a Chinese-to-English machine translation task show that the segmentation produced by the proposed model has a positive effect on translation quality. © 2014 Association for Computational Linguistics.
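
The abstract mentions that induced word boundaries are encoded as a graph propagation constraint. As a rough illustration of the general idea only, the sketch below runs plain iterative label propagation over a tiny similarity graph. It is not the authors' formulation: the B/M/E/S tag set, the function name `propagate`, the vertex names, and the edge weights are all assumptions made for this example, and the posterior regularization step is not shown.

```python
from typing import Dict, List, Tuple

# Assumed tag set for character-level segmentation labels (not taken from the record).
TAGS = ("B", "M", "E", "S")

Edges = Dict[str, List[Tuple[str, float]]]
Dists = Dict[str, List[float]]


def propagate(edges: Edges, seeds: Dists, n_iter: int = 20) -> Dists:
    """Spread tag distributions from seed vertices to their neighbours.

    edges maps a vertex to its weighted neighbours; seeds maps a vertex to a
    fixed distribution over TAGS. Seed vertices are clamped on every iteration.
    """
    uniform = [1.0 / len(TAGS)] * len(TAGS)
    vertices = set(edges) | set(seeds)
    for nbrs in edges.values():
        vertices.update(n for n, _ in nbrs)

    # Unseeded vertices start from a uniform distribution.
    dist = {v: list(seeds.get(v, uniform)) for v in vertices}

    for _ in range(n_iter):
        updated = {}
        for v in vertices:
            nbrs = edges.get(v, [])
            total = sum(w for _, w in nbrs)
            if v in seeds or total == 0:
                # Clamp seeds; leave isolated vertices unchanged.
                updated[v] = list(seeds.get(v, dist[v]))
                continue
            # Weighted average of the neighbours' current distributions.
            updated[v] = [
                sum(w * dist[n][k] for n, w in nbrs) / total
                for k in range(len(TAGS))
            ]
        dist = updated
    return dist


# Hypothetical toy graph: vertex "x" is unlabeled and sits between two seeds.
toy = propagate(
    edges={"x": [("a", 3.0), ("b", 1.0)]},
    seeds={"a": [1.0, 0.0, 0.0, 0.0], "b": [0.0, 0.0, 0.0, 1.0]},
)
```

In this toy graph the unlabeled vertex ends up with a distribution weighted toward the more strongly connected seed. In a constrained-learning setting, such propagated distributions could act as soft targets for a model's posteriors rather than as hard labels.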

DOI: 10.3115/v1/P14-1128
Indexed By: Other
Language: English
WOS ID: WOS:000493814100128
Scopus ID: 2-s2.0-84906930327
Document Type: Conference paper
Collection: DEPARTMENT OF COMPUTER AND INFORMATION SCIENCE
Affiliation: 1. Instituto Superior Técnico
2. Universidade de Macau
First Author Affiliation: University of Macau
Recommended Citation
GB/T 7714: Zeng X., Chao L.S., Wong D.F., et al. Toward better Chinese word segmentation for SMT via bilingual constraints[C]: Association for Computational Linguistics, 2014, 1360–1369.
APA: Zeng X., Chao L.S., Wong D.F., Trancoso I., & Tian L. (2014). Toward better Chinese word segmentation for SMT via bilingual constraints. Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 1, 1360–1369.