Status: Published
Co-regularizing character-based and word-based models for semi-supervised Chinese word segmentation
Zeng X. (2); Wong D.F. (2); Chao L.S. (2); Trancoso I. (1)
2013
Conference Name: The 51st Annual Meeting of the Association for Computational Linguistics
Source Publication: Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)
Volume: 2
Pages: 171-176
Conference Date: August 2013
Conference Place: Sofia, Bulgaria
Abstract

This paper presents a semi-supervised Chinese word segmentation (CWS) approach that co-regularizes character-based and word-based models. As in multi-view learning, the "segmentation agreements" between the two types of view are used to compensate for the lack of label information on unlabeled data. The proposed approach first trains a character-based model and a word-based model on labeled data as the initial models. The two models are then repeatedly updated on unlabeled examples, with the learning objective of maximizing their segmentation agreements. The agreements are treated as a set of valuable constraints that regularize the learning of both models on unlabeled data. The segmentation of an input sentence is decoded with a joint scoring function that combines the two induced models. Evaluation on the Chinese Treebank shows that our model achieves larger gains than the state-of-the-art semi-supervised models reported in the literature. © 2013 Association for Computational Linguistics.
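
The two ideas in the abstract, measuring agreement between a character-based and a word-based segmentation and decoding with a combined score, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the B/M/E/S tag scheme, the Jaccard-style agreement measure, the linear interpolation with `weight`, and all function names are assumptions made for illustration, since this record does not give the paper's actual objective or scoring function.

```python
from typing import Callable, List, Sequence, Set


def boundaries_from_tags(tags: Sequence[str]) -> Set[int]:
    """Word-end offsets implied by per-character B/M/E/S tags (assumed scheme)."""
    return {i + 1 for i, tag in enumerate(tags) if tag in ("E", "S")}


def boundaries_from_words(words: Sequence[str]) -> Set[int]:
    """Word-end offsets implied by a list of words."""
    ends: Set[int] = set()
    position = 0
    for word in words:
        position += len(word)
        ends.add(position)
    return ends


def agreement(tags: Sequence[str], words: Sequence[str]) -> float:
    """Share of word boundaries on which the two views agree (Jaccard overlap).

    Hypothetical measure chosen for illustration only.
    """
    char_view = boundaries_from_tags(tags)
    word_view = boundaries_from_words(words)
    union = char_view | word_view
    return len(char_view & word_view) / len(union) if union else 1.0


def joint_decode(
    candidates: Sequence[List[str]],
    char_score: Callable[[List[str]], float],
    word_score: Callable[[List[str]], float],
    weight: float = 0.5,
) -> List[str]:
    """Return the candidate segmentation with the highest interpolated score.

    `weight` balances the two models; linear interpolation is an assumption.
    """
    return max(
        candidates,
        key=lambda seg: weight * char_score(seg) + (1.0 - weight) * word_score(seg),
    )
```

Under these assumptions, a character view tagged `S B E B E` and a word view that splits the same five characters into words of length 1, 2, and 2 agree on every boundary, while a word view of lengths 3 and 2 shares only two of the three boundaries.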

Indexed By: Other
Language: English
Document Type: Conference paper
Collection: DEPARTMENT OF COMPUTER AND INFORMATION SCIENCE
Affiliation: 1. Instituto Superior Técnico
2. Universidade de Macau
First Author Affiliation: University of Macau
Recommended Citation
GB/T 7714: Zeng X., Wong D.F., Chao L.S., et al. Co-regularizing character-based and word-based models for semi-supervised Chinese word segmentation[C], 2013, 171-176.
APA: Zeng X., Wong D.F., Chao L.S., & Trancoso I. (2013). Co-regularizing character-based and word-based models for semi-supervised Chinese word segmentation. Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), 2, 171-176.