Status: Published
iSentenizer-$\mu$: Multilingual Sentence Boundary Detection Model
Wong, D. F.; Chao, L. S.; Zeng, X.
2014-11-01
Source Publication: The Scientific World Journal
ISSN: 1537-744X
Pages: 1-10 (SCI: Q2, IF: 1.219)
Abstract: Sentence boundary detection (SBD) systems are normally quite sensitive to the genre of the data on which they are trained. Genre here refers to shifts in text topic and to new language domains. Although new detection models can be retrained for different languages or new text genres, the previous model has to be thrown away and the creation process restarted from scratch. In this paper, we present a multilingual sentence boundary detection system (iSentenizer-$\mu$) for Danish, German, English, Spanish, Dutch, French, Italian, Portuguese, Greek, Finnish, and Swedish. The proposed system is able to detect the sentence boundaries of a mixture of different text genres and languages with high accuracy. We employ the i+Learning algorithm, an incremental tree learning architecture, to construct the system. Under the incremental learning framework, iSentenizer-$\mu$ adapts to text of different topics and Roman-alphabet languages by merging new data into the existing model, learning the new knowledge incrementally by revision instead of retraining. The system has been extensively evaluated on different languages and text genres and compared against two state-of-the-art SBD systems, Punkt and MaxEnt. The experimental results show that the proposed system outperforms the other systems on all datasets.
Keyword: Sentence boundary detection; sentenizer; multi-language; sentence tokenization
Language: English
The Source to Article: PB_Publication
PUB ID: 12104
Document Type: Journal article
Collection: DEPARTMENT OF COMPUTER AND INFORMATION SCIENCE
Corresponding Author: Wong, D. F.
Recommended Citation
GB/T 7714
Wong, D. F., Chao, L. S., Zeng, X. iSentenizer-$\mu$: Multilingual Sentence Boundary Detection Model[J]. The Scientific World Journal, 2014, 1-10 (SCI: Q2, IF: 1.219).
APA Wong, D. F., Chao, L. S., & Zeng, X. (2014). iSentenizer-$\mu$: Multilingual Sentence Boundary Detection Model. The Scientific World Journal, 1-10 (SCI: Q2, IF: 1.219).
MLA Wong, D. F., et al. "iSentenizer-$\mu$: Multilingual Sentence Boundary Detection Model." The Scientific World Journal (2014): 1-10 (SCI: Q2, IF: 1.219).
Items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.