Residential College | false |
Status | Published |
An improvement in cross-language document retrieval based on statistical models | |
Wang L.-Y.; Wong D.F.; Chao L.S. | |
2012-12-01 | |
Conference Name | the 24th Conference on Computational Linguistics and Speech Processing (ROCLING 2012) |
Source Publication | Proceedings of the 24th Conference on Computational Linguistics and Speech Processing, ROCLING 2012 |
Pages | 144-155 |
Conference Date | 2012 September |
Conference Place | Chung-Li, Taiwan |
Abstract | This paper presents a method for cross-language document retrieval that integrates three statistical models: a translation model, a query generation model, and a document retrieval model. Given a document in the source language, a statistical machine translation model first translates it into the target language. The query generation model then selects the most relevant words in the translated document as a query. Finally, all documents in the target language are scored by the document retrieval model, which mainly computes the similarity between the query and each document. The method efficiently addresses translation ambiguity and query expansion for disambiguation, both of which are critical problems in Cross-Language Information Retrieval. In addition, the proposed model has been extensively evaluated on two difficult retrieval cases: 1) long texts, which may cause the model to over-generate queries; and 2) texts with similar content under the same topic, which are hard for the retrieval model to distinguish. After comparing different strategies, the experimental results show strong performance, with average precision close to 100%. The method is significant both for cross-language searching on the Internet and for producing parallel corpora for statistical machine translation systems. |
Keyword | Cross-language Document Retrieval; Document Translation-based; Statistical Machine Translation; Tf-idf |
Language | English |
Fulltext Access | |
Document Type | Conference paper |
Collection | DEPARTMENT OF COMPUTER AND INFORMATION SCIENCE |
Affiliation | Universidade de Macau |
First Author Affiliation | University of Macau |
Recommended Citation GB/T 7714 | Wang L.-Y., Wong D.F., Chao L.S. An improvement in cross-language document retrieval based on statistical models[C], 2012, 144-155. |
APA | Wang L.-Y., Wong D.F., & Chao L.S. (2012). An improvement in cross-language document retrieval based on statistical models. Proceedings of the 24th Conference on Computational Linguistics and Speech Processing, ROCLING 2012, 144-155. |
Files in This Item: | There are no files associated with this item. |
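The last two stages described in the abstract (query generation and document scoring) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes the source document has already been machine-translated into the target language, uses a simple smoothed tf-idf weighting with cosine similarity (the record lists Tf-idf only as a keyword, so the exact formula is an assumption), and all function names are hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: whitespace tokenization is enough for this illustration.
    return text.lower().split()


def idf_table(docs_tokens):
    # Smoothed inverse document frequency over the target-language collection.
    n = len(docs_tokens)
    df = Counter()
    for tokens in docs_tokens:
        df.update(set(tokens))
    return {term: math.log((1 + n) / (1 + count)) + 1.0 for term, count in df.items()}


def tfidf(tokens, idf):
    # Terms unseen in the collection get weight 0 and cannot match any document.
    if not tokens:
        return {}
    tf = Counter(tokens)
    return {term: (count / len(tokens)) * idf.get(term, 0.0) for term, count in tf.items()}


def generate_query(translated_tokens, idf, k=5):
    # Query generation: keep the k highest-weighted terms of the translated document.
    weights = tfidf(translated_tokens, idf)
    ranked = sorted(weights.items(), key=lambda item: (-item[1], item[0]))
    return {term: weight for term, weight in ranked[:k] if weight > 0.0}


def cosine(a, b):
    # Cosine similarity between two sparse vectors stored as dicts.
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(translated_text, target_docs, k=5):
    # Document retrieval: score every target document against the generated query.
    docs_tokens = [tokenize(doc) for doc in target_docs]
    idf = idf_table(docs_tokens)
    query = generate_query(tokenize(translated_text), idf, k)
    scores = [(i, cosine(query, tfidf(tokens, idf))) for i, tokens in enumerate(docs_tokens)]
    return sorted(scores, key=lambda pair: (-pair[1], pair[0]))
```

Calling `retrieve(translated_text, target_docs)` returns `(document index, score)` pairs sorted from most to least similar. Limiting the query to `k` terms is one simple way to keep long documents from over-generating queries, which is the first difficulty the abstract mentions.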
Items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.