Residential College | false
Status | Published
Title | Toward Human-Like Evaluation for Natural Language Generation with Error Analysis
Authors | Qingyu Lu; Liang Ding; Liping Xie; Kanjian Zhang; Derek F. Wong; Dacheng Tao
Date Issued | 2023-07
Conference Name | The 61st Annual Meeting of the Association for Computational Linguistics |
Conference Date | 2023-07-08 |
Conference Place | Toronto |
Country | Canada |
Publisher | Association for Computational Linguistics (ACL) |
Abstract | Pretrained language model (PLM) based metrics have been successfully used to evaluate language generation tasks. Recent studies from the human evaluation community show that considering both major errors (e.g., mistranslated tokens) and minor errors (e.g., imperfections in fluency) can produce high-quality judgments. This inspires us to approach the ultimate goal of automatic metrics, human-like evaluation, through fine-grained error analysis. In this paper, we argue that the ability to estimate sentence confidence is only the tip of the iceberg for PLM-based metrics: it can also be used to refine the generated sentence toward higher confidence and closer grounding in the reference, where the cost of refining and the cost of approaching the reference determine the major and minor errors, respectively. To this end, we take BARTScore as the testbed and present an innovative solution that marries the unexploited sentence-refining capacity of BARTScore with human-like error analysis, where the final score consists of the evaluations of both major and minor errors. Experiments show that our solution consistently improves BARTScore, outperforming top-scoring metrics in 19/25 test settings. Analyses demonstrate that our method robustly and efficiently approaches human-like evaluation and enjoys better interpretability. Our code and scripts are publicly released at https://github.com/Coldmist-Lu/ErrorAnalysis_NLGEvaluation. © 2023 Association for Computational Linguistics.
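To make the scoring idea in the abstract concrete, the following is a minimal sketch of a BARTScore-style combination of major and minor error costs. It is not the authors' released implementation (see the GitHub link in the abstract): the pre-supplied `refined` input, the helper names `bart_log_likelihood` and `error_analysis_score`, the equal 0.5/0.5 weights, and the specific cost definitions are all illustrative assumptions.

```python
# Minimal sketch, NOT the paper's released code. Assumes a refined hypothesis
# is already available (in the paper it comes from BARTScore's own refining
# capacity), and combines major/minor costs with illustrative equal weights.
import torch
from transformers import BartForConditionalGeneration, BartTokenizer

tokenizer = BartTokenizer.from_pretrained("facebook/bart-large-cnn")
model = BartForConditionalGeneration.from_pretrained("facebook/bart-large-cnn")
model.eval()

def bart_log_likelihood(src: str, tgt: str) -> float:
    """BARTScore-style score: average token log-likelihood of `tgt` given `src`."""
    src_ids = tokenizer(src, return_tensors="pt", truncation=True).input_ids
    tgt_ids = tokenizer(tgt, return_tensors="pt", truncation=True).input_ids
    with torch.no_grad():
        loss = model(input_ids=src_ids, labels=tgt_ids).loss  # mean NLL
    return -loss.item()

def error_analysis_score(hypothesis: str, refined: str, reference: str,
                         w_major: float = 0.5, w_minor: float = 0.5) -> float:
    """Hypothetical combination of the two costs described in the abstract:
    - major cost: how much refining improves the hypothesis' likelihood,
    - minor cost: how far the refined sentence still is from the reference."""
    major_cost = (bart_log_likelihood(reference, refined)
                  - bart_log_likelihood(reference, hypothesis))
    minor_cost = -bart_log_likelihood(reference, refined)
    return -(w_major * major_cost + w_minor * minor_cost)  # higher = better

# Usage (toy strings): score = error_analysis_score(hyp, refined_hyp, ref)
```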
Indexed By | CPCI-S |
Language | English
WOS Research Area | Computer Science |
WOS Subject | Computer Science, Artificial Intelligence; Computer Science, Information Systems; Computer Science, Theory & Methods
WOS ID | WOS:001181086804049 |
Scopus ID | 2-s2.0-85174389907 |
Document Type | Conference paper |
Collection | DEPARTMENT OF COMPUTER AND INFORMATION SCIENCE |
Corresponding Author | Kanjian Zhang |
Affiliation | 1. School of Automation, Southeast University; 2. JD Explore Academy; 3. University of Macau
Recommended Citation GB/T 7714 | Qingyu Lu, Liang Ding, Liping Xie, et al. Toward Human-Like Evaluation for Natural Language Generation with Error Analysis[C]. Toronto: Association for Computational Linguistics (ACL), 2023.
APA | Lu, Q., Ding, L., Xie, L., Zhang, K., Wong, D. F., & Tao, D. (2023). Toward Human-Like Evaluation for Natural Language Generation with Error Analysis. In The 61st Annual Meeting of the Association for Computational Linguistics. Association for Computational Linguistics (ACL).
Files in This Item: | There are no files associated with this item. |