Status: Published
Toward Human-Like Evaluation for Natural Language Generation with Error Analysis
Qingyu Lu [1]; Liang Ding [2]; Liping Xie [1]; Kanjian Zhang [1]; Derek F. Wong [3]; Dacheng Tao [2]
Date: 2023-07
Conference Name: The 61st Annual Meeting of the Association for Computational Linguistics
Conference Date: 2023-07-08
Conference Place: Toronto
Country: Canada
Publisher: Association for Computational Linguistics (ACL)
Abstract

Pretrained language model (PLM) based metrics have been successfully used to evaluate language generation tasks. Recent studies from the human evaluation community show that considering both major errors (e.g., mistranslated tokens) and minor errors (e.g., imperfections in fluency) produces high-quality judgments. This inspires us to approach the final goal of automatic metrics, human-like evaluation, through fine-grained error analysis. In this paper, we argue that the ability to estimate sentence confidence is only the tip of the iceberg for PLM-based metrics: it can also be used to refine the generated sentence toward higher confidence and closer grounding in the reference, where the costs of refining and of approaching the reference determine the major and minor errors, respectively. To this end, we take BARTScore as the testbed and present an innovative solution that marries the unexploited sentence-refining capacity of BARTScore with human-like error analysis, so that the final score combines the evaluations of both major and minor errors. Experiments show that our solution consistently improves BARTScore, outperforming top-scoring metrics in 19/25 test settings. Analyses demonstrate that our method robustly and efficiently approaches human-like evaluation and enjoys better interpretability. Our code and scripts will be publicly released at https://github.com/Coldmist-Lu/ErrorAnalysis_NLGEvaluation. © 2023 Association for Computational Linguistics.
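
As a rough illustration of the scoring machinery the abstract describes, the sketch below computes vanilla BARTScore (the average per-token log-likelihood of a target sentence under BART) and then combines a "refining cost" (a proxy for major errors) with a "reference-approaching cost" (a proxy for minor errors) into a single score. The facebook/bart-large-cnn checkpoint, the helper names (bart_score, final_score), and the weighted combination are illustrative assumptions, not the paper's exact formulation; the authors' actual implementation is in the repository linked above.

```python
import torch
from transformers import BartForConditionalGeneration, BartTokenizer

# NOTE: the checkpoint is an assumption; the original BARTScore uses
# facebook/bart-large-cnn, but the paper may use a different variant.
tokenizer = BartTokenizer.from_pretrained("facebook/bart-large-cnn")
model = BartForConditionalGeneration.from_pretrained("facebook/bart-large-cnn")
model.eval()

def bart_score(source: str, target: str) -> float:
    """Vanilla BARTScore: mean log-probability of `target` given `source`."""
    with torch.no_grad():
        src = tokenizer(source, return_tensors="pt", truncation=True)
        tgt = tokenizer(target, return_tensors="pt", truncation=True)
        out = model(input_ids=src.input_ids,
                    attention_mask=src.attention_mask,
                    labels=tgt.input_ids)
        logprobs = torch.log_softmax(out.logits, dim=-1)  # (1, tgt_len, vocab)
        token_lp = logprobs.gather(-1, tgt.input_ids.unsqueeze(-1)).squeeze(-1)
        return token_lp.mean().item()

# Hypothetical combination in the spirit of the abstract: the cost of refining
# the hypothesis toward higher confidence proxies MAJOR errors, and the
# remaining gap between the refined sentence and the reference proxies MINOR
# errors. `refined` would come from the paper's refinement procedure, which is
# omitted here; `alpha` is an assumed weighting, not the paper's.
def final_score(hypothesis: str, refined: str, reference: str,
                alpha: float = 0.5) -> float:
    major_cost = bart_score(reference, refined) - bart_score(reference, hypothesis)
    minor_cost = bart_score(reference, reference) - bart_score(reference, refined)
    return -(alpha * major_cost + (1.0 - alpha) * minor_cost)  # higher is better
```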

Indexed By: CPCI-S
Language: English
WOS Research Area: Computer Science
WOS Subject: Computer Science, Artificial Intelligence; Computer Science, Information Systems; Computer Science, Theory & Methods
WOS ID: WOS:001181086804049
Scopus ID: 2-s2.0-85174389907
Document Type: Conference paper
Collection: DEPARTMENT OF COMPUTER AND INFORMATION SCIENCE
Corresponding Author: Kanjian Zhang
Affiliation:
1. School of Automation, Southeast University
2. JD Explore Academy
3. University of Macau
Recommended Citation
GB/T 7714: Qingyu Lu, Liang Ding, Liping Xie, et al. Toward Human-Like Evaluation for Natural Language Generation with Error Analysis[C]. Association for Computational Linguistics (ACL), 2023.
APA: Lu, Q., Ding, L., Xie, L., Zhang, K., Wong, D. F., & Tao, D. (2023). Toward Human-Like Evaluation for Natural Language Generation with Error Analysis. Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics. Association for Computational Linguistics (ACL).
Files in This Item:
There are no files associated with this item.
 

Items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.