Residential College | false
Status | Published
Title | Learning deep transformer models for machine translation
Authors | Wang, Qiang [1]; Li, Bei [1]; Xiao, Tong [1]; Zhu, Jingbo [1,2]; Li, Changliang [3]; Wong, Derek F. [4]; Chao, Lidia S. [4]
Year | 2020
Conference Name | 57th Annual Meeting of the Association for Computational Linguistics, ACL 2019 |
Source Publication | ACL 2019 - 57th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference |
Pages | 1810-1822 |
Conference Date | 28 July-2 August 2019 |
Conference Place | Florence |
Abstract | Transformer is the state-of-the-art model in recent machine translation evaluations. Two strands of research are promising to improve models of this kind: the first uses wide networks (a.k.a. Transformer-Big) and has been the de facto standard for the development of the Transformer system, and the other uses deeper language representation but faces the difficulty arising from learning deep networks. Here, we continue the line of research on the latter. We claim that a truly deep Transformer model can surpass the Transformer-Big counterpart by 1) proper use of layer normalization and 2) a novel way of passing the combination of previous layers to the next. On WMT'16 English-German, NIST OpenMT'12 Chinese-English and larger WMT'18 Chinese-English tasks, our deep system (30/25-layer encoder) outperforms the shallow Transformer-Big/Base baseline (6-layer encoder) by 0.4~2.4 BLEU points. As another bonus, the deep model is 1.6X smaller in size and 3X faster in training than Transformer-Big. |
Indexed By | CPCI-S ; CPCI-SSH |
Language | English
WOS Research Area | Computer Science ; Linguistics |
WOS Subject | Computer Science, Artificial Intelligence ; Computer Science, Interdisciplinary Applications ; Linguistics |
WOS ID | WOS:000493046103030 |
Scopus ID | 2-s2.0-85084061446 |
Document Type | Conference paper |
Collection | DEPARTMENT OF COMPUTER AND INFORMATION SCIENCE |
Corresponding Author | Xiao, Tong |
Affiliation | 1. NLP Lab, Northeastern University, Shenyang, China; 2. NiuTrans Co., Ltd., Shenyang, China; 3. Kingsoft AI Lab, Beijing, China; 4. NLP2CT Lab, University of Macau, Macao
Recommended Citation GB/T 7714 | Wang, Qiang, Li, Bei, Xiao, Tong, et al. Learning deep transformer models for machine translation[C]//ACL 2019 - 57th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference, 2020: 1810-1822.
APA | Wang, Q., Li, B., Xiao, T., Zhu, J., Li, C., Wong, D. F., & Chao, L. S. (2020). Learning deep transformer models for machine translation. ACL 2019 - 57th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference, 1810-1822.
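
The abstract above credits the deep model's gains to two ingredients: pre-norm placement of layer normalization, and feeding each layer a learned linear combination of all previous layers' outputs. The sketch below is a minimal PyTorch illustration of those two ideas only; it is not the authors' released code, and the module names, the stand-in sublayer, and all dimensions are assumptions chosen for illustration.

import torch
import torch.nn as nn

class PreNormLayer(nn.Module):
    """One encoder layer with pre-norm: y = x + Sublayer(LayerNorm(x))."""
    def __init__(self, d_model):
        super().__init__()
        self.norm = nn.LayerNorm(d_model)
        # Stand-in for the real sublayers (self-attention + feed-forward).
        self.sublayer = nn.Linear(d_model, d_model)

    def forward(self, x):
        return x + self.sublayer(self.norm(x))

class DeepEncoder(nn.Module):
    """Stack of pre-norm layers; layer l reads a learned linear combination
    of the outputs of layers 0..l-1 (a DLCL-style connection)."""
    def __init__(self, d_model, num_layers):
        super().__init__()
        self.layers = nn.ModuleList([PreNormLayer(d_model) for _ in range(num_layers)])
        # weights[l] mixes the l+1 previously computed representations.
        self.weights = nn.ParameterList(
            [nn.Parameter(torch.full((l,), 1.0 / l)) for l in range(1, num_layers + 1)]
        )
        self.final_norm = nn.LayerNorm(d_model)

    def forward(self, x):
        outputs = [x]  # y_0: the embedding-layer output
        for layer, w in zip(self.layers, self.weights):
            mixed = sum(wk * yk for wk, yk in zip(w, outputs))
            outputs.append(layer(mixed))
        return self.final_norm(outputs[-1])

# Usage: a 30-layer encoder over dummy 512-dimensional embeddings.
enc = DeepEncoder(d_model=512, num_layers=30)
y = enc(torch.randn(8, 20, 512))  # (batch, length, d_model)
print(y.shape)                    # torch.Size([8, 20, 512])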