Residential College | false
Status | Published
Title | Learning deep transformer models for machine translation
Authors | Wang, Qiang [1]; Li, Bei [1]; Xiao, Tong [1]; Zhu, Jingbo [1,2]; Li, Changliang [3]; Wong, Derek F. [4]; Chao, Lidia S. [4]
Year | 2020
Conference Name | 57th Annual Meeting of the Association for Computational Linguistics, ACL 2019 |
Source Publication | ACL 2019 - 57th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference |
Pages | 1810-1822 |
Conference Date | 28 July-2 August 2019 |
Conference Place | Florence |
Abstract | Transformer is the state-of-the-art model in recent machine translation evaluations. Two strands of research are promising to improve models of this kind: the first uses wide networks (a.k.a. Transformer-Big) and has been the de facto standard for the development of the Transformer system, and the other uses deeper language representation but faces the difficulty arising from learning deep networks. Here, we continue the line of research on the latter. We claim that a truly deep Transformer model can surpass the Transformer-Big counterpart by 1) proper use of layer normalization and 2) a novel way of passing the combination of previous layers to the next. On WMT'16 English-German, NIST OpenMT'12 Chinese-English and larger WMT'18 Chinese-English tasks, our deep system (30/25-layer encoder) outperforms the shallow Transformer-Big/Base baseline (6-layer encoder) by 0.4~2.4 BLEU points. As another bonus, the deep model is 1.6X smaller in size and 3X faster in training than Transformer-Big. |
Indexed By | CPCI-S ; CPCI-SSH |
Language | English
WOS Research Area | Computer Science ; Linguistics |
WOS Subject | Computer Science, Artificial Intelligence ; Computer Science, Interdisciplinary Applications ; Linguistics |
WOS ID | WOS:000493046103030 |
Scopus ID | 2-s2.0-85084061446 |
Document Type | Conference paper |
Collection | DEPARTMENT OF COMPUTER AND INFORMATION SCIENCE |
Corresponding Author | Xiao, Tong |
Affiliation | 1. NLP Lab, Northeastern University, Shenyang, China; 2. NiuTrans Co., Ltd., Shenyang, China; 3. Kingsoft AI Lab, Beijing, China; 4. NLP2CT Lab, University of Macau, Macao
Recommended Citation GB/T 7714 | Wang, Qiang, Li, Bei, Xiao, Tong, et al. Learning deep transformer models for machine translation[C]//ACL 2019 - 57th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference, 2020: 1810-1822.
APA | Wang, Q., Li, B., Xiao, T., Zhu, J., Li, C., Wong, D. F., & Chao, L. S. (2020). Learning deep transformer models for machine translation. ACL 2019 - 57th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference, 1810-1822.
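
The abstract above credits the deep model's gains to two ingredients: pre-norm placement of layer normalization, and feeding each layer a learned linear combination of all previous layers' outputs. The sketch below is a minimal PyTorch illustration of those two ideas only; it is not the authors' released code, and the module names, the stand-in sublayer, and all dimensions are assumptions chosen for illustration.

import torch
import torch.nn as nn

class PreNormLayer(nn.Module):
    """One encoder layer with pre-norm: y = x + Sublayer(LayerNorm(x))."""
    def __init__(self, d_model):
        super().__init__()
        self.norm = nn.LayerNorm(d_model)
        # Stand-in for the real sublayers (self-attention + feed-forward).
        self.sublayer = nn.Linear(d_model, d_model)

    def forward(self, x):
        return x + self.sublayer(self.norm(x))

class DeepEncoder(nn.Module):
    """Stack of pre-norm layers; layer l reads a learned linear combination
    of the outputs of layers 0..l-1 (a DLCL-style connection)."""
    def __init__(self, d_model, num_layers):
        super().__init__()
        self.layers = nn.ModuleList([PreNormLayer(d_model) for _ in range(num_layers)])
        # weights[l] mixes the l+1 previously computed representations.
        self.weights = nn.ParameterList(
            [nn.Parameter(torch.full((l,), 1.0 / l)) for l in range(1, num_layers + 1)]
        )
        self.final_norm = nn.LayerNorm(d_model)

    def forward(self, x):
        outputs = [x]  # y_0: the embedding-layer output
        for layer, w in zip(self.layers, self.weights):
            mixed = sum(wk * yk for wk, yk in zip(w, outputs))
            outputs.append(layer(mixed))
        return self.final_norm(outputs[-1])

# Usage: a 30-layer encoder over dummy 512-dimensional embeddings.
enc = DeepEncoder(d_model=512, num_layers=30)
y = enc(torch.randn(8, 20, 512))  # (batch, length, d_model)
print(y.shape)                    # torch.Size([8, 20, 512])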