Status | Published
Title | Multi-view self-attention networks
Author | Xu, Mingzhou1; Yang, Baosong2; Wong, Derek F.1; Chao, Lidia S.1
Date Issued | 2022-04-06
Source Publication | KNOWLEDGE-BASED SYSTEMS |
ISSN | 0950-7051 |
Volume | 241 |
Abstract | Self-attention networks (SANs) have attracted considerable research attention for their outstanding performance in the machine translation community. Recent studies showed that SANs can be further improved by exploiting different inductive biases, each of which guides SANs to learn a specific view of the input sentence, e.g., short-term dependencies, forward and backward views, as well as phrasal patterns. However, few studies have investigated how these inductive techniques complement one another in improving the capability of SANs, and this remains an interesting open question. In this paper, we select five inductive biases that are simple and not over-parameterized, and investigate their complementarity. We further propose multi-view self-attention networks, which jointly learn different linguistic aspects of the input sentence under a unified framework. Specifically, we propose and exploit a variety of inductive biases to regularize the conventional attention distribution. Different views are then aggregated by a hybrid attention mechanism, which quantifies and leverages the specific views and their associated representations. Experiments on various translation tasks demonstrate that different views progressively improve the performance of SANs, and that the proposed approach outperforms both the strong TRANSFORMER baseline and related models under the TRANSFORMER-BASE and TRANSFORMER-BIG settings. Extensive analyses on 10 linguistic probing tasks verify that different views indeed tend to extract distinct linguistic features, and that our method is highly effective at integrating them.
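The record itself contains no code, so the following is only a minimal PyTorch sketch of the idea the abstract describes: attention computed under several masked "views" of the input sentence (here a global view, forward-only, backward-only, and a local window; the paper uses five inductive biases) and fused per token by a learned soft gate standing in for the hybrid attention aggregation. The class name MultiViewSelfAttention, the choice of views, and the gating scheme are illustrative assumptions, not the authors' implementation.

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiViewSelfAttention(nn.Module):
    """Illustrative sketch (assumed, not the paper's exact formulation):
    self-attention computed under four masks ("views") -- global,
    forward-only, backward-only, and a local window -- then fused per
    token by a learned soft gate over the views."""

    def __init__(self, d_model: int, window: int = 3):
        super().__init__()
        self.window = window
        self.qkv = nn.Linear(d_model, 3 * d_model)
        self.view_gate = nn.Linear(d_model, 4)  # one gate logit per view
        self.out = nn.Linear(d_model, d_model)

    def _view_masks(self, n: int, device: torch.device) -> torch.Tensor:
        # True marks key positions a query may NOT attend to.
        pos = torch.arange(n, device=device)
        dist = pos[None, :] - pos[:, None]            # key_pos - query_pos
        global_v = torch.zeros(n, n, dtype=torch.bool, device=device)
        forward_v = dist > 0                          # self and past only
        backward_v = dist < 0                         # self and future only
        local_v = dist.abs() > self.window            # short-range window
        return torch.stack([global_v, forward_v, backward_v, local_v])

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, n, d = x.shape                             # (batch, seq, d_model)
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        scores = q @ k.transpose(-2, -1) / math.sqrt(d)       # (b, n, n)
        masks = self._view_masks(n, x.device)                 # (4, n, n)
        # Turn each boolean mask into an additive -inf bias and
        # broadcast the shared logits across the four views.
        bias = torch.zeros_like(masks, dtype=scores.dtype)
        bias = bias.masked_fill(masks, float("-inf"))
        attn = F.softmax(scores.unsqueeze(1) + bias, dim=-1)  # (b, 4, n, n)
        views = attn @ v.unsqueeze(1)                         # (b, 4, n, d)
        # Hybrid aggregation stand-in: per-token soft weights over views.
        gate = F.softmax(self.view_gate(x), dim=-1)           # (b, n, 4)
        fused = (gate.transpose(1, 2).unsqueeze(-1) * views).sum(dim=1)
        return self.out(fused)                                # (b, n, d)

if __name__ == "__main__":
    layer = MultiViewSelfAttention(d_model=64)
    print(layer(torch.randn(2, 10, 64)).shape)  # torch.Size([2, 10, 64])
```

Because every view merely re-masks the same attention logits, the overhead beyond vanilla attention is small, and the per-token gate lets the model decide how much each linguistic view contributes to each position's representation.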
Keyword | Self-attention Mechanism; Multi-head Attention Mechanism; Multi-pattern; Linguistics; Machine Translation
DOI | 10.1016/j.knosys.2022.108268 |
Indexed By | SCIE |
Language | English
WOS Research Area | Computer Science |
WOS Subject | Computer Science, Artificial Intelligence |
WOS ID | WOS:000788156700018 |
Scopus ID | 2-s2.0-85124215000 |
Document Type | Journal article |
Collection | DEPARTMENT OF COMPUTER AND INFORMATION SCIENCE |
Corresponding Author | Wong, Derek F. |
Affiliation | 1.NLP2CT Laboratory, University of Macau, Macau, China; 2.Alibaba Group, Hangzhou, China
First Author Affiliation | University of Macau
Corresponding Author Affiliation | University of Macau
Recommended Citation GB/T 7714 | Xu, Mingzhou, Yang, Baosong, Wong, Derek F., et al. Multi-view self-attention networks[J]. KNOWLEDGE-BASED SYSTEMS, 2022, 241.
APA | Xu, Mingzhou, Yang, Baosong, Wong, Derek F., & Chao, Lidia S. (2022). Multi-view self-attention networks. KNOWLEDGE-BASED SYSTEMS, 241.
MLA | Xu, Mingzhou, et al. "Multi-view self-attention networks". KNOWLEDGE-BASED SYSTEMS 241 (2022).