Status | Published
Title | Multi-view self-attention networks
Author | Xu, Mingzhou1; Yang, Baosong2; Wong, Derek F.1; Chao, Lidia S.1
Date Issued | 2022-04-06
Source Publication | KNOWLEDGE-BASED SYSTEMS |
ISSN | 0950-7051 |
Volume | 241 |
Abstract | Self-attention networks (SANs) have attracted considerable research attention for their outstanding performance in the machine translation community. Recent studies showed that SANs can be further improved by exploiting different inductive biases, each of which guides SANs to learn a specific view of the input sentence, e.g., short-term dependencies, forward and backward views, as well as phrasal patterns. However, few studies have investigated how these inductive techniques complement one another in improving the capability of SANs, and this remains an interesting open question. In this paper, we select five inductive biases that are simple and not over-parameterized, and investigate their complementarity. We further propose multi-view self-attention networks, which jointly learn different linguistic aspects of the input sentence under a unified framework. Specifically, we propose and exploit a variety of inductive biases to regularize the conventional attention distribution. Different views are then aggregated by a hybrid attention mechanism, which quantifies and leverages the specific views and their associated representations. Experiments on various translation tasks demonstrate that different views progressively improve the performance of SANs, and that the proposed approach outperforms both the strong TRANSFORMER baseline and related models under the TRANSFORMER-BASE and TRANSFORMER-BIG settings. Extensive analyses on 10 linguistic probing tasks verify that different views indeed tend to extract distinct linguistic features, and that our method is highly effective at integrating them.
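The record itself contains no code, so the following is only a minimal PyTorch sketch of the idea the abstract describes: attention computed under several masked "views" of the input sentence (here a global view, forward-only, backward-only, and a local window; the paper uses five inductive biases) and fused per token by a learned soft gate standing in for the hybrid attention aggregation. The class name MultiViewSelfAttention, the choice of views, and the gating scheme are illustrative assumptions, not the authors' implementation.

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiViewSelfAttention(nn.Module):
    """Illustrative sketch (assumed, not the paper's exact formulation):
    self-attention computed under four masks ("views") -- global,
    forward-only, backward-only, and a local window -- then fused per
    token by a learned soft gate over the views."""

    def __init__(self, d_model: int, window: int = 3):
        super().__init__()
        self.window = window
        self.qkv = nn.Linear(d_model, 3 * d_model)
        self.view_gate = nn.Linear(d_model, 4)  # one gate logit per view
        self.out = nn.Linear(d_model, d_model)

    def _view_masks(self, n: int, device: torch.device) -> torch.Tensor:
        # True marks key positions a query may NOT attend to.
        pos = torch.arange(n, device=device)
        dist = pos[None, :] - pos[:, None]            # key_pos - query_pos
        global_v = torch.zeros(n, n, dtype=torch.bool, device=device)
        forward_v = dist > 0                          # self and past only
        backward_v = dist < 0                         # self and future only
        local_v = dist.abs() > self.window            # short-range window
        return torch.stack([global_v, forward_v, backward_v, local_v])

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, n, d = x.shape                             # (batch, seq, d_model)
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        scores = q @ k.transpose(-2, -1) / math.sqrt(d)       # (b, n, n)
        masks = self._view_masks(n, x.device)                 # (4, n, n)
        # Turn each boolean mask into an additive -inf bias and
        # broadcast the shared logits across the four views.
        bias = torch.zeros_like(masks, dtype=scores.dtype)
        bias = bias.masked_fill(masks, float("-inf"))
        attn = F.softmax(scores.unsqueeze(1) + bias, dim=-1)  # (b, 4, n, n)
        views = attn @ v.unsqueeze(1)                         # (b, 4, n, d)
        # Hybrid aggregation stand-in: per-token soft weights over views.
        gate = F.softmax(self.view_gate(x), dim=-1)           # (b, n, 4)
        fused = (gate.transpose(1, 2).unsqueeze(-1) * views).sum(dim=1)
        return self.out(fused)                                # (b, n, d)

if __name__ == "__main__":
    layer = MultiViewSelfAttention(d_model=64)
    print(layer(torch.randn(2, 10, 64)).shape)  # torch.Size([2, 10, 64])
```

Because every view merely re-masks the same attention logits, the overhead beyond vanilla attention is small, and the per-token gate lets the model decide how much each linguistic view contributes to each position's representation.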
Keyword | Self-attention Mechanism; Multi-head Attention Mechanism; Multi-pattern; Linguistics; Machine Translation
DOI | 10.1016/j.knosys.2022.108268 |
Indexed By | SCIE |
Language | English
WOS Research Area | Computer Science |
WOS Subject | Computer Science, Artificial Intelligence |
WOS ID | WOS:000788156700018 |
Scopus ID | 2-s2.0-85124215000 |
Document Type | Journal article |
Collection | DEPARTMENT OF COMPUTER AND INFORMATION SCIENCE |
Corresponding Author | Wong, Derek F. |
Affiliation | 1.NLP2CT Laboratory, University of Macau, Macau, China; 2.Alibaba Group, Hangzhou, China
First Author Affiliation | University of Macau
Corresponding Author Affiliation | University of Macau
Recommended Citation GB/T 7714 | Xu, Mingzhou, Yang, Baosong, Wong, Derek F., et al. Multi-view self-attention networks[J]. KNOWLEDGE-BASED SYSTEMS, 2022, 241.
APA | Xu, Mingzhou, Yang, Baosong, Wong, Derek F., & Chao, Lidia S. (2022). Multi-view self-attention networks. KNOWLEDGE-BASED SYSTEMS, 241.
MLA | Xu, Mingzhou, et al. "Multi-view self-attention networks". KNOWLEDGE-BASED SYSTEMS 241 (2022).