Status: Published
Temporal Relation Inference Network for Multi-modal Speech Emotion Recognition
Dong, Guan Nan (1); Pun, Chi Man (1); Zhang, Zheng (2)
2022
Source Publication: IEEE Transactions on Circuits and Systems for Video Technology
ISSN: 1051-8215
Volume: 32, Issue: 9, Pages: 6472-6485
Abstract

Speech emotion recognition (SER) is a non-trivial task for humans, while it remains challenging for automatic SER due to the linguistic complexity and contextual distortion. Notably, previous automatic SER systems always regarded multi-modal information and temporal relations of speech as two independent tasks, ignoring their association. We argue that the valid semantic features and temporal relations of speech are both meaningful event relationships. This paper proposes a novel temporal relation inference network (TRIN) to help tackle multi-modal SER, which fully considers the underlying hierarchy of phonetic structure and its associations between various modalities under the sequential temporal guidance. Mainly, we design a temporal reasoning calibration module to imitate real and abundant contextual conditions. Unlike the previous works, which assume all multiple modalities are related, it infers the dependency relationship between the semantic information from the temporal level and learns to handle the multi-modal interaction sequence with a flexible order. To enhance the feature representation, an innovative temporal attentive fusion unit is developed to magnify the details embedded in a single modality from semantic level. Meanwhile, it aggregates the feature representation from both the temporal and semantic levels to maximize the integrity of feature representation by an adaptive feature fusion mechanism to selectively collect the implicit complementary information to strengthen the dependencies between different information subspaces. Extensive experiments conducted on two benchmark datasets demonstrate the superiority of our TRIN method against some state-of-the-art SER methods.

Keywords: Cognition; Correlation; Emotion Recognition; Feature Extraction; Hidden Markov Models; Multi-modal Learning; Relation Inference Network; Speech Emotion Recognition; Speech Recognition; Task Analysis; Temporal Learning
DOI: 10.1109/TCSVT.2022.3163445
Indexed By: SCIE
Language: English
WOS Research Area: Engineering
WOS Subject: Engineering, Electrical & Electronic
WOS ID: WOS:000849300000061
Scopus ID: 2-s2.0-85127503787
Document Type: Journal article
Collection: DEPARTMENT OF COMPUTER AND INFORMATION SCIENCE
Corresponding Author: Pun, Chi Man
Affiliation:
1. Department of Computer and Information Science, University of Macau, Macau 999078, China.
2. Department of Computer and Information Science, University of Macau, Macau 999078, China, and Harbin Institute of Technology, Shenzhen, China.
First Author Affiliation: University of Macau
Corresponding Author Affiliation: University of Macau
Recommended Citation
GB/T 7714
Dong, Guan Nan, Pun, Chi Man, Zhang, Zheng. Temporal Relation Inference Network for Multi-modal Speech Emotion Recognition[J]. IEEE Transactions on Circuits and Systems for Video Technology, 2022, 32(9): 6472-6485.
APA Dong, G. N., Pun, C. M., & Zhang, Z. (2022). Temporal Relation Inference Network for Multi-modal Speech Emotion Recognition. IEEE Transactions on Circuits and Systems for Video Technology, 32(9), 6472-6485.
MLA Dong, Guan Nan, et al. "Temporal Relation Inference Network for Multi-modal Speech Emotion Recognition." IEEE Transactions on Circuits and Systems for Video Technology 32.9 (2022): 6472-6485.