Residential College: false
Status: Published
A nonparametric model for online topic discovery with word embeddings
Junyang Chen1; Zhiguo Gong1; Weiwen Liu2
2019-12-01
Source Publication: Information Sciences
ISSN: 0020-0255
Volume: 504; Pages: 32-47
Abstract

With the explosive growth of short documents generated from streaming textual sources (e.g., Twitter), latent topic discovery has become a critical task for short text stream clustering. However, most online clustering models determine the probability of producing a new topic from a manually set hyper-parameter or threshold, which becomes a barrier to achieving better topic discovery results. Moreover, topics generated by existing models often cover a wide range of the vocabulary, which is not suitable for online social media analysis. Therefore, we propose a nonparametric model (NPMM) which exploits auxiliary word embeddings to infer the topic number and employs a “spike and slab” function to alleviate the sparsity problem of topic-word distributions in online short text analyses. NPMM can automatically decide whether a given document belongs to existing topics, measured by the squared Mahalanobis distance. Hence, the proposed model does not require tuning a hyper-parameter to obtain the probability of generating new topics. Additionally, we propose a nonparametric sampling strategy to discover representative terms for each topic. To perform inference, we introduce a one-pass Gibbs sampling algorithm based on Cholesky decomposition of covariance matrices, which can be further sped up using a Metropolis-Hastings step. Our experiments demonstrate that NPMM significantly outperforms the state-of-the-art algorithms.
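
As a rough illustration of the distance computation the abstract mentions, the following Python sketch evaluates the squared Mahalanobis distance of a document vector from a Gaussian topic through a Cholesky factor of the covariance matrix. It is not the authors' implementation: the function names, the representation of a topic as a (mean, covariance) pair, and in particular the `threshold` argument are hypothetical stand-ins, since the abstract does not state how NPMM derives its decision rule.

```python
import numpy as np


def squared_mahalanobis(x, mean, cov):
    """Squared Mahalanobis distance (x - mean)^T cov^{-1} (x - mean).

    Uses the Cholesky factorisation cov = L @ L.T, so no explicit
    matrix inverse is formed.
    """
    L = np.linalg.cholesky(cov)        # lower-triangular factor
    z = np.linalg.solve(L, x - mean)   # solve L z = (x - mean)
    return float(z @ z)                # ||z||^2 equals the squared distance


def assign_topic(doc_vec, topics, threshold):
    """Return the index of the closest topic, or None for "new topic".

    `topics` is a list of (mean, cov) pairs. `threshold` is a hypothetical
    cut-off used only for this sketch; it is not taken from the paper.
    """
    dists = [squared_mahalanobis(doc_vec, m, c) for m, c in topics]
    best = int(np.argmin(dists))
    return best if dists[best] <= threshold else None


# Two toy 2-D topics (assumed values, for illustration only).
topics = [
    (np.zeros(2), np.eye(2)),
    (np.array([5.0, 5.0]), 2.0 * np.eye(2)),
]
near = assign_topic(np.array([0.5, 0.0]), topics, threshold=6.0)
far = assign_topic(np.array([20.0, -20.0]), topics, threshold=6.0)
```

Here `near` resolves to the first topic, while `far` is `None`, which a caller could treat as a signal to open a new topic. A general solver is used for brevity; a triangular solver such as `scipy.linalg.solve_triangular` would exploit the structure of the Cholesky factor.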

Keywords: Clustering; Data Mining; Nonparametric Model; Online Topic Discovery; Topic Model; Word Embeddings
DOI: 10.1016/j.ins.2019.07.048
Indexed By: SCIE
Language: English
WOS Research Area: Computer Science
WOS Subject: Computer Science, Information Systems
WOS ID: WOS:000483636900003
Publisher: ELSEVIER SCIENCE INC, STE 800, 230 PARK AVE, NEW YORK, NY 10169
Scopus ID: 2-s2.0-85068823373
Document Type: Journal article
Collection: DEPARTMENT OF COMPUTER AND INFORMATION SCIENCE
Corresponding Author: Zhiguo Gong
Affiliation:
1. State Key Laboratory of Internet of Things for Smart City, Department of Computer and Information Science, University of Macau, Macau, China
2. Department of Computer Science and Engineering, The Chinese University of Hong Kong, Hong Kong, Hong Kong
First Author Affiliation: University of Macau
Corresponding Author Affiliation: University of Macau
Recommended Citation
GB/T 7714: Junyang Chen, Zhiguo Gong, Weiwen Liu. A nonparametric model for online topic discovery with word embeddings[J]. Information Sciences, 2019, 504: 32-47.
APA: Chen, J., Gong, Z., & Liu, W. (2019). A nonparametric model for online topic discovery with word embeddings. Information Sciences, 504, 32-47.
MLA: Chen, Junyang, et al. "A nonparametric model for online topic discovery with word embeddings." Information Sciences 504 (2019): 32-47.