Residential College | false |
Status | Published |
A nonparametric model for online topic discovery with word embeddings | |
Junyang Chen¹; Zhiguo Gong¹; Weiwen Liu² | |
2019-12-01 | |
Source Publication | Information Sciences |
ISSN | 0020-0255 |
Volume | 504, Pages: 32-47 |
Abstract | With the explosive growth of short documents generated from streaming textual sources (e.g., Twitter), latent topic discovery has become a critical task for short text stream clustering. However, most online clustering models determine the probability of producing a new topic by manually setting a hyper-parameter or threshold, which becomes a barrier to achieving better topic discovery results. Moreover, topics generated by existing models often cover a wide range of the vocabulary, which is not suitable for online social media analysis. Therefore, we propose a nonparametric model (NPMM) which exploits auxiliary word embeddings to infer the topic number and employs a “spike and slab” function to alleviate the sparsity problem of topic-word distributions in online short text analyses. NPMM can automatically decide whether a given document belongs to existing topics, measured by the squared Mahalanobis distance. Hence, the proposed model does not require tuning a hyper-parameter to obtain the probability of generating new topics. Additionally, we propose a nonparametric sampling strategy to discover representative terms for each topic. To perform inference, we introduce a one-pass Gibbs sampling algorithm based on Cholesky decomposition of covariance matrices, which can be further sped up using a Metropolis-Hastings step. Our experiments demonstrate that NPMM significantly outperforms state-of-the-art algorithms. |
Keyword | Clustering; Data Mining; Nonparametric Model; Online Topic Discovery; Topic Model; Word Embeddings |
DOI | 10.1016/j.ins.2019.07.048 |
Indexed By | SCIE |
Language | English |
WOS Research Area | Computer Science |
WOS Subject | Computer Science, Information Systems |
WOS ID | WOS:000483636900003 |
Publisher | ELSEVIER SCIENCE INC, STE 800, 230 PARK AVE, NEW YORK, NY 10169 |
Scopus ID | 2-s2.0-85068823373 |
Document Type | Journal article |
Collection | DEPARTMENT OF COMPUTER AND INFORMATION SCIENCE |
Corresponding Author | Zhiguo Gong |
Affiliation | 1. State Key Laboratory of Internet of Things for Smart City, Department of Computer Information Science, University of Macau, Macau, China; 2. Department of Computer Science and Engineering, The Chinese University of Hong Kong, Hong Kong, Hong Kong |
First Author Affiliation | University of Macau |
Corresponding Author Affiliation | University of Macau |
Recommended Citation GB/T 7714 | Junyang Chen, Zhiguo Gong, Weiwen Liu. A nonparametric model for online topic discovery with word embeddings[J]. Information Sciences, 2019, 504, 32-47. |
APA | Junyang Chen, Zhiguo Gong, & Weiwen Liu (2019). A nonparametric model for online topic discovery with word embeddings. Information Sciences, 504, 32-47. |
MLA | Junyang Chen, et al. "A nonparametric model for online topic discovery with word embeddings". Information Sciences 504 (2019): 32-47. |