• China Quality Sci-Tech Journal
  • CCF-recommended Class A Chinese journal
  • T1-class high-quality sci-tech journal in computing
Peng Min, Huang Jiajia, Zhu Jiahui, Huang Jimin, Liu Jiping. Mass of Short Texts Clustering and Topic Extraction Based on Frequent Itemsets[J]. Journal of Computer Research and Development, 2015, 52(9): 1941-1953. DOI: 10.7544/issn1000-1239.2015.20140533

Mass of Short Texts Clustering and Topic Extraction Based on Frequent Itemsets

  • Published Date: August 31, 2015
  • Short texts generated in social media are characterized by large volume, high velocity, low quality, and wide variety, which confront vector-space-based clustering methods with high dimensionality, feature sparsity, and noise. In this paper, we propose a short text clustering and topic extraction (STC-TE) framework based on frequent itemsets mined from the texts. The framework first studies the impact of multiple features on short-text quality. Then, a large number of frequent itemsets are mined from the high-quality short-text set by setting a low support threshold, and a similar-itemset filtering strategy is devised to discard most of the unimportant frequent itemsets. Furthermore, based on itemset similarity evaluated through the relevant texts, we propose a cluster self-adaptive spectral clustering (CSA_SC) algorithm that groups the itemsets into topic clusters. Finally, the large-scale short texts are classified into the associated clusters according to the topic words extracted from the frequent-itemset clusters. The framework is tested on a SinaWeibo dataset of one million short texts to evaluate the selection and clustering of important frequent itemsets, topic word extraction, and large-scale short text classification. Experimental results show that the STC-TE framework achieves topic extraction and large-scale short text clustering with high accuracy.
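The abstract describes the pipeline only at a high level. The Python sketch below illustrates its first two steps: mining many frequent itemsets at a low support level, then discarding redundant ones. It uses a plain Apriori-style miner. The filtering rule is hypothetical: two itemsets count as "similar" when the Jaccard similarity of the sets of texts containing them reaches a threshold. The function names, the `threshold` value, the whitespace tokenization, and the Jaccard criterion are all assumptions made for illustration. They are not the paper's STC-TE implementation, and the CSA_SC spectral clustering step is not sketched here.

```python
from itertools import combinations


def tokenize_corpus(texts):
    """Turn each text into a set of words (a 'transaction').

    Assumption: texts are already word-segmented and space-separated;
    Chinese microblogs would first need a word segmenter.
    """
    return [frozenset(t.split()) for t in texts]


def mine_frequent_itemsets(transactions, min_support, max_size=3):
    """Level-wise (Apriori-style) mining.

    Returns {itemset: set of ids of the texts that contain the itemset}.
    min_support is an absolute text count; a low value keeps many itemsets.
    """
    cover = {}
    for tid, words in enumerate(transactions):
        for w in words:
            cover.setdefault(frozenset([w]), set()).add(tid)
    current = {s: ids for s, ids in cover.items() if len(ids) >= min_support}
    result = dict(current)
    k = 1
    while current and k < max_size:
        keys = sorted(current, key=sorted)  # deterministic candidate order
        nxt = {}
        for a, b in combinations(keys, 2):
            cand = a | b
            if len(cand) != k + 1 or cand in nxt:
                continue
            # Apriori pruning: every k-item subset must already be frequent.
            if any(frozenset(sub) not in current for sub in combinations(cand, k)):
                continue
            ids = current[a] & current[b]  # texts containing both a and b
            if len(ids) >= min_support:
                nxt[cand] = ids
        result.update(nxt)
        current = nxt
        k += 1
    return result


def jaccard(x, y):
    union = x | y
    return len(x & y) / len(union) if union else 0.0


def filter_similar(itemsets, threshold=0.8):
    """Hypothetical redundancy filter (not the paper's exact rule).

    Visit larger, better-supported itemsets first and drop any itemset whose
    supporting texts overlap a kept itemset's texts with Jaccard >= threshold.
    """
    order = sorted(itemsets, key=lambda s: (-len(s), -len(itemsets[s]), sorted(s)))
    kept = []
    for s in order:
        if all(jaccard(itemsets[s], itemsets[t]) < threshold for t in kept):
            kept.append(s)
    return kept


if __name__ == "__main__":
    texts = [
        "earthquake sichuan rescue",
        "earthquake sichuan donation",
        "earthquake rescue team",
        "phone launch price",
        "phone launch review",
        "sichuan earthquake rescue",
    ]
    itemsets = mine_frequent_itemsets(tokenize_corpus(texts), min_support=2)
    kept = filter_similar(itemsets, threshold=0.8)
    print(len(itemsets), "frequent itemsets,", len(kept), "kept")
    for s in kept:
        print(sorted(s), sorted(itemsets[s]))
```

On this toy corpus, `{rescue, sichuan}` is frequent but is dropped because it occurs in exactly the same texts as `{earthquake, rescue, sichuan}`. Each surviving itemset keeps its supporting text ids, which is the kind of text-based evidence a later similarity or clustering step could consume.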