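• Outstanding Sci-Tech Journal of China
  • CCF-recommended Class A Chinese-language journal
  • T1-class high-quality sci-tech journal in the computing field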
Liang Jiye, Qiao Jie, Cao Fuyuan, Liu Xiaolin. A Distributed Representation Model for Short Text Analysis[J]. Journal of Computer Research and Development, 2018, 55(8): 1631-1640. DOI: 10.7544/issn1000-1239.2018.20180233

A Distributed Representation Model for Short Text Analysis

More Information
  • Published Date: July 31, 2018
  • Learning distributed representations of short texts has become an important task in text mining. Applying the traditional Paragraph Vector model directly is often unsuitable, however: it does not use corpus-level information during training, so it cannot compensate for the limited contextual information in short texts. We therefore propose BTPV (biterm topic paragraph vector), a distributed representation model for short texts that adds the topic information of BTM (biterm topic model) to the Paragraph Vector model. BTPV thus uses the global information of the corpus and supplements the implicit vectors of Paragraph Vector with the explicit topic information of BTM. For evaluation, we crawl popular news comments from the Internet as experimental data sets and compare the models' representations by clustering them with the K-Means algorithm. The experimental results show that BTPV yields better clustering results than common distributed representation models such as word2vec and Paragraph Vector, which indicates the advantage of the proposed model for short text analysis.
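The evaluation protocol described in the abstract (cluster the document vectors with K-Means, then compare clustering quality across models) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the toy vectors and labels are invented, the farthest-first initialisation is chosen only to keep the run deterministic, and purity is used as the quality score although the abstract does not name the metric the paper reports.

```python
from collections import Counter


def sq_dist(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def kmeans(vectors, k, max_iters=100):
    """Plain Lloyd's K-Means; returns one cluster id per input vector."""
    # Farthest-first initialisation (an illustrative, deterministic choice).
    centroids = [list(vectors[0])]
    while len(centroids) < k:
        farthest = max(vectors, key=lambda v: min(sq_dist(v, c) for c in centroids))
        centroids.append(list(farthest))

    assignments = None
    for _ in range(max_iters):
        # Assignment step: attach each vector to its nearest centroid.
        new_assignments = [
            min(range(k), key=lambda c: sq_dist(v, centroids[c])) for v in vectors
        ]
        if new_assignments == assignments:
            break  # converged: no vector changed cluster
        assignments = new_assignments
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, a in zip(vectors, assignments) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignments


def purity(assignments, labels):
    """Fraction of documents that carry the majority label of their cluster."""
    clusters = {}
    for a, y in zip(assignments, labels):
        clusters.setdefault(a, []).append(y)
    majority_total = sum(Counter(ys).most_common(1)[0][1] for ys in clusters.values())
    return majority_total / len(labels)


# Hypothetical 2-d document vectors for six short texts (made-up data).
toy_vectors = [
    [0.0, 0.1], [0.1, 0.0], [0.05, 0.05],   # texts about one topic
    [1.0, 0.9], [0.9, 1.0], [0.95, 0.95],   # texts about another topic
]
toy_labels = ["sports"] * 3 + ["finance"] * 3

toy_assignments = kmeans(toy_vectors, k=2)
print("purity:", purity(toy_assignments, toy_labels))
```

To compare models in this style, one would run the same two functions on the vectors produced by each representation (for example word2vec averages, Paragraph Vector, and BTPV) and compare the resulting scores on the same labelled texts.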
