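• Outstanding Chinese Science and Technology Journal
  • CCF-recommended Class A Chinese-language journal
  • T1-class high-quality science and technology journal in the computing field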
Tan Wentang, Wang Zhenwen, Yin Fengjing, Ge Bin, and Xiao Weidong. A Partial Comparative Cross Collections LDA Model[J]. Journal of Computer Research and Development, 2013, 50(9): 1943-1953.

A Partial Comparative Cross Collections LDA Model

More Information
  • Published Date: September 14, 2013
  • Comparative text mining (CTM), such as spatiotemporal and cross-cultural text mining, is concerned with extracting common and unique themes from a set of comparable text collections. State-of-the-art cross-collection topic models share an important flaw: they can only analyze the topics common to all document collections. We introduce PCCLDA (partial comparative cross collections LDA), a generative topic model for multi-collection CTM that detects both common topics and collection-specific topics, and that models text more accurately by building on hierarchical Dirichlet processes. We present a Gibbs sampling algorithm for model inference and evaluate the model both qualitatively and quantitatively, including measurements of model perplexity and log-likelihood. PCCLDA discovers both the topics common across collections and the collection-specific topics, and it improves on model perplexity and held-out likelihood compared with two main CTM topic models.
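
For readers unfamiliar with the perplexity measure mentioned in the abstract, the following is a minimal sketch of the standard definition used for topic models: the exponential of the negative held-out log-likelihood per token. The abstract does not give the exact formula used for PCCLDA, so this is an assumption, and the function name `perplexity` and its parameters are hypothetical.

```python
import math


def perplexity(doc_log_likelihoods, doc_lengths):
    """Standard held-out perplexity: exp(-sum_d log p(w_d) / sum_d N_d).

    Assumption: this is the textbook definition, not necessarily the exact
    formula used in the paper. Both argument names are hypothetical.

    doc_log_likelihoods: natural-log likelihood of each held-out document.
    doc_lengths: number of tokens in each held-out document.
    """
    if len(doc_log_likelihoods) != len(doc_lengths):
        raise ValueError("one log-likelihood is required per document")
    total_tokens = sum(doc_lengths)
    if total_tokens <= 0:
        raise ValueError("held-out set must contain at least one token")
    return math.exp(-sum(doc_log_likelihoods) / total_tokens)


if __name__ == "__main__":
    # Sanity check: a model that assigns every token probability 1/4
    # has perplexity equal to the effective vocabulary size, 4.
    uniform_ll = 10 * math.log(0.25)
    print(perplexity([uniform_ll], [10]))
```

Lower perplexity means the model assigns higher probability to unseen text, which is why it is reported alongside held-out likelihood when comparing topic models.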
