    Tan Wentang, Wang Zhenwen, Yin Fengjing, Ge Bin, and Xiao Weidong. A Partial Comparative Cross Collections LDA Model[J]. Journal of Computer Research and Development, 2013, 50(9): 1943-1953.

    A Partial Comparative Cross Collections LDA Model

    • Comparative text mining (CTM), such as spatiotemporal and cross-cultural text mining, is concerned with extracting common and unique themes from a set of comparable text collections. State-of-the-art cross-collection topic models share an important flaw: they can only analyze the topics common to all document collections. We introduce PCCLDA (partial comparative cross collections LDA), a generative topic model for multi-collection CTM that detects both common topics and collection-specific topics, and that models text more accurately by building on hierarchical Dirichlet processes. We present a Gibbs sampling algorithm for model inference and evaluate the model both qualitatively and quantitatively, including model perplexity and log-likelihood measurements. PCCLDA discovers both common topics among collections and collection-specific topics, and it improves model perplexity and held-out likelihood compared with two main CTM topic models.
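
For readers unfamiliar with the perplexity measure mentioned in the abstract, the following is a minimal sketch of the standard definition: the exponential of the negative mean per-token log-likelihood on held-out text. It is not the authors' evaluation code. The function name `perplexity` and the sample log-probabilities are hypothetical and chosen only for illustration.

```python
import math


def perplexity(token_log_probs):
    """Return exp(-mean log-likelihood per token).

    token_log_probs: natural-log probabilities that some topic model
    assigns to each held-out token (hypothetical input; not produced
    by PCCLDA here).
    """
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    mean_log_likelihood = sum(token_log_probs) / len(token_log_probs)
    return math.exp(-mean_log_likelihood)


# Illustrative data only: eight held-out tokens, each given probability 0.25.
uniform_logs = [math.log(0.25)] * 8
print(perplexity(uniform_logs))  # approximately 4
```

Lower perplexity means the model assigns higher probability to unseen text. A model that spreads probability uniformly over four outcomes has a perplexity of about 4, as in the illustrative call above.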