Tan Wentang, Wang Zhenwen, Yin Fengjing, Ge Bin, and Xiao Weidong. A Partial Comparative Cross Collections LDA Model[J]. Journal of Computer Research and Development, 2013, 50(9): 1943-1953.
Comparative text mining (CTM), such as spatiotemporal and cross-cultural text mining, is concerned with extracting common and unique themes from a set of comparable text collections. State-of-the-art cross-collection topic models share an important flaw: they can only analyze the topics that are common among document collections. We introduce PCCLDA (partial comparative cross collections LDA), a generative topic model for multi-collection CTM that detects both common topics and collection-specific topics, and that models text more accurately by building on hierarchical Dirichlet processes. We present a Gibbs sampling algorithm for model inference and evaluate the model through a variety of qualitative and quantitative evaluations, including model perplexity and log-likelihood measurements. PCCLDA discovers both common topics among collections and collection-specific topics, and it also improves on model perplexity and held-out likelihood compared with two main CTM topic models.
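For readers unfamiliar with the perplexity measure mentioned above, the following is a minimal sketch of the standard definition used for topic models: the exponential of the negative held-out log-likelihood per token. The abstract does not state the exact evaluation procedure used for PCCLDA, so the function name `perplexity` and its inputs are illustrative assumptions, not the authors' code.

```python
import math


def perplexity(doc_log_likelihoods, doc_lengths):
    """Corpus perplexity from held-out per-document log-likelihoods.

    Hypothetical helper for illustration only.
    doc_log_likelihoods: natural-log likelihood of each held-out document
    doc_lengths: number of tokens in each held-out document
    """
    if len(doc_log_likelihoods) != len(doc_lengths):
        raise ValueError("need one log-likelihood per document")
    total_tokens = sum(doc_lengths)
    if total_tokens <= 0:
        raise ValueError("held-out set must contain at least one token")
    # Standard definition: exp(-(sum of log-likelihoods) / (total tokens))
    return math.exp(-sum(doc_log_likelihoods) / total_tokens)


# Sanity check: a model that assigns probability 0.1 to every token
# (uniform over a 10-word vocabulary) on two documents of 5 and 3 tokens.
uniform_ll = [5 * math.log(0.1), 3 * math.log(0.1)]
print(perplexity(uniform_ll, [5, 3]))
```

In this sanity check the result is approximately 10, the vocabulary size, which is what a uniform model should give. Lower perplexity indicates that a model predicts held-out text better, and that is the sense in which an improvement on perplexity is reported.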