Text Segmentation Based on PLSA Model

Shi Jing and Dai Guozhong. Text Segmentation Based on PLSA Model[J]. Journal of Computer Research and Development, 2007, 44(2): 242-248.

Abstract

Text segmentation is important in many fields, including information retrieval, summarization, language modeling and anaphora resolution. The PLSA-based segmentation method associates latent topics with observed (word, sentence) pairs. In the experiments, whole Chinese sentences are taken as the elementary blocks. A variety of similarity metrics and several approaches to discovering segment boundaries are tried, and the influence of unknown words repeated in adjacent sentences on the similarity values is taken into account. The best result has an error rate of 6.06%, far lower than those of other text segmentation algorithms.
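
The general idea can be illustrated with a minimal sketch. It assumes that sentences have already been split into word tokens (Chinese text would first need word segmentation). It fits PLSA in its asymmetric form P(w|d) = sum_z P(w|z) P(z|d) by EM, treating each sentence as a document d. It then compares adjacent sentences by the cosine similarity of their topic vectors P(z|d) and cuts at the k least similar gaps. Cosine similarity and the "k lowest gaps" rule are illustrative stand-ins: the paper compares several similarity metrics and boundary-finding approaches, and this sketch does not model its handling of repeated unknown words. The names plsa, cosine and segment are hypothetical.

```python
# Illustrative sketch only, not the paper's implementation.
# Assumptions: pre-tokenized sentences, cosine similarity, k-lowest-gap cuts.
import math
import random
from collections import Counter


def plsa(docs, n_topics, n_iter=100, seed=0):
    """Fit PLSA by EM and return P(z|d) for every document (here: sentence)."""
    rng = random.Random(seed)
    vocab = sorted({w for d in docs for w in d})
    counts = [Counter(d) for d in docs]  # n(d, w)

    def normalized(xs):
        total = sum(xs)
        return [x / total for x in xs] if total > 0 else [1.0 / len(xs)] * len(xs)

    # Random initialisation breaks the symmetry between topics.
    p_z_d = [normalized([rng.random() for _ in range(n_topics)]) for _ in docs]
    p_w_z = [dict(zip(vocab, normalized([rng.random() for _ in vocab])))
             for _ in range(n_topics)]

    for _ in range(n_iter):
        # E-step: expected counts n(d,w) * P(z|d,w), with P(z|d,w) ∝ P(w|z) P(z|d).
        word_topic = [dict.fromkeys(vocab, 0.0) for _ in range(n_topics)]
        doc_topic = [[0.0] * n_topics for _ in docs]
        for d, cnt in enumerate(counts):
            for w, n in cnt.items():
                joint = [p_w_z[z][w] * p_z_d[d][z] for z in range(n_topics)]
                denom = sum(joint)
                if denom == 0.0:
                    continue  # guard against numerical underflow
                for z in range(n_topics):
                    share = n * joint[z] / denom
                    word_topic[z][w] += share
                    doc_topic[d][z] += share
        # M-step: renormalise expected counts into P(w|z) and P(z|d).
        p_w_z = [dict(zip(vocab, normalized([word_topic[z][w] for w in vocab])))
                 for z in range(n_topics)]
        p_z_d = [normalized(row) for row in doc_topic]
    return p_z_d


def cosine(u, v):
    """Cosine similarity of two vectors (0.0 if either has zero norm)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def segment(sentences, n_topics, n_boundaries, seed=0):
    """Return the indices of sentences that start a new segment."""
    theta = plsa(sentences, n_topics, seed=seed)
    # Similarity across each gap between sentence i and sentence i + 1.
    gaps = [cosine(theta[i], theta[i + 1]) for i in range(len(sentences) - 1)]
    # Hypothetical boundary rule: cut at the n_boundaries least similar gaps.
    lowest = sorted(range(len(gaps)), key=lambda i: gaps[i])[:n_boundaries]
    return sorted(i + 1 for i in lowest)


if __name__ == "__main__":
    sentences = [
        ["cat", "dog", "pet"], ["dog", "cat", "pet", "pet"], ["pet", "cat", "dog"],
        ["stock", "market", "price"], ["market", "price", "stock"],
        ["price", "stock", "market", "market"],
    ]
    print(segment(sentences, n_topics=2, n_boundaries=1))
```

Comparing topic vectors rather than raw word vectors lets two sentences count as similar even when they share few surface words but draw on the same latent topic. This is the motivation for introducing PLSA here. The random initialisation means results can depend on the seed, so in practice one might average over several runs.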