Tag-TextRank: A Webpage Keyword Extraction Method Based on Tags

Li Peng, Wang Bin, Shi Zhiwei, Cui Yachao, and Li Hengxun

Li Peng, Wang Bin, Shi Zhiwei, Cui Yachao, and Li Hengxun. Tag-TextRank: A Webpage Keyword Extraction Method Based on Tags[J]. Journal of Computer Research and Development, 2012, 49(11): 2344-2351.

Abstract

Keyword extraction identifies representative keywords in a text and is widely used in text processing applications. In this paper, we explore the use of tags to improve webpage keyword extraction. Specifically, we first analyze the characteristics of bookmarking behavior and find that people usually use the same tags to label multiple topic-related webpages: over 90% of labeled webpages can be linked to relevant webpages through their tag information. Based on this finding, we propose a method called Tag-TextRank. As an extension of the classic keyword extraction method TextRank, Tag-TextRank calculates term importance on a weighted term graph, where the edge weight for a term pair is estimated from statistics of the relevant documents introduced by a given tag of the target webpage. The final importance score of a term is a combination of these tag-dependent importance scores. By drawing on more documents to measure term relations, Tag-TextRank can better estimate term importance. Experimental results on a publicly available corpus show that Tag-TextRank outperforms TextRank on various metrics.
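The pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the edge-weight statistic, the damping factor, or how the per-tag scores are combined. The sketch assumes sliding-window co-occurrence counts as the statistic, a damping factor of 0.85, pooling the target page with each tag's documents, and a weighted average (uniform by default) as the combination. All function and parameter names (`cooccurrence_weights`, `weighted_textrank`, `tag_textrank`, `docs_by_tag`, `tag_weights`) are hypothetical.

```python
from collections import defaultdict


def cooccurrence_weights(docs, window=2):
    """Count co-occurrences of distinct terms within `window` following tokens.

    Assumption: co-occurrence counts stand in for the unspecified
    "statistics of the relevant documents".
    """
    weights = defaultdict(float)
    for tokens in docs:
        for i, a in enumerate(tokens):
            for b in tokens[i + 1 : i + 1 + window]:
                if a != b:
                    weights[frozenset((a, b))] += 1.0
    return weights


def weighted_textrank(terms, weights, damping=0.85, iterations=50):
    """Run weighted PageRank over an undirected term graph."""
    neighbors = {t: {} for t in terms}
    for pair, w in weights.items():
        a, b = tuple(pair)
        # Keep only edges between terms of the target page.
        if a in neighbors and b in neighbors:
            neighbors[a][b] = w
            neighbors[b][a] = w
    strength = {t: sum(neighbors[t].values()) for t in terms}
    scores = {t: 1.0 for t in terms}
    for _ in range(iterations):
        scores = {
            t: (1 - damping)
            + damping * sum(w / strength[u] * scores[u] for u, w in neighbors[t].items())
            for t in terms
        }
    return scores


def tag_textrank(target_tokens, docs_by_tag, tag_weights=None, window=2):
    """Combine per-tag TextRank scores for the terms of one target page.

    `docs_by_tag` maps each tag of the target page to the tokenized
    documents that share that tag. Assumptions: the target page is pooled
    with each tag's documents, and scores are combined by weighted average.
    """
    if not docs_by_tag:
        raise ValueError("the target page needs at least one tag")
    terms = set(target_tokens)
    if tag_weights is None:
        tag_weights = {tag: 1.0 / len(docs_by_tag) for tag in docs_by_tag}
    final = {t: 0.0 for t in terms}
    for tag, docs in docs_by_tag.items():
        weights = cooccurrence_weights([target_tokens] + docs, window)
        scores = weighted_textrank(terms, weights)
        for t in terms:
            final[t] += tag_weights[tag] * scores[t]
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(final.items(), key=lambda kv: (-kv[1], kv[0]))


if __name__ == "__main__":
    page = ["graph", "rank", "keyword", "graph"]
    related = {"nlp": [["keyword", "graph", "rank"]], "ml": [["graph", "rank"]]}
    for term, score in tag_textrank(page, related):
        print(term, round(score, 3))
```

Each tag yields its own term graph, so a term pair that co-occurs often in documents sharing a tag gets a heavier edge than the target page alone would give it. Passing non-uniform `tag_weights` is one way to favor some tags over others, though the abstract does not say whether the original method does so.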
