高级检索

    一种改进的自适应文本信息过滤模型

    An Improved Model for Adaptive Text Information Filtering

    • 摘要: 自适应信息过滤技术能够帮助用户从Web等信息海洋中获得感兴趣的内容或过滤无关垃圾信息.针对现有自适应过滤系统的不足,提出了一种改进的自适应文本信息过滤模型.模型中提供了两种相关性检索机制,在此基础上改进了反馈算法,并采用了增量训练的思想,对过滤中的自适应学习机制也提出了新的算法.基于本模型的系统在相关领域的国际评测中取得良好成绩.试验数据说明各项改进是有效的,新模型具有更高的性能.

       

      Abstract: The information filtering technology is usually used to track favorite topics and eliminate garbage content from information stream. The adaptive information filtering, which requires little initial training resource and can actively improve itself in filtering process, provides a better performance and convenience than the old way. But there are still some difficulties in training and adaptive learning. In this paper, an improved filtering model for adaptive text filtering is proposed. In this model, two retrieval/feedback mechanisms are used respectively. One is based on vector space model and Rocchio feedback algorithm, and another mechanism is derived from a latest language model IR system. Based on them, an incremental learning method using multi-step pseudo feedback is introduced in profile training to keep a minimal bias to the original topic, and an adaptive profile adjusting mechanism in filtering process, which newly takes into account the document distribution and the decay rate of the topic feature, is also developed. The running system constructed using the new model got a high evaluation score in related international contest, indicating that the improvements in the filtering model are effective.

       

    /

    返回文章
    返回