    Personalized Representation and Rank Algorithm for Enterprise Search Engines

    • Abstract: This paper proposes a personalized representation method based on a local document set, together with a result ranking algorithm, for enterprise search engines, to help users find the results they are actually interested in. First, the documents a user has browsed are grouped by cluster analysis. Second, fuzzy inference is applied to the resulting clusters to estimate how strongly the user prefers each one. Third, each cluster is allocated a number of sample documents according to its estimated preference degree. Finally, several sampling techniques are used to draw typical documents from each cluster. The typical documents drawn from the different clusters together form the local document set that represents the user's personal profile. The ranking algorithm then re-ranks the results returned by a general enterprise search engine by computing the similarity between each result and each document in the local document set, so that the ordering reflects the user's preferences. Experimental results show that, compared with traditional keyword-based personalized representation and result ranking algorithms, the approach based on a local document set returns results that better reflect the user's preferences and responds more quickly when those preferences change.
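The allocation and re-ranking steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the clustering method, the fuzzy rules, the sampling techniques, or the similarity measure. The sketch therefore assumes that per-cluster preference scores are already available, splits the sample budget proportionally (largest-remainder rounding), and scores each search result by its mean cosine similarity over term-frequency vectors. The names `allocate_samples`, `rerank`, `tf_vector`, and `cosine` are hypothetical.

```python
import math
from collections import Counter


def tf_vector(text):
    """Bag-of-words term-frequency vector (assumed: whitespace tokenization)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def allocate_samples(preferences, total):
    """Split `total` sample slots across clusters in proportion to preference.

    `preferences` maps a cluster id to a non-negative score; in the paper this
    would come from fuzzy inference, here it is simply an input (assumption).
    """
    weight_sum = sum(preferences.values())
    if weight_sum <= 0:
        return {c: 0 for c in preferences}
    exact = {c: total * p / weight_sum for c, p in preferences.items()}
    quota = {c: int(x) for c, x in exact.items()}  # floor of each share
    leftover = total - sum(quota.values())
    # Hand the remaining slots to the clusters with the largest remainders.
    by_remainder = sorted(exact, key=lambda c: exact[c] - quota[c], reverse=True)
    for c in by_remainder[:leftover]:
        quota[c] += 1
    return quota


def rerank(results, local_docs):
    """Order results by mean similarity to the local document set.

    Averaging over the local documents is an assumption; the abstract only
    says similarity is computed against each local document.
    """
    local_vecs = [tf_vector(d) for d in local_docs]

    def score(doc):
        if not local_vecs:
            return 0.0
        vec = tf_vector(doc)
        return sum(cosine(vec, lv) for lv in local_vecs) / len(local_vecs)

    return sorted(results, key=score, reverse=True)


if __name__ == "__main__":
    print(allocate_samples({"sales": 2, "hr": 1}, 4))
    local = ["quarterly sales report", "sales forecast spreadsheet"]
    hits = ["cafeteria lunch menu", "sales report for the third quarter"]
    print(rerank(hits, local))
```

Because the profile is just a set of sampled documents, refreshing it after new browsing activity only requires re-running the allocation and sampling; no keyword weights need to be re-learned. This may be one reason a document-based profile can follow preference changes quickly, though the abstract does not state the mechanism.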

       
