
    Detecting Spammers with a Bidirectional Vote Algorithm Based on Statistical Features in Microblogs

    • Abstract: Existing work on spammer detection in microblogs relies mainly on users' explicit statistical features, such as the interval between tweets, the ratio of mentions in tweets, and the ratio of URLs in tweets. To exploit the directed nature of the follow network, this paper presents DirTriangleC, an algorithm that counts local triangles in a directed network, and combines each user's tweet count with the local triangle ratio to detect implicit spammers. To address the false positives and false negatives of purely feature-based methods, it also proposes AttriBiVote, a bidirectional vote algorithm that classifies a user by combining the bidirectional propagation of trust with the statistical features of the user's neighbors. Both algorithms are evaluated on a real Twitter dataset containing about 0.26 million users and 10 million tweets. DirTriangleC discovers about 83.7% of the implicit spammers in a "completely inactive" state (dead accounts) and yields about three times as many suspected spammers as methods based on explicit features. AttriBiVote finds more spammers than methods based on statistical features alone, with higher precision than methods using the interval of tweets, the ratio of mentions in tweets, or the ratio of URLs in tweets. Finally, the time cost of AttriBiVote is analyzed.
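    The abstract does not spell out how DirTriangleC counts triangles, so the following is only a minimal sketch of local triangle counting on a follow network, not the paper's algorithm. It assumes that edge direction is ignored when testing whether two neighbors are connected, and that the "local triangle ratio" is the number of triangles through a user divided by the number of neighbor pairs. The function name `local_triangle_stats` and the `(follower, followee)` edge format are hypothetical.

```python
from itertools import combinations


def local_triangle_stats(follow_edges):
    """Return {user: (triangle_count, triangle_ratio)} for a follow network.

    Assumptions (not taken from the paper): direction is collapsed when
    looking for triangles, and the ratio is triangles / neighbor pairs.
    """
    # neighbors[u] holds every user that u follows or is followed by.
    neighbors = {}
    for follower, followee in follow_edges:
        if follower == followee:
            continue  # a self-follow cannot be part of a triangle
        neighbors.setdefault(follower, set()).add(followee)
        neighbors.setdefault(followee, set()).add(follower)

    stats = {}
    for user, nbrs in neighbors.items():
        # A triangle through `user` is a pair of its neighbors that are
        # themselves linked in at least one direction.
        triangles = sum(1 for a, b in combinations(nbrs, 2) if b in neighbors[a])
        pairs = len(nbrs) * (len(nbrs) - 1) // 2
        ratio = triangles / pairs if pairs else 0.0
        stats[user] = (triangles, ratio)
    return stats


if __name__ == "__main__":
    # Hypothetical toy network: a, b, c follow each other in a cycle; d follows a.
    edges = [("a", "b"), ("b", "c"), ("c", "a"), ("d", "a")]
    for user, (count, ratio) in sorted(local_triangle_stats(edges).items()):
        print(user, count, round(ratio, 3))
```

    Under these assumptions, an account with few tweets and a low triangle ratio (its neighbors rarely know each other) would be a candidate implicit spammer. Any thresholds would have to come from the paper itself.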

       
