    Extracting Social Network from Transaction Logs

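    • Summary: Social network analysis (SNA) is an important research direction in data mining, and the quality and scale of social network data are critical to such research. Most current SNA studies rely on social networks generated by social networking sites. These online social networks only approximate real social networks, so the phenomena and conclusions drawn from them cannot be taken to represent real social networks. The few studies based on real social networks can usually use only networks of limited size, because such data are difficult to collect, and this reduces the credibility of their results. The large volumes of transaction logs produced by modern software systems make it possible to construct social networks from real environments. Using the transaction logs of a university student card management system as an example, this paper explores how to extract a social network from massive transaction logs. Based on the characteristics of the logs, we build a network extraction model grounded in co-occurrence features, which extracts all candidate edges of the social network. We then define an edge existence coefficient based on the edge weight and the Jaccard coefficient to identify and filter out noisy edges. Finally, we validate the extracted network through same-class ratio analysis and network topology analysis. The experimental results show that the extracted network has a high same-class ratio, indicating that the extraction model performs well, and that the network exhibits small-world characteristics and a scale-free degree distribution, consistent with the typical properties of social networks.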
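The extraction and noise-filtering steps described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' model, since the abstract gives neither the exact co-occurrence rule nor the coefficient formula. Here we assume (hypothetically) that two cards co-occur when they transact at the same location within `window` seconds, that the Jaccard term treats the co-occurrence count as the intersection of the two cards' transaction sets, and that the edge existence coefficient is the weighted sum `alpha * normalized_weight + (1 - alpha) * jaccard`, compared against a threshold. The function names, `window`, `alpha`, `threshold`, and the sample records are all illustrative.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

# One transaction record: (card_id, location, timestamp in seconds).
Record = Tuple[str, str, float]
Edge = Tuple[str, str]


def extract_cooccurrence_edges(records: List[Record], window: float = 60.0) -> Dict[Edge, int]:
    """Candidate edges: for each pair of distinct cards, count how often they
    transact at the same location within `window` seconds (assumed rule)."""
    by_location: Dict[str, List[Tuple[float, str]]] = defaultdict(list)
    for card, location, ts in records:
        by_location[location].append((ts, card))

    weights: Dict[Edge, int] = defaultdict(int)
    for events in by_location.values():
        events.sort()  # chronological order within one location
        start = 0
        for i, (ts, card) in enumerate(events):
            # Advance the left edge so events[start:i] lie within the window.
            while ts - events[start][0] > window:
                start += 1
            for _, other in events[start:i]:
                if other != card:
                    edge = (min(card, other), max(card, other))
                    weights[edge] += 1
    return dict(weights)


def edge_existence_coefficients(
    records: List[Record], weights: Dict[Edge, int], alpha: float = 0.5
) -> Dict[Edge, float]:
    """Hypothetical coefficient: alpha * normalized weight + (1 - alpha) * Jaccard."""
    counts = Counter(card for card, _, _ in records)  # transactions per card
    max_w = max(weights.values(), default=1)
    coeffs: Dict[Edge, float] = {}
    for (u, v), w in weights.items():
        # Treat co-occurrences as the intersection of the two cards' events,
        # clamped so the Jaccard value stays within [0, 1].
        inter = min(w, counts[u], counts[v])
        jaccard = inter / (counts[u] + counts[v] - inter)
        coeffs[(u, v)] = alpha * (w / max_w) + (1 - alpha) * jaccard
    return coeffs


def filter_noisy_edges(coeffs: Dict[Edge, float], threshold: float = 0.5) -> Dict[Edge, float]:
    """Keep edges whose coefficient reaches the (hypothetical) threshold."""
    return {edge: c for edge, c in coeffs.items() if c >= threshold}


records: List[Record] = [
    ("A", "canteen", 0), ("B", "canteen", 10), ("C", "canteen", 500),
    ("A", "library", 1000), ("B", "library", 1020),
    ("A", "shop", 2000), ("C", "shop", 5000),
    ("C", "gym", 3000), ("D", "gym", 3030), ("D", "canteen", 9000),
]
weights = extract_cooccurrence_edges(records)  # {("A", "B"): 2, ("C", "D"): 1}
coeffs = edge_existence_coefficients(records, weights)
print(filter_noisy_edges(coeffs))  # keeps only the ("A", "B") edge
```

Grouping records by location and sliding a time window keeps comparisons local instead of pairing every card with every other card. The window width, `alpha`, and the threshold would presumably have to be tuned against some ground truth, such as class membership.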

       

      Abstract: Social network analysis (SNA) is a popular research topic in the field of data mining, and the quality and scale of networks are extremely important for such research. However, most previous studies have been conducted either on large online social networks or on small real social networks. Online social networks are only approximations of real social networks and in general have different properties, while studies of real social networks typically rely on networks constructed from surveys of quite small populations. Social network research therefore needs large-scale real social network data. Transaction logs generated by modern software systems make it possible to construct social networks from large-scale real-world data. This paper presents a case study of extracting a student social network (SSN) from school card system transaction logs to explore how social networks can be extracted from transaction logs. First, we build a relation extraction model based on co-occurrence. Then, we define a probability coefficient for each edge based on the edge weight and the Jaccard coefficient, and use it to filter noisy edges from the network. We apply our method to real transaction log data, and the experiments show that it can generate social networks with high precision. The topology of the network shows that this social network has small-world features and a scale-free degree distribution.
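The validation described in the abstract (checking precision and topology) can likewise be sketched with standard-library Python. The following are assumptions, not taken from the source: the same-class ratio is taken as the fraction of edges whose two endpoints belong to the same class; small-world character is judged by a high average clustering coefficient together with a short average shortest-path length; and scale-free behaviour is inspected through the degree histogram. All helper names and the toy graph are hypothetical.

```python
from collections import Counter, defaultdict, deque
from typing import Dict, Iterable, List, Set, Tuple

Edge = Tuple[str, str]


def build_adjacency(edges: Iterable[Edge]) -> Dict[str, Set[str]]:
    """Undirected adjacency sets from an edge list."""
    adj: Dict[str, Set[str]] = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return dict(adj)


def same_class_ratio(edges: List[Edge], class_of: Dict[str, str]) -> float:
    """Assumed definition: share of edges whose endpoints are in the same class."""
    if not edges:
        return 0.0
    same = sum(1 for u, v in edges if class_of[u] == class_of[v])
    return same / len(edges)


def average_clustering(adj: Dict[str, Set[str]]) -> float:
    """Mean local clustering coefficient; nodes with degree < 2 count as 0."""
    total = 0.0
    for nbrs in adj.values():
        k = len(nbrs)
        if k < 2:
            continue
        # Count links among the node's neighbours, each unordered pair once.
        links = sum(1 for a in nbrs for b in nbrs if a < b and b in adj[a])
        total += 2 * links / (k * (k - 1))
    return total / len(adj) if adj else 0.0


def average_shortest_path(adj: Dict[str, Set[str]]) -> float:
    """Mean BFS distance over all ordered pairs of mutually reachable nodes."""
    total, pairs = 0, 0
    for src in adj:
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs if pairs else 0.0


def degree_distribution(adj: Dict[str, Set[str]]) -> Counter:
    """Histogram mapping degree -> number of nodes with that degree."""
    return Counter(len(nbrs) for nbrs in adj.values())


edges = [("A", "B"), ("B", "C"), ("A", "C"), ("C", "D")]
class_of = {"A": "c1", "B": "c1", "C": "c1", "D": "c2"}
adj = build_adjacency(edges)
print(same_class_ratio(edges, class_of))  # 0.75
print(average_clustering(adj), average_shortest_path(adj), degree_distribution(adj))
```

On a real campus-scale network these statistics would normally be compared against a random graph of the same size, and the degree distribution fitted with a dedicated power-law estimator; the all-pairs BFS here costs O(n·m) and would likely need sampling.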

       
