A Novel Graph Classification Approach Based on Embedding Sets

Wang Guijuan (王桂娟); Yin Jian (印鉴); Zhan Weixu (詹卫许)

Abstract

With the development of graph data collection technology in many scientific fields, graph classification has become an important topic in machine learning and data mining. Many graph classification approaches have been proposed. Some take three steps: mining frequent subgraphs, selecting feature subgraphs from the mined frequent subgraphs, and constructing a classification model from those subgraphs. Others take two steps: mining discriminative subgraphs directly from the graph data, and learning a classification model from the discriminative subgraphs. However, when mining frequent or discriminative subgraphs, these approaches use only the structural information of a subgraph pattern and do not consider its embedding information, even though some efficient subgraph mining algorithms already maintain the embedding information of each pattern. Building on L-CCAM, a subgraph encoding that carries category labels, we propose a graph classification approach based on embedding sets. The approach adopts a feature subgraph selection strategy based on category information. It considers the structural information of subgraphs and also makes full use of embedding sets during frequent subgraph mining, so that feature subgraphs are selected and classification rules are generated directly in a single step. Experimental results on chemical compound data show that the proposed approach achieves higher classification accuracy than the three-step approaches and runs more efficiently than both the two-step and three-step approaches.
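To make the notion of an embedding set concrete, the following is a minimal sketch and not the authors' algorithm. It assumes one-edge patterns only, does not reproduce the L-CCAM encoding, and uses hypothetical toy data, function names, and thresholds. It records every occurrence of a pattern as an embedding, reads per-class support off the embedding set, and keeps a pattern as a classification rule when one class dominates.

```python
from collections import defaultdict

# Hypothetical toy data: labelled vertices, undirected edges, one class per graph.
GRAPHS = {
    "g1": {"cls": "active", "labels": {0: "C", 1: "N", 2: "O"}, "edges": [(0, 1), (1, 2)]},
    "g2": {"cls": "active", "labels": {0: "C", 1: "N"}, "edges": [(0, 1)]},
    "g3": {"cls": "inactive", "labels": {0: "C", 1: "O"}, "edges": [(0, 1)]},
}


def edge_embedding_sets(graphs):
    """Map each one-edge pattern to its embedding set {(graph id, u, v), ...}."""
    embeddings = defaultdict(set)
    for gid, g in graphs.items():
        labels = g["labels"]
        for u, v in g["edges"]:
            # Orient the edge by (label, vertex id) so each pattern has one key.
            a, b = sorted((u, v), key=lambda x: (labels[x], x))
            embeddings[(labels[a], labels[b])].add((gid, a, b))
    return embeddings


def class_support(embedding_set, graphs):
    """Count distinct supporting graphs per class, using only the embedding set."""
    graphs_per_class = defaultdict(set)
    for gid, _, _ in embedding_set:
        graphs_per_class[graphs[gid]["cls"]].add(gid)
    return {cls: len(ids) for cls, ids in graphs_per_class.items()}


def select_rules(graphs, min_support=1, min_confidence=0.9):
    """Keep patterns that are frequent enough and dominated by a single class."""
    rules = {}
    for pattern, emb in edge_embedding_sets(graphs).items():
        support = class_support(emb, graphs)
        total = sum(support.values())
        best = max(support, key=support.get)
        if total >= min_support and support[best] / total >= min_confidence:
            rules[pattern] = best  # rule: pattern occurs => predict class `best`
    return rules


if __name__ == "__main__":
    print(select_rules(GRAPHS, min_support=2))
```

Because support and class statistics come from the same embedding set that mining maintains, selection and rule generation happen in one pass over the patterns in this sketch; a full miner would also extend each embedding to grow larger patterns.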
