Abstract:
Data mining techniques are now widely applied in business and finance. Their success in these fields has sparked interest in applying such analysis techniques to scientific and engineering fields such as chemistry, biology, and structural mechanics. However, datasets arising in scientific and engineering fields tend to have a strong topological, geometric, and/or relational nature. Most existing data mining algorithms cannot be applied to them directly, because they usually assume that the data can be described either as a set of transactions or as multi-dimensional vectors. Graphs, as a general data structure, can model complicated relationships among data and are used extensively in scientific and engineering fields. As a result, developing efficient graph-based mining algorithms has become an active research topic in the data mining community in recent years, and graph classification is an important branch of graph mining. In this paper, a novel graph classification approach based on frequent closed emerging patterns, called CEP, is proposed. CEP first mines frequent closed graph patterns from the graph dataset, then obtains emerging patterns from the set of closed graph patterns, and finally constructs classification rules from the emerging patterns. Experimental results show that CEP achieves better classification performance than current state-of-the-art graph classification approaches when classifying chemical compounds. Furthermore, the classification rules generated by CEP can be easily understood and exploited by domain experts.
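The three stages named above (closed pattern mining, emerging pattern selection, rule-based classification) can be illustrated with a minimal sketch. It is not the CEP implementation: each graph is reduced to a set of labeled edges, so a plain subset test stands in for subgraph isomorphism, emerging patterns are selected with the commonly used growth-rate threshold, and a graph is assigned the class of its strongest matching rule. The function names, the parameters `min_sup`, `min_growth`, and `max_size`, and the toy compounds are all assumptions made for illustration.

```python
from itertools import combinations


def support(pattern, graphs):
    """Fraction of graphs whose edge set contains every edge of `pattern`."""
    if not graphs:
        return 0.0
    return sum(pattern <= g for g in graphs) / len(graphs)


def frequent_closed_patterns(graphs, min_sup, max_size=3):
    """Enumerate frequent edge sets up to `max_size` and keep the closed ones.

    A pattern is kept only if no enumerated proper superset has the same
    support (closedness is therefore relative to the size cap).
    """
    edges = sorted(set().union(*graphs))
    frequent = {}
    for k in range(1, max_size + 1):
        for combo in combinations(edges, k):
            p = frozenset(combo)
            s = support(p, graphs)
            if s >= min_sup:
                frequent[p] = s
    return {p: s for p, s in frequent.items()
            if not any(p < q and s == sq for q, sq in frequent.items())}


def emerging_patterns(closed, by_class, min_growth):
    """Keep (pattern, label, growth) where support grows sharply in one class."""
    rules = []
    for p in closed:
        for label, graphs in by_class.items():
            others = [g for l, gs in by_class.items() if l != label for g in gs]
            s_in, s_out = support(p, graphs), support(p, others)
            if s_in == 0:
                continue
            growth = float("inf") if s_out == 0 else s_in / s_out
            if growth >= min_growth:
                rules.append((p, label, growth))
    return rules


def classify(graph, rules, default):
    """Predict the label of the matching rule with the highest growth rate."""
    matches = [(growth, label) for p, label, growth in rules if p <= graph]
    return max(matches)[1] if matches else default


# Hypothetical toy "compounds", each a set of bond labels (assumed data).
by_class = {
    "active": [frozenset({"C-C", "C-N", "N-O"}),
               frozenset({"C-C", "C-N"}),
               frozenset({"C-N", "N-O"})],
    "inactive": [frozenset({"C-C", "C-O"}),
                 frozenset({"C-O", "C-H"}),
                 frozenset({"C-C", "C-O"})],
}
all_graphs = [g for gs in by_class.values() for g in gs]

closed = frequent_closed_patterns(all_graphs, min_sup=0.3)
rules = emerging_patterns(closed, by_class, min_growth=2.0)
print(classify(frozenset({"C-C", "C-N", "C-H"}), rules, default="unknown"))
```

Because each rule is a readable pattern paired with a class label and a growth rate, the rule list itself can be inspected, which reflects the interpretability point made above. A real graph miner would replace the subset test with subgraph isomorphism and would avoid brute-force enumeration.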