Heterogeneous Attribute Network Embedding Based on the PPMI

东坤杰; 周丽华; 朱月英; 杜国王; 黄通

doi:10.7544/issn1000-1239.20210763

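Summary: Attribute network embedding aims to map the nodes and link relationships of a network into a low-dimensional space while preserving their inherent structural and attribute features. The multiple types of nodes and link relationships in a heterogeneous attribute network provide rich auxiliary information for network embedding learning, but they also bring new challenges. This paper proposes a heterogeneous attribute network embedding model, HANEP (heterogeneous attribute network embedding based on the PPMI), which aims to map the multiple types of nodes and/or multiple types of link relationships in a network into a low-dimensional, compact space, while preserving the attribute features of nodes and the complex, diverse, and rich semantic information carried by the heterogeneous links between objects of different types. HANEP first constructs an attribute graph based on the similarity of sample attributes and extracts the topological structure of the heterogeneous attribute network according to meta-paths. It then obtains attribute and topology probabilistic co-occurrence (PCO) matrices by random surfing, computes their positive point-wise mutual information (PPMI), and uses multiple auto-encoders (AE) to capture the essential information of node attributes and heterogeneous links. Meta-paths capture the link relationships among the multiple types of nodes in a heterogeneous network; the attribute graph clearly describes the nonlinear manifold structure of node attributes; the local pairwise constraints and graph representations of attributes and topology help integrate the consistent and complementary relationships between node attributes and network topology; and the PPMI representation captures the high-order proximity information and latent complex nonlinear relationships of attributes and topology. Experimental results on 3 real-world datasets verify the effectiveness of the HANEP algorithm.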

Abstract: Attribute network embedding aims to map the nodes and link relationships of a network into a latent low-dimensional space while preserving the intrinsic structure of the node attributes and the network topology. A heterogeneous attribute network contains multiple types of nodes and link relationships, which provide rich auxiliary information but also pose new challenges for network embedding. A novel model named HANEP (heterogeneous attribute network embedding based on the PPMI) is proposed for mapping the multiple types of nodes and link relationships in a heterogeneous attribute network into a latent low-dimensional space, while preserving the attribute features of nodes as well as the complex, diverse, and rich semantic information carried by the different types of heterogeneous links. Specifically, HANEP first transforms the attribute features into an attribute graph and extracts network topology graphs based on different meta-paths. Next, it constructs probabilistic co-occurrence (PCO) matrices for the node attributes and for each topology graph by random surfing, computes their positive point-wise mutual information (PPMI), and then learns node representations with multiple auto-encoders. Meta-paths capture the link relationships among the multiple types of nodes in a heterogeneous network, the attribute graph clearly describes the nonlinear manifold structure of the node attributes, the pairwise constraints help integrate the consistent and complementary relationships between attributes and topology, and the PPMI representations capture the high-order proximity and potentially nonlinear relationships of attributes and topology. Experimental results on three real-world datasets verify the effectiveness of HANEP.
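
The random-surfing and PPMI steps named in the abstract can be illustrated with a minimal NumPy sketch. The abstract does not give the exact update rule, so the sketch assumes a commonly used formulation: a restart-style surfing update P_k = alpha * P_{k-1} T + (1 - alpha) * P_0 accumulated over a fixed number of steps, followed by the standard PPMI transform. The function names, the parameters steps and alpha, and the toy adjacency matrix are illustrative choices, not taken from the paper.

```python
import numpy as np


def row_normalize(adj):
    """Turn an adjacency matrix into a row-stochastic transition matrix."""
    adj = np.asarray(adj, dtype=float)
    row_sums = adj.sum(axis=1, keepdims=True)
    row_sums[row_sums == 0] = 1.0  # isolated nodes keep an all-zero row
    return adj / row_sums


def random_surfing(adj, steps=3, alpha=0.98):
    """Accumulate visiting probabilities over `steps` surfing steps.

    Assumed update (not specified in the abstract):
        P_k = alpha * P_{k-1} @ T + (1 - alpha) * P_0, with P_0 = I.
    Returns the sum of P_1..P_steps as a PCO matrix.
    """
    trans = row_normalize(adj)
    n = trans.shape[0]
    p0 = np.eye(n)
    p = p0.copy()
    pco = np.zeros((n, n))
    for _ in range(steps):
        p = alpha * (p @ trans) + (1.0 - alpha) * p0
        pco += p
    return pco


def ppmi(pco):
    """Positive point-wise mutual information of a co-occurrence matrix."""
    total = pco.sum()
    row = pco.sum(axis=1, keepdims=True)  # shape (n, 1)
    col = pco.sum(axis=0, keepdims=True)  # shape (1, n)
    with np.errstate(divide="ignore", invalid="ignore"):
        pmi = np.log(pco * total / (row @ col))
    pmi[~np.isfinite(pmi)] = 0.0  # zero co-occurrence gives log(0) or 0/0
    return np.maximum(pmi, 0.0)


# Toy example: a 4-node path graph 0-1-2-3 (hypothetical data).
adj = np.array([
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
])
pco = random_surfing(adj, steps=3, alpha=0.98)
ppmi_matrix = ppmi(pco)
```

Under these assumptions, the same two functions would be applied once to the attribute graph and once to each meta-path topology graph, and each resulting PPMI matrix would serve as the input to one auto-encoder.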

Heterogeneous Attribute Network Embedding Based on the PPMI