
Overview of the Frontier Progress of Causal Machine Learning

Li Jianing, Xiong Ruibin, Lan Yanyan, Pang Liang, Guo Jiafeng, Cheng Xueqi

Citation: Li Jianing, Xiong Ruibin, Lan Yanyan, Pang Liang, Guo Jiafeng, Cheng Xueqi. Overview of the Frontier Progress of Causal Machine Learning[J]. Journal of Computer Research and Development, 2023, 60(1): 59-84. DOI: 10.7544/issn1000-1239.202110780. CSTR: 32373.14.issn1000-1239.202110780

Details
    About the authors:

    Li Jianing: born in 1992. PhD. His main research interests include machine learning and text generation.

    Xiong Ruibin: born in 1996. Master. His main research interests include machine learning and natural language processing.

    Lan Yanyan: born in 1982. PhD, professor. Senior member of CCF. Her main research interests include machine learning, information retrieval, and natural language processing.

    Pang Liang: born in 1990. PhD, associate professor. Member of CCF. His main research interests include natural language generation and information retrieval.

    Guo Jiafeng: born in 1980. PhD, professor. Member of CCF. His main research interests include data mining and information retrieval.

    Cheng Xueqi: born in 1971. PhD, professor. CCF fellow. His main research interests include network science and social computing, Internet search and mining, Internet information security, distributed systems, and large-scale simulation platforms.

    Corresponding author:

    Lan Yanyan (lanyanyan@tsinghua.edu.cn)

  • CLC number: TP181


Funds: This work was supported by the National Natural Science Foundation of China (61722211, 61773362, 61906180), the Youth Innovation Promotion Association CAS(20144310), the Lenovo-CAS Joint Lab Youth Scientist Project, and the Project of Chongqing Research Program of Basic Research and Frontier Technology (cstc2017jcyjBX0059).

    Abstract:

    Machine learning is one of the important technical means of realizing artificial intelligence, with important applications in computer vision, natural language processing, search engines, and recommendation systems. Existing machine learning methods often focus on correlations in the data and ignore causality; as application demands grow, their drawbacks have gradually begun to appear, and they face a series of urgent problems in interpretability, transferability, robustness, and fairness. To solve these problems, researchers have begun to re-examine the necessity of modeling causal relationships, and related methods have become one of the recent research hotspots. We organize and summarize recent work that applies causal techniques and ideas to solve practical problems in machine learning, and trace the development of this emerging research direction. First, we briefly introduce the causal theory most closely related to machine learning. Then we categorize the works according to the different problem needs within machine learning, explaining their differences and connections from the perspective of solution ideas and technical means. Finally, we summarize the current state of causal machine learning and offer predictions and prospects for future development trends.

    A great many real-world problems can be abstracted as graph models, i.e., sets of nodes and edges, including natural language processing [1], anomaly detection [2-3], academic network analysis, biomedicine, and recommender systems. Graphs are irregular, non-Euclidean data; graph data are intricately structured and carry a large amount of information, such as structural information and node feature information. By learning graph-based embedding representations, one can capture the ordering, topological, geometric, and other relational characteristics of structured data. In recent years, deep learning on graphs has been a hot topic in both academia and industry, focusing mainly on node classification [4], link prediction [5], and graph classification. This paper focuses on the graph classification task, whose key is to learn the mapping between graphs and their labels. Graph classification has wide applications in biochemistry, e.g., classifying molecular graphs to judge their activity, water solubility, or toxicity, so the graph classification problem is of substantial importance.

    One important family of graph classification methods is graph kernels, which measure similarity between graphs: data that are not linearly separable in a low-dimensional space are mapped into a high-dimensional space where they become linearly separable, a device tailored specifically to graph data. Graph kernel functions are generally designed from expert knowledge and account for the similarity of different substructures; examples include the random walk kernel and the shortest path kernel. Different graph kernels can also be combined, as in multiple graph kernel learning [6], which introduces different similarity measures and inductive biases and thus yields graph classification models with different performance. Without expert knowledge, however, it is hard to decide which kernel is best for a given graph classification task.

    With the rise of deep learning, graph convolutional neural networks (GCN) [7] have become one of the most important tools in graph data mining. GCN first introduced convolution for fusing graph structural features, offering a new perspective: folding the features of neighboring nodes into each node embedding, which greatly improves node classification accuracy over feeding node features directly into fully connected layers. However, GCN shares weights across the neighborhood aggregation, is inflexible, and scales poorly; moreover, as layers are stacked it suffers from over-smoothing, which makes the resulting features of all nodes very similar. To address the weight-sharing problem in GCN's neighborhood aggregation, the graph attention network (GAT) [8] introduced an attention mechanism. GAT is efficient and memory-frugal, computes over a node's neighbors, and is an inductive learning method; its drawback is that it only aggregates 1-hop neighbors, so its receptive field must rely on deep networks. To address GCN's poor scalability, GraphSAGE (graph sample and aggregate) [9] proposed several node sampling and aggregation schemes that make graph embedding more flexible: thanks to its fixed node sampling scheme, when new nodes join the graph GraphSAGE can obtain up-to-date embeddings without retraining on all nodes. Graph neural networks (GNN) mainly update and extract node features; graph classification additionally requires graph pooling, which derives an embedding of the whole graph from the node embeddings. There are two mainstream pooling approaches, global pooling and hierarchical pooling. Global pooling applies a global operation (such as max or mean pooling) after the stacked graph convolutions to select node information representative of the whole graph. Hierarchical pooling borrows from pooling in CNNs: each pooling step shrinks the data, i.e., some algorithm reduces the number of nodes so that the graph is abstracted layer by layer; examples include Top-k pooling [10] and graph clustering pooling. The whole pipeline of graph classification with GNNs is shown in Fig. 1.

    Figure 1. The process of graph classification based on graph neural networks

    To improve the graph classification performance of GNNs, researchers have in recent years fused graph kernels with graph neural networks, proposing several kernel-based GNN frameworks. For example, the graph convolutional kernel network (GCKN) [11] uses a random walk kernel to extract paths, projects them into a kernel space, and aggregates the path information in the kernel space back onto the starting nodes. The graph structural kernel network (GSKN) [12] adds an anonymous random walk kernel on top of GCKN, further strengthening the ability to extract local substructures. Although these two frameworks improve expressiveness to some extent, extracting paths is very time-consuming. The kernel graph neural network (KerGNN) [13] also fuses random walk kernels with GNNs; unlike prior work, it uses trainable hidden graphs as graph filters, combines them with subgraphs, and updates node embeddings with the graph kernel, giving the GNN some interpretability and lowering the time complexity of the kernel computation, but the gain in graph classification performance is small.

    Graph classification performance is mainly affected by two aspects: 1) the encoding of node features, and 2) the encoding of graph structure. In some molecular graphs, structure dominates: the graph's properties correlate strongly with specific subgraph structures. Some social network graphs, by contrast, do not depend on specific local structures, and the distribution of node features matters more for classification. Kernel-based methods encode graph structure by focusing on the structural similarity between graphs, in essence a structural-similarity encoding; hence graph kernels perform well on molecular graphs but relatively poorly on social network graphs. GNN-based models, in contrast, focus on node features and are essentially message-passing frameworks. Current GNN frameworks suffer from three problems: 1) neighborhood aggregation captures the tree-structured information of the graph and the node features within each neighborhood, but cannot distinguish higher-order substructures such as rings; 2) to perform better, GNNs stack multiple layers of features, but the number of layers is hard to set: if it is too large, over-smoothing occurs, that is, deep node embeddings become very similar, and stacking these similar embeddings destroys the injectiveness between the node-feature encoding and the graph label, ultimately degrading classification performance; 3) previous work fusing kernels with GNNs mainly used random walk kernels to help nodes capture higher-order substructure information in their neighborhoods, but this has high time complexity, and random walks are stochastic, so there is no guarantee that every walk contains higher-order substructure information. To solve these three problems, this paper fuses the WL (Weisfeiler-Lehman) kernel [14] with graph neural networks: the WL kernel encodes the structural similarity of graphs, the graph isomorphism network (GIN) [15] encodes the node features, and the two encodings are fused by attention weighting, effectively adding structural-similarity information to a message-passing GNN and improving its expressive power.

    The main contributions of this paper are fourfold:

    1) We offer a new perspective that brings the WL kernel into the GNN domain: fusing GIN with the WL kernel enriches the structural and node features of graphs and improves GIN's ability to discriminate higher-order graph structures.

    2) For different types of graph datasets, we propose an attention-based fusion of the graph structure-similarity encoding and the graph node-feature encoding, adaptively learning the weights of the two.

    3) Within the graph kernel, we use the Nyström method to build a low-rank approximation of the original kernel matrix, greatly reducing its dimensionality and resolving the high computational cost of kernel matrices.

    4) On seven public graph datasets, compared with a range of state-of-the-art representative graph classification baselines, the proposed model performs better on most datasets.

    The WL kernel is one of the most widely used graph kernels. It is a subtree-based kernel whose main idea is to decompose graphs into subtrees and use subtree similarity in place of graph similarity; it is a fast feature extraction algorithm based on the 1-WL graph isomorphism test. The detailed procedure is as follows. For a discrete graph with multiple node labels, each node first aggregates its neighborhood; the neighbor labels are then sorted, and the node's own label together with the sorted neighbor labels forms a multiset. These multisets are compressed into new labels, which are assigned back to the nodes, completing one iteration. Within an iteration, relabeling and multiset compression proceed simultaneously for all input graphs, and all graphs share the label-compression mapping.

    The WL subtree kernel with h iterations between two graphs G1 and G2 is defined as

    K_{{\text{sub}}}^h({G_1},{G_2}) = \left\langle {\phi _{{\text{WL}}}^h({G_1}),\phi _{{\text{WL}}}^h({G_2})} \right\rangle \text{,} (1)
    \phi _{{\text{WL}}}^h(G) = \left( {{c_0}(G,{\sigma _{01}}), … ,{c_0}(G,{\sigma _{0|{\Sigma _0}|}}), … ,{c_h}(G,{\sigma _{h1}}), … ,{c_h}(G,{\sigma _{h|{\Sigma _h}|}})} \right) \text{,} (2)

    where \phi _{{\text{WL}}}^h(G) records how often each node label occurs over the h iterations, and {c_i}(G,{\sigma _{ij}}) denotes the number of occurrences of label {\sigma _{ij}} in graph G. The graph-isomorphism-testing power of the WL kernel has been shown to be an upper bound for that of graph neural networks. Graph isomorphism is the graph-theoretic notion that two graphs are completely equivalent in topology: if two graphs {G_1} and {G_2} are completely equivalent, they are said to be isomorphic. Deciding whether two graphs are isomorphic is very hard, and no polynomial-time algorithm is currently known; except for some extreme cases, the WL test can be used to decide whether two graphs are isomorphic. Compared with traditional graph kernels such as the random walk kernel [16], the shortest path kernel [17], or the REGK kernel [18], the isomorphism-detection power of the WL test is an important advance in this field; its running time is only linear in the number of nodes and edges, and it performs well in a variety of statistical learning tasks.
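    As a concrete illustration of the relabeling procedure and of Eqs. (1)-(2), the following Python sketch computes the WL subtree kernel of two small graphs. The adjacency-list input format, the function name wl_subtree_kernel, and the per-call label table are our own illustrative choices, not the authors' implementation.

    from collections import Counter

    def wl_subtree_kernel(adj1, labels1, adj2, labels2, h=3):
        """WL subtree kernel (sketch): inner product of the label-count
        histograms accumulated over h relabeling iterations.
        adj1/adj2 are adjacency lists; labels1/labels2 are initial node
        labels (for unlabeled graphs, constant labels or degrees can be used)."""
        label_table = {}  # compression of multisets into new labels, shared by both graphs

        def relabel(adj, labels):
            new_labels = []
            for v, neighbors in enumerate(adj):
                # multiset = own label + sorted neighbor labels
                key = (labels[v], tuple(sorted(labels[u] for u in neighbors)))
                if key not in label_table:
                    label_table[key] = len(label_table)
                new_labels.append(label_table[key])
            return new_labels

        hist1, hist2 = Counter(labels1), Counter(labels2)  # iteration-0 counts
        for _ in range(h):
            labels1, labels2 = relabel(adj1, labels1), relabel(adj2, labels2)
            hist1.update(labels1)
            hist2.update(labels2)
        # inner product of the feature vectors phi(G1) and phi(G2), Eq. (1)
        return sum(c * hist2.get(label, 0) for label, c in hist1.items())

    # e.g. a triangle vs. a 3-node path, all nodes labeled 'C':
    # k = wl_subtree_kernel([[1,2],[0,2],[0,1]], ['C']*3, [[1],[0,2],[1]], ['C']*3)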

    GNNs mainly learn node representations and use them for downstream tasks such as node classification or link prediction. GIN introduced graph-level representation learning: generating a single vector as the representation of a graph from node attributes, edges, and edge attributes (if any), on which graph-level prediction can be based. GIN-based graph representation learning involves two steps. 1) Compute the graph's node features: each node aggregates the features of its neighbors in turn; common node aggregation functions in GNNs are sum, mean, and max. 2) In GIN, the aggregator is the sum rather than the mean, because the mean cannot count how many times a node occurs and thus cannot precisely describe a multiset, capturing only the distribution of entities, while the max is suited to capturing representative elements or a "skeleton" rather than distinguishing exact structures or distributions. GNNs have other aggregators as well, such as weighted averaging and LSTM pooling; with the isomorphism question in mind, this paper uses attention-based adaptive weighted summation, which has strong representational power.

    To make the node-feature encoding injective with respect to graph labels, GIN uses addition as the aggregation function and a multilayer perceptron to model the composition of functions, realizing the mapping between layers; the GIN node update is:

    {\boldsymbol{h}}_v^{(k)} = ML{P^{(k)}}\left((1 + {\varepsilon ^{(k)}}){\boldsymbol{h}}_v^{(k - 1)} + \sum_{u \in N(v)} {{\boldsymbol{h}}_u^{(k - 1)}} \right) \text{,} (3)

    where {\boldsymbol{h}}_v^{(k)} is the representation of node v at layer k, N(v) is the set of v's neighbors, and \varepsilon is a learnable parameter or a fixed scalar. The node representations are then pooled with a graph pooling (graph readout) operation to obtain the representation of the whole graph, and the readouts of all layers are concatenated:

    {{\boldsymbol{h}}_G} = Concat\left( {Readout\left(\left\{ { {{\boldsymbol{h}}_v^{(k)}} |v \in G} \right\}\right)} |k = 0,1, … ,K\right) , (4)

    where {{\boldsymbol{h}}_G} is the embedding of the whole graph; sum, mean, and an MLP are used as readout functions.
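    To make Eqs. (3)-(4) concrete, here is a minimal dense-adjacency PyTorch sketch of one GIN layer and of the layer-wise sum readout with concatenation; the class names and the dense-matrix formulation are simplifications of ours, not the reference implementation of GIN.

    import torch
    import torch.nn as nn

    class GINLayer(nn.Module):
        """One GIN layer, Eq. (3): h_v <- MLP((1 + eps) * h_v + sum_u h_u)."""
        def __init__(self, dim):
            super().__init__()
            self.eps = nn.Parameter(torch.zeros(1))  # learnable epsilon
            self.mlp = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(),
                                     nn.Linear(dim, dim))

        def forward(self, A, H):  # A: (N, N) adjacency, H: (N, dim) features
            return self.mlp((1 + self.eps) * H + A @ H)  # A @ H sums neighbors

    def gin_graph_embedding(layers, A, X):
        """Sum readout per layer, then concatenation across layers, Eq. (4)."""
        H, readouts = X, [X.sum(dim=0)]  # k = 0 readout
        for layer in layers:
            H = layer(A, H)
            readouts.append(H.sum(dim=0))
        return torch.cat(readouts)  # graph embedding h_G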

    Graph kernels are computationally expensive: for a dataset of n graphs, the kernel matrix has n^2 entries, and when n is very large the structural encoding of the graphs becomes high-dimensional. The Nyström method [19] was originally proposed as a way of discretizing integral equations with simple quadrature rules; it is a widely used dimensionality reduction technique that approximates a kernel matrix from a sampled subset of its columns [20]. Nyström is commonly used in kernel-space computations: given samples \{ {x_1},{x_2}, …,{x_n}\} and their kernel matrix {\boldsymbol{K}} \in {\mathbb{R}^{n \times n}} , it constructs, by sampling, a low-rank approximation of the original kernel matrix, reducing the computational cost of working with it. Nyström can serve as an unsupervised dimensionality-reducing encoder, and it also yields a matrix representation of the samples in kernel space, \tilde {\boldsymbol{K}} \in {\mathbb{R}^{n \times d}} .

    This section presents KerGIN, a model built on GIN that uses graph kernels to deeply fuse the structural features and node features of graphs. The proposed framework is shown in Fig. 2 and consists of three parts: the GIN encoder, the graph kernel, and the attention module; each part is detailed below. We first introduce some basic concepts for graph kernels and graph isomorphism networks. A graph is denoted g = (V,\;{\boldsymbol{X}},\;{\boldsymbol{A}}) , where V = \{ {v_1}, {v_2}, …,{v_N}\} is the node set, {\boldsymbol{X}} \in {\mathbb{R}^{N \times d}} the node feature matrix with N nodes each of feature dimension d , and {\boldsymbol{A}} \in {\mathbb{R}^{N \times N}} the adjacency matrix. All graphs studied in this paper are unweighted and undirected: {A_{ij}} = 1 if there is an edge between {v_i} and {v_j} , otherwise {A_{ij}} = 0 . For graph classification, given a dataset \{ ({g_1},{y_1}),({g_2},{y_2}), …,({g_n},{y_n})\} with graph labels y , the task is to learn the mapping function {y_g} = f(g) from graph g to label y . Discrete labels are one-hot encoded; e.g., four labels are represented by the 4-dimensional vectors (1, 0, 0, 0), (0, 1, 0, 0), (0, 0, 1, 0), and (0, 0, 0, 1).

    Figure 2. The overall architecture of KerGIN

    Given the adjacency matrix {\boldsymbol{A}} and the feature matrix {\boldsymbol{X}} , GIN encodes the graph by first sampling and aggregating each node's neighbors: all neighbors of a node are sampled, and aggregation is the sum, i.e., each node's feature plus the features of its neighbors. The node feature sampling and aggregation functions are:

    {\boldsymbol{a}}_v^{(k)} = Aggregat{e^{(k)}}(\{ {\boldsymbol{h}}_u^{(k - 1)}:u \in N(v)\} ) \text{,} (5)
    {\boldsymbol{h}}_u^{(k)} = Combin{e^{(k)}}({\boldsymbol{h}}_u^{(k - 1)},{\boldsymbol{a}}_v^{(k)}) \text{,} (6)

    where Aggregate () samples the neighbor nodes and Combine () is the sum, which GIN has proved to be injective. The feature vector {\boldsymbol{h}}_v^{(k)} obtained at layer k then passes through a multilayer perceptron:

    {\boldsymbol{H}}_v^{(k)} = ML{P^{(k)}}({\boldsymbol{h}}_v^{(k)}) \text{,} (7)

    which yields each node's post-message-passing feature vector {\boldsymbol{H}}_v^{(k)} . The per-layer feature vectors are then summed over nodes, and the layer-wise sums are concatenated to give the graph feature encoding {{\boldsymbol{H}}_G} :

    {{\boldsymbol{H}}^{(k)}} = \sum\limits_{v \in V} {{\boldsymbol{H}}_v^{(k)}} \text{,} (8)
    {{\boldsymbol{H}}_G} = Concat(\{ \left. {{{\boldsymbol{H}}^{(k)}}} \right|k = 0,1, …,m\} ) . (9)

    Section 2.1 encoded node features with GIN; this section focuses on encoding graph structure. Since GIN's ability to represent structure is limited, we introduce a graph kernel matrix, i.e., a structural-similarity encoding of the graphs, to strengthen GIN's structural representation.

    A graph kernel computes the similarity of two graphs. For a dataset G = \{ {g_1},{g_2}, …,{g_N}\} , the kernel value of every pair of graphs is computed, forming the kernel matrix {\boldsymbol{ K}} \in {\mathbb{R}^{N \times N}} ; row i of the kernel matrix gives the structural similarity between graph {g_i} and the other graphs and thus serves as the structural-similarity encoding of {g_i} . The kernel used in this paper is the WL kernel without node labels: it takes only the adjacency matrices of the two graphs as input and needs neither node features nor node labels. As shown in Fig. 3, for two original graphs, each node aggregates its neighbors, and every aggregated node is re-hashed, i.e., represented by a new color; this is the result of one iteration. The nodes are then counted by color, turning each graph into a feature vector, and the inner product of the two vectors gives the similarity of the two graphs. The kernel value of two graphs is:

    Figure 3. Diagram of the implementation process of WL kernel

    {K_{{\mathrm{WL}}}}({g_i},{g_j}) = \sum\limits_{l = 0}^k {K_{{\mathrm{subtree}}}^{(l)}} ({g_i},{g_j}) \text{,} (10)
    K_{{\mathrm{subtree}}}^{(l)} = \sum\limits_{u \in {V_i}} {\sum\limits_{u' \in {V_j}} {k_{{\mathrm{subtree}}}^{(l)}} } (u,u') . (11)
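    Reusing the wl_subtree_kernel sketch from Section 1, the kernel matrix over a dataset can be assembled as follows; the (adjacency_list, node_labels) input format is an assumption of this sketch, not the authors' data structure.

    import numpy as np

    def wl_kernel_matrix(graphs, h=3):
        """N x N WL kernel matrix in the style of Eqs. (10)-(11) over a
        list of (adjacency_list, node_labels) pairs; the kernel is
        symmetric, so only the upper triangle is computed."""
        n = len(graphs)
        K = np.zeros((n, n))
        for i in range(n):
            for j in range(i, n):
                (a1, l1), (a2, l2) = graphs[i], graphs[j]
                K[i, j] = K[j, i] = wl_subtree_kernel(a1, l1, a2, l2, h)
        return K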

    Section 2.2 produced the kernel matrix of the graph dataset. The dimensionality of this kernel matrix is usually large, with space complexity O({N^2}) ; when the dataset is large, the subsequent computation becomes very costly. The Nyström method, commonly used in kernel-space computation, markedly reduces the dimensionality of the kernel matrix through a low-rank decomposition. The kernel matrix {\boldsymbol{K}} \in {\mathbb{R}^{N \times N}} is symmetric positive definite and is decomposed as:

    {{\boldsymbol{K}}} = \left( {\begin{array}{*{20}{c}} {{\boldsymbol{A}}}&{{{{\boldsymbol{B}}}^{\mathrm{T}}}} \\ {{\boldsymbol{B}}}&{{\boldsymbol{C}}} \end{array}} \right)\text{,} (12)

    where {{\boldsymbol{A}}} \in {\mathbb{R}^{k \times k}} , k < n . Assume {\boldsymbol{K}} = {\boldsymbol{U}}{\boldsymbol{\varLambda}} {{\boldsymbol{U}}^{\mathrm{T}}} and {\boldsymbol{ A}} = {{\boldsymbol{U}}_A}{{\boldsymbol{\varLambda}} _A}{\boldsymbol{U}}_A^{\mathrm{T}} , and let

    \tilde {\boldsymbol{U}} = \left( {\begin{array}{*{20}{c}} {{{\boldsymbol{U}}_A}} \\ {{\boldsymbol{B}}{{\boldsymbol{U}}_A}{\boldsymbol{\varLambda}} _A^{ - 1}} \end{array}} \right) \text{,} (13)

    \tilde {\boldsymbol{K}} = \tilde {\boldsymbol{U}}{{\boldsymbol{\varLambda}} _A}{\tilde {\boldsymbol{U}}^{\rm{T}}} = \left( {\begin{array}{*{20}{c}} {\boldsymbol{A}}&{{{\boldsymbol{B}}^{\rm{T}}}} \\ {\boldsymbol{B}}&{{\boldsymbol{B}}{{\boldsymbol{A}}^{ - 1}}{{\boldsymbol{B}}^{\rm{T}}}} \end{array}} \right) \text{,} (14)

    from which it is easy to see that

    \left\| {{\boldsymbol{K}} - \tilde {\boldsymbol{K}}} \right\| = \left\| {{\boldsymbol{C}} - {\boldsymbol{B}}{{\boldsymbol{A}}^{ - 1}}{{\boldsymbol{B}}^{\mathrm{T}}}} \right\| \text{.} (15)

    The resulting kernel matrix \tilde {\boldsymbol{K}} approximates the original {\boldsymbol{K}} ; by matrix decomposition, \tilde {\boldsymbol{K}} \approx {\boldsymbol{Q}}{{\boldsymbol{Q}}^{\mathrm{T}}} with {\boldsymbol{Q}} \in {\mathbb{R}^{N \times k}} and k \ll N , reducing the space complexity of the kernel matrix from O({N^2}) to O(Nk) and thus lowering the computational cost.
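    A numpy sketch of Eqs. (12)-(15) follows; the uniform landmark sampling is our own assumption, and the function returns the factor Q with \tilde{K} \approx QQ^T.

    import numpy as np

    def nystrom_embedding(K, k, seed=0):
        """Low-rank Nystrom factor Q (N x k) with K ~= Q @ Q.T.
        K must be a symmetric positive semi-definite kernel matrix;
        k << N is the number of sampled landmark columns."""
        rng = np.random.default_rng(seed)
        N = K.shape[0]
        idx = rng.choice(N, size=k, replace=False)  # landmark indices
        C = K[:, idx]        # (N, k) sampled columns, stacking A over B
        A = C[idx, :]        # (k, k) landmark block
        vals, vecs = np.linalg.eigh(A)   # A = U_A Lambda_A U_A^T
        keep = vals > 1e-10              # drop near-zero eigenvalues
        # Q = C U_A Lambda_A^{-1/2}, so that Q Q^T = C A^{-1} C^T = K_tilde
        return C @ vecs[:, keep] / np.sqrt(vals[keep])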

    Because the dimensionality of the reduced kernel matrix does not match that of the GIN encoding of Section 2.1, a neural network is used to align the kernel matrix with the GIN vectors: we define a two-layer network sharing one hidden layer. To prevent vanishing or exploding gradients, each row of the kernel matrix \tilde {\boldsymbol{K}} is normalized with min-centering, and the fully connected network then produces the kernel embedding vector {{\boldsymbol{h}}_k} , i.e., the structural encoding of the graph:

    \tilde K'[i,j] = \frac{{\tilde K[i,j] - \min ({r_i}(\tilde {\boldsymbol{K}}))}}{{\max ({r_i}(\tilde {\boldsymbol{K}})) - \min ({r_i}(\tilde {\boldsymbol{K}}))}} \text{,} (16)
    {{\boldsymbol{h}}_k} = {{{Softmax}}} (ReLU(\tilde {\boldsymbol{K}}'{{\boldsymbol{W}}^0}){{\boldsymbol{W}}^1}) , (17)

    where \tilde K[i,j] is the (i, j) entry of \tilde {\boldsymbol{K}} after normalization, {\boldsymbol{W}} denotes learnable weight matrices that smooth the computation, and {{\boldsymbol{h}}_k} , the output of the two fully connected layers, is the structural embedding of the graph; its dimension matches the graph feature encoding {{\boldsymbol{H}}_G} produced by GIN, in preparation for the downstream graph classification task.

    Sections 2.2-2.3 produced, for the i-th graph, the feature encoding {\boldsymbol{H}}_G^i \in {\mathbb{R}^{1 \times d}} from GIN and the structural encoding {\boldsymbol{h}}_k^i \in {\mathbb{R}^{1 \times d}} from the graph kernel (row i of \tilde {\boldsymbol{K}} ). The two are fused by attention-weighted summation:

    {{\boldsymbol{H}}_1} = \left( {\begin{array}{*{20}{c}} {{{\boldsymbol{H}}_G}} \\ {{{\boldsymbol{h}}_k}} \end{array}} \right){\boldsymbol{W}} \text{,} (18)
    {\boldsymbol{c}} = {{{Softmax}}} ({{\boldsymbol{H}}_1}{\boldsymbol{a}}) \text{,} (19)
    {\boldsymbol{H}} = {\left( {\begin{array}{*{20}{c}} {{{\boldsymbol{H}}_G}} \\ {{{\boldsymbol{h}}_k}} \end{array}} \right)^{\mathrm{T}}}{\boldsymbol{c}} , (20)

    where {\boldsymbol{W}} \in {\mathbb{R}^{d \times d}} is a weight matrix that smooths the graph embedding computation, {\boldsymbol{a}} \in {\mathbb{R}^{d \times 1}} is the attention weight vector, {\boldsymbol{c}} \in {\mathbb{R}^{2 \times 1}} the attention coefficients, and {\boldsymbol{H}} \in {\mathbb{R}^{d \times 1}} the attention-weighted fusion of the graph feature encoding and the graph structure encoding; {\boldsymbol{H}} is then fed into a multilayer perceptron or a support vector machine for graph classification.
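    Eqs. (18)-(20) can be sketched in PyTorch as follows; the module and parameter names are illustrative choices of ours.

    import torch
    import torch.nn as nn

    class AttentionFusion(nn.Module):
        """Attention-weighted fusion of the feature encoding H_G and the
        structure encoding h_k, Eqs. (18)-(20)."""
        def __init__(self, dim):
            super().__init__()
            self.W = nn.Linear(dim, dim, bias=False)  # smoothing projection W
            self.a = nn.Linear(dim, 1, bias=False)    # attention vector a

        def forward(self, H_G, h_k):  # both of shape (dim,)
            stacked = torch.stack([H_G, h_k])              # (2, dim)
            c = torch.softmax(self.a(self.W(stacked)), 0)  # (2, 1) coefficients
            return (stacked * c).sum(dim=0)                # c_1 H_G + c_2 h_k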

    Algorithm 1 describes KerGIN's feature extraction. The input is a set of graphs together with their adjacency matrices and node features. Each graph is encoded by GIN: every node aggregates its neighborhood to obtain its embedding, and the node embeddings are summed to obtain the graph's representation vector. Meanwhile, the WL kernel computes the kernel value of every pair of graphs: each node is hash-relabeled for h iterations, mapping the whole graph to a vector, and the inner product of two graphs' vectors gives their similarity, i.e., the structural encoding of the graphs. The attention mechanism then combines the feature encoding and the structure encoding by weighted summation into the graph's feature vector; this is done for all graphs, and the resulting vectors are finally fed into a multilayer perceptron or a support vector machine for the downstream classification task.

    Algorithm 1. KerGIN feature extraction.

    Input: graph dataset G = \{ {g_1},{g_2}, …,{g_n}\} , where g = (V, E,{\boldsymbol{X}}) ;

    Output: graph embeddings {\boldsymbol{\phi}} (G) \in {\mathbb{R}^{n \times d}} .

    ① Initialize node embeddings {{\boldsymbol{h}}_0} = {\boldsymbol{X}} , kernel matrix {\boldsymbol{K}} \in {\mathbb{R}^{n \times n}} , scalar parameter \varepsilon = {\varepsilon _0} ;

    ② for k = 1,2, …,m do

    ③  aggregate the neighbor features to obtain {\boldsymbol{h}}_v^{(k)} ; /* Eq. (3) */

    ④  sum the node features to obtain the graph embedding {{\boldsymbol{h}}^{(k)}} ;

    ⑤  concatenate the {{\boldsymbol{h}}^{(k)}} to obtain the graph embedding {{\boldsymbol{H}}_G} ;

    ⑥ end for

    ⑦ for i = 1,2, …,n do

    ⑧  for j = 1,2, …,n do

    ⑨   compute the structural similarity K[{g_i},{g_j}] of graphs {g_i} and {g_j} ;

    ⑩  end for

    ⑪ end for

    ⑫ decompose {\boldsymbol{K}} with the Nyström method to obtain \tilde {\boldsymbol{K}} ;

    ⑬ fuse {{\boldsymbol{H}}_G} and \tilde {\boldsymbol{K}} by attention-weighted summation to obtain the graph embeddings {\boldsymbol{ \phi }}(G) . /* Eqs. (18)-(20) */
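    Putting the steps of Algorithm 1 together, one illustrative forward pass over a dataset might read as below. It reuses the GINLayer, gin_graph_embedding, wl_kernel_matrix, nystrom_embedding, and AttentionFusion sketches above, and assumes proj is a two-layer MLP aligning the Nyström embedding with the GIN dimension and clf is the downstream classifier; this is a sketch of the overall data flow, not the authors' code.

    import torch

    def kergin_forward(graphs, A_list, X_list, gin_layers, proj, fusion, clf,
                       h=3, k=64):
        """End-to-end KerGIN sketch: GIN feature encoding (steps 2-6),
        WL kernel matrix + Nystrom structure encoding (steps 7-12),
        attention fusion and classification (step 13)."""
        # 1) GIN feature encoding H_G, one embedding per graph
        H_G = torch.stack([gin_graph_embedding(gin_layers, A, X)
                           for A, X in zip(A_list, X_list)])
        # 2) WL kernel matrix and Nystrom reduction (assumes the landmark
        #    block has full rank, so the factor keeps exactly k columns)
        Q = torch.as_tensor(nystrom_embedding(wl_kernel_matrix(graphs, h), k),
                            dtype=torch.float32)
        # row-wise min-max normalization, Eq. (16), then projection, Eq. (17)
        mins = Q.min(dim=1, keepdim=True).values
        maxs = Q.max(dim=1, keepdim=True).values
        h_k = proj((Q - mins) / (maxs - mins + 1e-9))
        # 3) attention fusion per graph, then classification
        fused = torch.stack([fusion(g, s) for g, s in zip(H_G, h_k)])
        return clf(fused)  # class logits for every graph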

    1) Datasets. Seven public graph classification datasets are used: MUTAG [21], PTC [22], PROTEINS [23], NCI1 [24], IMDB-B [25], IMDB-M [25], and COLLAB [25]. The first four are chemical-molecule datasets and the last three are social-network datasets:

    ① MUTAG [21]. 188 compound structure graphs divided into two classes according to their mutagenic effect on a bacterium. Nodes and node labels represent atoms and atom types, including C, N, O, F, I, Cl, and Br.

    ② PTC [22]. The Predictive Toxicology Challenge dataset, built to develop advanced SAR techniques for predictive toxicology models. It contains compounds labeled for carcinogenicity in rodents, with two classes: carcinogenic and non-carcinogenic.

    ③ PROTEINS [23]. 1113 protein structure graphs labeled into two classes, enzymes and non-enzymes. Nodes represent secondary structure elements; an edge exists between two nodes if the corresponding elements are neighbors in the amino-acid sequence or in 3D protein space.

    ④ NCI1 [24]. A chemical-molecule dataset screened for activity against non-small cell lung cancer, with two classes, indicating the presence or absence of anticancer activity, comprising 4110 compound graphs.

    ⑤ IMDB-B [25]. A movie-collaboration dataset from the Internet Movie Database (IMDB). Nodes represent actors, and an edge connects two actors who appear in the same movie. The collaboration graphs are ego networks centered on individual actors, labeled as action or romance; the classification task is to decide which genre an ego network belongs to. A multi-class version of the dataset also exists.

    ⑥ IMDB-M [25]. The task is likewise to classify actor ego networks by movie genre (three classes).

    ⑦ COLLAB [25]. A scientific-collaboration dataset of researchers' ego networks from three fields: high-energy physics, condensed-matter physics, and astrophysics; the graph label is the researcher's field, and the classification task is to determine the field of each ego network.

    Statistics of the seven datasets are listed in Table 1.

    Table 1. Information Statistics of the Datasets
    Dataset    Type               #Graphs  Avg. Nodes  Avg. Edges  Classes  Node Attr.  Attr. Dim
    MUTAG      Chemical molecule  188      18          20          2        Disc.       1
    PTC        Chemical molecule  344      26          51          2        Disc.       1
    PROTEINS   Chemical molecule  1113     39          73          2        Disc.       1
    NCI1       Chemical molecule  4110     30          65          2        Disc.       1
    IMDB-B     Social network     1000     20          97          2        No          2
    IMDB-M     Social network     1500     13          66          3        No          3
    COLLAB     Social network     5000     74          2458        3        No          5

    2) Baselines. We compare against a broad set of representative state-of-the-art graph classification methods: kernel-based methods, GNN-based methods, pooling-based methods, and recent kernel-GNN hybrids, to demonstrate the effectiveness of our model. Kernel-based methods: the WL kernel [14] and DGK [25]. GNN-based methods: GIN [15], DCNN [26], and PATCHY-SAN [27]. Pooling-based methods: SUGAR [28], AVCN(H) [29], and SLIM [30]. Kernel-GNN hybrids: GCKN [11], GSKN [12], and GSNN [31]. The classification task in this paper is supervised learning.

    3) Parameter settings. Common settings are used during training: learning rate lr=0.0001, batch_size=16, and 600 epochs. In the Nyström method [19], the dimension d after the low-rank decomposition of the kernel matrix is set to half the number of graphs in the dataset; the hidden layers of the fully connected network have dimensions 16 and 8; and the WL kernel uses h=3 iterations. 90% of each dataset is used for training and the remaining 10% for testing.

    This section evaluates KerGIN against all baselines on the seven datasets. We use 10 repetitions of 10-fold cross-validation: the dataset is split into 10 folds, each fold in turn serving as the test set with the other 9 folds for training, and the results of the 10 runs are averaged (a sketch of this protocol appears after Table 2). Classification accuracy on each dataset is reported in Table 2.

    Table 2. Classification Accuracy on Each Public Dataset (%)
    Method          MUTAG      PTC       PROTEINS  NCI1      IMDB-B    IMDB-M    COLLAB
    WL              90.4(8)    59.9(8)   75.0(9)   86.0(2)   73.8(7)   50.9(5)   78.9(5)
    DGK             82.6(11)   57.3(9)   71.6(10)  62.2(9)   66.9(10)  44.5(7)   73.1(7)
    GIN             89.4(9)    64.6(6)   76.2(6)   82.7(4)   75.1(6)   52.3(4)   80.2(4)
    DCNN            67.0(12)   56.6(10)  61.3(11)  62.6(8)   49.1(9)   33.5(8)   52.1(9)
    PATCHY-SAN      92.6(6)    60.0(7)   75.9(7)   78.6(7)   71.0(8)   45.2(6)   72.6(8)
    SUGAR           96.7(1)    77.5(3)   81.3(3)   84.3(3)   −         −         −
    GCKN            91.6(7)    68.4(5)   76.2(6)   82.0(5)   76.5(5)   53.3(3)   82.9(2)
    GSKN            93.3(4)    85.2(2)   82.3(2)   −         79.9(2)   59.3(2)   81.8(3)
    GSNN            94.7(3)    −         78.4(4)   −         78.1(3)   −         −
    AVCN(H)         89.3(10)   62.3(8)   75.7(8)   −         73.4(8)   50.9(5)   80.2(4)
    SLIM            93.2(5)    72.4(4)   77.4(5)   80.5(6)   77.2(4)   53.3(3)   78.2(6)
    KerGIN (ours)   95.2(2)    88.5(1)   88.4(1)   86.8(1)   81.6(1)   60.1(1)   83.2(1)
    Note: bold indicates the best result; the number in parentheses is the method's accuracy rank on each dataset; "−" indicates no reported result.
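    The 10x10 cross-validation protocol used above can be sketched with scikit-learn as follows; fit_predict stands for any train-and-predict routine for the model and is an assumed callable of this sketch.

    import numpy as np
    from sklearn.model_selection import StratifiedKFold

    def repeated_10fold_accuracy(fit_predict, X, y, repeats=10, seed=0):
        """Mean accuracy over 10 repetitions of stratified 10-fold CV.
        fit_predict(train_idx, test_idx) must return predictions for
        the test fold."""
        y = np.asarray(y)
        accs = []
        for r in range(repeats):
            folds = StratifiedKFold(n_splits=10, shuffle=True,
                                    random_state=seed + r)
            for train_idx, test_idx in folds.split(X, y):
                accs.append(np.mean(fit_predict(train_idx, test_idx)
                                    == y[test_idx]))
        return float(np.mean(accs))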

    Table 2 reports the test accuracy of all methods on the seven public datasets. KerGIN outperforms the baselines on most of them. Its average accuracy on MUTAG is 95.2%, higher than every baseline except SUGAR. Compared with GSKN, KerGIN improves accuracy by 3.3 percentage points on PTC and by 6.1 percentage points on PROTEINS. On NCI1, KerGIN improves on GCKN by 4.8 percentage points. On IMDB-B and IMDB-M, KerGIN exceeds the second-ranked GSKN by 1.7 and 0.8 percentage points, respectively. On COLLAB, KerGIN's accuracy is close to that of the state-of-the-art GCKN. KerGIN stands out especially on the chemical-molecule datasets, where it outperforms the two most recent kernel-based GNN methods.

    To compare the overall performance of the different methods, we compute each method's average rank, i.e., the mean of its accuracy ranks across the datasets:

    \overline{R}=\dfrac{1}{n}\left(\sum\limits_{i=1}^nrank(d_i)\right), (21)

    where \bar R denotes the average rank, rank({d_i}) the accuracy rank on the i-th dataset, and n the number of datasets.

    In Fig. 4(a), compared against all baselines across the seven public graph classification datasets, KerGIN attains an average accuracy rank of 1, so its graph classification performance is better than that of most baselines. As Fig. 4(b) shows, KerGIN improves on the best baseline to varying degrees on six datasets; since there is no improvement on MUTAG, Fig. 4(b) shows only the six datasets with gains: 7.5% on PROTEINS, about 2.1% on IMDB-B, about 3.8% on PTC, 0.93% on NCI1, 1.34% on IMDB-M, and 0.36% on COLLAB.

    Figure 4. Average rank of the various methods and accuracy improvement rate of KerGIN

    To verify whether the graph kernel module plays the key role in the whole model and how the MLP affects classification accuracy, we designed an ablation experiment: KerGIN with the graph kernel module removed (denoted GIN-MLP) is compared with KerGIN and the GIN baseline. As Table 3 shows, the classification accuracy of GIN-MLP differs little from that of GIN, indicating that the MLP is not what matters in KerGIN; comparing GIN-MLP with KerGIN shows that the graph kernel module plays the key role in the model.

    Table 3. Ablation Experiment on MLP and Graph Kernel (%)
    Method         MUTAG  PTC   PROTEINS  NCI1  IMDB-B  IMDB-M  COLLAB
    WL             90.4   59.9  75.0      86.0  73.8    50.9    78.9
    GIN            89.4   68.4  76.2      82.7  75.1    52.3    80.2
    GIN-MLP        88.7   68.5  76.8      81.9  75.4    52.6    79.8
    KerGIN (ours)  95.2   88.5  88.4      86.8  81.6    60.1    83.2
    Note: bold indicates the best result.

    To study the effect of the attention mechanism on classification results, we compared two common fusion strategies, concatenation and summation, i.e., concatenating or summing the structure encoding and the feature encoding. In Table 4, KerGIN-con denotes the concatenation strategy, KerGIN-sum the summation strategy, and KerGIN-att the attention strategy. Concatenation performs worse than summation and attention, and attention attains the highest accuracy of the three, so attention is the most suitable fusion strategy here. The reason is that different datasets weight the structure encoding and the feature encoding differently, so plain concatenation or summation can hardly achieve good results.

    Table 4. Ablation Experiment with Different Fusion Strategies (%)
    Strategy           MUTAG  PTC   PROTEINS  NCI1  IMDB-B  IMDB-M  COLLAB
    KerGIN-con         94.7   85.2  86.8      86.3  78.9    56.8    79.6
    KerGIN-sum         94.9   86.3  87.5      86.3  79.4    58.6    81.7
    KerGIN-att (ours)  95.2   88.5  88.4      86.8  81.6    60.1    83.2
    Note: bold indicates the best result.

    This section analyzes the training process and how the structure encoding and feature encoding behave on different types of datasets. Fig. 5 plots the training and test loss against epochs on MUTAG, PTC, PROTEINS, NCI1, IMDB-B, and IMDB-M (the first six datasets are shown, for layout reasons). Overall, the loss drops quickly within the first 100 epochs, declines only slightly by epoch 400, and essentially converges by epoch 600: MUTAG converges at about 500 epochs, PTC at 450, PROTEINS starts converging at 200, NCI1 at 100, IMDB-B at roughly 600, and IMDB-M at 200. Throughout training, the gap between training loss and test loss is small, so there is no overfitting or underfitting.

    Figure 5. Variation of loss values for training and testing on six datasets

    Any GNN encoder can serve as the graph feature encoder. Besides the GIN encoder used in our experiments, we ran comparisons with two popular GNN frameworks, GCN and the graph attention network GAT. As Fig. 6(a) shows, GIN encodes better than GCN and GAT, presumably because GIN's expressive power is stronger and better suited to graph feature encoding. We also studied how the length of the kernel encoding affects classification performance: Fig. 6(b) shows KerGIN's accuracy with kernel-encoding lengths from 16 to 160 on the seven public datasets. Within a certain range, accuracy grows with the encoding length; beyond 160, the effect of the encoding length is small. Hence the kernel-encoder length matters within a range, and moderately reducing the kernel-encoder dimension does not hurt classification accuracy.

    Figure 6. KerGIN based on different graph encoders and KerGIN based on different length kernel encoders

    As Fig. 7(a) shows, the weight coefficients differ across the seven datasets, but it is clear that the structure encoding outweighs the feature encoding on the first four datasets and is outweighed by it on the last three. Since the first four datasets are chemical molecules and the last three are social networks, the two kinds of datasets weight the structure and feature encodings differently. We further examined how the choice of graph kernel affects KerGIN's classification accuracy: kernels are usually chosen from expert experience, and it is hard to say directly which kernel suits KerGIN best, so we compared three kernels, RW, SP, and WL, and determined experimentally which is most suitable for KerGIN's structure encoding. In Fig. 7(b), because the RW kernel runs too long on large datasets, the experiment uses the four small datasets. The WL kernel is clearly best overall on the four datasets, followed by the SP kernel; the RW kernel performs worst and, owing to its high time complexity, cannot finish on some large-scale graph datasets within a reasonable time.

    Figure 7. The weight ratio of graph feature coding and structure coding, and the classification accuracy under different graph kernels

    The experimental analysis above shows that our method has a clear accuracy advantage on chemical-molecule datasets over social-network datasets, so it is better suited to molecular graph classification. The properties of chemical molecules are strongly influenced by specific local substructures, so the structure encoding is crucial for their classification; a single local substructure, such as a functional group, can even be the main basis for classification. Social-network datasets, by contrast, depend less on specific substructures and more on node features.

    This paper proposed a graph representation learning and graph classification method that fuses GIN, graph kernels, and an attention mechanism, improving GIN's ability to discriminate specific structures in graphs. The experimental results show that the structure encoding has a large effect on classification results. Using a graph kernel as the structure encoding alleviates, to a degree, the inability of message-passing GNNs to recognize higher-order information in graphs; the method adaptively adjusts the weights of the feature encoding and the structure encoding, considerably improves graph classification accuracy, and outperforms the selected baselines.

    Author contributions: Xu Lixiang proposed the algorithm and the experimental design and wrote the paper; Ge Wei carried out the experimental validation and organized the paper; Chen Enhong and Luo Bin provided guidance and revised the paper.

  • Figure 1. Example of causal graph

    Figure 2. Example of frontdoor/backdoor adjustment

    Figure 3. Example of mediation analysis

    Figure 4. Overview of main research problems in causal machine learning

    Figure 5. Example of counterfactual explanation [49]

    Figure 6. Example of counterfactual image hybridization [62]

    Figure 7. Causal graphs of three types of anti-causal transfer problems [70]

    Figure 8. Causal graph and two calibration strategies in visual dialogue tasks [97]

    Figure 9. Causal graph of invariance-learning methods [134-135]

    Figure 10. Causal graph in advertising recommendation systems [180]

    Table 1. Application of Causal Methods on Interpretability Problems
    Category              Subcategory                          Typical ideas and methods
    Attribution-based     Ignoring inter-feature structure     Directly compute the causal effect of each input feature on the model output [40-46]
    Attribution-based     Considering inter-feature structure  Introduce a prior causal graph over the input features and adjust the features' causal effects on the model output [47-48]
    Counterfactual-based  Counterfactual inputs                Construct counterfactual samples in the model input space [49-61]
    Counterfactual-based  Counterfactual outputs               Intervene on intermediate nodes of a generative model to construct counterfactual generated samples [62]
    Counterfactual-based  Counterfactual feasibility           Additionally model the constraints on counterfactual operations [63-66]

    Table 2. Application of Causal Methods on Transferability Problems
    Category                                                       Typical ideas and methods
    Causal graph over inputs, outputs, and domain variables only   Modeling under covariate shift [67], target shift [68], conditional shift [69], and generalized target shift [70-71]
    Causal graphs with other complex variables                     Introduce prior causal graphs [72-75] or perform causal discovery from data [76]

    Table 3. Application of Causal Methods on Robustness Problems
    Category                          Subcategory                      Typical ideas and methods
    Counterfactual data augmentation  Spurious-feature counterfactual  Construct extra training data by perturbing data while keeping predictions unchanged [88-93]
    Counterfactual data augmentation  Causal-feature counterfactual    Construct extra training data by changing key causal features and modifying the predictions [92-95]
    Causal effect calibration         Backdoor adjustment              Identify confounders from problem knowledge, estimate them, and remove their influence [96-99]
    Causal effect calibration         Mediation analysis               Identify mediator variables from problem knowledge, estimate them, and remove their influence [97,100-102]
    Invariance learning               Stable learning                  Treat each feature as a treatment variable and remove confounding via sample reweighting to identify causal features [103-107]
    Invariance learning               Invariant causal prediction      Use multi-environment training data and hypothesis testing to determine the set of causal features [108-110]
    Invariance learning               Invariant risk minimization      Use multi-environment training data and add cross-environment invariance constraints to the training objective to learn causal features [111-113]

    Table 4. Application of Causal Methods on Fairness Problems
    Category                           Typical ideas and methods
    Counterfactual fairness measures   Propose counterfactual-based individual fairness metrics [145-152]
    Fair model construction            Use prior causal graphs to guide the construction of fair models [153-155]

    Table 5. Application of Causal Methods on Counterfactual Evaluation Problems
    Category                                               Typical ideas and methods
    Missing-not-at-random problem in recommender systems   Correct the policy utility with propensity scores or counterfactual risk minimization [161-172]
    Position-bias problem in retrieval systems             Correct relevance with propensity-score methods [173-179]
  • [1]

    LeCun Y, Bengio Y, Hinton G. Deep learning[J]. Nature, 2015, 521(7553): 436−444

    [2] He Kaiming, Zhang Xiangyu, Ren Shaoqing, et al. Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification[C] //Proc of the IEEE Int Conf on Computer Vision. Piscataway, NJ: IEEE, 2015: 1026−1034
    [3]

    Brock A, Donahue J, Simonyan K. Large scale GAN training for high fidelity natural image synthesis [C/OL] //Proc of the 7th Int Conf on Learning Representations. 2019 [2021-11-03]. https://openreview.net/pdf?id=B1xsqj09Fm

    [4]

    Brown T B, Mann B, Ryder N, et al. Language models are few-shot learners[C] //Proc of the 34th Int Conf on Neural Information Processing Systems. Red Hook, NY: Curran Associates Inc, 2020: 1877−1901

    [5]

    Silver D, Huang A, Maddison C J, et al. Mastering the game of Go with deep neural networks and tree search[J]. Nature, 2016, 529(7587): 484−489

    [6]

    Senior A W, Evans R, Jumper J, et al. Improved protein structure prediction using potentials from deep learning[J]. Nature, 2020, 577(7792): 706−710

    [7]

    Gunning D, Aha D. DARPA’s explainable artificial intelligence (XAI) program[J]. AI Magazine, 2019, 40(2): 44−58

    [8]

    Szegedy C, Zaremba W, Sutskever I, et al. Intriguing properties of neural networks[C/OL] //Proc of the 2nd Int Conf on Learning Representations. 2014 [2021-11-03]. https://arxiv.org/abs/1312.6199

    [9]

    Barocas S, Hardt M, Narayanan A. Fairness in machine learning [EB/OL]. 2017 [2021-11-13]. https://fairmlbook.org/pdf/fairmlbook.pdf

    [10]

    Pearl J, Mackenzie D. The Book of Why: The New Science of Cause and Effect[M]. New York: Basic Books, 2018

    [11]

    Pearl J. Theoretical impediments to machine learning with seven sparks from the causal revolution[J]. arXiv preprint, arXiv: 1801.04016, 2018

    [12] 苗旺, 刘春辰, 耿直. 因果推断的统计方法[J]. 中国科学: 数学, 2018, 48(12): 1753-1778

    Miao Wang, Liu Chunchen, Geng Zhi. Statistical approaches for causal inference [J]. SCIENTIA SINICA Mathematica, 2018, 48(12): 1753-1778 (in Chinese)

    [13]

    Guo Ruocheng, Cheng Lu, Li Jundong, et al. A survey of learning causality with data: Problems and methods[J]. ACM Computing Surveys, 2020, 53(4): 1−37

    [14]

    Kuang Kun, Li Lian, Geng Zhi, et al. Causal inference [J]. Engineering, 2020, 6(3): 253−263

    [15]

    Schölkopf B. Causality for machine learning [J]. arXiv preprint, arXiv: 1911.10500, 2019

    [16]

    Schölkopf B, Locatello F, Bauer S, et al. Toward causal representation learning[J]. Proceedings of the IEEE, 2021, 109(5): 612−634

    [17]

    Splawa-Neyman J, Dabrowska D M, Speed T P. On the application of probability theory to agricultural experiments. Essay on principles. Section 9[J]. Statistical Science, 1990, 5(4): 465−472

    [18]

    Rubin D B. Estimating causal effects of treatments in randomized and nonrandomized studies[J]. Journal of Educational Psychology, 1974, 66(5): 688−701

    [19]

    Pearl J. Causality[M]. Cambridge, UK: Cambridge University Press, 2009

    [20]

    Granger C W J. Investigating causal relations by econometric models and cross-spectral methods[J]. Econometrica, 1969, 37(3): 424−438

    [21]

    Rubin D B. Randomization analysis of experimental data: The Fisher randomization test comment[J]. Journal of the American Statistical Association, 1980, 75(371): 591−593

    [22]

    Rosenbaum P R, Rubin D B. The central role of the propensity score in observational studies for causal effects[J]. Biometrika, 1983, 70(1): 41−55

    [23]

    Hirano K, Imbens G W, Ridder G. Efficient estimation of average treatment effects using the estimated propensity score[J]. Econometrica, 2003, 71(4): 1161−1189

    [24]

    Robins J M, Rotnitzky A, Zhao Lueping. Estimation of regression coefficients when some regressors are not always observed[J]. Journal of the American Statistical Association, 1994, 89(427): 846−866

    [25]

    Dudík M, Langford J, Li Lihong. Doubly robust policy evaluation and learning[C] //Proc of the 28th Int Conf on Machine Learning. Madison, WI: Omnipress, 2011: 1097−1104

    [26]

    Kuang Kun, Cui Peng, Li Bo, et al. Estimating treatment effect in the wild via differentiated confounder balancing[C] //Proc of the 23rd ACM SIGKDD Int Conf on Knowledge Discovery and Data Mining. New York: ACM, 2017: 265−274

    [27]

    Imbens G W, Rubin D B. Causal Inference in Statistics, Social, and Biomedical Sciences[M]. Cambridge, UK: Cambridge University Press, 2015

    [28]

    Yao Liuyi, Chu Zhixuan, Li Sheng, et al. A survey on causal inference [J]. arXiv preprint, arXiv: 2002.02770, 2020

    [29]

    Pearl J. Causal diagrams for empirical research[J]. Biometrika, 1995, 82(4): 669−688

    [30]

    Spirtes P, Glymour C. An algorithm for fast recovery of sparse causal graphs[J]. Social Science Computer Review, 1991, 9(1): 62−72

    [31]

    Verma T, Pearl J. Equivalence and synthesis of causal models[C] //Proc of the 6th Annual Conf on Uncertainty in Artificial Intelligence. Amsterdam: Elsevier, 1990: 255−270

    [32]

    Spirtes P, Glymour C N, Scheines R, et al. Causation, Prediction, and Search[M]. Cambridge, MA: MIT Press, 2000

    [33]

    Schwarz G. Estimating the dimension of a model[J]. The Annals of Statistics, 1978, 6(2): 461−464

    [34] Chickering D M. Optimal structure identification with greedy search[J]. Journal of Machine Learning Research, 2002, 3(Nov): 507−554
    [35]

    Shimizu S, Hoyer P O, Hyvärinen A, et al. A linear non-Gaussian acyclic model for causal discovery[J]. Journal of Machine Learning Research, 2006, 7(10): 2003−2030

    [36]

    Zhang Kun, Hyvärinen A. On the identifiability of the post-nonlinear causal model[C] //Proc of the 25th Conf on Uncertainty in Artificial Intelligence. Arlington, VA: AUAI Press, 2009: 647−655

    [37]

    Pearl J. Direct and indirect effects[C] //Proc of the 17th Conf on Uncertainty in Artificial Intelligence. San Francisco, CA: Morgan Kaufmann Publishers Inc, 2001: 411−420

    [38]

    VanderWeele T. Explanation in Causal Inference: Methods for Mediation and Interaction[M]. Oxford, UK: Oxford University Press, 2015

    [39] 陈珂锐,孟小峰. 机器学习的可解释性[J]. 计算机研究与发展,2020,57(9):1971−1986 doi: 10.7544/issn1000-1239.2020.20190456

    Chen Kerui, Meng Xiaofeng. Interpretation and understanding in machine learning[J]. Journal of Computer Research and Development, 2020, 57(9): 1971−1986 (in Chinese) doi: 10.7544/issn1000-1239.2020.20190456

    [40]

    Ribeiro M T, Singh S, Guestrin C. “Why should I trust you?” Explaining the predictions of any classifier[C] //Proc of the 22nd ACM SIGKDD Int Conf on Knowledge Discovery and Data Mining. New York: ACM, 2016: 1135−1144

    [41] Selvaraju R R, Cogswell M, Das A, et al. Grad-CAM: Visual explanations from deep networks via gradient-based localization[C] //Proc of the IEEE Int Conf on Computer Vision. Piscataway, NJ: IEEE, 2017: 618-626
    [42]

    Sundararajan M, Taly A, Yan Qiqi. Axiomatic attribution for deep networks[C] //Proc of the 34th Int Conf on Machine Learning. Cambridge, MA: JMLR, 2017: 3319−3328

    [43]

    Lundberg S M, Lee S I. A unified approach to interpreting model predictions[C] //Proc of the 31st Int Conf on Neural Information Processing Systems. Red Hook, NY: Curran Associates Inc, 2017: 4765−4774

    [44] Alvarez-Melis D, Jaakkola T. A causal framework for explaining the predictions of black-box sequence-to-sequence models[C] //Proc of the 2017 Conf on Empirical Methods in Natural Language Processing. Stroudsburg, PA: ACL, 2017: 412−421
    [45]

    Schwab P, Karlen W. CXPlain: Causal explanations for model interpretation under uncertainty[C] //Proc of the 33rd Int Conf on Neural Information Processing Systems. Red Hook, NY: Curran Associates Inc, 2019: 10220−10230

    [46]

    Chattopadhyay A, Manupriya P, Sarkar A, et al. Neural network attributions: A causal perspective[C] //Proc of the 36th Int Conf on Machine Learning. Cambridge, MA: JMLR, 2019: 981−990

    [47]

    Frye C, Rowat C, Feige I. Asymmetric Shapley values: Incorporating causal knowledge into model-agnostic explainability[C] //Proc of the 34th Int Conf on Neural Information Processing Systems. Red Hook, NY: Curran Associates Inc, 2020: 1229−1239

    [48]

    Heskes T, Sijben E, Bucur I G, et al. Causal Shapley values: Exploiting causal knowledge to explain individual predictions of complex models [C] //Proc of the 34th Int Conf on Neural Information Processing Systems. Red Hook, NY: Curran Associates Inc, 2020: 4778−4789

    [49]

    Goyal Y, Wu Ziyan, Ernst J, et al. Counterfactual visual explanations[C] //Proc of the 36th Int Conf on Machine Learning. Cambridge, MA: JMLR, 2019: 2376−2384

    [50]

    Wang Pei, Vasconcelos N. SCOUT: Self-aware discriminant counterfactual explanations[C] //Proc of the 33rd IEEE/CVF Conf on Computer Vision and Pattern Recognition. Piscataway, NJ: IEEE, 2020: 8981−8990

    [51]

    Hendricks L A, Hu Ronghang, Darrell T, et al. Generating counterfactual explanations with natural language[J]. arXiv preprint, arXiv: 1806.09809, 2018

    [52]

    Chang Chunhao, Creager E, Goldenberg A, et al. Explaining image classifiers by counterfactual generation[C/OL] //Proc of the 7th Int Conf on Learning Representations, 2019 [2021-11-03]. https://openreview.net/pdf?id=B1MXz20cYQ

    [53]

    Kanehira A, Takemoto K, Inayoshi S, et al. Multimodal explanations by predicting counterfactuality in videos[C] //Proc of the 32nd IEEE Conf on Computer Vision and Pattern Recognition. Piscataway, NJ: IEEE, 2019: 8594−8602

    [54]

    Akula A R, Wang Shuai, Zhu Songchun. CoCoX: Generating conceptual and counterfactual explanations via fault-lines[C] //Proc of the 34th AAAI Conf on Artificial Intelligence. Palo Alto, CA: AAAI, 2020: 2594−2601

    [55]

    Madumal P, Miller T, Sonenberg L, et al. Explainable reinforcement learning through a causal lens[C] //Proc of the 34th AAAI Conf on Artificial Intelligence. Palo Alto, CA: AAAI, 2020: 2493−2500

    [56] Mothilal R K, Sharma A, Tan C. Explaining machine learning classifiers through diverse counterfactual explanations[C] //Proc of the 2020 Conf on Fairness, Accountability, and Transparency. New York: ACM, 2020: 607−617
    [57]

    Albini E, Rago A, Baroni P, et al. Relation-based counterfactual explanations for Bayesian network classifiers[C] //Proc of the 29th Int Joint Conf on Artificial Intelligence, Red Hook, NY: Curran Associates Inc, 2020: 451−457

    [58]

    Kenny E M, Keane M T. On generating plausible counterfactual and semi-factual explanations for deep learning[C] //Proc of the 35th AAAI Conf on Artificial Intelligence. Palo Alto, CA: AAAI, 2021: 11575−11585

    [59]

    Abrate C, Bonchi F. Counterfactual graphs for explainable classification of brain networks[J]. arXiv preprint, arXiv: 2106.08640, 2021

    [60]

    Yang Fan, Alva S S, Chen Jiahao, et al. Model-based counterfactual synthesizer for interpretation[J]. arXiv preprint, arXiv: 2106.08971, 2021

    [61]

    Parmentier A, Vidal T. Optimal counterfactual explanations in tree ensembles[J]. arXiv preprint, arXiv: 2106.06631, 2021

    [62]

    Besserve M, Mehrjou A, Sun R, et al. Counterfactuals uncover the modular structure of deep generative models[C/OL] //Proc of the 8th Int Conf on Learning Representations. 2020 [2021-11-03]. https://openreview.net/pdf?id=SJxDDpEKvH

    [63]

    Kanamori K, Takagi T, Kobayashi K, et al. DACE: Distribution-aware counterfactual explanation by mixed-integer linear optimization[C] //Proc of the 19th Int Joint Conf on Artificial Intelligence. Red Hook, NY: Curran Associates Inc, 2020: 2855−2862

    [64]

    Kanamori K, Takagi T, Kobayashi K, et al. Ordered counterfactual explanation by mixed-integer linear optimization[C] //Proc of the 35th AAAI Conf on Artificial Intelligence. Palo Alto, CA: AAAI, 2021: 11564−11574

    [65]

    Tsirtsis S, Gomez-Rodriguez M. Decisions, counterfactual explanations and strategic behavior[C] //Proc of the 34th Int Conf on Neural Information Processing Systems. Red Hook, NY: Curran Associates Inc, 2020: 16749−16760

    [66]

    Karimi A H, von Kügelgen B J, Schölkopf B, et al. Algorithmic recourse under imperfect causal knowledge: A probabilistic approach[C] //Proc of the 34th Int Conf on Neural Information Processing Systems. Red Hook, NY: Curran Associates Inc, 2020: 265−277

    [67]

    Rojas-Carulla M, Schölkopf B, Turner R, et al. Invariant models for causal transfer learning[J]. The Journal of Machine Learning Research, 2018, 19(1): 1309−1342

    [68]

    Guo Jiaxian, Gong Mingming, Liu Tongliang, et al. LTF: A label transformation framework for correcting target shift[C] //Proc of the 37th Int Conf on Machine Learning. Cambridge, MA: JMLR, 2020: 3843−3853

    [69]

    Cai Ruichu, Li Zijian, Wei Pengfei, et al. Learning disentangled semantic representation for domain adaptation[C] //Proc of the 28th Int Joint Conf on Artificial Intelligence. Red Hook, NY: Curran Associates Inc, 2019: 2060−2066

    [70]

    Zhang Kun, Schölkopf B, Muandet K, et al. Domain adaptation under target and conditional shift[C] //Proc of the 30th Int Conf on Machine Learning. Cambridge, MA: JMLR, 2013: 819−827

    [71]

    Gong Mingming, Zhang Kun, Liu Tongliang, et al. Domain adaptation with conditional transferable components[C] //Proc of the 33rd Int Conf on Machine Learning. Cambridge, MA: JMLR, 2016: 2839−2848

    [72]

    Teshima T, Sato I, Sugiyama M. Few-shot domain adaptation by causal mechanism transfer[C] //Proc of the 37th Int Conf on Machine Learning. Cambridge, MA: JMLR, 2020: 9458−9469

    [73]

    Edmonds M, Ma Xiaojian, Qi Siyuan, et al. Theory-based causal transfer: Integrating instance-level induction and abstract-level structure learning[C] //Proc of the 34th AAAI Conf on Artificial Intelligence. Palo Alto, CA: AAAI, 2020: 1283−1291

    [74]

    Etesami J, Geiger P. Causal transfer for imitation learning and decision making under sensor-shift[C] //Proc of the 34th AAAI Conf on Artificial Intelligence. Palo Alto, CA: AAAI, 2020: 10118−10125

    [75]

    Yue Zhongqi, Zhang Hanwang, Sun Qianru, et al. Interventional few-shot learning[C] //Proc of the 34th Int Conf on Neural Information Processing Systems. Red Hook, NY: Curran Associates Inc, 2020: 2734−2746

    [76]

    Zhang Kun, Gong Mingming, Stojanov P, et al. Domain adaptation as a problem of inference on graphical models[C] //Proc of the 34th Int Conf on Neural Information Processing Systems. Red Hook, NY: Curran Associates Inc, 2020: 4965−4976

    [77]

    Schölkopf B, Janzing D, Peters J, et al. On causal and anticausal learning[C] //Proc of the 29th Int Conf on Machine Learning. Madison, WI: Omnipress, 2012: 459−466

    [78]

    Zhang Kun, Gong Mingming, Schölkopf B. Multi-source domain adaptation: A causal view[C] //Proc of the 29th AAAI Conf on Artificial Intelligence. Palo Alto, CA: AAAI, 2015: 3150−3157

    [79]

    Bagnell J A. Robust supervised learning[C] //Proc of the 20th National Conf on Artificial Intelligence. Menlo Park, CA: AAAI, 2005: 714−719

    [80]

    Hu Weihua, Niu Gang, Sato I, et al. Does distributionally robust supervised learning give robust classifiers?[C] //Proc of the 35th Int Conf on Machine Learning. Cambridge, MA: JMLR, 2018: 2029−2037

    [81]

    Rahimian H, Mehrotra S. Distributionally robust optimization: A review[J]. arXiv preprint, arXiv: 1908.05659, 2019

    [82]

    Goodfellow I J, Shlens J, Szegedy C. Explaining and harnessing adversarial examples[C/OL] //Proc of the 5th Int Conf on Learning Representations. 2017 [2021-11-14]. https://openreview.net/pdf?id=B1xsqj09Fm

    [83]

    Xu Han, Ma Yao, Liu Haochen, et al. Adversarial attacks and defenses in images, graphs and text: A review[J]. International Journal of Automation and Computing, 2020, 17(2): 151−178

    [84]

    Gururangan S, Swayamdipta S, Levy O, et al. Annotation artifacts in natural language inference data[C] //Proc of the 16th Conf of the North American Chapter of the ACL: Human Language Technologies, Vol 2. Stroudsburg, PA: ACL, 2018: 107−112

    [85]

    Zhang Guanhua, Bai Bing, Liang Jian, et al. Selection bias explorations and debias methods for natural language sentence matching datasets[C] //Proc of the 57th Annual Meeting of the Association for Computational Linguistics. Stroudsburg, PA: ACL, 2019: 4418−4429

    [86] Clark C, Yatskar M, Zettlemoyer L. Don’t take the easy way out: Ensemble based methods for avoiding known dataset biases[C] //Proc of the 2019 Conf on Empirical Methods in Natural Language Processing and the 9th Int Joint Conf on Natural Language Processing (EMNLP-IJCNLP). Stroudsburg, PA: ACL, 2019: 4060−4073
    [87]

    Cadene R, Dancette C, Cord M, et al. Rubi: Reducing unimodal biases for visual question answering[C] //Proc of the 33rd Int Conf on Neural Information Processing Systems. Red Hook, NY: Curran Associates Inc, 2019: 841−852

    [88] Lu Kaiji, Mardziel P, Wu Fangjing, et al. Gender bias in neural natural language processing[G] //LNCS 12300: Logic, Language, and Security: Essays Dedicated to Andre Scedrov on the Occasion of His 65th Birthday. Berlin: Springer, 2020: 189−202
    [89] Maudslay R H, Gonen H, Cotterell R, et al. It’s all in the name: Mitigating gender bias with name-based counterfactual data substitution[C] //Proc of the 2019 Conf on Empirical Methods in Natural Language Processing and the 9th Int Joint Conf on Natural Language Processing (EMNLP-IJCNLP). Stroudsburg, PA: ACL, 2019: 5270−5278
    [90]

    Zmigrod R, Mielke S J, Wallach H, et al. Counterfactual data augmentation for mitigating gender stereotypes in languages with rich morphology[C] //Proc of the 57th Annual Meeting of the ACL. Stroudsburg, PA: ACL, 2019: 1651−1661

    [91]

    Kaushik D, Hovy E, Lipton Z. Learning the difference that makes a difference with counterfactually-augmented data[C/OL] //Proc of the 8th Int Conf on Learning Representations. 2020 [2021-11-14]. https://openreview.net/pdf?id=Sklgs0NFvr

    [92]

    Agarwal V, Shetty R, Fritz M. Towards causal VQA: Revealing and reducing spurious correlations by invariant and covariant semantic editing[C] //Proc of the 33rd IEEE/CVF Conf on Computer Vision and Pattern Recognition. Piscataway, NJ: IEEE, 2020: 9690−9698

    [93]

    Chang Chunhao, Adam G A, Goldenberg A. Towards robust classification model by counterfactual and invariant data generation[C] //Proc of the 34th IEEE/CVF Conf on Computer Vision and Pattern Recognition. Piscataway, NJ: IEEE, 2021: 15212−15221

    [94]

    Wang Zhao, Culotta A. Robustness to spurious correlations in text classification via automatically generated counterfactuals[C] //Proc of the 35th AAAI Conf on Artificial Intelligence. Palo Alto, CA: AAAI, 2021: 14024−14031

    [95]

    Chen Long, Yan Xin, Xiao Jun, et al. Counterfactual samples synthesizing for robust visual question answering[C] //Proc of the 33rd IEEE/CVF Conf on Computer Vision and Pattern Recognition. Piscataway, NJ: IEEE, 2020: 10800−10809

    [96] Wu Yiquan, Kuang Kun, Zhang Yating, et al. De-biased court’s view generation with causality[C] //Proc of the 2020 Conf on Empirical Methods in Natural Language Processing. Stroudsburg, PA: ACL, 2020: 763−780
    [97]

    Qi Jia, Niu Yulei, Huang Jianqiang, et al. Two causal principles for improving visual dialog[C] //Proc of the 33rd IEEE/CVF Conf on Computer Vision and Pattern Recognition. Piscataway, NJ: IEEE, 2020: 10860−10869

    [98]

    Wang Tan, Huang Jiangqiang, Zhang Hanwang, et al. Visual commonsense R-CNN[C] //Proc of the 33rd IEEE/CVF Conf on Computer Vision and Pattern Recognition. Piscataway, NJ: IEEE, 2020: 10760−10770

    [99]

    Zhang Dong, Zhang Hanwang, Tang Jinhui, et al. Causal intervention for weakly-supervised semantic segmentation[C] //Proc of the 34th Int Conf on Neural Information Processing Systems. Red Hook, NY: Curran Associates Inc, 2020: 655−666

    [100]

    Tang Kaihua, Niu Yulei, Huang Jianqiang, et al. Unbiased scene graph generation from biased training[C] //Proc of the 33rd IEEE/CVF Conf on Computer Vision and Pattern Recognition. Piscataway, NJ: IEEE, 2020: 3716−3725

    [101]

    Tang Kaihua, Huang Jianqiang, Zhang Hanwang. Long-tailed classification by keeping the good and removing the bad momentum causal effect[C] //Proc of the 34th Int Conf on Neural Information Processing Systems. Red Hook, NY: Curran Associates Inc, 2020: 1513−1524

    [102]

    Niu Yulei, Tang Kaihua, Zhang Hanwang, et al. Counterfactual VQA: A cause-effect look at language bias[C] //Proc of the 34th IEEE/CVF Conf on Computer Vision and Pattern Recognition. Piscataway, NJ: IEEE, 2021: 12700−12710

    [103]

    Kuang Kun, Cui Peng, Athey S, et al. Stable prediction across unknown environments[C] //Proc of the 24th ACM SIGKDD Int Conf on Knowledge Discovery & Data Mining. New York: ACM, 2018: 1617−1626

    [104]

    Shen Zheyan, Cui Peng, Kuang Kun, et al. Causally regularized learning with agnostic data selection bias[C] //Proc of the 26th ACM Int Conf on Multimedia. New York: ACM, 2018: 411−419

    [105]

    Kuang Kun, Xiong Ruoxuan, Cui Peng, et al. Stable prediction with model misspecification and agnostic distribution shift[C] //Proc of the 34th AAAI Conf on Artificial Intelligence. Palo Alto, CA: AAAI, 2020: 4485−4492

    [106]

    Shen Zheyan, Cui Peng, Zhang Tong, et al. Stable learning via sample reweighting[C] //Proc of the 34th AAAI Conf on Artificial Intelligence. Palo Alto, CA: AAAI, 2020: 5692−5699

    [107]

    Zhang Xingxuan, Cui Peng, Xu Renzhe, et al. Deep stable learning for out-of-distribution generalization[J]. arXiv preprint, arXiv: 2104.07876, 2021

    [108]

    Peters J, Bühlmann P, Meinshausen N. Causal inference by using invariant prediction: Identification and confidence intervals[J]. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 2016, 78(5): 947−1012

    [109]

    Christina H D, Nicolai M, Jonas P. Invariant causal prediction for nonlinear models[J/OL]. Journal of Causal Inference, 2018, 6(2): 20170016 [2021-11-15]. https://www.degruyter.com/document/doi/10.1515/jci-2017-0016/pdf

    [110]

    Pfister N, Bühlmann P, Peters J. Invariant causal prediction for sequential data[J]. Journal of the American Statistical Association, 2019, 114(527): 1264−1276

    [111]

    Arjovsky M, Bottou L, Gulrajani I, et al. Invariant risk minimization[J]. arXiv preprint, arXiv: 1907.02893, 2019

    [112]

    Zhang A, Lyle C, Sodhani S, et al. Invariant causal prediction for block MDPs[C] //Proc of the 37th Int Conf on Machine Learning. Cambridge, MA: JMLR, 2020: 11214−11224

    [113]

    Creager E, Jacobsen J H, Zemel R. Environment inference for invariant learning[C] //Proc of the 38th Int Conf on Machine Learning. Cambridge, MA: JMLR, 2021: 2189−2200

    [114]

    Kaushik D, Setlur A, Hovy E H, et al. Explaining the efficacy of counterfactually augmented data[C/OL] //Proc of the 9th Int Conf on Learning Representations. 2021 [2021-11-14]. https://openreview.net/pdf?id=HHiiQKWsOcV

    [115]

    Abbasnejad E, Teney D, Parvaneh A, et al. Counterfactual vision and language learning[C] //Proc of the 33rd IEEE/CVF Conf on Computer Vision and Pattern Recognition. Piscataway, NJ: IEEE, 2020: 10044−10054

    [116] Liang Zujie, Jiang Weitao, Hu Haifeng, et al. Learning to contrast the counterfactual samples for robust visual question answering[C] //Proc of the 2020 Conf on Empirical Methods in Natural Language Processing. Stroudsburg, PA: ACL, 2020: 3285−3292
    [117]

    Teney D, Abbasnedjad E, van den Hengel A. Learning what makes a difference from counterfactual examples and gradient supervision[C] //Proc of the 16th European Conf on Computer Vision. Berlin: Springer, 2020: 580−599

    [118]

    Fu T J, Wang X E, Peterson M F, et al. Counterfactual vision-and-language navigation via adversarial path sampler[C] //Proc of the 16th European Conf on Computer Vision. Berlin: Springer, 2020: 71−86

    [119]

    Parvaneh A, Abbasnejad E, Teney D, et al. Counterfactual vision-and-language navigation: Unravelling the unseen[C] //Proc of the 34th Int Conf on Neural Information Processing Systems. Red Hook, NY: Curran Associates Inc, 2020: 5296−5307

    [120]

    Sauer A, Geiger A. Counterfactual generative networks[C/OL] //Proc of the 9th Int Conf on Learning Representations. 2021 [2021-11-14]. https://openreview.net/pdf?id=BXewfAYMmJw

    [121]

    Mao Chengzhi, Cha A, Gupta A, et al. Generative interventions for causal learning[C] //Proc of the 34th IEEE/CVF Conf on Computer Vision and Pattern Recognition. Piscataway, NJ: IEEE, 2021: 3947−3956

    [122] Zeng Xiangji, Li Yunliang, Zhai Yuchen, et al. Counterfactual generator: A weakly-supervised method for named entity recognition[C] //Proc of the 2020 Conf on Empirical Methods in Natural Language Processing. Stroudsburg, PA: ACL, 2020: 7270−7280
    [123] Fu T J, Wang Xin, Grafton S, et al. Iterative language-based image editing via self-supervised counterfactual reasoning[C] //Proc of the 2020 Conf on Empirical Methods in Natural Language Processing. Stroudsburg, PA: ACL, 2020: 4413−4422
    [124]

    Pitis S, Creager E, Garg A. Counterfactual data augmentation using locally factored dynamics[C] //Proc of the 34th Int Conf on Neural Information Processing Systems. Red Hook, NY: Curran Associates Inc, 2020: 3976−3990

    [125]

    Zhang Junzhe, Kumor D, Bareinboim E. Causal imitation learning with unobserved confounders[C] //Proc of the 34th Int Conf on Neural Information Processing Systems. Red Hook, NY: Curran Associates Inc, 2020: 12263−12274

    [126]

    Coston A, Kennedy E, Chouldechova A. Counterfactual predictions under runtime confounding[C] //Proc of the 34th Int Conf on Neural Information Processing Systems. Red Hook, NY: Curran Associates Inc, 2020: 4150−4162

    [127]

    Atzmon Y, Kreuk F, Shalit U, et al. A causal view of compositional zero-shot recognition[C] //Proc of the 34th Int Conf on Neural Information Processing Systems. Red Hook, NY: Curran Associates Inc, 2020: 1462−1473

    [128]

    Yang Zekun, Feng Juan. A causal inference method for reducing gender bias in word embedding relations[C] //Proc of the 34th AAAI Conf on Artificial Intelligence. Palo Alto, CA: AAAI, 2020: 9434−9441

    [129]

    Schölkopf B, Hogg D W, Wang Dun, et al. Modeling confounding by half-sibling regression[J]. Proceedings of the National Academy of Sciences, 2016, 113(27): 7391−7398 doi: 10.1073/pnas.1511656113

    [130] Shin S, Song K, Jang J H, et al. Neutralizing gender bias in word embedding with latent disentanglement and counterfactual generation[C] //Proc of the 2020 Conf on Empirical Methods in Natural Language Processing: Findings. Stroudsburg, PA: ACL, 2020: 3126−3140
    [131]

    Yang Zekun, Liu Tianlin. Causally denoise word embeddings using half-sibling regression[C] //Proc of the 34th AAAI Conf on Artificial Intelligence. Palo Alto, CA: AAAI, 2020: 9426−9433

    [132]

    Yang Xu, Zhang Hanwang, Qi Guojin, et al. Causal attention for vision-language tasks[C] //Proc of the 34th IEEE/CVF Conf on Computer Vision and Pattern Recognition. Piscataway, NJ: IEEE, 2021: 9847−9857

    [133]

    Tople S, Sharma A, Nori A. Alleviating privacy attacks via causal learning[C] //Proc of the 37th Int Conf on Machine Learning. Cambridge, MA: JMLR, 2020: 9537−9547

    [134]

    Zhang Cheng, Zhang Kun, Li Yingzhen. A causal view on robustness of neural networks[C] //Proc of the 34th Int Conf on Neural Information Processing Systems. Red Hook, NY: Curran Associates Inc, 2020: 289−301

    [135]

    Sun Xinwei, Wu Botong, Liu Chang, et al. Latent causal invariant model[J]. arXiv preprint, arXiv: 2011.02203, 2020

    [136]

    Mitrovic J, McWilliams B, Walker J C, et al. Representation learning via invariant causal mechanisms[C/OL] //Proc of the 9th Int Conf on Learning Representations. 2021 [2021-11-14]. https://openreview.net/pdf?id=9p2ekP904Rs

    [137]

    Mahajan D, Tople S, Sharma A. Domain generalization using causal matching[J]. arXiv preprint, arXiv: 2006.07500, 2020

    [138]

    Zhang Weijia, Liu Lin, Li Jiuyong. Robust multi-instance learning with stable instances[C] //Proc of the 24th European Conf on Artificial Intelligence. Ohmsha: IOS, 2020: 1682−1689

    [139]

    Kleinberg J, Mullainathan S, Raghavan M. Inherent trade-offs in the fair determination of risk scores[J]. arXiv preprint, arXiv: 1609.05807, 2016

    [140] Grgic-Hlaca N, Zafar M B, Gummadi K P, et al. The case for process fairness in learning: Feature selection for fair decision making[C/OL] //Proc of Symp on Machine Learning and the Law at the 30th Conf on Neural Information Processing Systems. 2016 [2021-11-17]. http: //www.mlandthelaw.org/papers/grgic.pdf
    [141] Dwork C, Hardt M, Pitassi T, et al. Fairness through awareness[C] //Proc of the 3rd Innovations in Theoretical Computer Science Conf. New York: ACM, 2012: 214−226
    [142] Calders T, Kamiran F, Pechenizkiy M. Building classifiers with independency constraints[C] //Proc of the 9th IEEE Int Conf on Data Mining Workshops. Piscataway, NJ: IEEE, 2009: 13−18
    [143] Hardt M, Price E, Srebro N. Equality of opportunity in supervised learning[C] //Proc of the 30th Int Conf on Neural Information Processing Systems. Red Hook, NY: Curran Associates Inc, 2016: 3315−3323
    [144] Xu Renzhe, Cui Peng, Kuang Kun, et al. Algorithmic decision making with conditional fairness[C] //Proc of the 26th ACM SIGKDD Int Conf on Knowledge Discovery & Data Mining. New York: ACM, 2020: 2125−2135
    [145] Kusner M J, Loftus J, Russell C, et al. Counterfactual fairness[C] //Proc of the 31st Int Conf on Neural Information Processing Systems. Red Hook, NY: Curran Associates Inc, 2017: 4066−4076
    [146] Kilbertus N, Rojas-Carulla M, Parascandolo G, et al. Avoiding discrimination through causal reasoning[C] //Proc of the 31st Int Conf on Neural Information Processing Systems. Red Hook, NY: Curran Associates Inc, 2017: 656−666
    [147] Nabi R, Shpitser I. Fair inference on outcomes[C] //Proc of the 32nd AAAI Conf on Artificial Intelligence. Palo Alto, CA: AAAI, 2018: 1931−1940
    [148] Chiappa S. Path-specific counterfactual fairness[C] //Proc of the 33rd AAAI Conf on Artificial Intelligence. Palo Alto, CA: AAAI, 2019: 7801−7808
    [149] Wu Yongkai, Zhang Lu, Wu Xintao, et al. PC-fairness: A unified framework for measuring causality-based fairness[C] //Proc of the 33rd Int Conf on Neural Information Processing Systems. Red Hook, NY: Curran Associates Inc, 2019: 3404−3414
    [150] Wu Yongkai, Zhang Lu, Wu Xintao. Counterfactual fairness: Unidentification, bound and algorithm[C] //Proc of the 28th Int Joint Conf on Artificial Intelligence. Red Hook, NY: Curran Associates Inc, 2019: 1438−1444

    [151] Huang P S, Zhang Huan, Jiang R, et al. Reducing sentiment bias in language models via counterfactual evaluation[C] //Proc of the 2020 Conf on Empirical Methods in Natural Language Processing: Findings. Stroudsburg, PA: ACL, 2020: 65−83
    [152] Garg S, Perot V, Limtiaco N, et al. Counterfactual fairness in text classification through robustness[C] //Proc of the 2nd AAAI/ACM Conf on AI, Ethics, and Society. New York: ACM, 2019: 219−226
    [153] Hu Yaowei, Wu Yongkai, Zhang Lu, et al. Fair multiple decision making through soft interventions[C] //Proc of the 34th Int Conf on Neural Information Processing Systems. Red Hook, NY: Curran Associates Inc, 2020: 17965−17975
    [154] Goel N, Amayuelas A, Deshpande A, et al. The importance of modeling data missingness in algorithmic fairness: A causal perspective[C] //Proc of the 35th AAAI Conf on Artificial Intelligence. Palo Alto, CA: AAAI, 2021: 7564−7573
    [155] Xu Depeng, Wu Yongkai, Yuan Shuhan, et al. Achieving causal fairness through generative adversarial networks[C] //Proc of the 28th Int Joint Conf on Artificial Intelligence. Red Hook, NY: Curran Associates Inc, 2019: 1452−1458
    [156] Khademi A, Lee S, Foley D, et al. Fairness in algorithmic decision making: An excursion through the lens of causality[C] //Proc of the 28th World Wide Web Conf. New York: ACM, 2019: 2907−2914
    [157] Zhang Junzhe, Bareinboim E. Fairness in decision-making—The causal explanation formula[C] //Proc of the 32nd AAAI Conf on Artificial Intelligence. Palo Alto, CA: AAAI, 2018: 2037−2045
    [158] Zhang Junzhe, Bareinboim E. Equality of opportunity in classification: A causal approach[C] //Proc of the 32nd Int Conf on Neural Information Processing Systems. Red Hook, NY: Curran Associates Inc, 2018: 3671−3681
    [159] Wang Hao, Ustun B, Calmon F. Repairing without retraining: Avoiding disparate impact with counterfactual distributions[C] //Proc of the 36th Int Conf on Machine Learning. Cambridge, MA: JMLR, 2019: 6618−6627
    [160] Creager E, Madras D, Pitassi T, et al. Causal modeling for fairness in dynamical systems[C] //Proc of the 37th Int Conf on Machine Learning. Cambridge, MA: JMLR, 2020: 2185−2195
    [161] Swaminathan A, Joachims T. Batch learning from logged bandit feedback through counterfactual risk minimization[J]. The Journal of Machine Learning Research, 2015, 16(1): 1731−1755
    [162] Swaminathan A, Joachims T. The self-normalized estimator for counterfactual learning[C] //Proc of the 29th Int Conf on Neural Information Processing Systems. Red Hook, NY: Curran Associates Inc, 2015: 3231−3239
    [163] Wu Hang, Wang May. Variance regularized counterfactual risk minimization via variational divergence minimization[C] //Proc of the 35th Int Conf on Machine Learning. Cambridge, MA: JMLR, 2018: 5353−5362
    [164] London B, Sandler T. Bayesian counterfactual risk minimization[C] //Proc of the 36th Int Conf on Machine Learning. Cambridge, MA: JMLR, 2019: 4125−4133
    [165] Faury L, Tanielian U, Dohmatob E, et al. Distributionally robust counterfactual risk minimization[C] //Proc of the 34th AAAI Conf on Artificial Intelligence. Palo Alto, CA: AAAI, 2020: 3850−3857
    [166] Schnabel T, Swaminathan A, Singh A, et al. Recommendations as treatments: Debiasing learning and evaluation[C] //Proc of the 33rd Int Conf on Machine Learning. Cambridge, MA: JMLR, 2016: 1670−1679
    [167] Yang Longqi, Cui Yin, Xuan Yuan, et al. Unbiased offline recommender evaluation for missing-not-at-random implicit feedback[C] //Proc of the 12th ACM Conf on Recommender Systems. New York: ACM, 2018: 279−287
    [168] Bonner S, Vasile F. Causal embeddings for recommendation[C] //Proc of the 12th ACM Conf on Recommender Systems. New York: ACM, 2018: 104−112
    [169] Narita Y, Yasui S, Yata K. Efficient counterfactual learning from bandit feedback[C] //Proc of the 33rd AAAI Conf on Artificial Intelligence. Palo Alto, CA: AAAI, 2019: 4634−4641
    [170] Zou Hao, Cui Peng, Li Bo, et al. Counterfactual prediction for bundle treatment[C] //Proc of the 34th Int Conf on Neural Information Processing Systems. Red Hook, NY: Curran Associates Inc, 2020: 19705−19715
    [171] Xu Da, Ruan Chuanwei, Korpeoglu E, et al. Adversarial counterfactual learning and evaluation for recommender system[C] //Proc of the 34th Int Conf on Neural Information Processing Systems. Red Hook, NY: Curran Associates Inc, 2020: 13515−13526
    [172] Lopez R, Li Chenchen, Yan Xiang, et al. Cost-effective incentive allocation via structured counterfactual inference[C] //Proc of the 34th AAAI Conf on Artificial Intelligence. Palo Alto, CA: AAAI, 2020: 4997−5004
    [173] Joachims T, Swaminathan A, Schnabel T. Unbiased learning-to-rank with biased feedback[C] //Proc of the 10th ACM Int Conf on Web Search and Data Mining. New York: ACM, 2017: 781−789
    [174] Wang Xuanhui, Golbandi N, Bendersky M, et al. Position bias estimation for unbiased learning to rank in personal search[C] //Proc of the 11th ACM Int Conf on Web Search and Data Mining. New York: ACM, 2018: 610−618
    [175] Ai Qingyao, Bi Keping, Luo Cheng, et al. Unbiased learning to rank with unbiased propensity estimation[C] //Proc of the 41st Int ACM SIGIR Conf on Research and Development in Information Retrieval. New York: ACM, 2018: 385−394
    [176] Agarwal A, Takatsu K, Zaitsev I, et al. A general framework for counterfactual learning-to-rank[C] //Proc of the 42nd Int ACM SIGIR Conf on Research and Development in Information Retrieval. New York: ACM, 2019: 5−14
    [177] Jagerman R, de Rijke M. Accelerated convergence for counterfactual learning to rank[C] //Proc of the 43rd Int ACM SIGIR Conf on Research and Development in Information Retrieval. New York: ACM, 2020: 469−478
    [178] Vardasbi A, de Rijke M, Markov I. Cascade model-based propensity estimation for counterfactual learning to rank[C] //Proc of the 43rd Int ACM SIGIR Conf on Research and Development in Information Retrieval. New York: ACM, 2020: 2089−2092
    [179] Jagerman R, Oosterhuis H, de Rijke M. To model or to intervene: A comparison of counterfactual and online learning to rank from user interactions[C] //Proc of the 42nd Int ACM SIGIR Conf on Research and Development in Information Retrieval. New York: ACM, 2019: 15−24
    [180] Bottou L, Peters J, Quiñonero-Candela J, et al. Counterfactual reasoning and learning systems: The example of computational advertising[J]. The Journal of Machine Learning Research, 2013, 14(1): 3207−3260
    [181] Lawrence C, Riezler S. Improving a neural semantic parser by counterfactual learning from human bandit feedback[C] //Proc of the 56th Annual Meeting of the ACL, Vol 1. Stroudsburg, PA: ACL, 2018: 1820−1830
    [182] Bareinboim E, Forney A, Pearl J. Bandits with unobserved confounders: A causal approach[C] //Proc of the 29th Int Conf on Neural Information Processing Systems. Red Hook, NY: Curran Associates Inc, 2015: 1342−1350
    [183] Lee S, Bareinboim E. Structural causal bandits: Where to intervene?[C] //Proc of the 32nd Int Conf on Neural Information Processing Systems. Red Hook, NY: Curran Associates Inc, 2018: 2568−2578
    [184] Lee S, Bareinboim E. Structural causal bandits with non-manipulable variables[C] //Proc of the 33rd AAAI Conf on Artificial Intelligence. Palo Alto, CA: AAAI, 2019: 4164−4172
    [185] de Haan P, Jayaraman D, Levine S. Causal confusion in imitation learning[C] //Proc of the 33rd Int Conf on Neural Information Processing Systems. Red Hook, NY: Curran Associates Inc, 2019: 11698−11709
    [186] Kyono T, Zhang Yao, van der Schaar M. CASTLE: Regularization via auxiliary causal graph discovery[C] //Proc of the 34th Int Conf on Neural Information Processing Systems. Red Hook, NY: Curran Associates Inc, 2020: 1501−1512
    [187] Yang Mengyue, Liu Furui, Chen Zhitang, et al. CausalVAE: Disentangled representation learning via neural structural causal models[C] //Proc of the 34th IEEE/CVF Conf on Computer Vision and Pattern Recognition. Piscataway, NJ: IEEE, 2021: 9593−9602
    [188] Zinkevich M, Johanson M, Bowling M, et al. Regret minimization in games with incomplete information[C] //Proc of the 21st Int Conf on Neural Information Processing Systems. Red Hook, NY: Curran Associates Inc, 2007: 1729−1736
    [189] Brown N, Lerer A, Gross S, et al. Deep counterfactual regret minimization[C] //Proc of the 36th Int Conf on Machine Learning. Cambridge, MA: JMLR, 2019: 793−802
    [190] Farina G, Kroer C, Brown N, et al. Stable-predictive optimistic counterfactual regret minimization[C] //Proc of the 36th Int Conf on Machine Learning. Cambridge, MA: JMLR, 2019: 1853−1862
    [191] Brown N, Sandholm T. Solving imperfect-information games via discounted regret minimization[C] //Proc of the 33rd AAAI Conf on Artificial Intelligence. Palo Alto, CA: AAAI, 2019: 1829−1836
    [192] Li Hui, Hu Kailiang, Zhang Shaohua, et al. Double neural counterfactual regret minimization[C/OL] //Proc of the 8th Int Conf on Learning Representations. 2020 [2021-11-14]. https://openreview.net/pdf?id=ByedzkrKvH
    [193] Oberst M, Sontag D. Counterfactual off-policy evaluation with Gumbel-max structural causal models[C] //Proc of the 36th Int Conf on Machine Learning. Cambridge, MA: JMLR, 2019: 4881−4890
    [194] Buesing L, Weber T, Zwols Y, et al. Woulda, coulda, shoulda: Counterfactually-guided policy search[C/OL] //Proc of the 7th Int Conf on Learning Representations. 2019 [2021-11-14]. https://openreview.net/pdf?id=BJG0voC9YQ

    [195] Chen Long, Zhang Hanwang, Xiao Jun, et al. Counterfactual critic multi-agent training for scene graph generation[C] //Proc of the 2019 IEEE/CVF Int Conf on Computer Vision. Piscataway, NJ: IEEE, 2019: 4613−4623
    [196] Zhu Qingfu, Zhang Weinan, Liu Ting, et al. Counterfactual off-policy training for neural dialogue generation[C] //Proc of the 2020 Conf on Empirical Methods in Natural Language Processing. Stroudsburg, PA: ACL, 2020: 3438−3448
    [197] Choi S, Park H, Yeo J, et al. Less is more: Attention supervision with counterfactuals for text classification[C] //Proc of the 2020 Conf on Empirical Methods in Natural Language Processing. Stroudsburg, PA: ACL, 2020: 6695−6704
    [198] Zhang Zhu, Zhao Zhou, Lin Zhejie, et al. Counterfactual contrastive learning for weakly-supervised vision-language grounding[C] //Proc of the 34th Int Conf on Neural Information Processing Systems. Red Hook, NY: Curran Associates Inc, 2020: 655−666
    [199] Kocaoglu M, Snyder C, Dimakis A G, et al. CausalGAN: Learning causal implicit generative models with adversarial training[C/OL] //Proc of the 6th Int Conf on Learning Representations. 2018 [2021-11-03]. https://openreview.net/pdf?id=BJE-4xW0W
    [200] Kim H, Shin S, Jang J H, et al. Counterfactual fairness with disentangled causal effect variational autoencoder[C] //Proc of the 35th AAAI Conf on Artificial Intelligence. Palo Alto, CA: AAAI, 2021: 8128−8136

    [201] Qin Lianhui, Bosselut A, Holtzman A, et al. Counterfactual story reasoning and generation[C] //Proc of the 2019 Conf on Empirical Methods in Natural Language Processing and the 9th Int Joint Conf on Natural Language Processing (EMNLP-IJCNLP). Stroudsburg, PA: ACL, 2019: 5046−5056
    [202] Hao Changying, Pang Liang, Lan Yanyan, et al. Sketch and customize: A counterfactual story generator[C] //Proc of the 35th AAAI Conf on Artificial Intelligence. Palo Alto, CA: AAAI, 2021: 12955−12962
    [203] Madaan N, Padhi I, Panwar N, et al. Generate your counterfactuals: Towards controlled counterfactual generation for text[C] //Proc of the 35th AAAI Conf on Artificial Intelligence. Palo Alto, CA: AAAI, 2021: 13516−13524
    [204] Peysakhovich A, Kroer C, Lerer A. Robust multi-agent counterfactual prediction[C] //Proc of the 33rd Int Conf on Neural Information Processing Systems. Red Hook, NY: Curran Associates Inc, 2019: 3083−3093
    [205] Baradel F, Neverova N, Mille J, et al. CoPhy: Counterfactual learning of physical dynamics[C/OL] //Proc of the 8th Int Conf on Learning Representations. 2020 [2021-11-14]. https://openreview.net/pdf?id=SkeyppEFvS

Publication history
  • Received: 2021-07-22
  • Revised: 2021-11-14
  • Published online: 2023-02-10
  • Issue date: 2022-12-31
