
Overview of the Frontier Progress of Causal Machine Learning

Li Jianing, Xiong Ruibin, Lan Yanyan, Pang Liang, Guo Jiafeng, Cheng Xueqi

Li Jianing, Xiong Ruibin, Lan Yanyan, Pang Liang, Guo Jiafeng, Cheng Xueqi. Overview of the Frontier Progress of Causal Machine Learning[J]. Journal of Computer Research and Development, 2023, 60(1): 59-84. DOI: 10.7544/issn1000-1239.202110780
Li Jianing, Xiong Ruibin, Lan Yanyan, Pang Liang, Guo Jiafeng, Cheng Xueqi. Overview of the Frontier Progress of Causal Machine Learning[J]. Journal of Computer Research and Development, 2023, 60(1): 59-84. CSTR: 32373.14.issn1000-1239.202110780


    About the authors:

    Li Jianing: born in 1992. PhD. His main research interests include machine learning and text generation.

    Xiong Ruibin: born in 1996. Master. His main research interests include machine learning and natural language processing.

    Lan Yanyan: born in 1982. PhD, professor. Senior member of CCF. Her main research interests include machine learning, information retrieval, and natural language processing.

    Pang Liang: born in 1990. PhD, associate professor. Member of CCF. His main research interests include natural language generation and information retrieval.

    Guo Jiafeng: born in 1980. PhD, professor. Member of CCF. His main research interests include data mining and information retrieval.

    Cheng Xueqi: born in 1971. PhD, professor. CCF fellow. His main research interests include network science and social computing, Internet search and mining, Internet information security, distributed systems, and large-scale simulation platforms.

    Corresponding author: Lan Yanyan (lanyanyan@tsinghua.edu.cn)

  • CLC number: TP181

Overview of the Frontier Progress of Causal Machine Learning

Funds: This work was supported by the National Natural Science Foundation of China (61722211, 61773362, 61906180), the Youth Innovation Promotion Association CAS(20144310), the Lenovo-CAS Joint Lab Youth Scientist Project, and the Project of Chongqing Research Program of Basic Research and Frontier Technology (cstc2017jcyjBX0059).
  • Abstract:

    Machine learning is one of the key technical means of realizing artificial intelligence, with important applications in computer vision, natural language processing, search engines, and recommendation systems. Existing machine learning methods often focus on the correlations in data while ignoring causality, and as application demands grow their drawbacks have begun to show, raising a series of urgent problems in interpretability, transferability, robustness, and fairness. To solve these problems, researchers have begun to re-examine the necessity of modeling causal relationships, and related methods have become one of the recent research hotspots. We organize and summarize recent work that applies causal techniques and ideas to practical problems in machine learning, and trace the development of this emerging research direction. First, we briefly introduce the causal theory most closely related to machine learning. Then, we categorize the work according to the different problem settings in machine learning, explaining the differences and connections among approaches from the perspective of solution ideas and technical means. Finally, we summarize the current state of causal machine learning and offer predictions and prospects for future development trends.

  • Embedded real-time systems are ubiquitous in production and daily life, for example in aerospace, rail transit, autonomous driving, and interconnected communication. The correctness of a real-time system depends not only on the logical correctness of its results but also on the time at which those results are produced [1]. Task execution times in real-time systems are strictly constrained, and missing a task deadline can have catastrophic consequences. Therefore, both task scheduling design and schedulability analysis must guarantee that task execution stays within a safe upper time bound; computing this bound is the task's worst-case execution time (WCET) analysis. Mainstream WCET analysis methods fall into static analysis and dynamic analysis. Dynamic analysis, also called measurement-based analysis, obtains precise WCET results from a large number of test cases; since all test cases cannot be exhausted, it cannot guarantee the safety of its results, so industry typically adds a margin (e.g., 20%) [2] on top of the measured result as the final WCET. Static analysis estimates a program's WCET by analyzing the processor's hardware structure and program-flow information; it guarantees safe results and does not require running the program, so most WCET analysis tools adopt it. After many years of research, single-core WCET analysis has reached high precision. However, the acceleration components and shared-resource interference in multi-core processors mean that single-core WCET analysis methods cannot be applied directly, which poses new challenges for multi-core WCET analysis.

    On the hardware side, multi-core WCET analysis mainly considers the influence of caches, memory, inter-core interconnect, and pipelines [3]. Experiments by Rapita Systems show that contention for the L3 cache can reduce the frame rate of the YOLO algorithm by about 90%. Current cache research focuses on interference from instruction caches [4-7] and has achieved high precision, while data caches have received insufficient attention, for two reasons: 1) differences in processor addressing modes make the analysis of data-cache access addresses harder; 2) a data cache accesses different data in different loop iterations. Precisely because of these spatio-temporal differences between data and instruction caches, data-cache analysis is more complex, and naively transplanting instruction-cache analysis techniques to the data cache yields overly pessimistic results [8]. Beyond these issues, another factor deserving particular attention in data-cache analysis is the cache coherence protocol. One study shows that, due to the influence of the cache coherence protocol, a parallel program can run 74% slower than its serial counterpart [9]. Yet surveys [10] show that very few researchers have addressed the interference caused by cache coherence.

    To address the insufficient treatment of data-cache coherence protocols in existing multi-core WCET analysis methods, this paper studies a multi-core WCET analysis method based on multi-level coherence protocols. The method builds multi-level coherence domains to abstract the shared-data management mechanism of nested coherence protocols, and analyzes memory access latency at two levels, within a coherence domain and across coherence domains, thereby enabling precise analysis of data caches in multi-core processors.

    Ferdinand et al. [11] proposed a cache behavior analysis method based on abstract interpretation theory [12]; thanks to its high precision and efficiency, it has dominated cache-based WCET analysis [8]. As shown in Fig. 1, the method abstracts the cache's physical states and replacement policy into abstract cache states and the functions Join and Update, and derives a cache hit/miss classification (CHMC) through three analyses, May, Must, and Persistence, thereby determining cache access latency. Huynh et al. [13] proved that the original Persistence analysis is unsafe: its update function fails to increase the ages of memory blocks that may be evicted, so the analysis results can be incorrect. To solve this problem, they proposed the notion of a Younger Set and introduced it into the Join and Update functions, restoring the safety of Persistence analysis. Although Persistence analysis has been studied extensively over the past 20-odd years, it still could not be guaranteed to find all persistent cache blocks until Stock [14] presented the Exact Persistence analysis, which brought it close to completion.

    Figure  1.  Cache analysis based on abstract interpretation
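The Must analysis mentioned above can be illustrated with a small sketch. This is a hypothetical simplification (a single fully associative cache set with age upper bounds; `must_update` and `must_join` are made-up names), not the exact formulation of Ferdinand et al.:

```python
# Minimal sketch of abstract-interpretation Must analysis for one
# fully associative cache set of `ways` lines (hypothetical simplification).
# An abstract cache state maps a memory block to an upper bound on its age;
# a block is a guaranteed hit (Must) iff its age bound is < ways.

def must_update(state, block, ways):
    """Update on an access to `block`: bounds smaller than the accessed
    block's old bound increase by one; the accessed block gets bound 0."""
    old = state.get(block, ways)          # unknown blocks: age >= ways
    new_state = {}
    for b, age in state.items():
        if b == block:
            continue
        new_age = age + 1 if age < old else age
        if new_age < ways:                # bounds at/over `ways` are dropped
            new_state[b] = new_age
    new_state[block] = 0
    return new_state

def must_join(s1, s2):
    """Join at control-flow merges: keep blocks present in both states,
    with the maximal (most pessimistic) age bound."""
    return {b: max(s1[b], s2[b]) for b in s1.keys() & s2.keys()}

state = {}
for b in ["a", "b", "a"]:
    state = must_update(state, b, ways=2)
# after a, b, a: 'a' has bound 0, 'b' has bound 1 -> both guaranteed hits
```

A block whose age bound stays below the associativity is a guaranteed hit, which is what lets the analysis classify that access as "always hit" in the CHMC.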

    Model checking [15] is another widely studied approach to cache behavior analysis. As a common timing-verification technique in real-time systems, it converts various timing-analysis problems into reachability problems on timed automata. Wilhelm [16] discussed the advantages and disadvantages of model checking for WCET analysis, and Metzner [17] argued that model checking can improve the precision of analysis results. Dalsgaard et al. [18] implemented a modular task execution-time analysis framework based on model checking; Gustavsson et al. [19] showed that UPPAAL can be used for WCET analysis of parallel tasks on multi-core processors; Cassez et al. [20] converted WCET analysis into a longest-path problem and, by extending the model checker UPPAAL, implemented the task WCET analysis tool WUPPAAL; Lü et al. [21] applied model checking to cache behavior analysis on top of the open-source tool Chronos and implemented McAiT, a multi-core WCET analysis tool; Chen Fangyuan et al. [22] improved cache conflict analysis in multi-core processors by considering the timing relations between fetch operations in the fetch stage, a method that excludes some non-interfering states and improves WCET precision by up to 12%. Most of this cache-behavior work targets instruction caches; data caches have been studied far less, and discussion of cache coherence protocols in multi-core processors is rarer still. Not until 2015 did Uhrig et al. [23] study the influence of data caches on multi-core WCET analysis, examining snooping-based coherence protocols and a TDMA bus, and conclude that a write-invalidate bus-snooping coherence protocol is not time-predictable, whereas a write-update, dual-channel, direct-mapped bus-snarfing coherence protocol combined with a TDMA bus can achieve time predictability; however, ref. [23] did not analyze the memory access latency caused by the coherence protocol in detail. Hassan et al. [24] likewise concluded that the WCET of a multi-core processor with a bus-snooping MSI coherence protocol and a TDMA bus is not predictable; their analysis focused on the unpredictability caused by the TDMA bus and also gave no detailed analysis method for WCET estimation under the MESI protocol. Tsoupidi [25] proposed a coherence-aware WCET analysis method for symmetric multiprocessors, but when computing shared-cache read latency it assumes the maximum latency applies, which only occurs under a write-back policy; under write-through the data resides in both the private and the shared cache, so this computation is quite pessimistic. Moreover, that work only discusses WCET analysis under a single-level coherence protocol; for the case of multi-level cache coherence protocols in multi-core processors, no related research has yet appeared.

    To this end, targeting accesses to shared data in the data cache, this paper proposes a multi-core WCET analysis method based on multi-level coherence protocols. It solves the WCET analysis problem under nested multi-level coherence protocols, completes the multi-core cache analysis framework, and implements a high-precision multi-core WCET analysis tool based on the MESI (modified, exclusive, shared, invalid) protocol, providing support for WCET analysis of multi-core real-time systems.

    The core of this section is a WCET analysis method for multi-level coherence domains, enabling WCET analysis of multi-level coherence domains in multi-core processors. By analyzing the cache access latencies and state transitions inside a coherence domain, we obtain the intra-domain analysis result; whether cross-domain analysis is needed next is decided by whether a cross-domain data access occurs, which yields the shared-data access latency analysis under nested coherence protocols. We combine this method with current analysis methods that mainly consider the instruction cache, add the analysis of shared-data accesses in the data cache, extend the cache interference analysis framework of multi-core processors, and further complete multi-core WCET analysis.

    Shared memory is the most widely used multi-core processor architecture. In this structure every core is a peer and every core's access time is the same, so such a multi-core processor is a symmetric multiprocessor (SMP). The earliest multi-core processors had only private caches and a shared cache; with advances in CPU manufacturing processes, current multi-core processors integrate multiple levels of cache.

    Multiprocessors with an SMP architecture need caches that support shared data, and inter-core communication is realized by reading and writing shared data. When shared data is cached, multiple cores may contend for it, and copies of the shared data may be replicated in multiple caches, making the copies inconsistent. Shared-data caching therefore introduces a new problem, cache coherence, and a coherence protocol must be introduced to guarantee data consistency.

    MESI is a typical cache coherence protocol, also known as the Illinois MESI protocol; it adds an E state, denoting exclusive ownership, to the original MSI protocol. When writing a shared data block there are two common write policies: write-through and write-back. For the multi-level cache organization shown in Fig. 2, the outermost level (between clusters) usually adopts write-back, since it demands less memory bandwidth and can support more, and faster, processors [3]. For writes to shared data within a cluster, a write-back policy would let the data in the intra-cluster L1 caches diverge from the data in the L2 cache; maintaining intra-cluster coherence while also considering inter-cluster coherence would make the protocol very complex and unfavorable for hardware implementation, so write-through is used within a cluster.

    Figure  2.  Multi-level Cache arrangement

    Write-through and write-back apply when an access hits; when an access misses, the write policies are write-allocate and no-write-allocate. Write-back is usually paired with write-allocate, and write-through with no-write-allocate.
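As a rough illustration of the two pairings (a hypothetical sketch in which a dict stands in for the L1 contents; the `write` function and its return convention are made up):

```python
# Hypothetical sketch: which cache levels receive a write under the two
# common pairings, write-back+write-allocate vs. write-through+no-allocate.

def write(policy, hit, l1, block):
    """Apply one write to `block`; `l1` models the L1 contents.
    Returns the set of levels that receive the new value immediately."""
    if policy == "write-back+allocate":
        if not hit:
            l1[block] = "clean"       # allocate the line on a miss
        l1[block] = "dirty"           # value stays in L1 until eviction
        return {"L1"}
    elif policy == "write-through+no-allocate":
        if hit:
            l1[block] = "clean"       # L1 updated and written through to L2
            return {"L1", "L2"}
        return {"L2"}                 # miss: bypass L1 entirely
    raise ValueError(policy)

l1 = {}
assert write("write-back+allocate", False, l1, "x") == {"L1"}
assert l1["x"] == "dirty"             # the dirty copy lives only in L1
assert write("write-through+no-allocate", False, {}, "x") == {"L2"}
```

The intra-cluster write-through choice described above corresponds to the second branch: every write hit is immediately visible at L2, so the cluster's L1 caches never hold a dirty copy.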

    In multi-core task scheduling, several cores may jointly execute one task, or tasks of high criticality may be mapped onto specific cores, so data exchange between these tasks happens only among particular cores. If a single coherence protocol maintained the data copies of all cores, it would generate a large amount of data traffic, congesting the network and lowering execution efficiency. A feasible approach is to partition all the cores into regions [26], each region using an independent coherence protocol to manage its data copies; this both exploits the principle of locality and reduces network load.

    Definition 1. Coherence domain. The set of cores whose data copies are managed by the same coherence protocol is called a coherence domain.

    A coherence domain is illustrated in Fig. 3.

    Figure  3.  Coherence domain

    A coherence domain has two properties:

    Property 1. Independence: any two coherence domains are disjoint.

    Property 2. Uniqueness: within a coherence domain, the coherence protocol is unique.
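Properties 1 and 2 together say that the coherence domains partition the cores, each domain carrying exactly one protocol. A minimal sketch of that check, with made-up core IDs and protocol names:

```python
# Hypothetical check that a set of coherence domains satisfies
# Property 1 (pairwise disjoint) and Property 2 (one protocol per domain).

def valid_domains(domains):
    """`domains` is a list of (protocol_name, set_of_core_ids) pairs."""
    seen = set()
    for protocol, cores in domains:
        if not isinstance(protocol, str) or not cores:
            return False              # Property 2: exactly one named protocol
        if seen & cores:
            return False              # Property 1: domains must be disjoint
        seen |= cores
    return True

# The layout of Fig. 2: two 2-core clusters, each a level-1 MESI domain.
level1 = [("MESI", {0, 1}), ("MESI", {2, 3})]
assert valid_domains(level1)
assert not valid_domains([("MESI", {0, 1}), ("MESI", {1, 2})])  # overlap
```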

    Since a coherence protocol can only maintain the consistency of data inside its domain, cross-domain data interaction cannot guarantee inter-domain consistency; multi-level coherence protocols are needed to maintain the data, and accordingly multi-level coherence domains must be defined. Multi-level coherence domains adopt hierarchical partitioning: a lower-level coherence domain is one node (clump) of the upper-level coherence domain.

    For the hardware architecture shown in Fig. 2, by the definition of a coherence domain, each cluster is a level-1 coherence domain and the whole multi-core processor forms a level-2 coherence domain. When a core accesses data it first accesses its private cache; on a miss it accesses the next-level shared cache, i.e., the intra-cluster shared cache, and the data access is still within the coherence domain. On a hit, only intra-domain WCET analysis is needed; on a miss, following the access order of the memory hierarchy, cross-domain WCET analysis is required.

    When analyzing the cache behavior of the multi-core processor, we assume the bus is perfect and there are no limitations from read and write queues, and on this basis analyze the influence of the cache coherence protocol on memory access time.

    First assume the processor has private caches, a shared cache, and memory, and that the cache uses the write-back and write-allocate policies. Cache accesses are reads and writes, each of which may hit or miss; we analyze cache behavior from these two aspects. Before starting the analysis we define the following notation:

    1) \psi_{\text{R}}(m,m') / \psi_{\text{W}}(m,m') : intra-domain cache read/write access latency and state transition; m denotes the accessed data and m' a copy of the accessed data.

    2) H_{\text{L}i} : hit latency of a level-i cache access.

    3) L_{\text{L}i \to \text{L}j} : latency of loading data from the level-i cache into the level-j cache.

    4) H_{\text{memory}} : main-memory access latency.

    5) Inv / Inv' : latency of invalidating the other data copies intra-domain / cross-domain.

    6) \psi'_{\text{R}}(m,m') / \psi'_{\text{W}}(m,m') : cross-domain cache read/write access latency and state transition.

    7) m_{\text{s}} / m'_{\text{s}} : local / remote cache state.

    8) \text{Replacement} / \neg\text{Replacement} : a replacement miss did / did not occur, i.e., the accessed shared data was / was not evicted from the cache.

    Existing cache analysis work concentrates on instruction caches, and data-cache analysis is limited to local data. On top of the multi-level cache analysis framework based on abstract interpretation, this paper extends the framework with the analysis of shared data.

    As shown in Fig. 4, the inputs of the analysis framework are the program under analysis and a cache model. The program provides control-flow and address information; the cache model includes the cache organization, replacement policy, read/write policies, and coherence protocol. These parameters serve as the input to the cache analysis method. The cache analysis is divided into two parts, the data cache and the instruction cache; the analysis of shared data in the data cache is the main work of this paper and is detailed below. Finally, the classification of memory accesses and their access latencies are derived from the analysis results, yielding the WCET analysis result.

    Figure  4.  WCET analysis framework based on coherence protocol

    When a core issues a read request for data m: if m is in L1, its state may be E or S, and if no replacement miss has occurred the read hits in L1, the access latency is H_{\text{L}1} , and no cache-block states change.

    If m is in the domain with state E or S but a replacement miss occurs, the access latency is the time \psi'_{\text{R}} to read the data from L2; the data must also be loaded from the L2 cache into the local L1 cache, so, considering the coherence requirement, the access latency is \max(\psi'_{\text{R}}, L_{\text{L}2 \to \text{L}1}) and the cache states do not change, as shown in Figs. 5(a)(b).

    Figure  5.  Read-visit latency and state transition diagram of intra-domain

    If m is in the domain, m's state is I, and the copy m' has state E or S, the access latency is the time \psi'_{\text{R}} to read the data from L2, and the data must also be loaded from the L2 cache into the local L1 cache; considering the coherence requirement, the access latency is \max(\psi'_{\text{R}}, L_{\text{L}2 \to \text{L}1}) , the state of m becomes S, and the state of the copy m' becomes S, as shown in Fig. 5(c).

    If m is not in the domain, the access misses in the L2 cache; let the time to access the L2 cache be \psi'_{\text{R}} , whose value is analyzed in Sect. 2.7.1. The state of m changes from I to E, and the state in the L2 cache must be analyzed further according to L2's coherence protocol, as shown in Fig. 5(d).

    From the above analysis, the state-update function for intra-domain read access latency and state transitions is:

    \psi_{\text{R}}(m,m')=\begin{cases} H_{\text{L}1},\; m_{\text{s}}\mapsto m_{\text{s}},\; m'_{\text{s}}\mapsto m'_{\text{s}}; & m\in\text{L}1 \wedge \neg\text{Replacement}.\\ \max(\psi'_{\text{R}},L_{\text{L}2\to\text{L}1}),\; m_{\text{s}}\mapsto m_{\text{s}},\; m'_{\text{s}}\mapsto m'_{\text{s}}; & m\in\text{L}1 \wedge \text{Replacement}.\\ \max(\psi'_{\text{R}},L_{\text{L}2\to\text{L}1}),\; m_{\text{s}}:\text{I}\to\text{S},\; m'_{\text{s}}:\text{E}/\text{S}\to\text{S}; & m\notin\text{L}1 \wedge m\in\text{L}2 \wedge m'\in\text{L}1.\\ \psi'_{\text{R}},\; m_{\text{s}}:\text{I}\to\text{E}; & m\ \text{not in the domain}. \end{cases} \quad (1)
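Equation (1) can be transcribed case by case into code. The sketch below is hypothetical: the cycle counts are placeholder values, `PSI_R_CROSS` stands in for the cross-domain latency ψ′_R analyzed later, and `None` marks a copy that is absent or a state left unchanged:

```python
# Hypothetical transcription of the intra-domain read update function (1).
# Latencies are placeholder cycle counts; PSI_R_CROSS stands in for psi'_R.
H_L1, L_L2_TO_L1, PSI_R_CROSS = 4, 10, 14

def psi_r(m_in_l1, m_state, replacement, m_in_l2, copy_in_l1):
    """Return (latency, new state of m, new state of the copy m')."""
    if m_in_l1 and not replacement:                  # L1 hit, Fig. 5(a)(b)
        return H_L1, m_state, None
    if m_in_l1 and replacement:                      # replaced, reload from L2
        return max(PSI_R_CROSS, L_L2_TO_L1), m_state, None
    if m_in_l2 and copy_in_l1 and m_state == "I":    # copy E/S in the domain
        return max(PSI_R_CROSS, L_L2_TO_L1), "S", "S"
    return PSI_R_CROSS, "E", None                    # m not in the domain

assert psi_r(True, "E", False, True, False) == (4, "E", None)
assert psi_r(False, "I", False, True, True) == (14, "S", "S")
```

The four branches mirror the four cases of eq. (1), from the L1 hit down to the out-of-domain miss that defers to ψ′_R.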

    When a core issues a write request for m: if m exists only in the local L1 cache with state E, a write hit occurs. The data must first be written into L1 and, because of the write-through policy, simultaneously into L2, so the access time is H_{\text{L}1} + \psi'_{\text{W}} and the cache states do not change, as shown in Fig. 6(a). If m is in the L1 cache with state S, a write hit occurs with access latency H_{\text{L}1} + \psi'_{\text{W}} and the state changes to E; since a write-invalidate coherence protocol is used, the copies' states must be changed to I, so, considering the coherence requirement, the final access latency is \max(H_{\text{L}1} + \psi'_{\text{W}}, Inv) , as shown in Fig. 6(b).

    Figure  6.  Write-visit latency and state transition diagram of intra-domain

    If m is in the local L1 cache with state E or S and a replacement miss occurs, a write miss results. If m's state is E, the access time is \psi'_{\text{W}} and the state changes to I, as shown in Fig. 6(c); if m's state is S, the other copies must be invalidated after the L2 write, so the access time is \max(\psi'_{\text{W}}, Inv) and the states of m and of the copy m' both change to I, as shown in Fig. 6(d).

    If m is in the domain, m's state is I, and the copy m' has state E or S, L2 is written and the other copies are invalidated; considering the coherence requirement, the write latency is \max(\psi'_{\text{W}}, Inv) . Because the no-write-allocate policy is used, the local L1 state does not change, while the other copies go from E/S to I, as shown in Fig. 6(e).

    If m is not in the domain, then, analogously to the read case, let the L2 write access time be \psi'_{\text{W}} . Since the domain uses no-write-allocate, the state of m does not change at all, as shown in Fig. 6(f).

    From the above analysis, the state-update function for intra-domain write access latency and state transitions is:

    \psi_{\text{W}}(m,m')=\begin{cases} H_{\text{L}1}+\psi'_{\text{W}},\; m_{\text{s}}\mapsto m_{\text{s}},\; m'_{\text{s}}\mapsto m'_{\text{s}}; & m\in\text{L}1 \wedge m'\notin\text{L}1 \wedge \neg\text{Replacement}.\\ \max(H_{\text{L}1}+\psi'_{\text{W}},Inv),\; m_{\text{s}}:\text{S}\to\text{E},\; m'_{\text{s}}:\text{S}\to\text{I}; & m\in\text{L}1 \wedge m'\in\text{L}1 \wedge \neg\text{Replacement}.\\ \psi'_{\text{W}},\; m_{\text{s}}:\text{E}\to\text{I}; & m\in\text{L}1 \wedge m'\notin\text{L}1 \wedge \text{Replacement}.\\ \max(\psi'_{\text{W}},Inv),\; m_{\text{s}}:\text{S}\to\text{I},\; m'_{\text{s}}:\text{S}\to\text{I}; & m\in\text{L}1 \wedge m'\in\text{L}1 \wedge \text{Replacement}.\\ \max(\psi'_{\text{W}},Inv),\; m_{\text{s}}\mapsto m_{\text{s}},\; m'_{\text{s}}:\text{E}/\text{S}\to\text{I}; & m_{\text{s}}=\text{I} \wedge m'\in\text{L}1.\\ \psi'_{\text{W}},\; m_{\text{s}}\mapsto m_{\text{s}}; & m\ \text{not in the domain}. \end{cases} \quad (2)
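The write-side update function (2) admits the same kind of transcription (again a hypothetical sketch with placeholder cycle counts; `PSI_W_CROSS` and `INV` stand in for ψ′_W and Inv):

```python
# Hypothetical transcription of the intra-domain write update function (2).
# Placeholder cycle counts; PSI_W_CROSS stands in for psi'_W, INV for Inv.
H_L1, PSI_W_CROSS, INV = 4, 14, 8

def psi_w(m_in_l1, m_state, copy_in_l1, replacement):
    """Return (latency, new state of m, new state of the copy m')."""
    if m_in_l1 and m_state == "E" and not replacement:   # Fig. 6(a)
        return H_L1 + PSI_W_CROSS, "E", None
    if m_in_l1 and m_state == "S" and not replacement:   # Fig. 6(b)
        return max(H_L1 + PSI_W_CROSS, INV), "E", "I"
    if m_in_l1 and m_state == "E" and replacement:       # Fig. 6(c)
        return PSI_W_CROSS, "I", None
    if m_in_l1 and m_state == "S" and replacement:       # Fig. 6(d)
        return max(PSI_W_CROSS, INV), "I", "I"
    if m_state == "I" and copy_in_l1:                    # Fig. 6(e)
        return max(PSI_W_CROSS, INV), "I", "I"
    return PSI_W_CROSS, m_state, None                    # m not in the domain

assert psi_w(True, "E", False, False) == (18, "E", None)
assert psi_w(True, "S", True, False) == (18, "E", "I")
```

Note how the write-through term H_{L1} + ψ′_W appears only in the hit branches, while the invalidation cost Inv enters only when a shared copy must be invalidated.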

    When a core's read request for data m requires a read access to the L2 cache, the shared data is managed by the upper-level coherence protocol.

    If m is in the domain and resides in the local L2 cache, corresponding to the intra-domain read cases of Figs. 5(a)–(c), the access time is H_{\text{L}2} and no cache states change between domains, as shown in Fig. 7(a).

    Figure  7.  Read-visit latency and state transition diagram of cross-domain

    If the accessed data m is in the L2 cache but a replacement miss occurs, corresponding to the intra-domain read case of Fig. 5(d), the access latency is H_{\text{L}3} ; the data must also be loaded into the L1 cache, so, considering the coherence requirement, the access latency is \max(H_{\text{L}3}, L_{\text{L}3 \to \text{L}1}) . If m's state is M or E it changes to E; if it is S it does not change; the state of the copy m' does not change, as shown in Figs. 7(b)(c).

    If the accessed data m is not in the domain, corresponding to the intra-domain read case of Fig. 5(d), there are two possibilities: 1) m is not in any cache; 2) m is in another domain's cache. If m is not in any cache, it must be read from memory, with access latency H_{\text{memory}} ; the data must also be brought from memory into the L1 cache, so, considering the coherence requirement, the access latency is \max(H_{\text{memory}}, L_{m \to \text{L}1}) , and the L2 state changes from I to E, as shown in Fig. 7(d).

    If the data exists in another domain, i.e., a copy m' exists, and its state is M, the data must first be loaded from the other domain's L2 cache into the local L2, then into L1, and finally read from L1, giving access latency L_{\text{L}2 \to \text{L}2} + L_{\text{L}2 \to \text{L}1} + H_{\text{L}1} ; the state of m in L2 changes from I to S and the state of the copy m' in the other domain's L2 changes from M to S, as shown in Fig. 7(e).

    If the copy m' has state E or S, the read hits in L3 with access latency H_{\text{L}3} ; meanwhile the data is loaded into L1, so, considering the coherence requirement, the access latency is \max(H_{\text{L}3}, L_{\text{L}3 \to \text{L}1}) ; the state of m in L2 changes from I to S and the state of the copy m' changes to S, as shown in Fig. 7(f).

    From the above analysis, the state-update function for cross-domain read access latency and state transitions is:

    \psi'_{\text{R}}(m,m')=\begin{cases} H_{\text{L}2},\; m_{\text{s}}\mapsto m_{\text{s}},\; m'_{\text{s}}\mapsto m'_{\text{s}}; & m\in\text{L}2 \wedge \neg\text{Replacement}.\\ \max(H_{\text{L}3},L_{\text{L}3\to\text{L}1}),\; m_{\text{s}}:\text{M}/\text{E}\to\text{E}; & m\in\text{L}2 \wedge m'\notin\text{L}2 \wedge \text{Replacement}.\\ \max(H_{\text{L}3},L_{\text{L}3\to\text{L}1}),\; m_{\text{s}}\mapsto m_{\text{s}},\; m'_{\text{s}}\mapsto m'_{\text{s}}; & m\in\text{L}2 \wedge m'\in\text{L}2 \wedge \text{Replacement}.\\ \max(H_{\text{memory}},L_{m\to\text{L}1}),\; m_{\text{s}}:\text{I}\to\text{E}; & m\ \text{not in any cache}.\\ L_{\text{L}2\to\text{L}2}+L_{\text{L}2\to\text{L}1}+H_{\text{L}1},\; m_{\text{s}}:\text{I}\to\text{S},\; m'_{\text{s}}:\text{M}\to\text{S}; & m\notin\text{L}2 \wedge m'\in\text{L}2 \wedge m'\notin\text{L}3.\\ \max(H_{\text{L}3},L_{\text{L}3\to\text{L}1}),\; m_{\text{s}}:\text{I}\to\text{S},\; m'_{\text{s}}:\text{E}/\text{S}\to\text{S}; & m\notin\text{L}2 \wedge m'\in\text{L}2 \wedge m\in\text{L}3. \end{cases} \quad (3)
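Equation (3) can likewise be sketched in code (a hypothetical transcription with placeholder latencies; `copy_state is None` encodes "no copy m′ exists anywhere"):

```python
# Hypothetical transcription of the cross-domain read update function (3).
# Placeholder cycle counts for the latency symbols defined earlier.
H_L1, H_L2, H_L3, H_MEM = 4, 14, 42, 100
L_L3_TO_L1, L_L2_TO_L2, L_L2_TO_L1, L_MEM_TO_L1 = 30, 40, 10, 90

def psi_r_cross(m_in_l2, replacement, m_state, copy_state, copy_in_l3):
    """Return (latency, new state of m in L2, new state of the copy m')."""
    if m_in_l2 and not replacement:                  # Fig. 7(a): hit in L2
        return H_L2, m_state, copy_state
    if m_in_l2 and replacement:                      # Fig. 7(b)(c): reload
        new = "E" if m_state in ("M", "E") else m_state
        return max(H_L3, L_L3_TO_L1), new, copy_state
    if copy_state is None:                           # Fig. 7(d): memory only
        return max(H_MEM, L_MEM_TO_L1), "E", None
    if copy_state == "M":                            # Fig. 7(e): remote dirty
        return L_L2_TO_L2 + L_L2_TO_L1 + H_L1, "S", "S"
    return max(H_L3, L_L3_TO_L1), "S", "S"           # Fig. 7(f): E/S via L3

assert psi_r_cross(True, False, "S", "S", True) == (14, "S", "S")
assert psi_r_cross(False, False, "I", "M", False) == (54, "S", "S")
```

The remote-dirty branch (Fig. 7(e)) is the only one whose latency is a sum rather than a max, since the L2-to-L2 transfer, L2-to-L1 load, and L1 read must happen in sequence.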

    When a core's write request for data m requires a write access to the L2 cache, the shared data is managed by the upper-level coherence protocol.

    If m is in the domain (corresponding to the intra-domain write cases of Figs. 6(a)–(e)) and resides in the local L2 cache with state S, the data in the other copies must be invalidated; considering the coherence requirement, the access time is \max(H_{\text{L}2}, Inv') , the state changes to M, and the copies' states change from S to I, as shown in Fig. 8(a). If m's state is M or E, the access time is H_{\text{L}2} , the state changes to M, and the other cache states are unchanged, as shown in Fig. 8(b).

    Figure  8.  Write latency and state diagram of cross-domain

    If the accessed data m is not in the domain (corresponding to the intra-domain write case of Fig. 6(f)), there are two possibilities: 1) m is not in any cache; 2) m is in another domain's cache. If m is not in any cache, then, because write-allocate is used between domains, the data must first be loaded from memory into the L2 cache and then modified there, giving access latency H_{\text{L}2} + L_{m \to \text{L}2} ; the L2 cache state changes from I to M, as shown in Fig. 8(c).

    If the data exists in another domain and m's state is I: when the copy m' has state M, the data must first be loaded from the other domain into the local L2 and then modified in the local L2 cache, the state changes to M, and the other copies must be invalidated; considering the coherence requirement, the access latency is \max(H_{\text{L}2} + L_{\text{L}2 \to \text{L}2}, Inv') , and the other copies' states change to I, as shown in Fig. 8(d). When the copy's state is E or S, the data must be loaded from the L3 cache into the local L2 cache, the state changes to M, and the other copies must be invalidated; considering the coherence requirement, the access latency is \max(H_{\text{L}2} + L_{\text{L}3 \to \text{L}2}, Inv') , and the copies' states change to I, as shown in Fig. 8(e).

    If m is in state M, E, or S and a replacement miss occurs (corresponding to the intra-domain write case of Fig. 6(f)), a cache write miss happens. The data is first loaded from the L3 cache into the L2 cache, then the write is performed. When m's state is M or E, the access latency is H_{\text{L}2} + L_{\text{L}3 \to \text{L}2} and m's state changes to M, as shown in Fig. 8(f); when m's state is S, the data in the other copies must be invalidated, so, considering the coherence requirement, the access latency is \max(H_{\text{L}2} + L_{\text{L}3 \to \text{L}2}, Inv') , m's state changes from S to M, and the copies' states change from S to I, as shown in Fig. 8(g).

    From the above analysis, the state-update function for cross-domain write access latency and state transitions is:

    \psi'_{\text{W}}(m,m')=\begin{cases} \max(H_{\text{L}2},Inv'),\; m_{\text{s}}:\text{S}\to\text{M},\; m'_{\text{s}}:\text{S}\to\text{I}; & m\in\text{L}2 \wedge m'\in\text{L}2 \wedge \neg\text{Replacement}.\\ H_{\text{L}2},\; m_{\text{s}}:\text{M}/\text{E}\to\text{M},\; m'_{\text{s}}\mapsto m'_{\text{s}}; & m\in\text{L}2 \wedge m'\notin\text{L}2 \wedge \neg\text{Replacement}.\\ H_{\text{L}2}+L_{m\to\text{L}2},\; m_{\text{s}}:\text{I}\to\text{M}; & m\ \text{not in any cache}.\\ \max(H_{\text{L}2}+L_{\text{L}2\to\text{L}2},Inv'),\; m_{\text{s}}:\text{I}\to\text{M},\; m'_{\text{s}}:\text{M}\to\text{I}; & m\notin\text{L}2 \wedge m'\in\text{L}2 \wedge m\notin\text{L}3.\\ \max(H_{\text{L}2}+L_{\text{L}3\to\text{L}2},Inv'),\; m_{\text{s}}:\text{I}\to\text{M},\; m'_{\text{s}}:\text{E}/\text{S}\to\text{I}; & m\notin\text{L}2 \wedge m'\in\text{L}2 \wedge m\in\text{L}3.\\ H_{\text{L}2}+L_{\text{L}3\to\text{L}2},\; m_{\text{s}}:\text{M}/\text{E}\to\text{M}; & m\in\text{L}2 \wedge m'\notin\text{L}2 \wedge \text{Replacement}.\\ \max(H_{\text{L}2}+L_{\text{L}3\to\text{L}2},Inv'),\; m_{\text{s}}:\text{S}\to\text{M},\; m'_{\text{s}}:\text{S}\to\text{I}; & m\in\text{L}2 \wedge m'\in\text{L}2 \wedge \text{Replacement}. \end{cases} \quad (4)
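Chaining the intra-domain and cross-domain functions gives an upper bound on a single access: the worst intra-domain branch defers to the worst cross-domain branch. A toy composition with made-up cycle counts:

```python
# Toy composition of the two-level read analysis with made-up cycle counts:
# the worst intra-domain branch of eq. (1) defers to the worst
# cross-domain branch of eq. (3).
H_L1, H_MEM, L_MEM_TO_L1, L_L2_TO_L1 = 4, 100, 90, 10

# Worst cross-domain read case, Fig. 7(d): the data is only in memory.
psi_r_cross_worst = max(H_MEM, L_MEM_TO_L1)

# Worst intra-domain read case: a miss that falls through to psi'_R,
# combined with the reload of the line into L1.
psi_r_worst = max(psi_r_cross_worst, L_L2_TO_L1, H_L1)

assert psi_r_worst == 100
```

In a real analysis this bound is taken per access and fed, together with the CHMC, into the path analysis that produces the final WCET.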

    Building on our laboratory's existing multi-core WCET analysis tool, we extend its data-cache analysis method, add support for the MESI coherence protocol, and design and implement Roban, a multi-core WCET analysis tool that supports cache coherence protocols. We analyze a multi-core processor's WCET with this tool and compare the results against the execution times obtained from the GEM5 simulator to validate the effectiveness of the analysis method.

    The experiments target a 4-core processor in which every 2 cores form a cluster, i.e., two level-1 coherence domains and one level-2 coherence domain; the processor's memory hierarchy follows Fig. 2, and the key parameter configuration is listed in Table 1. Typical benchmarks are selected from the WCET benchmark suite of the Mälardalen University WCET research group.

    Table  1.  Parameters Configuration of Multi-Core Processor

    Cache hierarchy | Data access latency/cycle (L1 / L2 / L3) | Coherence protocol | Replacement policy
    4 cores, 2 clusters | 4 / 14 / 42 | MESI | LRU

    Note: LRU is the least-recently-used replacement policy.

    The experiments comprise two groups: in group 1 the caches are 4-way set-associative, and in group 2 they are 8-way set-associative. Adjusting the capacities of the L1, L2, and L3 caches yields WCET values under different cache configurations; the results are shown in Fig. 9.

    Figure  9.  Comparison of GEM5 simulation results and analysis results

    As Fig. 9 shows, the WCET results obtained by Roban are all larger than the GEM5 simulation results, demonstrating the safety of our method.

    To validate the effectiveness of our method, Spearman correlation analysis is applied to the Roban analysis results and GEM5 simulation results of Fig. 9; the resulting correlation coefficient matrix is shown in Fig. 10, where the horizontal axis corresponds to the Roban test results and the vertical axis to the GEM5 simulation results. The matrix elements with numeric labels are the correlation coefficients of the WCET results of the same benchmark under different cache parameter configurations; all coefficients are at least 0.98, showing that the results of the two tools are significantly correlated and that the curves in Fig. 9 share essentially the same trends.
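The Spearman coefficient used here only compares rank orders, so two WCET curves with the same trend score 1.0 regardless of scale. A self-contained sketch with toy numbers (not the paper's measurements):

```python
# Pure-Python Spearman rank correlation (toy example without tied values).
def spearman(x, y):
    def ranks(v):
        order = sorted(range(len(v)), key=lambda i: v[i])
        r = [0] * len(v)
        for rank, i in enumerate(order):
            r[i] = rank + 1
        return r
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))

# Toy WCET curves over four cache configurations: same trend -> rho = 1.0
roban = [1200, 1100, 950, 900]
gem5 = [980, 900, 800, 760]
assert spearman(roban, gem5) == 1.0
```

With tied measurements, the usual practice is to average the ranks of ties (as `scipy.stats.spearmanr` does); the simple formula above assumes no ties.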

    Figure  10.  Correlation coefficient matrix diagram of GEM5 and Roban

    To validate the precision of the proposed analysis method, the overestimation ratio \mathcal{R} is used to evaluate the results; it is computed as

    \mathcal{R} = \frac{{T_{{\text{WCET}}}^{{\text{estimation}}}}}{{T_{{\text{WCET}}}^{{\text{simulation}}}}} , (5)

    where {T_{{\text{WCET}}}^{{\text{estimation}}}} and {T_{{\text{WCET}}}^{{\text{simulation}}}} denote the WCET estimate and the simulation result, respectively.

    The statistics of the overestimation ratio are shown in Fig. 11. Our method yields a maximum overestimation ratio of 1.476, a minimum of 1.158, and a mean of 1.30; compared with KTA (KTH's timing analysis), the WCET analysis tool designed and implemented at the KTH Royal Institute of Technology [25], whose mean overestimation ratio is 2.08, our method lowers the overestimation ratio by 0.78.
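Equation (5) and the averaging above, in code, with hypothetical estimate/simulation pairs rather than the paper's data:

```python
# Overestimation ratio R = T_WCET_estimation / T_WCET_simulation (eq. (5)),
# computed over hypothetical (estimate, simulation) pairs in cycles.
pairs = [(1476, 1000), (1158, 1000), (1300, 1000)]
ratios = [est / sim for est, sim in pairs]
mean_r = sum(ratios) / len(ratios)

assert max(ratios) == 1.476
assert min(ratios) == 1.158
```

A safe analysis requires every ratio to be at least 1; the closer the mean is to 1, the tighter (less pessimistic) the bound.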

    Figure  11.  Statistical results of over estimation rate

    By analyzing cache state transitions and memory access latencies under a multi-level coherence protocol architecture, this paper derives a WCET analysis method for multi-level coherence domains. Experiments show that the method can accurately estimate the WCET of multi-core processor tasks: as the cache configuration parameters are varied, the correlation coefficient between the GEM5 simulation results and the results of our tool Roban is no lower than 0.98, indicating that the trends of our analysis results agree with the GEM5 simulations; comparison with a current analysis method shows that our method lowers the overestimation ratio by 0.78.

    The cache behavior analysis in this paper assumes an ideal bus, whereas under a real coherence protocol heavy data traffic can block the bus; in future work, the shared bus between caches should be brought into the scope of the analysis to further improve the accuracy of the WCET results. In addition, only the MESI protocol is considered here, while engineering practice uses a variety of coherence protocols, such as the MESIF protocol in the Intel i7 and the MOESI protocol in the AMD Opteron; subsequent work will analyze these different coherence protocols.

    Author contributions: Zhu Yi'an proposed the research ideas and provided guidance; Shi Xianchen proposed the algorithm and wrote the paper; Yao Ye and Li Lian implemented the algorithm and designed the experimental scheme; Ren Pengyuan, Dong Weizhen, and Li Jiayu participated in designing the experiments and collated the experimental data.

  • Figure  1.   Example of causal graph

    Figure  2.   Example of frontdoor/backdoor adjustment

    Figure  3.   Example of mediation analysis

    Figure  4.   Overview of main research problems in causal machine learning

    Figure  5.   Example of counterfactual explanation [49]

    Figure  6.   Example of counterfactual image hybridization [62]

    Figure  7.   Causal graphs of three types of anti-causal transfer problems [70]

    Figure  8.   Causal graph and two calibration strategies in visual dialogue tasks [97]

    Figure  9.   Causal graph of invariance-learning methods [134-135]

    Figure  10.   Causal graph in advertising recommendation systems [180]

    Table  1   Application of Causal Methods on Interpretability Problems

    Category | Subcategory | Typical ideas and methods
    Attribution-based | Ignoring inter-feature structure | Directly compute each input feature's causal effect on the model output [40-46]
    Attribution-based | Considering inter-feature structure | Introduce a prior causal graph over the input features and adjust their causal effects on the model output [47-48]
    Counterfactual-based | Counterfactuals on input data | Construct counterfactual samples in the model's input space [49-61]
    Counterfactual-based | Counterfactuals on output data | Apply counterfactual operations to intermediate nodes of a generative model to construct counterfactually generated samples [62]
    Counterfactual-based | Counterfactual feasibility | Additionally model the constraints on counterfactual operations [63-66]

    Table  2   Application of Causal Methods on Transferability Problems

    Category | Typical ideas and methods
    Causal graph over only the inputs, outputs, and domain variables | Modeling methods under covariate shift [67], target shift [68], conditional shift [69], and generalized target shift [70-71]
    Causal graph containing other complex variables | Introduce a prior causal graph [72-75] or perform causal discovery from data [76]

    Table  3   Application of Causal Methods on Robustness Problems

    Category | Subcategory | Typical ideas and methods
    Counterfactual data augmentation | Counterfactuals on spurious features | Construct extra training data by slightly editing the data while keeping the prediction unchanged [88-93]
    Counterfactual data augmentation | Counterfactuals on causal features | Construct extra training data by changing key causal features and modifying the prediction accordingly [92-95]
    Causal-effect calibration | Backdoor adjustment | Identify confounders from knowledge of the problem, then estimate and remove their influence [96-99]
    Causal-effect calibration | Mediation analysis | Identify mediator variables from knowledge of the problem, then estimate and remove their influence [97,100-102]
    Invariance learning | Stable learning | Treat each feature as a treatment variable and remove confounding via sample reweighting to identify causal features [103-107]
    Invariance learning | Invariant causal prediction | Determine the set of causal features via hypothesis testing on multi-environment training data [108-110]
    Invariance learning | Invariant risk minimization | Add cross-environment invariance constraints to the optimization objective on multi-environment training data to learn causal features [111-113]

    Table  4   Application of Causal Methods on Fairness Problems

    Category | Typical ideas and methods
    Counterfactual fairness measurement | Propose counterfactual-based individual fairness metrics [145-152]
    Fair model construction | Use a prior causal graph to guide the construction of fair models [153-155]

    Table  5   Application of Causal Methods on Counterfactual Evaluation Problems

    Category | Typical ideas and methods
    Solving the missing-not-at-random problem in recommender systems | Correct the policy utility with propensity scores or counterfactual risk minimization [161-172]
    Solving the position-bias problem in retrieval systems | Correct relevance estimates with propensity-score methods [173-179]
  • [1]

    LeCun Y, Bengio Y, Hinton G. Deep learning[J]. Nature, 2015, 521(7553): 436−444

    [2] He Kaiming, Zhang Xiangyu, Ren Shaoqing, et al. Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification[C] //Proc of the IEEE Int Conf on Computer Vision. Piscataway, NJ: IEEE, 2015: 1026−1034
    [3]

    Brock A, Donahue J, Simonyan K. Large scale GAN training for high fidelity natural image synthesis [C/OL] //Proc of the 7th Int Conf on Learning Representations. 2019 [2021-11-03]. https://openreview.net/pdf?id=B1xsqj09Fm

    [4]

    Brown T B, Mann B, Ryder N, et al. Language models are few-shot learners[C] //Proc of the 34th Int Conf on Neural Information Processing Systems. Red Hook, NY: Curran Associates Inc, 2020: 1877−1901

    [5]

    Silver D, Huang A, Maddison C J, et al. Mastering the game of Go with deep neural networks and tree search[J]. Nature, 2016, 529(7587): 484−489

    [6]

    Senior A W, Evans R, Jumper J, et al. Improved protein structure prediction using potentials from deep learning[J]. Nature, 2020, 577(7792): 706−710

    [7]

    Gunning D, Aha D. DARPA’s explainable artificial intelligence (XAI) program[J]. AI Magazine, 2019, 40(2): 44−58

    [8]

    Szegedy C, Zaremba W, Sutskever I, et al. Intriguing properties of neural networks[C/OL] //Proc of the 2nd Int Conf on Learning Representations. 2014 [2021-11-03]. https://arxiv.org/abs/1312.6199

    [9]

    Barocas S, Hardt M, Narayanan A. Fairness in machine learning [EB/OL]. 2017 [2021-11-13]. https://fairmlbook.org/pdf/fairmlbook.pdf

    [10]

    Pearl J, Mackenzie D. The Book of Why: The New Science of Cause and Effect[M]. New York: Basic Books, 2018

    [11]

    Pearl J. Theoretical impediments to machine learning with seven sparks from the causal revolution[J]. arXiv preprint, arXiv: 1801.04016, 2018

    [12] 苗旺, 刘春辰, 耿直. 因果推断的统计方法[J]. 中国科学: 数学, 2018, 48(12): 1753-1778

    Miao Wang, Liu Chunchen, Geng Zhi. Statistical approaches for causal inference [J]. SCIENTIA SINICA Mathematica, 2018, 48(12): 1753-1778 (in Chinese)

    [13]

    Guo Ruocheng, Cheng Lu, Li Jundong, et al. A survey of learning causality with data: Problems and methods[J]. ACM Computing Surveys, 2020, 53(4): 1−37

    [14]

    Kuang Kun, Li Lian, Geng Zhi, et al. Causal inference [J]. Engineering, 2020, 6(3): 253−263

    [15]

    Schölkopf B. Causality for machine learning [J]. arXiv preprint, arXiv: 1911.10500, 2019

    [16]

    Schölkopf B, Locatello F, Bauer S, et al. Toward causal representation learning[J]. Proceedings of the IEEE, 2021, 109(5): 612−634

    [17]

    Splawa-Neyman J, Dabrowska D M, Speed T P. On the application of probability theory to agricultural experiments. Essay on principles. Section 9[J]. Statistical Science, 1990, 5(4): 465−472

    [18]

    Rubin D B. Estimating causal effects of treatments in randomized and nonrandomized studies[J]. Journal of Educational Psychology, 1974, 66(5): 688−701

    [19]

    Pearl J. Causality[M]. Cambridge, UK: Cambridge University Press, 2009

    [20]

    Granger C W J. Investigating causal relations by econometric models and cross-spectral methods[J]. Econometrica, 1969, 37(3): 424−438

    [21]

    Rubin D B. Randomization analysis of experimental data: The Fisher randomization test comment[J]. Journal of the American Statistical Association, 1980, 75(371): 591−593

    [22]

    Rosenbaum P R, Rubin D B. The central role of the propensity score in observational studies for causal effects[J]. Biometrika, 1983, 70(1): 41−55

    [23]

    Hirano K, Imbens G W, Ridder G. Efficient estimation of average treatment effects using the estimated propensity score[J]. Econometrica, 2003, 71(4): 1161−1189

    [24]

    Robins J M, Rotnitzky A, Zhao Lueping. Estimation of regression coefficients when some regressors are not always observed[J]. Journal of the American Statistical Association, 1994, 89(427): 846−866

    [25]

    Dudík M, Langford J, Li Lihong. Doubly robust policy evaluation and learning[C] //Proc of the 28th Int Conf on Machine Learning. Madison, WI: Omnipress, 2011: 1097−1104

    [26]

    Kuang Kun, Cui Peng, Li Bo, et al. Estimating treatment effect in the wild via differentiated confounder balancing[C] //Proc of the 23rd ACM SIGKDD Int Conf on Knowledge Discovery and Data Mining. New York: ACM, 2017: 265−274

    [27]

    Imbens G W, Rubin D B. Causal Inference in Statistics, Social, and Biomedical Sciences[M]. Cambridge, UK: Cambridge University Press, 2015

    [28]

    Yao Liuyi, Chu Zhixuan, Li Sheng, et al. A survey on causal inference [J]. arXiv preprint, arXiv: 2002.02770, 2020

    [29]

    Pearl J. Causal diagrams for empirical research[J]. Biometrika, 1995, 82(4): 669−688

    [30]

    Spirtes P, Glymour C. An algorithm for fast recovery of sparse causal graphs[J]. Social Science Computer Review, 1991, 9(1): 62−72

    [31]

    Verma T, Pearl J. Equivalence and synthesis of causal models[C] //Proc of the 6th Annual Conf on Uncertainty in Artificial Intelligence. Amsterdam: Elsevier, 1990: 255−270

    [32]

    Spirtes P, Glymour C N, Scheines R, et al. Causation, Prediction, and Search[M]. Cambridge, MA: MIT Press, 2000

    [33]

    Schwarz G. Estimating the dimension of a model[J]. The Annals of Statistics, 1978, 6(2): 461−464

    [34] Chickering D M. Optimal structure identification with greedy search[J]. Journal of Machine Learning Research, 2002, 3(Nov): 507−554
    [35]

    Shimizu S, Hoyer P O, Hyvärinen A, et al. A linear non-Gaussian acyclic model for causal discovery[J]. Journal of Machine Learning Research, 2006, 7(10): 2003−2030

    [36]

    Zhang Kun, Hyvärinen A. On the identifiability of the post-nonlinear causal model[C] //Proc of the 25th Conf on Uncertainty in Artificial Intelligence. Arlington, VA: AUAI Press, 2009: 647−655

    [37] Pearl J. Direct and indirect effects[C] //Proc of the 17th Conf on Uncertainty in Artificial Intelligence. San Francisco, CA: Morgan Kaufmann Publishers Inc, 2001: 411−420

    [38] VanderWeele T. Explanation in Causal Inference: Methods for Mediation and Interaction[M]. Oxford, UK: Oxford University Press, 2015

    [39] 陈珂锐,孟小峰. 机器学习的可解释性[J]. 计算机研究与发展,2020,57(9):1971−1986 doi: 10.7544/issn1000-1239.2020.20190456

    Chen Kerui, Meng Xiaofeng. Interpretation and understanding in machine learning[J]. Journal of Computer Research and Development, 2020, 57(9): 1971−1986 (in Chinese) doi: 10.7544/issn1000-1239.2020.20190456

    [40] Ribeiro M T, Singh S, Guestrin C. “Why should I trust you?” Explaining the predictions of any classifier[C] //Proc of the 22nd ACM SIGKDD Int Conf on Knowledge Discovery and Data Mining. New York: ACM, 2016: 1135−1144

    [41] Selvaraju R R, Cogswell M, Das A, et al. Grad-CAM: Visual explanations from deep networks via gradient-based localization[C] //Proc of the IEEE Int Conf on Computer Vision. Piscataway, NJ: IEEE, 2017: 618−626

    [42] Sundararajan M, Taly A, Yan Qiqi. Axiomatic attribution for deep networks[C] //Proc of the 34th Int Conf on Machine Learning. Cambridge, MA: JMLR, 2017: 3319−3328

    [43] Lundberg S M, Lee S I. A unified approach to interpreting model predictions[C] //Proc of the 31st Int Conf on Neural Information Processing Systems. Red Hook, NY: Curran Associates Inc, 2017: 4765−4774

    [44] Alvarez-Melis D, Jaakkola T. A causal framework for explaining the predictions of black-box sequence-to-sequence models[C] //Proc of the 2017 Conf on Empirical Methods in Natural Language Processing. Stroudsburg, PA: ACL, 2017: 412−421

    [45] Schwab P, Karlen W. CXPlain: Causal explanations for model interpretation under uncertainty[C] //Proc of the 33rd Int Conf on Neural Information Processing Systems. Red Hook, NY: Curran Associates Inc, 2019: 10220−10230

    [46] Chattopadhyay A, Manupriya P, Sarkar A, et al. Neural network attributions: A causal perspective[C] //Proc of the 36th Int Conf on Machine Learning. Cambridge, MA: JMLR, 2019: 981−990

    [47] Frye C, Rowat C, Feige I. Asymmetric Shapley values: Incorporating causal knowledge into model-agnostic explainability[C] //Proc of the 34th Int Conf on Neural Information Processing Systems. Red Hook, NY: Curran Associates Inc, 2020: 1229−1239

    [48] Heskes T, Sijben E, Bucur I G, et al. Causal Shapley values: Exploiting causal knowledge to explain individual predictions of complex models[C] //Proc of the 34th Int Conf on Neural Information Processing Systems. Red Hook, NY: Curran Associates Inc, 2020: 4778−4789

    [49] Goyal Y, Wu Ziyan, Ernst J, et al. Counterfactual visual explanations[C] //Proc of the 36th Int Conf on Machine Learning. Cambridge, MA: JMLR, 2019: 2376−2384

    [50] Wang Pei, Vasconcelos N. SCOUT: Self-aware discriminant counterfactual explanations[C] //Proc of the 33rd IEEE/CVF Conf on Computer Vision and Pattern Recognition. Piscataway, NJ: IEEE, 2020: 8981−8990

    [51] Hendricks L A, Hu Ronghang, Darrell T, et al. Generating counterfactual explanations with natural language[J]. arXiv preprint, arXiv: 1806.09809, 2018

    [52] Chang Chunhao, Creager E, Goldenberg A, et al. Explaining image classifiers by counterfactual generation[C/OL] //Proc of the 7th Int Conf on Learning Representations. 2019 [2021-11-03]. https://openreview.net/pdf?id=B1MXz20cYQ

    [53] Kanehira A, Takemoto K, Inayoshi S, et al. Multimodal explanations by predicting counterfactuality in videos[C] //Proc of the 32nd IEEE Conf on Computer Vision and Pattern Recognition. Piscataway, NJ: IEEE, 2019: 8594−8602

    [54] Akula A R, Wang Shuai, Zhu Songchun. CoCoX: Generating conceptual and counterfactual explanations via fault-lines[C] //Proc of the 34th AAAI Conf on Artificial Intelligence. Palo Alto, CA: AAAI, 2020: 2594−2601

    [55] Madumal P, Miller T, Sonenberg L, et al. Explainable reinforcement learning through a causal lens[C] //Proc of the 34th AAAI Conf on Artificial Intelligence. Palo Alto, CA: AAAI, 2020: 2493−2500

    [56] Mothilal R K, Sharma A, Tan C. Explaining machine learning classifiers through diverse counterfactual explanations[C] //Proc of the 2020 Conf on Fairness, Accountability, and Transparency. New York: ACM, 2020: 607−617

    [57] Albini E, Rago A, Baroni P, et al. Relation-based counterfactual explanations for Bayesian network classifiers[C] //Proc of the 29th Int Joint Conf on Artificial Intelligence. Red Hook, NY: Curran Associates Inc, 2020: 451−457

    [58] Kenny E M, Keane M T. On generating plausible counterfactual and semi-factual explanations for deep learning[C] //Proc of the 35th AAAI Conf on Artificial Intelligence. Palo Alto, CA: AAAI, 2021: 11575−11585

    [59] Abrate C, Bonchi F. Counterfactual graphs for explainable classification of brain networks[J]. arXiv preprint, arXiv: 2106.08640, 2021

    [60] Yang Fan, Alva S S, Chen Jiahao, et al. Model-based counterfactual synthesizer for interpretation[J]. arXiv preprint, arXiv: 2106.08971, 2021

    [61] Parmentier A, Vidal T. Optimal counterfactual explanations in tree ensembles[J]. arXiv preprint, arXiv: 2106.06631, 2021

    [62] Besserve M, Mehrjou A, Sun R, et al. Counterfactuals uncover the modular structure of deep generative models[C/OL] //Proc of the 8th Int Conf on Learning Representations. 2020 [2021-11-03]. https://openreview.net/pdf?id=SJxDDpEKvH

    [63] Kanamori K, Takagi T, Kobayashi K, et al. DACE: Distribution-aware counterfactual explanation by mixed-integer linear optimization[C] //Proc of the 29th Int Joint Conf on Artificial Intelligence. Red Hook, NY: Curran Associates Inc, 2020: 2855−2862

    [64] Kanamori K, Takagi T, Kobayashi K, et al. Ordered counterfactual explanation by mixed-integer linear optimization[C] //Proc of the 35th AAAI Conf on Artificial Intelligence. Palo Alto, CA: AAAI, 2021: 11564−11574

    [65] Tsirtsis S, Gomez-Rodriguez M. Decisions, counterfactual explanations and strategic behavior[C] //Proc of the 34th Int Conf on Neural Information Processing Systems. Red Hook, NY: Curran Associates Inc, 2020: 16749−16760

    [66] Karimi A H, von Kügelgen J, Schölkopf B, et al. Algorithmic recourse under imperfect causal knowledge: A probabilistic approach[C] //Proc of the 34th Int Conf on Neural Information Processing Systems. Red Hook, NY: Curran Associates Inc, 2020: 265−277

    [67] Rojas-Carulla M, Schölkopf B, Turner R, et al. Invariant models for causal transfer learning[J]. The Journal of Machine Learning Research, 2018, 19(1): 1309−1342

    [68] Guo Jiaxian, Gong Mingming, Liu Tongliang, et al. LTF: A label transformation framework for correcting target shift[C] //Proc of the 37th Int Conf on Machine Learning. Cambridge, MA: JMLR, 2020: 3843−3853

    [69] Cai Ruichu, Li Zijian, Wei Pengfei, et al. Learning disentangled semantic representation for domain adaptation[C] //Proc of the 28th Int Joint Conf on Artificial Intelligence. Red Hook, NY: Curran Associates Inc, 2019: 2060−2066

    [70] Zhang Kun, Schölkopf B, Muandet K, et al. Domain adaptation under target and conditional shift[C] //Proc of the 30th Int Conf on Machine Learning. Cambridge, MA: JMLR, 2013: 819−827

    [71] Gong Mingming, Zhang Kun, Liu Tongliang, et al. Domain adaptation with conditional transferable components[C] //Proc of the 33rd Int Conf on Machine Learning. Cambridge, MA: JMLR, 2016: 2839−2848

    [72] Teshima T, Sato I, Sugiyama M. Few-shot domain adaptation by causal mechanism transfer[C] //Proc of the 37th Int Conf on Machine Learning. Cambridge, MA: JMLR, 2020: 9458−9469

    [73] Edmonds M, Ma Xiaojian, Qi Siyuan, et al. Theory-based causal transfer: Integrating instance-level induction and abstract-level structure learning[C] //Proc of the 34th AAAI Conf on Artificial Intelligence. Palo Alto, CA: AAAI, 2020: 1283−1291

    [74] Etesami J, Geiger P. Causal transfer for imitation learning and decision making under sensor-shift[C] //Proc of the 34th AAAI Conf on Artificial Intelligence. Palo Alto, CA: AAAI, 2020: 10118−10125

    [75] Yue Zhongqi, Zhang Hanwang, Sun Qianru, et al. Interventional few-shot learning[C] //Proc of the 34th Int Conf on Neural Information Processing Systems. Red Hook, NY: Curran Associates Inc, 2020: 2734−2746

    [76] Zhang Kun, Gong Mingming, Stojanov P, et al. Domain adaptation as a problem of inference on graphical models[C] //Proc of the 34th Int Conf on Neural Information Processing Systems. Red Hook, NY: Curran Associates Inc, 2020: 4965−4976

    [77] Schölkopf B, Janzing D, Peters J, et al. On causal and anticausal learning[C] //Proc of the 29th Int Conf on Machine Learning. Madison, WI: Omnipress, 2012: 459−466

    [78] Zhang Kun, Gong Mingming, Schölkopf B. Multi-source domain adaptation: A causal view[C] //Proc of the 29th AAAI Conf on Artificial Intelligence. Palo Alto, CA: AAAI, 2015: 3150−3157

    [79] Bagnell J A. Robust supervised learning[C] //Proc of the 20th National Conf on Artificial Intelligence. Menlo Park, CA: AAAI, 2005: 714−719

    [80] Hu Weihua, Niu Gang, Sato I, et al. Does distributionally robust supervised learning give robust classifiers?[C] //Proc of the 35th Int Conf on Machine Learning. Cambridge, MA: JMLR, 2018: 2029−2037

    [81] Rahimian H, Mehrotra S. Distributionally robust optimization: A review[J]. arXiv preprint, arXiv: 1908.05659, 2019

    [82] Goodfellow I J, Shlens J, Szegedy C. Explaining and harnessing adversarial examples[C/OL] //Proc of the 3rd Int Conf on Learning Representations. 2015 [2021-11-14]. https://openreview.net/pdf?id=B1xsqj09Fm

    [83] Xu Han, Ma Yao, Liu Haochen, et al. Adversarial attacks and defenses in images, graphs and text: A review[J]. International Journal of Automation and Computing, 2020, 17(2): 151−178

    [84] Gururangan S, Swayamdipta S, Levy O, et al. Annotation artifacts in natural language inference data[C] //Proc of the 16th Conf of the North American Chapter of the ACL: Human Language Technologies, Vol 2. Stroudsburg, PA: ACL, 2018: 107−112

    [85] Zhang Guanhua, Bai Bing, Liang Jian, et al. Selection bias explorations and debias methods for natural language sentence matching datasets[C] //Proc of the 57th Annual Meeting of the Association for Computational Linguistics. Stroudsburg, PA: ACL, 2019: 4418−4429

    [86] Clark C, Yatskar M, Zettlemoyer L. Don’t take the easy way out: Ensemble based methods for avoiding known dataset biases[C] //Proc of the 2019 Conf on Empirical Methods in Natural Language Processing and the 9th Int Joint Conf on Natural Language Processing (EMNLP-IJCNLP). Stroudsburg, PA: ACL, 2019: 4060−4073

    [87] Cadene R, Dancette C, Cord M, et al. RUBi: Reducing unimodal biases for visual question answering[C] //Proc of the 33rd Int Conf on Neural Information Processing Systems. Red Hook, NY: Curran Associates Inc, 2019: 841−852

    [88] Lu Kaiji, Mardziel P, Wu Fangjing, et al. Gender bias in neural natural language processing[G] //LNCS 12300: Logic, Language, and Security: Essays Dedicated to Andre Scedrov on the Occasion of His 65th Birthday. Berlin: Springer, 2020: 189−202

    [89] Maudslay R H, Gonen H, Cotterell R, et al. It’s all in the name: Mitigating gender bias with name-based counterfactual data substitution[C] //Proc of the 2019 Conf on Empirical Methods in Natural Language Processing and the 9th Int Joint Conf on Natural Language Processing (EMNLP-IJCNLP). Stroudsburg, PA: ACL, 2019: 5270−5278

    [90] Zmigrod R, Mielke S J, Wallach H, et al. Counterfactual data augmentation for mitigating gender stereotypes in languages with rich morphology[C] //Proc of the 57th Annual Meeting of the ACL. Stroudsburg, PA: ACL, 2019: 1651−1661

    [91] Kaushik D, Hovy E, Lipton Z. Learning the difference that makes a difference with counterfactually-augmented data[C/OL] //Proc of the 8th Int Conf on Learning Representations. 2020 [2021-11-14]. https://openreview.net/pdf?id=Sklgs0NFvr

    [92] Agarwal V, Shetty R, Fritz M. Towards causal VQA: Revealing and reducing spurious correlations by invariant and covariant semantic editing[C] //Proc of the 33rd IEEE/CVF Conf on Computer Vision and Pattern Recognition. Piscataway, NJ: IEEE, 2020: 9690−9698

    [93] Chang Chunhao, Adam G A, Goldenberg A. Towards robust classification model by counterfactual and invariant data generation[C] //Proc of the 34th IEEE/CVF Conf on Computer Vision and Pattern Recognition. Piscataway, NJ: IEEE, 2021: 15212−15221

    [94] Wang Zhao, Culotta A. Robustness to spurious correlations in text classification via automatically generated counterfactuals[C] //Proc of the 35th AAAI Conf on Artificial Intelligence. Palo Alto, CA: AAAI, 2021: 14024−14031

    [95] Chen Long, Yan Xin, Xiao Jun, et al. Counterfactual samples synthesizing for robust visual question answering[C] //Proc of the 33rd IEEE/CVF Conf on Computer Vision and Pattern Recognition. Piscataway, NJ: IEEE, 2020: 10800−10809

    [96] Wu Yiquan, Kuang Kun, Zhang Yating, et al. De-biased court’s view generation with causality[C] //Proc of the 2020 Conf on Empirical Methods in Natural Language Processing. Stroudsburg, PA: ACL, 2020: 763−780
    [97] Qi Jiaxin, Niu Yulei, Huang Jianqiang, et al. Two causal principles for improving visual dialog[C] //Proc of the 33rd IEEE/CVF Conf on Computer Vision and Pattern Recognition. Piscataway, NJ: IEEE, 2020: 10860−10869

    [98] Wang Tan, Huang Jianqiang, Zhang Hanwang, et al. Visual commonsense R-CNN[C] //Proc of the 33rd IEEE/CVF Conf on Computer Vision and Pattern Recognition. Piscataway, NJ: IEEE, 2020: 10760−10770

    [99] Zhang Dong, Zhang Hanwang, Tang Jinhui, et al. Causal intervention for weakly-supervised semantic segmentation[C] //Proc of the 34th Int Conf on Neural Information Processing Systems. Red Hook, NY: Curran Associates Inc, 2020: 655−666

    [100] Tang Kaihua, Niu Yulei, Huang Jianqiang, et al. Unbiased scene graph generation from biased training[C] //Proc of the 33rd IEEE/CVF Conf on Computer Vision and Pattern Recognition. Piscataway, NJ: IEEE, 2020: 3716−3725

    [101] Tang Kaihua, Huang Jianqiang, Zhang Hanwang. Long-tailed classification by keeping the good and removing the bad momentum causal effect[C] //Proc of the 34th Int Conf on Neural Information Processing Systems. Red Hook, NY: Curran Associates Inc, 2020: 1513−1524

    [102] Niu Yulei, Tang Kaihua, Zhang Hanwang, et al. Counterfactual VQA: A cause-effect look at language bias[C] //Proc of the 34th IEEE/CVF Conf on Computer Vision and Pattern Recognition. Piscataway, NJ: IEEE, 2021: 12700−12710

    [103] Kuang Kun, Cui Peng, Athey S, et al. Stable prediction across unknown environments[C] //Proc of the 24th ACM SIGKDD Int Conf on Knowledge Discovery & Data Mining. New York: ACM, 2018: 1617−1626

    [104] Shen Zheyan, Cui Peng, Kuang Kun, et al. Causally regularized learning with agnostic data selection bias[C] //Proc of the 26th ACM Int Conf on Multimedia. New York: ACM, 2018: 411−419

    [105] Kuang Kun, Xiong Ruoxuan, Cui Peng, et al. Stable prediction with model misspecification and agnostic distribution shift[C] //Proc of the 34th AAAI Conf on Artificial Intelligence. Palo Alto, CA: AAAI, 2020: 4485−4492

    [106] Shen Zheyan, Cui Peng, Zhang Tong, et al. Stable learning via sample reweighting[C] //Proc of the 34th AAAI Conf on Artificial Intelligence. Palo Alto, CA: AAAI, 2020: 5692−5699

    [107] Zhang Xingxuan, Cui Peng, Xu Renzhe, et al. Deep stable learning for out-of-distribution generalization[J]. arXiv preprint, arXiv: 2104.07876, 2021

    [108] Peters J, Bühlmann P, Meinshausen N. Causal inference by using invariant prediction: Identification and confidence intervals[J]. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 2016, 78(5): 947−1012

    [109] Heinze-Deml C, Meinshausen N, Peters J. Invariant causal prediction for nonlinear models[J/OL]. Journal of Causal Inference, 2018, 6(2): 20170016 [2021-11-15]. https://www.degruyter.com/document/doi/10.1515/jci-2017-0016/pdf

    [110] Pfister N, Bühlmann P, Peters J. Invariant causal prediction for sequential data[J]. Journal of the American Statistical Association, 2019, 114(527): 1264−1276

    [111] Arjovsky M, Bottou L, Gulrajani I, et al. Invariant risk minimization[J]. arXiv preprint, arXiv: 1907.02893, 2019

    [112] Zhang A, Lyle C, Sodhani S, et al. Invariant causal prediction for block MDPs[C] //Proc of the 37th Int Conf on Machine Learning. Cambridge, MA: JMLR, 2020: 11214−11224

    [113] Creager E, Jacobsen J H, Zemel R. Environment inference for invariant learning[C] //Proc of the 38th Int Conf on Machine Learning. Cambridge, MA: JMLR, 2021: 2189−2200

    [114] Kaushik D, Setlur A, Hovy E H, et al. Explaining the efficacy of counterfactually augmented data[C/OL] //Proc of the 9th Int Conf on Learning Representations. 2021 [2021-11-14]. https://openreview.net/pdf?id=HHiiQKWsOcV

    [115] Abbasnejad E, Teney D, Parvaneh A, et al. Counterfactual vision and language learning[C] //Proc of the 33rd IEEE/CVF Conf on Computer Vision and Pattern Recognition. Piscataway, NJ: IEEE, 2020: 10044−10054

    [116] Liang Zujie, Jiang Weitao, Hu Haifeng, et al. Learning to contrast the counterfactual samples for robust visual question answering[C] //Proc of the 2020 Conf on Empirical Methods in Natural Language Processing. Stroudsburg, PA: ACL, 2020: 3285−3292

    [117] Teney D, Abbasnejad E, van den Hengel A. Learning what makes a difference from counterfactual examples and gradient supervision[C] //Proc of the 16th European Conf on Computer Vision. Berlin: Springer, 2020: 580−599

    [118] Fu T J, Wang X E, Peterson M F, et al. Counterfactual vision-and-language navigation via adversarial path sampler[C] //Proc of the 16th European Conf on Computer Vision. Berlin: Springer, 2020: 71−86

    [119] Parvaneh A, Abbasnejad E, Teney D, et al. Counterfactual vision-and-language navigation: Unravelling the unseen[C] //Proc of the 34th Int Conf on Neural Information Processing Systems. Red Hook, NY: Curran Associates Inc, 2020: 5296−5307

    [120] Sauer A, Geiger A. Counterfactual generative networks[C/OL] //Proc of the 9th Int Conf on Learning Representations. 2021 [2021-11-14]. https://openreview.net/pdf?id=BXewfAYMmJw

    [121] Mao Chengzhi, Cha A, Gupta A, et al. Generative interventions for causal learning[C] //Proc of the 34th IEEE/CVF Conf on Computer Vision and Pattern Recognition. Piscataway, NJ: IEEE, 2021: 3947−3956

    [122] Zeng Xiangji, Li Yunliang, Zhai Yuchen, et al. Counterfactual generator: A weakly-supervised method for named entity recognition[C] //Proc of the 2020 Conf on Empirical Methods in Natural Language Processing. Stroudsburg, PA: ACL, 2020: 7270−7280

    [123] Fu T J, Wang Xin, Grafton S, et al. Iterative language-based image editing via self-supervised counterfactual reasoning[C] //Proc of the 2020 Conf on Empirical Methods in Natural Language Processing. Stroudsburg, PA: ACL, 2020: 4413−4422

    [124] Pitis S, Creager E, Garg A. Counterfactual data augmentation using locally factored dynamics[C] //Proc of the 34th Int Conf on Neural Information Processing Systems. Red Hook, NY: Curran Associates Inc, 2020: 3976−3990

    [125] Zhang Junzhe, Kumor D, Bareinboim E. Causal imitation learning with unobserved confounders[C] //Proc of the 34th Int Conf on Neural Information Processing Systems. Red Hook, NY: Curran Associates Inc, 2020: 12263−12274

    [126] Coston A, Kennedy E, Chouldechova A. Counterfactual predictions under runtime confounding[C] //Proc of the 34th Int Conf on Neural Information Processing Systems. Red Hook, NY: Curran Associates Inc, 2020: 4150−4162

    [127] Atzmon Y, Kreuk F, Shalit U, et al. A causal view of compositional zero-shot recognition[C] //Proc of the 34th Int Conf on Neural Information Processing Systems. Red Hook, NY: Curran Associates Inc, 2020: 1462−1473

    [128] Yang Zekun, Feng Juan. A causal inference method for reducing gender bias in word embedding relations[C] //Proc of the 34th AAAI Conf on Artificial Intelligence. Palo Alto, CA: AAAI, 2020: 9434−9441

    [129] Schölkopf B, Hogg D W, Wang Dun, et al. Modeling confounding by half-sibling regression[J]. Proceedings of the National Academy of Sciences, 2016, 113(27): 7391−7398 doi: 10.1073/pnas.1511656113

    [130] Shin S, Song K, Jang J H, et al. Neutralizing gender bias in word embedding with latent disentanglement and counterfactual generation[C] //Proc of the 2020 Conf on Empirical Methods in Natural Language Processing: Findings. Stroudsburg, PA: ACL, 2020: 3126−3140

    [131] Yang Zekun, Liu Tianlin. Causally denoise word embeddings using half-sibling regression[C] //Proc of the 34th AAAI Conf on Artificial Intelligence. Palo Alto, CA: AAAI, 2020: 9426−9433

    [132] Yang Xu, Zhang Hanwang, Qi Guojun, et al. Causal attention for vision-language tasks[C] //Proc of the 34th IEEE/CVF Conf on Computer Vision and Pattern Recognition. Piscataway, NJ: IEEE, 2021: 9847−9857

    [133] Tople S, Sharma A, Nori A. Alleviating privacy attacks via causal learning[C] //Proc of the 37th Int Conf on Machine Learning. Cambridge, MA: JMLR, 2020: 9537−9547

    [134] Zhang Cheng, Zhang Kun, Li Yingzhen. A causal view on robustness of neural networks[C] //Proc of the 34th Int Conf on Neural Information Processing Systems. Red Hook, NY: Curran Associates Inc, 2020: 289−301

    [135] Sun Xinwei, Wu Botong, Liu Chang, et al. Latent causal invariant model[J]. arXiv preprint, arXiv: 2011.02203, 2020

    [136] Mitrovic J, McWilliams B, Walker J C, et al. Representation learning via invariant causal mechanisms[C/OL] //Proc of the 9th Int Conf on Learning Representations. 2021 [2021-11-14]. https://openreview.net/pdf?id=9p2ekP904Rs

    [137] Mahajan D, Tople S, Sharma A. Domain generalization using causal matching[J]. arXiv preprint, arXiv: 2006.07500, 2020

    [138] Zhang Weijia, Liu Lin, Li Jiuyong. Robust multi-instance learning with stable instances[C] //Proc of the 24th European Conf on Artificial Intelligence. Amsterdam: IOS Press, 2020: 1682−1689

    [139] Kleinberg J, Mullainathan S, Raghavan M. Inherent trade-offs in the fair determination of risk scores[J]. arXiv preprint, arXiv: 1609.05807, 2016

    [140] Grgic-Hlaca N, Zafar M B, Gummadi K P, et al. The case for process fairness in learning: Feature selection for fair decision making[C/OL] //Proc of Symp on Machine Learning and the Law at the 30th Conf on Neural Information Processing Systems. 2016 [2021-11-17]. http://www.mlandthelaw.org/papers/grgic.pdf
    [141] Dwork C, Hardt M, Pitassi T, et al. Fairness through awareness[C] //Proc of the 3rd Innovations in Theoretical Computer Science Conf. New York: ACM, 2012: 214−226

    [142] Calders T, Kamiran F, Pechenizkiy M. Building classifiers with independency constraints[C] //Proc of the 9th IEEE Int Conf on Data Mining Workshops. Piscataway, NJ: IEEE, 2009: 13−18

    [143] Hardt M, Price E, Srebro N. Equality of opportunity in supervised learning[C] //Proc of the 30th Int Conf on Neural Information Processing Systems. Red Hook, NY: Curran Associates Inc, 2016: 3315−3323

    [144] Xu Renzhe, Cui Peng, Kuang Kun, et al. Algorithmic decision making with conditional fairness[C] //Proc of the 26th ACM SIGKDD Int Conf on Knowledge Discovery & Data Mining. New York: ACM, 2020: 2125−2135

    [145] Kusner M J, Loftus J, Russell C, et al. Counterfactual fairness[C] //Proc of the 31st Int Conf on Neural Information Processing Systems. Red Hook, NY: Curran Associates Inc, 2017: 4066−4076

    [146] Kilbertus N, Rojas-Carulla M, Parascandolo G, et al. Avoiding discrimination through causal reasoning[C] //Proc of the 31st Int Conf on Neural Information Processing Systems. Red Hook, NY: Curran Associates Inc, 2017: 656−666

    [147] Nabi R, Shpitser I. Fair inference on outcomes[C] //Proc of the 32nd AAAI Conf on Artificial Intelligence. Palo Alto, CA: AAAI, 2018: 1931−1940

    [148] Chiappa S. Path-specific counterfactual fairness[C] //Proc of the 33rd AAAI Conf on Artificial Intelligence. Palo Alto, CA: AAAI, 2019: 7801−7808

    [149] Wu Yongkai, Zhang Lu, Wu Xintao, et al. PC-fairness: A unified framework for measuring causality-based fairness[C] //Proc of the 33rd Int Conf on Neural Information Processing Systems. Red Hook, NY: Curran Associates Inc, 2019: 3404−3414

    [150] Wu Yongkai, Zhang Lu, Wu Xintao. Counterfactual fairness: Unidentification, bound and algorithm[C] //Proc of the 28th Int Joint Conf on Artificial Intelligence. Red Hook, NY: Curran Associates Inc, 2019: 1438−1444

    [151] Huang P S, Zhang Huan, Jiang R, et al. Reducing sentiment bias in language models via counterfactual evaluation[C] //Proc of the 2020 Conf on Empirical Methods in Natural Language Processing: Findings. Stroudsburg, PA: ACL, 2020: 65−83
    [152] Garg S, Perot V, Limtiaco N, et al. Counterfactual fairness in text classification through robustness[C] //Proc of the 2nd AAAI/ACM Conf on AI, Ethics, and Society. Menlo Park, CA: AAAI, 2019: 219−226

    [153] Hu Yaowei, Wu Yongkai, Zhang Lu, et al. Fair multiple decision making through soft interventions[C] //Proc of the 34th Int Conf on Neural Information Processing Systems. Red Hook, NY: Curran Associates Inc, 2020: 17965−17975

    [154] Goel N, Amayuelas A, Deshpande A, et al. The importance of modeling data missingness in algorithmic fairness: A causal perspective[C] //Proc of the 35th AAAI Conf on Artificial Intelligence. Palo Alto, CA: AAAI, 2021: 7564−7573

    [155] Xu Depeng, Wu Yongkai, Yuan Shuhan, et al. Achieving causal fairness through generative adversarial networks[C] //Proc of the 28th Int Joint Conf on Artificial Intelligence. Red Hook, NY: Curran Associates Inc, 2019: 1452−1458

    [156] Khademi A, Lee S, Foley D, et al. Fairness in algorithmic decision making: An excursion through the lens of causality[C] //Proc of the 28th World Wide Web Conf. New York: ACM, 2019: 2907−2914

    [157] Zhang Junzhe, Bareinboim E. Fairness in decision-making—The causal explanation formula[C] //Proc of the 32nd AAAI Conf on Artificial Intelligence. Palo Alto, CA: AAAI, 2018: 2037−2045

    [158] Zhang Junzhe, Bareinboim E. Equality of opportunity in classification: A causal approach[C] //Proc of the 32nd Int Conf on Neural Information Processing Systems. Red Hook, NY: Curran Associates Inc, 2018: 3671−3681

    [159] Wang Hao, Ustun B, Calmon F. Repairing without retraining: Avoiding disparate impact with counterfactual distributions[C] //Proc of the 36th Int Conf on Machine Learning. Cambridge, MA: JMLR, 2019: 6618−6627

    [160] Creager E, Madras D, Pitassi T, et al. Causal modeling for fairness in dynamical systems[C] //Proc of the 37th Int Conf on Machine Learning. Cambridge, MA: JMLR, 2020: 2185−2195

    [161] Swaminathan A, Joachims T. Batch learning from logged bandit feedback through counterfactual risk minimization[J]. The Journal of Machine Learning Research, 2015, 16(1): 1731−1755

    [162] Swaminathan A, Joachims T. The self-normalized estimator for counterfactual learning[C] //Proc of the 29th Int Conf on Neural Information Processing Systems. Red Hook, NY: Curran Associates Inc, 2015: 3231−3239

    [163] Wu Hang, Wang May. Variance regularized counterfactual risk minimization via variational divergence minimization[C] //Proc of the 35th Int Conf on Machine Learning. Cambridge, MA: JMLR, 2018: 5353−5362

    [164] London B, Sandler T. Bayesian counterfactual risk minimization[C] //Proc of the 36th Int Conf on Machine Learning. Cambridge, MA: JMLR, 2019: 4125−4133

    [165] Faury L, Tanielian U, Dohmatob E, et al. Distributionally robust counterfactual risk minimization[C] //Proc of the 34th AAAI Conf on Artificial Intelligence. Palo Alto, CA: AAAI, 2020: 3850−3857

    [166] Schnabel T, Swaminathan A, Singh A, et al. Recommendations as treatments: Debiasing learning and evaluation[C] //Proc of the 33rd Int Conf on Machine Learning. Cambridge, MA: JMLR, 2016: 1670−1679

    [167] Yang Longqi, Cui Yin, Xuan Yuan, et al. Unbiased offline recommender evaluation for missing-not-at-random implicit feedback[C] //Proc of the 12th ACM Conf on Recommender Systems. New York: ACM, 2018: 279−287

    [168] Bonner S, Vasile F. Causal embeddings for recommendation[C] //Proc of the 12th ACM Conf on Recommender Systems. New York: ACM, 2018: 104−112

    [169] Narita Y, Yasui S, Yata K. Efficient counterfactual learning from bandit feedback[C] //Proc of the 33rd AAAI Conf on Artificial Intelligence. Palo Alto, CA: AAAI, 2019: 4634−4641

    [170] Zou Hao, Cui Peng, Li Bo, et al. Counterfactual prediction for bundle treatment[C] //Proc of the 34th Int Conf on Neural Information Processing Systems. Red Hook, NY: Curran Associates Inc, 2020: 19705−19715

    [171] Xu Da, Ruan Chuanwei, Korpeoglu E, et al. Adversarial counterfactual learning and evaluation for recommender system[C] //Proc of the 34th Int Conf on Neural Information Processing Systems. Red Hook, NY: Curran Associates Inc, 2020: 13515−13526

    [172] Lopez R, Li Chenchen, Yan Xiang, et al. Cost-effective incentive allocation via structured counterfactual inference[C] //Proc of the 34th AAAI Conf on Artificial Intelligence. Palo Alto, CA: AAAI, 2020: 4997−5004

    [173] Joachims T, Swaminathan A, Schnabel T. Unbiased learning-to-rank with biased feedback[C] //Proc of the 10th ACM Int Conf on Web Search and Data Mining. New York: ACM, 2017: 781−789

    [174] Wang Xuanhui, Golbandi N, Bendersky M, et al. Position bias estimation for unbiased learning to rank in personal search[C] //Proc of the 11th ACM Int Conf on Web Search and Data Mining. New York: ACM, 2018: 610−618

    [175] Ai Qingyao, Bi Keping, Luo Cheng, et al. Unbiased learning to rank with unbiased propensity estimation[C] //Proc of the 41st Int ACM SIGIR Conf on Research and Development in Information Retrieval. New York: ACM, 2018: 385−394

    [176] Agarwal A, Takatsu K, Zaitsev I, et al. A general framework for counterfactual learning-to-rank[C] //Proc of the 42nd Int ACM SIGIR Conf on Research and Development in Information Retrieval. New York: ACM, 2019: 5−14

    [177] Jagerman R, de Rijke M. Accelerated convergence for counterfactual learning to rank[C] //Proc of the 43rd Int ACM SIGIR Conf on Research and Development in Information Retrieval. New York: ACM, 2020: 469−478

    [178] Vardasbi A, de Rijke M, Markov I. Cascade model-based propensity estimation for counterfactual learning to rank[C] //Proc of the 43rd Int ACM SIGIR Conf on Research and Development in Information Retrieval. New York: ACM, 2020: 2089−2092

    [179] Jagerman R, Oosterhuis H, de Rijke M. To model or to intervene: A comparison of counterfactual and online learning to rank from user interactions[C] //Proc of the 42nd Int ACM SIGIR Conf on Research and Development in Information Retrieval. New York: ACM, 2019: 15−24

    [180] Bottou L, Peters J, Quiñonero-Candela J, et al. Counterfactual reasoning and learning systems: The example of computational advertising[J]. The Journal of Machine Learning Research, 2013, 14(1): 3207−3260

    [181] Lawrence C, Riezler S. Improving a neural semantic parser by counterfactual learning from human bandit feedback[C] //Proc of the 56th Annual Meeting of the ACL, Vol 1. Stroudsburg, PA: ACL, 2018: 1820−1830

    [182] Bareinboim E, Forney A, Pearl J. Bandits with unobserved confounders: A causal approach[C] //Proc of the 29th Int Conf on Neural Information Processing Systems. Red Hook, NY: Curran Associates Inc, 2015: 1342−1350

    [183] Lee S, Bareinboim E. Structural causal bandits: Where to intervene?[C] //Proc of the 32nd Int Conf on Neural Information Processing Systems. Red Hook, NY: Curran Associates Inc, 2018: 2568−2578

    [184] Lee S, Bareinboim E. Structural causal bandits with non-manipulable variables[C] //Proc of the 33rd AAAI Conf on Artificial Intelligence. Palo Alto, CA: AAAI, 2019: 4164−4172

    [185] de Haan P, Jayaraman D, Levine S. Causal confusion in imitation learning[C] //Proc of the 33rd Int Conf on Neural Information Processing Systems. Red Hook, NY: Curran Associates Inc, 2019: 11698−11709

    [186] Kyono T, Zhang Yao, van der Schaar M. CASTLE: Regularization via auxiliary causal graph discovery[C] //Proc of the 34th Int Conf on Neural Information Processing Systems. Red Hook, NY: Curran Associates Inc, 2020: 1501−1512

    [187] Yang Mengyue, Liu Furui, Chen Zhitang, et al. CausalVAE: Disentangled representation learning via neural structural causal models[C] //Proc of the 34th IEEE/CVF Conf on Computer Vision and Pattern Recognition. Piscataway, NJ: IEEE, 2021: 9593−9602

    [188] Zinkevich M, Johanson M, Bowling M, et al. Regret minimization in games with incomplete information[C] //Proc of the 21st Int Conf on Neural Information Processing Systems. Red Hook, NY: Curran Associates Inc, 2007: 1729−1736

    [189] Brown N, Lerer A, Gross S, et al. Deep counterfactual regret minimization[C] //Proc of the 36th Int Conf on Machine Learning. Cambridge, MA: JMLR, 2019: 793−802

    [190] Farina G, Kroer C, Brown N, et al. Stable-predictive optimistic counterfactual regret minimization[C] //Proc of the 36th Int Conf on Machine Learning. Cambridge, MA: JMLR, 2019: 1853−1862

    [191] Brown N, Sandholm T. Solving imperfect-information games via discounted regret minimization[C] //Proc of the 33rd AAAI Conf on Artificial Intelligence. Palo Alto, CA: AAAI, 2019: 1829−1836

    [192] Li Hui, Hu Kailiang, Zhang Shaohua, et al. Double neural counterfactual regret minimization[C/OL] //Proc of the 8th Int Conf on Learning Representations. 2020 [2021-11-14]. https://openreview.net/pdf?id=ByedzkrKvH

    [193] Oberst M, Sontag D. Counterfactual off-policy evaluation with Gumbel-max structural causal models[C] //Proc of the 36th Int Conf on Machine Learning. Cambridge, MA: JMLR, 2019: 4881−4890

    [194]

    Buesing L, Weber T, Zwols Y, et al. Woulda, coulda, shoulda: Counterfactually-guided policy search[C/OL] //Proc of the 9th Int Conf on Learning Representations. 2019 [2021-11-14]. https://openreview.net/pdf?id=BJG0voC9YQ

    [195] Chen Long, Zhang Hanwang, Xiao Jun, et al. Counterfactual critic multi-agent training for scene graph generation[C] //Proc of the 2019 IEEE/CVF Int Conf on Computer Vision. Piscataway, NJ: IEEE, 2019: 4613−4623
    [196] Zhu Qingfu, Zhang Weinan, Liu Ting, et al. Counterfactual off-policy training for neural dialogue generation[C] //Proc of the 2020 Conf on Empirical Methods in Natural Language Processing. Stroudsburg, PA: ACL, 2020: 3438−3448
    [197] Choi S, Park H, Yeo J, et al. Less is more: Attention supervision with counterfactuals for text classification[C] //Proc of the 2020 Conf on Empirical Methods in Natural Language Processing. Stroudsburg, PA: ACL, 2020: 6695−6704
    [198]

    Zhang Zhu, Zhao Zhou, Lin Zhejie, et al. Counterfactual contrastive learning for weakly-supervised vision-language grounding[C] //Proc of the 34th Int Conf on Neural Information Processing Systems. Red Hook, NY: Curran Associates Inc, 2020: 655-666

    [199]

    Kocaoglu M, Snyder C, Dimakis A G, et al. CausalGAN: Learning causal implicit generative models with adversarial training[C] //Proc of the 6th Int Conf on Learning Representations, 2018 [2021-11-03]. https://openreview.net/pdf?id=BJE-4xW0W

    [200]

    Kim H, Shin S, Jang J H, et al. Counterfactual fairness with disentangled causal effect variational autoencoder[C] //Proc of the 35th AAAI Conf on Artificial Intelligence. Palo Alto, CA: AAAI, 2021: 8128−8136

    [201] Qin Lianhui, Bosselut A, Holtzman A, et al. Counterfactual story reasoning and generation[C] //Proc of the 2019 Conf on Empirical Methods in Natural Language Processing and the 9th Int Joint Conf on Natural Language Processing (EMNLP-IJCNLP). Stroudsburg, PA: ACL, 2019: 5046−5056
    [202]

    Hao Changying, Pang Liang, Lan Yanyan, et al. Sketch and customize: A counterfactual story generator[C] //Proc of the 35th AAAI Conf on Artificial Intelligence. Palo Alto, CA: AAAI, 2021: 12955−12962.

    [203]

    Madaan N, Padhi I, Panwar N, et al. Generate your counterfactuals: Towards controlled counterfactual generation for text[C] //Proc of the 35th AAAI Conf on Artificial Intelligence. Palo Alto, CA: AAAI, 2021: 13516−13524

    [204]

    Peysakhovich A, Kroer C, Lerer A. Robust multi-agent counterfactual prediction[C] //Proc of the 33rd Int Conf on Neural Information Processing Systems. Red Hook, NY: Curran Associates Inc, 2019: 3083−3093

    [205]

    Baradel F, Neverova N, Mille J, et al. CoPhy: Counterfactual learning of physical dynamics[C/OL] //Proc of the 8th Int Conf on Learning Representations. 2020 [2021-11-14]. https://openreview.net/pdf?id=SkeyppEFvS


Publication history
  • Received: 2021-07-22
  • Revised: 2021-11-14
  • Published online: 2023-02-10
  • Issue date: 2022-12-31
