Survey on Key Technologies of Graph Processing Systems Based on Multi-core CPU and GPU Platforms
-
摘要:
图计算作为分析与挖掘关联关系的一种关键技术,已在智慧医疗、社交网络分析、金融反欺诈、地图道路规划、计算科学等领域广泛应用. 当前,通用CPU与GPU架构的并行结构、访存结构、互连结构及同步机制的不断发展,使得多核CPU与GPU成为图处理加速的常用平台. 但由于图处理具有处理数据规模大、数据依赖复杂、访存计算比高等特性,加之现实应用场景下的图数据分布不规则且图中的顶点与边呈现动态变化,给图处理的性能提升和高可扩展性带来严峻挑战. 为应对上述挑战,大量基于多核CPU与GPU平台的图处理系统被提出,并在该领域取得显著成果. 为了让读者了解多核CPU与GPU平台上图处理优化相关技术的演化,首先剖析了图数据、图算法、图应用特性,并阐明图处理所面临的挑战. 然后分类梳理了当前已有的基于多核CPU与GPU平台的图处理系统,并从加速图处理设计的角度,详细、系统地总结了关键优化技术,包括图数据预处理、访存优化、计算加速和数据通信优化等. 最后对已有先进图处理系统的性能、可扩展性等进行分析,并从不同角度对图处理未来发展趋势进行展望,希望对从事图处理系统研究的学者有一定的启发.
Abstract:As a key technique for analyzing and mining relationships, graph computing has been widely used in smart healthcare, social network analysis, financial anti-fraud, road navigation, computational sciences, and others. In recent years, with the development of parallel structures, memory access methods, interconnection structures, and synchronization mechanisms of general-purpose CPU and GPU architecture, multi-core CPU and GPU have become common platforms for accelerating graph processing. However, there are significant challenges to graph processing in terms of improving performance and achieving high scalability, such as the large scale and irregular distribution of data, complex data dependence, high communication-to-computation ratio, and dynamic changes of vertices and edges. To address the above challenges, a large number of graph processing systems based on multi-core CPU and GPU platforms are proposed and achieve good results. To provide readers with a comprehensive understanding of graph processing optimizations on multi-core CPU and GPU platforms, we first present the basic concepts, including the graph data structures, the typical graph algorithms, and the characteristics of graph application. Then the challenges of graph processing are clarified. After that, we present the existing graph processing systems based on multi-core CPU and GPU platforms. From the perspective of accelerated graph processing design, we summarize the key technology optimizations in detail and systematically, including pre-processing, memory access optimization, computing acceleration, and overhead reduction of data communication. Finally, we analyze the performance and scalability of state-of-art graph processing and conclude the future development trend of graph processing based on different perspectives, which expects to bring certain inspiration to relevant researchers to explore high-performance graph processing system.
-
物联网技术的发展和普及使得现代生活环境更加友好和便捷,对人们的生活方式产生了重要影响. 物联网将电子设备与互联网连接,能够使互联网连接对象使用嵌入式传感器、射频识别、激光扫描器等信息传感设备进行数据的采集和交换. 通过网络接入,实现物与物、物与人的连接,从而达到智能化感知、识别和管理. 由于采用无线网络通信,使得物联网更容易遭受各种攻击. 身份认证技术能避免非授权用户非法访问数据,为物联网数据提供了可靠的安全保障. 数字签名能提供数据的完整性和真实性,并且可以实现签名者身份认证功能. 在签名时通常涉及椭圆曲线群中的指数运算或配对运算,这些运算往往计算代价高昂,使得轻量级设备无法承受. 在线/离线签名(online/offline signature, OOS)[1]方案将签名过程分为在线和离线部分,在未知消息之前,通过将高昂的计算分配给离线阶段而仅保留一些轻量级计算给在线阶段的方式,使得原本无法部署在轻量级设备的签名方案也能适用于物联网环境. 传统的公钥密码体制不具有细粒度访问控制功能. 为了解决这一问题,2005年,Sahai等人[2]提出模糊身份加密方案,定义了属性基密码概念. 在属性基加密方案[3-5]中,用户的身份由一个属性集表示,密文或者密钥与一个访问策略相关联,当用户的属性满足指定的访问策略时,用户可以成功解密密文. 属性基加密体制不仅实现了细粒度访问控制而且具有一对多加密功能,可以很好地解决云计算中的访问控制、数据安全和隐私保护等关键技术问题,得到了国内外学术界和工业界的高度重视. 过去,国内商用密码方案大都是基于国外的密码技术标准,不符合国家网络空间安全自主可控的发展战略. SM9[6]是中国商用标识密码标准,包含数字签名、加密、密钥交换和密钥封装. 2018年,SM9 标识签名算法已被采纳为国际标准ISO/IEC的一部分,并于2020年成为中国国家标准GM/T 38635.2—2020. 2020年,SM9标识算法成为国际标准. 由于SM9 标识密码算法是通过椭圆曲线上的配对运算实现的,因此高昂的计算代价成为了其在轻量级设备中应用的瓶颈. 同时,原始的SM9 标识密码算法无法高效地实现1对多的数据共享和访问控制功能. 综上所述,如何将SM9 标识密码算法应用于轻量级设备和实现1对多的数据共享和访问控制是亟待解决并具有挑战的研究方向.
1. 相关工作
属性基密码主要包括属性基加密和属性基签名.在属性基加密方案中,根据访问策略部署的位置不同可分为密文策略[7-9]和密钥策略[10]的属性基加密. 密文策略的属性基加密方案中,访问策略与密文关联,属性集与密钥关联. 当属性集满足密文中的访问策略时,用户可以正确解密. 在密钥策略的属性基加密方案中,访问策略与密钥关联,属性集嵌入在密文中,当且仅当密文中的属性集满足密钥中的访问策略时,用户可以正确解密.
属性基加密方案提供了数据的细粒度访问控制功能并且实现了保密性,但无法提供数据完整性和认证性. 2011年,Maji等人[11]提出了属性基签名方案,实现了细粒度访问控制并能够确保数据的完整性和认证性,在一般群模型中证明了方案的安全性. 2012年,Okamoto等人[12]提出支持非单调访问策略的高效属性基签名(attribute-based signature, ABS)方案,并在标准模型下给出了方案的安全性证明. 在文献[11−12]中,签名过程需要使用群中的指数运算和配对运算,这些计算代价高昂,使得方案无法适用于轻量级设备. 为了降低签名算法中高昂的计算代价,1989年,Even等人[1]提出在线/离线签名方案,在线/离线签名通过将签名算法分为在线、离线阶段,在知道签名消息之前,将高昂的计算在离线阶段完成. 而在在线阶段运行一些轻量级计算. 2010年,Liu等人[13]提出基于身份的在线/离线签名方案,并给出了方案的安全性证明. 为了解决文献[13]中密钥托管问题,2017年,Liu等人[14]基于无证书思想提出无密钥托管的身份基在线/离线签名方案. 基于身份的在线/离线签名方案虽然降低了在线签名的计算代价,但无法实现访问控制和用户身份隐私保护. 为了解决上述问题,Rao[15]在2017年提出了属性基在线/离线签密方案,利用属性集代替用户身份,只有当用户属性集满足访问策略时才能产生有效签名. 为了降低签名客户端的计算代价,张应辉等人[16]提出可验证的服务器辅助属性基签名方案. 2021年,Li等人[17-18]利用服务器辅助技术,提出了具有服务器辅助的属性基签名方案,将大量计算外包给服务器从而降低了计算开销. 为了解决属性基签名方案中的用户身份追踪和敏感消息隐藏问题,2021年,李继国等人[19]提出可追踪的属性基净化签名方案,实现了恶意签名者身份追踪和数据脱敏. 目前,大多数密码方案基于国外密码技术标准设计,不符合国家网络空间安全自主可控的发展战略. SM9[6]是中国商用标识密码标准,包含数字签名、加密和密钥交换. 2018年,Cheng等人[20]分析了SM9加密算法和密钥协商协议的安全性,并给出了安全性证明. 由于SM9标识密码算法是通过椭圆曲线上的指数运算和配对运算实现,因此计算代价高昂,无法在轻量级设备上使用. 为了提高SM9标识签名计算性能,王松等人[21]提出了SM9标识签名及其验证算法快速实现方案,但无法降低签名过程中的计算代价. 2021年,Lai等人[22]利用在线/离线签名技术,提出了基于商密SM9的在线/离线签名方案,在知道签名消息之前,它将指数运算和配对运算在离线阶段完成,而在线阶段只运行Hash运算或乘法运算,从而有效降低了在线阶段的计算代价. 为了实现1对多的数据通信,2021年,文献[23]提出了基于商密SM9的高效标识广播加密方案. 文献[17−23]仅实现签名或加密功能,为了同时实现数字签名和数据加密,文献[24]提出了基于商密SM9的高效标识签密,仅执行1次操作就能完成签名和加密计算,有效降低了计算开销.
本文贡献主要有:
本文首次提出基于商密SM9的属性基在线/离线签名方案(attribute-based online/offline signature, ABOOS),提出的方案不仅可以获得细粒度访问控制功能,并且通过将指数运算和配对运算在离线阶段完成,降低了在线阶段的计算代价,能够适用于物联网等应用环境. 本文在随机谕言机模型下证明了ABOOS的安全性,安全性可规约到q-SDH困难问题假设. 通过Charm平台,实例化提出的方案并给出性能分析.
2. 预备知识
本节介绍文中用到的相关知识,包括双线性映射、分叉引理、q-SDH困难问题.
2.1 双线性映射
给定安全参数
κ ,生成1个双线性元组BP=(G1,G2,GT,e,p) . 其中G1 ,G2 ,GT 是素数p 阶的循环群. 令P 是G1 的1个生成元,Q 是G2 的1个生成元,1个双线性映射e:G1×G2→GT 具有3个性质.1)双线性. 对任意
P∈G1 ,Q∈G2 和任意a,b∈Z∗p ,有e(aP,bQ)=e(P,Q)ab .2)非退化性. 对任意
P∈G1 ,Q∈G2 ,有e(P,Q)≠1 .3)可计算性. 对任意
P∈G1 ,Q∈G2 ,e(P,Q) 可以被有效计算.此外,在
G1 和G2 之间存在1个有效且能公开计算的同构映射ψ:G2→G1 ,即ψ(Q)=P ,其中P ,Q 分别是G1 ,G2 的生成元.2.2 q-SDH(q-strong Diffie-Hellman)困难问题和困难问题假设
q-SDH困难问题. 令
P ,Q 分别为G1 ,G2 的生成元,在(G1,G2) 群上的q-SDH问题可表述为:给定q+2 个元素的元组(P,Q,aQ,a2Q,…,aqQ) ,找到1对元素(c,1c+aP) ,其中c∈Z∗p .(t,ε)− q-SDH困难问题假设:若不存在概率多项式时间t 的算法至少以不可忽略的概率ε 解决(G1,G2) 上的q-SDH问题,则称q-SDH问题在(G1,G2) 是(t,ε) 困难的.2.3 分叉引理
本文使用分叉引理[25]证明方案的安全性. 为了便于描述,简单回顾如下. 令
T 为一个输入仅包含公共信息的概率多项式时间的图灵机,在进行qs 次签名询问和qh 次随机谕言机询问后,敌手A可以在概率多项式时间t 内,以ε≥10(qs+1)(qs+qh)/2κ 的概率产生1个有效消息签名元组(M,σ1,h,σ2) ,其中M 表示消息,h 是关于元组(M,σ1) 的Hash值,σ2 表示仅与M,σ1,h 相关的值. 若元组(σ1,h,σ2) 能够在不知道签名密钥的情况下以不可区分的分布概率进行模拟,那么就存在另一台概率多项式时间的图灵机T′ 能够在理想时间t′≤120686qht/ε 时,通过控制敌手A模拟替换与签名者的交互并产生2个有效消息签名元组(M,σ1,h,σ2) 和(M,σ1,h′,σ′2) ,使得h≠h′ .3. 形式化定义和安全模型
本节给出ABOOS方案的形式化定义和安全模型. 图1给出了方案的系统框架,其中包括3个实体:属性授权机构、轻量级签名设备和验证设备. 属性授权机构为签名设备产生密钥
skωj ,签名设备在未知消息之前先通过离线阶段进行预计算产生离线签名σoff ,然后再通过在线阶段产生在线签名σon ,最终将在线签名σon 和消息M发送给验证设备. 若签名有效,验证设备返回accept;否则,返回reject.3.1 ABOOS方案的形式化定义
在线/离线属性基签名方案由4个阶段构成.
1)设置. 该算法输入安全参数
κ ,输出公开参数params 和主密钥msk . 属性授权机构保留主密钥msk .2)密钥生成. 算法输入公开参数
params 、主密钥msk 和访问策略A ,输出签名密钥skωj .3)签名. 该阶段分为离线签名和在线签名2个阶段.具体算法为:
①离线签名. 算法输入公开参数
params 和签名者密钥skωj ,输出离线签名σoff ;②在线签名. 算法输入公开参数
params 、签名者属性集ωj 、离线签名σoff 、签名者密钥skωj 和消息M ,输出在线签名σon .4)验证. 算法输入公开参数
params ,消息M 和在线签名σon . 若签名有效,则输出accept;否则,输出reject.3.2 安全模型
借鉴文献[22]的思想,本节定义在选择消息和选择策略下的存在不可伪造游戏,算法B和敌手A的具体交互为:
初始化. A首先声明将要挑战的访问策略
A∗ 和A∗ 中的一个属性集ω∗j 发送给B.设置. B执行设置算法,输入安全参数
κ ,输出公开参数params 和主密钥msk ,将公开参数params 发送给A,自己保留主密钥msk .密钥询问. A询问关于访问策略
A 和属性集ωj 的密钥,其中ωj∉A∗ . B执行密钥生成算法生成密钥skωj 并发送给A.签名询问. A询问关于属性集
ωj 和消息M 的签名,B首先执行离线签名算法生成离线签名σoff ,再执行在线签名算法生成在线签名σon . 最后,将在线签名σon 发送给A.伪造. A输出一个元组
(A∗,ω∗j,M∗,σ∗) ,若满足3个条件,则称A赢得不可伪造性游戏.1)
σ∗ 是关于(A∗,ω∗j,M∗) 的1个有效签名;2)A没有询问过关于
A∗ 和ω∗j 的密钥,即在密钥生成询问中A≠A∗ 且ωj∉A∗ ;3) A没有询问过关于
(A∗,ω∗j,M∗) 的签名.定义 1. 如果对于所有概率多项式时间
t 的敌手进行至多qk 次密钥生成询问和至多qs 次签名询问,它赢得上述不可伪造性游戏的概率ϵ 是可忽略的,那么提出的方案在选择消息和策略下具有存在性不可伪造.4. 方案构造
借鉴文献[26]的思想,可以从基于身份(identity based signature, IBS)方案构造ABS方案,再利用原始的SM9标识签名方案构造基于商密SM9的属性基在线/离线签名方案.在方案中,令系统属性域为
U=\{{a\mathrm{t}\mathrm{t}}_{1},{att}_{2},… ,{att}_{u}\} ,访问策略\mathcal{A}=\{{\mathcal{A}}_{1},{\mathcal{A}}_{2},… ,{\mathcal{A}}_{n}\} 表示1组授权属性集,其中{\mathcal{A}}_{i}=\{{att}_{i1},{att}_{i2},… ,{att}_{i{s}_{i}}\} ,{\omega }_{j}=\{{att}_{j1},{att}_{j2},… ,{att}_{jd}\} 表示签名者属性集,其中u=\left|U\right| ,n=\left|\mathcal{A}\right| ,1\le {s}_{i}\le u 且1\le d\le u . 若{\omega }_{j}\in \mathcal{A} ,称属性集{\omega }_{j} 满足访问策略\mathcal{A} . 设置算法\phi (\omega ,U) 将属性集\omega 转换为1个二进制标识{ID}_{\omega } ,\phi (\omega ,U) 定义为:首先输入属性集\omega ,系统属性域U ,令{ID}_{\omega }\left[i\right] 表示{ID}_{\omega } 的第i 位,若{att}_{i}\in \omega ,令{ID}_{\omega }\left[i\right]=1 ;否则,令{ID}_{\omega }\left[i\right]=0 . 其中1\le i\le u ,u=\left|U\right| . 然后算法输出{ID}_{\omega } . 最后调用算法\phi (\omega ,U) 将访问策略\mathcal{A}=\{{\mathcal{A}}_{1},{\mathcal{A}}_{2},… ,{\mathcal{A}}_{n}\} 转换成一个二进制标识集合I=\{{ID}_{{\mathcal{A}}_{1}},{ID}_{{\mathcal{A}}_{2}},… ,{ID}_{{\mathcal{A}}_{n}}\} .ABOOS方案包含5个算法:设置、密钥生成、在线签名、离线签名和验证.
1)设置. 给定安全参数
\kappa ,属性授权机构生成1个双线性配对元组BP=({G}_{1},{G}_{2},{G}_{T},e,p) . 随机选取{G}_{1} 的生成元{P}_{1} ,{G}_{2} 的生成元{P}_{2} . 然后,随机选取\mathrm{\alpha }\in {\mathbb{Z}}_{p}^{*} 作为主密钥,计算{P}_{\mathrm{p}\mathrm{u}\mathrm{b}}=\alpha {P}_{2} ,g=e({P}_{1},{P}_{\mathrm{p}\mathrm{u}\mathrm{b}}) . 系统属性域U= \{{att}_{1}, {att}_{2},… ,{att}_{u}\} ,u=\left|U\right| . 选取2个Hash函数{H}_{1}: {\left\{\mathrm{0,1}\right\}}^{*}\to {\mathbb{Z}}_{p}^{*} ,{H}_{2}:{\left\{\mathrm{0,1}\right\}}^{*}\to {\mathbb{Z}}_{p}^{*} . 属性授权机构随机选取1比特密钥生成函数标识hid ,则公开参数为params= (BP,{P}_{1},{P}_{2},{P}_{\mathrm{p}\mathrm{u}\mathrm{b}},g,U,hid,{H}_{1},{H}_{2}) .2)密钥生成. 属性授权机构为属性集
{\omega }_{j} 产生密钥{sk}_{{\omega }_{j}} . 利用算法\phi (\omega ,U) 将访问策略\mathcal{A}=\{{\mathcal{A}}_{1}, {\mathcal{A}}_{2},… , {\mathcal{A}}_{n}\} 转换成二进制标识集合I=\{{ID}_{{\mathcal{A}}_{1}},{ID}_{{\mathcal{A}}_{2}},… , {ID}_{{\mathcal{A}}_{n}}\} . 算法输入访问策略\mathcal{A} 、集合I 、属性集{\omega }_{j} 、主密钥msk 和公开参数params . 属性授权机构随机选取{r}_{\mathrm{s}}\in {\mathbb{Z}}_{p}^{*} ,计算密钥{sk}_{{\omega }_{j}}=({sk}_{1},{sk}_{2}) ,其中{sk}_{1}=\dfrac{\alpha {r}_{\mathrm{s}}^{-1}}{\left[\dfrac{\prod\limits _{i=1}^{n}{H}_{1}\left({ID}_{{\mathcal{A}}_{i}}||hid,{r}_{\mathrm{s}}\right)}{\prod\limits_{{\mathcal{A}}_{i}\ne {\omega }_{j},i\in \left[1,n\right]}{H}_{1}\left({ID}_{{\mathcal{A}}_{i}}||hid,{r}_{\mathrm{s}}\right)}+\alpha \right]}{P}_{1} ,{sk}_{2}={r}_{\mathrm{s}} . 此时将{sk}_{1} ,{sk}_{2} 相乘可知,密钥{sk}_{{\omega }_{j}} 满足商密SM9标识签名算法密钥结构. 若\dfrac{\prod\limits_{i=1}^{n}{H}_{1}\left({ID}_{{\mathcal{A}}_{i}}||hid,{r}_{\mathrm{s}}\right)}{\prod\limits_{{\mathcal{A}}_{i}\ne {\omega }_{j},i\in \left[1,n\right]}{H}_{1}\left({ID}_{{\mathcal{A}}_{i}}||hid,{r}_{\mathrm{s}}\right)}+\alpha =0 ,属性授权机构重新选取随机数{r}_{\mathrm{s}}\in {\mathbb{Z}}_{p}^{*} ,并重新为签名者生成密钥.3)离线签名. 算法输入公开参数
params ,签名者密钥{sk}_{{\omega }_{j}}=({sk}_{1},{sk}_{2}) . 签名者随机选取r,k\in {\mathbb{Z}}_{p}^{*} ,计算w={g}^{r} ,l={sk}_{2}\times \left(r-k\right)\mathrm{m}\mathrm{o}\mathrm{d}p ,S=l\times {sk}_{1} . 输出离线签名{\sigma }_{\mathrm{o}\mathrm{f}\mathrm{f}}=(r,k,w,S) .4)在线签名. 算法输入公开参数
params ,{\sigma }_{\mathrm{o}\mathrm{f}\mathrm{f}}= (r,k,w,S) 和消息M .签名者利用算法\varphi (\omega ,U) 将属性集{\omega }_{j} 转换成一个二进制标识{ID}_{{\omega }_{j}} . 计算h={H}_{2}\left(M\right||w,p) ,\tau =\left(r-h\right){\left(r-k\right)}^{-1}\mathrm{m}\mathrm{o}\mathrm{d}p ,y={H}_{1}\left({ID}_{{\omega }_{j}}||hid, {sk}_{2}\right) . 输出在线签名{\sigma }_{\mathrm{o}\mathrm{n}}=(h,\tau ,y,S) .5)验证. 当验证者获得消息签名对
(M,{\sigma }_{\mathrm{o}\mathrm{n}})=(M, (h,\tau ,y,S\left)\right) ,可以通过执行算法验证签名. 1)计算t={g}^{h} ;2)计算P=y{P}_{2}+{P}_{\mathrm{p}\mathrm{u}\mathrm{b}} ;3)计算\beta =e(\tau \cdot S,P) ;4)计算{w}'= \beta \cdot t ;5)计算{h}_{2}={H}_{2}\left(M\right||{w}',p) . 检查等式{h}_{2}=h 是否成立. 若成立,则签名有效,输出accept;否则,输出reject.5. 正确性和安全性分析
5.1 正确性分析
若签名者产生1个有效签名,那么该签名可以通过验证算法. 正确性分析为:
当用户属性集
{\omega }_{j} 满足给定的访问策略\mathcal{A} ,即{\omega }_{j}\in \mathcal{A} 时,可以得到{ID}_{{\omega }_{j}}\in \{{ID}_{{\mathcal{A}}_{1}},{ID}_{{\mathcal{A}}_{2}},… ,{ID}_{{\mathcal{A}}_{n}}\} ,则有等式成立.\begin{aligned} &{w}'=\beta \times t=e\left(\tau \times S,P\right)\times {g}^{h}=e\Bigg((r-h)\times\\ & \quad \dfrac{\alpha {r}_{\mathrm{s}}^{-1}\left(r-k\right){\times \left(r-k\right)}^{-1}{r}_{\mathrm{s}}}{\left[\dfrac{\prod\limits_{i=1}^{n}{H}_{1}\left({ID}_{{\mathcal{A}}_{i}}||hid,{r}_{\mathrm{s}}\right)}{\prod\limits_{{\mathcal{A}}_{i}\ne {\omega }_{j},i\in \left[1,n\right]}{H}_{1}\left({ID}_{{\mathcal{A}}_{i}}||hid,{r}_{\mathrm{s}}\right)}+\alpha \right]}{P}_{1},\\ &{H}_{1}\left({ID}_{{\omega }_{j}}||hid,{r}_{\mathrm{s}}\right){P}_{2}+\alpha {P}_{2}\Bigg)\times {e\left({P}_{1},{P}_{\mathrm{p}\mathrm{u}\mathrm{b}}\right)}^{h}=\\ &\quad e\Bigg(\dfrac{\alpha (r-h)}{{H}_{1}\left({ID}_{{\omega }_{j}}||hid,{r}_{\mathrm{s}}\right)+\alpha }{P}_{1},\\ &\left({H}_{1}\left({ID}_{{\omega }_{j}}||hid,{r}_{\mathrm{s}}\right) +\alpha \right){P}_{2}\Bigg)\times {e\left({P}_{1},\alpha {P}_{2}\right)}^{h}=\\ &\quad e\left(\alpha \left(r-h\right){P}_{1},{P}_{2}\right)\times {e\left({P}_{1},\alpha {P}_{2}\right)}^{h}=\\ &\quad e(\alpha r{P}_{1},{P}_{2})= w . \end{aligned} 因此
{h}_{2}={H}_{2}\left(M\right|\left|{w}',p\right)={H}_{2}\left(M\right|\left|w,p\right)=h .所以提出的方案满足正确性要求.5.2 安全性分析
本节给出ABOOS方案的安全性证明,基于q-SDH困难问题假设提出的方案在选择消息和选择策略下具有存在性不可伪造.
定理 1. 令
{H}_{1} ,{H}_{2} 是随机谕言机.若存在概率多项式时间的敌手A能够以\beta 的优势赢得不可伪造性游戏,则存在一个概率多项式时间的算法B能够以\varepsilon 的概率解决q-SDH困难问题,其中\varepsilon \le \dfrac{\beta }{{q}_{{H}_{1}}} ,{q}_{{H}_{1}} 是{H}_{1} 询问次数.证明. 通过A和B的交互游戏证明定理1.给定1个q-SDH困难问题实例
(P,Q,aQ,{a}^{2}Q,… ,{a}^{q}Q) ,B的目标是通过与A的交互找到元组\left(c,\dfrac{1}{c+a}P\right) ,其中c\in {\mathbb{Z}}_{p}^{*} . A与B的交互游戏为:初始化. A选择将要挑战的访问策略
{\mathcal{A}}^{\mathcal{*}} 和{\mathcal{A}}^{\mathcal{*}} 中的1个属性集{\omega }_{j}^{*} 并发送给B,B通过算法\varphi (\omega ,U) 将{\mathcal{A}}^{\mathcal{*}} ,{\omega }_{j}^{*} 分别转换为二进制标识集合{I}^{\mathrm{*}}=\{{ID}_{{\mathcal{A}}_{1}^{\mathrm{*}}},{ID}_{{\mathcal{A}}_{2}^{\mathrm{*}}},… , {ID}_{{\mathcal{A}}_{n}^{\mathrm{*}}}\} 和二进制标识{ID}_{{\omega }_{j}^{\mathrm{*}}} .设置. B产生公开参数. 首先令
{\mathcal{A}}_{k}=\{{\mathcal{A}}_{k1}, {\mathcal{A}}_{k2},… , {\mathcal{A}}_{kn}\} 表示访问策略,{\omega }_{j} 表示用户属性集. 随机选取{s}_{1}^{\mathrm{*}},{s}_{2}^{\mathrm{*}},… ,{s}_{n}^{\mathrm{*}},{s}_{i1},{s}_{i2},… ,{s}_{in}\in {\mathbb{Z}}_{p}^{\mathrm{*}} ,其中i\in \{\mathrm{0,1},… ,{q}_{\mathrm{s}}-1\} . 令{x}_{i}=\dfrac{\prod\limits_{j=1}^{n}{s}_{kj}}{\prod\limits_{{\mathcal{A}}_{kj}\ne {\omega }_{j},j\in [1,n]}{s}_{kj}} ,{x}^{\mathrm{*}}=\dfrac{\prod\limits_{i=1}^{n}{s}_{i}^{\mathrm{*}}}{\prod\limits _{{\mathcal{A}}_{i}^{\mathrm{*}}\ne {\omega }_{j}^{\mathrm{*}},i\in [1,n]}{s}_{i}^{\mathrm{*}}} . 生成一个q-1 阶多项式f\left(z\right)=\prod\limits_{i=1}^{q-1}\left(z+{x}_{i}\right)=\displaystyle\sum _{i=0}^{q-1}{c}_{i}{z}^{i} ,其中{c}_{i} 是f\left(z\right) 的系数. 利用给定的q-SDH困难问题实例计算{P}_{2}= f\left(a\right)Q= \displaystyle\sum_{i=0}^{q-1}{c}_{i}\left({a}^{i}Q\right) . 计算{P}_{1}=\varphi \left({P}_{2}\right)=f\left(a\right) P ,{P}_{\mathrm{p}\mathrm{u}\mathrm{b}}= \displaystyle\sum _{i=0}^{q}{c}_{i-1}\left({a}^{i}Q\right)= a{P}_{2}\mathrm{和}g=e({P}_{1},{P}_{\mathrm{p}\mathrm{u}\mathrm{b}}) ;当i\in [1, q-1] 时,令{f}_{i}\left(z\right)=\dfrac{f\left(z\right)}{z+{x}_{i}}=\displaystyle\sum _{i=0}^{q-2}{d}_{i}{z}^{i} ,计算{V}_{i}=\displaystyle\sum_{i=0}^{q-2}{d}_{i}\varphi \left({a}^{i+1}Q\right)= a{f}_{i}\left(a\right) P= \dfrac{af\left(a\right)}{a+{x}_{i}}P=\dfrac{a}{a+{x}_{i}}{P}_{1} . 因此对于任意i\in [1, q-1] ,元组\left({x}_{i},{V}_{i}=\dfrac{a}{a+{x}_{i}}{P}_{1}\right) 是可计算的. 随机选取1比特密钥生成函数标识hid . 令主密钥msk=a ,公开参数params= ({P}_{1},{P}_{2},g,hid,{P}_{\mathrm{p}\mathrm{u}\mathrm{b}}) . 此时,\mathrm{H}\mathrm{a}\mathrm{s}\mathrm{h}\mathrm{函}\mathrm{数}{H}_{1},{H}_{2} 被看成是由B控制的随机谕言机.询问. 在询问阶段,A可以进行
{H}_{1} 询问、{H}_{2} 询问、密钥询问和签名询问. 具体过程为:{H}_{1} 询问. A首先利用算法\varphi (\omega ,U) 将访问策略{\mathcal{A}}_{k}=\{{\mathcal{A}}_{k1},{\mathcal{A}}_{k2},… ,{\mathcal{A}}_{kn}\} 转换为二进制标识集合I= \{{ID}_{{\mathcal{A}}_{k1}},{ID}_{{\mathcal{A}}_{k2}},… ,{ID}_{{\mathcal{A}}_{kn}}\} ,其中k\in \{\mathrm{1,2},… ,{q}_{\mathrm{s}}-1\} . A询问关于{ID}_{{\mathcal{A}}_{ki}} 的Hash值{s}_{ki} . B初始化1个空列表{L}_{1} 并记录相关应答({ID}_{{\mathcal{A}}_{ki}},{s}_{ki}) .若关于{ID}_{{\mathcal{A}}_{ki}} 的询问记录已经在列表{L}_{1} 中,B直接返回{H}_{1}\left({ID}_{{\mathcal{A}}_{ki}}||hid,{r}_{\mathrm{s}}\right)={s}_{ki} ;若{ID}_{{\mathcal{A}}_{ki}} 是一个未经询问的二进制标识,此时令{H}_{1}\left({ID}_{{\mathcal{A}}_{ki}}||hid, {r}_{s}\right)= {s}_{ki} 并将({ID}_{{\mathcal{A}}_{ki}},{s}_{ki}) 添加到列表{L}_{1} 中;若{\mathcal{A}}_{k}={\mathcal{A}}^{\mathcal{*}} ,令{H}_{1} \left({ID}_{{\mathcal{A}}_{i}^{*}}||hid,{r}_{\mathrm{s}}\right)={s}_{i}^{*} . 然后,B将询问结果发送给A.{H}_{2} 询问. A询问关于({M}_{i},{w}_{i}) 的Hash值{h}_{i} . B初始化1个空列表{L}_{2} 并记录相关应答(M,{w}_{i},{h}_{i}) . 若关于({M}_{i},{w}_{i}) 的询问记录已经在列表中,B直接返回 {H}_{2}\left({M}_{i}\right|\left|{w}_{i},p\right)={h}_{i} ;否则,B随机选取{h}_{i}\in {\mathbb{Z}}_{p}^{\mathrm{*}} ,令{H}_{2} \left({M}_{i}\right|\left|{w}_{i},p\right)={h}_{i} ,并将({M}_{i},{w}_{i},{h}_{i}) 添加到列表{L}_{2} 中. 然后,B将询问结果发送给A.密钥询问. A询问关于访问策略
{\mathcal{A}}_{k}=\{{\mathcal{A}}_{k1}, {\mathcal{A}}_{k2},… , {\mathcal{A}}_{kn}\} 和满足{\mathcal{A}}_{k} 的1个属性集{\omega }_{j} 的密钥,此时满足{\mathcal{A}}_{k}\ne {\mathcal{A}}^{\mathcal{*}} ,{ID}_{{\omega }_{j}}\in \{{ID}_{{\mathcal{A}}_{k1}},{ID}_{{\mathcal{A}}_{k2}},… ,{ID}_{{\mathcal{A}}_{kn}}\} 且{ID}_{{\omega }_{j}}\notin \{{ID}_{{\mathcal{A}}_{1}^{\mathrm{*}}}, {ID}_{{\mathcal{A}}_{2}^{\mathrm{*}}},… , {ID}_{{\mathcal{A}}_{n}^{\mathrm{*}}}\} . B询问{H}_{1} 预言机获得({ID}_{{\omega }_{j}},{s}_{{\omega }_{j}}) . 在设置阶段有{x}_{i}=\dfrac{\prod\limits_{j=1}^{n}{s}_{kj}}{\prod\limits _{{\mathcal{A}}_{kj}\ne {\omega }_{j},j\in [1,n]}{s}_{kj}} . 随机选取{r}_{\mathrm{s}}\in {\mathbb{Z}}_{p}^{\mathrm{*}} ,计算{sk}_{1}={r}_{\mathrm{s}}^{-1}{V}_{i} ,{sk}_{2}={r}_{\mathrm{s}} 并发送给A,其中{V}_{i}=\dfrac{a}{a+{x}_{i}}{P}_{1} 在设置阶段被计算.签名询问. A询问关于访问策略
{\mathcal{A}}_{k}=\{{\mathcal{A}}_{k1},{\mathcal{A}}_{k2},… , {\mathcal{A}}_{kn}\} ,属性集{\omega }_{j} 和消息M 的签名\sigma . B首先询问{H}_{1} 预言机获得{s}_{ki} 并计算P={s}_{ki}{P}_{2}+{P}_{\mathrm{p}\mathrm{u}\mathrm{b}} . 然后,B随机选取h,\tau \in {\mathbb{Z}}_{p}^{\mathrm{*}} ,S\in {G}_{1} 并计算w=e(\tau \cdot S,P){g}^{h} . 最后,B将(M,w) 添加到列表{L}_{2} 中. 此时,设置{H}_{2}\left(M||w,p\right)=h . 需要注意的是,当{H}_{2}\left(M\right|\left|w,p\right) 已经在记录在列表{L}_{2} 中,那么B就无法正确模拟签名,此时B就终止游戏.这个事件发生的概率为\dfrac{{q}_{s}+{q}_{{H}_{2}}}{{2}^{\kappa }} ,其中{q}_{s} 是签名询问次数,{q}_{{H}_{2}} 是{H}_{2} 询问次数.伪造. A输出关于
{M}^{\mathrm{*}} ,{\mathcal{A}}^{\mathcal{*}} ,{\omega }_{j}^{\mathrm{*}} 的1个有效签名{\sigma }^{\mathrm{*}} . 由文献[25]定义的分叉引理可知,给定一个输入(mpk,{\mathcal{A}}^{\mathcal{*}},{\omega }_{j}^{\mathrm{*}}) ,可以构造另一个算法A'以足够多的次数重放A,并获得2个有效签名({M}^{\mathrm{*}},{w}^{\mathrm{*}},{y}^{\mathrm{*}},{h}_{1}^{\mathrm{*}},{\tau }_{1}^{\mathrm{*}},{S}_{1}^{\mathrm{*}}) 和({M}^{\mathrm{*}},{w}^{\mathrm{*}},{y}^{\mathrm{*}},{h}_{2}^{\mathrm{*}},{\tau }_{2}^{\mathrm{*}},{S}_{2}^{\mathrm{*}}) ,其中{h}_{1}^{\mathrm{*}}\ne {h}_{2}^{\mathrm{*}} . 然后,B运行A'以获得关于{M}^{\mathrm{*}} ,{\mathcal{A}}^{\mathcal{*}} ,{\omega }_{j}^{\mathrm{*}}\mathrm{和}{w}^{\mathrm{*}} 的2个有效伪造签名({M}^{\mathrm{*}},{w}^{\mathrm{*}}, {y}^{\mathrm{*}},{h}_{1}^{\mathrm{*}},{\tau }_{1}^{\mathrm{*}},{S}_{1}^{\mathrm{*}}) 和({M}^{\mathrm{*}},{w}^{\mathrm{*}},{y}^{\mathrm{*}},{h}_{2}^{\mathrm{*}},{\tau }_{2}^{\mathrm{*}},{S}_{2}^{\mathrm{*}}) . 从列表{L}_{1} 中,B可以获得元组 ({ID}_{{\mathcal{A}}_{i}^{*}},{s}_{i}^{*}) ,并根据设置阶段{x}^{\mathrm{*}}= \dfrac{\prod\limits_{i=1}^{n}{s}_{i}^{\mathrm{*}}}{\prod\limits _{{\mathcal{A}}_{i}^{\mathrm{*}}\ne {\omega }_{j}^{\mathrm{*}},i\in [1,n]}{s}_{i}^{\mathrm{*}}} 获得{x}^{\mathrm{*}} . 由于2个都是有效签名,因此都能通过验证算法. 可得{w}^{\mathrm{*}}= e\left({\tau }_{1}^{\mathrm{*}}\times {S}_{1}^{\mathrm{*}}, {P}^{\mathrm{*}}\right){g}^{{h}_{1}^{\mathrm{*}}}= e({\tau }_{2}^{\mathrm{*}}\times {S}_{2}^{\mathrm{*}},{P}^{\mathrm{*}}){g}^{{h}_{2}^{\mathrm{*}}} ,其中{P}^{\mathrm{*}}={y}^{\mathrm{*}}{P}_{2}+{P}_{\mathrm{p}\mathrm{u}\mathrm{b}}={x}^{\mathrm{*}}{P}_{2}+a{P}_{2}= ({x}^{\mathrm{*}}+a){P}_{2} . 然后计算e\left(\left({\tau }_{1}^{\mathrm{*}}\times {S}_{1}^{\mathrm{*}}-{\tau }_{2}^{\mathrm{*}}\times {S}_{2}^{\mathrm{*}}\right),{P}^{\mathrm{*}}\right)={g}^{{h}_{2}^{\mathrm{*}}-{h}_{1}^{\mathrm{*}}} . 进一步,可以获得e({a}^{-1}{\left({h}_{2}^{\mathrm{*}}-{h}_{1}^{\mathrm{*}}\right)}^{-1}\left({\tau }_{1}^{\mathrm{*}}\times {S}_{1}^{\mathrm{*}}-{\tau }_{2}^{\mathrm{*}}\times {S}_{2}^{\mathrm{*}}\right),\left({x}^{\mathrm{*}}+a\right){P}_{2})= e({P}_{1},{P}_{2}) . 因此有{\left({h}_{2}^{\mathrm{*}}-{h}_{1}^{\mathrm{*}}\right)}^{-1}\left({\tau }_{1}^{\mathrm{*}}\times{S}_{1}^{\mathrm{*}}-{\tau }_{2}^{\mathrm{*}}\times {S}_{2}^{\mathrm{*}}\right)=\dfrac{a}{{x}^{\mathrm{*}}+a}{P}_{1} . 令{X}^{\mathrm{*}}= {\left({h}_{2}^{\mathrm{*}}-{h}_{1}^{\mathrm{*}}\right)}^{-1}\left({\tau }_{1}^{\mathrm{*}}\times {S}_{1}^{\mathrm{*}}-{\tau }_{2}^{\mathrm{*}}\times {S}_{2}^{\mathrm{*}}\right) ,因为{P}_{1}=f\left(a\right)P ,有\dfrac{a}{{x}^{\mathrm{*}}+a}{P}_{1}= \dfrac{af\left(a\right)}{{x}^{\mathrm{*}}+a}P=\dfrac{\gamma }{{x}^{\mathrm{*}}+a}P+\displaystyle\sum_{i=0}^{q-1}{\gamma }_{i}{a}^{i}P ,其中系数{\gamma }_{i} 已知并且有\gamma \ne 0 . B计算{\pi }^{\mathrm{*}}=\dfrac{1}{{x}^{\mathrm{*}}+a}{P}_{1}=\dfrac{1}{\gamma }\left({X}^{\mathrm{*}}-\displaystyle\sum_{i=0}^{q-1} {\gamma }_{i}\varphi \left({a}^{i}Q\right)\right) ,输出({x}^{\mathrm{*}},{\pi }^{\mathrm{*}}) 作为q-SDH困难问题实例的1个解.综上所述,若A能够以不可忽略的概率
\beta 伪造1个有效签名,那么B就能以不可忽略的概率\varepsilon 解决q-SDH困难问题,其中\varepsilon \le \dfrac{\beta }{{q}_{{H}_{1}}} ,{q}_{{H}_{1}} 是{H}_{1} 询问的次数,定理1得证. 证毕.6. 方案分析
为了解决物联网环境中轻量级设备计算受限而无法执行高昂的计算和用户隐私保护问题,本文提出了基于商密SM9的属性基在线/离线签名方案(ABOOS),提出的方案同时具有细粒度访问控制功能. 通过与已有工作[12,17,27-28]相比,分析本文方案优势. 文献[12]给出了支持非单调访问策略的属性基签名方案,方案具有匿名性并且实现了细粒度访问控制,但签名和验证阶段需要大量的指数运算和配对运算,计算开销大. 为了提高签名和验证算法的效率,文献[27−28]提出了高效的属性基签名方案,它通过减少运算中使用的配对运算提高了算法的效率,但仍无法高效地适用于轻量级设备. 为进一步降低验证过程的计算开销,文献[17]提出了具有服务器辅助验证的属性基签名方案,它通过将大量计算外包给云服务器降低了本地的计算开销. 文献[12,17,27−28]的方案是基于国外密码技术标准设计的,不符合国家网络空间安全自主可控的发展战略. 本文提出的基于商密SM9的属性基在线/离线签名方案,不仅具有签名者匿名性和细粒度访问控制,并且有效降低了轻量级设备的签名计算代价. 而且方案是在商密SM9标识签名算法标准下设计的,符合国家核心技术自主创新的发展战略. 方案性能对比如表1所示.
7. 性能分析
本方案与文献[27−28]的计算开销和通信开销比较如表2和表3所示. 基于Ubuntu 18.4,本文在Charm0.5框架下实现了所提的方案. 使用Intel(R) Core(TM) i5-3230M CPU @2.60GHz,4GB RAM性能计算机,利用Charm库中的超奇异椭圆曲线(SS512)测试方案. 实验中群的阶
p 为512b的大素数. 在计算机上测试主要密码学操作开销,经过1000次测量取平均值后得到实验中配对运算所需时间为12.58 ms,在群{G}_{1} 和{G}_{2} 中执行指数运算所需时间分别为4.97 ms和5.02 ms,在群{G}_{T} 执行指数运算的时间为8.37 ms,在群{\mathbb{Z}}_{p}^{*} 中执行乘法运算的时间为0.24 ms,执行Hash运算的时间为0.02 ms. 实验结果如图2所示.表 2 方案计算开销比较Table 2. Comparison of Computation Cost of Schemes方案 密钥生成 离线签名 在线签名 验证 文献[27]方案 \left(\left|{\omega }_{j}\right|+3\right)\times{\mathrm{E} }_{ {G}_{1} }+\mathrm{T} 3\left(\left|{\omega }_{j}\right|+6\right)\times{\mathrm{E} }_{ {G}_{1} }+7\mathrm{T} \left(\left|{\omega }_{j}\right|+3\right)\times\mathrm{P}+4\mathrm{T} 文献[28]方案 2\left|{\omega }_{j}\right|\times{\mathrm{E} }_{ {G}_{1} }+\left|{\omega }_{j}\right|\times \mathrm{T} H+T+(\left|{\omega }_{j}\right|+1)\times{\mathrm{E} }_{ {G}_{1} } H+3P+\left|{\omega }_{j}\right|\times{\mathrm{E} }_{ {G}_{1} }+T 本文方案 2nT+(2n-1)×H {\mathrm{E}}_{{G}_{T}} +2T 2H+T H+P+ {\mathrm{E}}_{{G}_{T}} + {\mathrm{E}}_{{G}_{1}} + {\mathrm{E}}_{{G}_{2}} +T 注: {\omega }_{j} 表示签名者属性集;n表示访问策略中授权属性集的数量;P表示双线性对运算;H表示Hash运算;T表示群 {\mathbb{Z}}_{p}^{\mathrm{*}} 中的乘法运算; {\mathrm{E}}_{{G}_{1}} 表示群 {G}_{1} 中的指数运算; {\mathrm{E}}_{{G}_{2}} 表示群 {G}_{2} 中的指数运算; {\mathrm{E}}_{{G}_{T}} 表示群 {G}_{T} 中的指数运算. 表 3 方案通信开销比较Table 3. Comparison of Communication Cost of Schemes方案 密钥 离线签名 在线签名 文献[27]方案 (\left|{\omega }_{j}\right|+2)\times\left|{G}_{1}\right| (2\left|{\omega }_{j}\right|+2)\times\left|{G}_{1}\right| 文献[28]方案 \left|{\omega }_{j}\right|\times\left|{G}_{1}\right| 2\left|{G}_{1}\right|+\left|{\mathbb{Z}}_{p}^{*}\right| 本文方案 \left|{G}_{1}\right|+\left|{\mathbb{Z}}_{p}^{*}\right| \left|{G}_{1}\right|+\left|{G}_{{T} }\right|+2\left|{\mathbb{Z} }_{p}^{*}\right| \left|{G}_{1}\right|+3\left|{\mathbb{Z}}_{p}^{*}\right| 注: \left|{\omega }_{j}\right| 表示属性集 {\omega }_{j} 的大小; \left|{G}_{1}\right| , \left|{G}_{2}\right| , \left|{G}_{T}\right| , \left|{\mathbb{Z}}_{p}^{*}\right| 分别表示各个群元素的大小. 实验仿真分析结果表明提出的ABOOS方案采用在线/离线签名技术使得签名过程的各计算开销均低于文献[27−28]提出的高效属性基签名方案的. 在签名验证过程中,本文提出的方案使用固定数量的配对运算和指数运算验证计算开销也远小于文献[27−28] .
8. 结束语
本文在SM9标识签名方案的基础上首次提出基于商密SM9的属性基在线/离线签名方案(ABOOS). 该方案不仅适用于物联网环境,同时还实现了细粒度访问控制功能. 基于q-SDH困难问题,本文在随机谕言机模型下证明了该方案的安全性. 通过与现有属性基签名方案的对比分析可知,提出的方案更适用于物联网环境.
作者贡献声明:朱留富负责提出初步方案,实验设计、论文初稿撰写和修改;李继国负责论文思路构建、理论指导、方案分析和论文修改;赖建昌、黄欣沂和张亦辰负责论文方案分析、论文润色和修改.
-
表 1 典型静态图处理系统
Table 1 Typical Static Graph Processing Systems
平台及类别 典型系统 数据表示 编程模型 核心优化 实现算法 CPU 存内图处理 Ligra[2] CSR 点中心同步 混合Push和Pull模式间动态切换 BFS,BC,Radii,CC,PR,SSSP Galois[3] CSR 点中心异步 拓扑感知任务调度和优先级调度 BFS,CC,PR,SSSP,BC GraphMat[39] DCSC 点中心同步 基于SpMV实现图处理 PR,BFS,TC,CF,SSSP Polymer[117] CSR 点中心同步 面向NUMA,减少远程内存访问 PR,SpMV,BP,BFS,CC,SSSP GraphIt[4] CSR,CSC 点中心同步 图计算领域编程语言,混合优化 BFS,CC,CF,PR-D 此外,Gagra[118],Ligra+[119]等. 核外图处理 GraphChi[5] EL 点中心异步 并行滑动窗口,优化对硬盘的访问 PR,SpMV,CC,TC,CF X-Stream[76] EL 边中心同步 以边为中心计算,以流的方式处理边 CC,SSSP,SpMV,PR,BP,ALS GridGraph[6] EL 边中心同步 两级图划分,双滑动窗口,减少I/O BFS,WCC,SpMV,PR LUMOS[120] EL 点中心异步 提出依赖驱动的核外图处理 PR,CoEM,DP,BP,WPR,LP GraphSD[98] CSR 点中心异步 状态和依赖感知的更新策略 PR,PR-D,CC,SSSP 此外,VENUS[121] ,CLIP[122],MultilogVC[123]等. 分布式图处理 Pregel[1] CSR 点中心BSP同步 高效、易用、可扩展性强 PR,SSSP,CC,LP PowerGraph[72] CSR 点中心GAS同步 点中心划分实现负载均衡,减少通信 SSSP,ALS,PR Gemini[7] CSR+CSC 点中心同步 高效内存管理,通信技术及计算模型 PR,CC,SSSP,BFS,BC FLASH[8] CSR 点中心同步 简单、高效地分布式图分析编程模型 BFS,KC,CC,BC,MM,TC,等
(70多种算法)此外,PowerLyra[124]等. GPU 存内图处理 Medusa[125] CSR 点中心 同步 使用CSR表示和消息日志减少读开销 BFS,SSSP Cusha[73] CSR 点中心ICU同步 提供G-shard和CW机制,合并访存 BFS,SSSP,CC,SSWP,NN,… Gunrock[9-10] CSR+SOA 数据中心同步 AFC编程模型和高性能原语,混合调度 BFS,SSSP,BC,PR,CC SEP-graph[12] CSR+CSC 点中心混合 Push/Pull, Sysn/Async, TD/DD混合处理 PR,SSSP,BFS,BC SIMD-X[42] CSR 点中心ACC同步 运行时任务管理 BFS,PR,SSSP,k-Core GSWITCH[126] CSR+CSC 点中心同步 机器学习模型自动调优组合图模式 BFS,CC,PR,SSSP,BC G2[127] CSR+CSC 点中心同步 图计算领域原语,混合多种优化 PR,CC,BC,SSSP,BFS 此外,IrGL [127]等. 核外图处理 Totem[128-129] CSR 点中心同步 低度数据GPU处理高度数据CPU处理 BFS,PR,BC,SSSP Graphie[130] EL 边中心异步 异步边流输传输较少开销,重命名技术 BFS,SSSP,CC Subway[13] CSR 点中心异步 子图生成,减少数据传输(CPU-GPU) SSSP,SSWP,BFS,CC,PR,… EMOGI[103] CSR 点中心同步 基于零拷贝,优化图算法提高数据传输 SSSP,BFS,CC,PR HyTGraph[14] CSR 点中心异步 混合数据转换管理,代价感知异步调度 PR,SSSP,CC,BFS 此外,GTS[131],GraphReduce[91],Garaph[132],HALO[99],Grus[133]等. 分布式图处理 LUX[134] CSR 点中心同步 数据重划分以实现GPU间负载均衡 PR,CC,SSSP,BC,CF Gluon[135] CSR 点中心同步 混合划分优化通信 BFS,CC,PR,SSSP Groute[15-16] CSR 点中心异步 借鉴路由思想通信,实现异步图处理 BFS,SSSP,PR,CC GUM[17] CSR 点中心同步 负载迁移的高效多GPU图分析系统 BFS,SSSP,PR,WCC 此外,MultiGraph[41],SWARMGRAPH[136]等. 表 2 典型静态图处理系统性能
Table 2 Performance of Classic Static Graph Processing Systems
系统 平台 数据集 BFS性能/GTEPS SSSP性能/GTEPS Ligra 40核Intel CPU E7-8870 Twitter[173] 4.579 0.553 rMat27[36] 5.012 0.526 GraphSD 2×8核Intel CPU E5-2620 Uk-union[174] 0.010(PageRank) 0.001 rMat30[36] 0.009(PageRank) 0.001 FLASH 20节点,单节点30核Intel CPU E5-2682 v4 Uk-2002[175] 0.132 0.043(TC) soc-orkut[176] 0.334 0.035(TC) Tigr 8 GB,P4000 GPU soc-orkut[176] 2.117 0.760 Pokec[177] 3.010 1.464 HyTGraph 11 GB,GTX 2080Ti GPU Twitter[174] 2.306 0.938 UK-2007[174] 3.761 1.191 GUM 32 GB V100×8 NvLink Web2001[178-179] 25.147 2.587(PageRank) road-USA[175] 0.082 0.784(PageRank) 注:TC(triangle counting). -
[1] Malewicz G, Austern M H, Bik A J C, et al. Pregel: A system for large-scale graph processing[C]//Proc of the 2010 ACM SIGMOD Int Conf on Management of data. New York: ACM, 2010: 135–146
[2] Shun Julian, Blelloch G E. Ligra: A lightweight graph processing framework for shared memory[C]//Proc of the 18th ACM SIGPLAN Symp on Principles and Practice of Parallel Programming. New York: ACM, 2013: 135–146
[3] Nguyen D, Lenharth A, Pingali K. A lightweight infrastructure for graph analytics[C]//Proc of the 24th ACM Symp on Operating Systems Principles. New York: ACM, 2013: 456–471
[4] Zhang Yunming, Yang Mengjiao, Baghdadi R, et al. GraphIt: A high-performance graph DSL[J]. Proceedings of the ACM on Programming Languages, 2018, 2: 1−30
[5] Kyrola A, Blelloch G, Guestrin C. GraphChi: Large-scale graph computation on just a PC[C]//Proc of the 10th USENIX Conf on Operating Systems Design and Implementation. Berkeley, CA: USENIX Association, 2012: 31–46
[6] Zhu Xiaowei, Han Wentao, Chen Wenguang. GridGraph: Large-scale graph processing on a single machine using 2-level hierarchical partitioning[C]//Proc of the 2015 USENIX Annual Technical Conf. Berkeley, CA: USENIX Association, 2015: 375–386
[7] Zhu Xiaowei, Chen Wenguang, Zheng Weimin, et al. Gemini: A computation-centric distributed graph processing system[C]//Proc of the 12th USENIX Conf on Operating Systems Design and Implementation. Berkeley, CA: USENIX Association, 2016: 301–316
[8] Li Xue, Meng Ke, Qin Lu, et al. FLASH: A framework for programming distributed graph processing algorithms[C]//Proc of the 39th Int Conf on Data Engineering (ICDE). Piscataway, NJ: IEEE, 2023: 232−244
[9] Wang Yangzihao, Davidson A, Pan Yuechao, et al. Gunrock: A high-performance graph processing library on the GPU[C]//Proc of the 21st ACM SIGPLAN Symp on Principles and Practice of Parallel Programming. New York: ACM, 2016: 1−12
[10] Wang Yangzihao, Pan Yuechao, Davidson A, et al. Gunrock: GPU graph analytics[J]. ACM Transactions on Parallel Computing, 2017, 4(1): 1−49
[11] Sabet A H N, Qiu Junqiao, Zhao Zhijia. Tigr: Transforming irregular graphs for GPU-friendly graph processing[C]//Proc of the 23rd Int Conf on Architectural Support for Programming Languages and Operating Systems. New York: ACM, 2018: 622–636
[12] Wang Hao, Geng Liang, Lee R, et al. SEP-graph: Finding shortest execution paths for graph processing under a hybrid framework on GPU[C]//Proc of the 24th Symp on Principles and Practice of Parallel Programming. New York: ACM, 2019: 38–52
[13] Sabet A H N, Zhao Zhijia, Gupta R. Subway: Minimizing data transfer during out-of-GPU-memory graph processing[C]//Proc of the 15th European Conf on Computer Systems. New York: ACM, 2020: 1−16
[14] Wang Qiange, Ai Xin, Zhang Yanfeng, et al. HyTGraph: GPU-accelerated graph processing with hybrid transfer management[C]//Proc of the 39th Int Conf on Data Engineering (ICDE). Piscataway, NJ: IEEE, 2023: 558−571
[15] Ben-Nun T, Sutton M, Pai S, et al. Groute: An asynchronous multi-GPU programming model for irregular computations[C]//Proc of the 22nd ACM SIGPLAN Symp on Principles and Practice of Parallel Programming. New York: ACM, 2017: 235–248
[16] Ben-Nun T, Sutton M, Pai S, et al. Groute: Asynchronous multi-GPU programming model with applications to large-scale graph processing[J]. ACM Transactions on Parallel Computing, 2020, 7(3): 1−27
[17] Meng Ke, Geng Liang, Li Xue, et al. Efficient multi-GPU graph processing with remote work stealing [C]//Proc of the 39th Int Conf on Data Engineering (ICDE). Piscataway, NJ: IEEE, 2023: 191−204
[18] Sengupta D, Song S L. EvoGraph: On-the-fly efficient mining of evolving graphs on GPU[C]//Proc of the High Performance Computing: 32nd Int Conf, ISC High Performance 2017. Berlin: Springer, 2017: 97–119
[19] Mariappan M, Vora K. GraphBolt: Dependency-driven synchronous processing of streaming Graphs[C]//Proc of the 14th EuroSys Conf 2019. New York: ACM, 2019: 1−16
[20] Kumar P, Huang H H. GraphOne: A data store for real-time analytics on evolving graphs[C].//Proc of the 17th USENIX Conf on File and Storage Technologies (FAST’19). Berkeley, CA: USENIX Association, 2019: 249−263
[21] Iyer A P, Pu Qifan, Patel K, et al. TEGRA: Efficient ad-hoc analytics on evolving graphs[C]//Proc of the 18th USENIX Symp on Networked Systems Design and Implementation (NSDI’21). Berkeley, CA: USENIX Association, 2021: 337−355
[22] Martella C, Shaposhnik R, Logothetis D, et al. Practical Graph Analytics with Apache Giraph [M]. Berkeley, CA: Apress, 2015
[23] Gonzalez J E, Xin R S, Dave A, et al. GraphX: Graph processing in a distributed dataflow framework[C]//Proc of the 11th USENIX Symp on Operating Systems Design and Implementation. Berkeley, CA: USENIX Association, 2014: 599−613
[24] Fan Wenfei, Xu Jingbo, Wu Yinghui, et al. GRAPE: Parallelizing sequential graph computations[J]. Proceedings of the VLDB Endowment, 2017, 10(12): 1889−1892 doi: 10.14778/3137765.3137801
[25] Yang Ke, Zhang Mingxing, Chen Kang, et al. KnightKing: a fast distributed graph random walk engine[C]//Proc of the 27th ACM Symp on Operating Systems Principles. New York: ACM, 2019: 524–537
[26] Tencent: Plato code [EB/OL]. [2024−01−20].https://github.com/Tencent/plato/
[27] Fan Wenfei, He Tao, Lai Longbin, et al. GraphScope: a unified engine for big graph processing[J]. Proceedings of the VLDB Endowment, 2021, 14(12): 2879−2892 doi: 10.14778/3476311.3476369
[28] Ant Group: TuGraph website [EB/OL]. [2024−01−20].https://www.tugraph.org/
[29] Shi Xuanhua, Zheng Zhigao, Zhou Yongluan, et al. Graph processing on GPUs: A survey[J]. ACM Computing Surveys, 2018, 50(6): 1−35
[30] Huang Jianqiang, Qin Wei, Wang Xiaoying, et al. Survey of external memory large-scale graph processing on a multi-core system[J]. The Journal of Supercomputing, 2020, 76(1): 549−579 doi: 10.1007/s11227-019-03023-0
[31] 王靖,张路,王鹏宇,等. 面向图计算的内存系统优化技术综述[J]. 中国科学:信息科学,2019,49(3):295−313 doi: 10.1360/N112018-00281 Wang Jing, Zhang Lu, Wang Pengyu, et al. Survey on memory system optimization technologies for graph computing[J]. SCIENTIA SINICA Informationis, 2019, 49(3): 295−313 (in Chinese) doi: 10.1360/N112018-00281
[32] Gui Chuangyi, Zheng Long, He Bingsheng, et al. A survey on graph processing accelerators: Challenges and opportunities[J]. Journal of Computer Science Technology, 2019, 34: 339−371 doi: 10.1007/s11390-019-1914-z
[33] 张宇,姜新宇,余辉,等. 图计算体系结构和系统软件关键技术综述[J]. 计算机研究与发展,2024,61(1):20−42 doi: 10.7544/issn1000-1239.202220778 Zhang Yu, Jiang Xinyu, Yu Hui, et al. Research and trend of graph computing architecture and software system[J]. Journal of Computer Research and Development, 2024, 61(1): 20−42 (in Chinese) doi: 10.7544/issn1000-1239.202220778
[34] Heidari S, Simmhan Y, Calheiros R N, et al. Scalable graph processing frameworks: A taxonomy and open challenges[J]. ACM Computing Surveys, 2018, 51(3): 1−53
[35] Sahu S, Mhedhbi A, Salihoglu S, et al. The ubiquity of large graphs and surprising challenges of graph processing[J]. Proceedings of the VLDB Endowment, 2017, 11(4): 420−431 doi: 10.1145/3186728.3164139
[36] Chakrabarti D, Zhan Yiping, Faloutsos C. R-MAT: A recursive model for graph mining[C]//Proc of the 2004 SIAM Int Conf on Data Mining. Philadelphia, PA: SIAM, 2004: 442−446
[37] Takac L, Zabovsky M. Data analysis in public social networks[C]//Proc of the Int scientific Conf and Int workshop present day trends of innovations, 2012, 1−6
[38] Leskovec J, Lang K, Dasgupta A, et al. Community structure in large networks: Natural cluster sizes and the absence of large well-defined clusters[J]. Internet Mathematics, 2008, 6(1): 29−123
[39] Sundaram N, Satish N, Patwary M M A, et al. GraphMat: high performance graph analytics made productive[J]. Proceedings of the VLDB Endowment, 2015, 8(11): 1214−1225 doi: 10.14778/2809974.2809983
[40] Yang C, Buluç A, Owens J D. GraphBLAST: A high-performance linear algebra-based graph framework on the GPU[J]. ACM Transactions on Mathematical Software, 2022, 48(1): 1−51
[41] Hong Changwan, Sukumaran-Rajam A, Kim J, et al. MultiGraph: Efficient graph processing on GPUs[C]//Proc of the 26th Int Conf on Parallel Architectures and Compilation Techniques (PACT). Piscataway, NJ: IEEE, 2017: 27−40
[42] Liu Hang, Huang H H. SIMD-X: Programming and processing of graph algorithms on GPUs[C]//Proc of the 2019 USENIX Annual Technical Conf. Berkeley, CA: USENIX Association, 2019: 411–427
[43] Green O, Bader D A. cuSTINGER: Supporting dynamic graph algorithms for GPUs[C]//Proc of the 2016 IEEE High Performance Extreme Computing Conf (HPEC). Piscataway, NJ: IEEE, 2016: 1−6
[44] Busato F, Green O, Bombieri N, et al. Hornet: An efficient data structure for dynamic sparse graphs and matrices on GPUs[C]//Proc of the 2018 IEEE High Performance Extreme Computing Conf (HPEC). Piscataway, NJ: IEEE, 2018: 1−7
[45] Winter M, Mlakar D, Zayer R, et al. faimGraph: High performance management of fully-dynamic graphs under tight memory constraints on the GPU[C]//Proc of the SC18: Int Conf for High Performance Computing, Networking, Storage and Analysis. Piscataway, NJ: IEEE, 2018: 754−766
[46] Sha Mo, Li Yuchen, He Bingsheng, et al. Accelerating dynamic graph analytics on GPUs[J]. Proceedings of the VLDB Endowment, 2017, 11(1): 107−120 doi: 10.14778/3151113.3151122
[47] Bader D A, Berry J, Kahan S, et al. Graph500 [EB/OL]. [2024−01−20]. http://www.graph500.org
[48] Dong Rongyu, Cao Huawei, Ye Xiaochun, et al. Highly efficient and GPU-friendly implementation of BFS on single-node system[C]//Proc of 2020 IEEE Int Conf on Parallel & Distributed Processing with Applications, Big Data & Cloud Computing, Sustainable Computing & Communications, Social Computing & Networking (ISPA/BDCloud/SocialCom/SustainCom). Piscataway, NJ: IEEE, 2020: 544−553
[49] Liu Hang, Huang H H. Enterprise: Breadth-first graph traversal on GPUs[C]//Proc of the Int Conf for High Performance Computing, Networking, Storage and Analysis. New York: ACM, 2015: 1−12
[50] Dijkstra E W. A Note on Two Problems in Connexion with Graphs [M]. Edsger Wybe Dijkstra: His Life, Work, and Legacy. 2022: 287−290
[51] Bellman R. On a routing problem[J]. Quarterly of Applied Mathematics, 1958, 16(1): 87−90 doi: 10.1090/qam/102435
[52] Meyer U, Sanders P. δ-stepping: A parallel single source shortest path algorithm[C]//Proc of the European Symp on algorithms. Berlin: Springer, 1998: 393−404
[53] Wang Kai, Fussell D, Lin Calvin. A fast work-efficient SSSP algorithm for GPUs[C]//Proc of the 26th ACM SIGPLAN Symp on Principles and Practice of Parallel Programming. New York: ACM, 2021: 133–146
[54] Dong Xiaojun, Gu Yan, Sun Yihan, et al. Efficient stepping algorithms and implementations for parallel shortest paths[C]//Proc of the 33rd ACM Symp on Parallelism in Algorithms and Architectures. New York: ACM, 2021: 184–197
[55] Dhulipala L, Blelloch G, Shun Julian. Julienne: A framework for parallel graph algorithms using work-efficient bucketing[C]//Proc of the 29th ACM Symp on Parallelism in Algorithms and Architectures. New York: ACM, 2017: 293–304
[56] Zhang Yuan, Cao Huawei, Zhang Jie, et al. A bucket-aware asynchronous single-source shortest path algorithm on GPU[C]//Proc of the 52nd Int Conf on Parallel Processing Workshops. New York: ACM, 2023: 50–60
[57] Beamer S, Asanović K, Patterson D. Reducing PageRank communication via propagation blocking[C]//Proc of the 2017 IEEE Int Parallel and Distributed Processing Symp (IPDPS). Piscataway, NJ: IEEE, 2017: 820−831
[58] Lakhotia K, Kannan R, Prasanna V. Accelerating PageRank using partition-centric processing[C]//Proc of the 2018 USENIX Annual Technical Conf. Berkeley, CA: USENIX Association, 2018: 427–440
[59] Pandey S, Li X S, Buluc A, et al. H-INDEX: Hash-indexing for parallel triangle counting on GPUs[C]//Proc of the 2019 IEEE High Performance Extreme Computing Conf (HPEC). Piscataway, NJ: IEEE, 2019: 1−7
[60] Yaşar A, Rajamanickam S, Wolf M, et al. Fast triangle counting using Cilk[C]//Proc of the 2018 IEEE High Performance extreme Computing Conf (HPEC). Piscataway, NJ: IEEE, 2018: 1−7
[61] Wasserman S, Faust K. Social network analysis: Methods and applications[J]. American Ethnologist, 1997, 24(1): 219−220 doi: 10.1525/ae.1997.24.1.219
[62] Mislove A, Marcon M, Gummadi K P, et al. Measurement and analysis of online social networks[C]//Proc of the 7th ACM SIGCOMM Conf on Internet Measurement. New York: ACM, 2007: 29−42
[63] Ye Chang, Li Yuchen, He Bingsheng, et al. GPU-accelerated graph label propagation for real-time fraud detection[C]//Proc of the 2021 Int Conf on Management of Data. New York: ACM, 2021: 2348–2356
[64] Jiang Jiaxin, Li Yuan, He Bingsheng, et al. Spade: A real-time fraud detection framework on evolving graphs[J]. Proceedings of the VLDB Endowment, 2022, 16(3): 461−469 doi: 10.14778/3570690.3570696
[65] Alam M A, Faruq M O. Finding shortest path for road network using Dijkstra’s algorithm[J]. Bangladesh Journal of Multidisciplinary Scientific Research, 2019, 1(2): 41−45 doi: 10.46281/bjmsr.v1i2.366
[66] Xiao Yunpeng, He Xi, Yang Chen, et al. Dynamic graph computing: A method of finding companion vehicles from traffic streaming data[J]. Information Sciences, 2022, 591: 128−141 doi: 10.1016/j.ins.2022.01.022
[67] Huang Zan, Chung Wingyan, Chen Hsinchun, et al. A graph model for E-commerce recommender systems[J]. Journal of the American Society for Information Science, 2004, 55(3): 259−274 doi: 10.1002/asi.10372
[68] Mcauley J, Pandey R, Leskovec J. Inferring networks of substitutable and complementary products[C]//Proc of the 21st ACM SIGKDD Int Conf on Knowledge Discovery and Data Mining. New York: ACM, 2015: 785–794
[69] Yang Xiaoyong, Zhu Yadong, Zhang Yi, et al. Large scale product graph construction for recommendation in E-commerce [J]. arXiv preprint, arXiv: 2010.05525, 2020
[70] Afarin M, Gao Chao, Rahman S, et al. CommonGraph: Graph analytics on evolving data[C]//Proc of the 28th ACM Int Conf on Architectural Support for Programming Languages and Operating Systems. New York: ACM, 2023: 133–145
[71] Zou Lei, Zhang Fan, Lin Yinnian, et al. An efficient data structure for dynamic graph on GPUS[J]. IEEE Transactions on Knowledge and Data Engineering, 2023, 35(11): 11051-11066
[72] Gonzalez J E, Low Yucheng, Gu Haijie, et al. PowerGraph: Distributed graph-parallel computation on natural graphs[C]//Proc of the 10th USENIX Conf on Operating Systems Design and Implementation. Berkeley, CA: USENIX Association, 2012: 17–30
[73] Khorasani F, Vora K, Gupta R, et al. CuSha: Vertex-centric graph processing on GPUs[C]//Proc of the 23rd Int Symp on High-Performance Parallel and Distributed Computing. New York: ACM, 2014: 239–252
[74] Zhang Yu, Liao Xiaofei, Jin Hai, et al. DiGraph: An efficient path-based iterative directed graph processing system on multiple GPUs[C]//Proc of the 24th Int Conf on Architectural Support for Programming Languages and Operating Systems. New York: ACM, 2019: 601–614
[75] Zhang Yu, Peng Da, Liao Xiaofei, et al. LargeGraph: An efficient dependency-aware GPU-accelerated large-scale graph processing[J]. ACM Transactions on Architecture and Code Optimization. New York: ACM, 2021, 18(4): 1−24
[76] Roy A, Mihailovic I, Zwaenepoel W. X-Stream: Edge-centric graph processing using streaming partitions[C]//Proc of the 24th ACM Symp on Operating Systems Principles. New York: ACM, 2013: 472–488
[77] Lin Heng, Zhu Xiaowei, Yu Bowen, et al. ShenTu: Processing multi-trillion edge graphs on millions of cores in seconds[C]//Proc of the SC18: Int Conf for High Performance Computing, Networking, Storage and Analysis. Piscataway, NJ: IEEE, 2018: 706−716
[78] Tian Yanyan, Balmin A, Corsten S A, et al. From "think like a vertex" to "think like a graph"[J]. Proceedings of the VLDB Endowment, 2013, 7((3): ): 193−204 doi: 10.14778/2732232.2732238
[79] Kang U, Tsourakakis C E, Faloutsos C. PEGASUS: A peta-scale graph mining system implementation and observations[C]//Proc of the 9th IEEE Int Conf on Data Mining. Piscataway, NJ: IEEE, 2009: 229−238
[80] Vora K, Gupta R, Xu Guoqing. KickStarter: Fast and accurate computations on streaming graphs via trimmed approximations[C]//Proc of the 22d Int Conf on Architectural Support for Programming Languages and Operating Systems. New York: ACM, 2017: 237–251
[81] Jiang Xiaolin, Xu Chengshuo, Yin Xizhe, et al. Tripoline: Generalized incremental graph processing via graph triangle inequality[C]//Proc of the 16th European Conf on Computer Systems. New York: ACM, 2021: 17–32
[82] Islam A a R, Dai Dong. DGAP: Efficient dynamic graph analysis on persistent memory[C]//Proc of the Int Conf for High Performance Computing, Networking, Storage and Analysis. New York: ACM, 2023: 1−13
[83] Sengupta D, Sundaram N, Zhu Xia, et al. GraphIn: An online high performance incremental graph processing framework[C]//Proc of the 22nd Int European Conf on Parallel and Distributed Computing (Euro-Par 2016). Berlin: Springer, 2016: 319−333
[84] Zhu Huanzhou, He Ligang, Leeke M, et al. WolfGraph: The edge-centric graph processing on GPU[J]. Future Generation Computer Systems, 2020, 111: 552−569 doi: 10.1016/j.future.2019.09.052
[85] Sheng Feng, Cao Qiang, Yao Jie. Exploiting buffered updates for fast streaming graph analysis[J]. IEEE Transactions on Computers, 2021, 70(2): 255−269 doi: 10.1109/TC.2020.2987571
[86] Feng Guanyu, Ma Zixuan, Li Daixuan, et al. RisGraph: A real-time streaming system for evolving graphs to support sub-millisecond per-update analysis at millions Ops/s[C]//Proc of the 2021 Int Conf on Management of Data. New York: ACM, 2021: 513–527
[87] Cheng R, Hong Ji, Kyrola A, et al. Kineograph: Taking the pulse of a fast-changing and connected world[C]//Proc of the 7th ACM European Conf on Computer Systems. New York: ACM, 2012: 85–98
[88] Ju Wuyang, Li Jianxin, Yu Weiren, et al. iGraph: An incremental data processing system for dynamic graph[J]. Frontiers of Computer Science, 2016, 10: 462−476 doi: 10.1007/s11704-016-5485-7
[89] Jaiyeoba W, Skadron K. Graphtinker: A high performance data structure for dynamic graph processing[C]//Proc of the 2019 IEEE Int Parallel and Distributed Processing Symp (IPDPS). Piscataway, NJ: IEEE, 2019: 1030−1041
[90] Zhang Yuan, Cao Huawei, Liang Yan, et al. FSGraph: Fast and scalable implementation of graph traversal on GPUs[J]. CCF Transactions on High Performance Computing, 2023, 5(3): 277−291 doi: 10.1007/s42514-023-00155-x
[91] Sengupta D, Song S L, Agarwal K, et al. GraphReduce: Processing large-scale graphs on accelerator-based systems[C]//Proc of the Int Conf for High Performance Computing, Networking, Storage and Analysis. New York: ACM, 2015: 1−12
[92] Xue Rui, Han Haoyu, Zhao Tong, et al. Large-scale graph neural networks: The past and new frontiers[C]//Proc of the 29th ACM SIGKDD Conf on Knowledge Discovery and Data Mining. New York: ACM, 2023: 5835–5836
[93] Ma Lingxiao, Yang Zhi, Miao Youshan, et al. Neugraph: Parallel deep neural network computation on large graphs[C]//Proc of the 2019 USENIX Annual Technical Conf. Berkeley, CA: USENIX Association, 2019: 443–457
[94] Mariappan M, Che J, Vora K. DZiG: Sparsity-aware incremental processing of streaming graphs[C]//Proc of the 16th European Conf on Computer Systems. New York: ACM, 2021: 83–98
[95] Gong Shufeng, Tian Chao, Yin Qiang, et al. Automating incremental graph processing with flexible memoization[J]. Proceedings of the VLDB Endowment, 2021, 14(9): 1613−1625 doi: 10.14778/3461535.3461550
[96] Yu Song, Gong Shufeng, Zhang Yanfeng, et al. Layph: Making change propagation constraint in incremental graph processing by layering graph [J]. arXiv preprint, arXiv: 2304.07458, 2023
[97] Jiang Zihan, Mao Fubing, Guo Yapu, et al. ACGraph: Accelerating streaming graph processing via dependence hierarchy[C]//Proc of the 2023 60th ACM/IEEE Design Automation Conf (DAC). Piscataway, NJ: IEEE, 2023: 1−6
[98] Xu Xianghao, Jiang Hong, Wang Fang, et al. GraphSD: A state and dependency aware out-of-core graph processing system[C]//Proc of the 51st Int Conf on Parallel Processing. New York: ACM, 2023: 1−11
[99] Gera P, Kim H, Sao P, et al. Traversing large graphs on GPUs with unified memory[J]. Proceedings of the VLDB Endowment, 2020, 13(7): 1119−1133 doi: 10.14778/3384345.3384358
[100] Dhulipala L, Blelloch G E, Shun Julian. Low-latency graph streaming using compressed purely-functional trees[C]//Proc of the 40th ACM SIGPLAN Conf on Programming Language Design and Implementation. New York: ACM, 2019: 918–934
[101] Zhu Xiaowei, Feng Guanyu, Serafini M, et al. LiveGraph: A transactional graph storage system with purely sequential adjacency list scans[J]. Proceedings of the VLDB Endowment, 2020, 13(7): 1020−1034 doi: 10.14778/3384345.3384351
[102] Kim J, Dally W J, Scott S, et al. Technology-driven, highly-scalable dragonfly topology[C]//Proc of 2008 Int Symp on Computer Architecture. Piscataway, NJ: IEEE, 2008: 77−88
[103] Min S W, Mailthody V S, Qureshi Z, et al. EMOGI: Efficient memory-access for out-of-memory graph-traversal in GPUs[J]. Proceedings of the VLDB Endowment, 2020, 14(2): 114−127 doi: 10.14778/3425879.3425883
[104] Zhou Shijie, Kannan R, Prasanna V K, et al. HitGraph: High-throughput graph processing framework on FPGA[J]. IEEE Transactions on Parallel and Distributed Systems, 2019, 30(10): 2249−2264 doi: 10.1109/TPDS.2019.2910068
[105] Chen Xinyu, Tan Hongshi, Chen Yao, et al. ThunderGP: HLS-based graph processing framework on FPGAs[C]//Proc of 2021 ACM/SIGDA Int Symp on Field-Programmable Gate Arrays. New York: ACM, 2021: 69–80
[106] Dai Guohao, Huang Tianhao, Wang Yu, et al. NewGraph: Balanced large-scale graph processing on FPGAs with low preprocessing overheads[C]//Proc of 2018 IEEE 26th Annual Int Symp on Field-Programmable Custom Computing Machines (FCCM). Piscataway, NJ: IEEE, 2018: 208−208
[107] Dai Guohao, Huang Tianhao, Chi Yuze, et al. ForeGraph: Exploring large-scale graph processing on multi-FPGA architecture[C]//Proc of 2017 ACM/SIGDA Int Symp on Field-Programmable Gate Arrays. New York: ACM, 2017: 217–226
[108] Chen Xinyu, Chen Yao, Cheng Feng, et al. ReGraph: Scaling graph processing on HBM-enabled FPGAs with heterogeneous pipelines[C]//Proc of the 55th IEEE/ACM Int Symp on Microarchitecture (MICRO). Piscataway, NJ: IEEE, 2022: 1342−1358
[109] Wang Qinggang, Zheng Long, Huang Yu, et al. GraSU: A fast graph update library for FPGA-based dynamic graph processing[C]//Proc of 2021 ACM/SIGDA Int Symp on Field-Programmable Gate Arrays. New York: ACM, 2021: 149–159
[110] Ham T J, Wu Lisa, Sundaram N, et al. Graphicionado: A high-performance and energy-efficient accelerator for graph analytics[C]//Proc of the 49th Annual IEEE/ACM Int Symp on Microarchitecture (MICRO). Piscataway, NJ: IEEE, 2016: 1−13
[111] Yan Mingyu, Hu Xing, Li Shuangchen, et al. Alleviating irregularity in graph analytics acceleration: A hardware/software co-design approach[C]//Proc of the 52nd Annual IEEE/ACM Int Symp on Microarchitecture. New York: ACM, 2019: 615–628
[112] Dadu V, Liu Sihao, Nowatzki T. PolyGraph: Exposing the value of flexibility for graph processing accelerators[C]//Proc of the 48th Annual Int Symp on Computer Architecture. Piscataway, NJ: IEEE, 2021: 595–608
[113] Yang Yifan, Li Zhaoshi, Deng Yangdong, et al. GraphABCD: Scaling out graph analytics with asynchronous block coordinate descent[C]//Proc of the ACM/IEEE 47th Annual Int Symp on Computer Architecture. Piscataway, NJ: IEEE, 2020: 419–432
[114] Rahman S, Abu-Ghazaleh N, Gupta R. GraphPulse: An event-driven hardware accelerator for asynchronous graph processing[C]//Proc of the 53rd Annual IEEE/ACM Int Symp on Microarchitecture (MICRO). Piscataway, NJ: IEEE, 2020: 908−921
[115] Rahman S, Afarin M, Abu-Ghazaleh N, et al. JetStream: Graph analytics on streaming data with event-driven hardware accelerator[C]//Proc of the 54th Annual IEEE/ACM Int Symp on Microarchitecture. New York: ACM, 2021: 1091–1105
[116] Zhang Yu, Liao Xiaofei, Jin Hai, et al. DepGraph: A dependency-driven accelerator for efficient iterative graph processing[C]//Proc of the 2021 IEEE Int Symp on High-Performance Computer Architecture (HPCA). Piscataway, NJ: IEEE, 2021: 371−384
[117] Zhang Kaiyuan, Chen Rong, Chen Haibo. NUMA-aware graph-structured analytics[C]//Proc of the 20th ACM SIGPLAN Symp on Principles and Practice of Parallel Programming. New York: ACM, 2015: 183–193
[118] Zhang Yunming, Kiriansky V, Mendis C, et al. Making caches work for graph analytics[C]//Proc of the 2017 IEEE Int Conf on Big Data (Big Data). Piscataway, NJ: IEEE, 2017: 293−302
[119] Shun Julian, Dhulipala L, Blelloch G E. Smaller and faster: Parallel processing of compressed graphs with Ligra+[C]//Proc of the 2015 Data Compression Conf. Piscataway, NJ: IEEE, 2015: 403−412
[120] Vora K. LUMOS: Dependency-driven disk-based graph processing[C]//Proc of the 2019 USENIX Conf on USENIX Annual Technical Conf. Berkeley, CA: USENIX Association, 2019: 429–442
[121] Cheng Jiefeng, Liu Qin, Li Zhenguo, et al. VENUS: Vertex-centric streamlined graph computation on a single PC[C] //Proc of the 2015 IEEE 31st Int Conf on Data Engineering. Piscataway, NJ: IEEE, 2015: 1131−1142
[122] Ai Zhiyuan, Zhang Mingxing, Wu Yongwei, et al. Squeezing out all the value of loaded data: An out-of-core graph processing system with reduced disk I/O[C]//Proc of the 2017 USENIX Annual Technical Conf. Berkeley, CA: USENIX Association, 2017: 125–137
[123] Matam K K, Hashemi H, Annavaram M. MultiLogVC: Efficient out-of-core graph processing framework for flash storage[C]//Proc of the 2021 IEEE Int Parallel and Distributed Processing Symp (IPDPS). Piscataway, NJ: IEEE, 2021: 245−255
[124] Chen Rong, Shi Jiaxin, Chen Yanzhe, et al. PowerLyra: Differentiated graph computation and partitioning on skewed graphs[J]. ACM Transactions on Parallel Computing, 2019, 5(3): 1−39
[125] Zhong Jianlong, He Bingsheng. Medusa: Simplified graph processing on GPUs[J]. IEEE Transactions on Parallel and Distributed Systems, 2014, 25(6): 1543−1552 doi: 10.1109/TPDS.2013.111
[126] Meng Ke, Li Jiajia, Tan Guangming, et al. A pattern based algorithmic autotuner for graph processing on GPUs[C]//Proc of the 24th Symp on Principles and Practice of Parallel Programming. New York: ACM, 2019: 201–213
[127] Pai S, Pingali K. A compiler for throughput optimization of graph algorithms on GPUs[C]//Proc of the 2016 ACM SIGPLAN Int Conf on Object-Oriented Programming, Systems, Languages, and Applications. New York: ACM, 2016: 1–19
[128] Gharaibeh A, Santos-Neto E, Costa L, et al. Efficient large-scale graph processing on hybrid CPU and GPU systems [J]. arXiv preprint, arXiv: 1312.3018, 2013
[129] Gharaibeh A, Costa L B, Santos-Neto E, et al. A yoke of oxen and a thousand chickens for heavy lifting graph processing[C]//Proc of the 21st Int Conf on Parallel Architectures and Compilation Techniques. New York: ACM, 2012: 345–354
[130] Han Wei, Mawhirter D, Wu Bo, et al. Graphie: Large-scale asynchronous graph traversals on just a GPU[C]//Proc of 2017 26th Int Conf on Parallel Architectures and Compilation Techniques (PACT). Piscataway, NJ: IEEE, 2017: 233−245
[131] Kim M S, An K, Park H, et al. GTS: A fast and scalable graph processing method based on streaming topology to GPUs[C]//Proc of the 2016 Int Conf on Management of Data. New York: ACM, 2016: 447–461
[132] Ma Lingxiao, Yang Zhi, Chen Han, et al. Garaph: Efficient GPU-accelerated graph processing on a single machine with balanced replication[C]//Proc of the 2017 USENIX Conf on Usenix Annual Technical Conf. Berkeley, CA: USENIX Association, 2017: 195–207
[133] Wang Pengyu, Wang Jing, Li Chao, et al. Grus: Toward unified-memory-efficient high-performance graph processing on GPU[J]. ACM Transactions on Architecture and Code Optimization, 2021, 18(2): 1−25
[134] Jia Zhihao, Kwon Y, Shipman G, et al. A distributed multi-GPU system for fast graph processing[J]. Proceedings of the VLDB Endowment, 2017, 11(3): 297−310 doi: 10.14778/3157794.3157799
[135] Dathathri R, Gill G, Hoang L, et al. Gluon: A communication-optimizing substrate for distributed heterogeneous graph analytics[C]//Proc of the 39th ACM SIGPLAN Conf on Programming Language Design and Implementation. New York: ACM, 2018: 752–768
[136] Ji Yuede, Liu Hang, Huang H H. SWARMGRAPH: Analyzing large-scale in-memory graphs on GPUs[C]//Proc of the 2020 IEEE 22nd Int Conf on High Performance Computing and Communications; IEEE 18th Int Conf on Smart City; IEEE 6th Int Conf on Data Science and Systems (HPCC/SmartCity/DSS). Piscataway, NJ: IEEE, 2020: 52−59
[137] Brahmakshatriya A, Zhang Yunming, Hong Changwan, et al. Compiling graph applications for GPUs with GraphIt[C]//Proc of the 2021 IEEE/ACM Int Symp on Code Generation and Optimization (CGO). Piscataway, NJ: IEEE, 2021: 248−261
[138] Yuan Pingpeng, Xie Changfeng, Liu Ling, et al. PathGraph: A path centric graph processing system[J]. IEEE Transactions on Parallel and Distributed Systems. Piscataway, NJ: IEEE, 2016, 27(10): 2998−3012 doi: 10.1109/TPDS.2016.2518664
[139] Chi Yuze, Dai Guohao, Wang Yu, et al. NXGraph: An efficient graph processing system on a single machine[C]//Proc of the 2016 IEEE 32nd Int Conf on Data Engineering (ICDE). Piscataway, NJ: IEEE, 2016: 409−420
[140] Xu Xianghao, Wang Fang, Jiang Hong, et al. A hybrid update strategy for I/O-efficient out-of-core graph processing[J]. IEEE Transactions on Parallel and Distributed Systems, 2020, 31(8): 1767−1782 doi: 10.1109/TPDS.2020.2973143
[141] Zhang Mingxing, Wu Yongwei, Zhuo Youwei, et al. Wonderland: A novel abstraction-based out-of-core graph processing system[C]//Proc of the 23rd Int Conf on Architectural Support for Programming Languages and Operating Systems. New York: ACM, 2018: 608–621
[142] Zheng Long, Li Xianliang, Zheng Yaohui, et al. Scaph: Scalable GPU-accelerated graph processing with value-driven differential scheduling[C]//Proc of the 2020 USENIX Annual Technical Conf. Berkeley, CA: USENIX Association, 2020: 573−588
[143] Tang Ruiqi, Zhao Ziyi, Wang Kailun, et al. Ascetic: Enhancing cross-iterations data efficiency in out-of-memory graph processing on GPUs[C]//Proc of the 50th Int Conf on Parallel Processing. New York: ACM, 2021: 1−10
[144] Vora K, Xu Guoqing, Gupta R. Load the edges you need: a generic I/O optimization for disk-based graph processing[C]//Proc of the 2016 USENIX Conf on Usenix Annual Technical Conf. Berkeley, CA: USENIX Association, 2016: 507–522
[145] Pan Zhenxuan, Wu Tao, Zhao Qingwen, et al. GeaFlow: A graph extended and accelerated dataflow system[C]//Proc of the ACM on Management of Data. New York: ACM, 2023, 1(2): 1−27
[146] Ediger D, Mccoll R, Riedy J, et al. STINGER: High performance data structure for streaming graphs[C]//Proc of the 2012 IEEE Conf on High Performance Extreme Computing. Piscataway, NJ: IEEE, 2012: 1−5
[147] Macko P, Marathe V J, Margo D W, et al. LLAMA: Efficient graph analytics using large multiversioned arrays[C]//Proc of the 2015 IEEE 31st Int Conf on Data Engineering. Piscataway, NJ: IEEE, 2015: 363−374
[148] Leo D D, Boncz P. Teseo and the analysis of structural dynamic graphs[J]. Proceedings of the VLDB Endowment, 2021, 14(6): 1053−1066 doi: 10.14778/3447689.3447708
[149] Basak A, Lin Jilan, Lorica R, et al. SAGA-Bench: Software and hardware characterization of streaming graph analytics workloads[C]//Proc of the 2020 IEEE Int Symp on Performance Analysis of Systems and Software (ISPASS). Piscataway, NJ: IEEE, 2020: 12−23
[150] Islam A A R, Dai Dong, Cheng Dazhao. VCSR: Mutable CSR graph format using vertex-centric packed memory array[C]//Proc of the 2022 22nd IEEE Int Symp on Cluster, Cloud and Internet Computing (CCGrid). Piscataway, NJ: IEEE, 2022: 71−80
[151] Wang Rui, He Shuibing, Zong Weixu, et al. XPGraph: XPline-friendly persistent memory graph stores for large-scale evolving graphs[C]//Proc of the 2022 55th IEEE/ACM Int Symp on Microarchitecture (MICRO). Piscataway, NJ: IEEE, 2022: 1308−1325
[152] Chen Hongtao, Zhang Mingxing, Yang Ke, et al. Achieving sub-second pairwise query over evolving graphs[C]//Proc of the 28th ACM Int Conf on Architectural Support for Programming Languages and Operating Systems. New York: ACM, 2023: 1−15
[153] Shen Sijie, Yao Zihang, Shi Lin, et al. Bridging the gap between relational OLTP and graph-based OLAP[C]//Proc of the 2023 USENIX Annual Technical Conf (USENIX ATC 23). Berkeley, CA: USENIX Association, 2023: 181−196
[154] Vaziri P, Vora K. Controlling memory footprint of stateful streaming graph processing[C]//Proc of the 2021 USENIX Annual Technical Conf (USENIX ATC 21). Berkeley, CA: USENIX Association, 2021: 269−283
[155] 张承龙,曹华伟,王国波,等. 面向高通量计算机的图算法优化技术[J]. 计算机研究与发展,2020,57(6):1152−1163 doi: 10.7544/issn1000-1239.2020.20200115 Zhang Chenglong, Cao Huawei, Wang Guobo, et al. Efficient optimization of graph computing on high-throughput computer[J]. Journal of Computer Research and Development, 2020, 57(6): 1152−1163 (in Chinese) doi: 10.7544/issn1000-1239.2020.20200115
[156] Sha Mo, Li Yuchen, Tan K L. GPU-based graph traversal on compressed graphs[C]//Proc of the 2019 Int Conf on Management of Data. New York: ACM, 2019: 775–792
[157] Chen Zheng, Zhang Feng, Guan Jiawei, et al. CompressGraph: Efficient parallel graph analytics with rule-based compression [J] //Proc of the ACM on Management of Data. New York: ACM, 2023, 1: 1−31
[158] Lee E, Kim J, Lim K, et al. Pre-select static caching and neighborhood ordering for BFS-like algorithms on disk-based graph engines[C]//Proc of the 2019 USENIX Annual Technical Conf. Berkeley, CA: USENIX Association, 2019: 459–473
[159] Yang T Y, Liang Yuhong, Yang M C. Practicably boosting the processing performance of BFS-like algorithms on semi-external graph system via I/O-efficient graph ordering[C]//Proc of the 20th USENIX Conf on File and Storage Technologies (FAST 22). Berkeley, CA: USENIX Association, 2022: 381−396
[160] Wei Hao, Yu J X, Lu Can, et al. Speedup graph processing by graph ordering[C]//Proc of the 2016 Int Conf on Management of Data. New York: ACM, 2016: 1813–1828
[161] Balaji V, Lucia B. When is graph reordering an optimization? studying the effect of lightweight graph reordering across applications and input graphs[C]//Proc of the 2018 IEEE Int Symp on Workload Characterization (IISWC). Piscataway, NJ: IEEE, 2018: 203−214
[162] Guan Nan, Stigge M, Yi Wang, et al. Cache-aware scheduling and analysis for multicores[C]//Proc of the 7th ACM Int Conf on Embedded software. New York: ACM, 2009: 245–254
[163] Han W S, Lee S, Park K, et al. TurboGraph: A fast parallel graph engine handling billion-scale graphs in a single PC[C]//Proc of the 19th ACM SIGKDD Int Conf on Knowledge Discovery and Data Mining. New York: ACM, 2013: 77–85
[164] Lin Shuai, Wang Rui, Li Yongkun, et al. Towards fast large-scale graph analysis via two-dimensional balanced partitioning[C]//Proc of the 51st Int Conf on Parallel Processing. New York: ACM, 2023: 1−11
[165] Gong Shufeng, Zhang Yanfeng, Yu Ge. HBP: Hotness balanced partition for prioritized iterative graph computations[C]//Proc of the 2020 IEEE 36th Int Conf on Data Engineering (ICDE). Piscataway, NJ: IEEE, 2020: 1942−1945
[166] NVIDIA. Unified memory for cuda beginners [EB/OL]. [2024−01−20].https://developer.nvidia.com/blog/unified-memory-cuda-beginners/
[167] NVIDIAL. Nvlink and nvswitch [EB/OL]. [2024−01−20].https://www.nvidia.com/en-us/data-center/nvlink/
[168] Lin Yen, Jeng J Y, Liu Y Y, et al. A review of PCI express protocol-based systems in response to 5G application demand[J]. Electronics, 2022, 11((5): ): 678 doi: 10.3390/electronics11050678
[169] Wang Jing, Li Chao, Liu Yibo, et al. Fargraph+: Excavating the parallelism of graph processing workload on RDMA-based far memory system[J]. Journal of Parallel and Distributed Computing, 2023, 177: 144−159 doi: 10.1016/j.jpdc.2023.02.015
[170] Wang Jing, Li Chao, Wang Taolei, et al. Excavating the potential of graph workload on RDMA-based far memory architecture[C]//Proc of the 2022 IEEE Int Parallel and Distributed Processing Symp (IPDPS). Piscataway, NJ: IEEE, 2022: 1029−1039
[171] 崔鹏杰,袁野,李岑浩,等. RGraph:基于RDMA的高效分布式图数据处理系统[J]. 软件学报,2022,33(3):1018−1042 Cui Pengjie, Yuan Ye, Li Cenhao, et al. RGraph: Effective distributed graph data processing system based on RDMA[J]. Journal of Software, 2022, 33(3): 1018−1042 (in Chinese)
[172] Brock B, Buluç A, Yelick K J a P A. RDMA-based algorithms for sparse matrix multiplication on GPUs [J]. arXiv preprint, arXiv: 2311.18141, 2023
[173] Kwak H, Lee C, Park H, et al. What is Twitter, a social network or a news media?[C]//Proc of the 19th Int Conf on World Wide Web. New York: ACM, 2010: 591–600
[174] Boldi P, Santini M, Vigna S. Laboratory for web algorithmics [EB/OL]. [2024−01−20]. http://law.di.unimi.it/
[175] Rossi R, Ahmed N. The network data repository with interactive graph analytics and visualization [J] //Proc of the AAAI Conf on Artificial Intelligence. Palo Alto, CA: AAAI, 2015, 29: 1−2
[176] Yang J, Leskovec J. Defining and evaluating network communities based on ground-truth[C]//Proc of the ACM SIGKDD Workshop on Mining Data Semantics. New York: ACM, 2012: 1−8
[177] Leskovec J, Sosič R. Snap: A general-purpose network analysis and graph-mining library[J]. ACM Transactions on Intelligent Systems and Technology, 2016, 8(1): 1−20
[178] Boldi P, Vigna S. The webgraph framework I: Compression techniques[C]//Proc of the 13th Int Conf on World Wide Web. New York: ACM, 2004: 595–602
[179] Boldi P, Rosa M, Santini M, et al. Layered label propagation: A multiresolution coordinate-free ordering for compressing social networks[C]//Proc of the 20th Int Conf on World Wide Web. New York: ACM, 2011: 587–596
-
期刊类型引用(1)
1. 程知,何立新,胡春玲,张新,张琛. 基于SystemC的计算机组成与结构课程总线仿真方法. 电脑知识与技术. 2025(03): 131-133+138 . 百度学术
其他类型引用(0)