Acceleration Methods for Processor Microarchitecture Design Space Exploration: A Survey
-
摘要:
中央处理器是目前最重要的算力基础设施. 为了最大化收益,架构师在设计处理器微架构时需要权衡性能、功耗、面积等多个目标. 但处理器运行负载的指令多,单个微架构设计点的评估耗时从10 min到数十小时不等. 加之微架构设计空间巨大,全设计空间暴力搜索难以实现. 近些年来许多机器学习辅助的设计空间探索加速方法被提出,以减少需要探索的设计空间或加速设计点的评估,但缺少对加速方法的全面调研和系统分类的综述. 对处理器微架构设计空间探索的加速方法进行系统总结及分类,包含软件设计空间的负载选择、负载指令的部分模拟、设计点选择、模拟工具、性能模型5类加速方法. 对比了各加速方法内文献的异同,覆盖了从软件选择到硬件设计的完整探索流程. 最后对该领域的前沿研究方向进行了总结,并放眼于未来的发展趋势.
Abstract: The central processing unit is the most important computing infrastructure today. To maximize the return on a design, architects must trade off multiple objectives, including performance, power, and area, when designing a processor microarchitecture. However, because of the tremendous number of instructions in the workloads running on processors, evaluating a single microarchitecture design point takes from about ten minutes to tens of hours. Furthermore, the microarchitecture design space is huge, which makes a brute-force search of the full design space unrealistic. In recent years, many machine-learning-assisted acceleration methods for design space exploration have been proposed to reduce the design space that needs to be explored or to accelerate the evaluation of a single design point, but a comprehensive survey that summarizes and systematically classifies these acceleration methods is still missing. This survey systematically summarizes and classifies the acceleration methods for processor microarchitecture design space exploration into five categories: workload selection in the software design space, partial simulation of workload instructions, design point selection, simulation tools, and performance models. It compares the similarities and differences among the studies within each category and covers the complete exploration flow from software workload selection to hardware microarchitecture design. Finally, the frontier research directions in this field are summarized and future development trends are discussed.
-
近些年来,流数据在网络安全、智慧城市、气象预测等多个领域大量涌现. 流数据作为一种重要的数据类型,具有持续产生、实时性强、规模巨大且数据分布动态变化等复杂特性,这给流数据挖掘任务带来了极大挑战[1-5]. 概念漂移是指随着时间推移、数据分布发生变化,样本的输入特征和输出标签之间的关系也发生改变的现象[6-9]. 概念漂移发生后,集成模型若不能及时学习到新的数据分布特征,其性能将会下降.
基于集成学习的方法[10-12]利用历史数据构建基学习器,并借助特定的投票机制(如加权平均、组合投票等)进行集成决策,以此得到比单一基学习器更好的效果,解决了单一基学习器在流数据挖掘中不能把握全局信息的问题,因此利用集成学习处理概念漂移是一种有效可行的手段. 然而,传统集成学习方法在漂移发生后不能对新数据分布及时做出响应,且通常认为历史数据不再适用,如果这些数据中含有对当前模型学习有帮助的样本知识,直接丢弃则会造成已有资源的浪费. 此外,流数据分布变化方式的多样性易产生不同类型的概念漂移(如突变型和渐变型),不同类型漂移的数据分布变化跨度、变化快慢、变化方式等都不相同[13],然而多数在线集成模型只关注单一类型,不能针对漂移类型进行自适应建模.
为解决上述问题,本文提出一种面向不同类型概念漂移的两阶段自适应集成学习方法(two-stage adaptive ensemble learning method for different types of concept drift,TAEL). 该方法从解决不同类型的概念漂移问题入手,检测漂移跨度以确定漂移类型,并构建了针对类型的“过滤-扩充”两阶段样本处理机制. 一方面在样本过滤过程中,根据漂移类型创建非关键样本过滤器,过滤掉历史样本中的非关键因素,保证剩余的历史关键样本块的数据分布更加接近当前数据分布;另一方面在样本扩充过程中,根据漂移类型确定合适的抽样规模,由当前数据块中各个类别的规模占比设置历史关键样本的抽样优先级,并确定抽样概率,按照抽样概率进行分块优先抽样,以扩充当前样本块,为当前样本块补充样本特征的同时缓解了块内类分布不平衡. 本文工作的主要贡献有3方面:
1)通过检测漂移跨度确定概念漂移类型,为不同类型漂移的自适应集成建模提供了一种可行方案;
2)通过对历史数据中非关键样本的过滤,使更新后的历史数据分布更接近最新数据分布,提高了历史基学习器的有效性;
3)通过对当前数据的扩充,缓解了当前基学习器的欠拟合问题,提高了基学习器的稳定性.
1. 相关工作
目前,对含概念漂移的流数据挖掘的处理策略主要包括基于实例选择的方法和基于集成学习的方法. 基于实例选择的方法通常使用滑动窗口技术来实现,其基本思想是将数据流分成固定大小的窗口,通过窗口的向前滑动来实现对概念漂移的检测和处理. ADWIN[14]通过计算子窗口之间的均值差异来判断是否发生了概念漂移. DDM[15]通过持续监视窗口内的数据样本分类错误率来检测概念漂移. STEPD[16]通过比较最近窗口和整个窗口来检测错误率变化. DWCDS[17]提出一种双窗口机制来周期性地检测概念漂移,并对模型进行动态更新以适应概念漂移. CD-TW[18]首先创建2个分别加载历史数据和当前数据的基础节点时序窗口,通过比较二者包含数据的分布变化情况来检测概念漂移. CDT_MSW[19]由单个基本滑动窗口和单个基本静态窗口来检测概念漂移.
使用集成学习处理含概念漂移流数据的研究已经取得了很多成果和进展,基于集成学习的方法大体可分为2类:在线集成和基于数据块的集成.
在线集成是一种对样本进行逐一处理的增量学习方法. 基于单样本的增量模型方法[20]首先初始化一组基分类器,使用每个时间戳下到达的单个样本更新集成模型,然后对基分类器进行加权组合. DOED[21]通过维护低多样性和高多样性的在线加权集成,从而准确地处理各种类型的漂移. 基于混合标记策略的在线主动学习集成框架[22]由一个长期固定分类器和多个动态分类器组成来适应概念漂移. CBCE[23]为每个类维护一个基学习器,并在有新样本时更新基学习器. 在线集成学习方法能够有效提高模型的实时泛化性能,但由于需要逐一处理样本,增加了计算资源,易导致学习效率较低.
基于数据块的集成是一种对固定数量的输入实例进行处理的方法. SEA[24]在连续的数据块上构建基分类器,并且使用启发式替换策略组合成固定大小的集成模型. DWMIL[25],ACDWM[26]为每个数据块创建一个基学习器,通过根据基学习器在当前数据块上的分类性能进行动态加权集成. SRE[27]在基于块的框架中保留一部分先前少数样本以平衡当前块的类分布. DUE[28]为每个数据块创建若干候选分类器,对其进行分段加权,并通过动态调整分类器权重来解决概念漂移问题. SEOA[29]将神经网络的不同层次作为基分类器进行集成,根据各基分类器在当前数据块上的决策损失进行动态加权,以实现稳定性与适应性的平衡. 然而,划分的数据块的大小通常会影响模型的性能和训练速度,因此,选择合适的数据块大小很重要.
与传统方法相比,本文提出的TAEL方法能够充分利用新旧样本信息,根据漂移类型针对性地采用两阶段样本处理机制更新历史样本块和当前样本块,实现了集成模型在概念漂移发生后对新数据分布的快速响应.
2. 面向不同类型概念漂移的两阶段自适应集成
本文提出的TAEL方法的模型总体结构如图1所示. 在漂移类型检测阶段,通过检测漂移跨度span确定漂移类型. 在两阶段自适应集成阶段,首先根据漂移类型创建非关键样本过滤器F,过滤掉历史样本集D上的非关键样本,然后对剩余的历史关键样本 \hat D 进行分块优先抽样Sampling,根据漂移类型确定合适的抽样规模M,并根据样本所属类在当前样本块的规模占比设置抽样优先级α,由α获得抽样概率P,按照P抽取一定规模的关键样本子集 \widetilde D 来扩充当前数据集Dt. 在更新后的历史样本集和当前样本集中训练得到具有更高有效性的基学习器,提升了集成模型的实时泛化性能.
2.1 漂移类型检测
流数据是指实时、连续、无限、随时间不断变化的数据序列,时刻t到达的样本由具有联合概率分布 {P_t}({\boldsymbol {x}},y) 的数据源产生. 在流数据挖掘任务中,样本分布的不稳定和动态变化等因素导致流数据中隐含的目标概念发生改变,即概念漂移,其本质可看作流数据的联合概率分布发生变化:
{P_{t - 1}}({\boldsymbol{x}},y) \ne {P_t}({\boldsymbol{x}},y) . (1)
为了根据不同类型漂移有针对性地更新集成模型,首先在概念漂移位点处进行漂移类型检测. 本文通过计算span来检测漂移类型. span由漂移开始位点和漂移结束位点间相距的时间跨度确定. 本文判断漂移是否结束的依据是后序数据分布是否已经稳定. 已知漂移开始位点a,选取该位点后序的L个连续数据块 {D_{a + 1}},{D_{a + 2}},…,{D_{a + L}} ,在这些数据块上训练得到基学习器 {f_{a + 1}},{f_{a + 2}},…,{f_{a + L}} ,并得到在当前数据块上的实时预测精度 {acc_{a + 1}},{acc_{a + 2}},…,{acc_{a + L}} . 计算实时预测精度的方差:
{s^2} = \frac{\sum\limits_{l = 1}^L {(acc_{a + l} - \overline {acc})}^2}{L} , (2)
其中 \overline {acc} 为实时预测精度的平均值. {s^2} 反映了L个基学习器的预测差异,同时反映出位点 a 的后序数据分布的稳定程度. 若 {s^2} < \delta ( \delta 为漂移稳定性参数),则认为位点 a + 1 为漂移结束位点, span = 1 ;若 {s^2} \geqslant \delta ,则认为漂移仍未结束,接着从位点 a + 1 开始继续上述操作,直到得到漂移结束位点b, span = b - a ;若 span > \theta ( \theta 为漂移类型参数),则判定此次漂移为渐变型,否则判定此次漂移为突变型. 漂移类型检测过程如图2所示.
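为便于理解上述检测流程,下面给出漂移跨度span检测与漂移类型判定的一个简化示意实现(Python示例,非原文代码;其中acc_seq表示各位点基学习器在当前数据块上的实时预测精度序列,为本示例假设的输入接口):
```python
import numpy as np

def detect_drift_type(acc_seq, a, L, delta, theta):
    """按式(2)计算后序L个实时预测精度的方差, 检测漂移跨度span并判定漂移类型."""
    p = a                                   # 从漂移开始位点a逐位点检测
    while True:
        accs = np.asarray(acc_seq[p + 1 : p + 1 + L], dtype=float)
        # 式(2): 方差s2反映位点p的后序数据分布的稳定程度(数据不足时视为已稳定)
        s2 = np.mean((accs - accs.mean()) ** 2) if len(accs) == L else 0.0
        if s2 < delta:                      # 后序分布已稳定, 位点p+1即漂移结束位点b
            b = p + 1
            break
        p += 1                              # 漂移仍未结束, 从下一位点继续检测
    span = b - a
    drift_type = "gradual" if span > theta else "abrupt"   # 渐变型 / 突变型
    return span, drift_type
```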
2.2 “过滤-扩充”两阶段自适应集成学习
为充分利用当前漂移场景的样本信息和有选择地利用历史样本信息以提高集成模型在概念漂移发生后对新数据分布的适应性,本文提出“过滤-扩充”两阶段自适应集成学习方法. 过滤阶段通过过滤非关键样本以帮助历史样本块筛选接近当前数据分布的关键样本,扩充阶段通过向当前数据集补充过滤后保留的历史关键样本以弥补其缺少的样本特征.
2.2.1 样本过滤策略
由于历史样本中含有大量样本信息,而非关键样本会导致该数据集上模型的有效性降低. 为“筛掉”这些无用信息,提高样本质量,使历史数据分布更接近当前数据分布,本文提出一种样本过滤策略,通过创建非关键样本过滤器F过滤掉历史非关键样本. 考虑到不同类型的概念漂移场景下数据分布的变化方式和特点不同,因此需创建不同的F.
假设有历史数据块 D = \{ {D_1},{D_2}, … ,{D_{{n}}}\} ,第i个数据块为 {D_i} = \{ ({{\boldsymbol {x}}_{ij}},{y_{ij}})|j = 1,2, … ,k\} (k为数据块大小),由Di训练得到历史基学习器fi. 当前数据块 {D_t} = \{ ({{\boldsymbol {x}}_{tj}},{y_{tj}})|j = 1,2, … ,k\} ,在Dt上训练得到当前基学习器ft. 候选基学习器池Q用来存储参与集成的候选基学习器,最大容量s=15.
当发生突变型概念漂移时,数据分布急速变化,历史数据分布和当前数据分布差异较大,大量历史样本成为阻碍模型学习的负面因素,导致历史基学习器的性能快速下降. 由于当前基学习器ft在最新数据块Dt上训练得到,反映了流数据的最新分布,因此,为了快速过滤掉历史非关键样本,本文针对这种类型的概念漂移采用一种直接式过滤器,将ft作为每个历史数据块的非关键样本过滤器F,即 F = {f_t} . 以ft对Di的预测观察结果作为样本过滤条件Ci,表达式为:
{C_i}:{y_{ij}} \ne F({\boldsymbol{x}}_{ij}) , (3)
真实标签与ft预测结果不同的样本将被直接过滤掉.
当发生渐变型概念漂移时,数据分布变化较缓慢,历史数据分布与当前数据分布虽有差异但仍相似,历史数据块中可能只有少量样本变得非关键,因此与突变型概念漂移的直接过滤方式不同,渐变型概念漂移采用一种叠加式过滤器,即通过历史数据块的后序基学习器和ft的加权组合来叠加过滤效果,确保充分利用历史样本知识和当前样本知识帮助进行更加准确的过滤操作. 为了实现对样本知识的有效利用,首先需要区分每个基学习器的重要程度,本文将基学习器在Dt上的实时预测精度作为其权重. 在此基础上,Di的叠加过滤器Fi为:
{F_i} = \sum\limits_{p = i + 1}^n {\frac{w_p}{\sum\limits_{q = i + 1}^n {w_q} + {w_t}}} {f_p} + \frac{w_t}{\sum\limits_{q = i + 1}^n {w_q} + {w_t}}{f_t} , (4)
{w_g} = \frac{1}{k}\sum\limits_{j = 1}^k { \llbracket {f_g}({{\boldsymbol {x}}_{tj}}) = {y_{tj}} \rrbracket} , \quad g = 1,2,…,n, (5)
{w_t} = \frac{1}{k}\sum\limits_{j = 1}^k { \llbracket {f_t}({{\boldsymbol {x}}_{tj}}) = {y_{tj}} \rrbracket} , (6)
其中当 \llbracket \cdot \rrbracket 中的条件成立时值为1,否则为0. 以Fi对历史样本的预测观察结果作为样本过滤条件,表达式为:
{C_i}:{y_{ij}} \ne {F_i}({{\boldsymbol {x}}_{ij}}) , (7)
真实标签与Fi预测结果不同的样本将被过滤掉.
经过上述操作,符合过滤条件的样本被丢弃,剩下更符合当前数据分布的历史关键样本块 {\hat D_1},{\hat D_2}, … ,{\hat D_{{n}}} . 由于在突变型概念漂移发生后,过滤的样本通常较多,训练样本不足易导致模型训练不充分,因此本文向过滤后的每个历史样本块中补充Dt. 最后,在更新后的历史关键样本块上训练得到 {\hat f_1},{\hat f_2}, … ,{\hat f_{{n}}} ,提高了基学习器的有效性.
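下面给出样本过滤阶段的一个简化示意实现(非原文代码,仅用于说明式(3)~(7)的过滤逻辑;假设各基学习器提供sklearn风格的predict接口,式(4)的加权组合在此以加权投票方式近似实现):
```python
import numpy as np

def filter_noncritical(hist_blocks, hist_learners, f_t, D_t, drift_type):
    """按式(3)~(7)过滤历史数据块中的非关键样本, 返回历史关键样本块列表."""
    X_t, y_t = D_t
    # 式(5)(6): 以各基学习器在当前数据块Dt上的实时预测精度作为权重
    w = [np.mean(f.predict(X_t) == y_t) for f in hist_learners]
    w_t = np.mean(f_t.predict(X_t) == y_t)

    filtered = []
    for i, (X_i, y_i) in enumerate(hist_blocks):
        if drift_type == "abrupt":
            # 突变型: 直接式过滤器F=f_t, 过滤条件见式(3)
            pred = f_t.predict(X_i)
        else:
            # 渐变型: 叠加式过滤器, 由后序基学习器与f_t按式(4)加权组合(此处用加权投票近似)
            members = list(hist_learners[i + 1:]) + [f_t]
            weights = np.asarray(w[i + 1:] + [w_t])
            weights = weights / weights.sum()
            votes = np.stack([m.predict(X_i) for m in members])        # 形状(m, k)
            classes = np.unique(np.concatenate([y_i, y_t]))
            scores = np.stack([weights @ (votes == c) for c in classes])
            pred = classes[scores.argmax(axis=0)]
        keep = pred == y_i          # 式(3)(7): 与过滤器预测不一致的样本被过滤掉
        filtered.append((X_i[keep], y_i[keep]))
    return filtered
```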
2.2.2 样本扩充策略
概念漂移发生后,当前基学习器往往欠拟合,而历史样本恰恰可以帮助当前样本集弥补其缺少的样本知识. 因此,本文提出一种样本扩充策略,将过滤后保留的历史关键样本块 {\hat D_1},{\hat D_2}, … ,{\hat D_{{n}}} 用来扩充Dt. 然而,即使历史样本集已过滤掉部分样本,全部扩充到Dt所花费的时间代价仍较大,为解决这个问题,本文从各个历史数据块中抽取子集 {\widetilde D_1},{\widetilde D_2}, … ,{\widetilde D_n} 用来扩充Dt. 由于扩充后的Dt可能存在类别不平衡,造成这种情况的原因有2种:一种原因是Dt本身就存在类别不平衡的问题,而抽取的样本子集没有改善甚至加重了这种不平衡;另一种原因是Dt本身类分布平衡,但扩充导致了类别不平衡. 因此本文从抽取方式入手,为了降低扩充后的Dt的类不平衡率,提出一种分块优先抽样方法,该方法根据样本所属类在Dt中总类别的规模占比确定抽样优先级α,由此计算得到抽样概率P,按照抽样概率P依次从各个历史关键样本块中不放回地抽取一定数量的关键样本子集用于扩充.
抽样规模的设置直接关系实验结果的好坏. 如果抽样规模太小,将会导致抽样样本不足以提供足够的关键信息;如果抽样规模太大,将会浪费时间和资源,从而降低效率. 由于突变型漂移前后数据分布的差异较大,历史关键样本往往较少,设置总抽样规模M为较小值;渐变型漂移前后数据分布间虽有差异但仍相似,历史关键样本往往较多,设置M为较大值. 因此,可将总抽样规模M和漂移跨度span联系起来,表达式为:
M = \lambda \times \frac{span}{span + 1} \times \sum\limits_{i = 1}^n {z_i} , (8)
其中λ为样本规模因子,zi为历史数据块 {\hat D_i} 的大小. 在确定M后, {\hat D_i} 的抽样规模Mi由其大小确定,同时为了保证有相对足够的采样样本,限制最小的块抽样规模,表达式为:
{M_i} = \max \left\{ \lambda \times \frac{span}{span + 1} \times {z_i} , \frac{1}{10n}\sum\limits_{j = 1}^n {z_j} \right\} . (9)
为了缓解Dt在扩充后的类别不平衡现象,每个样本被抽中的概率与其所属类在Dt中的规模占比密切相关,即越少的类被选中的概率越大,越多的类被选中的概率越小. 因此,为历史样本中类别规模占比较小的样本设置较高的优先级,为类别规模占比较大的样本设置较低的优先级. 如果判断xij所属类别为 {c'} ,设置其抽样优先级为
{\alpha _{ij}} = \left\{ \begin{gathered} \ln \left(\frac{\sum\limits_{c \in C} {\sum\limits_{x = 1}^k { \llbracket {y_{tx}} = c \rrbracket} }}{\sum\limits_{x = 1}^k { \llbracket {y_{tx}} = {c'} \rrbracket} }\right), \quad {c'} \in C {\text{ 且 }} \left| C \right| > 1, \\ \ln \left(\frac{\sum\limits_{c \in C} {\sum\limits_{x = 1}^k { \llbracket {y_{tx}} = c \rrbracket} }}{2}\right), \quad {c'} \in C {\text{ 且 }} \left| C \right| = 1, \\ \ln \left(\sum\limits_{c \in C} {\sum\limits_{x = 1}^k { \llbracket {y_{tx}} = c \rrbracket} } \right), \quad {\text{其他}}, \\ \end{gathered} \right. (10)
其中C为当前样本块中出现的样本类别. 抽样优先级和抽样概率成正比,xij的抽样概率可表示为
{P_{ij}} = \Pr (({{\boldsymbol x}_{ij}},{y_{ij}}) \in {\widetilde D_i}|({{\boldsymbol x}_{ij}},{y_{ij}}) \in {\hat D_i}) = \frac{{\alpha _{ij}}}{\sum\limits_{p = 1}^{z_i} {\alpha _{ip}}} . (11)
显然,当 {\hat D_i} 中每个样本的抽样优先级相等时,有
{P_{ij}} = \Pr (({{\boldsymbol x}_{ij}},{y_{ij}}) \in {\widetilde D_i}|({{\boldsymbol x}_{ij}},{y_{ij}}) \in {\hat D_i}) = \frac{1}{{z_i}} , (12)
分块优先抽样过程变为简单随机抽样. 将历史数据块 {\hat D_i} 的优先抽样函数表示为
{\widetilde D_i} = Sampling({\hat D_i},{M_i},{P_i}) . (13)
依次从 {\hat D_1},{\hat D_2},…,{\hat D_n} 中抽取数量为 {M_1},{M_2},…,{M_n} 的关键样本子集 {\widetilde D_1},{\widetilde D_2},…,{\widetilde D_n} ,将关键样本子集扩充到Dt中,得到扩充后的 {\hat D_t} = {\widetilde D_1} \cup {\widetilde D_2} \cup … \cup {\widetilde D_n} \cup {D_t} . 经过上述操作,向 {\hat D_t} 中补充了历史有用信息并且使类分布更加均衡,在扩充后的 {\hat D_t} 上训练得到的 {\hat f_t} 具有更丰富的样本特征,解决了当前基学习器的欠拟合问题,同时提高了基学习器的稳定性. 突变型和渐变型场景下的两阶段自适应集成过程如图3所示.
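下面给出样本扩充阶段(式(8)~(13))的一个简化示意实现(非原文代码;filtered_blocks为过滤后的历史关键样本块,函数返回扩充后的当前数据块 \hat D_t ):
```python
import numpy as np

def priority_sampling(filtered_blocks, D_t, span, lam, rng=None):
    """按式(8)~(13)进行分块优先抽样, 并用抽取的关键样本子集扩充当前数据块Dt."""
    rng = rng or np.random.default_rng(0)
    X_t, y_t = D_t
    n = len(filtered_blocks)
    sizes = np.array([len(y_i) for _, y_i in filtered_blocks])
    ratio = lam * span / (span + 1)                        # λ×span/(span+1)
    # 式(9): 各历史块的抽样规模, 并限制最小块抽样规模
    M_i = np.maximum(ratio * sizes, sizes.sum() / (10 * n)).astype(int)

    classes, counts = np.unique(y_t, return_counts=True)   # Dt中各类别的规模
    total = counts.sum()
    count_of = dict(zip(classes.tolist(), counts.tolist()))

    X_new, y_new = [X_t], [y_t]
    for (X_i, y_i), m in zip(filtered_blocks, M_i):
        if len(y_i) == 0:
            continue
        # 式(10): 所属类别在Dt中规模占比越小, 抽样优先级alpha越高
        alpha = np.array([
            np.log(total / count_of[c]) if (c in count_of and len(classes) > 1)
            else (np.log(total / 2.0) if c in count_of else np.log(total))
            for c in y_i
        ])
        p = alpha / alpha.sum()                             # 式(11): 抽样概率
        m = min(int(m), len(y_i))
        idx = rng.choice(len(y_i), size=m, replace=False, p=p)   # 不放回优先抽样
        X_new.append(X_i[idx])
        y_new.append(y_i[idx])
    return np.vstack(X_new), np.concatenate(y_new)          # 扩充后的当前数据块
```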
在将 {\hat f_t} 存储到Q前,需要判断Q是否达到最大容量s. 如果 n \geqslant s ,那么用 {\hat f_t} 替换掉在Dt上实时预测精度最小的历史基学习器:
{\hat f_t} \to \mathop{\arg\max }\limits_{{\hat f}_i \in Q} \sum\limits_{j = 1}^k { \llbracket {{\hat f}_i}({{\boldsymbol x}_{tj}}) \ne {y_{tj}} \rrbracket} . (14)
最终的强分类器H对于x的预测结果为多数更新后的基学习器预测的结果,即
H({\boldsymbol{x}}) = \mathop {\arg \max }\limits_{y} \sum\limits_{i = 1}^n { \llbracket {f_i}({\boldsymbol x}) = y \rrbracket} . (15)
2.3 算法实施流程
TAEL方法首先检测漂移跨度span,判断漂移类型,然后在过滤阶段,设置非关键样本过滤器F,依次对历史样本块进行过滤操作,将剩余的历史关键样本用于训练更新历史基学习器,以提高其有效性;在扩充阶段,采用分块优先抽取策略,根据样本所属类别的规模占比设置抽样优先级,计算得到抽样概率,从历史关键样本块中抽取合适数量的样本子集来扩充当前样本块,缓解了扩充后的类分布不均衡,解决了当前基学习器欠拟合的问题. 算法1展示了TAEL方法的执行流程.
算法1. 面向不同类型概念漂移的两阶段自适应集成算法.
输入:历史数据块 {D_1},{D_2}, … ,{D_{{n}}} ,当前数据块Dt,漂移跨度span,历史基学习器 {f_1},{f_2}, … ,{f_{{n}}} ,当前基学习器ft,非关键样本过滤器F.
输出:更新后的基学习器 \hat f_1,\hat f_2,…,\hat f_n 和 \hat f_t .
① 获取Dt上每个类别的样本数 \displaystyle\sum\limits_{x = 1}^k { \llbracket {{y_{tx}} = c} \rrbracket } ;
② if span \leqslant \theta
③ {F_i} = {f_t} ;
④ else
⑤ {F_i} = \displaystyle\sum\limits_{p = i + 1}^n {\frac{{{w_p}}}{{\displaystyle\sum\limits_{q = i + 1}^n {{w_q} + {w_t}} }}{f_p}} + \frac{{{w_t}}}{{\displaystyle\sum\limits_{q = i + 1}^n {{w_q} + {w_t}} }}{f_t} ;
⑥ end if
⑦ for i = 1:n
⑧ for j = 1:k
⑨ if {F_i}({{\boldsymbol {x}}_{ij}}) \ne {y_{ij}}
⑩ 从Di中删除样本xij;
⑪ else
⑫ 根据式(10)计算xij的抽样优先级αij;
⑬ end if
⑭ end for
⑮ 根据式(11)由抽样优先级αi计算抽样概率Pi;
⑯ 得到过滤后的历史关键数据块 {\hat D_i} ;
⑰ if span \leqslant \theta
⑱ 更新历史基学习器 {\hat f_i} \leftarrow train({\hat D_i} \cup {D_t}) ;
⑲ else
⑳ 更新历史基学习器 {\hat f_i} \leftarrow train({\hat D_i}) ;
㉑ end if
㉒ end for
㉓ 获取总抽样规模 M = \lambda \times \dfrac{{span}}{{span + 1}} \times \displaystyle\sum\limits_{i = 1}^n {{z_i}} ;
㉔ for i = 1:n
㉕ 根据式(9)计算每个 {\hat D_i} 上的抽样规模Mi;
㉖ 按照抽样概率Pi从 {\hat D_i} 中抽取大小为Mi的 {\widetilde D_i} = Sampling({\hat D_i},{M_i},{P_i}) ;
㉗ end for
㉘ {\hat D_t} = {D_t} \cup {\widetilde D_1} \cup {\widetilde D_2} \cup … \cup {\widetilde D_n} ;
㉙ 更新当前基学习器 {\hat f_t} \leftarrow train({\hat D_t}) ;
㉚ 根据式(15)将最新更新的基学习器参与集成;
㉛ 在Dt+1上进行测试,得到实时精度.
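结合算法1的步骤㉚㉛,下面给出集成更新与预测环节(式(14)(15))的一个简化示意实现(非原文代码;pool对应候选基学习器池Q,假设基学习器提供predict接口):
```python
import numpy as np

def update_pool(pool, f_new, D_t, s=15):
    """若候选池Q已达最大容量s, 按式(14)用f_new替换在Dt上实时预测精度最低(错误数最多)的基学习器."""
    X_t, y_t = D_t
    if len(pool) >= s:
        errors = [np.sum(f.predict(X_t) != y_t) for f in pool]
        pool[int(np.argmax(errors))] = f_new
    else:
        pool.append(f_new)
    return pool

def ensemble_predict(pool, X):
    """式(15): 强分类器H对x的预测为多数基学习器投票的结果."""
    votes = np.stack([f.predict(X) for f in pool])                  # 形状(n, k)
    classes = np.unique(votes)
    counts = np.stack([(votes == c).sum(axis=0) for c in classes])  # 各类别得票数
    return classes[counts.argmax(axis=0)]
```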
2.4 模型复杂度分析
TAEL的计算成本主要集中在漂移类型检测、样本过滤、样本扩充和基学习器更新这4个阶段,本文将依次对每个阶段进行时间复杂度分析.
1)漂移类型检测. 预测一个数据块中样本的时间复杂度为O(k),其中k为数据块的样本数,那么L个历史基学习器在当前数据块上的预测时间复杂度为O(Lk),计算后序数据分布稳定程度的时间复杂度为O(L). 因此,漂移类型检测过程的时间复杂度为 O(Lk) + O(L) = O(Lk) .
2)样本过滤. 训练一个SVM分类器的时间复杂度为O(p^2),其中p为样本数,因此,在大小为k的数据块上训练基学习器的时间复杂度为O(k^2). 当发生突变型概念漂移时,使用直接式过滤器,当前基学习器对所有历史数据的预测时间复杂度为O(sk),其中s为基学习器的最大存储容量. 该过程的时间复杂度为O(k^2)(一般地, s < k ).
当发生渐变型概念漂移时,采用叠加式过滤器,根据历史和最新基学习器在当前数据块上的预测结果得到权值的时间复杂度为O((s+1)k). 对所有历史数据块依次执行基学习器的加权预测,总的时间复杂度为O(s(s+1)k). 因此,样本过滤过程的时间复杂度为 O({k^2}) + O((s + 1)k) + O(s(s + 1){{k}}) = O({s^2}k) .
3)样本扩充. 计算所有历史数据块的抽样规模Mi的时间复杂度为O(s). 计算每个历史数据块中样本的抽样优先级α的时间复杂度为O(sk),计算每个样本的抽样概率P的时间复杂度为O(sk). 对于每个历史数据块,根据抽样概率P在当前数据块随机抽取Mi个样本的时间复杂度为O(sk). 该过程的时间复杂度为 O(s) + 3O(sk) = O(sk) .
4)更新基学习器. 训练s+1个基学习器的时间复杂度为O((s+1)k^2). 替换掉最差基学习器的时间复杂度为O((s+1)k),整个过程的时间复杂度为 O((s + 1)({k^2} + k)) = O(s{k^2}) .
3. 实验分析
为验证本文提出的TAEL方法的有效性,本文在具有不同类型概念漂移的标准数据集和真实数据集上进行实验,并从精度、鲁棒性以及收敛性这3个方面进行评价. 实验平台为Windows 10操作系统,CPU为酷睿i7(主频3.2 GHz),内存为8 GB,本方法采用MATLAB R2018a编写和运行.
3.1 实验数据
为了检验方法对不同类型概念漂移的处理能力,本文使用大规模在线分析平台MOA[30]中的流数据生成器产生了6个具有突变式、渐进式以及增量式的概念漂移数据集. 除此之外,本文还选取了4个真实数据集. 具体的数据集信息如表1所示.
表 1 数据集信息
Table 1. Datasets Information
分类 数据集 实例数 维度 类别数量 漂移类型 漂移数量 漂移位点
合成数据集 Sea 100×10³ 3 2 渐进式 3 25×10³, 50×10³, 75×10³
合成数据集 Hyperplane 100×10³ 10 2 增量式 - -
合成数据集 RBFBlips 100×10³ 20 4 突变式 3 25×10³, 50×10³, 75×10³
合成数据集 LED_abrupt 100×10³ 24 10 突变式 1 50×10³
合成数据集 LED_gradual 100×10³ 24 10 渐进式 3 25×10³, 50×10³, 75×10³
合成数据集 Tree 100×10³ 30 10 突变式 3 25×10³, 50×10³, 75×10³
真实数据集 Electricity 45.3×10³ 6 2 - - -
真实数据集 Kddcup99 494×10³ 41 23 - - -
真实数据集 Covertype 581×10³ 54 7 - - -
真实数据集 Weather 95.1×10³ 9 3 - - -
注:“-”表示未知.
3.2 评价指标
为衡量TAEL方法的性能,本节从模型的精度、鲁棒性及收敛性3方面进行了分析.
1) 平均实时精度(average real-time accuracy,Avgracc)表示模型在每个时间步的实时精度的平均值,反映模型的实时性能.
Avgracc = \frac{1}{T}\sum\limits_{t = 1}^T {\frac{{n_t}}{|{D_t}|}} , (16)
其中nt代表时间步t内正确分类的样本数,|Dt|表示样本块大小,T表示总的时间步数. 平均实时精度越高说明模型分类性能越好.
2)累积精度(cumulative accuracy,Cumacc)表示模型在当前时刻的累积预测正确样本数和总样本数的比值,反映模型从开始到当前时刻的整体性能.
Cumacc = \frac{\sum\limits_{i = 1}^{T_t} {n_i}}{\sum\limits_{j = 1}^{T_t} {|{D_j}|}} , (17)
其中Tt表示当前累积的时间步数.
3)鲁棒性(robustness,R)[31]表示模型的稳定性和泛化性能. 本文在平均实时精度上分析了不同方法的鲁棒性,定义为:
R(Dataset) = \frac{racc(Dataset)}{\min racc(Dataset)} , (18)
其中 racc(Dataset) 表示某算法在数据集Dataset上的平均实时精度, \min racc(Dataset) 表示在数据集Dataset上所有算法中的最小平均实时精度.
某算法的整体鲁棒性值为该算法在所有数据集上的鲁棒性的总和. 鲁棒性值越大说明算法越稳定,面对数据中存在的干扰也能保持较好的性能.
4)收敛速度(recovery speed under accuracy,RSA)表示模型从概念漂移位点起实时精度恢复到稳定所需要的时间步数step与收敛位点后K个位点平均错误率avge的乘积:
RSA = step \times avge . (19)
如果一个位点的性能表现和其后续K个参照位点的平均性能表现的差异小于阈值 \gamma (当前波动程度较小),同时K个参照位点的前半部分和后半部分的平均性能表现的差异小于 \dfrac{\gamma }{2} (整体波动程度趋近于稳定),那么该位点为收敛位点:
\begin{gathered} \left| {acc_t} - \frac{\sum\limits_{j = 1}^K {acc_{t + j}}}{K} \right| < \gamma {\text{ 且}} \\ \frac{2}{K}\left|\sum\limits_{j = 1}^{\tfrac{K}{2}} {acc_{t + j}} - \sum\limits_{k = \tfrac{K}{2} + 1}^K {acc_{t + k}} \right| < \frac{\gamma }{2} . \\ \end{gathered} (20)
3.3 参数设置
本节对实验模型中的相关参数进行4点讨论:
1)数据块大小k. 过大的数据块中可能包含概念漂移,从而影响模型的分类效果;过小的数据块中可能无法包含足够多的样本特征,从而导致训练的基学习器稳定性较差. 因此,本文统一设置 k = 500 .
2)漂移稳定性参数 \delta 和漂移类型参数 \theta . 考虑到流数据本身的复杂性以及概念漂移类型的多样性,本文设置 \delta = 0.01 , \theta = 1 .
3)样本规模因子 \lambda . 样本规模控制了整体抽样的数量,直接影响了当前基学习器的训练,从而可能会对整体的模型性能造成影响. 因此,本文选取 \lambda \in \{ 0.2,0.4,0.6,0.8\} 进行讨论,得到了在不同 \lambda 下的分类性能,并使用最优样本规模因子与对比方法进行比较.
4)基学习器f. 本文选择LIBSVM来构建“同质”基学习器,核参数采用默认值 g = 1/v (v为数据特征维度),惩罚因子设置为 C = 10,构建方式的示意见下.
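作为参考,下面给出按上述设置构建单个基学习器的一种示意写法(假设使用scikit-learn中基于LIBSVM实现的SVC,参数名称以该库为准,非原文代码):
```python
from sklearn.svm import SVC   # scikit-learn的SVC底层基于LIBSVM实现

def make_base_learner(n_features, C=10.0):
    """构建“同质”SVM基学习器: 核参数g取默认值1/v(v为特征维度), 惩罚因子C=10."""
    return SVC(kernel="rbf", gamma=1.0 / n_features, C=C)
```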
3.4 实验结果与分析
为评估TAEL的性能,本文选取DWCDS[17],HBP[32],Resnet[33],Highway[34]以及原始深度神经网络(DNN)在精度、鲁棒性和收敛性3个方面进行对比实验和结果分析.
3.4.1 模型精度结果和分析
本节首先分析了在不同样本规模因子 \lambda 下集成模型的表现性能. 表2展示了TAEL方法在不同 \lambda 下的平均实时精度. 从表2可以看出当 \lambda = 0.4 和 \lambda = 0.6 时的平均实时精度值较高,这也反映了 \lambda 会在一定程度上影响当前基学习器的性能,进而影响整个集成模型的实时精度. 分析其原因可能是当 \lambda 取值较大时扩充的历史样本数太多,此时的关键信息冗余,训练得到的基学习器效果较差;当 \lambda 取值较小时扩充的样本数太少,可能丢弃潜在的可用数据,导致训练得到的基学习器处于欠拟合状态. 因此,本文选择适中的扩充规模,又因实验结果中 \lambda = 0.4 时的平均实时精度大于 \lambda = 0.6 时的平均实时精度,最终选择 \lambda = 0.4 的情况下与其他方法进行对比分析.
表 2 不同λ下平均实时精度
Table 2. Average Real-Time Accuracy Under Different λ
数据集 平均实时精度(排名):λ=0.2 λ=0.4 λ=0.6 λ=0.8
Sea 0.8389 (1) 0.8378 (3) 0.8378 (3) 0.8378 (3)
Hyperplane 0.9109 (4) 0.9110 (2.5) 0.9111 (1) 0.9110 (2.5)
RBFBlips 0.9549 (2.5) 0.9549 (2.5) 0.9549 (2.5) 0.9549 (2.5)
LED_abrupt 0.6228 (3) 0.6229 (2) 0.6229 (2) 0.6229 (2)
LED_gradual 0.6205 (3) 0.6226 (1) 0.6199 (4) 0.6213 (2)
Tree 0.6671 (1) 0.6669 (2) 0.6660 (3) 0.6656 (4)
Electricity 0.7205 (2) 0.7193 (3) 0.7211 (1) 0.7190 (4)
Kddcup99 0.9384 (4) 0.9449 (2) 0.9455 (1) 0.9448 (3)
Covertype 0.7520 (2) 0.7526 (1) 0.7517 (3.5) 0.7517 (3.5)
Weather 0.8969 (3.5) 0.8970 (1.5) 0.8970 (1.5) 0.8969 (3.5)
平均排名 2.6 2.05 2.25 3.0
注:黑体数字表示最高平均实时精度及其排名.
表3展示了不同方法在所有数据集上的平均实时精度及其综合排名. 由表3看出,在合成数据集上,TAEL的实时精度最好;在真实数据集上,TAEL的实时精度排名也都位于前列. TAEL在真实数据集上排名略低的原因可能在于数据集中概念漂移的出现较为密集,而TAEL利用数据块进行处理的方式可能会漏检,导致无法对基学习器进行及时地更新,从而使整个集成模型的性能下降. 在整体排名上TAEL的排名最高,说明了该方法能够提高集成模型的有效性,有较好处理不同类型概念漂移的能力.
表 3 不同方法在各数据集上的平均实时精度
Table 3. Average Real-Time Accuracy of Different Methods on Each Dataset
数据集 平均实时精度(排名):DWCDS DNN-2 DNN-4 DNN-8 DNN-16 HBP Highway Resnet TAEL
Sea 0.7499 (4) 0.7081 (9) 0.7155 (8) 0.7495 (5) 0.7441 (7) 0.7771 (2) 0.7684 (3) 0.7448 (6) 0.8378 (1)
Hyperplane 0.6812 (9) 0.8600 (5) 0.8578 (6) 0.8487 (7) 0.7227 (8) 0.8692 (3) 0.8841 (2) 0.8637 (4) 0.9110 (1)
RBFBlips 0.8214 (8) 0.8256 (7) 0.8716 (2) 0.8655 (3) 0.4718 (9) 0.8350 (5) 0.8482 (4) 0.8300 (6) 0.9549 (1)
LED_abrupt 0.3700 (8) 0.5868 (3) 0.5809 (4) 0.5311 (7) 0.2784 (9) 0.5692 (6) 0.5893 (2) 0.5796 (5) 0.6229 (1)
LED_gradual 0.3804 (8) 0.5773 (4) 0.5898 (2) 0.5350 (7) 0.3031 (9) 0.5650 (6) 0.5839 (3) 0.5700 (5) 0.6199 (1)
Tree 0.5558 (2) 0.1948 (6) 0.2057 (3) 0.1338 (8) 0.1141 (9) 0.1432 (7) 0.2036 (4) 0.1992 (5) 0.6669 (1)
Electricity 0.7346 (1) 0.6228 (6) 0.6231 (5) 0.5635 (8) 0.5154 (9) 0.5676 (7) 0.6317 (4) 0.6343 (3) 0.7193 (2)
Kddcup99 0.9829 (1) 0.8796 (3) 0.7186 (6) 0.4763 (8) 0.3017 (9) 0.7670 (4) 0.7537 (5) 0.6535 (7) 0.9449 (2)
Covertype 0.8486 (1) 0.5251 (9) 0.5739 (8) 0.6243 (6) 0.6269 (5) 0.6465 (3) 0.6354 (4) 0.6183 (7) 0.7526 (2)
Weather 0.9566 (1) 0.8478 (3) 0.8050 (6) 0.8057 (5) 0.8043 (7) 0.8139 (4) 0.7813 (9) 0.8034 (8) 0.8970 (2)
平均排名 4.30 5.50 5.00 6.40 8.10 4.70 4.00 5.60 1.40
注:黑体数字表示最高平均实时精度及其排名.
图4为TAEL和各个对比方法在所有数据集上的累积精度,表4为TAEL和各个对比方法的最终累积精度和综合排名. 由图4和表4可知,在标准数据集上TAEL的累积精度最高,在真实数据集上TAEL的累积精度也有较好的排名,分析其原因是该方法针对漂移类型对数据块逐一处理的策略能够使模型对不同类型的概念漂移做出及时响应,保持较高的精度.
表 4 不同方法在各数据集上的最终累积精度
Table 4. Final Cumulative Accuracy of Different Methods on Each Dataset
数据集 最终累积精度(排名):DWCDS DNN-2 DNN-4 DNN-8 DNN-16 HBP Highway Resnet TAEL
Sea 0.7500(8) 0.7495(9) 0.7543(7) 0.7861(4) 0.7820(5) 0.8083(2) 0.7977(3) 0.7803(6) 0.8370(1)
Hyperplane 0.6763(9) 0.8600(5) 0.8580(6) 0.8483(7) 0.7230(8) 0.8691(3) 0.8840(2) 0.8636(4) 0.9110(1)
RBFBlips 0.8231(8) 0.8345(7) 0.8828(2) 0.8708(3) 0.5379(9) 0.8476(5) 0.8586(4) 0.8374(6) 0.9481(1)
LED_abrupt 0.3681(8) 0.5869(3) 0.5803(4) 0.5305(7) 0.2786(9) 0.5693(6) 0.5893(2) 0.5796(5) 0.6229(1)
LED_gradual 0.3821(8) 0.5776(4) 0.5898(2) 0.5344(7) 0.3032(9) 0.5650(6) 0.5843(3) 0.5699(5) 0.6199(1)
Tree 0.5558(2) 0.4329(5) 0.4575(3) 0.3330(8) 0.3033(9) 0.3591(7) 0.4472(4) 0.4310(6) 0.6636(1)
Electricity 0.7404(1) 0.6434(6) 0.6450(4) 0.5840(8) 0.5735(9) 0.5969(7) 0.6447(5) 0.6502(3) 0.6674(2)
Kddcup99 0.9833(1) 0.9832(2) 0.9195(7) 0.7813(8) 0.6160(9) 0.9823(3) 0.9614(4) 0.9276(6) 0.9562(5)
Covertype 0.8481(1) 0.6983(9) 0.7336(8) 0.7676(6) 0.7685(5) 0.7919(2) 0.7823(3) 0.7709(4) 0.7463(7)
Weather 0.9571(1) 0.8872(3) 0.8743(6) 0.8754(5) 0.8664(8) 0.8824(4) 0.8362(9) 0.8708(7) 0.8933(2)
平均排名 4.70 5.30 4.90 6.30 8.00 4.50 3.90 5.20 2.20
注:黑体数字表示最高的最终累积精度及其排名.
本文使用非参数检验方法Friedman-Test[35]对TAEL与对比方法相比较的性能优势进行统计检验. 对于给定的K(K=9)种方法和N(N=10)个数据集,令 r_i^j 为第j个方法在第i个数据集上的秩,则第j个算法的秩和平均为
{R_j} = \frac{1}{N}\sum\limits_{i = 1}^N {r_i^j} . (21)
零假设H0假定所有方法的性能是相同的. 在此前提下,当N与 K 足够大时,Friedman统计值 {\tau _F} 服从第一自由度为 K - 1 、第二自由度为 (K - 1)(N - 1) 的F分布:
{\tau _F} = \frac{(N - 1){\tau _{\chi ^2}}}{N(K - 1) - {\tau _{\chi ^2}}} , (22)
{\tau _{\chi ^2}} = \frac{12N}{K(K + 1)}\left[\sum\limits_{j = 1}^K {R_j^2} - \frac{K{(K + 1)}^2}{4}\right] .
若计算得到的统计值大于某一显著性水平下F分布临界值,则拒绝零假设H0,表明各方法的秩和存在显著差异,即测试方法性能存在显著差异;反之则接受零假设H0,所有方法的性能没有明显差异.
在 \alpha = 0.05 的情况下F分布临界值 \tau _F^{0.05}(8,72) = 2.0698 ,经计算可得在不同性能指标下的Friedman统计值 {\tau _F} ,如表5所示. 从表5可以看出,平均实时精度和最终累积精度下的 {\tau _F} 统计值均大于临界值 \tau _F^{0.05}(8,72) ,拒绝零假设 {H_0} ,说明所有方法性能存在显著差异.
表 5 平均实时精度和最终累积精度下的 {\tau _F}
Table 5. {\tau _F} of Average Real-Time Accuracy and Final Cumulative Accuracy
评价指标 {\tau _F} \tau _F^{0.05}(8,72)
平均实时精度 7.2260 2.0698
最终累积精度 4.5747 2.0698
本文用Bonferroni-Dunn测试[36]计算了所有方法的显著性差异,用于比较2种方法之间是否存在显著差异. 若2种方法的秩和平均差值大于临界差,则这2种方法的性能存在显著差异:
CD = {q_\alpha }\sqrt {\frac{K(K + 1)}{6N}} , (23)
其中当 K = 9 , N = 10 时,可以查表得到 {q_{\alpha = 0.05}} = 2.724 ,经计算得到显著性水平 \alpha = 0.05 下的临界差 CD = 3.3362 . 不同方法在平均实时精度和最终累积精度上的统计分析结果如图5所示,在图中将没有显著性差异的方法使用黑线连接起来. 结果表明,在统计意义上,TAEL方法排名最好且具有明显的优势.
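下面给出式(21)~(23)统计检验量计算的一个简化示意实现(非原文代码;ranks为N×K的秩矩阵,每行对应一个数据集、每列对应一种方法):
```python
import numpy as np

def friedman_statistic(ranks):
    """按式(21)(22)由秩矩阵计算Friedman统计值tau_F."""
    N, K = ranks.shape
    R = ranks.mean(axis=0)                                     # 式(21): 各方法的秩和平均
    tau_chi2 = 12 * N / (K * (K + 1)) * (np.sum(R ** 2) - K * (K + 1) ** 2 / 4)
    return (N - 1) * tau_chi2 / (N * (K - 1) - tau_chi2)       # 式(22)

def bonferroni_dunn_cd(K, N, q_alpha=2.724):
    """式(23): Bonferroni-Dunn检验的临界差CD."""
    return q_alpha * np.sqrt(K * (K + 1) / (6.0 * N))

# 当K=9, N=10时, bonferroni_dunn_cd(9, 10) = 2.724*sqrt(90/60) ≈ 3.3362, 与文中结果一致.
```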
3.4.2 模型鲁棒性分析
为了衡量各个方法的算法稳定性,本节计算每个方法在各个数据集上的鲁棒性,图6展示了计算结果. 图6中每个小矩形的面积代表的是算法在某种数据集上的鲁棒性值的大小,每一列上展示的数值代表算法在所有数据集上的鲁棒性值总和,即该算法的整体鲁棒性. 由图6可知,在大多数情况下,TAEL的鲁棒性都能取得较好的排名,且整体鲁棒性最高,这说明该方法对数据的噪声和异常值具有更强的鲁棒性,能提高集成模型的整体泛化性能.
3.4.3 模型收敛性分析
为比较各个方法在概念漂移发生后的收敛性能,本节计算并分析了各个方法在5个合成数据集的概念漂移位点上的收敛速度. 在收敛位点的判定过程中,设定收敛判定阈值 \gamma = 0.02 ,参照位点个数 K = 20 . 表6展示的为各个方法在数据集上的已知漂移位点上计算得到的收敛速度. 由于个别方法在漂移位点处精度保持平稳波动,因此,对该位点的收敛速度不做统计,用“-”进行表示. 从表6可以看出,TAEL在多数情况下都具有较快的收敛速度,是因为该方法及时更新基学习器使其能尽快适应新的数据分布,集成有效性得到提高. 在整体排名中TAEL处于第一,说明该方法具有较快的收敛速度,收敛性能较好.
表 6 不同方法在各数据集上的收敛速度
Table 6. Recovery Speed Under Accuracy of Different Methods on Each Dataset
数据集 DWCDS DNN-2 DNN-4 DNN-8 DNN-16
Sea 0.67/0.25/0.45 0.97/2.63/0.70 1.24/1.15/2.18 2.36/1.55/2.78 2.60/1.66/0.20
RBFBlips 0.56/0.85/0.16 1.17/1.56/0.41 0.38/0.61/0.22 0.56/0.94/0.21 -/-/-
LED_abrupt 3.70 9.77 14.92 19.85 17.94
LED_gradual 2.20/1.95/- 10.70/7.43/6.01 10.91/7.89/3.58 14.24/11.48/3.70 14.83/9.61/4.55
Tree 9.29/4.69/4.70 -/23.81/0.87 2.62/15.22/7.69 4.44/0.88/0.88 0.89/0.88/0.88
平均排名 3.54 5.85 4.85 5.54 5.77
数据集 HBP Highway Resnet TAEL
Sea 0.77/0.51/1.82 2.76/0.49/1.81 0.57/0.56/0.73 0.45/1.97/1.50
RBFBlips 1.14/0.83/0.21 0.85/1.14/0.42 1.01/2.02/0.62 0.03/0.29/0.01
LED_abrupt 20.20 9.80 11.65 5.28
LED_gradual 13.12/10.86/7.42 9.66/6.68/5.37 12.80/7.06/5.47 8.25/5.34/3.71
Tree 3.56/0.87/1.75 -/17.56/1.75 -/21.49/1.75 3.85/3.92/3.85
平均排名 5.38 5.23 5.69 3.15
注:“-”表示对当前位点的收敛速度不进行统计;黑体数字表示最高收敛速度;LED_abrupt包含1个漂移位点,收敛速度只有1个;其他数据集包含3个漂移位点,对应3个收敛速度.
4. 结束语
针对概念漂移发生后,在线集成模型无法及时响应数据流的变化而导致泛化性能降低、收敛速度减慢的问题,本文提出一种面向不同类型概念漂移的两阶段自适应集成学习方法. 本文通过检测漂移跨度来确定漂移类型,并采用一种针对漂移类型进行自适应调整的两阶段样本处理机制. 在该机制中,一方面通过样本过滤策略过滤历史样本块中的非关键样本,使历史数据分布更接近当前最新数据分布,提高了基学习器的有效性;另一方面通过样本扩充策略为当前样本集补充合适数量的历史关键样本,解决了当前基学习器的欠拟合问题,同时缓解了扩充后的类别不平衡. 更新后的基学习器组成的集成模型的有效性得到了提高,对不同类型的概念漂移能做出更精准及时的响应. 在集成学习中,集成的多样性同样影响了集成模型的性能,在未来的工作中,将进一步研究针对不同漂移类型提升集成多样性的方法.
作者贡献声明:郭虎升负责思想提出、方法设计、论文写作及修改;张洋负责论文写作、代码实现、数据测试及论文修改;王文剑负责写作指导、修改审定.
-
表 1 处理器微架构设计空间探索的加速方法分类
Table 1 Category of Acceleration Methods for Processor Microarchitecture Design Space Exploration
类型 子类型 典型方法
负载选择 基于微架构相关特征的方法 文献[15, 27]
负载选择 基于微架构无关特征的方法 MinneSPEC[28]、文献[29–32]、BenchSubset[33]、CASH[34]
负载选择 基于微架构相关与无关特征的方法 文献[29, 35–37]、BenchPrime[38]
部分模拟 统计采样模拟 采样单线程[39-43]、采样多线程[44-48]、采样访存[49-51]
部分模拟 综合模拟 综合单线程[52-55]、综合多线程[56-58]、综合访存[59-62]
设计点选择 采样方法 基于参数敏感度的方法[15,63-67]、基于实验设计的方法[6,25,34,67]
设计点选择 迭代搜索方法 启发式方法[68-70]、组合优化方法[68,71-74]、统计推理方法[14,25-26,67,75]
模拟工具 软件模拟 SimpleScalar[76]、SESC[77]、gem5[78]
模拟工具 硬件模拟 FAST[79]、PROTOFLEX[80-81]、RAMP Gold[82]、HAsim[83]、FireSim[84]
模拟工具 敏捷开发 基于低级语言的平台[85-87]、基于高级语言的平台[50,88-89]
性能模型 特定负载预测模型 参数化模型[2,4,90]、核函数模型[13,68,91]、神经网络模型[3,92-93]、树模型[94-96]、集成学习模型[67,97-98]
性能模型 跨负载预测模型 基于负载特征[8,99-100]、基于硬件响应[9,23,101]、基于迁移学习[7,10-11]
性能模型 机械模型 分析模型[102-103]、区间模型[104-106]、图模型[107-109]、概率统计模型[110-112]、混合模型[113-115]
表 2 加速方法对比
Table 2 Comparison of Acceleration Methods
表 3 负载选择方法的对比
Table 3 Comparison of Workload Selecting Methods
方法类型 方法来源 使用微架构相关特征的方式 使用微架构无关特征的方式 聚类算法 负载选择比 误差/% 基于微架构相关
特征的方法文献[15] 参数显著性排名 ✘ 阈值聚类 7/12 - 文献[27] 执行时间向量 ✘ 层次聚类 6/11 5 基于微架构无关
特征的方法文献[28] ✘ 卡方检验 ✘ -/23 - 文献[29] ✘ 主成分分析 层次聚类 7/79 - 文献[30] ✘ 基本块向量 距离最大 60/20 000 - 文献[31] ✘ 主成分分析 k均值聚类 9/21 15 文献[32] ✘ 基本块向量+主成分分析 层次聚类 4/47 - 文献[33] ✘ 分组主成分分析 共识聚类 - - 文献[34] ✘ 独立成分分析 多种聚类 5/27 3 文献[120] ✘ 主成分分析/遗传算法 k质心聚类 50/118 5 文献[121−122] ✘ 凸壳体积、主成分分析 遗传算法 6/22 - 基于微架构相关与
无关特征的方法文献[29,35] 主成分分析 层次聚类 14/29 - 文献[36] 主成分分析 层次聚类 10/23 - 文献[37] 主成分分析 层次聚类 12/43 7 文献[123] 多元因素分析 层次聚类 10/23 - 文献[38] 主成分分析+线性判别 多种聚类 20/54 - 注:“负载选择比”列中的“/”表示选择的负载数量和全部负载数量之比,“-”表示文献中无数据. “✘”表示无该项. 表 4 常用基准套件汇总
Table 4 Summary of Common Benchmark Suites
类型 工作负载 简称
多媒体和通信 MediaBench[124] MediaBench
嵌入式 MiBench[125] MiBench
单线程 SPEC CPU 2000[126] SPEC2k
单线程 SPEC CPU 2006[127] SPEC2k6
单/多线程 SPEC CPU 2017[18] SPEC2k17
多线程 Princeton Application Repository for Shared-Memory Computers[128] PARSEC
多线程 Stanford Parallel Applications for Shared Memory[129] SPLASH
表 5 微架构相关特征
Table 5 Microarchitecture-Dependent Features
类型 特征
整体聚合 执行时间、CPI、功率
控制流 分支预测MPKI、BTB命中率
cache行为(Icache/Dcache/L2/L3) 访问数量、命中数量、MPKI
TLB行为(ITLB/DTLB/L2TLB) 访问数量、命中数量、MPKI
注:MPKI表示每千条指令缺失.
表 6 微架构无关特征
Table 6 Microarchitecture-Independent Features
类型 子类型 特征 指令流 指令混合 整型、浮点、SIMD等 控制分支 存储读/写 寄存器通信 平均操作数数量 平均使用次数 重用距离 指令级并行性 不同窗口大小的并行度 基本块大小 指令局部性 指令工作集大小 时间、空间重用距离 数据流 数据局部性 数据工作集大小 时间、空间重用距离 通信特征 私有数据读写次数 生产者写/消费者读次数 注:SIMD表示单指令多数据流. 表 7 部分模拟加速方法的对比
Table 7 Comparison of Partial Simulation Acceleration Methods
类型 目标 子类型 方法来源 指令流 数据流 微架构相关特征 加速比 误差/% 统计采样模拟 采样单线程 随机采样 文献[134] ✘ ✘ ✘ - 7~17 均匀采样 文献[39] ✘ ✘ ✘ 35~60 0.6 文献[40] ✘ ✘ ✘ ~4 000 3.5 代表性采样 文献[41−42,135−136] ✔ ✘ ✘ 62~107 3.7 文献[43] ✔ ✘ ✘ 1~1.4 0.5 文献[137] ✔ ✘ IPC, cache ~100 3 文献[138] ✔ ✘ IPC, cache - 2~8 采样多线程 基于时间 文献[44] ✔ ✘ IPC 10 5 文献[139] ✔ ✘ IPC 5.8 3.5 文献[45] ✔ ✘ IPC 20 5.3 基于负载和特定同步 文献[47] ✔ ✘ ✘ 25 0.9 文献[48] ✔ ✘ ✘ 220 0.5 基于循环迭代 文献[46] ✔ ✘ ✘ 801 2.3 采样访存 基于检查点 文献[49] ✘ ✔ cache, BP 8 000~15 000 ~0.6 文献[51] ✘ ✔ cache, BP 50~100 ~0.6 文献[50,140−141] ✘ ✔ cache, BP - - 基于预热 文献[142] ✔ ✔ cache, BP 8 000~15 000 ~0.6 文献[143] ✔ ✔ cache, BP ~100 1.5 文献[144] ✔ ✔ cache, BP ~70 0.3 文献[145−147] ✔ ✔ cache, BP - - 综合模拟 综合单线程 文献[54] ✔ ✘ cache, BP - 5~7 文献[148] ✔ ✘ cache, BP - 4.1 文献[149] ✔ ✘ cache, BP - - 文献[52−53] ✔ ✘ cache, BP - 8 文献[150] ✔ ✘ cache, BP ~1 000 6.6 文献[55,151] ✔ ✔ cache, BP ~1 000 2.4 文献[116] ✔ ✔ cache, BP 520 5.1 文献[152] ✔ ✔ cache, BP - 3.2 文献[153−155] ✔ ✔ ✘ - - 综合多线程 文献[58] ✔ ✔ ✘ 9~385 3.8~9.8 文献[156] ✔ ✔ ✘ 1 000~10 000 4.9 文献[56−57] ✔ ✔ cache, BP 40~70 5.5 文献[157] ✔ ✔ cache, BP 21 8 综合访存 文献[59] ✘ ✔ ✘ - 0.4~3.1 文献[60−61] ✘ ✔ ✘ - - 文献[158−160] ✔ ✔ ✘ 31 4.8 文献[161] ✔ ✔ ✘ 20 2.8 文献[62] ✔ ✔ ✘ 20~50 4.2 文献[162] ✔ ✔ ✘ - 9 注:“-”表示文献无该数据. “✔”表示有使用该类数据,“✘”表示没有使用该类数据. 表 8 实验设计的对比
Table 8 Comparison of Design of Experiments
表 9 迭代搜索加速方法的对比
Table 9 Comparison of Iterative Searching Acceleration Methods
类型 子类型 方法来源 代理模型 搜索/获取函数 硬件设计空间 启发式 文献[174] - 参数聚类、贪心 单核片上系统 文献[64] - 敏感度、贪心 cache微架构 文献[66] - 敏感度、贪心 FPGA软核 文献[175] - 二进制搜索树 VLIW 文献[176] - 贪心、单目标化 CMP 组合优化 遗传算法 文献[71] - GA 单核片上系统 文献[72] 2层次模拟 局部搜索+GA 单核CPU 文献[73,177] 模糊系统 GA VLIW 文献[171] 多项式回归 GA 单核CPU 文献[117] - 爬山/GA/蚁群 CMP 文献[74] ANN预测级别 NSGA-II CMP 文献[69] ANN NSGA-II CMP 文献[178] ANN NSGA-II VLIW 文献[68] ACOSSO NSGA-II CMP 模拟退火 文献[178] ANN预测级别 模拟退火 VLIW 文献[179] 多种模型之一[25] 多种搜索算法 CMP 统计推理 不确定度 文献[67,97] AdaBoost.ANN CoV 单核CPU 文献[172−173] XGBoost 距离的最小值 单核CPU 预期改善 文献[75,180] 克里金模型预测级别 EI(+GA) CMP 文献[34] 随机深林 EI CMP 超体积改善 文献[13] ACOSSO EHVI CMP 文献[14] 高斯过程 EHVI 单核CPU 文献[181] AdaGBRT HVI+均匀性 单核CPU 文献[182] BagGBRT HVI+UCB 单核CPU 帕累托 文献[25] 多种模型之一 候选帕累托最优解集 CMP 文献[183−184] 马尔可夫决策 帕累托覆盖 CMP 文献[26,168] 马尔可夫网预测分布 帕累托最优解集 CMP 注:“-”表示该方法只以软件模拟或基于RTL的电路评估的方式获取性能指标,其余方法可通过训练代理模型替代软件模拟来获取指标或指标之间的关系. 表 10 模拟工具的对比
Table 10 Comparison of Simulation Tools
类型 准确率 模拟速度 灵活性 开发难度
软件模拟 低 中(~10 MHz) 高 低
硬件模拟 中 快(~100 MHz) 低 中
敏捷设计 高 慢(1~5 kHz) 中 高
表 11 硬件模拟平台的对比
Table 11 Comparison of Hardware Simulation Platforms
表 12 敏捷开发平台的对比
Table 12 Comparison of Agile Development Platforms
语言类型 平台 设计语言 指令集 年份
低级语言 OpenPiton[85] Verilog HDL SPARCv9 2016
低级语言 LiveHD[86] Verilog HDL RISC-V 2020
低级语言 BlackParrot[87] SystemVerilog RISC-V 2020
高级语言 CMD[88] BlueSpec RISC-V 2018
高级语言 Agile[197] Chisel RISC-V 2016
高级语言 Chipyard[89] Chisel RISC-V 2020
高级语言 MINJIE[50] Chisel RISC-V 2022
语言模型 llvm-mca[198] - - 2018
语言模型 Ithemal[199] - CISC 2019
语言模型 Chip-Chat[200] 自然语言 - 2023
语言模型 ChipGPT[201] 自然语言 RISC 2023
语言模型 RTLLM[202] 自然语言 RISC 2023
注:“-”表示文献无该项.
表 13 性能模型的对比
Table 13 Comparison of Performance Models
类型 准确性 复杂度 可解释性
预测模型 低 低 低
机械模型 高 高 高
表 14 性能预测模型的对比
Table 14 Comparison of Performance Prediction Models
类型 准确性 复杂度 可解释性
参数化 低 低 高
核函数 中 中 低
神经网络 中 高 低
树模型 中 中 高
集成学习 高 高 中
表 15 特定负载预测模型的对比
Table 15 Comparison of Workload-Specific Prediction Models
类型 预测模型 硬件设计空间 预测指标 负载 误差/% R2 采样/设计空间 参数化 线性回归[90] 单核 CPI MinnerSPEC 0.8 - 200/67×106 受限三次样条回归[2,4] 单核、异构核 CPI, E, P SPEC2k 4.9 - 4×103/22×109 三次样条回归模型[5] 单核、多核 T 18项负载 1.4 - 300/4.3×109 埃尔米特多项式插值[210] PHT, cache E SPEC2k, MediaBench - - 243/19×103 核函数 支持向量机[170] 单核 T, E SPEC2k 0.5 - 12/ 4608 内核典型相关分析[211] 多核 T, E ENePBench 6.2 0.88 450/2.8×106 ACOSSO[68] 单核、多核 T, E, P SPEC2k, SPLASH-2 - - 450/128×103 ACOSSO[13] 多核 T, E, P SPLASH-2 - - 100/332×103 高斯过程[91] 核数 T SPLASH-3, PARSEC-3 - 0.82 67/68 高斯过程[14] 单核 T, E, P 27项负载 - - 14/994 神经网络 径向基函数网络[92] 单核 CPI MinnerSPEC 2.8 - 200/512 小波神经网络[93] 单核 CPI, E, P SPEC2k - - 1024 /246×103神经网络[3,209,212-213] 单核、多核 CPI MinneSPEC等 2.3 - 221/23×103 神经网络+遗传算法[214] 单核 CPI SPEC2k 3.3 230/23×103 树模型 模型树[94] 性能计数器 CPI SPEC2k6 7.8 0.98 - 模型树[95] 单核 T, E 图像压缩负载 1.3 0.95 3211 /3288 决策树[138] 性能计数器 CPI SPEC2k6,SysMark07等 2 - - 决策树[96] 异构核 T, E SD-VBS, MiBench 2.1 - 664/830 集成学习 自适应提升+神经网络[67,97] 单核 CPI SPEC2k6 - - 264/8.4×106 梯度提升回归树[169] 单核、多核 T SPEC2k, SPLASH-2 1.1 - 3×103/15×106 XGBoost[172] 单核 E riscv-tests 3.4 0.99 1120 /1200 提升法+梯度提升回归树[181] 单核 CPI, E, P SPEC2k17 - - 100/2×103 装袋法+模型树[98] 单核 CPI, E SPEC2k - - 320/71×106 装袋法+梯度提升回归树[182] 单核 CPI, E, P SPEC2k17 - - 100/37×103 堆叠法+决策树[22] 单核、多核 T, E SPEC2k6,SPLASH-2 - - 100/605×103 堆叠法+异类模型[118] 单核 CPI, E SPEC2k 1.8 - 3×103/2.5×109 注:硬件设计空间中单核主要包括单核处理器微架构,多核指基于总线或片上网络的同构多核处理器. “T”指时间,“E”指功率,“P”指对多个性能指标探索帕累托最优解集,误差以CPI的百分比绝对误差衡量(越接近0越好),R2为相关系数(越接近1越好),“-”表示该工作无显式标注数据. 表 16 跨负载预测模型工作的对比
Table 16 Comparison of Cross-Workload Prediction Model Work
类型 来源 预测模型 跨负载方法核心 性能指标 设计点数 误差/% R2 负载特征 文献[99] 归一化、PCA+GA、线性回归 负载特征、平均相似负载的结果 时间 35×25+0 - - 文献[8] 多项式回归+遗传算法 负载特征 CPI 360×7+0 8~10 >0.90 文献[100] 模型树 负载特征 CPI、功率 500×25+0 - 0.90 文献[34] 多种模型之一 负载特征、最近邻归类模型 CPI、功率 3 000×10+0 - 0.98 硬件响应 文献[9] 神经网络 模型本身泛化 时间 639×27+50 - - 文献[23] 矩阵补全算法 模型本身泛化 CPI、功率 128×20+20 10.0 - 文献[101] 线性回归 响应边际关系、最近邻归类 CPI 60×23+600 6.3 0.92 文献[215] 神经网络 响应签名、模型本身泛化 时间、EDP 1000 ×8+04.2 - 迁移学习 文献[11] 神经网络 线性回归 CPI、功率 512×5+32 7.0 0.95 文献[10,216] 神经网络 贪心选择负载、线性回归 CPI、功率 512×5+32 3.0 - 文献[7] 模型树+自适应提升 负载聚类、样本迁移TrAdaBoost CPI 10×5+10 7.0 0.91 文献[6] 神经网络+自适应提升 支持向量机 CPI 128×3+40 5.5 0.93 注:“-”表示文献无该数据. “设计点数”列中的表达形式为源样本数量×源负载数量+目标样本数量. 表 17 跨负载预测模型的对比
Table 17 Comparison of Cross-Workload Prediction Models
类型 核心 准确性 复杂度
负载特征 特征空间的相似性 低 低
硬件响应 硬件响应作为新维度 中 中
迁移学习 元模型的知识迁移 高 高
表 18 机械模型工作的对比
Table 18 Comparison of Mechanism Model Work
类型 来源 目标架构 组件 预测指标 仅微架构无关特征 误差/% 速率/(MIPS/核) 分析模型 文献[219] cache cache cache缺失 ✘ - - 文献[102] 乱序、单核 指令窗口、BP、cache IPC ✘ 5.5 100 文献[103] cache、多核 cache IPC ✘ 1.57 - 文献[220] cache cache 功率、面积 ✘ 5 - 文献[221] 按/乱序、多核 BP、cache、NoC等 功率、面积 ✘ 11~23 - 区间模型 文献[222] 乱序、单核 BP, cache IPC ✘ 5.8 - 文献[104] 乱序、单核 BP, cache IPC ✘ 7 - 文献[223] 乱序、多核 BP, cache IPC ✘ 4.6 ~1 文献[105,224] 按序、单核 指令依赖、BP, cache CPI、功率 ✘ 2.5 ~6 文献[225] 乱序、单核 BP, cache IPC、功率 ✔ 9.3 1.9 文献[226] 乱序、多核 BP, cache CPI ✔ 11.2 - 文献[227] 乱序、单核 SIMD、cache、带宽 CPI、功率 ✔ 25 - 文献[228] 乱序、多核 SIMD、cache、带宽 时间、功率 ✔ 36 - 图模型 文献[107] 乱序、单核 BP, cache CPI ✘ - - 文献[108] 乱序、单核 BP, cache CPI ✘ - - 文献[109] 乱序、多核 BP, cache, NoC IPC ✘ 7.2 ~12 概率统计模型 文献[110] 乱序、单核 BP, cache IPC ✘ 2~10 - 文献[111] 乱序、多核 BP, cache IPC ✘ 7.9 ~9 文献[112] cache cache cache缺失 ✘ 0.2 - 混合模型 文献[113] 乱序、单核 流水线深度 IPC ✘ - - 文献[114] 乱序、单核 cache、MSHR、预取 CPI、cache缺失 ✘ 9.4 ~15 文献[115] 乱序、单核 执行单元 CPI ✘ 5.6 15.1 注:“-”表示文献无该数据. 表 19 机械模型的对比
Table 19 Comparison of Mechanism Models
类型 核心思想 准确性 复杂度
分析模型 数学公式 低 低
区间模型 事件分隔的区间 中 高
图模型 依赖图的关键路径 中 中
概率统计模型 事件发生的概率 中 中
混合模型 分析模型+预测模型 中 低
-
[1] Azizi O, Mahesri A, Lee B C, et al. Energy-performance tradeoffs in processor architecture and circuit design: A marginal cost analysis[C]//Proc of the 27th Annual Int Symp on Computer Architecture. New York: ACM, 2010: 26–36
[2] Lee B C, Brooks D M. Illustrative design space studies with microarchitectural regression models[C]//Proc of the 13th Int Conf on High-Performance Computer Architecture. Los Alamitos, CA: IEEE Computer Society, 2007: 340–351
[3] Ipek E, McKee S A, Caruana R, et al. Efficiently exploring architectural design spaces via predictive modeling[C]//Proc of the 12th Int Conf on Architectural Support for Programming Languages and Operating Systems. New York: ACM, 2006: 195–206
[4] Lee B C, Brooks D M. Accurate and efficient regression modeling for microarchitectural performance and power prediction[C]//Proc of the 12th Int Conf on Architectural Support for Programming Languages and Operating Systems. New York: ACM, 2006: 185–194
[5] Lee B C, Collins J D, Wang Hong, et al. CPR: Composable performance regression for scalable multiprocessor models[C]//Proc of the 41st Annual IEEE/ACM Int Symp on Microarchitecture. Piscataway, NJ: IEEE, 2008: 270–281
[6] Li Dandan, Yao Shuzhen, Wang Senzhang, et al. Cross-program design space exploration by ensemble transfer learning[C]//Proc of the 36th IEEE/ACM Int Conf on Computer-Aided Design. Piscataway, NJ: IEEE, 2017: 201–208
[7] Li Dandan, Wang Senzhang, Yao Shuzhen, et al. Efficient design space exploration by knowledge transfer[C]//Proc of the 11th IEEE/ACM/IFIP Int Conf on Hardware/Software Codesign and System Synthesis. New York: ACM, 2016: 12: 1−12: 10
[8] Wu Weidan, Lee B C. Inferred models for dynamic and sparse hardware-software spaces[C]//Proc of the 45th Annual IEEE/ACM Int Symp on Microarchitecture. Los Alamitos, CA: IEEE Computer Society, 2012: 413–424
[9] Wang Yu, Lee V, Wei G Y, et al. Predicting new workload or CPU performance by analyzing public datasets[J]. ACM Transactions on Architecture and Code Optimization, 2019, 15(4): 53: 1−53: 21
[10] Dubach C, Jones T M, O’Boyle M F P. An empirical architecture-centric approach to microarchitectural design space exploration[J]. IEEE Transactions on Computers, 2011, 60(10): 1445−1458 doi: 10.1109/TC.2010.280
[11] Dubach C, Jones T M, O’Boyle M F P. Microarchitectural design space exploration using an architecture-centric approach[C]//Proc of the 40th Annual IEEE/ACM Int Symp on Microarchitecture. Los Alamitos, CA: IEEE Computer Society, 2007: 262–271
[12] Eeckhout L, De Bosschere K. Speeding up architectural simulations for high-performance processors[J]. Simulation, 2004, 80(9): 451−468 doi: 10.1177/0037549704044326
[13] Wang Hongwei, Shi Jinglin, Zhu Ziyuan. An expected hypervolume improvement algorithm for architectural exploration of embedded processors[C]//Proc of the 53rd Annual Design Automation Conf. New York: ACM, 2016: 161: 1−161: 6
[14] Bai Chen, Sun Qi, Zhai Jianwang, et al. BOOM-Explorer: RISC-V BOOM microarchitecture design space exploration framework[C/OL]//Proc of the 40th IEEE/ACM Int Conf on Computer Aided Design. Piscataway, NJ: IEEE, 2021[2023-12-17]. https://ieeexplore.ieee.org/document/9643455
[15] Yi J J, Lilja D J, Hawkins D M. A statistically rigorous approach for improving simulation methodology[C]//Proc of the 9th Int Symp on High-Performance Computer Architecture. Los Alamitos, CA: IEEE Computer Society, 2003: 281–291
[16] Monchiero M, Canal R, González A. Power/performance/thermal design-space exploration for multicore architectures[J]. IEEE Transactions on Parallel and Distributed Systems, 2008, 19(5): 666−681 doi: 10.1109/TPDS.2007.70756
[17] 包云岗,常轶松,韩银和,等. 处理器芯片敏捷设计方法:问题与挑战[J]. 计算机研究与发展,2021,58(6):1131−1145 doi: 10.7544/issn1000-1239.2021.20210232 Bao Yungang, Chang Yisong, Han Yinhe, et al. Agile design of processor chips: Issues and challenges[J]. Journal of Computer Research and Development, 2021, 58(6): 1131−1145 (in Chinese) doi: 10.7544/issn1000-1239.2021.20210232
[18] Standard Performance Evaluation Corporation. SPEC CPU2017[EB/OL]. (2012-12-06)[2023-12-01]. https://www.spec.org/cpu2017
[19] Yi J J, Lilja D J. Simulation of computer architectures: Simulators, benchmarks, methodologies, and recommendations[J]. IEEE Transactions on Computers, 2006, 55(3): 268−280 doi: 10.1109/TC.2006.44
[20] Guo Qi, Chen Tianshi, Chen Yunji, et al. Accelerating architectural simulation via statistical techniques: A survey[J]. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 2016, 35(3): 433−446 doi: 10.1109/TCAD.2015.2481796
[21] O’Neal K, Brisk P. Predictive modeling for CPU, GPU, and FPGA performance and power consumption: A survey[C]//Proc of the 2018 IEEE Computer Society Annual Symp on VLSI. Los Alamitos, CA: IEEE Computer Society, 2018: 763–768
[22] Chen Tianshi, Guo Qi, Tang Ke, et al. ArchRanker: A ranking approach to design space exploration[C]//Proc of the 41st Int Symp on Computer Architecture. Los Alamitos, CA: IEEE Computer Society, 2014: 85–96
[23] Ding Yi, Mishra N, Hoffmann H. Generative and multi-phase learning for computer systems optimization[C]//Proc of the 46th Int Symp on Computer Architecture. New York: ACM, 2019: 39–52
[24] Panerati J, Beltrame G. A comparative evaluation of multi-objective exploration algorithms for high-level design[J]. ACM Transactions on Design Automation of Electronic Systems, 2014, 19(2): 15: 1–15: 22
[25] Palermo G, Silvano C, Zaccaria V. ReSPIR: A response surface-based pareto iterative refinement for application-specific design space exploration[J]. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 2009, 28(12): 1816−1829 doi: 10.1109/TCAD.2009.2028681
[26] Mariani G, Palermo G, Zaccaria V, et al. DeSpErate++: An enhanced design space exploration framework using predictive simulation scheduling[J]. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 2015, 34(2): 293−306 doi: 10.1109/TCAD.2014.2379634
[27] Cammarota R, Beni L A, Nicolau A, et al. Effective evaluation of multi-core based systems[C]//Proc of the 12th Int Symp on Parallel and Distributed Computing. Piscataway, NJ: IEEE, 2013: 19–25
[28] KleinOsowski A J, Lilja D J. MinneSPEC: A new spec benchmark workload for simulation-based computer architecture research[J]. IEEE Computer Architecture Letters, 2002, 1(1): 7−10 doi: 10.1109/L-CA.2002.8
[29] Eeckhout L, Vandierendonck H, De Bosschere K. Workload design: Selecting representative program-input pairs[C]//Proc of the 11th Int Conf on Parallel Architectures and Compilation Techniques. Los Alamitos, CA: IEEE Computer Society, 2002: 83–94
[30] Breughe M, Eeckhout L. Selecting representative benchmark inputs for exploring microprocessor design spaces[J]. ACM Transactions on Architecture and Code Optimization, 2013, 10(4): 37: 1−37: 24
[31] Joshi A, Phansalkar A, Eeckhout L, et al. Measuring benchmark similarity using inherent program characteristics[J]. IEEE Transactions on Computers, 2006, 55(6): 769−782 doi: 10.1109/TC.2006.85
[32] Vandeputte F, Eeckhout L. Phase complexity surfaces: Characterizing time-varying program behavior[C]//Proc of the 3rd High Performance Embedded Architectures and Compilers. Berlin: Springer, 2008: 320–334
[33] Zhan Hongping, Lin Weiwei, Mao Feiqiao, et al. BenchSubset: A framework for selecting benchmark subsets based on consensus clustering[J]. International Journal of Intelligent Systems, 2022, 37(8): 5248−5271 doi: 10.1002/int.22791
[34] Sheidaeian H, Fatemi O. Toward a general framework for jointly processor-workload empirical modeling[J]. The Journal of Supercomputing, 2021, 77(6): 5319−5353 doi: 10.1007/s11227-020-03475-9
[35] Phansalkar A, Joshi A, John L K. Analysis of redundancy and application balance in the SPEC CPU2006 benchmark suite[C]//Proc of the 34th Int Symp on Computer Architecture. New York: ACM, 2007: 412–423
[36] Limaye A, Adegbija T. A workload characterization of the SPEC CPU2017 benchmark suite[C]//Proc of the 2018 IEEE Int Symp on Performance Analysis of Systems and Software. Los Alamitos, CA: IEEE Computer Society, 2018: 149–158
[37] Panda R, Song Shuang, Dean J, et al. Wait of a decade: Did SPEC CPU 2017 broaden the performance horizon[C]//Proc of the 23rd IEEE Int Symp on High Performance Computer Architecture. Los Alamitos, CA: IEEE Computer Society, 2018: 271–282
[38] Liu Qingrui, Wu Xiaolong, Kittinger L, et al. BenchPrime: Effective building of a hybrid benchmark suite[J]. ACM Transactions in Embedded Computing Systems, 2017, 16(5): 179: 1−179: 22
[39] Wunderlich R E, Wenisch T F, Falsafi B, et al. SMARTS: Accelerating microarchitecture simulation via rigorous statistical sampling[C]//Proc of the 30th Annual Int Symp on Computer Architecture. Los Alamitos, CA: IEEE Computer Society, 2003: 84–95
[40] Hassani S, Southern G, Renau J. LiveSim: Going live with microarchitecture simulation[C]//Proc of the 22nd IEEE Int Symp on High Performance Computer Architecture. Los Alamitos, CA: IEEE Computer Society, 2016: 606–617
[41] Hamerly G, Perelman E, Lau J, et al. SimPoint 3.0: Faster and more flexible program phase analysis[J/OL]. Journal of Instruction-Level Parallelism, 2005[2023-12-18]. http://www.jilp.org/vol7/v7paper14.pdf
[42] Sherwood T, Perelman E, Hamerly G, et al. Discovering and exploiting program phases[J]. IEEE Micro, 2003, 23(6): 84−93 doi: 10.1109/MM.2003.1261391
[43] Shen Xipeng, Zhong Yutao, Ding Chen. Locality phase prediction[C]//Proc of the 11th Int Conf on Architectural Support for Programming Languages and Operating Systems. New York: ACM, 2004: 165–176
[44] Ardestani E K, Renau J. ESESC: A fast multicore simulator using time-based sampling[C]//Proc of the 19th Int Symp on High Performance Computer Architecture. Los Alamitos, CA: IEEE Computer Society, 2013: 448–459
[45] Jiang Chuntao, Yu Zhibin, Jin Hai, et al. PCantorSim: Accelerating parallel architecture simulation through fractal-based sampling[J]. ACM Transactions on Architecture and Code Optimization, 2013, 10(4): 49: 1–49: 24
[46] Sabu A, Patil H, Heirman W, et al. LoopPoint: Checkpoint-driven sampled simulation for multi-threaded applications[C]//Proc of the 28th Int Symp on High-Performance Computer Architecture. Piscataway, NJ: IEEE, 2022: 604–618
[47] Carlson T E, Heirman W, Van Craeynest K, et al. BarrierPoint: Sampled simulation of multi-threaded applications[C]//Proc of the 2014 IEEE Int Symp on Performance Analysis of Systems and Software. Los Alamitos, CA: IEEE Computer Society, 2014: 2–12
[48] Grass T, Carlson T E, Rico A, et al. Sampled simulation of task-based programs[J]. IEEE Transactions on Computers, 2019, 68(2): 255−269 doi: 10.1109/TC.2018.2860012
[49] Wenisch T F, Wunderlich R E, Ferdman M, et al. SimFlex: Statistical sampling of computer system simulation[J]. IEEE Micro, 2006, 26(4): 18−31 doi: 10.1109/MM.2006.79
[50] Xu Yinan, Yu Zihao, Tang Dan, et al. Towards developing high performance RISC-V processors using agile methodology[C]//Proc of the 55th IEEE/ACM Int Symp on Microarchitecture. Piscataway, NJ: IEEE, 2022: 1178–1199
[51] Bryan P D, Rosier M C, Conte T M. Reverse state reconstruction for sampled microarchitectural simulation[C]//Proc of the 2007 IEEE Int Symp on Performance Analysis of Systems & Software. Los Alamitos, CA: IEEE Computer Society, 2007: 190–199
[52] Nussbaum S, Smith J E. Modeling superscalar processors via statistical simulation[C]//Proc of the 10th Int Conf on Parallel Architectures and Compilation Techniques. Los Alamitos, CA: IEEE Computer Society, 2001: 15–24
[53] Eeckhout L, Nussbaum S, Smith J E, et al. Statistical simulation: Adding efficiency to the computer designer’s toolbox[J]. IEEE Micro, 2003, 23(5): 26−38 doi: 10.1109/MM.2003.1240210
[54] Oskin M, Chong F T, Farrens M. HLS: Combining statistical and symbolic simulation to guide microprocessor designs[C]//Proc of the 27th Int Symp on Computer Architecture. Los Alamitos, CA: IEEE Computer Society, 2000: 71–82
[55] Bell R H, John L K. Improved automatic testcase synthesis for performance model validation[C]//Proc of the 19th Annual Int Conf on Supercomputing. New York: ACM, 2000: 111–120
[56] Genbrugge D, Eeckhout L. Statistical simulation of chip multiprocessors running multi-program workloads[C]//Proc of the 25th Int Conf on Computer Design. Piscataway, NJ: IEEE, 2007: 464–471
[57] Genbrugge D, Eeckhout L. Chip multiprocessor design space exploration through statistical simulation[J]. IEEE Transactions on Computers, 2009, 58(12): 1668−1681 doi: 10.1109/TC.2009.77
[58] Hughes C, Li T. Accelerating multi-core processor design space evaluation using automatic multi-threaded workload synthesis[C]//Proc of the 4th Int Symp on Workload Characterization. Los Alamitos, CA: IEEE Computer Society, 2008: 163–172
[59] Balakrishnan G, Solihin Y. WEST: Cloning data cache behavior using stochastic traces[C]//Proc of the 18th IEEE Int Symp on High-Performance Comp Architecture. Los Alamitos, CA: IEEE Computer Society, 2012: 1–12
[60] Awad A, Solihin Y. STM: Cloning the spatial and temporal memory access behavior[C]//Proc of the 20th IEEE Int Symp on High Performance Computer Architecture. Los Alamitos, CA: IEEE Computer Society, 2014: 237–247
[61] Wang Yipeng, Awad A, Solihin Y. Clone morphing: Creating new workload behavior from existing applications[C]//Proc of the 2017 IEEE Int Symp on Performance Analysis of Systems and Software. Los Alamitos, CA: IEEE Computer Society, 2017: 97–108
[62] Wang Yipeng, Balakrishnan G, Solihin Y. MeToo: Stochastic modeling of memory traffic timing behavior[C]//Proc of the 24th Int Conf on Parallel Architecture and Compilation. Los Alamitos, CA: IEEE Computer Society, 2015: 457–467
[63] Hekstra G J, La Hei G D, Bingley P, et al. TriMedia CPU64 design space exploration[C]//Proc of the 17th IEEE Int Conf on Computer Design: VLSI in Computers and Processors. Los Alamitos, CA: IEEE Computer Society, 1999: 599–606
[64] Fornaciari W, Sciuto D, Silvano C, et al. A design framework to efficiently explore energy-delay tradeoffs[C]//Proc of the 9th Int Symp on Hardware/Software Codesign. New York: ACM, 2001: 260–265
[65] Fornaciari W, Sciuto D, Silvano C, et al. A sensitivity-based design space exploration methodology for embedded systems[J]. Design Automation for Embedded Systems, 2002, 7(1): 7−33
[66] Sheldon D, Kumar R, Lysecky R, et al. Application-specific customization of parameterized FPGA soft-core processors[C]//Proc of the 25th IEEE/ACM Int Conf on Computer-Aided Design. New York: ACM, 2006: 261–268
[67] Li Dandan, Yao Shuzhen, Liu Yuhang, et al. Efficient design space exploration via statistical sampling and AdaBoost learning[C]//Proc of the 53rd Annual Design Automation Conf. New York: ACM, 2016: 142: 1−142: 6
[68] Wang Hongwei, Zhu Ziyuan, Shi Jinglin, et al. An accurate acosso metamodeling technique for processor architecture design space exploration[C]//Proc of the 20th Asia and South Pacific Design Automation Conf. Piscataway, NJ: IEEE, 2015: 689–694
[69] Mariani G, Palermo G, Zaccaria V, et al. Design-space exploration and runtime resource management for multicores[J]. ACM Transactions on Embedded Computing Systems, 2013, 13(2): 20: 1−20: 27
[70] Jahr R, Calborean H, Vintan L, et al. Boosting design space explorations with existing or automatically learned knowledge[C]//Proc of the 15th Measurement, Modelling, and Evaluation of Computing Systems and Dependability and Fault Tolerance. Berlin: Springer, 2012: 221–235
[71] Palesi M, Givargis T. Multi-objective design space exploration using genetic algorithms[C]//Proc of the 10th Int Symp on Hardware/Software Codesign. New York: ACM, 2002: 67–72
[72] Eyerman S, Eeckhout L, De Bosschere K. Efficient design space exploration of high performance embedded out-of-order processors[C]//Proc of the 9th Design, Automation & Test in Europe Conf and Exhibition. Piscataway, NJ: IEEE, 2006: 351−356
[73] Ascia G, Catania V, Di Nuovo A G, et al. Efficient design space exploration for application specific systems-on-a-chip[J]. Journal of Systems Architecture, 2007, 53(10): 733−750 doi: 10.1016/j.sysarc.2007.01.004
[74] Mariani G, Palermo G, Silvano C, et al. Multi-processor system-on-chip design space exploration based on multi-level modeling techniques[C]//Proc of the 9th Int Conf on Embedded Computer Systems: Architectures, Modeling and Simulation. Piscataway, NJ: IEEE, 2009: 118–124
[75] Mariani G, Palermo G, Zaccaria V, et al. OSCAR: An optimization methodology exploiting spatial correlation in multicore design spaces[J]. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 2012, 31(5): 740−753 doi: 10.1109/TCAD.2011.2177457
[76] Burger D, Austin T M. The SimpleScalar tool set, version 2.0[J]. ACM SIGARCH Computer Architecture News, 1997, 25(3): 13−25 doi: 10.1145/268806.268810
[77] Renau J, Fraguela B, Tuck J, et al. SESC simulator[EB/OL]. 2005[2023-12-01]. http://sesc.sourceforge.net
[78] Binkert N, Beckmann B, Black G, et al. The gem5 simulator[J]. ACM SIGARCH Computer Architecture News, 2011, 39(2): 1−7 doi: 10.1145/2024716.2024718
[79] Chiou D, Sunwoo D, Kim J, et al. FPGA-accelerated simulation technologies (FAST): Fast, full-system, cycle-accurate simulators[C]//Proc of the 40th Annual IEEE/ACM Int Symp on Microarchitecture. Los Alamitos, CA: IEEE Computer Society, 2007: 249–261
[80] Chung E S, Nurvitadhi E, Hoe J C, et al. A complexity-effective architecture for accelerating full-system multiprocessor simulations using FPGAs[C]//Proc of the 16th Int ACM/SIGDA Symp on Field Programmable Gate Arrays. New York: ACM, 2008: 77–86
[81] Chung E S, Papamichael M K, Nurvitadhi E, et al. ProtoFlex: Towards scalable, full-system multiprocessor simulations using FPGAs[J]. ACM Transactions on Reconfigurable Technology and Systems, 2009, 2(2): 15: 1–15: 32
[82] Tan Zhangxi, Waterman A, Avizienis R, et al. RAMP Gold: An FPGA-based architecture simulator for multiprocessors[C]//Proc of the 47th Design Automation Conf. New York: ACM, 2010: 463–468
[83] Pellauer M, Adler M, Kinsy M, et al. HAsim: FPGA-based high-detail multicore simulation using time-division multiplexing[C]//Proc of the 17th Int Symp on High Performance Computer Architecture. Los Alamitos, CA: IEEE Computer Society, 2011: 406–417
[84] Karandikar S, Mao H, Kim D, et al. FireSim: FPGA-accelerated cycle-exact scale-out system simulation in the public cloud[C]//Proc of the 45th Annual Int Symp on Computer Architecture. Los Alamitos, CA: IEEE Computer Society, 2018: 29–42
[85] Balkind J, McKeown M, Fu Yaosheng, et al. OpenPiton: An open source manycore research framework[C]//Proc of the 21st Int Conf on Architectural Support for Programming Languages and Operating Systems. New York: ACM, 2016: 217–232
[86] Wang Shenghong, Possignolo R T, Skinner H B, et al. LiveHD: A productive live hardware development flow[J]. IEEE Micro, 2020, 40(4): 67−75 doi: 10.1109/MM.2020.2996508
[87] Petrisko D, Gilani F, Wyse M, et al. BlackParrot: An agile open-source RISC-V multicore for accelerator socs[J]. IEEE Micro, 2020, 40(4): 93−102 doi: 10.1109/MM.2020.2996145
[88] Zhang Sizhuo, Wright A, Bourgeat T, et al. Composable building blocks to open up processor design[C]//Proc of the 51st Annual IEEE/ACM Int Symp on Microarchitecture. Los Alamitos, CA: IEEE Computer Society, 2018: 68–81
[89] Amid A, Biancolin D, Gonzalez A, et al. Chipyard: Integrated design, simulation, and implementation framework for custom socs[J]. IEEE Micro, 2020, 40(4): 10−21 doi: 10.1109/MM.2020.2996616
[90] Joseph P J, Vaswani K, Thazhuthaveetil M J. Construction and use of linear regression models for processor performance analysis[C]//Proc of the 12th Int Symp on High-Performance Computer Architecture. Los Alamitos, CA: IEEE Computer Society, 2006: 99–108
[91] Agarwal N, Jain T, Zahran M. Performance prediction for multi-threaded applications[C]//Proc of the 2nd Int Workshop on AI-assisted Design for Architecture. New York: ACM, 2019: 71−76
[92] Joseph P J, Vaswani K, Thazhuthaveetil M J. A predictive performance model for superscalar processors[C]//Proc of the 39th Annual IEEE/ACM Int Symp on Microarchitecture. Los Alamitos, CA: IEEE Computer Society, 2006: 161–170
[93] Cho C B, Zhang Wangyuan, Li Tao. Informed microarchitecture design space exploration using workload dynamics[C]//Proc of the 40th Annual IEEE/ACM Int Symp on Microarchitecture. Los Alamitos, CA: IEEE Computer Society, 2007: 274–285
[94] Ould-Ahmed-Vall E, Woodlee J, Yount C, et al. Using model trees for computer architecture performance analysis of software applications[C]//Proc of the 2007 IEEE Int Symp on Performance Analysis of Systems and Software. Los Alamitos, CA: IEEE Computer Society, 2007: 116–125
[95] Powell A, Savvas-Bouganis C, Cheung P Y K. High-level power and performance estimation of FPGA-based soft processors and its application to design space exploration[J]. Journal of Systems Architecture, 2013, 59(10): 1144−1156 doi: 10.1016/j.sysarc.2013.08.003
[96] Mankodi A, Bhatt A, Chaudhury B. Predicting physical computer systems performance and power from simulation systems using machine learning model[J]. Computing, 2022, 105(5): 1−19
[97] Li Dandan, Yao Shuzhen, Wang Ying. Processor design space exploration via statistical sampling and semi-supervised ensemble learning[J]. IEEE Access, 2018, 6: 25495−25505 doi: 10.1109/ACCESS.2018.2831079
[98] Guo Qi, Chen Tianshi, Chen Yunji, et al. Effective and efficient microprocessor design space exploration using unlabeled design configurations[C]//Proc of the 22nd Int Joint Conf on Artificial Intelligence. Palo Alto, CA: AAAI, 2011: 1671–1677
[99] Hoste K, Phansalkar A, Eeckhout L, et al. Performance prediction based on inherent program similarity[C]//Proc of the 15th Int Conf on Parallel Architectures and Compilation Techniques. New York: ACM, 2006: 114–122
[100] Guo Qi, Chen Tianshi, Chen Yunji, et al. Microarchitectural design space exploration made fast[J]. Microprocessors and Microsystems, 2013, 37(1): 41−51 doi: 10.1016/j.micpro.2012.07.006
[101] Ahmadinejad H, Fatemi O. Moving towards grey-box predictive models at micro-architecture level by investigating inherent program characteristics[J]. IET Computers Digital Techniques, 2018, 12(2): 53−61 doi: 10.1049/iet-cdt.2016.0148
[102] Taha T M, Wills S. An instruction throughput model of superscalar processors[J]. IEEE Transactions on Computers, IEEE, 2008, 57(3): 389−403 doi: 10.1109/TC.2007.70817
[103] Xu Chi, Chen Xi, Dick R P, et al. Cache contention and application performance prediction for multi-core systems[C]//Proc of the 2010 IEEE Int Symp on Performance Analysis of Systems & Software. Los Alamitos, CA: IEEE Computer Society, 2010: 76–86
[104] Eyerman S, Eeckhout L, Karkhanis T, et al. A mechanistic performance model for superscalar out-of-order processors[J]. ACM Transactions on Computer Systems, 2009, 27(2): 3: 1–3: 37
[105] Breughe M B, Eyerman S, Eeckhout L. Mechanistic analytical modeling of superscalar in-order processor performance[J]. ACM Transactions on Architecture and Code Optimization, 2015, 11(4): 50: 1–50: 26
[106] Carlson T E, Heirman W, Eeckhout L. Sniper: Exploring the level of abstraction for scalable and accurate parallel multi-core simulation[C]//Proc of the 2011 Conf on High Performance Computing Networking, Storage and Analysis. New York: ACM, 2011: 52: 1−52: 12
[107] Wang Lei, Tang Yuxing, Deng Yu, et al. A scalable and fast microprocessor design space exploration methodology[C]//Proc of the 9th Int Symp on Embedded Multicore/Many-core Systems-on-Chip. Los Alamitos, CA: IEEE Computer Society, 2015: 33–40
[108] Lee J, Jang H, Kim J. RpStacks: Fast and accurate processor design space exploration using representative stall-event stacks[C]//Proc of the 47th Annual IEEE/ACM Int Symp on Microarchitecture. Los Alamitos, CA: IEEE Computer Society, 2014: 255–267
[109] Jang H, Jo J E, Lee J, et al. RpStacks-MT: A high-throughput design evaluation methodology for multi-core processors[C]//Proc of the 51st Annual IEEE/ACM Int Symp on Microarchitecture. Los Alamitos, CA: IEEE Computer Society, 2018: 586–599
[110] Noonburg D B, Shen J P. A framework for statistical modeling of superscalar processor performance[C]//Proc of the 3rd Int Symp on High-Performance Computer Architecture. Los Alamitos, CA: IEEE Computer Society, 1997: 298–309
[111] Chen X E, Aamodt T M. A first-order fine-grained multithreaded throughput model[C]//Proc of the 15th Int Symp on High Performance Computer Architecture. Los Alamitos, CA: IEEE Computer Society, 2009: 329–340
[112] Liang Y, Mitra T. An analytical approach for fast and accurate design space exploration of instruction caches[J]. ACM Transactions on Embedded Computing Systems, 2013, 13(3): 43: 1−43: 29
[113] Hartstein A, Puzak T R. The optimum pipeline depth for a microprocessor[C]//Proc of the 29th Annual Int Symp on Computer Architecture. Los Alamitos, CA: IEEE Computer Society, 2002: 7–13
[114] Chen X E, Aamodt T M. Hybrid analytical modeling of pending cache hits, data prefetching, and MSHRs[J]. ACM Transactions on Architecture and Code Optimization, 2011, 8(3): 59−70
[115] Li L, Pandey S, Flynn T, et al. SimNet: Accurate and high-performance computer architecture simulation using deep learning[C]//Proc of the 2022 ACM SIGMETRICS/IFIP Performance Joint Int Conf on Measurement and Modeling of Computer Systems. New York: ACM, 2022: 67–68
[116] Panda R, John L K. Proxy benchmarks for emerging big-data workloads[C]//Proc of the 26th Int Conf on Parallel Architectures and Compilation Techniques. Los Alamitos, CA: IEEE Computer Society, 2017: 105–116
[117] Kang S, Kumar R. Magellan: A search and machine learning-based framework for fast multi-core design space exploration and optimization[C]//Proc of the 2008 Design, Automation and Test in Europe. New York: ACM, 2008: 1432–1437
[118] Guo Qi, Chen Tianshi, Zhou Zhihua, et al. Robust design space modeling[J]. ACM Transactions on Design Automation of Electronic Systems, 2015, 20(2): 18: 1–18: 22
[119] 张乾龙,侯锐,杨思博,等. 体系结构模拟器在处理器设计过程中的作用[J]. 计算机研究与发展,2019,56(12):2702−2719 doi: 10.7544/issn1000-1239.2019.20190044
Zhang Qianlong, Hou Rui, Yang Sibo, et al. The role of architecture simulators in the process of CPU design[J]. Journal of Computer Research and Development, 2019, 56(12): 2702−2719 (in Chinese) doi: 10.7544/issn1000-1239.2019.20190044
[120] Hoste K, Eeckhout L. Microarchitecture-independent workload characterization[J]. IEEE Micro, 2007, 27(3): 63−72 doi: 10.1109/MM.2007.56
[121] Jin Zhanpeng, Cheng A C. Evolutionary benchmark subsetting[J]. IEEE Micro, 2008, 28(6): 20−36 doi: 10.1109/MM.2008.87
[122] Jin Zhanpeng, Cheng A C. SubsetTrio: An evolutionary, geometric, and statistical benchmark subsetting framework[J]. ACM Transactions on Modeling and Computer Simulation, 2011, 21(3): 21: 1–21: 23
[123] Jin Zhanpeng, Cheng A C. Improve simulation efficiency using statistical benchmark subsetting: An implantbench case study[C]//Proc of the 45th Annual Design Automation Conf. New York: ACM, 2008: 970–973
[124] Lee C, Potkonjak M, Mangione-Smith W H. MediaBench: A tool for evaluating and synthesizing multimedia and communications systems[C]//Proc of the 30th Annual Int Symp on Microarchitecture. Los Alamitos, CA: IEEE Computer Society, 1997: 330–335
[125] Guthaus M R, Ringenberg J S, Ernst D, et al. MiBench: A free, commercially representative embedded benchmark suite[C]//Proc of the 4th Annual IEEE Int Workshop on Workload Characterization. Piscataway, NJ: IEEE, 2001: 3–14
[126] Standard Performance Evaluation Corporation. SPEC CPU2000[EB/OL]. (2007-06-07)[2023-12-01]. https://www.spec.org/cpu2000
[127] Standard Performance Evaluation Corporation. SPEC CPU2006[EB/OL]. (2023-01-06)[2023-12-01]. https://www.spec.org/cpu2006
[128] Bienia C, Kumar S, Singh J P, et al. The PARSEC benchmark suite: Characterization and architectural implications[C]//Proc of the 17th Int Conf on Parallel Architectures and Compilation Techniques. New York: ACM, 2008: 72–81
[129] Woo S C, Ohara M, Torrie E, et al. The SPLASH-2 programs: Characterization and methodological considerations[C]//Proc of the 22nd Annual Int Symp on Computer Architecture. New York: ACM, 1995: 24–36
[130] Chandra D, Guo Fei, Kim S, et al. Predicting inter-thread cache contention on a chip multi-processor architecture[C]//Proc of the 11th Int Symp on High-Performance Computer Architecture. Los Alamitos, CA: IEEE Computer Society, 2005: 340–351
[131] Hsu W C, Chen H, Yew P C, et al. On the predictability of program behavior using different input data sets[C]//Proc of the 6th Annual Workshop on Interaction between Compilers and Computer Architectures. Los Alamitos, CA: IEEE Computer Society, 2002: 45–53
[132] Hoste K, Eeckhout L. Comparing benchmarks using key microarchitecture-independent characteristics[C]//Proc of the 2nd IEEE Int Symp on Workload Characterization. Los Alamitos, CA: IEEE Computer Society, 2006: 83–92
[133] Yi J J, Sendag R, Eeckhout L, et al. Evaluating benchmark subsetting approaches[C]//Proc of the 2nd IEEE Int Symp on Workload Characterization. Los Alamitos, CA: IEEE Computer Society, 2006: 93–104
[134] Conte T M, Hirsch M A, Menezes K N. Reducing state loss for effective trace sampling of superscalar processors[C]//Proc of the 14th Int Conf on Computer Design. Los Alamitos, CA: IEEE Computer Society, 1996: 468–477
[135] Patil H, Cohn R, Charney M, et al. Pinpointing representative portions of large Intel® Itanium® programs with dynamic instrumentation[C]//Proc of the 37th Int Symp on Microarchitecture. Los Alamitos, CA: IEEE Computer Society, 2004: 81–92
[136] Nair A A, John L K. Simulation points for SPEC CPU 2006[C]//Proc of the 26th Int Conf on Computer Design. Los Alamitos, CA: IEEE Computer Society, 2008: 397–403
[137] Lau J, Perelman E, Calder B. Selecting software phase markers with code structure analysis[C]//Proc of the 4th Int Symp on Code Generation and Optimization. Los Alamitos, CA: IEEE Computer Society, 2006: 135–146
[138] Lahiri K, Kunnoth S. Fast IPC estimation for performance projections using proxy suites and decision trees[C]//Proc of the 2017 IEEE Int Symp on Performance Analysis of Systems and Software. Los Alamitos, CA: IEEE Computer Society, 2017: 77–86
[139] Carlson T E, Heirman W, Eeckhout L. Sampled simulation of multi-threaded applications[C]//Proc of the 2013 IEEE Int Symp on Performance Analysis of Systems and Software. Los Alamitos, CA: IEEE Computer Society, 2013: 2–12
[140] Patil H, Pereira C, Stallcup M, et al. PinPlay: A framework for deterministic replay and reproducible analysis of parallel programs[C]//Proc of the 8th Annual IEEE/ACM Int Symp on Code Generation and Optimization. New York: ACM, 2010: 2–11
[141] Patil H, Isaev A, Heirman W, et al. ELFies: Executable region checkpoints for performance analysis and simulation[C]//Proc of the 19th IEEE/ACM Int Symp on Code Generation and Optimization. Piscataway, NJ: IEEE, 2021: 126–136
[142] Wenisch T F, Wunderlich R E, Falsafi B, et al. TurboSMARTS: Accurate microarchitecture simulation sampling in minutes[J]. ACM SIGMETRICS Performance Evaluation Review, 2005, 33(1): 408−409 doi: 10.1145/1071690.1064278
[143] Khan T M, Pérez D G, Temam O. Transparent sampling[C]//Proc of the 10th Int Conf on Embedded Computer Systems: Architectures, Modeling and Simulation. Piscataway, NJ: IEEE, 2010: 28–36
[144] Eeckhout L, Luo Yue, De Bosschere K, et al. BLRL: Accurate and efficient warmup for sampled processor simulation[J]. The Computer Journal, 2005, 48(4): 451−459 doi: 10.1093/comjnl/bxh103
[145] Haskins J W, Skadron K. Accelerated warmup for sampled microarchitecture simulation[J]. ACM Transactions on Architecture and Code Optimization, 2005, 2(1): 78−108 doi: 10.1145/1061267.1061272
[146] Van Ertvelde L, Hellebaut F, Eeckhout L. Accurate and efficient cache warmup for sampled processor simulation through NSL–BLRL[J]. The Computer Journal, 2008, 51(2): 192−206
[147] Jiang Chuntao, Yu Zhibin, Jin Hai, et al. Shorter on-line warmup for sampled simulation of multi-threaded applications[C]//Proc of the 44th Int Conf on Parallel Processing. Los Alamitos, CA: IEEE Computer Society, 2015: 350–359
[148] Bell R, Eeckhout L, John L, et al. Deconstructing and improving statistical simulation in HLS[C]//Proc of the 2004 Workshop on Duplicating, Deconstructing and Debunking held in Conjunction with the 31st Annual Int Symp on Computer Architecture. New York: ACM, 2004: 2−12
[149] Joshi A, Yi J J, Bell R H, et al. Evaluating the efficacy of statistical simulation for design space exploration[C]//Proc of the 2006 IEEE Int Symp on Performance Analysis of Systems and Software. Los Alamitos, CA: IEEE Computer Society, 2006: 70–79
[150] Eeckhout L, Bell R H, Stougie B, et al. Control flow modeling in statistical simulation for accurate and efficient processor design studies[C]//Proc of the 31st Annual Int Symp on Computer Architecture. Los Alamitos, CA: IEEE Computer Society, 2004: 350–361
[151] Bell R H, Bhatia R R, John L K, et al. Automatic testcase synthesis and performance model validation for high performance PowerPC processors[C]//Proc of the 2006 IEEE Int Symp on Performance Analysis of Systems and Software. Los Alamitos, CA: IEEE Computer Society, 2006: 154–165
[152] Lee H R, Sánchez D. Datamime: Generating representative benchmarks by automatically synthesizing datasets[C]//Proc of the 55th IEEE/ACM Int Symp on Microarchitecture. Piscataway, NJ: IEEE, 2022: 1144–1159
[153] Joshi A, Eeckhout L, Bell R H, et al. Performance cloning: A technique for disseminating proprietary applications as benchmarks[C]//Proc of the 2nd IEEE Int Symp on Workload Characterization. Los Alamitos, CA: IEEE Computer Society, 2006: 105–115
[154] Joshi A M, Eeckhout L, John L K, et al. Automated microprocessor stressmark generation[C]//Proc of the 14th Int Symp on High Performance Computer Architecture. Los Alamitos, CA: IEEE Computer Society, 2008: 229–239
[155] Joshi A, Eeckhout L, Bell R H, et al. Distilling the essence of proprietary workloads into miniature benchmarks[J]. ACM Transactions on Architecture and Code Optimization, 2008, 5(2): 10: 1–10: 33
[156] Ganesan K, John L K. Automatic generation of miniaturized synthetic proxies for target applications to efficiently design multicore processors[J]. IEEE Transactions on Computers, 2014, 63(4): 833−846 doi: 10.1109/TC.2013.36
[157] Deniz E, Sen A, Kahne B, et al. MINIME: Pattern-aware multicore benchmark synthesizer[J]. IEEE Transactions on Computers, 2015, 64(8): 2239−2252 doi: 10.1109/TC.2014.2349522
[158] Lee K, Evans S, Cho S. Accurately approximating superscalar processor performance from traces[C]//Proc of the 2009 IEEE Int Symp on Performance Analysis of Systems and Software. Los Alamitos, CA: IEEE Computer Society, 2009: 238–248
[159] Lee K, Cho S. In-N-Out: Reproducing out-of-order superscalar processor behavior from reduced in-order traces[C]//Proc of the 19th Annual Int Symp on Modelling, Analysis, and Simulation of Computer and Telecommunication Systems. Los Alamitos, CA: IEEE Computer Society, 2011: 126–135
[160] Lee K, Cho S. Accurately modeling superscalar processor performance with reduced trace[J]. Journal of Parallel and Distributed Computing, 2013, 73(4): 509−521 doi: 10.1016/j.jpdc.2012.12.002
[161] Ganesan K, Jo J, John L K. Synthesizing memory-level parallelism aware miniature clones for SPEC CPU2006 and ImplantBench workloads[C]//Proc of the 2010 IEEE Int Symp on Performance Analysis of Systems & Software. Los Alamitos, CA: IEEE Computer Society, 2010: 33–44
[162] Panda R, Zheng Xinnian, John L K. Accurate address streams for LLC and beyond (SLAB): A methodology to enable system exploration[C]//Proc of the 2017 IEEE Int Symp on Performance Analysis of Systems and Software. Los Alamitos, CA: IEEE Computer Society, 2017: 87–96
[163] Van Biesbrouck M, Sherwood T, Calder B. A co-phase matrix to guide simultaneous multithreading simulation[C]//Proc of the 2004 IEEE Int Symp on Performance Analysis of Systems and Software. Los Alamitos, CA: IEEE Computer Society, 2004: 45–56
[164] Yi J J, Kodakara S V, Sendag R, et al. Characterizing and comparing prevailing simulation techniques[C]//Proc of the 11th Int Symp on High-Performance Computer Architecture. Los Alamitos, CA: IEEE Computer Society, 2005: 266–277
[165] Tairum Cruz M, Bischoff S, Rusitoru R. Shifting the barrier: Extending the boundaries of the BarrierPoint methodology[C]//Proc of the 2018 IEEE Int Symp on Performance Analysis of Systems and Software. Los Alamitos, CA: IEEE Computer Society, 2018: 120–122
[166] Bell R H, John L K. Efficient power analysis using synthetic testcases[C]//Proc of the 1st IEEE Int Symp on Workload Characterization. Piscataway, NJ: IEEE, 2005: 110–118
[167] Penry D A, Fay D, Hodgdon D, et al. Exploiting parallelism and structure to accelerate the simulation of chip multi-processors[C]//Proc of the 12th Int Symp on High-Performance Computer Architecture. Los Alamitos, CA: IEEE Computer Society, 2006: 29–40
[168] Mariani G, Palermo G, Zaccaria V, et al. DeSpErate: Speeding-up design space exploration by using predictive simulation scheduling[C/OL]//Proc of the 17th Design, Automation & Test in Europe Conf & Exhibition. Piscataway, NJ: IEEE, 2014[2023-12-18]. https://ieeexplore.ieee.org/document/6800432?arnumber=6800432
[169] Li Bin, Peng Lu, Ramadass B. Accurate and efficient processor performance prediction via regression tree based modeling[J]. Journal of Systems Architecture, 2009, 55(10): 457−467
[170] Pang Jiufeng, Li Xiafeng, Xie Jinsong, et al. Microarchitectural design space exploration via support vector machine[J]. Acta Scientiarum Naturalium Universitatis Pekinensis, 2010, 46(1): 55−63
[171] Cook H, Skadron K. Predictive design space exploration using genetically programmed response surfaces[C]//Proc of the 45th Annual Design Automation Conf. New York: ACM, 2008: 960–965
[172] Zhai Jianwang, Bai Chen, Zhu Binwu, et al. McPAT-Calib: A microarchitecture power modeling framework for modern CPUs[C/OL]//Proc of the 40th IEEE/ACM Int Conf on Computer Aided Design. Piscataway, NJ: IEEE, 2021[2023-12-18]. https://ieeexplore.ieee.org/document/9643508
[173] Zhai Jianwang, Bai Chen, Zhu Binwu, et al. McPAT-Calib: A RISC-V BOOM microarchitecture power modeling framework[J]. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 2023, 42(1): 243−256 doi: 10.1109/TCAD.2022.3169464
[174] Givargis T, Vahid F, Henkel J. System-level exploration for Pareto-optimal configurations in parameterized systems-on-a-chip[C]//Proc of the 20th IEEE/ACM Int Conf on Computer Aided Design. Los Alamitos, CA: IEEE Computer Society, 2001: 25–30
[175] Yazdani R, Sheidaeian H, Salehi M E. A fast design space exploration for VLIW architectures[C]//Proc of the 22nd Iranian Conf on Electrical Engineering. Piscataway, NJ: IEEE, 2014: 856–861
[176] Kansakar P, Munir A. A design space exploration methodology for parameter optimization in multicore processors[J]. IEEE Transactions on Parallel and Distributed Systems, 2018, 29(1): 2−15 doi: 10.1109/TPDS.2017.2745580
[177] Ascia G, Catania V, Di Nuovo A G, et al. Performance evaluation of efficient multi-objective evolutionary algorithms for design space exploration of embedded computer systems[J]. Applied Soft Computing, 2011, 11(1): 382−398 doi: 10.1016/j.asoc.2009.11.029
[178] Mariani G, Palermo G, Silvano C, et al. An efficient design space exploration methodology for multi-cluster VLIW architectures based on artificial neural networks[C]//Proc of the 16th IFIP/IEEE Int Conf on Very Large Scale Integration. Piscataway, NJ: IEEE, 2008: 13−15
[179] Zaccaria V, Palermo G, Castro F, et al. MULTICUBE Explorer: An open source framework for design space exploration of chip multi-processors[C]//Proc of the 23rd Int Conf on Architecture of Computing Systems. Hannover, Germany: VDE Verlag, 2010: 325–331
[180] Mariani G, Brankovic A, Palermo G, et al. A correlation-based design space exploration methodology for multi-processor systems-on-chip[C]//Proc of the 47th Design Automation Conf. New York: ACM, 2010: 120–125
[181] Wang Duo, Yan Mingyu, Liu Xin, et al. A high-accurate multi-objective exploration framework for design space of CPU[C/OL]//Proc of the 60th ACM/IEEE Design Automation Conf. Piscataway, NJ: IEEE, 2023[2023-12-18]. https://ieeexplore.ieee.org/document/10247790
[182] Wang Duo, Yan Mingyu, Teng Yihan, et al. A high-accurate multi-objective ensemble exploration framework for design space of CPU microarchitecture[C]//Proc of the 33rd Great Lakes Symp on VLSI 2023. New York: ACM, 2023: 379–383
[183] Beltrame G, Fossati L, Sciuto D. Decision-theoretic design space exploration of multiprocessor platforms[J]. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 2010, 29(7): 1083−1095
[184] Beltrame G, Nicolescu G. A multi-objective decision-theoretic exploration algorithm for platform-based design[C]//Proc of the 14th Design, Automation & Test in Europe Conf & Exhibition. Piscataway, NJ: IEEE, 2011: 1192−1195
[185] Sheldon D, Vahid F, Lonardi S. Soft-core processor customization using the design of experiments paradigm[C]//Proc of the 10th Design, Automation & Test in Europe Conf & Exhibition. Piscataway, NJ: IEEE, 2007: 821−826
[186] Mariani G, Palermo G, Silvano C, et al. Meta-model assisted optimization for design space exploration of multi-processor systems-on-chip[C]//Proc of the 12th Euromicro Conf on Digital System Design, Architectures, Methods and Tools. Los Alamitos, CA: IEEE Computer Society, 2009: 383–389
[187] Palermo G, Silvano C, Zaccaria V. Multi-objective design space exploration of embedded systems[J]. Journal of Embedded Computing, 2005, 1(3): 305−316
[188] Wu Nan, Xie Yuan, Hao Cong. IronMan: GNN-assisted design space exploration in high-level synthesis via reinforcement learning[C]//Proc of the 31st Great Lakes Symp on VLSI. New York: ACM, 2021: 39–44
[189] Wu Nan, Xie Yuan, Hao Cong. IronMan-Pro: Multiobjective design space exploration in HLS via reinforcement learning and graph neural network-based modeling[J]. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 2023, 42(3): 900−913 doi: 10.1109/TCAD.2022.3185540
[190] Kao S C, Jeong G, Krishna T. ConfuciuX: Autonomous hardware resource assignment for DNN accelerators using reinforcement learning[C]//Proc of the 53rd Annual IEEE/ACM Int Symp on Microarchitecture. Piscataway, NJ: IEEE, 2020: 622–636
[191] Feng Lang, Liu Wenjian, Guo Chuliang, et al. GANDSE: Generative adversarial network based design space exploration for neural network accelerator design[J]. ACM Transactions on Design Automation of Electronic Systems, 2023, 28(3): 35: 1−35: 20
[192] Akram A, Sawalha L. A survey of computer architecture simulation techniques and tools[J]. IEEE Access, 2019, 7: 78120−78145 doi: 10.1109/ACCESS.2019.2917698
[193] Manjikian N. Multiprocessor enhancements of the SimpleScalar tool set[J]. SIGARCH Computer Architecture News, 2001, 29(1): 8−15 doi: 10.1145/373574.373578
[194] Qureshi Y M, Simon W A, Zapater M, et al. Gem5-X: A many-core heterogeneous simulation platform for architectural exploration and optimization[J]. ACM Transactions on Architecture and Code Optimization, 2021, 18(4): 44: 1–44: 27
[195] Carlson T E, Heirman W, Eyerman S, et al. An evaluation of high-level mechanistic core models[J]. ACM Transactions on Architecture and Code Optimization, 2014, 11(3): 28: 1–28: 25
[196] Tan Zhangxi, Waterman A, Cook H, et al. A case for FAME: FPGA architecture model execution[C]//Proc of the 37th Annual Int Symp on Computer Architecture. New York: ACM, 2010: 290–301
[197] Lee Y, Waterman A, Cook H, et al. An agile approach to building RISC-V microprocessors[J]. IEEE Micro, 2016, 36(2): 8−20 doi: 10.1109/MM.2016.11
[198] Di Biagio A, Davis M. llvm-mca: A static performance analysis tool[EB/OL]. (2018-03-01)[2023-12-01]. https://lists.llvm.org/pipermail/llvm-dev/2018-March/121490.html
[199] Mendis C, Renda A, Amarasinghe D S, et al. Ithemal: Accurate, portable and fast basic block throughput estimation using deep neural networks[C]//Proc of the 36th Int Conf on Machine Learning. New York: PMLR, 2019: 4505–4515
[200] Blocklove J, Garg S, Karri R, et al. Chip-Chat: Challenges and opportunities in conversational hardware design[C/OL]//Proc of the 5th ACM/IEEE Workshop on Machine Learning for CAD. Piscataway, NJ: IEEE, 2023[2023-12-18]. https://ieeexplore.ieee.org/document/10299874
[201] Chang Kaiyan, Wang Ying, Ren Haimeng, et al. ChipGPT: How far are we from natural language hardware design[J]. arXiv preprint, arXiv: 2305.14019, 2023
[202] Lu Yao, Liu Shang, Zhang Qijun, et al. RTLLM: An open-source benchmark for design RTL generation with large language model[J]. arXiv preprint, arXiv: 2308.05345, 2023
[203] Balkind J, Chang Tingjung, Jackson P J, et al. OpenPiton at 5: A nexus for open and agile hardware design[J]. IEEE Micro, 2020, 40(4): 22−31 doi: 10.1109/MM.2020.2997706
[204] Bachrach J, Vo H, Richards B, et al. Chisel: Constructing hardware in a Scala embedded language[C]//Proc of the 49th Annual Design Automation Conf. New York: ACM, 2012: 1216–1225
[205] Patel H D, Shukla S K. Tackling an abstraction gap: Co-simulating SystemC DE with Bluespec ESL[C]//Proc of the 10th Design, Automation & Test in Europe Conf & Exhibition. Piscataway, NJ: IEEE, 2007: 279−284
[206] Bourgeat T, Pit-Claudel C, Chlipala A, et al. The essence of Bluespec: A core language for rule-based hardware design[C]//Proc of the 41st ACM SIGPLAN Conf on Programming Language Design and Implementation. New York: ACM, 2020: 243–257
[207] Käyrä M, Hämäläinen T D. A survey on system-on-a-chip design using Chisel HW construction language[C/OL]//Proc of the 47th Annual Conf of the IEEE Industrial Electronics Society. Piscataway, NJ: IEEE, 2021[2023-12-18]. https://ieeexplore.ieee.org/document/9589614
[208] 王凯帆,徐易难,余子濠,等. 香山开源高性能RISC-V处理器设计与实现[J]. 计算机研究与发展,2023,60(3):476−493 doi: 10.7544/issn1000-1239.202221036
Wang Kaifan, Xu Yinan, Yu Zihao, et al. XiangShan open-source high performance RISC-V processor design and implementation[J]. Journal of Computer Research and Development, 2023, 60(3): 476−493 (in Chinese) doi: 10.7544/issn1000-1239.202221036
[209] Lee B C, Brooks D M, de Supinski B R, et al. Methods of inference and learning for performance modeling of parallel applications[C]//Proc of the 12th ACM SIGPLAN Symp on Principles and Practice of Parallel Programming. New York: ACM, 2007: 249–258
[210] Hallschmid P, Saleh R. Fast design space exploration using local regression modeling with application to ASIPs[J]. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 2008, 27(3): 508−515 doi: 10.1109/TCAD.2008.915532
[211] Zhang Changshu, Ravindran A, Datta K, et al. A machine learning approach to modeling power and performance of chip multiprocessors[C]//Proc of the 29th Int Conf on Computer Design. Los Alamitos, CA: IEEE Computer Society, 2011: 45–50
[212] Beg A, Prasad P W C, Singh A K, et al. A neural model for processor-throughput using hardware parameters and software’s dynamic behavior[C]//Proc of the 12th Int Conf on Intelligent Systems Design and Applications. Piscataway, NJ: IEEE, 2012: 821–825
[213] Paone E, Vahabi N, Zaccaria V, et al. Improving simulation speed and accuracy for many-core embedded platforms with ensemble models[C]//Proc of the 16th Design, Automation & Test in Europe Conf & Exhibition. Piscataway, NJ: IEEE, 2013: 671–676
[214] Castillo P A, Mora A M, Guervós J J M, et al. Architecture performance prediction using evolutionary artificial neural networks[C]//Proc of the Applications of Evolutionary Computing. Berlin: Springer, 2008: 175–183
[215] Khan S, Xekalakis P, Cavazos J, et al. Using predictive modeling for cross-program design space exploration in multicore systems[C]//Proc of the 16th Int Conf on Parallel Architecture and Compilation Techniques. Los Alamitos, CA: IEEE Computer Society, 2007: 327–338
[216] Dubach C, Jones T M, O’Boyle M F P. Rapid early-stage microarchitecture design using predictive models[C]//Proc of the 27th Int Conf on Computer Design. Los Alamitos, CA: IEEE Computer Society, 2009: 297–304
[217] Özisikyilmaz B, Memik G, Choudhary A N. Machine learning models to predict performance of computer system design alternatives[C]//Proc of the 37th Int Conf on Parallel Processing. Los Alamitos, CA: IEEE Computer Society, 2008: 495–502
[218] Özisikyilmaz B, Memik G, Choudhary A N. Efficient system design space exploration using machine learning techniques[C]//Proc of the 45th Design Automation Conf. New York: ACM, 2008: 966–969
[219] Ghosh A, Givargis T. Cache optimization for embedded processor cores: An analytical approach[J]. ACM Transactions on Design Automation of Electronic Systems, 2004, 9(4): 419−440 doi: 10.1145/1027084.1027086
[220] Li Sheng, Chen Ke, Ahn J H, et al. CACTI-P: Architecture-level modeling for SRAM-based structures with advanced leakage reduction techniques[C]//Proc of the 30th Int Conf on Computer-Aided Design. Los Alamitos, CA: IEEE Computer Society, 2011: 694–701
[221] Li Sheng, Ahn J H, Strong R D, et al. McPAT: An integrated power, area, and timing modeling framework for multicore and manycore architectures[C]//Proc of the 42nd Annual IEEE/ACM Int Symp on Microarchitecture. New York: ACM, 2009: 469–480
[222] Karkhanis T S, Smith J E. A first-order superscalar processor model[C]//Proc of the 31st Annual Int Symp on Computer Architecture. Piscataway, NJ: IEEE, 2004: 338–349
[223] Genbrugge D, Eyerman S, Eeckhout L. Interval simulation: Raising the level of abstraction in architectural simulation[C/OL]//Proc of the 16th Int Symp on High-Performance Computer Architecture. Piscataway, NJ: IEEE, 2010[2023-12-18]. https://ieeexplore.ieee.org/document/5416636
[224] Breughe M, Eyerman S, Eeckhout L. A mechanistic performance model for superscalar in-order processors[C]//Proc of the 2012 IEEE Int Symp on Performance Analysis of Systems & Software. Los Alamitos, CA: IEEE Computer Society, 2012: 14–24
[225] Van den Steen S, Eyerman S, De Pestel S, et al. Analytical processor performance and power modeling using micro-architecture independent characteristics[J]. IEEE Transactions on Computers, 2016, 65(12): 3537−3551
[226] De Pestel S, Van den Steen S, Akram S, et al. RPPM: Rapid performance prediction of multithreaded workloads on multicore processors[C]//Proc of the 2019 IEEE Int Symp on Performance Analysis of Systems and Software. Piscataway, NJ: IEEE, 2019: 257–267
[227] Jongerius R, Mariani G, Anghel A, et al. Analytic processor model for fast design-space exploration[C]//Proc of the 33rd IEEE Int Conf on Computer Design. Los Alamitos, CA: IEEE Computer Society, 2015: 411–414
[228] Jongerius R, Anghel A, Dittmann G, et al. Analytic multi-core processor model for fast design-space exploration[J]. IEEE Transactions on Computers, 2018, 67(6): 755−770 doi: 10.1109/TC.2017.2780239
[229] Kwon J, Carloni L P. Transfer learning for design-space exploration with high-level synthesis[C]//Proc of the 2nd ACM/IEEE Workshop on Machine Learning for CAD. New York: ACM, 2020: 163–168
[230] Zhang Zheng, Chen Tinghuan, Huang Jiaxin, et al. A fast parameter tuning framework via transfer learning and multi-objective Bayesian optimization[C]//Proc of the 59th ACM/IEEE Design Automation Conf. New York: ACM, 2022: 133–138
[231] Zhang Keyi, Asgar Z, Horowitz M. Bringing source-level debugging frameworks to hardware generators[C]//Proc of the 59th ACM/IEEE Design Automation Conf. New York: ACM, 2022: 1171–1176
[232] Xiao Qingcheng, Zheng Size, Wu Bingzhe, et al. HASCO: Towards agile hardware and software co-design for tensor computation[C]//Proc of the 48th Annual Int Symp on Computer Architecture. Piscataway, NJ: IEEE, 2021: 1055–1068
[233] Esmaeilzadeh H, Ghodrati S, Kahng A B, et al. Physically accurate learning-based performance prediction of hardware-accelerated ML algorithms[C]//Proc of the 4th ACM/IEEE Workshop on Machine Learning for CAD. New York: ACM, 2022: 119–126
[234] Sun Qi, Chen Tinghuan, Liu Siting, et al. Correlated multi-objective multi-fidelity optimization for HLS directives design[C]//Proc of the 24th Design, Automation & Test in Europe Conf & Exhibition. Piscataway, NJ: IEEE, 2021: 46–51
[235] Wu Y N, Tsai P A, Parashar A, et al. Sparseloop: An analytical approach to sparse tensor accelerator modeling[C]//Proc of the 55th IEEE/ACM Int Symp on Microarchitecture. Piscataway, NJ: IEEE, 2022: 1377–1395
[236] Huang Qijing, Kang M, Dinh G, et al. CoSA: Scheduling by constrained optimization for spatial accelerators[C]//Proc of the 48th Annual Int Symp on Computer Architecture. Piscataway, NJ: IEEE, 2021: 554–566
[237] Mei Linyan, Houshmand P, Jain V, et al. ZigZag: Enlarging joint architecture-mapping design space exploration for DNN accelerators[J]. IEEE Transactions on Computers, 2021, 70(8): 1160−1174 doi: 10.1109/TC.2021.3059962