ISSN 1000-1239 CN 11-1777/TP

Journal of Computer Research and Development (计算机研究与发展) ›› 2015, Vol. 52 ›› Issue (9): 1965-1975. doi: 10.7544/issn1000-1239.2015.20140832

• Software Technology •


  • Published: 2015-09-01

A Nash-Pareto Strategy Based Automatic Data Distribution Method and Its Supporting Tool

Wang Xiaoyan1,2,3, Chen Jinchuan1, Guo Xiaoyan4, Du Xiaoyong1,3,5   

  1. Key Laboratory of Data Engineering and Knowledge Engineering (Renmin University of China), Ministry of Education, Beijing 100872
  2. Information Center, The Supreme People’s Court, Beijing 100745
  3. School of Information, Renmin University of China, Beijing 100872
  4. EMC Labs China, Beijing 100084
  5. State Key Laboratory of Software Development Environment (Beihang University), Beijing 100191
  • Online: 2015-09-01

Abstract: The era of big data brings new challenges to data storage and management. With the dramatic increase in data volume, automatic data distribution has become both a key technique and a hard problem for distributed systems. Based on a study of the three elements of the data distribution problem (data, workload, and node), this work abstracts the problem as a triangle model called DaWN (data, workload, node) and summarizes the relationships among the three elements as three links: data fragmentation, data allocation, and workload processing. Building on DaWN, it proposes an automatic data distribution solution for large-scale on-line transaction processing (OLTP) applications and discusses each module of the architecture and the interactions among the modules. Drawing on our earlier research, it applies a Nash-Pareto optimal equilibrium strategy so that these mechanisms work well together. A series of experiments shows that the proposed approach achieves good overall performance and confirms its effectiveness. To bring this research closer to practice, this work also designs and implements a prototype tool called ADDvisor (automatic data distribution advisor), which supports and coordinates automatic data distribution in large-scale distributed OLTP applications.

Key words: data distribution, triangle model, automatic solution, optimal equilibrium, on-line transaction processing (OLTP)