ISSN 1000-1239 CN 11-1777/TP

Journal of Computer Research and Development, 2017, Vol. 54, Issue (5): 1097-1108. doi: 10.7544/issn1000-1239.2017.20151062

• Software Technology •

An Orthogonal Decomposition Based Design Method and Implementation for Big Data Processing System

Xiang Xiaojia1, Zhao Xiaofang1, Liu Yang1, Gong Guanjun1, Zhang Han2

  1(Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190); 2(School of Computer Science, North China University of Technology, Beijing 100144) (xiangxiaojia@ncic.ac.cn)
  • Published: 2017-05-01
  • Funding:
    National Natural Science Foundation of China (61202061, 61202413); Innovation Project of the Institute of Computing Technology, Chinese Academy of Sciences (20146080)

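Abstract: The emergence of computing frameworks such as MapReduce opened a new era of big data processing. Big data processing systems represented by Hadoop and Spark offer high throughput, cross-platform portability, and high scalability, and have been widely adopted. However, to avoid being tied to a specific operating system or hardware platform, the design and optimization of these systems concentrate on the computing model, scheduling algorithms, and similar aspects, and therefore cannot fully exploit the strengths of the underlying platform. We propose an orthogonal-decomposition-based design and optimization method for big data processing systems. The system is decomposed into multiple loosely coupled, functionally orthogonal modules, so that storage and processing are separated out and handed to storage and execution engines that can exploit the underlying operating system and even the hardware, while the original big data system is reduced to a scheduling platform. On this basis, we further propose a lock-free storage-level optimization strategy and an execution-engine optimization strategy based on instruction superoptimization. Guided by this method, and taking Hadoop as the target for compatibility and improvement, we implement a prototype big data processing system, Arion. Arion retains Hadoop's cross-platform portability and high scalability while eliminating task-execution bottlenecks, and its localized design and optimization techniques are equally applicable to non-Hadoop platforms. Experiments on the prototype show that Arion improves the execution efficiency of big data processing jobs by up to 7.7%.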

Keywords: big data processing system, computing framework, localization, lock-free, superoptimization, execution engine
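The decomposition described above can be illustrated with three narrow interfaces. This is a minimal sketch, not Arion's code: the names `StorageEngine`, `ExecutionEngine`, `InMemoryStorage`, `PlainExecutor`, `Scheduler`, and `map_job` are hypothetical and come neither from the paper nor from Hadoop. The scheduler only decides which input split is processed, while storage and computation sit behind interfaces that a platform-specific engine (for example, a lock-free store or a superoptimized executor) could implement without touching the scheduling code.

```python
from typing import Callable, Dict, Iterable, List, Protocol


class StorageEngine(Protocol):
    """Hypothetical storage interface; the scheduler never handles bytes itself."""

    def read(self, key: str) -> List[int]: ...

    def write(self, key: str, values: List[int]) -> None: ...


class ExecutionEngine(Protocol):
    """Hypothetical execution interface; runs one task over one input split."""

    def run(self, func: Callable[[int], int], data: List[int]) -> List[int]: ...


class InMemoryStorage:
    """Stand-in storage engine; a platform-specific engine could replace it."""

    def __init__(self) -> None:
        self._data: Dict[str, List[int]] = {}

    def read(self, key: str) -> List[int]:
        return list(self._data[key])

    def write(self, key: str, values: List[int]) -> None:
        self._data[key] = list(values)


class PlainExecutor:
    """Stand-in execution engine; an optimized native engine could replace it."""

    def run(self, func: Callable[[int], int], data: List[int]) -> List[int]:
        return [func(x) for x in data]


class Scheduler:
    """What is left of the framework: it decides what runs on which split."""

    def __init__(self, storage: StorageEngine, executor: ExecutionEngine) -> None:
        self.storage = storage
        self.executor = executor

    def map_job(self, func: Callable[[int], int], splits: Iterable[str],
                out_prefix: str) -> List[str]:
        """Run func over every split and return the keys of the output splits."""
        outputs: List[str] = []
        for split in splits:
            result = self.executor.run(func, self.storage.read(split))
            out_key = f"{out_prefix}/{split}"
            self.storage.write(out_key, result)
            outputs.append(out_key)
        return outputs


if __name__ == "__main__":
    store = InMemoryStorage()
    store.write("part-0", [1, 2, 3])
    store.write("part-1", [4, 5])
    sched = Scheduler(store, PlainExecutor())
    print(sched.map_job(lambda x: x * 10, ["part-0", "part-1"], "out"))
    # ['out/part-0', 'out/part-1']
    print(store.read("out/part-1"))  # [40, 50]
```

Because `Scheduler` depends only on the two interfaces, swapping `PlainExecutor` for a native engine changes no scheduling code, which is the practical benefit of keeping the modules orthogonal.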

Abstract: Big data has stimulated a revolution in the data storage and processing field, leading to the rise of big data processing systems such as Hadoop and Spark, which provide a new platform with platform independence, high throughput, and good scalability. On the other hand, the substrate platform underpinning these systems is largely ignored, because their design and optimization mainly focus on the processing model and the related frameworks and algorithms. We present a new loosely coupled, platform-dependent design and optimization method for big data processing systems, which delegates storage and processing to engines that can exploit the power of the underlying platform, including the OS and hardware, and thereby gain more benefit from this local infrastructure. Furthermore, based on the local OS and hardware, we propose two strategies: lock-free storage and a superoptimization-based data processing execution engine. Guided by these methods and strategies, we present Arion, a modified version of vanilla Hadoop, which shows a promising new direction for Hadoop optimization while keeping its high scalability and upper-layer platform independence. Our experiments show that the prototype Arion can accelerate big data processing jobs by up to 7.7%.

Key words: big data processing system, computing framework, localization, lock-free, superoptimization, execution engine
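The abstract names instruction superoptimization but does not describe how Arion's execution engine applies it. To illustrate the general technique only, here is a toy brute-force superoptimizer over a hypothetical 8-bit, single-register instruction set; `OPS`, `run`, and `superoptimize` are illustrative names, not Arion's code. It enumerates candidate programs in order of increasing length and returns the first one that agrees with the target on every input. Because this toy machine has only 256 possible inputs, the check is exhaustive, so equivalence is exact; a real superoptimizer over full-width registers would need a formal equivalence check instead.

```python
from itertools import product
from typing import Callable, Dict, List, Optional, Sequence

MASK = 0xFF  # toy 8-bit machine: all arithmetic wraps modulo 256

# Hypothetical single-register instruction set; each op maps acc -> acc.
OPS: Dict[str, Callable[[int], int]] = {
    "inc": lambda a: (a + 1) & MASK,
    "dec": lambda a: (a - 1) & MASK,
    "neg": lambda a: (-a) & MASK,
    "not": lambda a: (~a) & MASK,
    "shl": lambda a: (a << 1) & MASK,
    "shr": lambda a: a >> 1,
}


def run(program: Sequence[str], x: int) -> int:
    """Execute a program on the toy machine with the register initialised to x."""
    acc = x & MASK
    for op in program:
        acc = OPS[op](acc)
    return acc


def superoptimize(target: Sequence[str], max_len: int = 3) -> Optional[List[str]]:
    """Return a shortest program equivalent to `target` on all 256 inputs.

    Candidates are enumerated by increasing length (brute force, in the spirit
    of Massalin's original superoptimizer). Returns None if no program of
    length <= max_len matches.
    """
    expected = [run(target, x) for x in range(256)]
    for length in range(max_len + 1):
        for cand in product(OPS, repeat=length):
            if all(run(cand, x) == expected[x] for x in range(256)):
                return list(cand)
    return None


if __name__ == "__main__":
    print(superoptimize(["neg", "dec"]))  # ['not']  because -x - 1 == ~x
    print(superoptimize(["not", "inc"]))  # ['neg']  two's-complement negation
    print(superoptimize(["not", "not"]))  # []       the identity program
```

The search cost grows as (number of ops) to the power of the program length, which is why superoptimization is usually reserved for short, hot instruction sequences rather than whole programs.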
