An Orthogonal Decomposition Based Design Method and Implementation for Big Data Processing Systems

Xiang Xiaojia; Zhao Xiaofang; Liu Yang; Gong Guanjun; Zhang Han

doi:10.7544/issn1000-1239.2017.20151062

Abstract

The emergence of computing frameworks such as MapReduce opened a new era of big data processing. Big data processing systems represented by Hadoop and Spark offer high throughput, platform independence, and high scalability, and are widely used. However, to avoid being tied to a specific operating system or hardware platform, the design and optimization of these systems concentrate on computing models, scheduling algorithms, and similar aspects, so they cannot fully exploit the strengths of the underlying platform. We propose a design and optimization method for big data processing systems based on orthogonal decomposition: the system is decomposed into several loosely coupled, functionally orthogonal modules, so that storage and processing are separated out and handed to storage and execution engines that can exploit the operating system and even the hardware resources of the underlying platform, while the original big data system is reduced to a scheduling platform. We further propose two low-level optimization strategies: one for storage, based on lock-free mechanisms, and one for the execution engine, based on instruction superoptimization. Guided by this method and these strategies, and taking Hadoop as the target for compatibility and improvement, we implement Arion, a prototype big data processing system. Arion retains Hadoop's platform independence and high scalability while removing bottlenecks in task execution, and its platform-local design and optimization techniques are equally applicable to non-Hadoop platforms. Experiments on the prototype show that Arion improves the execution efficiency of big data processing jobs by up to 7.7%.
