ISSN 1000-1239 CN 11-1777/TP

Journal of Computer Research and Development ›› 2014, Vol. 51 ›› Issue (12): 2702-2710. doi: 10.7544/issn1000-1239.2014.20131329

• Software Technology •

Skyline Computing on MapReduce with Hyperplane-Projections-Based Partition

Wang Shuyan1, Yang Xin2, Li Keqiu2

  1. School of Software, Dalian University of Technology, Dalian, Liaoning 116024
  2. School of Computer Science and Technology, Dalian University of Technology, Dalian, Liaoning 116024
  • E-mail: wangshuyandlut@gmail.com
  • Published: 2014-12-01
  • Online: 2014-12-01
  • Supported by: National Natural Science Foundation of China (61225010, 61432002, 61173162, 61300084); cooperative project between Microsoft Research Asia and the Computer Network Information Center, Chinese Academy of Sciences

Abstract: Skyline computing plays an increasingly important role in decision-making applications. Research on centralized (single-machine) processing is relatively mature, but with the explosive growth of big data, Skyline computing now faces the challenge of large-scale data processing. MapReduce is a parallel programming model widely used in data-intensive applications, and it requires that a task be decomposable. Existing partitioning methods for Skyline computing on MapReduce include grid partitioning and angle-based partitioning. Grid partitioning performs well only on low-dimensional data. Angle-based partitioning suits both low- and high-dimensional data, but it requires a complex and time-consuming coordinate conversion before partitioning. In this paper, we decompose the dataset with hyperplane-projections-based partitioning, a method similar to angle-based partitioning that also suits both low- and high-dimensional data but whose coordinate conversion is much simpler. Based on this partitioning, we propose MR-HPP (MapReduce with hyperplane-projections-based partition), an algorithm for Skyline computing on MapReduce, and for its filtering phase we propose an effective filtering algorithm, PSF (presorting filter). Extensive comparative experiments on Hadoop demonstrate the accuracy, efficiency, and stability of the proposed algorithm.
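The abstract names the ingredients but gives no algorithmic detail, so the following is only a minimal single-process sketch of the general idea, not the authors' MR-HPP or PSF. It assumes smaller values are better, non-negative coordinates, projection of each point along the ray from the origin onto the hyperplane x1 + ... + xd = 1, and partitioning by equal-width bins of the first projected coordinate. The function names `dominates`, `presorted_skyline`, `partition_id`, and `mr_skyline` are hypothetical, and the presorting step follows the generic sort-then-filter approach rather than the paper's PSF.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is no worse than b everywhere and strictly better somewhere (smaller is better)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def presorted_skyline(points: List[Point]) -> List[Point]:
    """Skyline by presorting: a dominating point always sorts before the points it dominates."""
    skyline: List[Point] = []
    # Sort by coordinate sum, breaking ties lexicographically (assumed ordering, not the paper's PSF).
    for p in sorted(set(points), key=lambda q: (sum(q), q)):
        if not any(dominates(s, p) for s in skyline):
            skyline.append(p)
    return skyline


def partition_id(p: Point, num_partitions: int) -> int:
    """Project p onto the hyperplane x1 + ... + xd = 1 and bin its first projected coordinate."""
    total = sum(p)
    if total <= 0:  # the origin has no projection; assign it to partition 0 (assumption)
        return 0
    share = p[0] / total  # first coordinate of the projected point, in [0, 1]
    return min(int(share * num_partitions), num_partitions - 1)


def mr_skyline(points: List[Point], num_partitions: int = 4) -> List[Point]:
    """Two-phase skyline that mimics a MapReduce job inside one process."""
    # "Map": route every point to a partition by its projection.
    buckets: Dict[int, List[Point]] = defaultdict(list)
    for p in points:
        buckets[partition_id(p, num_partitions)].append(p)
    # "Reduce": compute a local skyline in each partition.
    candidates = [s for part in buckets.values() for s in presorted_skyline(part)]
    # Merge: the global skyline is the skyline of all local skyline points.
    return presorted_skyline(candidates)


if __name__ == "__main__":
    data = [(1, 9), (2, 6), (4, 4), (6, 2), (9, 1), (5, 5), (7, 7), (3, 8)]
    print(sorted(mr_skyline(data, num_partitions=3)))
```

The merge step is correct for any partitioning because every global skyline point is also a skyline point of its own partition. The choice of partitioning affects only load balance and how many non-skyline candidates survive the local phase, which is where the partitioning schemes compared in the abstract differ.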

Key words: Skyline computing, big data, MapReduce, hyperplane-projections-based partition, filter

CLC Number: