基于结构并行的MRBP算法
MapReduce Back Propagation Algorithm Based on Structure Parallelism
-
摘要: BP(back propagation)算法是一种常用的神经网络学习算法,而基于Hadoop集群MapReduce编程模型的BP(MapReduce back propagation, MRBP)算法在处理大数据问题时,表现出良好的性能,因而得到了广泛应用.但是,由于该算法缺乏神经节点之间细粒度结构并行的能力,当遇到数据维度较高、网络节点较多时,性能还显不足.另一方面,Hadoop集群计算节点通信不能由用户直接控制,现有基于集群系统的结构并行策略不能直接用于MRBP算法.为此,提出一种适合于Hadoop集群的结构并行MRBP (structure parallelism based MapReduce back propagation, SP-MRBP)算法,该算法将神经网络各层划分为多个结构,通过逐层并行-逐层集成(layer-wise parallelism,layer-wise ensemble, LPLE)的方式,实现了MRBP算法的结构并行.同时,推导出了SP-MRBP算法和MRBP算法计算时间解析表达式,以此分析了2种算法时间差和SP-MRBP算法最优并行规模.据了解,这是首次将结构并行策略引入MRBP算法中.实验表明,当神经网络规模较大时,SP-MRBP较之原算法,具有较好的性能.Abstract: Back propagation (BP) algorithm is a widely used learning algorithm that is used for training multiple layer neural networks. BP algorithm based on Hadoop cluster and MapReduce parallel programming model (MRBP) shows good performance on processing big data problems. However, it lacks the capability of fine-grained parallelism. Thus, when confronted with high dimension data and neural networks with large nodes, the performance is low relatively. On the other hand, since the users can’t control the communication of Hadoop computing nodes, the existing structure parallel scheme based on clusters can’t be directly applied to MRBP algorithm. This paper proposes a structure parallelism based MRBP algorithm (SP-MRBP), which adopts layer-wise parallelism, layer-wise ensemble (LPLE) strategy to implement structure parallel computing. Also, we derive the analytical expressions of the proposed SP-MRBP algorithm and the classic MRBP algorithm, and obtain the time differences between the both algorithms as well as the optimal number of parallel structures of SP-MRBP algorithm. To the best knowledge of the authors, it is the first time to introduce the structure parallelism scheme to the MRBP algorithm. The experimental results show that, compared with the classic MRBP algorithm, our algorithm has better performance on processing efficiency when facing large neural networks.