Parallel Algorithm for Hierarchical Phrase Machine Translation Based on Distributed Memory Storage

Zhao Bo; Huang Shujian; Dai Xinyu; Yuan Chunfeng; Huang Yihua

doi:10.7544/issn1000-1239.2014.20131335

Abstract

In recent years, large-scale language models and translation models trained on massive corpora have been widely used to improve the accuracy of statistical machine translation (SMT) systems. As these models grow, computing performance becomes a serious problem: existing single-machine, serial translation systems struggle to finish decoding quickly enough, and the problem is most acute in online translation. To overcome these limitations and improve the computing performance of large-scale SMT, this paper, targeting a practical online translation system, proposes a distributed and parallel translation decoding framework that applies distributed storage and parallel querying to both the language model and the translation model, and builds a complete parallel decoding algorithm on top of it. We implement a parallel hierarchical phrase-based decoder that stores and queries the large translation model and language model tables in a distributed in-memory database, which removes the memory-capacity and concurrency limits faced by traditional machine translation systems. To further speed up parallel decoding, we make three additional optimizations: 1) transforming the synchronous rules of the translation model table and the Trie structure of the language model table into Hash-indexed key-value tables suited to the in-memory database; 2) modifying the cube-pruning algorithm to make it suitable for batch queries; 3) adopting and optimizing batch queries to reduce the network overhead of language model and translation model lookups. The resulting decoder achieves fast decoding for SMT trained on large-scale corpora and scales well. Experimental results show that, compared with a single-machine decoder, the parallel decoder reaches a 2.7x speedup for single-sentence translation and at least an 11.7x speedup in overall decoding performance for batch translation jobs.
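As an illustration of optimizations 1) and 3), the following minimal Python sketch flattens a Trie-structured language model into key-value pairs and compares one-by-one lookups with a batched lookup. It is not the authors' implementation: the `ShardedKVStore` class, its `multi_get` method, the nested-dict Trie layout, and the toy scores are all assumptions made for this example, and the "network" is simulated by a round-trip counter.

```python
import zlib
from collections import defaultdict


class ShardedKVStore:
    """Toy stand-in for a distributed in-memory key-value database (hypothetical)."""

    def __init__(self, num_shards=4):
        self.shards = [dict() for _ in range(num_shards)]
        self.round_trips = 0  # counts simulated network requests

    def _shard_of(self, key):
        # crc32 is stable across runs, unlike the built-in hash() for strings
        return zlib.crc32(key.encode("utf-8")) % len(self.shards)

    def put(self, key, value):
        self.shards[self._shard_of(key)][key] = value

    def get(self, key):
        # one simulated request per key
        self.round_trips += 1
        return self.shards[self._shard_of(key)].get(key)

    def multi_get(self, keys):
        # group keys by shard so each shard touched costs one simulated request
        by_shard = defaultdict(list)
        for key in keys:
            by_shard[self._shard_of(key)].append(key)
        result = {}
        for shard_id, shard_keys in by_shard.items():
            self.round_trips += 1
            shard = self.shards[shard_id]
            for key in shard_keys:
                result[key] = shard.get(key)
        return result


def flatten_trie(node, prefix=()):
    """Yield (n-gram string, score) for every scored node of a nested-dict Trie.

    Assumed node layout: {"score": float or None, "children": {word: node}}.
    """
    if prefix and node.get("score") is not None:
        yield " ".join(prefix), node["score"]
    for word, child in node.get("children", {}).items():
        yield from flatten_trie(child, prefix + (word,))


# Toy language model Trie with made-up log-probabilities.
trie = {"children": {
    "the": {"score": -1.0, "children": {
        "cat": {"score": -0.5, "children": {}}}},
    "cat": {"score": -2.0, "children": {}},
}}

store = ShardedKVStore(num_shards=2)
for ngram, score in flatten_trie(trie):
    store.put(ngram, score)

queries = ["the", "the cat", "cat", "dog"]

one_by_one = [store.get(q) for q in queries]
single_trips = store.round_trips

store.round_trips = 0
batched = store.multi_get(queries)
batch_trips = store.round_trips
```

In this sketch the four individual lookups cost four simulated requests, while the batched lookup costs at most one request per shard. That is the effect the batch-query optimization relies on: the decoder collects many n-gram or rule keys before contacting the store, which is also why the cube-pruning step has to be restructured to issue its queries in groups.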
