Nested Loop Join Optimization Based on Radix-Join in Chip Multi-Processor

Deng Yadan, Jing Ning, and Xiong Wei


Abstract: Targeting current mainstream chip multi-processors (CMPs), this paper studies the optimization of the database non-indexed nested loop join (NINLJ) on CMPs with a shared cache. The authors first present a multithreaded NINLJ execution framework based on the Radix-NL-Join algorithm. They then optimize the cluster partition threads and cluster join threads of this framework in two respects: reducing shared-cache access conflicts and improving the cache hit ratio. The main contributions are as follows: 1. Because multiple threads accessing the shared cache simultaneously are prone to access conflicts, the start times of the cluster partition threads are optimized to reduce shared-cache conflicts in the cluster partition phase; 2. In the cluster join phase, the cluster join threads exhibit poor cache behavior. To remove this bottleneck, the framework exploits the fact that cluster join threads access clusters sequentially, and adds a preload thread that loads each cluster from main memory into the L2 cache before the cluster join threads need it; 3. In the experiments, the framework is implemented in the open-source database EaseDB, and the performance of multithreaded NINLJ is measured. The results show that the proposed multithreaded NINLJ execution framework can fully utilize the computing resources of the CMP and effectively resolves shared-cache access conflicts under multithreading, substantially improving NINLJ performance: it outperforms the multithreaded Radix-NL-Join algorithm without cache optimization by about 26% on average.
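The Radix-NL-Join idea that the framework builds on can be sketched as follows. This is a minimal single-process Python illustration under stated assumptions, not the authors' EaseDB implementation. It assumes an equi-join on integer keys, relations given as lists of `(key, payload)` tuples, and uniformly distributed keys. The names `choose_bits`, `radix_cluster`, `nested_loop_join`, `radix_nl_join` and the `depth` parameter are hypothetical. The optional helper thread only mirrors the control flow of a preload thread (sequential cluster order, bounded lead). Python cannot place data in the L2 cache, and the GIL prevents real parallel speedup, so no performance effect should be expected from it.

```python
import queue
import threading


def choose_bits(r_bytes, s_bytes, cache_bytes):
    """Smallest B such that one cluster pair, (r_bytes + s_bytes) / 2**B bytes
    on average, fits in cache_bytes (assumes uniformly distributed keys)."""
    bits = 0
    while r_bytes + s_bytes > cache_bytes * (1 << bits):
        bits += 1
    return bits


def radix_cluster(rel, bits):
    """Single-pass radix partition of (key, payload) tuples into 2**bits
    clusters on the low `bits` bits of the integer join key."""
    mask = (1 << bits) - 1
    clusters = [[] for _ in range(1 << bits)]
    for tup in rel:
        clusters[tup[0] & mask].append(tup)
    return clusters


def nested_loop_join(r_cluster, s_cluster):
    """Plain nested loop equi-join of one R cluster with one S cluster."""
    return [(rk, rv, sv)
            for rk, rv in r_cluster
            for sk, sv in s_cluster
            if rk == sk]


def radix_nl_join(r, s, bits, preload=False, depth=2):
    """Radix-NL-Join sketch: matching keys share their low bits, so cluster i
    of R only needs to be joined with cluster i of S."""
    r_clusters = radix_cluster(r, bits)
    s_clusters = radix_cluster(s, bits)
    n = 1 << bits
    result = []
    if not preload:
        for i in range(n):
            result.extend(nested_loop_join(r_clusters[i], s_clusters[i]))
        return result

    # Helper thread standing in for a preload thread: clusters are consumed
    # in order 0..n-1, so it walks the same order and stays at most `depth`
    # cluster pairs ahead of the join loop (the queue blocks when full).
    handoff = queue.Queue(maxsize=depth)

    def preloader():
        for i in range(n):
            # Copying walks the cluster once; in native code this is where
            # the cluster would be prefetched into L2 (Python cannot do that).
            handoff.put((tuple(r_clusters[i]), tuple(s_clusters[i])))
        handoff.put(None)  # sentinel: all cluster pairs delivered

    worker = threading.Thread(target=preloader, daemon=True)
    worker.start()
    while (pair := handoff.get()) is not None:
        result.extend(nested_loop_join(*pair))
    worker.join()
    return result
```

For example, `radix_nl_join(r, s, bits=choose_bits(r_bytes, s_bytes, l2_bytes))` replaces one $|R| \times |S|$ nested loop with $2^B$ much smaller loops whose operands are sized to fit in cache. Result order differs from a plain nested loop join, so compare results as multisets.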
