    Citation: Zhang Jun, Dai Feng, Ma Yike, Zhang Yongdong. Multi-Level and Fine-Grained Parallel HEVC Intra Mode Decision Method[J]. Journal of Computer Research and Development, 2016, 53(4): 873-883. DOI: 10.7544/issn1000-1239.2016.20148455

    Multi-Level and Fine-Grained Parallel HEVC Intra Mode Decision Method

    Abstract: The coding mode space of High Efficiency Video Coding (HEVC) is extremely large, so HEVC encoders need a huge amount of computation for mode decision (MD). Parallelizing HEVC encoding on many-core platforms is an effective way to meet this computational demand. Traditional coarse-grained parallelization schemes such as Tiles and wavefront parallel processing (WPP) fail to balance parallelism against coding quality: they either cause too much quality loss or cannot provide a high degree of parallelism. In this paper, the potential parallelism in the HEVC intra MD process is exploited, and a multi-level, fine-grained, highly parallel intra MD method that works within a coding tree unit (CTU) is proposed. Specifically, the intra MD process in a CTU is divided into six types of sub-tasks, and the data dependencies among adjacent blocks that hinder parallel processing are analyzed and removed, including the intra prediction reference pixel dependency, the prediction mode dependency, and the entropy coding dependency. Consequently, the cost computation and mode decision for all fine-grained coding blocks at every level within the same CTU can proceed concurrently. The proposed parallel MD method is implemented on the Tile-Gx36 platform. Experimental results show that, compared with the non-parallel baseline HM (the HEVC reference software), it achieves an overall encoding speedup of more than 18x with acceptable quality loss (a bit-rate increase of about 3%).
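    The idea summarized in the abstract, evaluating every coding block at every quadtree level of a CTU independently and only then choosing the partition, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the cost function `block_cost`, the constant `SPLIT_PENALTY`, the small 16x16 CTU, and all function names are hypothetical stand-ins, and the cost deliberately reads only source pixels so that no block depends on a neighbor's result. A thread pool is used only to show the task structure; in CPython it does not give real multi-core speedup.

```python
from concurrent.futures import ThreadPoolExecutor

CTU_SIZE = 16        # kept small for illustration; HEVC CTUs can be up to 64x64
MIN_CU = 4           # smallest block size considered in this sketch
SPLIT_PENALTY = 1.0  # hypothetical cost of signalling a split


def block_cost(ctu, x, y, size):
    """Hypothetical cost: sum of absolute deviations from the block mean.

    It reads only source pixels, so every block can be evaluated independently.
    """
    pixels = [ctu[y + j][x + i] for j in range(size) for i in range(size)]
    mean = sum(pixels) / len(pixels)
    return sum(abs(p - mean) for p in pixels)


def all_blocks(ctu_size=CTU_SIZE, min_cu=MIN_CU):
    """List (x, y, size) for every block at every quadtree level."""
    blocks = []
    size = ctu_size
    while size >= min_cu:
        for y in range(0, ctu_size, size):
            for x in range(0, ctu_size, size):
                blocks.append((x, y, size))
        size //= 2
    return blocks


def decide_partition(ctu, ctu_size=CTU_SIZE, min_cu=MIN_CU):
    """Compute all block costs as independent tasks, then pick the partition."""
    blocks = all_blocks(ctu_size, min_cu)
    # Phase 1: one independent task per block, across all levels at once.
    with ThreadPoolExecutor() as pool:
        costs = dict(zip(blocks, pool.map(lambda b: block_cost(ctu, *b), blocks)))

    # Phase 2: cheap bottom-up comparison of "keep whole" vs "split in four".
    def best(x, y, size):
        own = costs[(x, y, size)]
        if size == min_cu:
            return own, [(x, y, size)]
        half = size // 2
        split_cost, parts = SPLIT_PENALTY, []
        for dy in (0, half):
            for dx in (0, half):
                c, p = best(x + dx, y + dy, half)
                split_cost += c
                parts += p
        if own <= split_cost:
            return own, [(x, y, size)]
        return split_cost, parts

    return best(0, 0, ctu_size)
```

    In this sketch the expensive work (phase 1) has no ordering constraints, while the sequential part (phase 2) only compares precomputed numbers. A real encoder would replace `block_cost` with rate-distortion costs over the intra prediction modes, which is where the dependencies named in the abstract arise.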

       
