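• China Excellent Science and Technology Journal
  • CCF-Recommended Class A Chinese Journal
  • T1-Class High-Quality Science and Technology Journal in the Computing Field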
Dandan, Li Zusong, Wang Jian, Zhang Longbing, Hu Weiwu, Liu Zhiyong. Adaptive Stack Cache with Fast Address Generation[J]. Journal of Computer Research and Development, 2007, 44(1): 169-176.

Adaptive Stack Cache with Fast Address Generation

  • Published Date: January 14, 2007
  • As the processor-memory performance gap continues to grow, memory access has become the major bottleneck to performance improvement in modern microprocessors. Based on an investigation of the memory access behavior of programs, an adaptive stack cache with a fast address generation policy is proposed. The design decouples stack references from other data references, which improves instruction-level parallelism, reduces data cache pollution, and lowers the data cache miss rate. The proposed fast address generation scheme reduces stack access latency, and the design also avoids unnecessary memory traffic. The stack cache can be disabled adaptively when it overflows, and it can be applied to multithreaded execution by adding a thread identifier. Results on the SPEC CPU2000 benchmarks indicate that about 25.8% of all memory reference instructions are executed in parallel when the adaptive stack cache with fast address generation is adopted. Data cache misses are reduced by 9.4% on average, and performance improves significantly, with an average IPC speedup of 6.9%.
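
The routing idea in the abstract (stack references go to a separate stack cache, which is switched off when the stack outgrows it) can be illustrated with a toy model. This is only a sketch under stated assumptions and not the paper's design: the class name `StackCacheSketch`, the `stack_base` and `capacity_bytes` parameters, the downward-growing stack, and the rule for re-enabling the cache are all hypothetical. Fast address generation and thread identifiers are not modeled, because the abstract gives no detail on them.

```python
from dataclasses import dataclass


@dataclass
class StackCacheSketch:
    """Toy model of routing references between a stack cache and the data cache."""

    stack_base: int       # highest stack address; the stack is assumed to grow downward
    capacity_bytes: int   # hypothetical stack cache capacity
    enabled: bool = True
    stack_cache_refs: int = 0
    data_cache_refs: int = 0

    def access(self, addr: int, sp: int) -> str:
        """Route one memory reference and report which structure serves it."""
        depth = self.stack_base - sp
        # Assumed adaptive policy: disable the stack cache while the live stack
        # region no longer fits, and enable it again once it fits.
        self.enabled = depth <= self.capacity_bytes
        is_stack_ref = sp <= addr < self.stack_base
        if self.enabled and is_stack_ref:
            self.stack_cache_refs += 1
            return "stack_cache"
        # Everything else, including stack references made while the stack
        # cache is disabled, falls back to the ordinary data cache.
        self.data_cache_refs += 1
        return "data_cache"


if __name__ == "__main__":
    sc = StackCacheSketch(stack_base=0x8000, capacity_bytes=256)
    print(sc.access(0x7FF0, sp=0x7FE0))  # shallow stack, address inside the stack region
    print(sc.access(0x1000, sp=0x7FE0))  # address outside the stack region
    print(sc.access(0x7FF0, sp=0x7000))  # stack deeper than the cache capacity
```

Keeping the two counters separate mirrors the decoupling described in the abstract: references served by the stack cache never touch the data cache, so they cannot pollute it.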
