    Citation: Liu Li, Chen Mingyu, Bao Yungang, Xu Jianwei, Fan Jianping. A Stream Checking and Prefetching Algorithm Based on Page Level Stream Buffer Architecture[J]. Journal of Computer Research and Development, 2009, 46(10): 1758-1767.

    A Stream Checking and Prefetching Algorithm Based on Page Level Stream Buffer Architecture

    • Abstract: To improve the access performance of network memory, this paper proposes two algorithms built on a page-level stream buffer and prefetching architecture: VSS (variable stride stream), a variable-stride banded stream detection algorithm, and a stream prefetching optimization based on time stride. Under fixed-stride stream detection, jumps in virtual page addresses during loop accesses break a stream. VSS resolves this jumping problem and eliminates the stream breaks, which effectively raises stream detection coverage. When the time stride is less than the remote delay, a prefetch cannot complete in time. The time-stride optimization dynamically adjusts the prefetch length to address such late prefetches, further improving prefetching performance. Compared with the sequential prefetching algorithm, the VSS prefetching algorithm achieves higher accuracy and lower communication cost. The architecture and the prefetching algorithm are evaluated through a performance model, and the application slowdown caused by remote memory is evaluated with that model using memory access traces such as Linpack and SPEC2000. The results show that, with the help of the cache and prefetching engine, most applications with regular memory access patterns perform on a high-speed network similarly to a full-memory configuration. It is therefore feasible to build a flexible, extended remote memory architecture that breaks the memory capacity restriction for some memory-bound applications at the cost of a small performance decrease, and that allows memory to be extended easily and without a fixed limit.
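    The two ideas in the abstract can be illustrated with a minimal Python sketch. It is not the paper's VSS implementation: the abstract gives no algorithmic details, so the names `BandedStreamDetector`, `prefetch_depth`, `band`, and `max_depth`, the forward-only band rule, and the ceiling formula for the prefetch length are all assumptions made for illustration.

```python
import math


def prefetch_depth(time_stride, remote_delay, max_depth=8):
    """Pages to prefetch ahead so that a request issued now can return in time.

    time_stride: observed time between consecutive accesses of one stream.
    remote_delay: latency of fetching one page from remote memory.
    Both use the same time unit. The formula and the max_depth cap are
    assumptions for illustration, not taken from the paper.
    """
    if time_stride <= 0:
        return max_depth
    return max(1, min(max_depth, math.ceil(remote_delay / time_stride)))


class BandedStreamDetector:
    """Toy page-level stream detector that tolerates variable strides.

    An access extends an existing stream when its page lies within `band`
    pages ahead of that stream's last page. A fixed-stride detector would
    require one exact stride and break the stream on any other jump.
    """

    def __init__(self, band=4):
        self.band = band
        self.streams = []  # each entry is [last_page, length]

    def access(self, page):
        """Record a page access and return the length of the stream it joined."""
        for stream in self.streams:
            delta = page - stream[0]
            if 0 < delta <= self.band:
                stream[0] = page
                stream[1] += 1
                return stream[1]
        # No stream is close enough: start a new one.
        self.streams.append([page, 1])
        return 1
```

    With `band=4`, the page sequence 10, 11, 13, 14, 17 stays a single stream of length 5 even though its stride varies between 1 and 3, while a far jump to page 100 starts a new stream. With a time stride of 10 and a remote delay of 25 in the same unit, `prefetch_depth` asks for 3 pages ahead; once the time stride exceeds the remote delay, a depth of 1 suffices.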

       
