ISSN 1000-1239 CN 11-1777/TP

计算机研究与发展 ›› 2017, Vol. 54 ›› Issue (11): 2508-2515.doi: 10.7544/issn1000-1239.2017.20160795

• 系统结构 • 上一篇    下一篇

DCuckoo:基于片内摘要的高性能散列表

蒋捷1,杨仝1,2,张梦瑜1,代亚非1,黄亮3,郑廉清4   

  1. 1(北京大学信息科学技术学院 北京 100871); 2(国防科学技术大学高性能计算协同创新中心 长沙 410073); 3(94782部队65分队 杭州 310021); 4(西京学院控制工程系 西安 710123) (jiangjiecs@pku.edu.cn)
  • 出版日期: 2017-11-01
  • 基金资助: 
    国家自然科学基金项目(61232004, 61672061);国家"九七三"重点基础研究发展计划基金项目(2014CB340400);国家重点研发计划项目(2016YFB1000300);中国科学院先导科技专项课题(XDA06040602)

DCuckoo: An Efficient Hash Table with On-Chip Summary

Jiang Jie1, Yang Tong1,2, Zhang Mengyu1, Dai Yafei1, Huang Liang3, Zheng Lianqing4   

  1. 1(School of Electronics Engineering and Computer Science, Peking University, Beijing 100871); 2(Collaborative Innovation Center of High Performance Computing, National University of Defense Technology, Changsha 410073); 3(Military 94782/65, Hangzhou 310021); 4(Department of Control Engineering, Xijing University, Xi'an 710123)
  • Online: 2017-11-01

摘要: 散列表(Hash table)由于其支持高效的记录更新与检索操作,在计算机相关的各个领域中有着广泛的应用.但散列表有2个明显的缺点:冲突和低效的内存利用.最小完美散列使用N个位置存储N条记录,解决了冲突和空间效率的问题,但该算法不支持增量的更新.目标是设计一种高效的散列表,能够支持高速查询、最坏情况可以保证的高速更新、高效的空间使用以及动态的容量改变.结合 Cuckoo 散列和 d-left 散列的实现,提出了一个新的散列表设计方案——DCuckoo.DCuckoo 使用多级子表并应用了 Cuckoo 散列中移动已有元素的机制以提高装载率,且只保留了最末级子表的指针以减少空间浪费.为了进一步优化查询性能,DCuckoo 在片内内存中使用指纹和位图作为摘要,在查询时先匹配指纹,以减少对片外内存的访问次数.对 DCuckoo 进行了一系列实验,与其他5种散列表进行比较,发现 DCuckoo 达到了设计目标,并且在各项指标上均好于已有的散列表设计.

关键词: 散列表, 检索, 指纹, 片内内存, 键值存储

Abstract: Hash tables are extensively used in many computer-related areas because of their efficiency in query and insertion operations. However, Hash tables have two disadvantages: collisions and memory inefficiency. To solve these two disadvantages, minimal perfect Hash table uses N locations to store N incoming elements. However, MPHT doesn't support incremental updates. Therefore, in this paper, combining Cuckoo hashing and d-left hashing, we propose a novel Hash table architecture called DCuckoo, which ensures fast query speed, fast update speed in worst cases, efficient utilization of memory and dynamic capacity change. In DCuckoo, multiple sub-tables and Cuckoo hashing's mechanism of transferring existing elements are used to improve the load factor. Pointers except for ones in the last sub-table are eliminated for less wasted space. Also, in order to optimize the query performance, fingerprints and bitmaps are used as a summary in on-chip memory to reduce off-chip memory accesses. The bucket will be probed only if the corresponding fingerprint is matched in on-chip memory. We conduct a series of experiments to compare the performance of DCuckoo and other five Hash table schemas. Results demonstrate that DCuckoo eliminates shortcomings of both Cuckoo hashing and d-left hashing, hence DCuckoo achieves the four design goals.

Key words: Hash table, lookup, fingerprint, on-chip memory, key-value store

中图分类号: