ISSN 1000-1239 CN 11-1777/TP

计算机研究与发展 ›› 2020, Vol. 57 ›› Issue (12): 2703-2716. doi: 10.7544/issn1000-1239.2020.20190686

• 系统结构 •

一种基于HashGraph的NoSQL型分布式存储因果一致性模型

田俊峰,王彦骉   

  1. (河北大学网络空间安全与计算机学院 河北保定 071002) (河北省高可信信息系统重点实验室(河北大学) 河北保定 071002) (tjf@hbu.edu.cn)
  • 出版日期: 2020-12-01
  • 基金资助: 
    国家自然科学基金青年科学基金项目(61802106)

Causal-Pdh: Causal Consistency Model for NoSQL Distributed Data Storage Using HashGraph

Tian Junfeng, Wang Yanbiao   

  1. (School of Cyber Security and Computer, Hebei University, Baoding, Hebei 071002) (Key Laboratory on High Trusted Information System in Hebei Province (Hebei University), Baoding, Hebei 071002)
  • Online: 2020-12-01
  • Supported by: 
    This work was supported by the National Natural Science Foundation of China for Young Scientists (61802106).

摘要: 分布式环境中的数据因果一致性指的是对具有因果依赖性的数据进行更新时,须同步更新其他分布式副本中的依赖性元数据,同时满足较高的可用性和性能需求.为解决现有成果中更新可见延迟较高的问题,在数据中心稳定向量的基础上,结合混合逻辑时钟和HashGraph原理,提出了Causal-Pdh模型.使用部分向量和校验值作为消息签名代替了所有向量,并且借鉴HashGraph的原理,改进了各个数据中心同步最新条目的过程,各个父节点随机与其他父节点同步最新状态,从而降低了虚拟投票所使用的时间.最后通过实验验证了Causal-Pdh模型不仅没有影响客户端的吞吐量,而且在时钟偏移较严重时降低了20.85%的用户PUT等待延迟,在系统中存在查询放大的情况时,PUT响应时间降低了23.27%.

关键词: 数据一致性, 因果一致性, 分布式存储, Hash图, 混合逻辑时钟

Abstract: Causal consistency in a distributed environment means that when data with causal dependencies is updated, the dependency metadata held by the other distributed replicas must be updated as well, while still meeting high availability and performance requirements. To address the high user PUT latency and update visibility latency of existing approaches, we propose the Causal-Pdh model, which builds on data center stable vectors and combines hybrid logical clocks with the principles of HashGraph. To reduce the communication overhead of exchanging data between replicas, the partial stable vectors required for synchronization, together with a hash value, are used as the message signature in place of the full data center stable vectors. The virtual voting principle of HashGraph is used to improve the way each data center synchronizes its latest entries: as in gossip about gossip, each parent node randomly exchanges its latest state with other parent nodes and updates its clock regularly. This process reduces the time spent on virtual voting between replicas. Experiments show that the Causal-Pdh model does not degrade client query throughput, and that it reduces the wait latency of user PUT operations by 20.85% when clock skew is severe. When query amplification is present in the system, the response time of requests is reduced by 23.37%.
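For readers unfamiliar with hybrid logical clocks, the following is a minimal sketch of the standard hybrid logical clock update rules, which pair a physical-time component with a logical counter so that timestamps stay ordered even when physical clocks are skewed. It is not the Causal-Pdh implementation: the class name `HybridLogicalClock`, the method names `tick` and `merge`, and the injected `physical_now` function are assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import Callable, Tuple

# A timestamp is (l, c): l tracks the largest physical time seen so far,
# c is a logical counter that breaks ties when l does not advance.
Timestamp = Tuple[int, int]


@dataclass
class HybridLogicalClock:
    """Illustrative hybrid logical clock (hypothetical names, not from the paper)."""

    physical_now: Callable[[], int]  # assumed source of local physical time
    l: int = 0
    c: int = 0

    def tick(self) -> Timestamp:
        """Timestamp a local or send event."""
        pt = self.physical_now()
        if pt > self.l:
            # Physical time moved ahead: adopt it and reset the counter.
            self.l, self.c = pt, 0
        else:
            # Physical time is behind or equal: advance the counter only.
            self.c += 1
        return (self.l, self.c)

    def merge(self, remote: Timestamp) -> Timestamp:
        """Timestamp a receive event carrying a remote timestamp."""
        remote_l, remote_c = remote
        pt = self.physical_now()
        new_l = max(self.l, remote_l, pt)
        if new_l == self.l and new_l == remote_l:
            self.c = max(self.c, remote_c) + 1
        elif new_l == self.l:
            self.c += 1
        elif new_l == remote_l:
            self.c = remote_c + 1
        else:
            # Local physical time is strictly the largest: reset the counter.
            self.c = 0
        self.l = new_l
        return (self.l, self.c)
```

In this sketch, a replica whose physical clock lags behind a received timestamp adopts the remote `l` value and continues counting from the remote counter, so later local events are still ordered after the received update without waiting for the physical clock to catch up.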

Key words: data consistency, causal consistency, distributed storage, HashGraph, hybrid logical clocks
