Abstract:
Blockchain system has been broadly applied and studied in the last few years with query being essential to facilitate a wide range of blockchain application scenarios such as data provenance in the logistics chain. As blockchain data continues to grow, the efficiency and verifiability of data query have become hot issues in blockchain research. The high concurrent blocks of the DAG-based blockchain make it difficult for queries to traverse sequentially like the traditional chain structure. Breadth-first or depth-first query strategies can be adopted according to the DAG structure, but this method has the problems of low efficiency and poor verification. We propose an efficient and verifiable query mechanism, called Lever, for DAG-based blockchains based on the learned index. The mechanism aims to improve the efficiency and verifiability of blockchain query by introducing learned index technology with the data distribution characteristic to optimize the indexing process in the blockchain. Learned index is a novel indexing technology that reduces the storage space and query time of indexes by learning the distribution of data. In this paper, learned index is applied to the mapping between epoch height and timestamp of the DAG-based blockchain, greatly improving query speed and efficiency. At the same time, in order to speed up the data filtering of multiple blocks in an epoch, a Bloom filter is added to the block header, and an aggregated Bloom filter is produced by miner for each epoch. In addition, to guarantee the authentication (i.e., correctness and completeness) of query results, the mechanism introduces a sorted Merkle tree to perform proof with the Merkle tree branch by combining the filtering and sorting processes, effectively reducing the size of the verification object (
VO), thereby enhancing the verifiability of blockchain query. Experimental results show that Lever can effectively improve the efficiency and verifiability of DAG-based blockchain query, and Lever improves query performance by up to 10 times and reduces
VO size overhead by 90% compared with the basic query mechanism of Conflux.