ISSN 1000-1239 CN 11-1777/TP

Table of Content

01 September 2018, Volume 55 Issue 9
A Survey on Unsupervised Image Retrieval Using Deep Features
Zhang Hao,Wu Jianxin
2018, 55(9):  1829-1842.  doi:10.7544/issn1000-1239.2018.20180058
Asbtract ( 2687 )   HTML ( 37)   PDF (2841KB) ( 1406 )  
Related Articles | Metrics
Content-based image retrieval (CBIR) is a challenging task in computer vision. Its goal is to find images among the database images which contain the same instance as the query image. A typical image retrieval approach contains two steps: extract a proper representation vector from each raw image, and then retrieve via nearest neighbor search on those representations. The quality of the image representation vector extracted from raw image is the key factor to determine the overall performance of an image retrieval approach. Image retrieval have witnessed two developing stages, namely hand-craft feature based approaches and deep feature based approaches. Furthermore, there are two phases in each stage, i.e., one phase of using global feature and another phase of using local feature based approaches. Due to the limited representation power of hand-craft features, nowadays, the research focus of image retrieval has shifted to how to make the full utility of deep features. In this study, we give a brief review of the development progress of unsupervised image retrieval based on different ways to extract image representations. Several representative unsupervised image retrieval approaches are then introduced and compared on benchmark image retrieval datasets. At last, we discuss a few future research perspectives.
Visual Analysis for Anomaly Detection in Time-Series: A Survey
Han Dongming,Guo Fangzhou,Pan Jiacheng,Zheng Wenting,Chen Wei
2018, 55(9):  1843-1852.  doi:10.7544/issn1000-1239.2018.20180126
Asbtract ( 2257 )   HTML ( 67)   PDF (2688KB) ( 1067 )  
Related Articles | Metrics
Anomaly detection for time-series denotes the detection and analysis of abnormal and unusual patterns, trends and features. Automatic methods sometimes fail to detect anomalies that are subtle, fuzzy or uncertain, while visual analysis can overcome this challenge by integrating the capability of human users and data mining approaches through visual representations of the data and visual interface. In this paper, we identify the challenges of anomaly detection, and describe the existing works of visual analysis along two categories: types of anomalies (attributes, topologies and hybrids), and anomaly detection means (direct projection, clustering and machine learning). We highlight future research directions.
Blockchain Data Analysis: A Review of Status, Trends and Challenges
Chen Weili,Zheng Zibin
2018, 55(9):  1853-1870.  doi:10.7544/issn1000-1239.2018.20180127
Asbtract ( 6597 )   HTML ( 246)   PDF (3117KB) ( 3838 )  
Related Articles | Metrics
Blockchain technology is a new emerging technology that has the potential to revolutionize many traditional industries. Since the creation of Bitcoin, which represents blockchain 1.0, blockchain technology has been attracting extensive attention and a great amount of user transaction data has been accumulated. Furthermore, the birth of Ethereum, which represents blockchain 2.0, further enriches data type in blockchain. While the popularity of blockchain technology bringing about a lot of technical innovation, it also leads to many new problems, such as user privacy disclosure and illegal financial activities. However, the public accessible of blockchain data provides unprecedented opportunity for researchers to understand and resolve these problems through blockchain data analysis. Thus, it is of great significance to summarize the existing research problems, the results obtained, the possible research trends, and the challenges faced in blockchain data analysis. To this end, a comprehensive review and summary of the progress of blockchain data analysis is presented. The review begins by introducing the architecture and key techniques of blockchain technology and providing the main data types in blockchain with the corresponding analysis methods. Then, the current research progress in blockchain data analysis is summarized in seven research problems, which includes entity recognition, privacy disclosure risk analysis, network portrait, network visualization, market effect analysis, transaction pattern recognition, illegal behavior detection and analysis. Finally, the directions, prospects and challenges for future research are explored based on the shortcomings of current research.
Deep Neural Network Compression and Acceleration: A Review
Ji Rongrong,Lin Shaohui,Chao Fei,Wu Yongjian,Huang Feiyue
2018, 55(9):  1871-1888.  doi:10.7544/issn1000-1239.2018.20180129
Asbtract ( 3223 )   HTML ( 68)   PDF (4080KB) ( 1990 )  
Related Articles | Metrics
In recent years, deep neural networks (DNNs) have achieved remarkable success in many artificial intelligence (AI) applications, including computer vision, speech recognition and natural language processing. However, such DNNs have been accompanied by significant increase in computational costs and storage services, which prohibits the usages of DNNs on resource-limited environments such as mobile or embedded devices. To this end, the studies of DNN compression and acceleration have recently become more emerging. In this paper, we provide a review on the existing representative DNN compression and acceleration methods, including parameter pruning, parameter sharing, low-rank decomposition, compact filter designed, and knowledge distillation. Specifically, this paper provides an overview of DNNs, describes the details of different DNN compression and acceleration methods, and highlights the properties, advantages and drawbacks. Furthermore, we summarize the evaluation criteria and datasets widely used in DNN compression and acceleration, and also discuss the performance of the representative methods. In the end, we discuss how to choose different compression and acceleration methods to meet the needs of different tasks, and envision future directions on this topic.
Survey of Query Processing and Mining Techniques over Large Temporal Graph Database
Wang Yishu,Yuan Ye,Liu Meng,Wang Guoren
2018, 55(9):  1889-1902.  doi:10.7544/issn1000-1239.2018.20180132
Asbtract ( 1723 )   HTML ( 20)   PDF (2466KB) ( 727 )  
Related Articles | Metrics
A temporal graph, as a graph structure with time dimension, plays a more and more important role in query processing and mining of graph data. Different with the traditional static graph, structure of the temporal graph changes with the time series, that is to say the edge of temporal graph is activated by time. And each edge of the temporal graph has the label of recording time, which makes the temporal graph contain more information than the static graph, so the existing data query processing methods cannot be used in the temporal graph. Therefore how to solve the problem of query processing and mining on the temporal graph has attracted much attention of researchers. This paper summarizes the existing query processing and mining methods on temporal graphs. Firstly, this paper gives the application background and basic definition of temporal graph, and combs the existing three typical models which are used to model temporal graph in the existing works. Secondly, this paper introduces and analyzes the existing work on temporal graph from three aspects: graph query processing method, graph mining method and temporal graph management system. Finally, the possible research directions on temporal graph are prospected to provide reference for related research.
A Survey on Scholar Profiling Techniques in the Open Internet
Yuan Sha, Tang Jie, Gu Xiaotao
2018, 55(9):  1903-1919.  doi:10.7544/issn1000-1239.2018.20180139
Asbtract ( 2212 )   HTML ( 33)   PDF (3315KB) ( 1066 )  
Related Articles | Metrics
Scholar profiling from the open Internet has become a hot research topic in recent years. Its goal is to extract the attribute information of a scholar. Scholar profiling is a fundamental issue in large-scale expert databases for finding experts, evaluating academic influence, and so on. In the open Internet, scholar profiling faces new challenges, such as large amount of data, data noise and data redundancy. The traditional user profiling methods and algorithms cannot be directly used in the user profiling system in the open Internet environment. In this paper, the existing technologies are summarized and classified to provide reference for further research. Firstly, we analyze the problem of scholar profiling, and give a general overview of the information extraction method, which is the basic theory of user profiling. Then, the three basic tasks of scholar profiling including scholar information annotation, research interest mining and academic impact prediction are introduced in detail. What’s more, the successful application system of scholar profiling called AMiner is introduced. Finally, open research issues are discussed and possible future research directions are prospected.
Recent Advances in Datacenter Flow Scheduling
Hu Zhiyao, Li Dongsheng, Li Ziyang
2018, 55(9):  1920-1930.  doi:10.7544/issn1000-1239.2018.20180156
Asbtract ( 1681 )   HTML ( 16)   PDF (2375KB) ( 800 )  
Related Articles | Metrics
Flow scheduling techniques impose an important impact on the performance of the data center. Flow scheduling techniques aim at optimizing the user experience by controlling and scheduling the transmission link, priority and transmission rate of data flows. Flow scheduling techniques can achieve various optimization objects such as reducing the average or weighted flow completion time, decreasing the delay of long-tail flows, optimizing the transmission of flows with deadline constraints, improving the utilization of the network link. In this paper, we mainly review the recent research involving flow scheduling techniques. First, we briefly introduce data center and flow scheduling problem and challenges. These challenges mainly lie in the means to implement flow scheduling on network devices or terminal hosts, and how to design low-overhead highly-efficient scheduling algorithms. Especially, the coflow scheduling problem is proved NP-Hard to solve. Then, we review the latest progress of flow scheduling techniques from two aspects, i.e., single-flow scheduling and coflow scheduling. The divergence between single-flow scheduling techniques and coflow scheduling techniques is the flow relationship under different applications like Web search and big data analytics. In the end of the paper, we outlook the future development direction and point out some unsolved problems involving flow scheduling.
Recent Progress in Low Differential Uniformity Functions over Finite Fields
Qu Longjiang, Chen Xi, Niu Tailin, Li Chao
2018, 55(9):  1931-1945.  doi:10.7544/issn1000-1239.2018.20180159
Asbtract ( 1163 )   HTML ( 5)   PDF (1305KB) ( 369 )  
Related Articles | Metrics
To prevent differential attack on the cipher, cryptographic functions are required to have low differential uniformity. Perfect nonlinear (PN) functions, almost perfect nonlinear (APN) functions and differentially 4-uniform permutations are the most important cryptographic functions with low differential uniformity. Here we survey the recent main research results about cryptographic functions with low differential uniformity such as PN functions, APN functions and differentially 4-uniform permutations. First, we recall the connections between PN functions and the mathematical objects such as the semifield, which survey the known constructions of PN functions and the pseudo-planar functions. Second, the properties and judgement of APN functions are analyzed. We also list the known constructions of APN functions and recall the inequivalent results between them. Third, we summarize the known results on the constructions of differentially 4-uniform permutations and discuss their equivalence. Then, we recall the applications of low differential uniformity functions in the design of actual ciphers. Lastly, we propose some research problems on cryptographic functions with low differential uniformity.
Research on Visual Question Answering Techniques
Yu Jun, Wang Liang, Yu Zhou
2018, 55(9):  1946-1958.  doi:10.7544/issn1000-1239.2018.20180168
Asbtract ( 2667 )   HTML ( 47)   PDF (1926KB) ( 1061 )  
Related Articles | Metrics
With the significant advances of deep learning in computer vision and natural language processing, the existing methods are able to accurately understand the semantics of visual contents and natural languages, and carry out research on cross-media data representation and interaction. In recent years, visual question answering (VQA) has become a hot spot in cross-media expression and interaction area. The target of VQA is to learn a model to understand the visual content referred by a natural language question, and answer it automatically. This paper summarizes the research progresses in recent years on VQA from the aspects of concepts, models and datasets, and discusses the shortcomings of the current works. Finally, the possible future directions of VQA are discussed on methodology, applications and platforms.
A Two-Stage Community Detection Algorithm Based on Label Propagation
Zheng Wenping, Che Chenhao, Qian Yuhua, Wang Jie
2018, 55(9):  1959-1971.  doi:10.7544/issn1000-1239.2018.20180277
Asbtract ( 1364 )   HTML ( 12)   PDF (4361KB) ( 620 )  
Related Articles | Metrics
Due to the random process in node selection and label propagation, the stability of the results of traditional LPA is poor. A two-stage community detection algorithm is proposed based on label propagation, abbreviated as LPA-TS. In the first step of LPA-TS, the labels of nodes are updated according to their participation coefficients in non-decreasing order, and the node label is determined according to the similarity of nodes in the process of node label updating. Some clusters found by Step1 might not satisfy the weak community condition. If a cluster is not a weak community, in the beginning of Step2, we will merge it with the cluster that has most connections with it. Next, we treat each community as a node, and the number of edges between two communities as their edge weights between corresponding nodes. We compute the participation coefficients of each node of the resulted network, and use similar process to get the final results of the communities. The proposed algorithm LPA-TS reduces the randomness in the process of node selection and label propagation; hence, we might obtain stable communities by LPA-TS. In addition, LPA-TS combines small scale communities with adjacent communities in the second phase of the algorithm to improve the quality of community detection. Compared with other classical community detection algorithms on some real networks and artificial networks, the proposed algorithm shows preferable performance on stability, NMI, ARI and modularity.
An Approach for Storytelling by Correlating Events from Social Networks
Li Yingying, Ma Shuai, Jiang Haoyi, Liu Zhe, Hu Chunming, Li Xiong
2018, 55(9):  1972-1986.  doi:10.7544/issn1000-1239.2018.20180155
Asbtract ( 1249 )   HTML ( 19)   PDF (4887KB) ( 436 )  
Related Articles | Metrics
Social networks, such as Twitter and Sina weibo, have become popular platforms to report the public event. They provide valuable data for us to monitor events and their evolution. However, informal words and fragmented texts make it challenging to extract descriptive information. Monitoring the event progression from fast accumulation of microblogs is also difficult. To this end, we monitor the event progression with a common topic from the social network. This can help us to gain an overview and a detailed documentation of the events. In this paper, we use three consecutive components to meet this end. First, we use a structure based approach to detect events from the microblog dataset. Second, we cluster the events by their topics based on their latent semantic information, and define each cluster as a story. Third, we use a graph based approach to generate a storyline for each story. The storyline is denoted by a directed acyclic graph (DAG) with a summary to express the progression of events in the story. The user experience evaluation indicates that this method can help us to monitor events and their progression by achieving improved accuracy and comprehension compared with the state of art methods.
Exploration on Neural Information Retrieval Framework
Guo Jiafeng, Fan Yixing
2018, 55(9):  1987-1999.  doi:10.7544/issn1000-1239.2018.20180133
Asbtract ( 1882 )   HTML ( 23)   PDF (3042KB) ( 953 )  
Related Articles | Metrics
After decades of research, information retrieval technology has been significantly advanced and widely applied in our daily life. However, there is still a huge gap between modern search engines and true intelligent information accessing systems. In our opinion, an intelligent information accessing system should be able to crawl, read and understand the content of the big Web data, index and search the key semantic information, and reason, decide and generate the right results based on users’ information need. To develop such kind of systems, we need theoretical breakthrough on the search architecture and models. In recent years, to address the intelligent information accessing problem, we have conducted systematical research on neural information retrieval framework. We have achieved a few of original contributions on text representation, data indexing and relevance matching. However, there is still a long way in this direction and we will continue our exploration on neural information retrieval in the future.
Survey on Approximate Storage Techniques
Wu Yu, Yang Juan, Liu Renping, Ren Jinting, Chen Xianzhang, Shi Liang, Liu Duo
2018, 55(9):  2002-2015.  doi:10.7544/issn1000-1239.2018.20180295
Asbtract ( 1452 )   HTML ( 22)   PDF (2788KB) ( 772 )  
Related Articles | Metrics
With the rapid development of cloud computing and Internet of things, how to store the explosively growing data becomes a challenge for storage systems. In tackling this challenge, approximate storage technology draws broad attention for its huge potential in saving the cost of storage and improving the system performance. Approximate storage techniques trade off the accuracy of the outputs for performance or energy efficiency taking advantages of the intrinsic tolerance to inaccuracies of many common applications. In this way, the applications improve their performance or energy efficiency while meeting the user requirements. Therefore, how to exploit the features of storages and fault-tolerant applications to improve data access performance, decrease space overhead, and reduce energy consumption is becoming a key problem for storage systems. In this paper, we first introduce the definition of approximate storage technology and show the techniques for identifying the approximate areas in the data. Then, we elaborate the approximate storage techniques for CPU cache, main memory, and secondary storage, respectively. We discuss the advantages and disadvantages of these approximate storage techniques along with the corresponding application scenarios. In the end of this paper, we summarize the features of approximate storage techniques and discuss the research directions of approximate storage techniques.
Survey on High Density Magnetic Recording Technology
Wang Guohua, David Hung-Chang Du, Wu Fenggang, Liu Shiyong
2018, 55(9):  2016-2028.  doi:10.7544/issn1000-1239.2018.20180264
Asbtract ( 1229 )   HTML ( 13)   PDF (2383KB) ( 713 )  
Related Articles | Metrics
In the era of big data, the demand for large-capacity disks has been growing. With minimal technology changes to the existing disk head and storage media of hard disks, the shingled magnetic recording (SMR) technology is the best choice to increase the disk storage capacity. The interlaced magnetic recording (IMR) technology is a newly developed technology in recent years, which can achieve higher storage density and random write performance than SMR. In this paper, we first introduce the shingled track layout of SMR drive and the resulting write amplification problem. We also review the data management methods that mitigate write amplification problem, the evaluation of performance characterizations, and the research on SMR-based upper applications. Then we introduce the interlaced track layout of IMR drive and its data write amplification problem. We also analyze the future research topics of IMR drive. Finally, we compare SMR drive and IMR drive from the storage density, random write performance, and other aspects. A variety of SMR-based upper applications, like file system, database, and RAID, prove that SMR drive can be effectively used to replace conventional disks to build large-scale storage systems. The advantages of IMR drive over SMR drive will make it have a bright future.
A Tiny-Log Based Persistent Transactional Memory System
Chen Juan, Hu Qingda, Chen Youmin, Lu Youyou, Shu Jiwu, Yang Xiaohui
2018, 55(9):  2029-2037.  doi:10.7544/issn1000-1239.2018.20180294
Asbtract ( 1043 )   HTML ( 6)   PDF (2156KB) ( 331 )  
Related Articles | Metrics
In recent years,in order to exploit the performance advantage of persistent memory, researchers have designed various lightweight persistent transactional memory systems. Atomicity and consistency of transactions are mostly ensuredusing the logging mechanism. However,compared with conventional memory,memory cells of persistent memory tend to have higher write latency and limited endurance. This paper observes two problems in the existing persistent transactional memory systems: Firstly,existing systems do not distinguish between different types of write operations in the transaction. No matter whether the writes are updating existing data in memory or adding data to newly allocated memory, existing systems use the same logging mechanism to ensure consistency. Secondly,existing systems persist the address and data of every write operation to the log, even if most of them can be compressed to reduce the size. Both of the above problems lead to redundant log operations,resulting in extra write latency and write wearing. In order to solve the above problems,this paper designs and implements TLPTM,a tiny-log persistent transactional memory system. It is based on two optimization techniques: (1)AALO (allocation-aware log optimization), effectively avoids the logging overhead generated by the operations of adding data to newly allocated memory; (2)CBLO (compression-based log optimization), compresses the log before writing it to the NVM and reduces the overhead of logwriting. The experimental results show that compared with Mnemosyne, AALO improves the system performance by 15%~24%, and TLPTM using both optimizations reduces the write wearing of logging by 70%~81%.
A Log-Structured Key-Value Store Based on Non-Volatile Memory
You Litong, Wang Zhenjie, Huang Linpeng
2018, 55(9):  2038-2049.  doi:10.7544/issn1000-1239.2018.20180258
Asbtract ( 1322 )   HTML ( 25)   PDF (2337KB) ( 614 )  
Related Articles | Metrics
Non-volatile memory (NVM) technologies are promising that would change the future of storage. NVM possesses many attractive capabilities such as byte addressability, low access latency, and persistence. It provides a great opportunity for the integration of DRAM and NVM in a unified main storage space. NVM could access data through the memory bus and CPU related instructions, which makes it possible to design a fast and persistent storage system in non-volatile memory. Existing key-value stores proposed for block devices implement NVM as block devices, which conceal the performance that NVM provides. A few existing key-value stores for NVM fail to provide consistency when hardware supports (e.g., cache flush) on power failures are unavailable. In this paper, we present a non-volatile memory key-value storage system, named TinyKV, which utilizes the log structure as its core framework. We propose a static concurrent, cache-friendly Hash table implementation using the characteristics of the key-value workloads. TinyKV separates the maintenance for data log of each worker thread in order to guarantee high concurrency. In addition, we implement the log structure technology for memory management and design a multi-tier memory allocator to ensure consistency. To reduce write latency, we reduce writes to NVM and cache flushing instructions by using cache flushing instructions. Our experiments demonstrate that TinyKV outperforms traditional key-value stores in both throughput and scalability.
Largepages Supported Hierarchical DRAMNVM Hybrid Memory Systems
Chen Ji, Liu Haikun, Wang Xiaoyuan, Zhang Yu, Liao Xiaofei, Jin Hai
2018, 55(9):  2050-2065.  doi:10.7544/issn1000-1239.2018.20180269
Asbtract ( 1356 )   HTML ( 11)   PDF (5092KB) ( 564 )  
Related Articles | Metrics
Hybrid memory systems composed of non-volatile memory (NVM) and DRAM can offer large memory capacity and DRAM-like performance. However, with the increasing memory capacity and application footprints, the address translation overhead becomes another system performance bottleneck due to lower translation lookaside buffer (TLB) converge. Large pages can significantly improve the TLB converge, however, they impede fine-grained page migration in hybrid memory systems. In this paper, we propose a hierarchical hybrid memory system that supports both large pages and fine-grained page caching. We manage NVM and DRAM with large pages and small pages, respectively. The DRAM is used as a cache to NVM by using a direct mapping mechanism. We propose a cache filtering mechanism to only fetch frequently-access (hot) data into the DRAM cache. CPUs can still access the cold data directly in NVM through a DRAM bypassing mechanism. We dynamically adjust the threshold of hot data classification to adapt to the diversifying and dynamic memory access patterns of applications. Experimental results show that our strategy improves the application performance by 69.9% and 15.2% compared with a NVM-only system and the state-of-the-art CHOP scheme, respectively. The performance gap is only 8.8% compared with a DRAM-only memory system with large pages support.
R-Tree Optimization Method Using Internal Parallelism of Flash Memory-Based Solid-State Drives
Chen Yubiao, Li Jianzhong, Li Yingshu, Li Faming, Gao Hong
2018, 55(9):  2066-2082.  doi:10.7544/issn1000-1239.2018.20180254
Asbtract ( 1239 )   HTML ( 5)   PDF (5119KB) ( 422 )  
Related Articles | Metrics
Recently, flash memory-based solid state disk has more magnificent improvement on internal design than before, which brings rich internal parallelism to solid state disk. R-tree index is widely applied in spatial data management, but up to now, the proposed R-tree optimization methods on solid state disk do not take the internal parallelism into consideration, and also the approach designed for traditional magnetic disk is not suitable for solid state disk. So all of the previous R-tree optimization doesn’t use internal parallelism mechanism of solid state disk to make the query and update operation more efficient. In order to exploit internal parallelism to speed up R-tree. Firstly, a parallel batch asynchronous I/O submitting library is realized. Secondly, optimizing algorithms to accelerate the R-tree search and update operations are achieved by aggregating read or write operations to batch submit through the previous library, Thirdly, we analyze the minimal speed up expectation theoretically, and prove that normal solid state can achieve speed up of at least 1.86 times expectation speed-up with 4 channels and 2.93 times expectation speed-up with 8 channels. Through the experiments on two kind of solid state disk, our optimization R-tree can achieve stable 3 times speed up for query operation compared with original R-tree, and also speed up of about 2 times for update operation. No matter for query intensive or update intensive application scenarios, there is speedup between them.
APMSS: The New Solid Storage System with Asymmetric Interface
Niu Dejiao, He Qingjian, Cai Tao, Wang Jie, Zhan Yongzhao, Liang Jun
2018, 55(9):  2083-2093.  doi:10.7544/issn1000-1239.2018.20180198
Asbtract ( 1075 )   HTML ( 5)   PDF (3588KB) ( 329 )  
Related Articles | Metrics
The solid storage system is an important way to solve the problem of computer memory wall. But the existing block-based storage management strategy can’t use the advantages of byte-addressable characteristics in solid storage system and causes write amplification, which seriously reduces the I/O performance and lifetime of NVM devices. In term of this problem, the new solid storage system with asymmetric interface named APMSS is presented, and the management of read and write access request are separated by the analysis of two type access request. The read access request is still managed by block unit to avoid increasing the overhead of I/O stack and keep high performance by cache. The minimized direct write strategy is designed and the write access request is managed by dynamic granularity to reduce the communication and written data to solid storage system. At the same time, the multi-granularity mapping algorithm is designed to avoid exchanging amplification between memory and solid storage system and improve the I/O performance. Then the write amplification of solid storage system could be avoided and the write performance of NVM devices could be improved. The prototype is implemented based on PMBD which is the open source solid storage system simulator. Fio and Filebench are used to test the read and write performance of Ext2 on PMBD, Ext4 on PMBD and Ext4 on APMSS. The test results show that Ext4 on APMSS can improve sequential write performance by 9.6%~29.8% compared with Ext2 and Ext4 on PMBD.