ISSN 1000-1239 CN 11-1777/TP

Table of Contents

01 March 2021, Volume 58 Issue 3
Research on Optimal Performance of Sparse Matrix-Vector Multiplication and Convolution Using the Probability-Process-Ram Model
Xie Zhen, Tan Guangming, Sun Ninghui
2021, 58(3):  445-457.  doi:10.7544/issn1000-1239.2021.20180601
Performance models provide insightful perspectives that allow us to predict performance and propose optimization guidance. Despite much prior research, pinpointing the bottlenecks of various memory access patterns and reaching high performance for both regular and irregular programs across hardware configurations remain nontrivial. In this work, we propose a novel model called probability-process-ram (PPR) to quantify compute and data-transfer time on general-purpose multicore processors. The PPR model predicts the number of instructions for a single core and the probability of memory accesses between each level of the memory hierarchy through a newly designed cache simulator. Using the automatically extracted best optimization method and its expected benefit, we apply the PPR model to analyze and optimize sparse matrix-vector multiplication and 1D convolution as case studies of typical irregular and regular computational kernels. We then obtain the best block sizes for sparse matrices with various sparsity structures, as well as optimal optimization guidance for 1D convolution under different instruction-set support and data sizes. Compared with the Roofline and ECM models, the proposed PPR model greatly improves prediction accuracy through the newly designed cache simulator and provides more comprehensive feedback.
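The cache-simulator idea at the heart of PPR, estimating the probability that an access hits a given memory level, can be illustrated with a toy LRU simulator. This is a minimal sketch under our own assumptions; the function and parameters are ours, not the paper's PPR simulator:

```python
from collections import OrderedDict

def simulate_lru(addresses, capacity):
    """Fraction of accesses that hit an LRU cache of `capacity` entries.

    Toy stand-in for the kind of cache simulation PPR uses to estimate
    per-level memory access probabilities; a real simulator would model
    line granularity, associativity, and multiple levels.
    """
    cache = OrderedDict()
    hits = 0
    for addr in addresses:
        if addr in cache:
            hits += 1
            cache.move_to_end(addr)        # refresh recency
        else:
            cache[addr] = True
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict least-recently-used
    return hits / len(addresses)

# A streaming pattern never reuses data; a blocked pattern reuses it heavily.
stream = list(range(100))
blocked = [i % 8 for i in range(100)]
```

Under this toy model the streaming pattern has hit probability 0, while the blocked pattern hits on all but its first 8 accesses, the kind of contrast such a model exploits when choosing block sizes.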
Bidirectional-Bitmap Based CSR for Reducing Large-Scale Graph Space
Gan Xinbiao, Tan Wen, Liu Jie
2021, 58(3):  458-466.  doi:10.7544/issn1000-1239.2021.20200090
Graph500 is an important benchmark for evaluating the data-intensive processing ability of supercomputers in the big data era. The graph traversal capability of a pre-exascale system is mainly restricted by memory space and communication bandwidth; in particular, the utilization of memory space determines the testable graph scale, which in turn dominates Graph500 performance. Hence, Bi-CSR (bidirectional-bitmap CSR) is proposed, based on CSR (compressed sparse row), for running Graph500 on the Tianhe pre-exascale system. Bi-CSR reduces large-scale graph space by introducing a row bitmap and a column bitmap to compress the sparse matrix storage used by Graph500. The row bitmap on top of CSR mainly cuts down graph memory space, while the column bitmap not only further reduces memory space but also improves graph traversal by exploiting the arrays of VPEs (vector processing elements) with which the Tianhe pre-exascale system is equipped; traversal is sped up when the VPEs are fully utilized. Experimental results demonstrate that Bi-CSR reduces large-scale graph space by 70% when the Graph500 input scale is 2^{37} on the Tianhe pre-exascale system, achieving 2.131E+12 TEPS (traversed edges per second).
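The row-bitmap half of the idea can be sketched as follows. This is a hedged illustration, not the Tianhe implementation; the column bitmap and VPE vectorization are omitted, and all names are ours:

```python
def csr_with_row_bitmap(dense):
    """Compress a 0/1 adjacency matrix into CSR with a row bitmap.

    A set bit in `bitmap` marks a non-empty row; empty rows contribute
    no row-pointer entry, so sparse graphs with many isolated vertices
    need far fewer row pointers than plain CSR.
    """
    col_idx, row_ptr, bitmap = [], [0], 0
    for r, row in enumerate(dense):
        cols = [c for c, v in enumerate(row) if v]
        if cols:
            bitmap |= 1 << r          # mark row r as non-empty
            col_idx.extend(cols)
            row_ptr.append(len(col_idx))
    return bitmap, row_ptr, col_idx

dense = [
    [0, 1, 0, 0],
    [0, 0, 0, 0],   # empty row: bitmap bit stays 0, no row_ptr entry
    [1, 0, 0, 1],
    [0, 0, 0, 0],
]
bitmap, row_ptr, col_idx = csr_with_row_bitmap(dense)
```

To iterate a row, a popcount over the bitmap prefix maps a vertex id to its slot in the compacted row-pointer array.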
Effective High-Level Synthesis for High-Performance Graph Processing
Tang Jiawu, Zheng Long, Liao Xiaofei, Jin Hai
2021, 58(3):  467-478.  doi:10.7544/issn1000-1239.2021.20190679
Graph processing has become one of the mainstream big data applications. For graph applications such as biological networks, social networks, and Web graphs, traditional GPU and CPU architectures suffer in power consumption and performance due to the characteristics of graph algorithms. Specialized hardware acceleration has been demonstrated to significantly improve the performance and energy efficiency of graph processing; however, writing and verifying correct hardware-level code is tedious and time-consuming. General-purpose high-level synthesis (HLS) systems allow users to write applications in high-level languages such as C and automatically generate the underlying hardware code, but for irregular graph applications these HLS systems still lack effective support for massive parallelism and memory access, potentially leading to significantly low performance. In this paper, we propose an effective HLS system for high-performance graph processing. We adopt a dataflow architecture to achieve efficient parallel pipelining and ensure load balancing. Through the developed programming primitives, users can quickly customize a vertex-centric graph algorithm and translate it into a modular intermediate representation (IR), which in turn maps to a parameterized hardware template. We build our HLS system on a Xilinx Virtex UltraScale+ XCVU9P. Results on a variety of graph algorithms and datasets show that our HLS system outperforms the state-of-the-art Spatial system with 7.9x-30.6x speedups.
Energy-Efficient Strategy Based on Data Recovery in Storm
Pu Yonglin, Yu Jiong, Lu Liang, Li Ziyang, Guo Binglei, Liao Bin
2021, 58(3):  479-496.  doi:10.7544/issn1000-1239.2021.20200489
As one of the most popular platforms in big data stream computing, Storm was designed for high performance, ignoring the problem of high energy consumption, which restricts the development of the platform. To address this problem, a task allocation model, a topology information monitoring model, a data recovery model, and an energy consumption model are set up, and an energy-efficient strategy based on data recovery in Storm (DR-Storm) is proposed. The strategy consists of a throughput detection algorithm and a data recovery algorithm. Based on topology information, the throughput detection algorithm calculates the cluster throughput fed back by the topology information monitoring model and decides from this feedback whether the tasks in the cluster topology should be terminated. The data recovery algorithm selects a backup node for data storage according to the data recovery model and decides whether the cluster topology is ready for data recovery based on feedback from the topology information monitoring model. DR-Storm then recovers data within the cluster topology from the memory of the backup node. We evaluate DR-Storm by measuring cluster latency as well as energy consumption efficiency in a big data stream computing environment. The experimental results show that the proposed strategy reduces cluster latency and power while saving energy compared with existing approaches.
Scalability for Monolithic Schedulers of Cluster Resource Management Framework
Mao Anqi, Tang Xiaochun, Ding Zhao, Li Zhanhuai
2021, 58(3):  497-512.  doi:10.7544/issn1000-1239.2021.20200501
The significant advantages of a monolithic cluster resource management system in ensuring the consistency of global resource status and supporting multiple scheduling models make it widely used in production systems. However, the performance of a monolithic resource manager in a large cluster does not meet expectations, because it uses a single node to maintain the global resource state. When the resource manager receives and processes large-scale periodic heartbeat messages, its load increases sharply, which leads to a scalability bottleneck. To solve these problems, this paper proposes the idea of "no change, no update" to replace the resource manager's periodic update mechanism. We make three main contributions. First, we introduce a difference-based heartbeat processing model on the computing nodes: when a node's resource status has not changed, it does not send a message to the resource manager, thereby reducing the size and number of messages. Second, we propose a ring-network monitoring model among computing nodes, which transfers the periodic monitoring pressure to the computing nodes themselves. Finally, we implement both models on YARN. Experiments show that when the cluster reaches 10 000 nodes with a heartbeat interval of 3 s, YARN with our models improves heartbeat processing efficiency and resource update efficiency by about 40%. In addition, the improved YARN manages clusters more than 1.88 times the size of the original YARN.
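The "no change, no update" heartbeat model can be sketched in a few lines. This is illustrative only; the function names are our own invention, not YARN's actual NodeManager API:

```python
def make_differential_sender(send):
    """Wrap a heartbeat sender so that a node reports only when its
    resource state changed since the last report ("no change, no update").

    `send(node_id, state)` is whatever transport delivers the heartbeat;
    the wrapper suppresses redundant periodic messages.
    """
    last = {}
    def heartbeat(node_id, state):
        if last.get(node_id) != state:
            last[node_id] = dict(state)  # remember a copy of what was sent
            send(node_id, state)
            return True                  # message actually sent
        return False                     # suppressed: nothing changed
    return heartbeat

sent = []
hb = make_differential_sender(lambda n, s: sent.append((n, s)))
hb("node1", {"mem_free": 64, "cpu_free": 8})
hb("node1", {"mem_free": 64, "cpu_free": 8})   # unchanged -> suppressed
hb("node1", {"mem_free": 32, "cpu_free": 8})   # changed -> sent
```

Of three periodic heartbeats, only the two that carry new information reach the resource manager; under a stable cluster most heartbeats are suppressed entirely.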
Review on Text Mining of Electronic Medical Record
Wu Zongyou, Bai Kunlong, Yang Linrui, Wang Yiqi, Tian Yingjie
2021, 58(3):  513-527.  doi:10.7544/issn1000-1239.2021.20200402
Electronic medical records (EMR), produced with the development of hospital informatization and containing rich medical information and clinical knowledge, play an important role in guiding and assisting clinical decision-making and drug mining. How to efficiently mine important information from large volumes of electronic medical records is therefore an essential research topic. In recent years, with the vigorous development of computer technology, especially machine learning and deep learning, data mining in the specialized field of electronic medical records has been raised to a new level. This review aims to guide future development in the field of EMR text mining by analyzing the current state of EMR research. Specifically, the paper begins with the characteristics of EMR data and how to preprocess it; it then introduces popular models for four typical EMR mining tasks (medical named entity recognition, relation extraction, text classification, and smart interview); finally, from the perspective of applying EMR data mining to specific diseases, it surveys work on diabetes and cardio-cerebrovascular diseases and briefly introduces the existing application scenarios of electronic medical records.
Robust Face Expression Recognition Based on Gender and Age Factor Analysis
Liao Haibin, Xu Bin
2021, 58(3):  528-538.  doi:10.7544/issn1000-1239.2021.20200288
A robust facial expression recognition method based on deep conditional random forests is proposed to handle factors such as race, gender, and age in uncontrolled environments. Unlike traditional single-task facial expression recognition models, we devise an effective multi-task architecture that learns from auxiliary attributes such as gender and age. In our study, we find that the facial attributes of gender and age have a great impact on facial expression recognition. To capture the relationship between facial attributes and expressions, a deep conditional random forest conditioned on facial attributes is proposed. In the feature extraction stage, multi-instance learning integrated with an attention mechanism is used to extract face features robust to variations in illumination, occlusion, and low resolution. In the recognition stage, conditioned on the gender and age attributes, a multi-condition random forest is used to recognize facial expressions. Extensive experiments have been carried out on the public CK+, ExpW, RAF-DB, and AffectNet facial expression databases: the recognition rate reaches 99% on the normalized CK+ database and 70.52% on the challenging natural-scene databases. The experimental results show that the proposed method outperforms state-of-the-art methods and is robust to occlusion, noise, and resolution variation in the wild.
Credit Fraud Detection for Extremely Imbalanced Data Based on Ensembled Deep Learning
Liu Ying, Yang Ke
2021, 58(3):  539-547.  doi:10.7544/issn1000-1239.2021.20200324
Class imbalance in credit fraud data significantly undermines model performance. In particular, when the sample distribution is extremely unbalanced, noise caused by information distortion, statistical discrepancy, and reporting bias severely damages model training, leading to problems such as overfitting. This paper therefore proposes an algorithm based on an ensembled deep belief network to tackle credit fraud data with extreme imbalance. First, we propose a joint sampling strategy combining under-sampling and over-sampling to obtain training subsets. Then, we introduce a two-stage algorithm for constructing classifier clusters: support vector classifiers and random forest classifiers are combined via Boosting to overcome the classification-interface deviation of support vector machines. Finally, a deep belief network assembles the classifiers' predictions and outputs the final classification result. Moreover, traditional evaluation metrics overemphasize majority samples, ignoring the reality that the minority matters even more; we therefore introduce a revenue cost index that accounts for the identification of both positive and negative samples. An empirical study on European credit card data shows that the proposed algorithm scores about 3% higher on the revenue cost index than the average of competing methods. The experiments also evaluate the influence of the imbalance ratio on performance and find that the proposed algorithm outperforms the others in this respect.
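The joint sampling step, under-sampling the majority class without replacement while over-sampling the minority with replacement, might look like this. The target subset size is our own illustrative choice, not the paper's exact procedure:

```python
import random

def joint_sample(majority, minority, size, seed=0):
    """Build a balanced training subset of `size` examples per class.

    Majority examples are drawn without replacement (under-sampling);
    minority examples are drawn with replacement (over-sampling), so
    `size` may exceed the number of distinct minority examples.
    """
    rng = random.Random(seed)
    maj = rng.sample(majority, size)                     # no repeats
    mino = [rng.choice(minority) for _ in range(size)]   # repeats allowed
    return maj, mino

# 1000 legitimate transactions vs only 3 fraud cases.
maj, mino = joint_sample(list(range(1000)), [0, 1, 2], size=50)
```

Repeating the draw with different seeds yields the diverse training subsets that the boosted classifier clusters are built on.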
Recent Advances in Image Steganography Based on Deep Learning
Fu Zhangjie, Li Enlu, Cheng Xu, Huang Yongfeng, Hu Yuting
2021, 58(3):  548-568.  doi:10.7544/issn1000-1239.2021.20200360
Image steganography is one of the research hotspots of information security. In early image steganography, pixel modifications changed the statistical properties of the image in a non-negligible way, making it difficult to resist detection based on high-dimensional statistical features. With the development of deep learning, researchers have proposed a number of deep learning-based image steganography methods, which make image modification more imperceptible and the steganography process more intelligent. To further the study of image steganography, this paper provides a thorough review of deep learning-based methods. Following the image steganography pipeline, the paper analyzes these methods from three aspects: first, cover image acquisition, from the perspectives of generative adversarial networks and adversarial examples; second, the design of steganographic distortion; third, the generation of stego images with encoder-decoder networks. In addition, the advantages and disadvantages of coverless image steganography are discussed and summarized. Coverless methods achieve imperceptible steganography without a cover image and thus have an inherent advantage in resisting steganalysis. Finally, we summarize the pros and cons of deep learning-based and coverless image steganography and, on the basis of this analysis, discuss future research directions.
A Method for Joint Detection of Attacks in Named Data Networking
Wu Zhijun, Zhang Rudan, Yue Meng
2021, 58(3):  569-582.  doi:10.7544/issn1000-1239.2021.20200448
The interest flooding attack (IFA) and the conspiracy interest flooding attack (CIFA) are typical security threats faced by named data networking (NDN). Existing detection methods rely on single detection features and thus cannot effectively distinguish the attack types, and their detection rates are not high enough. To address this, we propose a method based on an association rule algorithm and a decision tree algorithm to detect attacks in NDN. First, by extracting data from the content store (CS) of NDN routing nodes, we mine a new detection feature, the "CS packet growth rate"; experiments show that this growth rate is a strong basis for distinguishing attack types. Second, an association rule algorithm combines the new feature with multiple detection features from the pending interest table (PIT) to find correlations between features; the outputs of multiple association rules are preprocessed and fed into a decision tree as the training set. Finally, the detection model generated by the decision tree is used to detect attacks. Jointly using the decision tree and association rule algorithms not only avoids the misjudgments caused by single detection features, but also enriches the classification attributes of the decision tree. The simulation results show that this method can accurately distinguish and detect IFA and CIFA and improves the detection rate.
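The "CS packet growth rate" feature is, in essence, the relative change in Content Store occupancy between sampling intervals. A minimal sketch, where the sampling interval and the occupancy samples are our assumptions:

```python
def cs_growth_rate(samples):
    """Per-interval relative growth of Content Store occupancy.

    `samples` holds CS packet counts taken at fixed intervals; the
    returned rates are the feature fed (with PIT features) into the
    association rule mining and decision tree stages.
    """
    return [(b - a) / a for a, b in zip(samples, samples[1:]) if a > 0]

# CS grows 10% per interval, then stalls in the last interval.
rates = cs_growth_rate([100, 110, 121, 121])
```

Under an IFA, malicious Interests request non-existent content, so the abstract's observation is that CS growth behaves differently from normal traffic, making this rate a discriminating feature.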
Lattice-Based Forward Secure Proxy Signatures
Xie Jia, Hu Yupu, Jiang Mingming
2021, 58(3):  583-597.  doi:10.7544/issn1000-1239.2021.20200321
Combining the advantages of forward security and proxy delegation, forward secure proxy signatures have been widely applied in mobile communication and electronic auctions since they were proposed. However, most existing forward secure proxy signatures are based on classic number-theoretic problems, such as the discrete logarithm problem and the factorization problem, which are no longer secure once general-purpose quantum computers become a reality, so finding quantum-immune forward secure proxy signatures is urgent. Among the four families of quantum-resistant public key cryptography, lattice-based cryptography has entered a period of rapid development over the last ten years and achieved many results, with the advantages of quantum resistance, simple and efficient computation, and worst-case to average-case security guarantees. In this paper, we first introduce the concept and security model of forward secure proxy signatures in the lattice setting, and then propose two forward secure lattice-based proxy signature schemes based on the hardness of the small integer solution (SIS) problem. One is the first lattice-based forward secure proxy signature in the random oracle model, proven secure against polynomial-time adversaries (both an unauthorized proxy signer and a malicious original signer), where forward security is achieved at some cost in efficiency. The other is proven unforgeable and forward secure in the standard model, the first lattice-based attempt in that model.
Parallel String Similarity Join Approach Based on CPU-GPU Heterogeneous Architecture
Xu Kunhao, Nie Tiezheng, Shen Derong, Kou Yue, Yu Ge
2021, 58(3):  598-608.  doi:10.7544/issn1000-1239.2021.20190567
Similarity join is an important task in data cleaning, data integration, and other fields, and has attracted extensive attention in recent years. With growing data volumes, rising real-time processing requirements, and the bottleneck in CPU performance improvement, traditional serial similarity join algorithms can no longer meet the demands of big data processing. As a co-processor, the GPU has achieved good acceleration in machine learning and other fields in recent years, so studying GPU-based parallel similarity join algorithms is of great practical significance. This paper proposes a parallel similarity join algorithm based on a CPU-GPU heterogeneous architecture. First, the GPU is used to construct an inverted index based on SoA (struct of arrays), which solves the low efficiency of traditional index structures under parallel reads and writes. Then, to address the performance problems of serial algorithms, a parallel dual-length filtering algorithm based on the filter-verification framework is proposed, with the inverted index and prefix filtering further improving filtering performance. In our approach, the exact similarity verification is performed by the CPU to make full use of the heterogeneous CPU-GPU computing resources. Finally, experiments on several datasets show that, compared with serial similarity join algorithms, the proposed algorithms achieve better filtering performance and lower index-generation time than existing algorithms, as well as better processing performance and higher speedups on the similarity join.
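The filter-verification framework with a length filter can be sketched serially. Jaccard similarity over character sets is our illustrative choice of similarity function, and the GPU inverted-index and prefix-filter stages are omitted:

```python
def length_filter_join(strings_a, strings_b, threshold=0.8):
    """Filter-verification similarity join with a length filter.

    Since Jaccard(sa, sb) <= min(|sa|, |sb|) / max(|sa|, |sb|), any pair
    whose smaller set is below threshold * larger set can be discarded
    without computing the exact similarity (the "filter" step).
    """
    results = []
    for a in strings_a:
        sa = set(a)
        for b in strings_b:
            sb = set(b)
            lo, hi = sorted((len(sa), len(sb)))
            if lo < threshold * hi:
                continue                     # length filter: cannot qualify
            inter = len(sa & sb)             # verification: exact Jaccard
            if inter / (len(sa) + len(sb) - inter) >= threshold:
                results.append((a, b))
    return results

pairs = length_filter_join(["abcd"], ["abcd", "ab", "abce"], 0.8)
```

In the paper's setting the filter step is the massively parallel GPU-friendly part, while the exact verification runs on the CPU.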
Approximate k-Nearest Neighbor Query of High Dimensional Data Based on Dimension Grouping and Reducing
Li Song, Hu Yanming, Hao Xiaohong, Zhang Liping, Hao Zhongxiao
2021, 58(3):  609-623.  doi:10.7544/issn1000-1239.2021.20200285
Existing high-dimensional AkNN (approximate k-nearest neighbor) query algorithms do not consider the associations between dimensions when reducing data dimensionality. We therefore propose a method that first groups dimensions based on association rules between them and then reduces the dimensionality of each group, lessening the loss of information by keeping related dimensions together. To counter the data offset caused by hash-based reduction, sign bits are set and the query results are refined based on their characteristics. To mine association rules between dimensions efficiently, a new frequent itemset mining algorithm based on the UFP-tree is proposed. To improve AkNN query efficiency, the data are mapped to binary codes and queries run on the codes, with the coding functions filtered by information entropy to improve coding quality. When refining the query results, weights are set dynamically from the information entropy of the candidate set's code bits, and the final AkNN results are returned by comparing the dynamically weighted Hamming distance and the number of sign-bit collisions. Theoretical and experimental studies show that the proposed method effectively handles AkNN queries in high-dimensional spaces.
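The coding-based candidate search reduces to ranking binary codes by Hamming distance. A minimal sketch, without the entropy weighting or sign-bit refinement described above:

```python
def hamming(a, b):
    """Number of differing bits between two integer-encoded binary codes."""
    return bin(a ^ b).count("1")

def approx_knn(query_code, codes, k):
    """Indices of the k codes nearest to `query_code` in Hamming distance.

    This is the unweighted candidate-ranking step; the paper's method
    additionally weights bits by information entropy and breaks ties
    with sign-bit collisions.
    """
    order = sorted(range(len(codes)), key=lambda i: hamming(query_code, codes[i]))
    return order[:k]

codes = [0b0000, 0b0001, 0b0111, 0b1111]
nearest = approx_knn(0b0011, codes, k=2)
```

Because codes are short integers, distance computation is a single XOR plus popcount, which is what makes code-based AkNN search fast in practice.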
Towards Private Key-Value Data Collection with Histogram
Zhang Xiaojian, Xu Yaxin, Fu Nan, Meng Xiaofeng
2021, 58(3):  624-637.  doi:10.7544/issn1000-1239.2021.20200319
Recently, user data collection and analysis with local differential privacy has been extended to key-value data. The trade-off among the size and sparsity of the domain and the perturbation method directly constrains the accuracy of collecting and analyzing such data. To remedy the deficiencies caused by domain size and the perturbation method, this paper employs histogram technology to propose an efficient solution, called HISKV, for collecting key-value data. HISKV first uses a user-grouping strategy and part of the privacy budget to find the optimal truncation length, enabling each user to truncate his or her key-value set. Then, from the truncated set, each user samples one key-value pair and processes it with a discretization and perturbation method. To perturb key-value data efficiently, a novel mechanism in HISKV, named LRR_KV, is proposed, which allocates a different perturbation probability to each key. With LRR_KV, each user adds noise to the sampled pair and sends the report to a collector, who estimates the frequency of each key and the mean of its values from all users' reports. To evaluate the utility of HISKV, we conduct theoretical analysis of the unbiasedness, variance, and error bound of LRR_KV, and run experiments on real and synthetic datasets to compare different methods. The experimental results show that HISKV outperforms its competitors.
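Key perturbation in this style of mechanism is typically a randomized response. The following sketch uses plain generalized randomized response with a single keep-probability; LRR_KV's per-key probabilities and the value perturbation are omitted, and all names are ours:

```python
import math
import random

def grr_perturb(key, domain, epsilon, rng):
    """Report the true key with probability p = e^eps / (e^eps + d - 1),
    otherwise a uniformly random *other* key from the domain."""
    d = len(domain)
    p = math.exp(epsilon) / (math.exp(epsilon) + d - 1)
    if rng.random() < p:
        return key
    return rng.choice([x for x in domain if x != key])

def estimate_freq(reports, key, domain, epsilon):
    """Unbiased frequency estimate: (observed - q) / (p - q), where q is
    the probability that a different true key gets reported as `key`."""
    d, n = len(domain), len(reports)
    p = math.exp(epsilon) / (math.exp(epsilon) + d - 1)
    q = (1 - p) / (d - 1)
    observed = sum(1 for r in reports if r == key) / n
    return (observed - q) / (p - q)

rng = random.Random(0)
domain = ["k1", "k2", "k3", "k4"]
true_keys = ["k1"] * 800 + ["k2"] * 200   # true frequency of k1 is 0.8
reports = [grr_perturb(k, domain, 4.0, rng) for k in true_keys]
est = estimate_freq(reports, "k1", domain, 4.0)
```

The estimator is unbiased because E[observed] = f·p + (1-f)·q, so (E[observed] - q)/(p - q) = f; the collector applies it per key over all users' reports.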
A Programming Paradigm Combining Programmer and Neural Network to Promote Automated Program Generation
Zhou Peng, Wu Yanjun, Zhao Chen
2021, 58(3):  638-650.  doi:10.7544/issn1000-1239.2021.20200298
Program generation is one of the core research challenges in AI. At present, neural network methods driven by input-output data are popular for program generation modeling. Because the information input to the model is incomplete, and because these models depend entirely on neural networks with limited memory capacity, they suffer from poor generalization, difficulty in guaranteeing the accuracy of generated programs, and inability to handle common program structures; the memory capacity of a neural network cannot meet the variable-storage requirements of conventional programs. We therefore propose a programming paradigm that merges the strengths of human experience and perception with those of neural networks learning from data samples: human programmers roughly provide the overall program structure, and the neural network generates the local trivial details automatically. The programs run on an abstract computer that resembles a digital computer but is end-to-end differentiable, consisting of a differentiable controller, an external memory that extends the internal memory cells of the neural network, and a differentiable instruction set represented by a library of differentiable state transfer functions. It can therefore not only receive input-output samples but also program and execute instructions like a traditional digital computer, and its externally visible extended memory provides more capacity for representing program variables. The advantage is improved practical applicability of program generation. Experimental results indicate that the method is effective and achieves much better learning performance than other typical methods in program generation.
Kernel Configuration Infographic Based on Multi-Label and Its Application
Hou Pengpeng, Zhang Heng, Wu Yanjun, Yu Jiageng, Tai Yang, Miao Yuxia
2021, 58(3):  651-667.  doi:10.7544/issn1000-1239.2021.20200186
The Linux kernel provides flexible configuration items to customize the kernel for various application scenarios. However, the number of kernel configuration items is huge and growing rapidly, and the values of configuration items often change across kernel versions; even professional kernel teams face many challenges when setting them. This paper presents an infographic containing a variety of information about kernel configuration items: the dependencies among configuration items, function labels, performance labels, security labels, and configuration item enable rates. The infographic also provides a visualization interface, which is more intuitive, efficient, and user-friendly. It can be widely used in scenarios such as kernel startup optimization, kernel size tailoring, kernel security enhancement, kernel performance optimization, configuration item abnormality detection, intelligent question answering about configuration items, and configuration item recommendation. To verify the validity of the infographic, we designed a configuration-item-oriented retrieval framework, KCIR, based on it, which implements query expansion based on multi-label information and text extension based on the dependencies between kernel configuration items; our experiments demonstrate that KCIR is more effective than traditional retrieval frameworks. This use of the infographic in the information retrieval field illustrates its effectiveness and practicality.
An Ideal Performance Oriented Approach for Cross-Framework Compiler Analysis
Lai Qingkuan, Lü Fang, He Chunlin, He Xianbo, Feng Xiaobing
2021, 58(3):  668-680.  doi:10.7544/issn1000-1239.2021.20190728
Compiler performance reflects how fully the advantages of a computer system architecture are exploited, and compiler optimization is influenced by both the machine platform and the characteristics of the compiler. Compiler analysis is carried out between the target compiler and multiple reference compilers, and between the target platform and multiple reference platforms; that is, combinations of compiler and platform are the foundation of the analysis. Only with multiple combinations can we expose the maximum possible room for performance improvement and provide detailed prioritization schemes for optimizing the target compiler. However, adding compiler-platform combinations increases the analysis workload, which in most cases cannot be measured. To this end, we put forward a cross-platform, cross-compiler analysis technique based on a peak-performance architecture set, with which we construct the ideal performance space for the target compiler. Combined with a fine-grained advantage localization technique, it supplies the target compiler with advantageous optimization options and optimization directions, so as to guide compiler optimization. Finally, our experiments verify the practicality and generality of this analysis technique and provide optimization directions for the target compiler (GCC) on an Intel platform.