ISSN 1000-1239 CN 11-1777/TP

Table of Contents

01 May 2021, Volume 58 Issue 5
Adversarial Attacks and Defenses for Deep Learning Models
Li Minghui, Jiang Peipei, Wang Qian, Shen Chao, Li Qi
2021, 58(5):  909-926.  doi:10.7544/issn1000-1239.2021.20200920
Deep learning, one of the main representatives of artificial intelligence technology, is quietly enhancing our daily lives. However, the deployment of deep learning models also brings potential security risks. Studying the basic theories and key technologies of attacks and defenses for deep learning models is of great significance for understanding the inherent vulnerability of the models, comprehensively protecting intelligent systems, and enabling the widespread deployment of artificial intelligence applications. This paper discusses the development and future challenges of adversarial attacks and defenses for deep learning models from an adversarial perspective. We first introduce the potential threats faced by deep learning at different stages. We then systematically summarize the progress of existing attack and defense technologies in artificial intelligence systems from four perspectives: the essential mechanism of adversarial attacks, methods for generating adversarial attacks, defensive strategies against these attacks, and attack and defense frameworks. We also discuss the limitations of related research and propose an attack framework and a defense framework to guide the construction of better adversarial attacks and defenses. Finally, we discuss several potential future research directions and challenges for adversarial attacks and defenses against deep learning models.
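As one concrete, dependency-free illustration of adversarial example generation (not taken from the paper), the sketch below applies the fast gradient sign method (FGSM) to a logistic-regression classifier. For this model the input gradient of the cross-entropy loss has the closed form (p - y)·w, so no autodiff library is needed. The weights, input, and epsilon used in any call are made-up values.

```python
import math
from typing import List


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def bce_loss(w: List[float], b: float, x: List[float], y: int) -> float:
    """Binary cross-entropy of a logistic-regression model on one example."""
    p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
    eps = 1e-12  # keeps log() away from 0
    return -(y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps))


def sign(g: float) -> int:
    return (g > 0) - (g < 0)


def fgsm(w: List[float], b: float, x: List[float], y: int, epsilon: float) -> List[float]:
    """One-step FGSM: x_adv = x + epsilon * sign(dL/dx)."""
    p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
    # For logistic regression with BCE loss, dL/dx = (p - y) * w.
    grad = [(p - y) * wi for wi in w]
    return [xi + epsilon * sign(gi) for xi, gi in zip(x, grad)]
```

Each feature moves by exactly epsilon in the direction that increases the loss, which is why FGSM perturbations are bounded in the L-infinity norm.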
Research and Challenge of Distributed Deep Learning Privacy and Security Attack
Zhou Chunyi, Chen Dawei, Wang Shang, Fu Anmin, Gao Yansong
2021, 58(5):  927-943.  doi:10.7544/issn1000-1239.2021.20200966
Unlike centralized deep learning, distributed deep learning removes the requirement that data be centralized during model training: data is processed locally, and all participants collaborate without exchanging data. This significantly reduces the risk of user privacy leakage, breaks down data silos at the technical level, and improves the efficiency of deep learning. Distributed deep learning can be widely used in smart medical care, smart finance, smart retail and smart transportation. However, typical attacks such as generative adversarial network attacks, membership inference attacks and backdoor attacks have revealed that distributed deep learning still has serious privacy vulnerabilities and security threats. This paper first compares and analyzes the characteristics and core problems of three distributed deep learning modes: collaborative learning, federated learning and split learning. Second, from the perspective of privacy attacks, it comprehensively expounds the various types of privacy attacks faced by distributed deep learning and summarizes existing defenses against them. From the perspective of security attacks, the paper analyzes the attack process and inherent security threats of three security attacks (data poisoning attacks, adversarial example attacks, and backdoor attacks), and analyzes existing defense techniques in terms of defense principles, adversary capabilities, and defense effects. Finally, future research directions of distributed deep learning are discussed from the perspective of privacy and security attacks.
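In federated learning, participants share model parameters rather than raw data, and those shared updates are exactly what privacy and poisoning attacks target. A minimal sketch of the standard FedAvg aggregation rule (weighting each client by its local sample count) is shown below; it is a generic illustration, not a method from this survey, and the function name is hypothetical.

```python
from typing import List, Sequence


def fed_avg(client_weights: Sequence[Sequence[float]], client_sizes: Sequence[int]) -> List[float]:
    """Average client parameter vectors, weighting each by its local sample count."""
    if not client_weights or len(client_weights) != len(client_sizes):
        raise ValueError("need one sample count per client")
    total = sum(client_sizes)
    dim = len(client_weights[0])
    global_w = [0.0] * dim
    for w, n in zip(client_weights, client_sizes):
        for i in range(dim):
            global_w[i] += (n / total) * w[i]
    return global_w
```

Because the server only sees each client's parameters, a single malicious client can still bias the average, which is why poisoning and backdoor defenses often replace this plain weighted mean with more robust aggregators.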
A Review of Fuzzing Techniques
Ren Zezhong, Zheng Han, Zhang Jiayuan, Wang Wenjie, Feng Tao, Wang He, Zhang Yuqing
2021, 58(5):  944-963.  doi:10.7544/issn1000-1239.2021.20201018
Fuzzing is a security testing technique that plays an increasingly important role, especially in detecting vulnerabilities. Fuzzing has developed rapidly in recent years and a large number of new results have emerged, so it is necessary to summarize and analyze them to follow the research frontier. Based on the literature from four top network and system security conferences (IEEE S&P, USENIX Security, CCS, NDSS), we summarize the basic fuzzing workflow, including preprocessing, input building, input selection, evaluation, and post-fuzzing, and discuss the tasks, challenges, and corresponding research results of each stage. We focus in particular on coverage-guided fuzzing, represented by the American Fuzzy Lop (AFL) tool and its improvements. Applying fuzzing in different fields faces vastly different challenges; by sorting and analyzing the related literature, we summarize the unique requirements and corresponding solutions for fuzzing in specific areas, focusing mainly on the Internet of Things and kernel security because of their rapid development and importance. In recent years, progress in anti-fuzzing and machine learning techniques has brought both challenges and opportunities to fuzzing, and these provide directions for further research.
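The core loop of coverage-guided fuzzing can be sketched in a few lines. This is a simplified illustration under stated assumptions: real AFL collects an edge-coverage bitmap through compile-time instrumentation, whereas here the hypothetical `toy_target` returns the IDs of the branches it executed by hand, and `mutate` uses only three simple mutation operators.

```python
import random
from typing import Callable, List, Set, Tuple


def toy_target(data: bytes) -> Set[int]:
    """Hypothetical instrumented target: returns IDs of the branches it executed."""
    covered = {0}
    if len(data) > 0 and data[0] == ord("F"):
        covered.add(1)
        if len(data) > 1 and data[1] == ord("U"):
            covered.add(2)
            if len(data) > 2 and data[2] == ord("Z"):
                covered.add(3)  # the "bug" sits behind three nested checks
    return covered


def mutate(data: bytes, rng: random.Random) -> bytes:
    buf = bytearray(data or b"\x00")
    pos = rng.randrange(len(buf))
    choice = rng.random()
    if choice < 0.5:
        buf[pos] = rng.randrange(256)        # overwrite a byte
    elif choice < 0.8:
        buf[pos] ^= 1 << rng.randrange(8)    # flip one bit
    else:
        buf.insert(pos, rng.randrange(256))  # insert a byte
    return bytes(buf)


def fuzz(target: Callable[[bytes], Set[int]], seeds: List[bytes],
         iterations: int, seed: int = 0) -> Tuple[List[bytes], Set[int]]:
    rng = random.Random(seed)
    corpus = list(seeds)
    global_cov: Set[int] = set()
    for s in corpus:
        global_cov |= target(s)
    for _ in range(iterations):
        parent = rng.choice(corpus)   # input selection (uniform here)
        child = mutate(parent, rng)   # input building
        cov = target(child)           # execution and evaluation
        if not cov <= global_cov:     # new branch reached: keep the input
            global_cov |= cov
            corpus.append(child)
    return corpus, global_cov
```

Keeping only inputs that reach new coverage is what lets the fuzzer solve nested checks one byte at a time instead of guessing the whole magic string at once.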
Research Progress of Neural Networks Watermarking Technology
Zhang Yingjun, Chen Kai, Zhou Geng, Lü Peizhuo, Liu Yong, Huang Liang
2021, 58(5):  964-976.  doi:10.7544/issn1000-1239.2021.20200978
With the popularization and application of deep neural networks, trained neural network models have become important assets and are provided to users as machine learning as a service (MLaaS). However, attackers, as a special kind of user, can extract the models while using the services. Considering the high value of the models and the risk of their being stolen, service providers have started to pay more attention to the copyright protection of their models. The main technique, adapted from digital watermarking and applied to neural networks, is called neural network watermarking. In this paper, we first analyze this kind of watermarking and present the basic design requirements. We then introduce the related technologies involved in neural network watermarking. Typically, service providers embed watermarks in their neural networks; once they suspect that a model has been stolen from them, they can verify whether the watermark exists in that model. Sometimes the providers can obtain the suspected model and check for watermarks in the model parameters (white-box). In other cases the providers cannot acquire the model and can only examine the input/output pairs of the suspected model (black-box). We discuss these watermarking methods and potential attacks against the watermarks from the viewpoints of robustness, stealthiness, and security. Finally, we discuss future directions and potential challenges.
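Black-box verification, as described above, only needs query access to the suspect model. A minimal sketch, assuming a trigger-set style watermark (secret inputs with owner-chosen labels) and a hypothetical match-rate threshold of 0.9, could look like this; the function name and threshold are illustrative, not from the paper.

```python
from typing import Callable, Sequence, Tuple


def verify_blackbox_watermark(
    query: Callable[[object], int],
    trigger_set: Sequence[Tuple[object, int]],
    threshold: float = 0.9,
) -> Tuple[bool, float]:
    """Query the suspect model on the owner's secret trigger inputs and
    compare its answers with the watermark labels chosen at embedding time."""
    if not trigger_set:
        raise ValueError("trigger set must not be empty")
    hits = sum(1 for x, label in trigger_set if query(x) == label)
    rate = hits / len(trigger_set)
    return rate >= threshold, rate
```

An independently trained model should agree with the arbitrary watermark labels only at roughly chance level, so a high match rate is evidence of copying; the threshold trades false accusations against robustness to fine-tuning.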
A Survey of Intelligent Malware Detection on Windows Platform
Wang Jialai, Zhang Chao, Qi Xuyan, Rong Yi
2021, 58(5):  977-994.  doi:10.7544/issn1000-1239.2021.20200964
In recent years, malware has had many negative effects on the development of information technology, so how to detect malware effectively has long been a concern. With the rapid development of artificial intelligence, machine learning and deep learning technologies have gradually been introduced into malware detection; this type of technology is called intelligent malware detection. Compared with traditional detection methods, intelligent detection does not require manually formulated detection rules. In addition, it has stronger generalization capabilities and can better detect previously unseen malware. Intelligent malware detection has become a research hotspot in the field. This paper introduces current work on intelligent malware detection, covering the main parts of the intelligent detection process. Specifically, we systematically explain and classify related work, including the features commonly used in intelligent detection, how features are processed, the commonly used classifiers, and the main problems currently facing intelligent malware detection. Finally, we summarize the paper and outline potential future research directions, aiming to contribute to the development of intelligent malware detection.
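Feature processing turns variable-length binaries into fixed-length vectors a classifier can consume. As one illustration (an assumption, not a method described in the survey), the sketch below hashes byte n-grams into a fixed number of buckets and normalizes the counts; the bucket count and the use of CRC32 as the hash are arbitrary choices.

```python
import zlib
from typing import List


def byte_ngram_features(data: bytes, n: int = 2, dim: int = 256) -> List[float]:
    """Hash every byte n-gram into one of `dim` buckets and return
    normalized frequencies (a fixed-length vector for any classifier)."""
    vec = [0.0] * dim
    count = len(data) - n + 1
    if count <= 0:
        return vec
    for i in range(count):
        bucket = zlib.crc32(data[i:i + n]) % dim  # stable across runs, unlike hash()
        vec[bucket] += 1.0
    return [v / count for v in vec]
```

Normalizing by the number of n-grams makes files of very different sizes comparable, at the cost of hiding absolute size, which could be added back as a separate feature.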
An Unsupervised Method for Timely Exfiltration Attack Discovery
Feng Yun, Liu Baoxu, Zhang Jinli, Wang Xutong, Liu Chaoge, Shen Mingzhe, Liu Qixu
2021, 58(5):  995-1005.  doi:10.7544/issn1000-1239.2021.20200902
In recent years, exfiltration attacks have become one of the most severe threats to cyber security. In addition to malware, human beings, especially insiders, can also carry out such attacks. The anomalous digital footprint left by an insider can be minuscule, which makes timely attack discovery, as well as the analysis and reconstruction of malicious operations, challenging in real-world scenarios. To address this challenge, we propose a method that treats each user as an independent subject and detects anomalies by measuring the deviation of current behavior from the user's normal historical behavior. We take one session as the unit of analysis to achieve timely attack discovery, and we use unsupervised algorithms to avoid the need for large amounts of labeled data, which makes the method more practical in real-world scenarios. For each anomalous session detected by the algorithm, we further construct event chains. On the one hand, these restore the specific exfiltration operations; on the other hand, matching them against exfiltration attack patterns determines the attack more accurately. Experiments on the public CMU CERT insider threat dataset show an accuracy above 99%, with no false negatives and a low false-positive rate, demonstrating that our method is effective and superior.
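The abstract does not name the unsupervised algorithm, so the sketch below swaps in a simple per-user z-score as a stand-in: each session's features are compared against that user's own history, and the session is flagged if any feature deviates by more than a hypothetical threshold of 3 standard deviations. Feature meanings (for example files copied to USB, after-hours logons, MB uploaded) are illustrative assumptions.

```python
import statistics
from typing import Dict, List, Sequence


def session_anomaly_score(history: Sequence[Sequence[float]], session: Sequence[float]) -> float:
    """Largest absolute z-score of the current session's features
    relative to this user's own historical sessions."""
    score = 0.0
    for j, value in enumerate(session):
        column = [h[j] for h in history]
        mu = statistics.mean(column)
        sigma = statistics.pstdev(column) or 1e-9  # avoid division by zero
        score = max(score, abs(value - mu) / sigma)
    return score


def flag_sessions(per_user_history: Dict[str, List[List[float]]],
                  current_sessions: Dict[str, List[float]],
                  threshold: float = 3.0) -> Dict[str, bool]:
    """Each user is judged only against their own baseline."""
    return {user: session_anomaly_score(per_user_history[user], s) > threshold
            for user, s in current_sessions.items()}
```

Scoring per session rather than per day or per week is what allows an alert as soon as one suspicious session ends.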
Privacy-Preserving Network Attack Provenance Based on Graph Convolutional Neural Network
Li Teng, Qiao Wei, Zhang Jiawei, Gao Yiyang, Wang Shenao, Shen Yulong, Ma Jianfeng
2021, 58(5):  1006-1020.  doi:10.7544/issn1000-1239.2021.20200942
APT (advanced persistent threat) attacks have a long incubation period and a clear purpose. They can break through an enterprise's internal security defenses using Trojan variants, ransomware, and botnets. However, existing attack tracing methods target only a single source of log or traffic data, making it impossible to trace the complete process of multi-stage attacks. Because log relationships are complicated, the log relationship graph suffers from severe state explosion, making it difficult to classify and identify attacks accurately. At the same time, data privacy protection is rarely considered in approaches that use log and traffic data for attack tracing. To solve these problems, we propose an attack tracing method based on a graph convolutional network (GCN) with user data privacy protection. Supervised learning addresses the state explosion caused by multiple log relationship connections, and the Louvain community detection algorithm is optimized to improve detection speed and accuracy. Moreover, a graph neural network is used to classify attacks effectively, and a privacy protection scheme leveraging the properties of CP-ABE (ciphertext-policy attribute-based encryption) enables secure sharing of log data in the public cloud. In this paper, four APT attack detection methods are reproduced to compare detection speed and efficiency. Experimental results show that the detection time of our method can be reduced by up to 90%, and its accuracy can reach 92%.
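The classification step relies on graph convolution. A minimal, dependency-free sketch of one standard GCN layer, ReLU(D^-1/2 (A + I) D^-1/2 H W), is shown below; it illustrates only the propagation rule and omits the paper's log-graph construction, Louvain optimization, and CP-ABE scheme. In a provenance setting the nodes would be log entities, but that mapping is an assumption here.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def gcn_layer(adj: Matrix, h: Matrix, w: Matrix) -> Matrix:
    """One GCN layer: ReLU(D^-1/2 (A + I) D^-1/2 H W)."""
    n = len(adj)
    # Add self-loops so each node keeps its own features.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    # Symmetric normalization prevents high-degree nodes from dominating.
    norm = [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]
    out = matmul(matmul(norm, h), w)
    return [[max(0.0, v) for v in row] for row in out]
```

Stacking two such layers lets each node's representation depend on its two-hop neighborhood, which is how multi-stage relationships between log entities can influence the final attack label.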
A Malicious Code Static Detection Framework Based on Multi-Feature Ensemble Learning
Yang Wang, Gao Mingzhe, Jiang Ting
2021, 58(5):  1021-1034.  doi:10.7544/issn1000-1239.2021.20200912
With the popularity of the Internet and the rapid development of 5G communication technology, threats to cyberspace are increasing, especially the exponential growth in the number of malware samples and the explosive growth in the number of variants within malware families. Traditional signature-based malware detection is too slow to handle the millions of new malware samples that emerge every day, while the false positive and false negative rates of general machine learning classifiers remain too high. At the same time, packing, obfuscation and other adversarial techniques make the situation even more difficult. We therefore propose a static malware detection framework based on multi-feature ensemble learning. We extract non-PE (Portable Executable) structure features, visible string features, sink assembly code sequence features, PE structure features and function call relationship features from the malware, construct a model matching each feature, and use the Bagging and Stacking ensemble algorithms to reduce the risk of overfitting. Finally, we adopt a weighted voting algorithm to further aggregate the outputs of the ensemble models. Experimental results show that the detection accuracy of the multi-feature multi-model aggregation algorithm reaches 96.99%, demonstrating better malware identification ability than other static detection methods and a higher recognition rate for malware that uses packing or obfuscation.
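The final aggregation step can be sketched as weighted hard voting over the per-feature models. The abstract does not say how the weights are chosen; this sketch simply assumes one weight per model (for example derived from validation accuracy), and the labels are illustrative.

```python
from typing import Dict, Sequence


def weighted_vote(predictions: Sequence[str], weights: Sequence[float]) -> str:
    """Return the label with the largest total weight across models."""
    if len(predictions) != len(weights):
        raise ValueError("one weight per model is required")
    totals: Dict[str, float] = {}
    for label, weight in zip(predictions, weights):
        totals[label] = totals.get(label, 0.0) + weight
    return max(totals, key=totals.get)
```

With weights, one strong model can outvote several weaker ones; with equal weights this reduces to plain majority voting.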
Digital Currency Features Oriented Fine-Grained Code Injection Attack Detection
Sun Cong, Li Zhankui, Chen Liang, Ma Jianfeng, Qiao Xinbo
2021, 58(5):  1035-1044.  doi:10.7544/issn1000-1239.2021.20200937
Digital currencies have developed rapidly and become a critical part of our payment system. Consequently, digital currency applications and platforms and their payment services are extensively exposed to exploitation by malware. In a typical scenario, modern ransomware uses digital currencies as the medium of payment. State-of-the-art code injection attack detection methods rarely consider such digital currency-related memory features, and thus can hardly identify the malicious behaviors of ransomware. To mitigate this issue, we propose a fine-grained memory forensics scheme that facilitates the detection of host-based code injection attacks and can identify ransomware. We capture the digital currency-related memory features exhibited while the victim is being induced to pay. We incorporate these memory features into a set of general memory features and implement a fine-grained detection system for code injection attacks. Experimental results show that the new memory forensics scheme effectively improves the performance of the state-of-the-art detection system on different metrics. Meanwhile, our approach enables host-based code injection attack detection systems to capture the behaviors of ransomware precisely. Moreover, the newly proposed memory features can be extracted efficiently, and our detection system is capable of detecting unknown malware families.
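The abstract does not list the exact digital currency-related memory features. Purely as a hypothetical illustration of what one such feature could look like, the sketch below counts strings shaped like legacy Bitcoin addresses and a few payment-related keywords in a raw memory dump. The regular expression is a shape heuristic only (no Base58Check checksum validation), and the keyword list is an assumption.

```python
import re
from typing import Dict, List, Union

# Legacy Base58 Bitcoin address shape: starts with '1' or '3', then 25-34
# Base58 characters (no 0, O, I, l). Heuristic only: no checksum check.
BTC_ADDR = re.compile(rb"(?<![A-Za-z0-9])[13][a-km-zA-HJ-NP-Z1-9]{25,34}(?![A-Za-z0-9])")
# Hypothetical keyword list for payment-related strings.
KEYWORDS = re.compile(rb"bitcoin|wallet|ransom", re.IGNORECASE)


def currency_memory_features(dump: bytes) -> Dict[str, Union[int, List[bytes]]]:
    """Extract simple currency-related indicators from a memory dump."""
    addresses: List[bytes] = BTC_ADDR.findall(dump)
    return {
        "btc_address_count": len(addresses),
        "keyword_count": len(KEYWORDS.findall(dump)),
        "addresses": addresses,
    }
```

Counts like these would be appended to the general memory features rather than used alone, since legitimate wallet software contains the same strings.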
RAIN: A Lightweight Block Cipher Towards Software, Hardware and Threshold Implementations
Cao Meichun, Zhang Wenying, Chen Yanqin, Xing Zhaohui, Wu Lei
2021, 58(5):  1045-1055.  doi:10.7544/issn1000-1239.2021.20200933
The lightweight block cipher RAIN proposed in this paper is based on the SPN (substitution permutation network) structure widely used in international block cipher design. It provides a strong avalanche effect by iterating an S-box confusion layer and a diffusion layer, which not only guarantees strong security but also accommodates both software and hardware implementation. The algorithm supports 64-bit and 128-bit blocks. The two block lengths are implemented using the same round function structure, which keeps the design simple and elegant. The confusion layer uses a 4-bit S-box, chosen with both its security and its software and hardware implementation in mind. The hybrid operations in the diffusion layer provide high implementation performance. We evaluate the algorithm with differential analysis, impossible differential analysis, integral attacks and invariant subspace analysis, combining some of the latest analysis methods with MILP-based automated search. RAIN resists the existing analysis methods and has a large security margin. It is efficient in both software and hardware, with excellent performance on PC and ARM platforms and on FPGA hardware. The S-box can be converted into basic logic operations, so the cost of resisting side-channel attacks is low.
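Differential analysis of a 4-bit S-box starts from its difference distribution table (DDT). RAIN's own S-box is not given in this listing, so the sketch below uses the PRESENT S-box as a stand-in; its maximum non-trivial DDT entry (differential uniformity) is 4, the optimal value for a 4-bit bijective S-box.

```python
from typing import List

# Stand-in 4-bit S-box: the PRESENT S-box (RAIN's S-box is not reproduced here).
SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD, 0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]


def ddt(sbox: List[int]) -> List[List[int]]:
    """Difference distribution table: entry [a][b] counts inputs x with
    S(x) ^ S(x ^ a) == b."""
    size = len(sbox)
    table = [[0] * size for _ in range(size)]
    for a in range(size):
        for x in range(size):
            table[a][sbox[x] ^ sbox[x ^ a]] += 1
    return table


def differential_uniformity(sbox: List[int]) -> int:
    """Largest DDT entry over non-zero input differences."""
    table = ddt(sbox)
    return max(table[a][b] for a in range(1, len(sbox)) for b in range(len(sbox)))
```

A uniformity of 4 means every single-S-box differential holds with probability at most 4/16 = 2^-2; MILP-based searches chain such bounds across rounds to bound whole-cipher differential trails.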
Security Analysis of SIMON32/64 Based on Deep Learning
Wang Huijiao, Cong Peng, Jiang Hua, Wei Yongzhuang
2021, 58(5):  1056-1064.  doi:10.7544/issn1000-1239.2021.20200900
With the rapid development of the Internet of Things, lightweight block ciphers provide a solid foundation for data security in various resource-constrained environments. The security analysis of lightweight block ciphers is becoming increasingly automated and intelligent, and applying deep learning to it has become a new research hotspot. In this paper, neural networks are applied to the security analysis of SIMON32/64, a lightweight block cipher released by the National Security Agency (NSA) in 2013. A feedforward neural network and a convolutional neural network are used to simulate the single-input-difference, multiple-output-difference case of multiple differential cryptanalysis. Deep learning distinguishers for 6-round (and up to 9-round) reduced SIMON32/64 are designed, and the advantages and disadvantages of the two network structures under different conditions are investigated. A candidate key sieving method for 9-round reduced SIMON32/64 is also presented, obtained by extending the 7-round feedforward and convolutional distinguishers by one round forward and one round backward. Experimental results show that 65535 candidate keys were reduced to 675 using only 128 chosen plaintext pairs. Compared with traditional differential distinguishers for reduced SIMON32/64, the new deep learning distinguishers notably reduce both time complexity and data complexity.
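Neural distinguishers are trained on ciphertext pairs produced by round-reduced SIMON32/64. The sketch below implements the cipher (16-bit words, 4-word key, z0 constant sequence) with a configurable number of rounds, plus a hypothetical `make_samples` generator that labels pairs with a fixed input difference as 1 and random pairs as 0. The input difference (0x0000, 0x0040) is an assumption for illustration; the abstract does not state which difference the authors used.

```python
import random
from typing import List, Tuple

MASK = 0xFFFF
Z0 = "11111010001001010110000111001101111101000100101011000011100110"


def rol(x: int, r: int) -> int:
    return ((x << r) | (x >> (16 - r))) & MASK


def ror(x: int, r: int) -> int:
    return ((x >> r) | (x << (16 - r))) & MASK


def expand_key(key: Tuple[int, int, int, int], rounds: int = 32) -> List[int]:
    """key = (k0, k1, k2, k3), least significant word first."""
    k = list(key)
    for i in range(4, rounds):
        tmp = ror(k[i - 1], 3) ^ k[i - 3]
        tmp ^= ror(tmp, 1)
        # c = 0xFFFC, combined with one bit of the z0 sequence.
        k.append((MASK ^ 3) ^ int(Z0[(i - 4) % 62]) ^ k[i - 4] ^ tmp)
    return k[:rounds]


def f(x: int) -> int:
    return (rol(x, 1) & rol(x, 8)) ^ rol(x, 2)


def encrypt(x: int, y: int, round_keys: List[int]) -> Tuple[int, int]:
    for rk in round_keys:
        x, y = y ^ f(x) ^ rk, x
    return x, y


def decrypt(x: int, y: int, round_keys: List[int]) -> Tuple[int, int]:
    for rk in reversed(round_keys):
        x, y = y, x ^ f(y) ^ rk
    return x, y


def make_samples(n: int, rounds: int, diff: Tuple[int, int] = (0x0000, 0x0040), seed: int = 0):
    """Label 1: ciphertext pair from plaintexts with the fixed input difference;
    label 0: ciphertext pair from two independent random plaintexts."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        rks = expand_key(tuple(rng.getrandbits(16) for _ in range(4)), rounds)
        x0, y0 = rng.getrandbits(16), rng.getrandbits(16)
        label = rng.getrandbits(1)
        if label:
            x1, y1 = x0 ^ diff[0], y0 ^ diff[1]
        else:
            x1, y1 = rng.getrandbits(16), rng.getrandbits(16)
        samples.append((encrypt(x0, y0, rks) + encrypt(x1, y1, rks), label))
    return samples
```

A fresh random key per sample forces the network to learn a property of the difference propagation rather than of any one key; a distinguisher that beats 50% accuracy on held-out samples indicates non-random behavior at that round count.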
Secret Image Sharing Schemes Based on Region Convolution Neural Network
Liu Yanxiao, Wu Ping, Sun Qindong
2021, 58(5):  1065-1074.  doi:10.7544/issn1000-1239.2021.20200898
Digital images have become an important information carrier in the era of rapid network development, and protecting image information has become an important research topic in the security field. Secret image sharing is a threshold-based approach that protects the confidential information in an image among multiple users. The scheme encrypts the secret image into several shadow images according to a threshold and distributes them to different users. When the number of participating users reaches the threshold, the original image can be reconstructed; otherwise, the users cannot obtain any information about the original image. Classification and recognition of image information are the premise and basis of secret image sharing, and CNNs (convolutional neural networks) achieve higher accuracy and faster speed in image classification and recognition. In this paper, we combine CNN-based image recognition and classification with secret image sharing, applying deep learning tools to information protection. First, we adopt the Faster RCNN (region convolutional neural network) model to segment a secret image into multiple regions, each with a different importance level. We then construct progressive secret image sharing and secret image sharing with essential shadows, in which regions with higher importance levels require a higher reconstruction threshold; this makes the scheme suitable for more application scenarios. Compared with traditional image recognition methods based on hand-crafted features, Faster RCNN greatly improves the efficiency of image classification and recognition, further enhancing the application value of secret image sharing.
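The threshold property described above comes from polynomial secret sharing. The sketch below shows the classic (k, n) Shamir-style construction for a single pixel over GF(251), a common choice in polynomial-based image sharing (pixel values are assumed to lie in 0..250). It is a generic building block, not the paper's progressive or essential-shadow construction; giving more important regions a larger k would be one way to realize region-dependent thresholds, but that mapping is an assumption.

```python
import random
from typing import List, Tuple

P = 251  # largest prime below 256


def share_pixel(secret: int, k: int, n: int, rng: random.Random) -> List[Tuple[int, int]]:
    """Split one pixel value into n shares; any k of them reconstruct it."""
    if not 0 <= secret < P:
        raise ValueError("pixel must be in 0..250")
    # Random polynomial of degree k-1 whose constant term is the secret.
    coeffs = [secret] + [rng.randrange(P) for _ in range(k - 1)]
    return [(x, sum(c * pow(x, e, P) for e, c in enumerate(coeffs)) % P)
            for x in range(1, n + 1)]


def reconstruct_pixel(shares: List[Tuple[int, int]]) -> int:
    """Lagrange interpolation at x = 0 over GF(P)."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = num * (-xj) % P
                den = den * (xi - xj) % P
        secret = (secret + yi * num * pow(den, -1, P)) % P  # pow(..., -1, P) needs Python 3.8+
    return secret
```

With fewer than k shares every candidate pixel value remains equally consistent with the observed shares, which is the "no information" property the abstract describes.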
GRD-GNN: Graph Reconstruction Defense for Graph Neural Network
Chen Jinyin, Huang Guohan, Zhang Dunjie, Zhang Xuhong, Ji Shouling
2021, 58(5):  1075-1091.  doi:10.7544/issn1000-1239.2021.20200935
In recent years, graph neural networks (GNNs) have been widely applied in daily life, in areas such as e-commerce, social media and biology, thanks to their satisfying performance in graph representation learning. However, research has shown that GNNs are vulnerable to carefully crafted adversarial attacks that cause the model to fail. It is therefore essential to improve the robustness of GNNs. Several defense methods have been proposed for this purpose; however, reducing the success rate of adversarial attacks while preserving the GNN's performance on its main task remains a challenge. By observing various adversarial examples, we find that node pairs connected by adversarial edges have lower structural similarity and lower node feature similarity than those connected by clean edges. Based on this observation, we propose a graph reconstruction defense for graph neural networks named GRD-GNN. Considering both graph structure and node features, GRD-GNN uses both the number of common neighbors and node similarity to guide graph reconstruction. GRD-GNN not only removes adversarial edges, but also adds edges that benefit GNN performance, enhancing the graph structure. Finally, comprehensive experiments on three real-world datasets verify the state-of-the-art defense performance of GRD-GNN compared with baselines. The paper also explains the experimental results and analyzes the effectiveness of the method.
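The observation that adversarial edges join structurally and semantically dissimilar nodes suggests a simple pruning rule. The abstract does not give GRD-GNN's exact scoring, so this sketch assumes a hypothetical mix of neighbor-set Jaccard similarity and feature cosine similarity, with made-up `alpha` and `threshold`, and shows only the edge-removal half (not the edge-adding step).

```python
import math
from typing import Dict, List, Set, Tuple


def cosine(u: List[float], v: List[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    nu, nv = math.sqrt(sum(a * a for a in u)), math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def prune_edges(edges: List[Tuple[int, int]], features: Dict[int, List[float]],
                threshold: float = 0.3, alpha: float = 0.5) -> List[Tuple[int, int]]:
    """Keep edges whose combined structural (Jaccard of neighbor sets) and
    feature (cosine) similarity is at least `threshold`."""
    nbrs: Dict[int, Set[int]] = {}
    for u, v in edges:
        nbrs.setdefault(u, set()).add(v)
        nbrs.setdefault(v, set()).add(u)
    kept = []
    for u, v in edges:
        union = nbrs[u] | nbrs[v]
        jaccard = len(nbrs[u] & nbrs[v]) / len(union) if union else 0.0
        score = alpha * jaccard + (1 - alpha) * cosine(features[u], features[v])
        if score >= threshold:
            kept.append((u, v))
    return kept
```

Combining both signals matters: an edge inside a tight cluster can survive with dissimilar features, and an edge between similar nodes can survive with no shared neighbors, while an injected edge typically fails both tests.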
Evaluating Privacy Risks of Deep Learning Based General-Purpose Language Models
Pan Xudong, Zhang Mi, Yan Yifan, Lu Yifan, Yang Min
2021, 58(5):  1092-1105.  doi:10.7544/issn1000-1239.2021.20200908
Recently, a variety of Transformer-based GPLMs (general-purpose language models), including Google's BERT (bidirectional encoder representations from transformers), have been proposed in NLP (natural language processing). GPLMs help achieve state-of-the-art performance on a wide range of NLP tasks and are used in industrial applications. Despite their generality and promising performance, a recent work first showed that an attacker with access to the textual embeddings produced by GPLMs can infer with high accuracy whether the original text contains a specific keyword. However, that work has the following limitations. First, it only considers the occurrence of a single sensitive word as the sensitive information to steal, which is still far from a threatening privacy violation. Second, the attack relies on several rather strict assumptions about the attacker's capability, e.g., that the attacker knows which GPLM produced the victim's textual embeddings. Third, it only considers GPLMs designed for English texts. To address these limitations and complement that work, this paper proposes a more comprehensive privacy theft chain designed to explore whether general-purpose language models carry even greater privacy risks. Through experiments on 13 commercial GPLMs, we empirically show that an attacker can, step by step, infer the GPLM type behind a textual embedding with nearly 100% accuracy, then infer the text length with over 70% accuracy on average, and finally probe sensitive words that may occur in the original text, providing useful information for the attacker to ultimately reconstruct the sensitive semantics. In addition, this paper evaluates the privacy risks of three typical Chinese general-purpose language models. The results confirm that privacy risks also exist in Chinese general-purpose language models, calling for mitigation studies in the future.
An Evasion Algorithm to Fool Fingerprint Detector for Deep Neural Networks
Qian Yaguan, He Niannian, Guo Yankai, Wang Bin, Li Hui, Gu Zhaoquan, Zhang Xuhong, Wu Chunming
2021, 58(5):  1106-1117.  doi:10.7544/issn1000-1239.2021.20200903
With the successful application of deep neural networks in various fields, protecting the intellectual property of models has become more important. Since training a deep neural network requires large amounts of computing resources, labor, and time, some people attempt to build a local substitute model at lower cost by stealing the target model's parameters. To protect model owners' intellectual property, a model fingerprint matching method was recently proposed, which uses fingerprint examples near the model's decision boundary, together with their fingerprints, to check whether a model has been stolen. The advantage of this method is that it does not affect the performance of the model itself. However, this protection strategy has vulnerabilities, and we propose an evasion algorithm that successfully bypasses it. The key component of our evasion algorithm is a fingerprint-example detector named Fingerprint-GAN. Fingerprint-GAN first learns the feature representation and distribution of normal examples in a latent space. It then identifies fingerprint examples from the difference between their latent-space feature representations and those of normal examples. Finally, for the detected fingerprint examples, labels different from the model's predictions are returned to fool the target model owner's fingerprint matching. Extensive experiments are conducted on CIFAR-10 and CIFAR-100. The results show that the detection rate of this algorithm for fingerprint examples reaches 95% and 94%, respectively, while the model owner's fingerprint matching success rate is only 19%, demonstrating the unreliability of the model fingerprint matching protection method.
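The evasion strategy wraps the stolen model so that suspected fingerprint queries get a deliberately wrong answer. In the sketch below, Fingerprint-GAN's learned detector is replaced by a much simpler stand-in: a hypothetical `encode` function maps inputs to a latent space, and any input farther than `radius` from the centroid of normal latents is treated as a fingerprint example. This is an illustration of the wrapping logic only, not the paper's detector.

```python
import math
from typing import Callable, List, Sequence


def make_evasive_model(
    predict: Callable[[Sequence[float]], int],
    encode: Callable[[Sequence[float]], List[float]],
    normal_latents: List[List[float]],
    radius: float,
    num_classes: int,
) -> Callable[[Sequence[float]], int]:
    """Wrap a stolen model: inputs whose latent code lies far from the
    normal-data centroid are treated as fingerprint queries and answered
    with a label different from the model's real prediction."""
    dim = len(normal_latents[0])
    centroid = [sum(z[i] for z in normal_latents) / len(normal_latents) for i in range(dim)]

    def evasive_predict(x: Sequence[float]) -> int:
        z = encode(x)
        label = predict(x)
        if math.dist(z, centroid) > radius:   # looks like a fingerprint example
            return (label + 1) % num_classes  # deliberately mismatch
        return label

    return evasive_predict
```

Normal queries pass through unchanged, so the stolen model keeps its utility while fingerprint matches fail; the detector's false-positive rate determines how much accuracy is sacrificed.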
Content Type Based Jumping Probability Caching Mechanism in NDN
Guo Jiang, Wang Miao, Zhang Yujun
2021, 58(5):  1118-1128.  doi:10.7544/issn1000-1239.2021.20190871
In-network caching, which gives every network node a universal cache function, has become a key technology in NDN (named data networking) for efficient information access and for effectively reducing Internet backbone traffic. When users request information, any network node (e.g., a router) that caches the content can serve it directly upon receiving the request, improving response efficiency. However, NDN adopts a ubiquitous caching policy that caches content repeatedly and indiscriminately on the transmission path between the content provider and the user, resulting in data redundancy. To address this, we propose a content type based jumping probability caching mechanism for NDN. According to content features (e.g., delay requirements and bandwidth occupation), we first divide content into four types: dynamic, real-time, big data, and small data. We then build a hop-skipping cache policy that stores data on transmission nodes non-consecutively, reducing redundant caching in space. Based on the content types, we provide differentiated caching services, namely no caching, network-edge-based probability caching, network-sub-edge-based probability caching, and network-core-based probability caching, to further reduce redundancy and improve the efficiency of content retrieval. Experimental results confirm that the proposed caching mechanism reduces data redundancy and content retrieval latency.
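The mechanism combines two ideas: a content type decides where on the path (edge, sub-edge, or core) caching may happen and with what probability, and a hop-skip rule prevents copies on consecutive nodes. The abstract does not give the type-to-tier mapping or the probabilities, so the `POLICY` table below is entirely hypothetical; only the structure of the decision is illustrated.

```python
import random
from typing import Dict, List, Optional, Tuple

# Hypothetical mapping from content type to (target tier, caching probability).
POLICY: Dict[str, Tuple[Optional[str], float]] = {
    "dynamic": (None, 0.0),  # no caching
    "realtime": ("edge", 0.8),
    "small": ("sub_edge", 0.6),
    "big": ("core", 0.5),
}


def cache_decisions(path_tiers: List[str], content_type: str, rng: random.Random) -> List[bool]:
    """Decide, hop by hop along the delivery path, whether each node caches
    the content. After a node caches, the next hop is skipped so copies are
    never stored on two consecutive nodes."""
    tier, prob = POLICY[content_type]
    decisions = []
    skip_next = False
    for node_tier in path_tiers:
        cache = (not skip_next) and node_tier == tier and rng.random() < prob
        decisions.append(cache)
        skip_next = cache
    return decisions
```

Restricting each type to one tier and skipping the hop after every copy both reduce the number of duplicate copies along a path, which is the redundancy the ubiquitous default policy creates.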