ISSN 1000-1239 CN 11-1777/TP

Table of Contents

01 June 2018, Volume 55 Issue 6
Situation, Trends and Prospects of Deep Learning Applied to Cyberspace Security
Zhang Yuqing, Dong Ying, Liu Caiyun, Lei Kenan, Sun Hongyu
2018, 55(6):  1117-1142.  doi:10.7544/issn1000-1239.2018.20170649
Research on deep learning applied to cyberspace security has attracted increasing academic attention. This survey analyzes the current state and trends of that research in terms of classification algorithms, feature extraction, and learning performance. Deep learning is currently applied mainly to malware detection and intrusion detection, and the survey identifies the open problems of these applications together with candidate remedies: feature selection, which could be addressed by extracting features from raw data; self-adaptability, which could be addressed by an early-exit strategy that updates the model in real time; and interpretability, which could be addressed by influence functions that recover the correspondence between features and classification labels. The top 10 obstacles and opportunities in deep learning research are then summarized. On this basis, the top 10 obstacles and opportunities of deep learning applied to cyberspace security are proposed for the first time, and they fall into three categories. The first category is the intrinsic vulnerability of deep learning to adversarial attacks and privacy-theft attacks. The second category is sequence-model related problems, including program syntax analysis, program code generation, and long-term dependencies in sequence modeling. The third category is learning performance problems, including poor interpretability and traceability, poor self-adaptability and self-learning ability, false positives, and data imbalance. The main obstacles among the top 10 and their corresponding opportunities are analyzed. We also point out that applications using classification models are vulnerable to adversarial attacks, for which the most effective solution is adversarial training, and that collaborative deep learning applications are vulnerable to privacy-theft attacks, for which a prospective defense is the teacher-student model. Finally, future research trends of deep learning applied to cyberspace security are introduced.
Private Spatial Decomposition with Adaptive Grid
Zhang Xiaojian, Jin Kaizhong, Meng Xiaofeng
2018, 55(6):  1143-1156.  doi:10.7544/issn1000-1239.2018.20160963
Grid-based differentially private spatial decomposition has attracted considerable research attention in recent years. The trade-off among the size of the spatial data, data skew, and Laplace noise directly constrains the accuracy of the decomposition. Most state-of-the-art grid-based methods cannot efficiently accommodate these three constraints. To address this problem, this paper proposes a three-layer adaptive grid, called STAG, to decompose spatial data with differential privacy. STAG uses Bernoulli random sampling to draw the samples that serve as decomposition data in the second level. According to the different query granularities in the second level, cells whose counts are smaller than a given threshold are filtered out using the exponential mechanism and a high-pass filter. Cells whose counts exceed the threshold are split by the Down-Split method into fine-grained cells in the third level. The filtered cells are grouped by the Up-Merge method into coarse-grained cells in the first level, using an optimal grouping strategy. STAG is compared with existing methods such as UG, AG, Kd-Stand, and Kd-Hybrid on large-scale real datasets. The experimental results show that STAG outperforms its competitors and yields more accurate decompositions and range query results.
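The basic building block that grid methods of this kind start from can be sketched as follows. This is only a minimal illustration of a single-level uniform grid with Laplace noise, not the STAG method itself; the function names, the unit-square domain, and the sensitivity-1 assumption (one point affects one cell count by 1) are all assumptions made for the sketch.

```python
import random


def grid_counts(points, m):
    """Count points of the unit square [0, 1] x [0, 1] in an m-by-m grid."""
    counts = [[0] * m for _ in range(m)]
    for x, y in points:
        row = min(int(y * m), m - 1)  # clamp so that y == 1.0 stays in range
        col = min(int(x * m), m - 1)
        counts[row][col] += 1
    return counts


def laplace_noise(scale, rng):
    """Draw a Laplace(0, scale) sample as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_grid(points, m, epsilon, rng):
    """Perturb every cell count with Laplace noise of scale 1 / epsilon.

    Assumption: adding or removing one point changes exactly one cell
    count by 1, so the sensitivity of the whole count vector is 1.
    """
    scale = 1.0 / epsilon
    return [[c + laplace_noise(scale, rng) for c in row]
            for row in grid_counts(points, m)]


if __name__ == "__main__":
    rng = random.Random(0)
    pts = [(rng.random(), rng.random()) for _ in range(1000)]
    for row in noisy_grid(pts, 4, epsilon=1.0, rng=rng):
        print([round(v, 1) for v in row])
```

A finer grid reduces the error caused by data skew inside a cell but adds noise to more cells, which is the trade-off the abstract refers to.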
Covert Sequence Channel Based on HTTP/2 Protocol
Liu Zhengyi, Song Tian
2018, 55(6):  1157-1166.  doi:10.7544/issn1000-1239.2018.20170451
Covert communication technology offers effective privacy-preserving and secure data transmission services with covertness in both behavior and content. The covertness of existing covert storage channels has always been questioned. Covert timing channels, on the other hand, mainly use middle- and lower-layer network protocols as overt channels, which usually require complex encoding methods to reduce bit error rates. It is also hard to satisfy transmission rate requirements with current covert timing channels. In this paper, we present H2CSC, a new covert sequence channel over the next-generation application-layer protocol HTTP/2. H2CSC controls and manipulates the responses of an HTTP/2 Web server to its requests, forming a covert sequence from the stream IDs of the response frames. H2CSC then uses combinatorial coding methods to embed covert bits into these frame sequences. It takes advantage of the HTTP/2 protocol to provide channel reliability and security. We implement H2CSC as a function module in the widely used Apache Web server and examine the channel's effectiveness and robustness in a real system. We further evaluate the covertness of the channel with a detection method based on logistic regression of corrected conditional entropy. The experimental results show that H2CSC can provide a covert transmission rate of 574 bps with excellent robustness and covertness.
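One simple way to embed bits into the order of a set of stream IDs is to map an integer to a permutation through the factorial number system (Lehmer code). The sketch below illustrates that general idea only; the abstract does not specify H2CSC's actual coding, and the function names and example stream IDs are assumptions.

```python
from math import factorial


def encode(value, stream_ids):
    """Return an ordering of stream_ids that represents the integer value.

    n distinct IDs can represent factorial(n) different values.
    """
    ids = sorted(stream_ids)
    n = len(ids)
    if not 0 <= value < factorial(n):
        raise ValueError("value does not fit into this many stream IDs")
    order = []
    for remaining in range(n, 0, -1):
        index, value = divmod(value, factorial(remaining - 1))
        order.append(ids.pop(index))
    return order


def decode(order):
    """Recover the integer from the observed ordering of stream IDs."""
    ids = sorted(order)
    value = 0
    for position, stream_id in enumerate(order):
        index = ids.index(stream_id)
        value += index * factorial(len(order) - 1 - position)
        ids.pop(index)
    return value


if __name__ == "__main__":
    ids = [1, 3, 5, 7]          # hypothetical client-initiated stream IDs
    sequence = encode(13, ids)  # order in which responses would be sent
    print(sequence, decode(sequence))
```

With four streams there are 24 orderings, so one group of four responses can carry four full bits (16 of the 24 orderings).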
Trust-Based Multi-Objectives Task Assignment Model in Cloud Service System
Shu Jian, Liang Changyong, Xu Jian
2018, 55(6):  1167-1179.  doi:10.7544/issn1000-1239.2018.20170404
Cloud computing, together with other emerging information technologies, promotes the transformation and upgrading of the service industry. The new cloud service model brings convenience and agility through remote service and on-demand use. Meanwhile, it also expands the existing information security boundary, triggering new security problems. Trust mechanisms provide a good solution to the security problems of cloud services. This paper builds a service-oriented architecture for a task assignment system in the cloud environment. It introduces a trust mechanism into cloud services by measuring the trust requirements of tasks and the trust degree of service resources. Taking execution time, cost, and trust as optimization objectives, we propose a business-process-driven multi-objective task assignment model for the cloud service system. Several typical structures (sequence, parallel, parallel-AND, parallel-OR, parallel-XOR, simple loop, and combined loop) are introduced to represent the functions of business process structures. The three objectives are set to maintain the security and trustworthiness of the system while achieving high efficiency and low cost. An improved strength Pareto genetic algorithm 2 (SPGA2) with a local search strategy is proposed to improve the efficiency of searching the solution space of the multi-objective task assignment problem. Finally, simulation experiments verify the effectiveness of the model and the superiority of the algorithm.
A Task Scheduling Method for Cloud Workflow Security
Wang Yawen, Guo Yunfei, Liu Wenyan, Hu Hongchao, Huo Shumin, Cheng Guozhen
2018, 55(6):  1180-1189.  doi:10.7544/issn1000-1239.2018.20170425
Most cloud workflow systems work in a static and homogeneous environment. This not only leads to fault propagation and reduces the fault tolerance of the system, but also makes it easier for attackers to acquire information about the system environment and launch accurate attacks. To solve this problem, a task scheduling method for cloud workflow security is proposed. Building on the multi-level task division mode of the workflow system, the method uses task scheduling to avoid consistent attacks on specific tasks. To effectively prevent attackers from probing the task execution environment, diverse operating system images are used to build heterogeneous task executors, and the task execution environment is then switched dynamically among these executors, ensuring the randomness of the system environment of the cloud workflow. Furthermore, to improve the security gain of the heterogeneous systems, the heterogeneity degrees of the executors are quantified and the quantization results are mapped to scheduling selection probabilities, ensuring a significant difference between the task execution environments before and after scheduling. In the experiments, three kinds of attack methods are simulated to test the security of the improved cloud workflow system, and the results demonstrate that the method can effectively improve the security of cloud workflow systems.
Revocable Attribute Based Encryption in Cloud Storage
Wang Guangbo, Liu Haitao, Wang Chenlu, Wang Pengcheng, Lian Lin, Hui Wentao
2018, 55(6):  1190-1200.  doi:10.7544/issn1000-1239.2018.20170063
Attribute-based encryption (ABE), which can achieve fine-grained access control, is increasingly widely used in cloud storage. However, dynamic user and attribute revocation remains an important challenge in the original scheme. To solve this problem, this paper proposes a ciphertext-policy ABE (CP-ABE) scheme that achieves attribute-level user revocation: if one attribute of a user is revoked, the normal access granted by the user's other legitimate attributes is not affected. When an attribute is revoked, the ciphertext corresponding to this attribute is updated based on the designed broadcast attribute-based encryption scheme, so that only users whose attributes satisfy the access policy and have not been revoked can carry out the key update and decrypt the ciphertext successfully. The scheme is proved secure in the standard model under the q-Parallel Bilinear Diffie-Hellman Exponent assumption and therefore has stronger security. In addition, the operations associated with attribute revocation are migrated to the cloud storage provider (CSP), which greatly reduces the computational load of the attribute authority (AA). Finally, performance analysis and experimental verification are carried out. The experimental results show that, compared with existing revocation schemes, our scheme increases the computational load of the CSP for attribute revocation but does not need the participation of the AA, which reduces the computational load of the AA. Moreover, the user does not need any additional parameters other than the private key to achieve attribute revocation, which greatly saves storage space.
Impossible Differential Attack of Block Cipher ARIA
Xie Gaoqi, Wei Hongru
2018, 55(6):  1201-1210.  doi:10.7544/issn1000-1239.2018.20170275
ARIA is a block cipher proposed by South Korean experts in 2003. Its design principle is similar to that of AES, and it offers relatively high security. ARIA was established as a Korean standard block cipher algorithm by the Korean Agency for Technology and Standards in 2004. Exploiting the features of the ARIA algorithm, a new impossible differential attack on 7-round ARIA is proposed by adding 2 rounds at the beginning and 1 round at the end. This attack requires a data complexity of about 2^{119} chosen plaintexts and a time complexity of about 2^{218} 7-round ARIA encryptions. Compared with previous impossible differential attacks, it reduces both the data complexity and the time complexity. Similarly, a new impossible differential attack on 8-round ARIA is proposed for the first time by adding 2 rounds at the beginning and 2 rounds at the end. This attack requires a data complexity of about 2^{207} chosen plaintexts and a time complexity of about 2^{346} 8-round ARIA encryptions. Since this exceeds the complexity of exhaustive search, the ARIA algorithm can be considered secure against this 8-round impossible differential path.
Reversible Data Hiding in Encrypted Image Based on Neighborhood Prediction Using XOR-Permutation Encryption
Yan Shu, Chen Fan, He Hongjie
2018, 55(6):  1211-1221.  doi:10.7544/issn1000-1239.2018.20170295
To improve the security of the encrypted image as well as the quality of the decrypted image, this paper proposes a neighborhood-prediction based method for reversible data hiding in encrypted images (RDH-EI), where the encrypted image is generated by XOR-permutation encryption. XOR-permutation is used to encrypt the original image, which reduces the risk of disclosing the encrypted content because both the statistical information and the location information of the original pixels are hidden. According to the data hiding key, some encrypted pixels are pseudo-randomly chosen for data hiding, and the secret information is embedded into the most significant bit (MSB) of the chosen pixels by bit replacement. In the image decryption phase, the possibly marked pixels are predicted and corrected by comparing each pixel with the average value of its neighborhood, which improves the quality of the decrypted image. In the image recovery phase, for each marked pixel located by the data hiding key, five neighborhood templates are designed to compute its fluctuation value, which is used to deduce whether its MSB has been changed. This paper discusses and analyzes the threshold selection, the prediction accuracy, and the security of the encrypted image content. Experimental results demonstrate that the proposed neighborhood prediction method correctly predicts at least 96% of the marked pixels. The proposed RDH-EI scheme not only enhances the security of the encrypted image content but also improves the quality of the decrypted image: the PSNR is about 5 to 23 dB higher than that of existing similar RDH-EI methods with the same embedded payload.
Deduplication on Encrypted Data Based on Zero-Knowledge Proof and Key Transmission
He Simeng, Yang Chao, Jiang Qi, Yang Li, Ma Jianfeng
2018, 55(6):  1222-1235.  doi:10.7544/issn1000-1239.2018.20170415
Data deduplication has been widely used in cloud storage servers to reduce bandwidth and save resources effectively. At present, the key chosen to encrypt a file in client-based deduplication is always the convergent key, so when parts of the file are revealed or the file has low entropy, convergent encryption cannot guarantee semantic security. As for proving ownership of the file, some protocols check only a certain number of file blocks in response to the server's challenge, so they cannot prove ownership of the whole file. In other words, this approach ensures ownership of the file only with a certain probability. In addition, some protocols rely on a third-party server to participate in the protocol. This requires a stronger security assumption and is not suitable for real-world scenarios. In this paper, we propose a scheme to deduplicate encrypted data stored in the cloud based on zero-knowledge proof and hidden credential retrieval. It uses zero-knowledge proof to prove ownership of the file, and hidden credential retrieval to transmit the encryption key to file owners who have proved their ownership. The results show that our protocol is more efficient and effective, and that it is easy to implement. It also improves the security of ownership authentication and provides a new key transmission method.
Adaptive App-DDoS Detection Method Based on Improved AP Algorithm
Liu Zihao, Zhang Bin, Zhu Ning, Tang Huilin
2018, 55(6):  1236-1246.  doi:10.7544/issn1000-1239.2018.20170124
Because collecting training samples is complicated and updating models is difficult in behavior-based application-layer DDoS detection methods, an adaptive App-DDoS detection method based on an improved affinity propagation (IAP) algorithm is proposed. First, to optimize the affinity propagation algorithm, we divide the dataset into several parts in advance using limited prior knowledge and merge similar clusters, which enhances the ability to process large amounts of data. In addition, a mechanism for cleaning abnormal clusters is introduced to avoid their interference with the detection results. Second, several user behavior attributes are defined to represent behavior features, and the improved AP algorithm is applied to cluster these attributes efficiently, which improves the detection rate for abnormal users. Then, by evaluating the quality of clusters with the Silhouette index in real time, a self-updating learning mechanism is put forward to support continued analysis of the distribution of normal users' attributes, which further reduces the false positive rate and increases the detection rate. The experimental results on the real dataset ClarkNet-HTTP show that the proposed method is more effective and more accurate than the conventional AP algorithm and the KMPCA algorithm, and that it can update clusters by itself during detection.
IC Design with Multiple Engines Running CBC Mode SM4 Algorithm
Fan Lingyan, Zhou Meng, Luo Jianjun, Liu Hailuan
2018, 55(6):  1247-1253.  doi:10.7544/issn1000-1239.2018.20170144
With the advantages of high speed, small size, light weight, strong shock resistance, and low power consumption, the solid state drive (SSD) has become the new generation of computer hard disk storage products. Hard disk information security concerns not only personal privacy and corporate passwords but also national security. To solve the information security problems of solid state drives, a hardware circuit is presented that implements the SM4 algorithm, which was promulgated by China's State Cryptography Administration Office of Security Commercial Code Administration. The circuit encrypts the data stored in a drive and thereby improves the security of the stored data. To keep up with the high-speed data stream of the SSD, the SM4 algorithm in cipher block chaining (CBC) mode has to run at a speed matched to the data throughput. A circuit structure with multiple SM4 engines operating in parallel is proposed, which accommodates the SM4 feedback loop delay and uses pipelining and round function combination under a 65 nm standard-cell process. After verification on an FPGA, the circuit has been implemented in a 65 nm semiconductor process. The evaluation results show that its sequential read speed is 5288 MBps and its sequential write speed is 4435 MBps, which meets the performance of the SATA III interface.
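The reason CBC encryption cannot simply be pipelined, and why several engines help, can be seen in a few lines of code. The sketch below is an illustration only: `toy_encrypt_block` is a deliberately insecure placeholder standing in for SM4, and the idea of giving each engine its own independent chain (for example one sector with its own IV) is an assumption, since the abstract does not describe how the circuit partitions the data.

```python
BLOCK = 16  # block size in bytes (SM4 also uses 128-bit blocks)


def xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))


def toy_encrypt_block(block, key):
    """Placeholder for SM4: XOR with the key, then rotate by one byte. NOT secure."""
    mixed = xor(block, key)
    return mixed[1:] + mixed[:1]


def toy_decrypt_block(block, key):
    mixed = block[-1:] + block[:-1]
    return xor(mixed, key)


def cbc_encrypt(blocks, key, iv):
    """Each ciphertext block feeds into the next one: a strictly serial loop."""
    prev, out = iv, []
    for plain in blocks:
        prev = toy_encrypt_block(xor(plain, prev), key)
        out.append(prev)
    return out


def cbc_decrypt(blocks, key, iv):
    prev, out = iv, []
    for cipher in blocks:
        out.append(xor(toy_decrypt_block(cipher, key), prev))
        prev = cipher
    return out


def interleaved_encrypt(streams, key, ivs):
    """Model several engines: each owns one independent CBC chain and
    advances it by one block per step, so the chains never wait on each other."""
    prevs = list(ivs)
    outs = [[] for _ in streams]
    for step in range(max((len(s) for s in streams), default=0)):
        for engine, stream in enumerate(streams):
            if step < len(stream):
                prevs[engine] = toy_encrypt_block(xor(stream[step], prevs[engine]), key)
                outs[engine].append(prevs[engine])
    return outs


if __name__ == "__main__":
    key, iv = bytes(range(BLOCK)), bytes(BLOCK)
    data = [bytes([i]) * BLOCK for i in range(4)]
    assert cbc_decrypt(cbc_encrypt(data, key, iv), key, iv) == data
    print("round trip ok")
```

Within one chain the feedback forces serial processing, so throughput scales with the number of independent chains that can be kept busy at the same time.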
An Adaptive Scale Control Method of Multiple UAVs for Persistent Surveillance
Jing Tian, Wang Tao, Wang Weiping, Li Xiaobo, Zhou Xin
2018, 55(6):  1254-1262.  doi:10.7544/issn1000-1239.2018.20170311
Persistent surveillance by swarms is an important application of multiple unmanned aerial vehicles (UAVs). With the increasing complexity of the environment and tasks in surveillance missions, the requirements for reconfigurability and flexibility of UAV swarms are also rising. For an adaptive and reconfigurable UAV swarm, the number of UAVs is one of the basic control factors. However, most studies of UAV swarm control focus on cooperative path planning for a given mission, while dynamic deployment of the number of UAVs in the swarm is neglected. In traditional UAV swarm surveillance designs, the size of the swarm is hard to adjust adaptively to match different surveillance environments and situations. To solve this problem, a "digital turf" variation model is proposed on the basis of regional information entropy. We then imitate the dynamic balancing mechanism of the turf-herbivore ecosystem and design a scale control method for the target region-UAV swarm system. On this basis, we study the biomes matrix and the equilibrium point when the surveillance system reaches a stable state, and discuss methods for adaptively adjusting the UAV swarm in different mission environments under different efficiency constraints. Finally, the existence of the equilibrium point and the convergence of the system are demonstrated by simulation.
Granularity Selections in Generalized Incomplete Multi-Granular Labeled Decision Systems
Wu Weizhi, Yang Li, Tan Anhui, Xu Youhong
2018, 55(6):  1263-1272.  doi:10.7544/issn1000-1239.2018.20170233
Granular computing (GrC), which imitates human thinking, is an approach to knowledge representation and data mining. Its basic computing units are called granules, and its objective is to establish effective computation models for dealing with large-scale complex data and information. The main directions in the study of granular computing are the construction, interpretation, and representation of granules, the selection of granularities, and the relations among granules, which are represented by granular IF-THEN rules with granular variables and their relevant granular values. In order to investigate knowledge acquisition in the sense of decision rules in incomplete information systems with multi-granular labels, the concept of generalized incomplete multi-granular labeled information systems is first introduced. Information granules with different labels of granulation, as well as their relationships in generalized incomplete multi-granular labeled information systems, are then represented. Lower and upper approximations of sets at different levels of granulation are further defined and their properties are presented. The concept of granularity label selections in generalized incomplete multi-granular labeled information systems is also proposed. It is shown that the collection of all granularity label selections forms a complete lattice. Finally, optimal granular label selections in incomplete multi-granular labeled decision tables are discussed. Belief and plausibility functions from the Dempster-Shafer theory of evidence are employed to characterize optimal granular label selections in consistent incomplete multi-granular labeled decision systems.
Solving Minimal Hitting Sets Method with SAT Based on DOEC Minimization
Wang Rongquan, Ouyang Dantong, Wang Yiyuan, Liu Siguang, Zhang Liming
2018, 55(6):  1273-1281.  doi:10.7544/issn1000-1239.2018.20160809
In model-based diagnosis, the minimal conflict sets are used to find all corresponding minimal hitting sets. Improving the efficiency of obtaining all minimal hitting sets is therefore an important issue. In this paper, we propose a new method called SAT-MHS, which is mainly based on a transformation method and the set-degree of element coverage (DOEC) strategy. The paper makes two main innovations. First, SAT-MHS transforms the hitting set problem into a SAT problem, which is a new direction for solving this problem. All the minimal conflict sets are expressed as clauses that form the input CNF of the SAT solver. Second, compared with the previous sub-superset detecting minimization (SSDM) strategy, the proposed DOEC strategy effectively reduces both the solution space and the number of iterations. In detail, the time consumption of DOEC grows with the number of minimal conflict sets and does not depend on the size of the given problem. Experimental results show that SAT-MHS outperforms the previous state-of-the-art method, with a speedup of 10 to 20 times, especially on some large instances. Moreover, extensive experiments demonstrate that the DOEC strategy performs better than SSDM, by up to about 40 times.
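For readers unfamiliar with the underlying problem, the definition of a minimal hitting set can be made concrete with a small brute-force enumeration. This is a baseline sketch for tiny instances only, not SAT-MHS or the DOEC strategy; the subset check inside the loop is the kind of sub-superset test that minimization strategies aim to make cheaper, and the function name is an assumption.

```python
from itertools import combinations


def minimal_hitting_sets(conflict_sets):
    """Enumerate all minimal hitting sets of a list of sets by brute force.

    A hitting set shares at least one element with every conflict set;
    it is minimal if no proper subset is also a hitting set.
    Candidates are tried in order of increasing size, so any candidate
    that contains an already found hitting set can be skipped.
    """
    universe = sorted(set().union(*conflict_sets))
    found = []
    for size in range(1, len(universe) + 1):
        for candidate in combinations(universe, size):
            candidate = frozenset(candidate)
            if any(smaller <= candidate for smaller in found):
                continue  # a subset already hits everything: not minimal
            if all(candidate & conflict for conflict in conflict_sets):
                found.append(candidate)
    return found


if __name__ == "__main__":
    conflicts = [{1, 2}, {2, 3}]
    print(minimal_hitting_sets(conflicts))
```

In a SAT formulation, each conflict set such as {1, 2} corresponds to one clause (x1 OR x2), and a hitting set corresponds to a satisfying assignment in which the chosen elements are set to true.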
Weibo Popularity Prediction Method Based on Propagation Acceleration
Zhu Hailong, Yun Xiaochun, Han Zhishuai
2018, 55(6):  1282-1293.  doi:10.7544/issn1000-1239.2018.20161057
Weibo popularity prediction attempts to forecast the future diffusion range of Weibo messages based on their propagation features at early stages. Existing methods mainly depend on the early popularity of messages and ignore the propagation trend at that time, which leads to poor prediction accuracy when these methods are applied to Weibo messages. To forecast Weibo popularity more accurately and conveniently, we propose a multiple linear regression model, UAPA (user activity propagation acceleration). First, we investigate the relationship between future popularity and the trend of Weibo diffusion and find that they are positively correlated. Based on this finding, we introduce the concept of propagation acceleration, which describes how quickly the spreading speed of a Weibo message changes, and build a prediction model based on propagation acceleration and popularity at early stages. Furthermore, we analyze the periodic activity of Weibo users and find that the number of retweets varies greatly at different times of the day, and that the popularity and propagation acceleration of messages also differ at different moments. In light of this finding, we optimize the prediction model with user activity. Finally, we compare the prediction accuracy of the UAPA model with representative popularity prediction methods on two real datasets, with 1,000,000 and 410,000 messages respectively, and discuss the influence of the parameter values of the UAPA model on prediction performance. Experiments show that the UAPA model is superior to existing methods on multiple indicators.
HL-DAQ: A Dynamic Adaptive Quantization Coding for Hash Learning
Zhao Liang, Wang Yongli, Du Zhongshu, Chen Guangsheng
2018, 55(6):  1294-1307.  doi:10.7544/issn1000-1239.2018.20170238
Existing binary coding methods for Hash learning usually learn a set of hyperplanes for data projection and then simply translate the projected data into binary codes according to the division made by each hyperplane. These methods all ignore the fact that the information may be distributed unevenly across the projection dimensions and that the range of data values may differ from one projection dimension to another. To solve this problem, we propose a dynamic adaptive quantization coding method called HL-DAQ, which dynamically allocates binary coding bits to each projection dimension according to the amount of information it carries. HL-DAQ maximizes the total information of all projections through dynamic programming in order to preserve the neighbor structure of the original data as much as possible. Experiments show that the proposed dynamic adaptive quantization coding method for Hash learning significantly improves on traditional quantization methods for Hash. They also show that the dynamic adaptive coding method together with the dynamic adaptive distance measurement method preserves the neighbor structure of the original data better than the original fixed-bit quantization coding methods and the original distance measurement methods such as Hamming distance.
MapReduce Back Propagation Algorithm Based on Structure Parallelism
Ren Gang, Deng Pan, Yang Chao, Wu Changmao
2018, 55(6):  1308-1319.  doi:10.7544/issn1000-1239.2018.20170024
The back propagation (BP) algorithm is a widely used learning algorithm for training multi-layer neural networks. The BP algorithm based on a Hadoop cluster and the MapReduce parallel programming model (MRBP) shows good performance on big data problems. However, it lacks the capability of fine-grained parallelism. Thus, when confronted with high-dimensional data and neural networks with a large number of nodes, its performance is relatively low. On the other hand, since users cannot control the communication among Hadoop computing nodes, the existing cluster-based structure parallel schemes cannot be directly applied to the MRBP algorithm. This paper proposes a structure parallelism based MRBP algorithm (SP-MRBP), which adopts a layer-wise parallelism, layer-wise ensemble (LPLE) strategy to implement structure parallel computing. We also derive analytical expressions for the proposed SP-MRBP algorithm and the classic MRBP algorithm, and obtain the time difference between the two algorithms as well as the optimal number of parallel structures for SP-MRBP. To the best of the authors' knowledge, this is the first time that a structure parallelism scheme has been introduced into the MRBP algorithm. The experimental results show that, compared with the classic MRBP algorithm, our algorithm achieves better processing efficiency on large neural networks.
A Collaborative Collusion Detection Method Based on Online Clustering
Sun Yong, Tan Wenan, Jin Ting, Zhou Liangguang
2018, 55(6):  1320-1332.  doi:10.7544/issn1000-1239.2018.20170231
Cloud computing has been successfully used to integrate various Web services to facilitate the automation of large-scale distributed applications. However, collusion groups submit numerous noisy ratings in service-oriented cloud applications. Collusion detection is one of the most important issues in emerging service-oriented cloud applications. Especially with the emergence of massive numbers of Web services, it remains a tough challenge to identify collaborative collusion groups in large-scale cloud systems using classical clustering algorithms in batch computing mode. To tackle this challenge, a novel online clustering-based detection method is proposed to find collaborative collusion groups efficiently and effectively. First, a mini-batch KMeans clustering method is employed to reduce the computational time for mining large-scale service data. Second, to improve the quality of the online clustering, a modified update rule is designed for the mini-batch KMeans clustering method, which adaptively optimizes the clustering weights with variance through an iterative procedure. Finally, based on measuring the behavior similarity and the group rating deviation of malicious peers, a binary decision diagram evaluation method is presented for detecting the bias and prestige of collusion groups in a visual manner. Theoretical analysis is conducted for validation. Extensive experiments and comparison with related work indicate that the proposed approach is feasible and effective.
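The standard mini-batch KMeans update that such a method builds on can be sketched as follows. This shows only the commonly used baseline rule with a per-center learning rate of 1 / (points assigned so far); the variance-weighted update rule proposed in the paper is not described in the abstract and is not reproduced here. Function names and the toy data are assumptions.

```python
import random


def nearest(point, centers):
    """Index of the center with the smallest squared Euclidean distance."""
    return min(
        range(len(centers)),
        key=lambda i: sum((p - c) ** 2 for p, c in zip(point, centers[i])),
    )


def minibatch_kmeans(points, init_centers, batch_size, iterations, rng):
    """Baseline mini-batch KMeans with a per-center learning rate.

    Each iteration samples a small batch, assigns every sampled point to
    its nearest center, and then moves that center toward the point with
    step size 1 / (number of points the center has absorbed so far).
    """
    centers = [list(c) for c in init_centers]
    counts = [0] * len(centers)
    for _ in range(iterations):
        batch = [rng.choice(points) for _ in range(batch_size)]
        assigned = [nearest(x, centers) for x in batch]  # cache assignments
        for x, k in zip(batch, assigned):
            counts[k] += 1
            rate = 1.0 / counts[k]
            centers[k] = [(1 - rate) * c + rate * xi for c, xi in zip(centers[k], x)]
    return centers


if __name__ == "__main__":
    rng = random.Random(0)
    data = [(0.0,), (0.2,), (-0.2,), (10.0,), (10.2,), (9.8,)]
    print(minibatch_kmeans(data, [(1.0,), (9.0,)], batch_size=3, iterations=50, rng=rng))
```

Because each iteration touches only `batch_size` points instead of the whole dataset, the cost per update is independent of the dataset size, which is what makes the approach attractive for online processing.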
General Data Quality Assessment Model and Ontological Implementation
Zhang Xiaoran, Yuan Man
2018, 55(6):  1333-1344.  doi:10.7544/issn1000-1239.2018.20160764
With the widespread use of data science technology in all kinds of fields, data, as an important enterprise asset, has shown more of its value and importance. Most enterprises develop data quality detection systems to solve their own data quality issues according to the characteristics of their industry. The assessment models of these systems have different features, and their definitions of data quality dimensions also differ. This paper attempts to define these models and data quality dimensions in a generic form, with the aim of providing a standard for enterprises developing data quality assessment systems. The work draws on an analysis of research achievements in the field by domestic and foreign scholars, combined with years of experience in developing data quality detection and assessment systems. First, a general mathematical model of data quality detection and assessment is proposed. Next, based on this model, ontology technology is adopted to define the transformation rules that map the general mathematical model to an ontology model. Furthermore, since most data is stored in relational databases, a relational database is taken as an example, and the data quality assessment ontology is extracted and constructed based on the proposed mathematical model and transformation rules. The model supports the definition of complex quality rules, is standardized, and can detect and assess data from different sources and in different formats. Finally, an application system is implemented for a PetroChina oil field development data quality assessment project in order to verify the correctness, soundness, rationality, and extensibility of the proposed model. Because the proposed data quality detection and assessment model is independent of any particular field, it is general.
Quality Constraints-Aware Service Composition Based on Task Granulating
Zhang Yiwen, Cui Guangming, Yan Yuanting, Zhao Shu, Zhang Yanping
2018, 55(6):  1345-1355.  doi:10.7544/issn1000-1239.2018.20170234
With the development of service computing, more and more resources are released and utilized as services. Competition between service providers grows increasingly fierce, so win-win cooperation between services becomes an inevitable trend. Moreover, taking the quality constraint correlations between businesses into account further complicates the service composition optimization problem. To solve this problem, this paper applies task granulation to the service composition business process and presents a quality constraint-aware service composition method based on task granulation (Tg-QcA) that considers the quality constraints between candidate services. First, the paper shows through theoretical analysis that each QoS aggregation has component mode and that the utility function of the multi-attribute service composition problem still has component mode, which guarantees the completeness of the task-granulation optimization method. Second, a quality constraint based model is built, and a task granulation partition is then made according to the subjection degree between tasks to decompose the original problem, which reduces the scale of the problem to be solved. Finally, computer simulation demonstrates that the algorithm and the model are feasible, efficient, and stable.