ISSN 1000-1239 CN 11-1777/TP

Table of Content

01 January 2015, Volume 52 Issue 1
Survey on Computer System Simulator
Liu Yuchen,Wang Jia,Chen Yunji,Jiao Shuai
2015, 52(1):  3-15.  doi:10.7544/issn1000-1239.2015.20140104
Asbtract ( 1873 )   HTML ( 14)   PDF (1696KB) ( 1558 )  
References | Related Articles | Metrics
Computer system simulator has long been a useful tool for researchers. It is applied in many different areas, from code design to software programming. In the development of simulators, performance has always been the main focus of researchers, and the improvement of performance will in return benefit the performance of real computers. A number of optimization work has been proposed in both serial and parallel simulations, such as threaded code, binary translation, FPGA accelerator, simulation separation techniques in serial simulation and the solution for the load balance, synchronization and communication in parallel simulation. In this paper, we provide several basic rules and structures that are used in common simulator design, and summarize recent studies of serial and parallel simulation and simulators. First, we introduce the current development of simulators, including current research results, technical problems and challenges. Then, we talk about the structure and the classification of current simulators. After that, the technique in serial simulators is introduced, and the optimization work in parallel simulation is also organized, according to the problems they tend to solve. Some mature simulators as well as simulation platforms are presented later in the paper. At last, potential issues and future work are also introduced.
Recent Advances in Bayesian Machine Learning
Zhu Jun,Hu Wenbo
2015, 52(1):  16-26.  doi:10.7544/issn1000-1239.2015.20140107
Asbtract ( 4443 )   HTML ( 127)   PDF (2137KB) ( 5003 )  
Related Articles | Metrics
With the fast growth of big data, statistical machine learning has attracted tremendous attention from both industry and academia, with many successful applications in vision, speech, natural language, and biology. In particular, the last decades have seen the fast development of Bayesian machine learning, which is now representing a very important class of techniques. In this article, we provide an overview of the recent advances in Bayesian machine learning, including the basics of Bayesian machine learning theory and methods, nonparametric Bayesian methods and inference algorithms, and regularized Bayesian inference. Finally, we also highlight the challenges and recent progress on large-scale Bayesian learning for big data, and discuss on some future directions.
History and Recent Developments of AVS Video Coding Standards
Ma Siwei
2015, 52(1):  27-37.  doi:10.7544/issn1000-1239.2015.20140106
Asbtract ( 2001 )   HTML ( 12)   PDF (3233KB) ( 1000 )  
Related Articles | Metrics
AVS(audio video coding standard) is the informal name of Work group for Digital Audiovideo Coding Standard of China, which was founded by the Science and Technology Department under former Ministry of Information Industry in June 2002, approved by Standardization Administration of China. The role of the group is to establish general technical standards for the compression, decoding, processing, and the representation of digital audio-video, thereby enabling digital audio-video equipment and systems with high-efficiency and economical coding/decoding technologies. After more than ten years, AVS has established a series of video coding standards, including AVS1, AVS+ and AVS2. AVS1 and AVS2 are named from the first and second stage work of AVS, or the first and second generation standard, and AVS+ was established specially for China high definition TV broadcasting specially. AVS1 and AVS+ have been finished and widely used in various applications so far, and AVS2 is still under developing and will be released soon. This paper provides an overview of the history and recent developments of AVS video coding standards, including the key tools used in AVS and the comparison with the state-of-the-art technology, e.g. HEVC/H265. Moreover, a brief discussion and conclusion on the future video coding are provided.
Topology Optimization for Minimal Volume in 3D Printing
Xu Wenpeng,Wang Weiming,Li Hang,Yang Zhouwang,Liu Xiuping,Liu Ligang
2015, 52(1):  38-44.  doi:10.7544/issn1000-1239.2015.20140108
Asbtract ( 1586 )   HTML ( 5)   PDF (2555KB) ( 937 )  
Related Articles | Metrics
Compared with the traditional products manufacturing pattern, the cost of 3D printing products is still relatively high. It is important to optimize model to reduce print material consumption and printing costs without sacrificing print quality of the object surface. To solve this problem, we present a topology optimization algorithm for minimal volume in 3D printing with traditional evolutionary structural optimization methods combined with Von Mises stress. The algorithm calculates Von Mises stress of the model to guide the evolution of the volume reducing, until the maximum Von Mises stress reaches the allowable stress value of the material. Furthermore, we introduce multi-resolution technology to accelerate optimization computing from the coarse tetrahedral meshes to fine meshes, which effectively improves the computational efficiency. Compared with other existing methods, the optimization results of our method can be more flexible and better reflect the load transfer path of model under the given force.
A Replay System for Performance Analysis of Multi-Threaded Programs
Zheng Long, Liao Xiaofei, Wu Song, Jin Hai
2015, 52(1):  45-55.  doi:10.7544/issn1000-1239.2015.20140105
Asbtract ( 1403 )   HTML ( 1)   PDF (2461KB) ( 793 )  
Related Articles | Metrics
In recent years, it is a hotspot for program analysis to detect performance bugs in multi-threaded applications. However, traditional record/replay systems focusing on concurrent anomalies have many limitations to tackle the issues of performance bugs, such as replay overhead and imprecision of replay-based execution time. To cope with the problems above, this paper proposes an improved replay system PerfPlay which can be used for the performance analysis of multi-threaded programs. To be specific, we first collect and analyze the requisite information for the program performance. Secondly, the different replay strategies are discussed and then we present a novel schedule-driven strategy to ensure the performance fidelity of replay system. Finally, we study the classical performance problem of “inter-thread unnecessary lock contention” under the framework of PerfPlay. Compared with the traditional replay strategies, our experimental results demonstrate the performance fidelity of PerfPlay. Through the case study, we find a few performance bugs in real-world and further verify the effectiveness of PerfPlay.
Feature Selection for Multi-Label Classification Based on Neighborhood Rough Sets
Duan Jie, Hu Qinghua,Zhang Lingjun,Qian Yuhua,Li Deyu
2015, 52(1):  56-65.  doi:10.7544/issn1000-1239.2015.20140544
Asbtract ( 2106 )   HTML ( 13)   PDF (3560KB) ( 1861 )  
Related Articles | Metrics
Multi-label classification is a kind of complex decision making tasks, where one object may be assigned with more than one decision label. This kind of tasks widely exist in text categorization, image recognition, gene function analysis. Multi-label classification is usually described with high-dimensional vectors, and some of the features are superfluous and irrelevant. A great number of feature selection algorithms have been developed for single-label classification to conquer the curse of dimensionality. However, as to multi-label classification, fewer researches have been reported for designing feature selection algorithms. In this work, we introduce rough sets to multi-label classification for constructing a feature selection algorithm. We redefine the lower approximation and dependency, and discuss the properties of the model. After that, we design a neighborhood rough sets based feature selection algorithm for multi-label classification. Experimental results show the effectiveness of the proposed algorithm.
An Indirect Branch Prediction for Interpreters
Huang Mingkai, Liu Xianhua, Tan Mingxing, Xie Zichao,Cheng Xu
2015, 52(1):  66-82.  doi:10.7544/issn1000-1239.2015.20130970
Asbtract ( 1115 )   HTML ( 3)   PDF (4938KB) ( 683 )  
Related Articles | Metrics
Interpreters are widely used in Java virtual machines, JavaScript execution engines and other managed runtime environments. Interpreters usually use indirect branches to dispatch bytecodes. Indirect branch mispredictions are becoming a critical limit of interpreters’ performance in modern multi-issue multi-stage pipeline microprocessors. This paper proposes a bytecode pointer guided indirect (BGI) branch prediction for interpreters. The key idea is to use interpreter-specific bytecode pointer value to distinguish different indirect branch occurrences. It can improve the prediction accuracy of indirect branches in interpreters, so as to improve the performance of interpreters. Microprocessors usually use BTB to predict branch target addresses. BTB can’t accurately predict the rapidly changing branch target addresses of indirect branches in interpreters because it stores only one target address for one indirect branch. Our technique requires software-hardware cooperation, where interpreter software is augmented with specialized hint instructions to indicate bytecode pointers, and then the branch predictor makes use of the bytecode pointer value to store different target addresses of one indirect branch in different entries of BTB at runtime, so as to improve the prediction accuracy. Experimental results show that it improves the performance by 347% for Java interpreter and by 8.3% for JavaScript interpreter over the most commonly used BTB prediction, and it improves the performance by 21.9% for Java interpreter over the specialized indirect branch predictor TTC.
A Multi-Root I/O Resource Pooling Method Based on Single-Root I/O Virtualization
Wang Zhan,Cao Zheng,Liu Xiaoli,Su Yong,Li Qiang,An Xuejun,Sun Ninghui
2015, 52(1):  83-93.  doi:10.7544/issn1000-1239.2015.20131182
Asbtract ( 1750 )   HTML ( 4)   PDF (3502KB) ( 721 )  
Related Articles | Metrics
Virtualization offers data center with efficient server consolidation and flexible application deployment, but it requires data center servers improve their I/O devices to get with the needs of virtualization, and to make up the performance degradation brought by device virtual sharing between virtual machines. These changes bring the redundancy of I/O devices for each server under current I/O architecture, increase the cost of data center infrastructure and add more I/O cables between servers. To solve these problems, we design and implement a SRIOV-based multi-root I/O resource pooling method. Through a hardware-based PCIe ID remapping and address remapping technology, virtual functions in the same SR-IOV I/O devices can be shared among different physical servers, which efficiently reduces the redundancy of I/O resources under virtualization environment. We also adopt a hotplug-based virtual I/O device allocation method to dynamically adjust resources between servers for increasing resource utilization. Experiments prove our design does can provides functions mentioned above and maintain server I/O performance as it using directly-attached devices.
Green Demands Oriented Data Allocation for Embedded Systems
He Yanxiang,Yu Tao,Chen Yong, Li Qingan,Jiang Nan, Xu Chao,Wen Weidong
2015, 52(1):  94-104.  doi:10.7544/issn1000-1239.2015.20140485
Asbtract ( 1056 )   HTML ( 4)   PDF (2151KB) ( 588 )  
Related Articles | Metrics
Green demands, such as energy efficiency and resource utilization, have become critical issues during the design of embedded systems. Data allocation, one of the most important back-end optimization methods of compiler, can largely influence the utilization of energy and resources. This paper proposes a data allocation approach to improve the effective utilization of resources and energy. First, a green evaluation model for data allocation is proposed in this paper. In this model, green indicators are proposed to represent both energy efficiency and resource utilization. Second, based on the evaluation model, an iterative-style multi-objective data allocation approach is proposed to reduce the energy consumption and to balance the resource utilization. This data allocation approach resorts to two common compilation optimization techniques, i.e., exchangeable instructions rearrangement and register reallocation, to improve the green indicators. In addition, an iterative framework is employed to synthesize the exchangeable instructions rearrangement and register reallocation techniques smoothly to improve the green indicators further. Simulation experiment results show that the proposed method can obtain about 23% improvement on average when GCC compiler is the baseline. Therefore, the proposed method can significantly improve the green indicators.
Multi-Indices Self-Approximate Optimal Power Consumption Control Model of GPU Clusters
Wang Haifeng, Chen Qingkui
2015, 52(1):  105-115.  doi:10.7544/issn1000-1239.2015.20131195
Asbtract ( 1071 )   HTML ( 2)   PDF (2205KB) ( 601 )  
Related Articles | Metrics
GPU clusters have become important high-performance parallel computing systems in the large-scale stream data field. In practice, the computing requires high computing speed, less power consumption and better reliability.So GPU clusters have three significantly performance indices restrainting each others that are computing speed, power consumption and reliability. In real-time computing phase, it needs to dynamically search the optimal point that is the tradeoff among computing speed, power consumption optimization and reliability. So the multi-indices optimization in GPU clusters power consumption control process is a challenging issue. To consider the three indices simultaneously, a comprehensive index is generated by maxinum entropy function that can combine them. Then an adaptable control model is built based on model prediction theory that can dynamically scale power consumption status with the workloads variation. This control model can cap the redundant energy consumption and control the power consumption of the GPU clusters under a specific ideal set point while guaranteeing computing speed and reliability. Compared with the control scheme without considering reliability, the results demonstrate that the proposed control scheme has better control stability and robustness and is very suitable to apply into GPU cluster power management projects to handle the real-time large-scale stream data.
Reachability Indexing for Large-Scale Graphs: Studies and Forecasts
Fu Lizhen,Meng Xiaofeng
2015, 52(1):  116-129.  doi:10.7544/issn1000-1239.2015.20131567
Asbtract ( 1641 )   HTML ( 5)   PDF (3240KB) ( 1476 )  
Related Articles | Metrics
With the rapid development of social networks, biology, ontology and other emerging areas, there are a lot of graph data in real applications. Reachability query is one of the fundamental queries in graphs. For small graphs, depth-first search (DFS) or reachability transitive closure can answer reachability query easily. However, as graphs become larger and larger, above two approaches are no longer feasible, because the query time of DFS is very long and the storage cost of reachability transitive closure is very high. Therefore, lots of reachability indices have been proposed. These approaches have been widely applied in many areas of computer science, such as software engineering, programming languages, distributed computing, social network analysis, biological network analysis, XML database, RDF databases, routing planning and other fields. In addition, reachability indices can also speed up other graph algorithms, such as the shortest path queries and sub-graph matching. In this survey, applications of reachability indexing schemes are introduced firstly. Secondly, according to supported graph size, supported graph type and supported query type, we provide taxonomy of existing works. Then, we provide a comparison of representative works. Finally, we discuss the disadvantages of existing scalable reachability indexing schemes and put forward some suggestions for future research works.
Discovering Extended Conditional Functional Dependencies
Liu Xianmin, Li Jianzhong
2015, 52(1):  130-140.  doi:10.7544/issn1000-1239.2015.20130691
Asbtract ( 1120 )   HTML ( 1)   PDF (3084KB) ( 767 )  
Related Articles | Metrics
eCFD (extended conditional functional dependency) is proposed as the extension of CFD (conditional functional dependency) for data cleaning. Compared with CFD, eCFD can take more patterns of values and catch more semantic information. However, there are only few works about eCFD. This paper focuses on the problem of eCFD discovering, whose counterpart of CFD has been studied very much. As we know, this paper is the first work about eCFD discovering. To avoid inconsistencies and remove redundancies, based on the definitions of strongly validated and weakly non-redundant eCFDs, formal definition of eCFD discovering problem is given and MeCFD method is proposed to solve this problem. MeCFD first generates all basic eCFDs which are weakly non-redundant and semantically equivalent to all strongly validated eCFDs, then constructs compound eCFDs through merging basic eCFDs. Searching candidate space in depth-first order makes MeCFD use only constant memory space to maintain data partitions. Efficient pruning strategies are proposed to improve the performance of MeCFD. Theoretical analysis shows the correctness of MeCFD. Experiments over real data sets show the good scalability of MeCFD and the effectiveness of pruning strategies and optimizing methods.
A Compound Native Object Model Based on the Strategy of Cross-Language Object Migration
Huang Yukun,Chen Rong,Pei Xilong,Cao Jing
2015, 52(1):  141-155.  doi:10.7544/issn1000-1239.2015.20131166
Asbtract ( 1252 )   HTML ( 2)   PDF (3418KB) ( 457 )  
Related Articles | Metrics
The Java native interface (JNI) mechanism, which is designed to handle the interaction between native code and Java code, is currently widely utilized to develop mobile applications. However, JNI is observed hardly from perfection in two points: on one hand, the overhead of invoking functions of JNI interfaces heavily affects programs’ runtime performance; on the other hand, the complexity of the JNI’s programming specification prevents the integration and reusing of third party native components in Java code. To solve these problems, a new strategy is advised to migrate objects between Java components and native components by injecting necessary information of native objects into high-level objects. Guided by this strategy, a model of compound native objects (CNO) is proposed to integrate a Java object and a native object into a compound object which share same metadata maintained by Java class objects. Therefore the CNO model could literally reduce the overheads for the time saving of data type conversions, and lessen down the programming burden of the bridging code. A prototype of the CNO model is implemented on the basis of the Dalvik virtual machine such that Java could reuse third-party components in a dynamical and efficient way. Experiments show that the CNO model outweighs JNI in better performance of accessing native methods.
A Fine-Grained Information Diffusion Model Based on Node Attributes and Content Features
Zhou Donghao,Han Wenbao,Wang Yongjun
2015, 52(1):  156-166.  doi:10.7544/issn1000-1239.2015.20130915
Asbtract ( 1697 )   HTML ( 10)   PDF (2017KB) ( 1519 )  
Related Articles | Metrics
With the development of online social networks, more and more people use Twitter or Weibo to propagate information or share opinions. Predicting the diffusion of information on social networks is a key problem for applications like opinion leader detection, public opinion monitoring or viral marketing. Although there has been extensive research on information diffusion model, what the factors that influence information diffusion are and how to characterize the process of information diffusion on social networks are still open problems. Traditional models pay more attention on network structures, and largely ignore important dimensions such as user attributes or information content features. In this paper, we extract features from multiple dimensions including node attributes and information contents, proposing a fine-grained information diffusion model based on these features. We use stochastic gradient descent method to learn the weights of these features in the proposed model, and then give quantitative analysis. Our model gives an insight into the extent on which these features can influence information diffusion on social network. Besides, given an initial state of information diffusion, our model can be used to predict the future diffusion process. Experiment results on Sina Weibo datasets show that the proposed model yields higher prediction precision than other widely used models like AsIC model or NetRate model.
Analysis and Verification of AADL Hierarchical Schedulers
Fu Ning,Du Chenglie,Li Jianliang,Liu Zhiqiang,Peng Han
2015, 52(1):  167-176.  doi:10.7544/issn1000-1239.2015.20130722
Asbtract ( 1201 )   HTML ( 1)   PDF (3097KB) ( 661 )  
Related Articles | Metrics
In the system based on a hierarchical scheduler, the processor is shared between several collaborative schedulers. Such schedulers are becoming more and more investigated and proposed in reallife applications. For example, the ARINC 653 international standard which defines programming interface for avionic real time operating systems provides such a kind of collaborative schedulers. This article focuses on the modeling and the schedulability analysis of hierarchical schedulers. We investigate the modeling of hierarchical schedulers with architecture analysis and design language (AADL). A model checking based method for analyzing the schedulability of AADL hierarchical schedulers is proposed. The AADL thread components and hierarchical schedulers are modeled as network of timed automatons. The schedulability is described as a set of temporal logic formulas. Then we use a model checker Uppaal to analyze and verify the schedulability of hierarchical schedulers. Our work shows that analyzing schedulability of AADL hierarchical schedulers by model checking is feasible. The method uses an exhaustive method to automate analyze the properties of a system by a model checker. Compared with related works, the proposed method produces more precise results.
Video Object Tracking Based on Appearance Models Learning
Zhang Huanlong,Hu Shiqiang,Yang Guosheng
2015, 52(1):  177-190.  doi:10.7544/issn1000-1239.2015.20130995
Asbtract ( 2087 )   HTML ( 7)   PDF (1777KB) ( 1661 )  
Related Articles | Metrics
Visual tracking is an active reasch topic in the field of computer vision and has been well studied in the last decades. A key component for achieving robust tracking is the tracker’s capability of updating its internal representation of tragets to capture the varying appearance. Although numberous approaches have been proposed, many challenging problems still remain in designing an effective model of the appearance of tracked objects. In recent years, the methods of appearance model associated with statistical learning have been promoting the study for video object tracking. To help reader swiftly learn the rencent advances and trends so as to easily grasp the key problems of visual object tracking based on appearance models learning, a detailed review of the existing appearance learning models is provided. Here, the mechanism of the tracking algorithm based on appearance model learning is introduced firstly. Then the state-of-the-art feature descriptors are analyzed to show their different performance. Meanwhile, the tracking progress is categorized into three main groups, and the character of representative methods in each group are compared and analyzed in detail. Finally, the current research on the tracking methods based appearance model learning is summarized and classified, and the further application and research trend is discussed.
Hidden Topic Variable Graphical Model Based on Deep Learning Framework
Wu Lei,Zhang Wensheng,Wang Jue
2015, 52(1):  191-199.  doi:10.7544/issn1000-1239.2015.20131113
Asbtract ( 1715 )   HTML ( 1)   PDF (2116KB) ( 1334 )  
Related Articles | Metrics
The hidden topic variable graphical model represents potential topics or potential topic changes by nodes. The current study of hidden topic variable graphical models suffers from the flaw that they can only extract single level topic nodes. This paper proposes a probabilistic graphical model based on the framework of deep learning to extract multi-level topic nodes. The model adds the preprocessing layer to the bottom of the hidden topic variable graphical model. The preprocessing layer used in the paper is the self-organizing maps (SOM) model. By introducing the SOM, the model can effectively extract different topic status with those extracted by the hidden topic variable graphical model. In addition, the hidden topic variable graphical model used in this paper is constructed by hidden Markov model (HMM) and conditional random field (CRF). In order to make up the short-distance dependency Markov property, we use the characteristic function defined by first-order logic. On this basis, we propose a new algorithm by hierarchically extracting topic status. Experimental results on both the international universal Amazon sentiment analysis dataset and the Tripadvisor sentiment analysis dataset show that the proposed algorithm improves the accuracy of sentiment analysis. And the new algorithm can mine more macroscopic topic distribution information and local topic information.
A Hierarchical Co-Clustering Algorithm for High-Order Heterogeneous Data
Yang Xinxin, Huang Shaobin
2015, 52(1):  200-210.  doi:10.7544/issn1000-1239.2015.20130493
Asbtract ( 1457 )   HTML ( 3)   PDF (3718KB) ( 789 )  
Related Articles | Metrics
The availability of high-order heterogeneous data represented with multiple features coming from heterogeneous domains is getting more and more common in real world application. High-order co-clustering algorithms can fuse multiple feature space information to improve clustering results effectivity, so recently it is becoming one of the hottest research topics. Most existing high-order co-clustering algorithms are non-hierarchical clustering algorithms. However, there are always hierarchical cluster structures hidden in high-order heterogeneous data. In order to mine the hidden patterns in datasets more effectively, we develop a high-order hierarchical co-clustering algorithm (HHCC). Goodman-Kruskal τ is used to measure the association of objects and features, which is an index measuring association of categorical variables. The objects which are strong association are partitioned into the same objects clusters, and simutaneously the features which are strong association are partitioned into the same features clusters too. HHCC algorithm uses Goodman-Kruskal τ to quantify the quality of clustering results of objects and features of every level. According to optimizing Goodman-Kruskal τ by a locally search approach, the number of clusters is automatically determined and clustering results of every hierarchy are obtained. The top-down strategy is adopted and a tree-like cluster structure is formed at last. Experimental results demonstrate that HHCC algorithm outperforms four classical homogeneous hierarchical algorithms and five previous high-order co-clustering algorithms.
Model and Algorithm of Quantum Neural Network Based on the Controlled Hadamard Gates
Li Panchi, Zhou Hongyan
2015, 52(1):  211-220.  doi:10.7544/issn1000-1239.2015.20131016
Asbtract ( 1360 )   HTML ( 16)   PDF (2910KB) ( 1031 )  
Related Articles | Metrics
To enhance the approximation capability of neural network, a quantum neural network model based on the controlled-Hadamard gates is proposed. This model takes a multi-dimensional discrete sequence as the input, which can be described by a matrix where the number of rows denotes the number of input nodes, and the number of columns denotes the length of discrete sequence. This model consists of three layers, the hidden layer consists of quantum neurons, and the output layer consists of common neurons. The quantum neuron consists of the quantum rotation gates and the multi-qubits controlled-Hadamard gates. Using the information feedback of target qubit from output to input in multi-qubits controlled-Hadamard gates, the overall memory of input sequence is realized. The output of quantum neuron is obtained from the controlled relationship between the control bits and target bit of controlled-Hadamard gates. The learning algorithm is designed in detail according to the basis principles of quantum computation. The characteristics of input sequence can be effectively obtained. The experimental results show that, when the input nodes and the length of the sequence satisfy certain relations, the proposed model is obviously superior to the common BP neural network.
Containing Reasoning and Its Conservative Extensionsin Description Logic FL0
Nie Dengguo,Kang Wangqiang,Cao Fasheng,Wang Ju
2015, 52(1):  221-228.  doi:10.7544/issn1000-1239.2015.20131135
Asbtract ( 1238 )   HTML ( 1)   PDF (859KB) ( 681 )  
Related Articles | Metrics
Conservative extension is an important property in the mathematical logic. Its notion plays acentral role in ontology design and integration. It can be used to formalize ontology refinements,safe mergings of two ontologies, and independent modules inside an ontology. Regarding reasoning support, the most basic task is to decide whether one ontology is a conservative extension of another. If this is not the case, the evolution of the ontology with the original ontology will not be able to maintain the same logical conclusion. In recent years, lightweight description logics (DLs) have gained increasing popularity. In fact ontology is definitely the structured knowledge base in description logic. As we know, knowledge is not always the same, so it needs to be extended as long as new improvement appears in this field. It is concerned that whether it is consistent with the primitive one after extension. The conservative extension of FL0 system is analyzed based on Lutzs’ work. Firstly the FL0 canonical model is constructed and the inclusion inference is reduced to the simulations between two FL0 canonical models. The complexity is pointed out to be polynomial based on the fact that the canonical models’ largest simulation is polynomial. After that the FL0 conservative extension algorithm is presented and its complexity is proved to be exponential.
Personal Privacy Protection in the Era of Big Data
Liu Yahui,Zhang Tieying,Jin Xiaolong,Cheng Xueqi
2015, 52(1):  229-247.  doi:10.7544/issn1000-1239.2015.20131340
Asbtract ( 4774 )   HTML ( 292)   PDF (2350KB) ( 3413 )  
Related Articles | Metrics
With the development of information technology, emerging services based on Web2.0 technologies such as blog, microblog, social networks, and the Internet of things produce various types of data at an unprecedented rate, while cloud computing provides a basic storage infrastructure for big data. All of these lead to the arrival of the big data era. Big data contains great value. Data become the most valuable wealth of the enterprise, but big data also brings grand challenges. Personal privacy protection is one of the major challenges of big data. People on the Internet leave many data footprint with cumulativity and relevance. Personal privacy information can be found by gathering data footprint in together. Malicious people use this information for fraud. It brings many trouble or economic loss to personal life. Therefore, the issue of personal privacy has caused extensive concern of the industry and academia. However, there is little work on the protection of personal privacy at present. Firstly, the basic concepts of big data privacy protection are introduced, and the challenges and research on personal privacy concern are discussed. Secondly, the related technology of privacy protection is described from the data layer, application layer and data display layer. Thirdly, several important aspects of the personal privacy laws and industry standards are probed in the era of big data. Finally, the further research direction of personal privacy protection is put forward.
Proving Method of Ownership of Encrypted Files in Cloud De-Duplication Deletion
Yang Chao, Zhang Junwei, Dong Xuewen, Ma Jianfeng
2015, 52(1):  248-258.  doi:10.7544/issn1000-1239.2015.20130544
Asbtract ( 1617 )   HTML ( 0)   PDF (1732KB) ( 989 )  
Related Articles | Metrics
Abstract As the rapid adoption of cloud storage services, a new technology of client-side deduplication is proposed to save the bandwidth of uploading copies of existing files to the server. This promising technology, however, has been recently found being vulnerable to a new kind of attack, in which by learning just a small piece of information about the file, namely its Hash value, an attacker is able to get the entire file from the server. To solve the problems mentioned above, we propose a cryptographically secure and efficient scheme to support cross-user client side deduplication over encrypted file. The new scheme utilizes the technique of spot checking in which the client only need to access small portions of the original file, dynamic coefficients, randomly chosen indices of the original files and a subtle approach to distribute the file encrypting key among clients to satisfy security requirements. Extensive security analysis shows that the proposed scheme can generate provable ownership of the encrypted file (POEF) with the presence of the curious server, and maintain a high detection probability of the client misbehavior. Both performance analysis and simulation results demonstrate that our proposed scheme is much more efficient than the existing schemes, especially in reducing the burden of the client.