Advanced Search

    2014  Vol. 51  No. 12

    Abstract:
    The two-tiered wireless sensor networks have become the research hotspot because of its scalability and long lifetime. Top-k query is an important query type but most Top-k query technologies cannot perform precise query. A Top-k query asks for data items whose numeric attributes are among the k highest, where k is an application-dependent parameter. The privacy preservation is important for Top-k query in a hostile environment because sensitive information may leak from compromised nodes. The integrity and authenticity of the Top-k query results should be verified because the adversary can instruct a compromised master node to delete or modify data in response to Top-k queries. This paper presents a precise query algorithm called PI-TQ (privacy-preserving integrity-verification Top-k query) which provides both privacy preservation and integrity verification. The algorithm uses a two-step query method to reduce the communication traffic between nodes and sinks. To ensure the privacy and correctness of the query results, the perturbation algorithm is utilized to protect the privacy of sensitive data. The neighbor verification method achieves integrity verification by using probability space. The simulation results show that PI-TQ algorithm can greatly reduce the computational cost and traffic consumption compared with other algorithms. It can also guarantee accuracy, privacy and integrity of the query results.
    Abstract:
    With the development of Internet, especially, the occurrence of the concept of cloud computing, there is an increasing demand for the search and process of encrypted data, which makes the fully homomorphic encryption become more and more important. The concept of fully homomorphic encryption was first introduced by Rivest et al. in 1970s. How to construct such schemes is a hard problem for cryptographers. Until 2009, Gentry presented the first fully homomorphic schemes based on ideal lattice, which is a breakthrough in this field. After that, many cryptographers have done some interesting work which promote the fully homomorphric schemes to be practical in future. Fully homomorphric encryption becomes a very trendy topic in cryptography. This paper discusses the main progress on fully homomorphric schemes including the first homomorphic encryption introduced by Gentry and its optimizations, as well as the fully homorphric schemes based on integer and learning with errors problem (LWE problem). Then, the general application framework of fully homomorphic scheme is provided. Cloud computing, electronic voting and digital watermarking are taken as examples to introduce the significant value of application of fully homomorphric encryption.
    Abstract:
    To timely prevent the diffusion of tampered information and make full use of authentic information, a fragile watermarking scheme is proposed to detect the validity of webpage and mark the tampered parts of it on the browser. To locate the tampered locations, a webpage is firstly divided into webpage-blocks according to the labels which include the displayed characters on the browser. For each webpage-block, the watermark data is generated based on the content of it and randomly hidden into the color attributions of other webpage-blocks according to the secret key, which will further improve the secrecy of watermark data. The validity of a webpage-block is determined by comparing the number of the inconsistent bits with the given threshold. If a webpage-block is judged as tampered, the displayed characters of the webpage-block are replaced with the given symbol to shield the tampered information, and the last-label attributions of it are changed to highlight the tampered webpage-block. These strategies not only prevent the spread of tampered information in webpage, but also make efficient use of the unmodified ones in it. Experimental results show that the file delta of watermarked webpage is smaller. The proposed method achieves good imperceptibility and the tampered webpage-blocks are reasonably and accurately localized on the browser when the threshold value is two. Whats more, time efficiency is also improved.
    Abstract:
    With the rapid development of cloud storage, more and more people prefer to transfer personal files and data from local to cloud and share them with each other by cloud storage service. Nevertheless, each user has different access privileges on account of different identities, roles, etc. Given the different access privileges among the multi-user sharing the same file in cloud environment, an efficient file sharing scheme for multi-user is presented. Based on Elgamal cryptographic system and proxy re-encryption, multi-user access to different contents of the same file encrypted once by file owner is achieved in the proposed scheme. Compared with previous proposals, the scheme has the following advantages: the computation of the shared file encrypted by file owner is only proportional to the exponentiations of the amount of file blocks, regardless of bilinear pairings. Besides, users access to different contents of the same file by exponentiations with linear times regardless of bilinear pairings. Furthermore, ciphertext space for storage has no increments. Further analysis shows that our scheme adapts to the characteristics of cloud computing, which means cloud service providers provide fast computation and huge storage for thin clients.
    Abstract:
    Granular computing, which imitates human beings thinking, is an approach for knowledge representation and data mining. Its basic computing unit is called granule, and its objective is to establish effective computation models for dealing with large scale complex data and information. In order to study knowledge acquisition in ordered information systems with multi-granular labels, rough set approximations based on ordered granular labeled structures are explored. The concept of ordered labeled structures is first introduced. A dominance relation on the universe of discourse from an ordered labeled structure is also defined. Dominated labeled blocks determined by the dominance relation are constructed. Ordered lower approximations and ordered upper approximations, as well as ordered labeled lower approximations and ordered labeled upper approximations of sets based on dominance relations, are then proposed. Properties of approximation operators are examined. It is further proved that the qualities of lower and upper approximations of a set derived from an ordered labeled structure are a dual pair of necessity measure and possibility measure. Finally, multi-scale ordered granular labeled structures are defined and relationships among rough approximations with different scales induced from multi-scale ordered granular labeled structures are discussed.
    Abstract:
    Based on interval analysis and immune principles, some properties of solutions on nonlinear interval number programming are investigated, and an immune optimization approach as well as its theoretical foundations are explored. Firstly, the concept of optimal solution for such nonlinear programming is developed based on the version of optimal-valued interval. Some properties of efficient solutions on interval-valued programming are found, while an inherent solution relation is obtained between such nonlinear programming and interval natural extension programming. This derives an efficient pathway to find the optimal solution in terms of sufficient conditions acquired. Secondly, based on simplified metaphors of the immune response, a micro-immune optimization approach is proposed with the characteristics of small populations, few adjustable parameters, simple and non-master-slave structures. It is also proven to be convergent with low computational complexity. Comparatively numerical results show that such an efficient and effective approach is potential to nonlinear interval number programming problems with low or somewhat high dimensions.
    Abstract:
    Unmanned surface vehicle (USV) is a kind of important marine autonomous robots, which has been studied and applied to practice gradually. However, the autonomy of USV is still restricted by the performance of autonomous navigation technology. Especially, the problem of adaptive obstacle avoidance in complicated sea-state marine environments needs to be solved urgently. In the paper, an adaptive avoidance decision process model is proposed for USV to solve the problem of obstacle avoidance in complicated sea-state marine environments. By analyzing the disturbance factors from complicated sea-state marine environments, the model is constructed on the basis of Sarsa on-policy reinforcement learning algorithm. By setting the GLIE (greedy in the limit and infinite exploration) as the action exploration, the convergence of the adaptive avoidance decision process has been proved. The convergence shows that the action can converge to the optimal action strategy with the probability value of one. The proved result demonstrates that the performance of obstacle avoidance of USV in the complicated sea-state marine environment can be enhanced under the action of on-policy reinforcement learning algorithm.
    Abstract:
    Region function is an integral part of urban planning. The extraction and mining of ridership characteristic can be regarded as data support of region function recognition. The advance of intelligent transportation technology in metro system enables the collection of spatial-temporal passenger flow data, which conveys human mobility and indicates the similarity between metro stations, also closely related to the region function during different periods. This paper discusses the ridership characteristic clustering using passenger trip pattern and metro station flow pattern extracted from metro passenger flow data. Firstly, we identify the passenger flow centrality and station tide flow from passenger trip pattern and metro station flow pattern, which imply the region function of metro stations. Secondly, by discovering the similarity between region cluster and text analysis, we take advantage of the classical probabilistic graphical model and propose a novel LDA-based region ridership characteristic clustering model, allocating metro stations with similar ridership characteristic into the same region. Thirdly, the experimental results show the passenger flow relationship among regions and recognize the region functions during different periods. The analysis of clustering results gives us a good understanding of how passenger flow circulates during different periods and may enables many valuable services like network design and crowd evacuation.
    Abstract:
    Face recognition is among the most popular biometric approaches due to its low intrusiveness and high uniqueness. However, recent face recognition algorithms such as sparse representation-based classification algorithms, often ignore the noises during training process. When training samples are corrupted, recognition performance will significantly degenerate. Therefore, face recognition is an active yet challenging topic in computer vision application. To solve this problem, this paper proposes a novel low-rank relaxed collaborative representation algorithm which is combined with global and local features. The low-rank decomposition algorithm separates sparse noise from training samples to get more effective face information. Suppressing the effect of sparse noise is essential for face recognition system. Meanwhile, considering the fact that the different features in a sample should contribute differently to the pattern classification, this paper uses relaxed collaboration representation to get more discriminating coding coefficients. The discrimination of face recognition system will be enhanced. In addition, combining local features with global features can further improve the recognition rate. The proposed algorithm contains all of the above advantages. Experimental results on face databases show that, although both training and testing image data might be corrupted due to occlusion and disguise, the proposed algorithm is very competitive with state-of-the-art recognition approaches on effectiveness and robustness.
    Abstract:
    To solve some complicated function optimization problems, the SEIRS algorithm is constructed based on the SEIRS epidemic model. The algorithm supposes that some human individuals exist in an ecosystem; each individual is characterized by a number of features; an infectious disease exists in the ecosystem and infects among individuals; and the disease attacks a part of features of an individual. Each infected individual passes through such stages as suspected, exposed, infected and removed, which determine synthetically the physique strength of an individual. The algorithm uses the transferring mechanism of the infectious disease described by the SEIRS epidemic model to construct some operators so as to enable individuals to exchange feature information among them easily. Results show that the E-E, I-I and R-R operator can transfer feature information from some strong individuals to a weak individual so as to make the latter grow better; the S-E, S-R, E-I(ω) and R-S(ω) operator ensure an individual to obtain average feature information from other individuals so as to reduce probability that the individual drops into local optima; the S-S operator can expand an individual’s search scope by increasing its vitality; the E-R and I-R operator have the characteristics of both the S-S operator and the S-E, S-R, E-I(ω) and R-S(ω) operator; The individuals with strong physique can continue to grow, while the individuals with weak physique stop growing, which ensures the algorithm to have global convergence. Some case studies show that the algorithm has characteristics of strong search capability and high convergence speed for the complicated functions optimization problems.
    Abstract:
    The minimum spanning tree(MST) algorithm is one of the most classic algorithms in the graph theory. Some complex graph algorithms based on MST including clustering, classification and shortest path queries have been improved significantly in terms of efficiency and quality. However, with the rapid development of Internet, the scales of graphs have been becoming larger and larger. Large scale graphs which contain millions or even billions of vertices have become more common. Therefore, how to implement query processing and data mining algorithms on large scale graphs has become a problem to be solved urgently. In addition, because of the dynamic properties of large-scale graphs, how to maintain results dynamically has also become one of the most attractive problems. However, the state of the art MST algorithms can’t handle such massive and dynamic graph data. In this paper, we propose a vertex-driven parallel MST algorithm called PB based on a partition Prim algorithm named PP, and demonstrate the correctness of PB. Moreover, we implement the whole process of PB algorithm on the MapReduce and BSP framework respectively. Taking deletion-only graphs into consideration, we put forward a maintenance algorithm for MST which can conduct efficient incremental evaluation. All the computation and maintenance costs are further analyzed and compared. Finally, experiments based on real and synthetic data sets demonstrate the reliabilty, efficiency and scalability of our algorithms.
    Abstract:
    Recently, Skyline computing has been playing a more and more important role in decision-making applications. Centralized processing has become relatively mature. Today with explosion of big data, Skyline computing faces the same problem of big data processing. MapReduce is a parallel model and it is widely used in data-intensive processing. As we all know, processing on MapReduce requires the task be decomposable. There are some partition methods for Skyline computing on MapReduce, such as grid partition, angle-based partition and so on. Grid partition can only get good performance on low dimensional dataset. Angle-based partition applies to both low dimensional and high dimensional dataset. But it needs a complex and time-consuming coordinates conversion process before partitioning. In this paper, we employ a method similar to angle-based partition method called hyperplane-projections-based partition to break down our dataset. It applies to both low dimensional and high dimensional dataset and at the same time the coordinates conversion process before partitioning is very simple. We propose an algorithm to process Skyline computing on MapReduce called MR-HPP(MapReduce with hyperplane-projections-based partition) based on hyperplane-projections partition. Moreover, we propose an effective filter method called PSF(presorting filter) in the filter period of MR-HPP. Extensive comparative experiments based on Hadoop have proved that our method is accurate, efficient and stable.
    Abstract:
    The overhead of memory allocation is one of the major bottlenecks for shared-memory MapReduce, especially for the applications that have large amount of keys. In order to solve this problem, this paper presents a less memory consumption MapReduce, namely MALK, which can high-efficiently process applications with a large number of keys. Firstly, MALK succeeds in avoiding the constant allocations of massive small memory blocks by managing the discrete keys using contiguous area of storage. Secondly, MALK pipelines the process of Map-tasks and Reduce-tasks to decrease the active data in the system at the same time, and proposes a reusable mechanism of Hash table to reuse the memory space so as to avoid the memory reallocation of Hash table. What is more, MALK determines the suitable number of Reduce tasks, by evaluating the effect of task quantity and granularity on performance, to get optimal performance. The experiments show that, compared with Phoenix++, MALK achieves up to 3.8X higher speedup (average of 2.8X), and saves up to 95.2% memory in Map phase and 87.8% memory in Reduce phase. In addition, MALK reduces 30% waiting time with better load balance in Reduce phase, and cuts down more than 35% cache miss rate on average.
    Abstract:
    In recent years, in order to improve the accuracy of SMT (statistical machine translation) system, massive corpus has been widely applied to train language and translation models. As the scale of the language and translation models increase, computing performance becomes a challenging issue for SMT, which makes existing single-machine translation algorithms and systems difficult to complete the computation in time, especially when dealing with online translation. In order to overcome the limitations of single-machine translation decoding algorithm and improve the computing performance of large-scale SMT toward a practical online translation system, this paper proposes a distributed and parallel translation decoding algorithm and framework by adopting a distributed storage and parallel query mechanism upon both the language and translation models. We develop a hierarchical phrase parallel decoder by using a distributed memory database to store and query large-scale translation and language model tables. To further improve the speed of parallel decoding, we also make three additional optimizations: 1) Transform the synchronous rules in translation model table and the Trie data structure of language model table into a Hash indexed key-value structure for use in the distributed memory database; 2) Modify the cube-pruning algorithm to make it suitable for batch query; 3) Adopt and optimize the batch query for language model and translation model tables to reduce the network overhead. Our implemented algorithm can achieve fast decoding of SMT based on large-scale corpus and provide excellent scalability. Experimental results show that, compared with the single-machine decoder, our parallel decoder can reach 2.7 times of speedup for single sentence translation and reach 11.7 times of speedup for batch translation jobs, achieving significant performance improvement.
    Abstract:
    With the development of the Internet, it has been a trend of manual tagging, labeling and sharing videos. Rational use of these swarm intelligence will help to improve the effectiveness of video advertising. The method presented in this paper first collects the fine-grained user video tags, and generates the video hotspots by the video timeline-weighted method. Then, based on the idea of the classification matching, the description of the video hotspots can be used to select the advertising. At last, the time points that the popular attention has dropped by the biggest level are found to put advertising. Experiments show that, among the mega-scale video set, the content correlation between the hotspot and the advertisements selected by this method can reach 85%. The probability that the users close ads windows is less than 10%. Compared with the ads system that has been widely adopted so far, the average broadcast time of the new method can be increased by 21.5%, the click-through rate is improved from 0.65% to 0.73%.
    Abstract:
    The intelligent service system with context-aware capability for coal mine industry can provide the most appropriate information services timely according to the miners’ real situation. As a result, all information system resources can be taken full advantage of so as to help miners to prevent various hidden risks in advance. However, there is no detailed discussion and in-depth research on how to implement service system currently. In order to bridge the gap between the theory and the practice in the field, three critical problems need to be solved: 1) How to model the served miners’ context; 2) How to provide the information services to meet miners’ customize demands; 3) How to verify the validity of service invocation. According to the characteristics and practical needs in coal mine, this paper first constructs the coal mine semantic sensor network ontology (CMSSN) and establishes the miner’s whole context model. Then, the coupling relationship between context information and business service can be defined by means of the configuration model, which can realize the customized service invocation. Thirdly, the method of computational experiment is proposed to verify the availability and validity of current service system, which can simulate various accident scenes in the virtual coal mine. At last, this paper gives a discussion of the potential problem in realizing the system as well as the future research focus.
    Abstract:
    With the advent and popularity of multi-core architecture, on-chip bus (OCB) is gradually becoming the bottleneck of the functionality and performance of the system on chip (SoC). Consequently, the formal verification of OCB turns to be a significant aspect of SoC design. As a key formal verification technique, model checking performs an exhaustive procedure to automatically examine behaviors of SoC and determine if the specifications are satisfied by it. Nevertheless, model checking suffers from state space explosion problem while the expressive power of the existing specification languages such as computation tree logic (CTL) and linear temporal logic (LTL) is limited. This paper presents a propositional projection temporal logic (PPTL) based symbolic model checking approach for WISHBONE on-chip bus. With this approach, the WISHBONE bus designed in Verilog hardware description language (HDL) is transformed to system model described in SMV input language of NuSMV model checker, while the desired property is expressed in a PPTL formula. Then whether the system model satisfies the property or not can be determined with PLSMC, a PPTL symbolic model checking tool proposed in our previous work. The experiment results show that this approach can be applied to the verification of qualitative properties, as well as quantitative properties such as iteration and time duration for WISHBONE on-chip bus.
    Abstract:
    Fault localization is an important task of program debugging. The statistical fault localization techniques, based on slice spectrum, can reduce the effort of fault localization by analyzing the program slices. However, the effectiveness of these techniques depends on slice selection criteria and suspiciousness computing formulas. Thus, we propose a fault localization framework to evaluate the influence on the effectiveness of fault localization by the above two factors. Firstly, we compute the full slices of failed runs and the execution slices of passed runs, respectively. Secondly, we give a definition of the similarity between slice spectrums, and develop a set of slices selection criteria to construct a hybrid slice spectrum. Finally, we choose a suspiciousness evaluation formula and then generate a fault location report. To investigate the impact of the two factors (i.e., the similarity between slices, and the evaluation formulas) on the effectiveness of fault localization, we conduct empirical study on several classical Java benchmarks consisting of more than 90 faults. The experimental result suggests that the performance of Wong, Russel&Rao, and Binary cannot be influenced by the similarity of slice spectrum. However, our proposed formula (HSS), Tarantula, DStar, Naish1, and Naish2 can perform better on slice spectrum of lower similarity.
    Abstract:
    Human behavior is profoundly affected by individuals and the social network that links them together. We base our study on the important model of influence network largely due to DeGroot. In this model, the social structure of a society is described by a weighted and possibly directed network. Each node in the network takes an initial position about a common question of interest. At each date, nodes communicate with each other in the social network and update their positions because of the influences from neighbors. This paper presents a framework to analyze the controllability of social complex networks via influence. We show how the opinion, or attitude about some common questions can be controlled by a subset of committed nodes who consistently proselytize the opposing opinion and are immune to influence. Some controllable criteria are established to guarantee that a network can be fully or partially controllable. Besides, the methods to control an influence network are proposed. Because structural controllability has been proposed as an analytical framework for making predictions regarding the control of complex networks in the physical and life sciences, the relationship between influence controllability and structural controllability of networks is also presented.
    Abstract:
    As a kind of new-arising social media, Microblog has accumulated hundreds of millions of users in four years and the amount is still increasing quickly, because of its brevity, instantaneity and openness. The social influence of Microblog becomes more and more widely nowadays. It is significant to research for the community detection in the network of Microblog’s users both in theory and application. On one hand, most Microblog’s users are real persons and thus finding communities’ structure will help in revealing the behavior pattern of human being; on the other hand, Microblog’s users can be classified different groups based on the results from community detection, which will facilitate the accomplishment of targeted advertising. Given the features of Microblog, i.e., a directed network and the arbitrariness in establishing the following relation, this paper proposes a kind of similarity measure for users based on their behavior that following others and being followed by others, and defines its modularity function, then designs the community detection approach based on the modularity maximization inspired by the idea of fast greedy algorithm. Furthermore, this method has been generalized to Microblog network with tag information of users. Three real networks are processed in this approach. The results show that the approach proposed in this paper is more efficient on detecting the community structure of network of Microblog’s users, compared with Newman’s modularity maximization method, Infomap method and Walktrap method.