2014 Vol. 51 No. 1
2014, 51(1): 1-16.
Abstract:
As a relatively new area of machine learning, deep learning is effective at solving certain complex problems. As a representative deep learning model, the Boltzmann machine has been widely studied. In view of its theoretical significance and practical value, the research and development of the Boltzmann machine are reviewed systematically. Firstly, basic concepts are summarized, including the configuration of the Boltzmann machine as a single-layer feedback network and its classification by topological structure into the general Boltzmann machine, the semi-restricted Boltzmann machine, and the restricted Boltzmann machine. Secondly, the learning procedure of the Boltzmann machine is reviewed in detail. Thirdly, several typical algorithms are introduced, such as Gibbs sampling, parallel tempering, the variational approach, the stochastic approximation procedure, and contrastive divergence. Fourthly, the learning procedure of the deep Boltzmann machine is described. Recent research and development in algorithms, models, and practical applications of the Boltzmann machine are then expounded. Finally, the problems that remain to be solved are pointed out.
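As a concrete illustration of one of the typical algorithms the survey covers, here is a minimal sketch of contrastive divergence (CD-1) training for a binary restricted Boltzmann machine. The layer sizes, learning rate, and random data are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_step(v0, W, b, c, lr=0.1, rng=np.random):
    """One CD-1 update for a binary RBM with visible bias b and hidden bias c."""
    # Positive phase: hidden activations given the data.
    ph0 = sigmoid(v0 @ W + c)
    h0 = (rng.random(ph0.shape) < ph0).astype(float)
    # Negative phase: one Gibbs step back to the visible layer and up again.
    pv1 = sigmoid(h0 @ W.T + b)
    v1 = (rng.random(pv1.shape) < pv1).astype(float)
    ph1 = sigmoid(v1 @ W + c)
    # Approximate gradient of the log-likelihood.
    W += lr * (v0.T @ ph0 - v1.T @ ph1) / v0.shape[0]
    b += lr * (v0 - v1).mean(axis=0)
    c += lr * (ph0 - ph1).mean(axis=0)
    return W, b, c

# Toy usage: 6 visible units, 3 hidden units, a small random binary batch.
rng = np.random.default_rng(0)
W = 0.01 * rng.standard_normal((6, 3))
b, c = np.zeros(6), np.zeros(3)
batch = (rng.random((8, 6)) < 0.5).astype(float)
for _ in range(100):
    W, b, c = cd1_step(batch, W, b, c, rng=rng)
```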
2014, 51(1): 17-30.
Abstract:
With the rapid growth of industry applications, new platforms such as clusters, datacenters, and cloud computing are becoming mainstream service environments. Meanwhile, high-performance chip multiprocessor (CMP) systems are starting to serve as the major allocable resources on these commercial service platforms. However, the shared resources of CMP architectures can lead to severe co-run performance degradation for concurrent user applications. Left unaddressed, this problem lowers resource utilization even when users place only modest demands on quality of service (QoS). Shared resources are therefore attracting increasing attention, and proper software scheduling policies remain effective in improving co-run performance. This paper summarizes the development of key CMP techniques and the variations of shared resources, followed by a detailed presentation and analysis of representative schedulers. The limitations of state-of-the-art scheduling policies and the new challenges are then discussed, and prospects for future research trends are presented at the end of the paper.
2014, 51(1): 31-40.
Abstract:
With the development of computer graphics technologies, the modeling and simulation of cloth animation has become a common subject in 3D games, animated films, and related fields, and shows a booming trend. The main objective of cloth animation is to obtain rich details of cloth movement and realistic animation; it can be widely used in games, animation, virtual reality, and many other fields. Because of the complicated nonlinear, anisotropic elastic behavior of cloth, natural and realistic clothing wrinkles and shapes are difficult to create. This paper reviews research on cloth animation based on physical simulation and summarizes the basic theory, methods, and framework. For the widely used methods in cloth animation, the essential processes such as modeling, solving the dynamics equations, and collision detection and response are introduced. By analyzing the related literature, the advantages, disadvantages, and interrelationships of the existing modeling and simulation methods are summarized. Finally, from the demands of realistic performance and real-time interaction, future directions in the modeling and simulation of cloth animation are discussed.
2014, 51(1): 41-53.
Abstract:
Aggregation is a commonly used but time-consuming operation in database systems. Compared with an exact query, it is often more attractive to return an approximate result with the required error bound to the user in a much faster response time. However, we find that none of the previous methods can process approximate aggregation on massive data with arbitrary accuracy and high efficiency. A novel algorithm, PAA, is proposed to efficiently process approximate aggregation with an arbitrary confidence interval. The data space of the dimensional attributes is divided into multiple hypercubes of the same size, and each partition maintains the tuples whose dimensional attributes fall into the corresponding hypercube. A random sample RS is pre-constructed on the table. PAA consists of two stages: if the approximate result obtained from RS in stage 1 does not satisfy the confidence interval, more random tuples are retrieved in stage 2 from the partition set IPS whose hypercubes overlap with the search region. The novelty of PAA lies in how to retrieve random tuples from IPS when the exact number of tuples satisfying the predicate in each partition is unknown, and how to reduce the random I/O cost of the retrieval operation as much as possible. The experimental results show that PAA achieves up to two orders of magnitude speedup compared with existing methods.
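To illustrate the stage-1 decision described above, here is a hedged sketch (not the authors' code) of checking whether an approximate AVG computed from a pre-built random sample already meets a user-specified confidence interval; the normal-approximation formula and the parameter names are assumptions made for illustration.

```python
import math

def approx_avg_with_ci(sample_values, confidence_z=1.96):
    """Approximate AVG from a random sample, with a normal-approximation CI half-width."""
    n = len(sample_values)
    mean = sum(sample_values) / n
    var = sum((x - mean) ** 2 for x in sample_values) / (n - 1)
    half_width = confidence_z * math.sqrt(var / n)
    return mean, half_width

def stage1_sufficient(sample_values, required_half_width):
    """Stage 1 of a PAA-style scheme: accept the sample estimate if the interval is tight
    enough; otherwise stage 2 would fetch more random tuples from the overlapping partitions."""
    mean, half_width = approx_avg_with_ci(sample_values)
    return mean, half_width, half_width <= required_half_width

# Example: if the half-width exceeds the bound, stage 2 must draw more tuples.
mean, hw, ok = stage1_sufficient([10.2, 11.5, 9.8, 10.9, 12.1, 10.4], required_half_width=0.5)
```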
2014, 51(1): 54-63.
Abstract:
Recently, a large amount of naturally graph-structured data has been generated in various fields such as social networks, bioinformatics, software engineering, and knowledge engineering. Consequently, querying, searching, and mining such graph data have become popular research topics. However, due to the extremely high computational complexity, existing keyword search approaches for graph data do not scale well on large graphs. This paper is motivated by a study of users' information needs in graph search. We discuss several possible types of graph search and their potential for optimization, and propose adopting customized optimization strategies according to the features of each type of graph search. Based on that, we propose an effective heuristic optimization approach for an important and common type of search called known-item search. We construct an index that captures the local topology information of the graph; for large-scale graph data, the indexing task can be handled with MapReduce. Using the index, we can prune matched vertices before the search and thereby significantly reduce the search space, with a small possible loss of top-k answers as the trade-off. Our experiments show that the approach can decrease the response time of known-item search dramatically.
2014, 51(1): 64-75.
Abstract:
This paper proposes a method, KEE, for the entity extraction problem over XML data, which is an important step in identifying entities in XML data. Directed by XML keys and utilizing relaxation and verification techniques, KEE provides a rule-based solution to the entity extraction problem with the following characteristics. Firstly, using an XML query language, KEE provides a condensed representation for entities whose size may become very large as the data scales up. Secondly, requiring only one location example to indicate the user's interest, KEE can automatically discover other similar locations with its relaxation technique. Thirdly, by adjusting the example given to KEE, users can specify the entity locations they are interested in and control the locations discovered by KEE. In addition, by sharing computations and extending previous automaton techniques for XML query evaluation, an efficient implementation of KEE is provided. Experimental results on both synthetic and real data show that KEE offers an effective and efficient solution to the entity extraction problem.
2014, 51(1): 88-95.
Abstract:
The main aim of data stream subspace clustering is to find clusters in subspaces accurately and in reasonable time. Existing data stream subspace clustering algorithms are strongly influenced by their parameters: generally, the number of clusters or the feature subspace must be specified in advance, the clustering result cannot accurately reflect the changes of the data stream, and the changes of the clusters cannot be described accurately, which degrades the clustering result. To address these flaws, we propose a new data stream subspace clustering algorithm, SC-RP, in which neither the number of clusters nor the feature subspace needs to be specified in advance. SC-RP offers fast clustering and is insensitive to outliers. When the data stream changes, the changes are recorded in a data structure named the region tree and the corresponding statistics are updated, so SC-RP can adjust the clustering results in time. Experiments on real and synthetic datasets show that SC-RP is superior to existing data stream subspace clustering algorithms in both clustering precision and clustering speed, and that it scales well with the number of clusters and dimensions.
2014, 51(1): 96-103.
Abstract:
Value dependence is an emerging topic in database inference research. Firstly, aiming at the characteristics of value inference and the requirements of formal concept models, the formal context and the concept lattice are presented. Then formal concept models are investigated, and formal definitions of value dependence and the associated concept models are proposed. Next, the relationships between the intent reduction of a concept lattice and value dependence are analyzed. Two new concepts, inference dependence and α maximum inference dependence, are proposed by introducing the security sensitivity level of an attribute into value dependence. Furthermore, aiming at the generation rule of α maximum inference dependence, the properties of the intent reduction set of a formal concept lattice are studied; in particular, it is proved that a perfect, non-redundant α maximum inference dependence set can be deduced from the intent reduction of a formal concept lattice. Finally, an algorithm based on intent reduction for mining the full inference dependence set in a database is proposed, and a case study demonstrates its effectiveness. Overall, the study of inference dependence, one of the most important attribute dependencies in relational databases, offers useful insights into detecting and eliminating database inference channels.
2014, 51(1): 104-114.
Abstract:
Frequent pattern mining is a popular technique for analyzing transaction datasets. However, because these datasets contain sensitive information (e.g., user behavior records and electronic health records), directly releasing the discovered frequent patterns with their true support counts carries a significant risk to the privacy of individuals. In this paper, we propose an efficient algorithm based on the differential privacy model, called DP-topkP, to accurately find the top-k frequent patterns. To avoid leaking individuals' privacy, the exponential mechanism is used to sample the top-k frequent patterns from a candidate set, and the Laplace mechanism is used to generate noise that perturbs the true support counts of the sampled patterns. However, the returned noisy support counts may be inconsistent with query semantic constraints (e.g., descending order and integrality), which makes the utility of the discovered top-k patterns poor. To boost the accuracy of the returned noisy support counts, we take these consistency constraints into account in a post-processing step. We theoretically prove that the proposed method is (λ,δ)-useful and differentially private. The experimental results demonstrate that the method maintains good accuracy, utility, and scalability.
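A minimal sketch of the two mechanisms named in the abstract, applied to a toy candidate set; the privacy-budget split, the sensitivity values, and the pattern names are illustrative assumptions, not the paper's exact construction.

```python
import numpy as np

def exponential_mechanism_topk(candidates, counts, k, epsilon, sensitivity=1.0, rng=None):
    """Sample k patterns without replacement; each draw is weighted by
    exp((epsilon / k) * count / (2 * sensitivity))."""
    rng = rng or np.random.default_rng()
    remaining = list(range(len(candidates)))
    chosen = []
    for _ in range(k):
        scores = np.array([counts[i] for i in remaining], dtype=float)
        weights = np.exp((epsilon / (2 * k * sensitivity)) * (scores - scores.max()))
        probs = weights / weights.sum()
        pick = int(rng.choice(len(remaining), p=probs))
        chosen.append(remaining.pop(pick))
    return [candidates[i] for i in chosen]

def laplace_counts(true_counts, epsilon, sensitivity=1.0, rng=None):
    """Perturb the support counts of the sampled patterns with Laplace noise."""
    rng = rng or np.random.default_rng()
    return [c + rng.laplace(0.0, sensitivity / epsilon) for c in true_counts]

# Toy usage: split the budget between pattern selection and count perturbation.
cands = ["{a}", "{b}", "{a,b}", "{c}"]
counts = [90, 70, 60, 20]
top = exponential_mechanism_topk(cands, counts, k=2, epsilon=0.5)
noisy = laplace_counts([90, 60], epsilon=0.5)   # true counts of the sampled patterns (illustrative)
```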
2014, 51(1): 115-125.
Abstract:
Location privacy has been a hot topic in recent years. However, existing location privacy preserving methods only support simple nearest neighbor queries and do not consider obstructed space, even though obstructed space is very common in real life. We therefore study location-privacy-preserving obstructed nearest neighbor queries; due to the effect of obstacles, this problem is also very hard. In this paper, we adopt an approach based on a trusted third party for privacy-preserving obstructed nearest neighbor (ONN) queries. The approach can serve location-based queries without leaking the user's exact location while still obtaining the exact answer. In our approach, the trusted third party first constructs a cloaked region corresponding to the exact location and sends it to the LBS, which then processes the query region. During query processing, we use two methods to return a set of candidate answers with respect to the cloaking region for the actual user location: 1) a basic approach, which expands the region using the maximum obstructed distance of each segment and returns the results in the expanded region; 2) an improved approach, which further narrows the expanded region on the basis of the basic approach. Finally, the trusted third party returns the actual answer corresponding to the user's exact location. Experimental results and theoretical proofs show the effectiveness and correctness of our approach.
2014, 51(1): 126-137.
Abstract:
The t-closeness model is effective at protecting data sets against skewness attacks and similarity attacks. However, the EMD (earth mover's distance), which t-closeness uses to measure the distance between distributions, does not adequately account for the stability between distributions, so it can hardly measure that distance completely; when the stability gap between distributions is too large, the risk of privacy disclosure increases greatly. To address these limitations and measure the distance between distributions accurately, the SABuk t-closeness model is proposed on the basis of traditional t-closeness; it combines EMD with KL divergence to construct a new distance measure. At the same time, according to the hierarchy of the sensitive attribute (SA), it partitions a table into buckets based on the semantic similarity of SA values, and then uses a greedy algorithm to generate minimal groups that satisfy the required distance between distributions. Finally, it adopts the k-nearest neighbor algorithm to choose similar quasi-identifier (QI) values. Experimental results indicate that the SABuk t-closeness model reduces information loss while consuming only a little time, and that it preserves the privacy of sensitive data well while maintaining high data utility.
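A hedged sketch of what a combined distance over two discrete SA distributions might look like: the 1D ordered-attribute EMD plus a smoothed, symmetrized KL divergence, mixed by a weight. The mixing weight and smoothing constant are assumptions for illustration, not the SABuk definition.

```python
import math

def emd_1d(p, q):
    """Earth mover's distance over the same ordered domain (normalized as in the
    original t-closeness paper: sum of |cumulative differences| / (m - 1))."""
    cum, total = 0.0, 0.0
    for pi, qi in zip(p, q):
        cum += pi - qi
        total += abs(cum)
    return total / (len(p) - 1)

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) with a small smoothing constant to avoid division by zero."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0)

def combined_distance(p, q, alpha=0.5):
    """Weighted mix of EMD and symmetrized KL divergence; alpha is an assumed parameter."""
    sym_kl = 0.5 * (kl_divergence(p, q) + kl_divergence(q, p))
    return alpha * emd_1d(p, q) + (1 - alpha) * sym_kl

# Example: distance between a group's SA distribution and the whole table's distribution.
d = combined_distance([0.1, 0.6, 0.3], [0.3, 0.4, 0.3])
```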
2014, 51(1): 138-150.
Abstract:
In wireless networks, bit error rate (BER) estimation provides a critical foundation for many upper-layer protocols and greatly influences the performance of data communications; it has therefore become a significant research topic. However, existing error estimating codes ignore the BER distribution in real networks, so their estimation accuracy is low. Based on an analysis of the BER distribution in practical 802.11 wireless networks, we propose to exploit the principle of differentiation to improve the accuracy of BER estimation. The main idea of our design, called DEE (differentiated error estimation), is to insert into each data packet multiple levels of error estimating codes with different estimation capabilities; all these codes are randomly and evenly distributed in each packet. The BER is then derived from the theoretical relationship between the BER and the parity check failure probability. Furthermore, DEE utilizes the uneven distribution of BER to optimize the estimation capability of each level of codes, improving the estimation accuracy for BER values that occur more frequently and thus reducing the average estimation error. We implement and evaluate DEE on a 7-node testbed. The experimental results show that DEE reduces the estimation error by about 44% on average compared with the recent scheme EEC (error estimating code); when the estimation redundancy is low, DEE decreases the estimation error by about 68%. Furthermore, DEE has a smaller estimation bias than EEC.
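The relationship the abstract refers to can be illustrated as follows: for an even-parity check covering k independently corrupted bits with bit error rate p, the check fails exactly when an odd number of bits flip, so P(fail) = (1 - (1 - 2p)^k) / 2. Below is a hedged sketch (group size and counts are illustrative, not DEE's actual code layout) of inverting this relation to estimate the BER from observed check failures.

```python
def parity_fail_prob(p, k):
    """Probability that an even-parity check over k bits fails when each bit flips with prob p."""
    return 0.5 * (1.0 - (1.0 - 2.0 * p) ** k)

def estimate_ber(num_failed, num_groups, k):
    """Invert the relation: from an observed failure fraction f, p = (1 - (1 - 2f)^(1/k)) / 2."""
    f = num_failed / num_groups
    f = min(f, 0.5 - 1e-9)            # the estimator is only defined for f < 0.5
    return 0.5 * (1.0 - (1.0 - 2.0 * f) ** (1.0 / k))

# Example: 37 failed checks out of 256 parity groups of 32 bits each.
ber = estimate_ber(37, 256, 32)       # roughly 5.3e-3
```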
2014, 51(1): 151-160.
Abstract:
Broadcasting is the most basic way to transmit messages in mobile wireless sensor networks. However, existing broadcasting algorithms for mobile wireless sensor networks require a large number of intermediate forwarding nodes, which produces a mass of redundant message packets; this redundancy consumes much energy and shortens the lifetime of the network. This paper proposes a broadcasting algorithm named node density and distance-based probability (NDDP). The average forwarding ratio of the algorithm is only 5S/(Nπr²), where S is the area of the network, N is the number of nodes in the network, and r is the communication range. The average receipt ratio of NDDP is more than 95% in theoretical analysis and more than 92% in ns-2 simulation results; the denser the network, the more energy the algorithm saves. Simulation results also show that NDDP outperforms the two algorithms Smite and Sidewinder in both stability and energy conservation.
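A worked example of the stated average forwarding ratio 5S/(Nπr²); the field size, node count, and radio range below are illustrative, not values from the paper.

```python
import math

def nddp_forwarding_ratio(area, num_nodes, comm_range):
    """Average fraction of nodes expected to forward under NDDP: 5S / (N * pi * r^2)."""
    return 5.0 * area / (num_nodes * math.pi * comm_range ** 2)

# Example: a 1000 m x 1000 m field, 2000 nodes, 100 m radio range.
ratio = nddp_forwarding_ratio(1000 * 1000, 2000, 100)   # about 0.08, i.e. ~8% of nodes forward
```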
2014, 51(1): 161-172.
Abstract:
Aiming to maximize the lifetime of time-driven multi-sink wireless sensor networks, and employing inter-flow network coding as the underlying methodology, a coding-aware cross-path anycast routing protocol called CodeMesh is proposed. Firstly, the coding condition of unicast flows in a multi-hop wireless environment is analyzed, and the coding rules under the multi-sink anycast network model are presented and proved. Secondly, the concept of a multi-flow encoding cluster is proposed, followed by theorems determining the number and membership of encoding clusters. Furthermore, a routing metric integrating link quality, load balancing, and coding gain is defined; in particular, it quantifies the unified cost of coding and non-coding paths. Finally, an anycast coded routing scheme is designed that has the characteristics of both reactive source routing and proactive routing. CodeMesh takes full advantage of the rich computing and communication resources of sink nodes. Additionally, it embeds routing optimization, refactoring, update, and maintenance into the process of periodic data gathering, which significantly lowers the routing overhead. Results of testbed-based experiments indicate that CodeMesh can effectively find a path with maximum coding opportunities for each node, thereby reducing transmissions, improving transmission efficiency, balancing load and energy consumption, and prolonging the survival time of the whole network.
2014, 51(1): 173-179.
Abstract:
WSNs can fundamentally change the way we live. A WSN system is expected to contain up to tens of thousands of nodes, and each node needs to be low cost and low power with reasonable communication and processing capacity. Meanwhile, FPGAs have been demonstrated as an approach to improve processing performance efficiently. Conventionally, in embedded system designs such as WSNs, the inclusion of reconfigurable hardware is considered unacceptable, mainly because of the limited energy budget. However, an FPGA not only permits static configuration in the field but also enables partial dynamic reconfiguration (PDR) remotely. Both power and area can be saved by mapping different hardware cores to the same chip area. Furthermore, by implementing the PDR technique, each node can be maintained remotely during its lifetime. All of these characteristics are in demand in WSNs. As a result, it is of interest to investigate the feasibility of FPGAs for WSNs. In this paper, a prototype and the related design flow are presented, and several test cases are used to evaluate the cost of PDR. The results demonstrate a 60% reduction in power together with a 27% saving in area.
2014, 51(1): 180-188.
Abstract:
Sensor array signal processing has shown great potential in target monitoring applications. However, for wireless sensor network nodes, sending raw signals would result in large transmission delays and consume too much energy. To achieve low power consumption, high accuracy, and deployment flexibility in target monitoring under the wireless sensor network (WSN) framework, a compressed-sampling-based wireless array is presented. With the help of compressive sensing theory, real-time array signal transmission becomes feasible on low-power, low-cost wireless platforms. In this framework, signals are randomly sampled at a lower average sampling rate and reconstructed at the fusion center, which is assumed to have powerful processing capability. Furthermore, based on the correlation of signals within an array, a collaborative array signal reconstruction algorithm is designed that uses a prior knowledge model to reduce redundancy in signal collection. Simulation results show that the compressed-sampling-based wireless array can perform effective direction-of-arrival (DoA) estimation for targets without a distinct performance decline, and that it clearly outperforms the traditional method when data transmission is heavily restricted; the CS-based DoA estimation achieves performance similar to the conventional method while requiring only about 15% of the transmitted data. An experiment on a prototype system with low-cost microphone arrays verifies the feasibility of the implementation.
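A minimal compressive sensing sketch (not the paper's collaborative reconstruction algorithm): a random sensing matrix compresses a sparse signal at the node, and orthogonal matching pursuit recovers it at the fusion center. The dimensions, sparsity level, and sensing matrix are illustrative assumptions.

```python
import numpy as np

def omp(Phi, y, sparsity):
    """Orthogonal matching pursuit: recover a sparse x from measurements y = Phi @ x."""
    n = Phi.shape[1]
    residual = y.copy()
    support = []
    x = np.zeros(n)
    for _ in range(sparsity):
        # Pick the column most correlated with the current residual.
        idx = int(np.argmax(np.abs(Phi.T @ residual)))
        if idx not in support:
            support.append(idx)
        # Least-squares fit on the selected support, then update the residual.
        coef, *_ = np.linalg.lstsq(Phi[:, support], y, rcond=None)
        residual = y - Phi[:, support] @ coef
    x[support] = coef
    return x

# Toy usage: a length-128 signal with 5 nonzeros, measured with 40 random projections.
rng = np.random.default_rng(1)
n, m, k = 128, 40, 5
x_true = np.zeros(n)
x_true[rng.choice(n, k, replace=False)] = rng.standard_normal(k)
Phi = rng.standard_normal((m, n)) / np.sqrt(m)
y = Phi @ x_true
x_hat = omp(Phi, y, sparsity=k)
```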
2014, 51(1): 189-198.
Abstract:
Floating car technology is an essential source for acquiring road traffic information in intelligent transportation systems and can serve as the data source for large-scale real-time traffic monitoring. Handling the stream data of a large number of moving objects effectively is a challenge because of the huge scale of floating car data (FCD). In this paper, a congestion companion discovery algorithm is proposed that adopts the idea of similar-trajectory clustering and utilizes traffic parameters with congestion characteristics. Candidate congestion FCD can be filtered out of the floating car trajectory stream to approximately predict the trend of congestion areas. With load-shedding decisions determined by this prediction, a prediction-based multi-priority scheduling algorithm is designed to realize the whole monitoring process. Our method can effectively reduce the processing cost of FCD and rapidly monitor traffic congestion. Both the efficiency and the effectiveness of our method are evaluated on a very large volume of real taxi trajectories in an urban road network.
2014, 51(1): 199-205.
Abstract:
Marine data is a typical form of big data. As diversified marine data acquisition methods develop rapidly (remote sensing, GPS sensing, buoy monitoring, seabed monitoring, research ships, and so on), marine data grows explosively, yet managing such big data remains a sophisticated problem. Cloud storage is an effective way to manage big data; however, because big marine data is massive, multi-source, uncertain, and especially sensitive, the conventional cloud storage architecture is not suitable for it. Big marine data is therefore managed on a hybrid cloud storage platform according to its characteristics and applications, and how to take advantage of hybrid cloud storage to manage big marine data becomes a challenge. Data migration is the key question in hybrid cloud storage. To resolve it, the lifecycle of big marine data is put forward, and based on it a migration algorithm for hybrid cloud storage is proposed. In the migration algorithm, marine data sensitivity, data access frequency, data age, data size, and other attributes serve as migration factors; the algorithm considers storage capacity and the attribute characteristics of marine data as well as dynamic changes in the process of data access. The experimental results show that the hybrid cloud storage pattern greatly reduces the cost of managing big marine data, while the proposed migration algorithm ensures the access speed of big marine data.
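A hedged sketch of how the listed migration factors could be combined into a placement decision in a hybrid cloud; the weights, thresholds, and the private/public split rule are assumptions made purely for illustration, not the paper's algorithm.

```python
from dataclasses import dataclass

@dataclass
class MarineObject:
    sensitivity: float      # 0 (public) .. 1 (highly sensitive)
    access_freq: float      # accesses per day
    age_days: float         # time since creation
    size_gb: float

def migration_target(obj, hot_freq=10.0, cold_age=365.0):
    """Decide where an object should live in a hybrid cloud (illustrative rule only)."""
    if obj.sensitivity >= 0.5:
        return "private-cloud"            # sensitive data never leaves the private side
    if obj.access_freq >= hot_freq:
        return "private-cloud"            # hot data stays close to compute for access speed
    if obj.age_days >= cold_age and obj.access_freq < 1.0:
        return "public-cloud-archive"     # old, rarely touched data goes to cheap storage
    return "public-cloud-standard"

# Example
print(migration_target(MarineObject(sensitivity=0.2, access_freq=0.3, age_days=800, size_gb=120)))
```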
2014, 51(1): 206-214.
Abstract:
In this paper, a personalized representation method based on a local document set and a result ranking algorithm for enterprise search engines are proposed to help users find the documents they really need. Firstly, clustering algorithms are used to group the documents a user has browsed into a number of classes. Secondly, fuzzy inference is used to analyze each class and estimate how much the user likes it. Thirdly, a different sampling number is allocated to each class according to the degree calculated by fuzzy inference, reflecting how much the user likes the class. Finally, the typical documents sampled from each class form a local document set, which represents the user's personalized information. The personalized ranking algorithm re-ranks the result set returned by a general enterprise search engine by calculating the similarity between each result and each document in the local document set, so as to reflect the user's personalization. Experimental results show that, compared with traditional keyword-based personalized representation and result ranking algorithms, the proposed method provides more accurate results and reacts faster when the user's interests change.
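A minimal sketch of the re-ranking step: each returned result is scored by its maximum cosine similarity to the documents in the user's local document set. The bag-of-words vectors and the max-similarity rule are assumptions for illustration, not the paper's exact scoring function.

```python
import math
from collections import Counter

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def personalize_rank(results, local_docs):
    """Re-rank search results by similarity to the user's local document set."""
    local_vecs = [Counter(d.split()) for d in local_docs]
    def score(doc):
        v = Counter(doc.split())
        return max(cosine(v, lv) for lv in local_vecs)
    return sorted(results, key=score, reverse=True)

# Example: documents close to the user's browsing history move to the top.
local = ["database query optimization index", "sql tuning plan"]
hits = ["holiday schedule notice", "query plan and index tuning guide"]
print(personalize_rank(hits, local))
```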
2014, 51(1): 215-224.
Abstract:
This paper presents an adaptive-window object tracking method for Mean Shift based on graph cut theory. It copes with objects whose size changes during visual tracking, whereas traditional Mean Shift cannot adjust the scale of the tracking window in real time. Starting from the Mean Shift iteration result of each frame, a graph is created using a skin-color Gaussian mixture model in a small area around the result. The graph cut is computed by minimizing an energy function based on the max-flow/min-cut principle, and the largest skin region in the cut result is taken as the tracking object, so the tracking window size can be updated from this largest skin region. Experimental results demonstrate that the method effectively avoids the continually shrinking bandwidth caused by expanding and shrinking the kernel function bandwidth by 10%. It reflects the real scale change of the tracked target in real time, avoids interference from other targets in the background, and shows good usability and robustness. In addition, it can be applied to controlling entertainment games, enriching the modes of human-computer interaction.
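A hedged sketch of the window-update idea using OpenCV's connected components on a binary skin mask: the bounding box of the largest skin region becomes the new tracking window. The mask-generation step (the paper's skin-color GMM plus graph cut) is abstracted away here and the toy mask is purely illustrative.

```python
import cv2
import numpy as np

def update_window_from_mask(skin_mask):
    """Return (x, y, w, h) of the largest connected skin region, or None if the mask is empty."""
    num, labels, stats, _ = cv2.connectedComponentsWithStats(skin_mask.astype(np.uint8), connectivity=8)
    if num <= 1:                       # label 0 is the background
        return None
    areas = stats[1:, cv2.CC_STAT_AREA]
    i = 1 + int(np.argmax(areas))      # index of the largest foreground component
    x, y = stats[i, cv2.CC_STAT_LEFT], stats[i, cv2.CC_STAT_TOP]
    w, h = stats[i, cv2.CC_STAT_WIDTH], stats[i, cv2.CC_STAT_HEIGHT]
    return int(x), int(y), int(w), int(h)

# Example with a toy mask: one 3x4 blob and one isolated pixel.
mask = np.zeros((20, 20), dtype=np.uint8)
mask[5:8, 4:8] = 1
mask[15, 15] = 1
print(update_window_from_mask(mask))   # (4, 5, 4, 3)
```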
2014, 51(1): 225-236.
Abstract:
Highlight event detection in soccer videos has high academic research value and broad market application prospects in the field of sports video semantic analysis. Based on the expressive power of the hidden conditional random field (HCRF) model in representing and identifying semantic events, a framework for highlight event detection that fuses HCRF with an affective arousal model (AAM) is put forward. Firstly, through an analysis of the structural semantics of highlight event videos, thirteen kinds of multi-modal semantic clues are defined to accurately describe the semantic information contained in highlight events. Secondly, on the basis of clustering the multi-modal semantic clues with a concept lattice, time-domain features are added to establish an affective arousal model based on feature weight coefficients, and the affective arousal values of the different kinds of highlight events are calculated. Finally, the above observation sequences are used as HCRF model input with small-scale training samples, and an HCRF model for highlight event detection is effectively established based on the mapping relationship among the sequences of video semantic shots, affective arousal values, and highlight events. The inherent laws of highlight events are mined from multiple dimensions, such as multi-modal semantic clues, video structural semantics, and affective semantics, and the detection of highlight events is achieved with a single HCRF model. Experimental results show the effectiveness of the proposed approach.