2014 Vol. 51 No. 11
2014, 51(11): 2361-2373.
DOI: 10.7544/issn1000-1239.2014.20131069
Abstract:
Once its battery energy is exhausted, a sensor node fails and drops out of the network, so research on energy-efficient query processing algorithms plays a significant role in the area of sensor networks. The results returned by Top-k queries provide the k largest (or smallest) sensed values and their locations, which are very useful for detecting abnormal events in the monitored region. Existing algorithms focus on returning either exact or approximate results, which incurs high communication cost. To overcome the shortcomings of existing Top-k queries and improve energy efficiency, a filter-based Top-k monitoring algorithm is proposed in this paper, which aims at minimizing the expected communication overhead. Firstly, the robustness of filters, which guarantees correctness and high energy efficiency, is established, and the communication overhead model is presented. Based on the definition of expectation and the correlation of sensed data, the probability of filter failure is derived. Finally, taking minimization of the expected communication overhead as the optimization objective, the optimal filter threshold is proved and the filter-based Top-k monitoring algorithm (FTM) is proposed. Experiments on a real dataset are carried out to evaluate the efficiency and effectiveness of the proposed algorithm. The theoretical analysis and performance evaluation demonstrate the accuracy and energy efficiency of the proposed algorithm.
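To illustrate the filtering idea behind FTM, the Python sketch below installs an interval filter on every node and transmits a reading only when it escapes its filter; the filter width and the re-centering rule here are placeholders, not the optimal threshold derived in the paper.

# Minimal sketch of filter-based Top-k monitoring (not the FTM algorithm itself):
# the sink installs an interval filter on every node; a node reports only when
# its new reading leaves its filter, and the sink then refreshes the top-k set.
def update(filters, readings, k):
    """filters: {node: (lo, hi)}, readings: {node: value}. Returns (top_k, messages)."""
    messages = 0
    for node, value in readings.items():
        lo, hi = filters[node]
        if not (lo <= value <= hi):          # filter "fails": the reading must be sent
            messages += 1
            # re-center the filter around the new value (width chosen heuristically here;
            # the paper instead derives an optimal threshold from the failure-probability model)
            filters[node] = (value - 1.0, value + 1.0)
    top_k = sorted(readings, key=readings.get, reverse=True)[:k]
    return top_k, messages

filters = {n: (0.0, 2.0) for n in "abcd"}
print(update(filters, {"a": 1.5, "b": 3.2, "c": 0.7, "d": 2.9}, k=2))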
2014, 51(11): 2374-2381.
DOI: 10.7544/issn1000-1239.2014.20131070
Abstract:
In recent years, mobile phone sensing has been regarded as a new paradigm that uses smartphones to collect ubiquitous environmental data. Most mobile phone sensing task assignment problems are based on the locations of the smartphone users. Unfortunately, the location-based optimal task assignment problem in a mobile phone sensing system is NP-hard. To address this challenge, we study the optimal location-based task assignment problem for mobile phone sensing systems and propose a polynomial-time approximation algorithm. The proposed algorithm first introduces the shifting method for the unit disk model into the task assignment problem of mobile phone sensing and divides the sensing area into many sub-areas. We prove that the union of the optimal task assignment solutions of the sub-areas is within a factor of 1/(1+ε) of the optimal solution over the whole area, which shows that the presented algorithm is a polynomial-time approximation scheme (PTAS). Then we prove that the optimal assignment problem in each sub-area is polynomial-time solvable and design an enumeration method to obtain the optimal solution in each sub-area. Finally, simulation results show that the practical performance of the proposed near-optimal task assignment algorithm corroborates the theoretical analysis.
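The partition step of such an approximation scheme can be sketched as follows; the grid size, coordinate layout and brute-force sub-area solver are illustrative assumptions, not the paper's exact construction.

# Illustrative sketch of the divide-into-sub-areas idea: the area is cut into grid
# sub-areas, each sub-area is solved exactly by enumeration, and the union of the
# per-sub-area solutions is returned as the overall assignment.
from itertools import permutations
from collections import defaultdict

def solve_subarea(users, tasks):
    """Brute-force optimal assignment inside one sub-area (placeholder objective:
    minimize total user-task distance); exponential, but sub-areas stay small."""
    best, best_cost = [], float("inf")
    for perm in permutations(tasks, min(len(users), len(tasks))):
        cost = sum(abs(u[0] - t[0]) + abs(u[1] - t[1]) for u, t in zip(users, perm))
        if cost < best_cost:
            best, best_cost = list(zip(users, perm)), cost
    return best

def assign(users, tasks, cell=10.0):
    cells = defaultdict(lambda: ([], []))
    for u in users:
        cells[(int(u[0] // cell), int(u[1] // cell))][0].append(u)
    for t in tasks:
        cells[(int(t[0] // cell), int(t[1] // cell))][1].append(t)
    assignment = []
    for us, ts in cells.values():            # union of per-sub-area optima
        assignment.extend(solve_subarea(us, ts))
    return assignment

print(assign([(1, 1), (12, 3)], [(2, 2), (11, 4)]))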
2014, 51(11): 2382-2392.
DOI: 10.7544/issn1000-1239.2014.20131079
Abstract:
In wireless sensor network communication, the communication state of the network is an important research issue. The network communication strategy needs to be adjusted to ensure overall performance when the network throughput is overloaded. Therefore, we first need to identify the state of network transmission to guide the adjustment of the communication strategy. In this paper, we design and implement a real-time online mechanism that identifies the communication state of sensor networks and accurately determines whether the network transmission state is normal or overloaded, so as to support the adjustment of the communication strategy. We obtain the basic performance parameters of the transmission process by experiments and analyze the correlation between these parameters and the communication state. Then we select throughput as the classification criterion of the network communication state, establish an identification model of the communication state based on support vector machines (SVM), and obtain the parameters of the model in a simulated environment from the experimental data. The evaluation shows that the accuracy of the identification model reaches 91.28%, so it can effectively identify the communication state of wireless sensor networks.
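A minimal sketch of the SVM-based identification step, using scikit-learn and synthetic feature vectors (the feature names and values are assumptions standing in for the measured transmission parameters):

# Train an SVM classifier that maps measured transmission parameters to a
# normal/overloaded label; data here is synthetic and only shows the shape of the step.
from sklearn.svm import SVC
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

# each row: [throughput, packet loss rate, average delay]; label: 0 = normal, 1 = overloaded
X = [[120, 0.01, 15], [130, 0.02, 18], [60, 0.20, 90], [55, 0.25, 110]]
y = [0, 0, 1, 1]

model = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0, gamma="scale"))
model.fit(X, y)
print(model.predict([[125, 0.015, 16], [58, 0.22, 100]]))   # expected: [0 1]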
2014, 51(11): 2393-2407.
DOI: 10.7544/issn1000-1239.2014.20130736
Abstract:
In delay tolerant networks, to deal with the problems that the network topology changes dramatically and disconnections between nodes are prevalent, researchers have proposed store-carry-forward protocols: nodes store messages in their buffers and may carry them for a long time until they encounter suitable next hops or the destination nodes. Because of the limited buffer capacity of nodes, this form of message transmission is bound to cause buffer overflows and thus network congestion. A congestion control strategy based on the game of life is proposed for delay tolerant networks in this paper and applied to the classic Epidemic routing protocol. The strategy determines the operation performed on a message stored in a node's local buffer according to the proportion of the node's neighbors that already hold the message. Furthermore, policies for message queuing and dropping are designed. The utility value of a message is calculated from the influence that delivering or dropping it has on the delivery ratio of the whole network, and the messages stored in the buffers are queued by this utility value: messages with large utility values have high priority to be sent, and messages with small utility values are dropped. Experiments are carried out in the ONE simulator under its built-in movement model and under real trajectories. Simulation results show that the proposed strategy significantly improves the delivery ratio compared with other congestion control strategies, while the delivery latency, packet loss rate and overhead ratio are reduced at the same time.
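The neighborhood-proportion rule and the utility-ordered buffer can be sketched as follows; the thresholds and utility values are placeholders rather than the paper's exact formulas.

# Sketch of a game-of-life style decision rule plus utility-ordered buffering.
def decide(holders, neighbors, low=0.25, high=0.75):
    """Keep/forward a message when few neighbors hold it, drop it when almost all do."""
    ratio = holders / max(neighbors, 1)
    if ratio >= high:
        return "drop"        # message is already widespread among the neighbors
    if ratio <= low:
        return "forward"     # message is rare: forwarding it helps delivery ratio most
    return "keep"

def schedule(buffer):
    """buffer: list of (msg_id, utility). High-utility messages are sent first;
    when the buffer overflows, drop from the low-utility end."""
    return sorted(buffer, key=lambda m: m[1], reverse=True)

print(decide(holders=1, neighbors=8))            # -> forward
print(schedule([("m1", 0.2), ("m2", 0.9)]))      # m2 first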
2014, 51(11): 2408-2415.
DOI: 10.7544/issn1000-1239.2014.20131073
Abstract:
It is a challenging task to improve the accuracy of mobile localization in mixed LOS (line-of-sight) and NLOS (non-line-of-sight) environments. When a MN (mobile node) moves indoors, obstacles such as walls, doors and furniture cause the signal between the MN and the ANs (anchor nodes) to switch between LOS and NLOS frequently and randomly, which degrades the accuracy of MN location estimation. To guarantee accuracy, a KF (Kalman filter) based IMM (interacting multiple model) is proposed to filter the measured distances in the mixed LOS/NLOS environment. Because the ranging errors of LOS and NLOS have different characteristics, two parallel KFs with different parameters are employed for the LOS mode and the NLOS mode, and the mode probabilities are computed from the mode likelihoods and the historical probabilities. The transition between the LOS and NLOS modes is governed by a Markov chain and the mode probabilities. The weighted mean of the two modes' filtering results is taken as the distance estimate of the IMM. Once the estimated distances are obtained, an EKF (extended Kalman filter) is applied to locate the MN. Simulation results demonstrate that the IMM can significantly mitigate the positive ranging error and achieve high localization accuracy.
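A simplified one-dimensional sketch of the IMM distance filter follows: two scalar Kalman filters with different measurement-noise settings stand for the LOS and NLOS modes, the mode probabilities are updated through a Markov transition matrix and Gaussian likelihoods, and the output is their probability-weighted mean. All noise values and the transition matrix are illustrative assumptions, and the state mixing of a full IMM is omitted.

import math

class ScalarKF:
    """One-dimensional Kalman filter for a slowly varying distance."""
    def __init__(self, q, r):
        self.x, self.p, self.q, self.r = 0.0, 1.0, q, r
    def step(self, z):
        self.p += self.q                     # predict (random-walk distance model)
        s = self.p + self.r                  # innovation variance
        k = self.p / s                       # Kalman gain
        innov = z - self.x
        self.x += k * innov
        self.p *= (1.0 - k)
        # Gaussian likelihood of this measurement under the mode
        return math.exp(-innov * innov / (2 * s)) / math.sqrt(2 * math.pi * s)

def imm(measurements, trans=((0.9, 0.1), (0.1, 0.9))):
    kfs = [ScalarKF(q=0.01, r=0.1),          # LOS mode: small ranging noise
           ScalarKF(q=0.01, r=2.0)]          # NLOS mode: large ranging noise
    mu = [0.5, 0.5]                          # mode probabilities
    out = []
    for z in measurements:
        pred = [sum(trans[i][j] * mu[i] for i in range(2)) for j in range(2)]
        like = [kf.step(z) for kf in kfs]
        post = [pred[j] * like[j] for j in range(2)]
        norm = sum(post) or 1e-12
        mu = [p / norm for p in post]
        out.append(sum(m * kf.x for m, kf in zip(mu, kfs)))   # weighted distance estimate
    return out

print([round(d, 2) for d in imm([5.0, 5.1, 6.8, 5.2])])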
2014, 51(11): 2416-2426.
DOI: 10.7544/issn1000-1239.2014.20130749
Abstract:
It is well known that cloud computing can be used to process massive data; however, such tasks often suffer from the high time cost of data transmission. Data placement and task scheduling algorithms place data and schedule tasks onto nodes for particular goals, such as decreasing data transmission time, balancing node load and increasing the throughput of the cloud computing system. At present, however, the shortcoming of these algorithms is that the number of data replicas is fixed and the transmission criterion is not very accurate. In this paper, we propose a new algorithm, called dynamic iterate for time (DIT), to decrease data transmission time. It dynamically changes the number of replicas of a data item according to its access frequency and the remaining memory, which reduces both the memory wasted by rarely accessed replicas and the number of transmissions of replicas with high access rates. Moreover, DIT evaluates data transmission by its time cost, which makes the transmission criterion more accurate by taking the differences among network bandwidths into account. The experimental results show that DIT significantly reduces data transmission time compared with data cluster (DC) and data dependence (DD), except in the special case where both the task set and the number of nodes are small. It is worth mentioning that a 50% speedup can be achieved when the task set and the number of nodes are large.
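A toy sketch of the replica-count adjustment and the time-based transmission criterion described above; the thresholds and formulas are placeholders, not DIT's actual rules.

# Hot data gets more replicas while free memory allows; cold replicas are reclaimed;
# a candidate transfer is judged by its time cost rather than by bytes alone.
def adjust_replicas(access_freq, replicas, free_mem, item_size,
                    hot=100, cold=10, max_replicas=5):
    if access_freq > hot and replicas < max_replicas and free_mem >= item_size:
        return replicas + 1          # frequently accessed: add a replica to cut transfers
    if access_freq < cold and replicas > 1:
        return replicas - 1          # rarely accessed: free memory for hotter data
    return replicas

def transfer_time(data_bytes, bandwidth_bytes_per_s):
    """Time-based criterion: the same data costs more over a slower link."""
    return data_bytes / bandwidth_bytes_per_s

print(adjust_replicas(access_freq=250, replicas=2, free_mem=1 << 30, item_size=1 << 20))
print(transfer_time(512 * 1024 * 1024, 100 * 1024 * 1024))   # ~5.12 s on a 100 MB/s link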
2014, 51(11): 2427-2436.
DOI: 10.7544/issn1000-1239.2014.20131071
Abstract:
Machine learning, as a learning method, uses experience to improve its performance. The support vector machine (SVM), a model of machine learning, is good at dealing with small sample sizes, nonlinearity and high-dimensional pattern recognition. Node localization algorithms based on SVM locate the nodes in a wireless sensor network (WSN) by exploiting the characteristics of machine learning algorithms. The basic idea is to divide the network area into several equal-sized grid cells, each of which represents one class of the machine learning algorithm. After the algorithm learns the classes corresponding to the known beacon nodes, it classifies the unknown nodes and then determines their position coordinates. For the SVM "one against one" localization algorithm, the simulation results show that it has higher localization accuracy and better tolerance of ranging errors, and it is suitable for network environments where beacon nodes are sparse because it does not require a high beacon node ratio. For the SVM decision tree localization algorithm, the results show that it is not seriously affected by coverage holes, so it is applicable to network environments where the node distribution is uneven or coverage holes exist.
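The grid-classification idea can be sketched with scikit-learn as follows; the feature vectors (for example, hop counts or signal strengths to a fixed set of reference beacons) and the cell layout are assumptions, and scikit-learn's SVC internally uses a one-against-one scheme for multi-class problems.

# Each grid cell is a class; beacon nodes with known cells form the training set,
# and an unknown node is assigned the center coordinate of its predicted cell.
from sklearn.svm import SVC

X_beacons = [[1, 3, 4], [2, 2, 4], [4, 1, 2], [5, 1, 1], [3, 4, 1], [2, 5, 1]]
y_cells   = [0, 0, 1, 1, 2, 2]                          # grid cell index = class label
cell_centers = {0: (2.5, 2.5), 1: (7.5, 2.5), 2: (2.5, 7.5)}

clf = SVC(kernel="rbf", gamma="scale").fit(X_beacons, y_cells)
unknown = [[4, 2, 2]]
cell = clf.predict(unknown)[0]
print("estimated position:", cell_centers[cell])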
2014, 51(11): 2437-2447.
DOI: 10.7544/issn1000-1239.2014.20130165
Abstract:
An important challenge in designing data center networking (DCN) is how to efficiently interconnect a large number of servers. Traditional tree-based structures find it increasingly difficult to meet the design goals of data centers. Recently, a number of novel DCNs have been proposed. However, these DCNs expand the scale of the data center mainly by increasing the number of network interface card (NIC) ports per server, which limits expansion and complicates management. Consequently, it is meaningful and challenging to design a scalable structure for data centers using only commodity servers with a fixed number of NIC ports and low-end, multi-port commodity switches. To address this problem, this paper proposes a novel constant-degree DCN structure called CH, which uses a fixed number of NIC ports and commodity switches to interconnect a large population of servers. The structure is server-centric and achieves expansibility and performance through multi-level interconnection, which yields two potential benefits: expansibility and equal degree. Theoretical analysis and experimental results show that CH has excellent topological properties and can provide large data centers with high bandwidth and high fault tolerance for multi-pattern data traffic, without increasing the number of NIC ports. Moreover, its modularity gives the structure good deployability and maintainability.
2014, 51(11): 2448-2457.
DOI: 10.7544/issn1000-1239.2014.20130852
Abstract:
The data plane of an OpenFlow network sends packets that do not match any flow entry to the controller, and in such cases connectionless burst traffic is encapsulated as control messages until the data plane receives a response from the controller. This process produces redundant control messages for both the control and data planes, which affects the performance of the entire network. However, the current OpenFlow protocol neither explains nor handles this problem. In this paper, we propose two approaches for eliminating redundant control messages, one on the control plane (referred to as ERCMC) and one on the data plane (referred to as ERCMD). ERCMC constructs a latest packet-in message view (LPMV) on the controller to mark the latest packet-in messages and thereby eliminate redundant control messages on the control plane. ERCMD adds marks directly on the data plane, so that burst packets are buffered directly and never encapsulated as control messages. We implement the two approaches in NOX and Open vSwitch respectively and evaluate their performance. The experimental results show that ERCMC eliminates redundant control messages but adds extra processing overhead, whereas ERCMD not only eliminates redundant control messages but also alleviates the load of the controller and the OpenFlow switches.
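A sketch of the LPMV idea on the controller side; packet-in parsing is abstracted away, and the flow key and methods here are illustrative rather than NOX's actual interface.

# The controller remembers flows whose rule installation is already in progress, so
# repeated packet-in messages for the same flow are recognized as redundant.
class LPMV:
    def __init__(self):
        self.pending = set()
    def on_packet_in(self, switch_id, flow_key):
        key = (switch_id, flow_key)
        if key in self.pending:
            return "drop-redundant"        # a flow-mod for this flow is already on its way
        self.pending.add(key)
        return "handle"                     # compute a route, send flow-mod and packet-out
    def on_flow_installed(self, switch_id, flow_key):
        self.pending.discard((switch_id, flow_key))

view = LPMV()
print(view.on_packet_in(1, ("10.0.0.1", "10.0.0.2", 80)))   # handle
print(view.on_packet_in(1, ("10.0.0.1", "10.0.0.2", 80)))   # drop-redundant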
2014, 51(11): 2458-2469.
DOI: 10.7544/issn1000-1239.2014.20130798
Abstract:
As the bandwidth of backbone links grows geometrically, mining frequent items over network flows promptly and accurately is important for network management and network security. Inspired by the SS counting method, an integrated weighted frequent items mining algorithm over network flows, IWFIM, is proposed according to the properties of flows; its pruning strategy is subject to the constraints of time and flow length. IWFIM assigns each flow item a weight determined by time and flow length, and deletes the item with the minimum weight during pruning. Then, based on IWFIM and the heavy-tailed distribution of flows, another frequent items mining algorithm, CBF_IWFIM, is proposed, which combines the advantages of hashing and counting methods. CBF_IWFIM uses an improved counting Bloom filter to filter out the majority of small flows without storing flow information, and then applies IWFIM to identify the frequent items. Experiments on real network traffic show that CBF_IWFIM and IWFIM are very space-efficient and precise, and they achieve much better measurement accuracy than three other frequent items mining algorithms such as SS. Even when consuming only one third of the space of the other three algorithms, CBF_IWFIM and IWFIM still outperform them.
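The two-stage idea can be sketched as follows; the hash functions, threshold, capacity and weights are illustrative, and the real IWFIM weight also incorporates time and flow length.

# Stage 1: a counting Bloom filter screens out small flows without storing them.
# Stage 2: flows passing the threshold enter a bounded, weight-ordered summary from
# which the minimum-weight item is evicted (Space-Saving style pruning).
import hashlib

class CountingBloomFilter:
    def __init__(self, m=1024, k=3):
        self.m, self.k, self.counts = m, k, [0] * m
    def _idx(self, key):
        for i in range(self.k):
            h = hashlib.md5(f"{i}:{key}".encode()).hexdigest()
            yield int(h, 16) % self.m
    def add(self, key):
        idxs = list(self._idx(key))
        for i in idxs:
            self.counts[i] += 1
        return min(self.counts[i] for i in idxs)    # conservative flow-size estimate

def mine(packets, threshold=3, capacity=2):
    cbf, summary = CountingBloomFilter(), {}
    for flow in packets:
        if cbf.add(flow) < threshold:
            continue                                 # small flow: filtered, never stored
        summary[flow] = summary.get(flow, 0) + 1     # placeholder weight
        if len(summary) > capacity:                  # evict the minimum-weight item
            summary.pop(min(summary, key=summary.get))
    return summary

print(mine(["f1"] * 10 + ["f2"] * 8 + ["f3", "f4"]))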
2014, 51(11): 2470-2482.
DOI: 10.7544/issn1000-1239.2014.20130973
Abstract:
Policy refinement is an important method for resolving the configuration complexity of access control policies in distributed applications. Although current policy refinement techniques make it possible to describe layered policies and refine them layer by layer, these methods cannot easily describe and analyze the associated attributes among different policies, which hinders the wide use of policy refinement. In this paper, new methods are given for describing policies and the relationships among them, such as composition, mutual exclusion, refinement and path cooperation. A new policy refinement algorithm with the ability to describe these relationships is proposed, together with a refine-tree construction method that captures both the policies and the relationships among them. This provides a basis for solving the problem of associated attributes between policies in the refinement process. The policy refine-tree can also be used to demonstrate the SLA (service-level agreement) of access control.
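A small sketch of a refine-tree node that records refinement together with the other relationship types named above; the field names are assumptions, not the paper's notation.

# Refinement is modeled as parent/child links; the other relationships are recorded
# as attributes on each node so they can be analyzed during refinement.
from dataclasses import dataclass, field
from typing import List

@dataclass
class PolicyNode:
    name: str
    children: List["PolicyNode"] = field(default_factory=list)     # refinement
    composed_with: List[str] = field(default_factory=list)          # composition
    mutex_with: List[str] = field(default_factory=list)             # mutual exclusion
    cooperates_on_path: List[str] = field(default_factory=list)     # path cooperation

    def refine(self, child: "PolicyNode") -> "PolicyNode":
        self.children.append(child)
        return child

root = PolicyNode("allow-dept-access")
leaf = root.refine(PolicyNode("allow-server-A"))
leaf.mutex_with.append("deny-server-A-guest")
print(root)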
2014, 51(11): 2483-2492.
DOI: 10.7544/issn1000-1239.2014.20131074
Abstract:
By exploiting crowdsourcing, vehicular sensor networks (VSN) are considered essential for automatic and dynamic traffic information collection, and they have created various new business applications and services in our daily lives. However, the published trajectories collected in VSNs raise significant privacy concerns. Existing methods such as anonymization and cloaking, though attractive for providing strong privacy guarantees, generally fail to satisfy the accuracy requirements of trajectory-based applications. In addition, different attack strategies result in quite different performance under various privacy-preserving strategies. To address these challenges, we present a location privacy protection method, the DefenseGame algorithm. Given a set of trajectories and a probability density function of the side information, the algorithm assists the defender in selecting the optimal defense strategies by computing the equilibria of attack-defense games. In the attack-defense game, we use a game-theoretic model to capture the behavior of the adversary and the defender, and we show the effectiveness of the two kinds of defense strategies against the adversary's inference attacks. Our experimental results indicate that the same defense strategy performs differently against different attack strategies, and the proposed algorithm achieves higher defender utility than other approaches.
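As a toy illustration of choosing a defense strategy from an attack-defense payoff matrix, the sketch below uses made-up payoffs and a simple maximin rule in place of the paper's equilibrium computation.

# defender_payoff[d][a] = defender utility when defense d meets attack a;
# pick the defense that maximizes the worst-case utility over attacks.
def best_defense(defender_payoff):
    return max(range(len(defender_payoff)),
               key=lambda d: min(defender_payoff[d]))

# rows: defense strategies (e.g. coarser cloaking vs suppression);
# columns: attack strategies (e.g. map-matching vs tracking inference)
payoff = [[0.6, 0.3],
          [0.4, 0.5]]
print("defense chosen:", best_defense(payoff))   # -> 1 (worst case 0.4 > 0.3)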
2014, 51(11): 2493-2504.
DOI: 10.7544/issn1000-1239.2014.20130854
Abstract:
In cyberspace, attackers typically exploit target network facilities gradually, performing multiple attack steps to achieve their ultimate goal. How to form a complete picture of attacks or identify attack scenarios is one of the main challenges in many research fields, such as cyberspace security situation awareness. Alert correlation analysis based on causal knowledge is one of the main methods of CEP (complex event processing) technology and a promising way to identify multi-step attack processes and reconstruct attack scenarios. Current research suffers from the problem of defining causal knowledge manually. To solve this problem, a causal knowledge mining method based on the Markov property is proposed in this paper. Firstly, the raw alert streams are clustered by address to produce alert cluster sets; then the one-step transition probability matrix between different attack types in each cluster set is mined based on the Markov property, and knowledge with the same steps is fused; finally the knowledge base is created. The experimental results show that this method is feasible.
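The transition-matrix mining step can be sketched directly; the attack-type names below are invented, and the counts are normalized per row.

# Mine a one-step transition probability matrix from address-clustered alert sequences.
from collections import defaultdict

def transition_matrix(sequences):
    counts = defaultdict(lambda: defaultdict(int))
    for seq in sequences:                      # one sequence per address cluster
        for a, b in zip(seq, seq[1:]):         # consecutive alert types form one step
            counts[a][b] += 1
    return {a: {b: c / sum(row.values()) for b, c in row.items()}
            for a, row in counts.items()}

clusters = [["scan", "brute-force", "privilege-escalation"],
            ["scan", "brute-force", "data-exfiltration"],
            ["scan", "privilege-escalation"]]
print(transition_matrix(clusters))
# e.g. P(brute-force | scan) = 2/3, a candidate piece of causal knowledge for the base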
2014, 51(11): 2505-2512.
DOI: 10.7544/issn1000-1239.2014.20130440
Abstract:
Existing block-based fragile watermarking algorithms cannot fully avoid the independence between blocks. A novel variable-payload self-embedding fragile watermarking algorithm is proposed. Firstly, the original image is divided into 2×2 blocks and the average value of each block is calculated; a new image, called the average-image, is composed of the obtained average values. The watermark is not generated for each single block but encoded statistically according to the correlation between the pixels of the average-image, and it is embedded into the 2 LSBs (least significant bits) of the original image. As a result, the length of the watermark varies with the correlation of the image: for a texture-complex image the watermark is longer, while for a smooth image it is shorter. Moreover, the independence of the blocks is broken, so the scheme resists attacks better. For authentication, string matching is used to match the blocks against the watermark, and blocks that are not matched are re-matched by group matching. The theoretical analysis and the simulation results show that the proposed algorithm not only thwarts the collage attack and locates tampered blocks accurately, but also has better invisibility.
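The block-averaging and 2-LSB embedding steps can be sketched as follows; the pixel values are synthetic, and the statistical watermark encoding and the matching-based authentication are omitted.

# Compute the 2x2 block average-image and write a watermark bit stream into the two
# least significant bits of each pixel (variable payload: stop when the watermark ends).
def block_averages(img):
    """img: 2-D list of 8-bit pixels with even dimensions; returns the average-image."""
    rows, cols = len(img), len(img[0])
    return [[sum(img[r + dr][c + dc] for dr in (0, 1) for dc in (0, 1)) // 4
             for c in range(0, cols, 2)]
            for r in range(0, rows, 2)]

def embed_bits(img, bits):
    """bits: sequence of 2-bit watermark values written into the 2 LSBs of each pixel."""
    out = [row[:] for row in img]
    it = iter(bits)
    for r, row in enumerate(out):
        for c, pixel in enumerate(row):
            two = next(it, None)
            if two is None:
                return out               # watermark exhausted: remaining pixels untouched
            out[r][c] = (pixel & ~0b11) | (two & 0b11)
    return out

image = [[52, 55, 61, 59],
         [79, 61, 76, 61],
         [62, 59, 55, 90],
         [51, 47, 88, 86]]
print(block_averages(image))
print(embed_bits(image, [0b10, 0b01, 0b11]))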
2014, 51(11): 2513-2517.
DOI: 10.7544/issn1000-1239.2014.20130882
Abstract:
A common way to build hash functions is to combine a secure block cipher with an appropriate chaining structure. When the underlying block cipher has strong security properties, such as AES, the security of this kind of hash function depends mainly on the security of the chaining structure. HTBC is a hash function design that combines a special triple-block-chaining structure with a secure block cipher. The triple-block-chaining structure used by the HTBC hash function is not secure. Based on the serious flaws of this chaining structure, and using particular properties of the related operations, collisions of the HTBC hash function can be constructed directly, and the time complexity of finding a single collision is only 1. For messages of certain lengths, a second preimage can be constructed as well. When AES-256 is used as the block cipher, the maximum time complexity of a second preimage attack is 2^112, which is lower than the brute-force bound of 2^128. For particular weak messages, the time needed to find a second preimage is only 1. If the message is chosen at random, the average time complexity of a successful second preimage attack is 2^46.56.
2014, 51(11): 2518-2527.
DOI: 10.7544/issn1000-1239.2014.20130869
Abstract:
The uncertain data stream is a new and widespread form of data emerging in many application fields with the development of computing and sensing technology, and the analysis and processing of uncertain data streams has attracted the attention of many researchers. Existing data stream clustering techniques generally ignore uncertainty, which often makes the clustering results unreasonable or even unusable. The two aspects of uncertainty, existence uncertainty and attribute uncertainty, significantly affect the clustering process and results, but existing work cannot consider them at the same time, and the recently reported clustering algorithms are all based on the K-Means algorithm, with its inherent shortcomings. To solve this problem, an adaptive grid-density based data stream clustering algorithm under an uncertainty model, ADC-UStream, is proposed. For the uncertainty characteristics, the algorithm treats existence and attribute uncertainty in a unified way and builds an entropy-based uncertainty model to measure the uncertainty. Taking this uncertainty into account, a grid-density based clustering algorithm over a decay window model is adopted, with a temporally and spatially adaptive density threshold, to adapt to the temporal and non-uniform distribution characteristics of the uncertain data stream. The experimental results show that the ADC-UStream algorithm under the uncertainty model performs well in both clustering quality and clustering efficiency.
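Two ingredients described above can be sketched in a few lines: an entropy measure for a tuple's existence probability, and decayed grid densities over a stream. The parameters and the weighting rule are illustrative, not ADC-UStream's actual model.

import math
from collections import defaultdict

def existence_entropy(p):
    """Entropy of a tuple that exists with probability p (p = 0 or 1 means no uncertainty)."""
    if p in (0.0, 1.0):
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

def grid_densities(stream, cell=1.0, decay=0.98):
    """stream: iterable of (x, y, existence_probability); older contributions fade."""
    density = defaultdict(float)
    for x, y, p in stream:
        for g in density:                            # attenuation window: old points fade
            density[g] *= decay
        weight = p * (1.0 - existence_entropy(p))    # more certain points contribute more
        density[(int(x // cell), int(y // cell))] += weight
    return dict(density)

print(grid_densities([(0.2, 0.3, 0.9), (0.4, 0.1, 0.6), (5.0, 5.0, 1.0)]))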
2014, 51(11): 2528-2537.
DOI: 10.7544/issn1000-1239.2014.20130789
Abstract:
With the popularization of cloud computing, software as a service (SaaS) has become an important form of cloud computing. The memory owned by each data node in the cloud is a key resource for improving the data access performance of multi-tenant applications, so memory sharing and provisioning have received much attention from SaaS providers. For service providers, how to allocate memory reasonably on each data node so as to obtain higher profits while guaranteeing the tenants' service level agreements (SLA) has become a challenge. To address this challenge, we propose a multi-tenant memory management framework (MTMM) for cloud data storage and a corresponding memory allocation method. The method takes the maximum profit the service provider can obtain as its objective. Combined with the tenants' SLA profit models, the global memory allocation problem is analyzed and modeled as an optimization problem, through which the profits the service provider can obtain under different memory allocation strategies are predicted. Considering the characteristics of multi-tenant memory allocation, we solve the problem with an optimized genetic algorithm to improve the performance of the method. Compared with the traditional LRU method and a single-node multi-tenant memory allocation method, the mechanism proposed in this paper manages memory effectively and provides higher profits for service providers.
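A toy genetic-algorithm sketch of the global memory allocation step follows; the profit curves, GA parameters and repair rule are all illustrative assumptions, not the paper's SLA profit models.

# Maximize the total provider profit over per-tenant memory allocations subject to a
# memory budget, using selection, averaging crossover, mutation and a repair step.
import random
random.seed(0)

TOTAL_MEM = 10                      # memory units available on the data node

def profit(alloc):
    """Sum of concave per-tenant profit curves (diminishing returns, assumed)."""
    return sum(w * (m ** 0.5) for w, m in zip((3.0, 2.0, 1.0), alloc))

def repair(alloc):
    """Scale an allocation down so that it fits the memory budget."""
    s = sum(alloc)
    return alloc if s <= TOTAL_MEM else [a * TOTAL_MEM / s for a in alloc]

def ga(pop_size=20, generations=50):
    pop = [repair([random.uniform(0, TOTAL_MEM) for _ in range(3)]) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=profit, reverse=True)
        parents = pop[: pop_size // 2]
        children = []
        while len(children) < pop_size - len(parents):
            a, b = random.sample(parents, 2)
            child = [(x + y) / 2 for x, y in zip(a, b)]              # crossover
            i = random.randrange(3)
            child[i] = max(0.0, child[i] + random.uniform(-1, 1))    # mutation
            children.append(repair(child))
        pop = parents + children
    return max(pop, key=profit)

best = ga()
print([round(m, 2) for m in best], round(profit(best), 2))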
Abstract:
Data-driven parallel computing is widely used in scientific and engineering computation. Most of these computations are based on data dependency digraphs. In real-world applications, vertex scheduling, data communication and numerical computation are executed concurrently in a tightly coupled way that is hard to implement in a decoupled manner, which imposes difficulties on both application software co-design and code reuse. To address these problems, this paper proposes a hierarchical software architecture and implements it in the sweeping integrator component of the J adaptive structured mesh infrastructure (JASMIN). The hierarchical architecture is based on a unified algorithmic framework for data dependency digraph computation and consists of three levels: a directed acyclic graph (DAG) scheduling level, a DAG modeling level and a numerical computation level. This design provides strong support for decoupled implementation of vertex scheduling, data communication and numerical computation, which are essential in data-driven parallel computing. We apply this result to typical scientific computing applications such as neutron transport. The sequential implementation overhead and parallel performance results are obtained on a parallel computer with 2048 CPU cores. These results suggest that our hierarchical software architecture and component-based implementation are both effective and efficient.
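A minimal sketch of the decoupling idea: a generic DAG scheduling level that only knows dependencies, with the numerical computation supplied as a callback. The real component also overlaps data communication with computation; that part is omitted here, and the graph below is invented.

# Release-based scheduling of a data dependency DAG: a vertex runs once all of its
# predecessors have completed; the numerical work is an opaque callback.
from collections import deque

def schedule(graph, compute):
    """graph: {vertex: [successors]}; compute(vertex) performs the numerical work."""
    indeg = {v: 0 for v in graph}
    for succs in graph.values():
        for s in succs:
            indeg[s] += 1
    ready = deque(v for v, d in indeg.items() if d == 0)   # vertices with no pending deps
    order = []
    while ready:
        v = ready.popleft()
        compute(v)                       # numerical computation level, fully decoupled
        order.append(v)
        for s in graph[v]:               # completing a vertex may make successors ready
            indeg[s] -= 1
            if indeg[s] == 0:
                ready.append(s)
    return order

sweep = {"c00": ["c01", "c10"], "c01": ["c11"], "c10": ["c11"], "c11": []}
print(schedule(sweep, compute=lambda v: None))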
2014, 51(11): 2547-2558.
DOI: 10.7544/issn1000-1239.2014.20130750
Abstract:
This paper has been cancelled.
2014, 51(11): 2559-2572.
DOI: 10.7544/issn1000-1239.2014.20130242
Abstract:
In remote visualization of complex 3D dynamic scenes, compressing and transmitting vertex trajectories requires processing a huge number of vertices and is very slow. To solve these problems, we present a remote visualization method for 3D dynamic scenes based on image space, which replaces vertex trajectories with sample trajectories to reduce the number of vertices in the computation, and we present a parallel compression method to compress the dynamic datasets rapidly and effectively. First, an adaptive sampling algorithm in space and time (time after space) using the graphics pipeline, together with the construction of sample connectivity information, is presented to obtain a set of animated depth images. Then, the sample trajectories are compressed in parallel within each animated depth image, which effectively reduces compression time. Finally, the compressed dynamic datasets are transferred to the client, and the 3D dynamic scene over a period of time is reconstructed. The reconstructed scene allows the client to observe from arbitrary angles with little decline in rendering quality. Experimental results also show that the algorithms achieve rapid compression and greatly reduce the size of the dynamic datasets, which effectively reduces the limitation imposed by network latency.
2014, 51(11): 2573-2584.
DOI: 10.7544/issn1000-1239.2014.20130687
Abstract:
Many combinatorial optimization problems in information science can be abstracted as the maximum weight perfect matching problem in a bipartite graph. With the growth in the amount of data, classical algorithms for bipartite graph matching find it difficult to balance the efficiency of the algorithm against the precision of the solution. On this basis, an intelligent optimization method for the general maximum weight perfect matching problem is proposed. The method converts the matrix form of a candidate matching solution into an evolutionary representation that the intelligent algorithm can handle, and adaptively chooses effective evolutionary strategies from an improved discrete particle swarm optimization and simulated annealing based on offspring selection and a quantum-cooperative process; moreover, it maintains steady evolution and promotes rapid convergence of the population. Experimental results on test functions of various types and matching matrices of various dimensions show that, under restricted iterations, the proposed method outperforms other algorithms, achieving higher convergence precision and faster convergence speed, as well as adaptability to various classical optimization problems and high-dimensional matching problems.
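A sketch of the permutation encoding together with a simulated-annealing move is given below; the weight matrix is invented, and the discrete particle swarm component and adaptive strategy selection of the paper are omitted.

# A perfect matching of a bipartite graph is encoded as a permutation; a swap move
# keeps the matching perfect, and simulated annealing searches for maximum weight.
import math, random
random.seed(1)

W = [[4, 1, 3],
     [2, 0, 5],
     [3, 2, 2]]                      # W[i][j]: weight of matching left vertex i to right vertex j

def weight(perm):                    # perm[i] = right vertex matched to left vertex i
    return sum(W[i][perm[i]] for i in range(len(perm)))

def anneal(n, temp=5.0, cooling=0.95, steps=500):
    perm = list(range(n))
    best = perm[:]
    for _ in range(steps):
        i, j = random.sample(range(n), 2)
        cand = perm[:]
        cand[i], cand[j] = cand[j], cand[i]          # swap move keeps a perfect matching
        delta = weight(cand) - weight(perm)
        if delta >= 0 or random.random() < math.exp(delta / temp):
            perm = cand
        if weight(perm) > weight(best):
            best = perm[:]
        temp *= cooling
    return best, weight(best)

print(anneal(3))                     # the optimum here is (0->0, 1->2, 2->1) with weight 11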