2013 Vol. 50 No. 8
2013, 50(8): 1573-1582.
Abstract:
With the increase of data intensive application, cluster file systems need to manage PB or even EB scale storage. Limited by the management of data location information, object storage servers have scalable problems at data location, load balance and replica maintenance. To deal with these problems, we present a scalable storage space management to support EB scale storage. Firstly, the presented method organizes object location information into a two-level indexed structure through extendible hashing. Scalable management of object location information can be achieved by this structure. Secondly, this method places object based on the distribution of object location structure. The system can adjust the distribution of data by adjusting the distribution of object location structure with little overhead. Thirdly, the method records the replica location at the granularity of object location structure to reduce maintenance overhead. The evaluation shows the storage space management can provide high efficient data management for massive storage. With load balance mechanism, the I/O throughput of the system can be increased by 10%. Under the concurrent workload, compared with Lustre and DCFS3, the system can achieve a more scalable performance.
With the increase of data intensive application, cluster file systems need to manage PB or even EB scale storage. Limited by the management of data location information, object storage servers have scalable problems at data location, load balance and replica maintenance. To deal with these problems, we present a scalable storage space management to support EB scale storage. Firstly, the presented method organizes object location information into a two-level indexed structure through extendible hashing. Scalable management of object location information can be achieved by this structure. Secondly, this method places object based on the distribution of object location structure. The system can adjust the distribution of data by adjusting the distribution of object location structure with little overhead. Thirdly, the method records the replica location at the granularity of object location structure to reduce maintenance overhead. The evaluation shows the storage space management can provide high efficient data management for massive storage. With load balance mechanism, the I/O throughput of the system can be increased by 10%. Under the concurrent workload, compared with Lustre and DCFS3, the system can achieve a more scalable performance.
2013, 50(8): 1583-1591.
Abstract:
The speed gap between processor and memory is constantly widening, which substantially exacerbates the dependence of program performance on the on-chip memory hierarchy design in chip multiprocessors (CMP). However, traditional data management mechanism doesn't take advantage of the property of non-uniform cache access latency in large distributed cache in CMP, which causes the contradiction between miss rate and hit latency is increasingly serious. Furthermore, it is difficult to solve the problems of conflicting access and long latency hit to shared blocks by simply replying on dynamic migration and blind replication. Aiming at these challenges, this paper proposes an adaptive migration-replication (AMR) mechanism based on the virtual shared regions (VSR) partition in tiled CMP. Both the state of the victim candidate in local VSR and the activity degree of remote source line are taken into consideration cooperatively, so that the shared blocks accessed by different processor cores can be migrated and replicated between different VSRs adaptively, which results in the reduction of the average memory access time. Finally, the VSR partition and AMR mechanism are implemented using a full system simulator, and the typical benchmark suit SPLASH-2 is used to evaluate the performance improvement. Simulation results demonstrate that AMR performs well under different sharing degree compared with traditional fixed partition mechanism, while the additional hardware overhead is negligible.
The speed gap between processor and memory is constantly widening, which substantially exacerbates the dependence of program performance on the on-chip memory hierarchy design in chip multiprocessors (CMP). However, traditional data management mechanism doesn't take advantage of the property of non-uniform cache access latency in large distributed cache in CMP, which causes the contradiction between miss rate and hit latency is increasingly serious. Furthermore, it is difficult to solve the problems of conflicting access and long latency hit to shared blocks by simply replying on dynamic migration and blind replication. Aiming at these challenges, this paper proposes an adaptive migration-replication (AMR) mechanism based on the virtual shared regions (VSR) partition in tiled CMP. Both the state of the victim candidate in local VSR and the activity degree of remote source line are taken into consideration cooperatively, so that the shared blocks accessed by different processor cores can be migrated and replicated between different VSRs adaptively, which results in the reduction of the average memory access time. Finally, the VSR partition and AMR mechanism are implemented using a full system simulator, and the typical benchmark suit SPLASH-2 is used to evaluate the performance improvement. Simulation results demonstrate that AMR performs well under different sharing degree compared with traditional fixed partition mechanism, while the additional hardware overhead is negligible.
2013, 50(8): 1592-1603.
Abstract:
Data indexing is one of the most important techniques for distributed storage systems in cloud computing environments since the application data has been partitioned among different storage nodes of the data center. With the rapid development of Web applications, most query requests about metadata information are more complicated. However, the state-of-the-art indexing mechanisms for distributed storage system cannot support complex query, such as multi-dimensional query and range query. To address this issue, we firstly construct the definition of prefix binary tree (PBT) in this paper to support range query process. We then investigate a multi-dimensional indexing for complex query in cloud computing (M-Index) by the combination of pyramid-technique and PBT to transform the multi-dimensional metadata into a single-dimensional key. Data are distributed to overlay networks based on the key and consistent hashing to implement the efficient acquisition and distribution of data. On this basis, we propose a query algorithm based on M-Index which will support multi-dimensional query and range query. Last but not the least, theoretic analysis proves that M-Index possesses fine complex query efficiency as well as completeness of query results. And furthermore, the experiment results demonstrate that our indexing mechanism can outperform the existing relevant mechanisms in query efficiency and load balancing.
Data indexing is one of the most important techniques for distributed storage systems in cloud computing environments since the application data has been partitioned among different storage nodes of the data center. With the rapid development of Web applications, most query requests about metadata information are more complicated. However, the state-of-the-art indexing mechanisms for distributed storage system cannot support complex query, such as multi-dimensional query and range query. To address this issue, we firstly construct the definition of prefix binary tree (PBT) in this paper to support range query process. We then investigate a multi-dimensional indexing for complex query in cloud computing (M-Index) by the combination of pyramid-technique and PBT to transform the multi-dimensional metadata into a single-dimensional key. Data are distributed to overlay networks based on the key and consistent hashing to implement the efficient acquisition and distribution of data. On this basis, we propose a query algorithm based on M-Index which will support multi-dimensional query and range query. Last but not the least, theoretic analysis proves that M-Index possesses fine complex query efficiency as well as completeness of query results. And furthermore, the experiment results demonstrate that our indexing mechanism can outperform the existing relevant mechanisms in query efficiency and load balancing.
2013, 50(8): 1604-1612.
Abstract:
In sequential data storage, such as video surveillance, continuous data protection (CDP), virtual tape library (VTL), etc., the address of the I/O requests are mainly continuous except a small amount of random accesses, but the small write still exists and hampers the full exploitation of the performance of RAID5. In this paper, a write optimization method for RAID5 in sequential storage (WOSS) is presented. Firstly, address translation (AT) based on area mapping is performed, which maps the discontinuous address of write requests to continuous address space, so the continuous write for RAID5 is realized. Secondly, the write requests are buffered and further reorganized into the new ones aligned to the stripes of RAID5. Then they are dispatched to RAID5, thus the full write for RAID5 is achieved. Full write to RAID5 may completely eliminate the extra overhead caused by read modify write or reconstruction write for generating the parity, meanwhile the continuity of write requests further decreases the seek time and improves the throughput of RAID5 significantly. Experiments show that the optimization may improve the write performance prominently, especially when the 80% workload is sequential, and that the data transfer rate approaches about the maximum of RAID5, while the read performance is limitly reduced. The method is also suitable for RAID4 and RAID6 in sequential data storage.
In sequential data storage, such as video surveillance, continuous data protection (CDP), virtual tape library (VTL), etc., the address of the I/O requests are mainly continuous except a small amount of random accesses, but the small write still exists and hampers the full exploitation of the performance of RAID5. In this paper, a write optimization method for RAID5 in sequential storage (WOSS) is presented. Firstly, address translation (AT) based on area mapping is performed, which maps the discontinuous address of write requests to continuous address space, so the continuous write for RAID5 is realized. Secondly, the write requests are buffered and further reorganized into the new ones aligned to the stripes of RAID5. Then they are dispatched to RAID5, thus the full write for RAID5 is achieved. Full write to RAID5 may completely eliminate the extra overhead caused by read modify write or reconstruction write for generating the parity, meanwhile the continuity of write requests further decreases the seek time and improves the throughput of RAID5 significantly. Experiments show that the optimization may improve the write performance prominently, especially when the 80% workload is sequential, and that the data transfer rate approaches about the maximum of RAID5, while the read performance is limitly reduced. The method is also suitable for RAID4 and RAID6 in sequential data storage.
2013, 50(8): 1613-1627.
Abstract:
In trusted cloud storage (TCS), for protecting the privacy of the sensitive outsourced cloud data, data owners locally encrypt their data before outsourcing. Through the secure management of the data keys, the selective access of outsourced data can be enforced in TCS scenarios. However, in TCS with multiple data owners, it remains a challenge to reduce users security risk and costs of key management as much as possible. In this paper, we propose a novel key management scheme based on global logical hierarchical graph (GLHG) for key derivation, which is used to enforce correctly the global authorization policies of all users. Our solution can achieve high efficiency by delegating the management of GLHG structure to cloud and adopting proxy re-encryption (PRE) technology. Additionally, this paper states the update policies for supporting dynamic access control. Finally, we show the benefits of our solution by experimentally evaluating quantitative criterions of key management.
In trusted cloud storage (TCS), for protecting the privacy of the sensitive outsourced cloud data, data owners locally encrypt their data before outsourcing. Through the secure management of the data keys, the selective access of outsourced data can be enforced in TCS scenarios. However, in TCS with multiple data owners, it remains a challenge to reduce users security risk and costs of key management as much as possible. In this paper, we propose a novel key management scheme based on global logical hierarchical graph (GLHG) for key derivation, which is used to enforce correctly the global authorization policies of all users. Our solution can achieve high efficiency by delegating the management of GLHG structure to cloud and adopting proxy re-encryption (PRE) technology. Additionally, this paper states the update policies for supporting dynamic access control. Finally, we show the benefits of our solution by experimentally evaluating quantitative criterions of key management.
2013, 50(8): 1628-1636.
Abstract:
A management approach to key used times based on trusted platform module (TPM) is proposed to protect the confidentiality of data in cloud storage and control the key-used times. Firstly, the data is encrypted by a symmetric encryption scheme using a data encryption key (DEK). And then DEK is encrypted by the ciphertext-policy attribute-based encryption (CP-ABE) scheme to control the access of DEK. Only those whose attributes satisfy the access control tree adopted by CP-ABE can decrypt and access DEK. Then DEK will be stored securely by binding the key and the TPM with a digital signature locally. The physical monotonic counter of the TPM is utilized to generate virtual monotonic counter (VMC) for each DEK. Secondly, comparing the monotonically increased value of VMC and the pre-set times that DEK can be used, DEK is judged to be deleted or to be used unceasingly so that the used times of DEK is controlled. Finally, the replay attack of the hard disk is prevented by the anti-physical tampering functionality of TPM, monotonicity of the counter, and digital signature. The experiment results show that the performance cost is low and the proposed scheme can securely store and effectively protect DEK, thus achieving the goal that the times of DEK can be used is limited.
A management approach to key used times based on trusted platform module (TPM) is proposed to protect the confidentiality of data in cloud storage and control the key-used times. Firstly, the data is encrypted by a symmetric encryption scheme using a data encryption key (DEK). And then DEK is encrypted by the ciphertext-policy attribute-based encryption (CP-ABE) scheme to control the access of DEK. Only those whose attributes satisfy the access control tree adopted by CP-ABE can decrypt and access DEK. Then DEK will be stored securely by binding the key and the TPM with a digital signature locally. The physical monotonic counter of the TPM is utilized to generate virtual monotonic counter (VMC) for each DEK. Secondly, comparing the monotonically increased value of VMC and the pre-set times that DEK can be used, DEK is judged to be deleted or to be used unceasingly so that the used times of DEK is controlled. Finally, the replay attack of the hard disk is prevented by the anti-physical tampering functionality of TPM, monotonicity of the counter, and digital signature. The experiment results show that the performance cost is low and the proposed scheme can securely store and effectively protect DEK, thus achieving the goal that the times of DEK can be used is limited.
2013, 50(8): 1637-1646.
Abstract:
Hardware processes and hardware centric execution model introduced by BORPH have improved the usability of reconfigurable computers significantly. BORPH-N is designed as an extended system of BORPH. The main extension is that BORPH-N supports shared memory reconfigurable computers. And hardware processes can communicate with the rest of the system by shared memory and semaphore which are Unix semantic in BORPH-N. Accelerating the computing intensive parts of applications is one of the most important goals of reconfigurable computing. Thereby efficiency of software-hardware communications is very crucial. Supporting shared memory and semaphore through simple mechanisms such as remote system call, can definitely not meet the need of applications. Independent-execution-based optimizations are adopted by BORPH-N. Independent execution means the FPGA does some work locally without the help of the host. The efficiency will be enhanced a lot due to the elimination of data exchange between the host and FPGA. To reduce the workload, only the functions which are repeated frequently during the execution of applications will be completed by FPGA independently. BORPH-N focuses on two tasks: virtual memory access and atomic variable access. Experiment is setup on a PC with an ARRIA II GX FPGA board. The results show the hardware overhead of BORPH-N is low. The efficiency of shared memory and semaphore access is close to the peak performance of hardware platform.
Hardware processes and hardware centric execution model introduced by BORPH have improved the usability of reconfigurable computers significantly. BORPH-N is designed as an extended system of BORPH. The main extension is that BORPH-N supports shared memory reconfigurable computers. And hardware processes can communicate with the rest of the system by shared memory and semaphore which are Unix semantic in BORPH-N. Accelerating the computing intensive parts of applications is one of the most important goals of reconfigurable computing. Thereby efficiency of software-hardware communications is very crucial. Supporting shared memory and semaphore through simple mechanisms such as remote system call, can definitely not meet the need of applications. Independent-execution-based optimizations are adopted by BORPH-N. Independent execution means the FPGA does some work locally without the help of the host. The efficiency will be enhanced a lot due to the elimination of data exchange between the host and FPGA. To reduce the workload, only the functions which are repeated frequently during the execution of applications will be completed by FPGA independently. BORPH-N focuses on two tasks: virtual memory access and atomic variable access. Experiment is setup on a PC with an ARRIA II GX FPGA board. The results show the hardware overhead of BORPH-N is low. The efficiency of shared memory and semaphore access is close to the peak performance of hardware platform.
2013, 50(8): 1647-1656.
Abstract:
The optimization of join strategies between columns is an important problem in column store based queries. Current column-oriented systems simplify the join strategy by changing storage structure, making the join strategy lack query optimization, which can not achieve the satisfied performance. On the basis of these problems, this paper presents a new join strategy optimization method with cost-based and rule-based method. Firstly, we use the rule-based optimization (RBO), setting the optimization rules to remove those candidate plans with too much cost. Then we design the cost-based optimization (CBO). We change the execution order by Huffman tree and left-deep tree principle. Then we summarize the execution strategies of each join node in the column-oriented query plan into pipeline strategy and parallel strategy. Based on that, a cost model is then proposed to select the better strategy. With small time and space complexity, the efficiency of the query execution in column-oriented systems is improved by focusing on estimating the cost of the pipeline and parallel strategies in this paper. The experimental results on the large-scale data warehouse benchmark data sets SSB verify the effectiveness of the proposed method.
The optimization of join strategies between columns is an important problem in column store based queries. Current column-oriented systems simplify the join strategy by changing storage structure, making the join strategy lack query optimization, which can not achieve the satisfied performance. On the basis of these problems, this paper presents a new join strategy optimization method with cost-based and rule-based method. Firstly, we use the rule-based optimization (RBO), setting the optimization rules to remove those candidate plans with too much cost. Then we design the cost-based optimization (CBO). We change the execution order by Huffman tree and left-deep tree principle. Then we summarize the execution strategies of each join node in the column-oriented query plan into pipeline strategy and parallel strategy. Based on that, a cost model is then proposed to select the better strategy. With small time and space complexity, the efficiency of the query execution in column-oriented systems is improved by focusing on estimating the cost of the pipeline and parallel strategies in this paper. The experimental results on the large-scale data warehouse benchmark data sets SSB verify the effectiveness of the proposed method.
2013, 50(8): 1657-1666.
Abstract:
Cloud-based services are emerging as an economical and convenient alternative for clients who don't want to acquire, maintain and operate their own IT equipment. Instead, customers purchase virtual machines (VMs) with certain service level objectives (SLOs) to obtain computational resources. Existing algorithms for memory and CPU allocation are inadequate for I/O allocation, especially in clustered storage infrastructures where storage is distributed across multiple storage nodes. This paper focuses on: 1) dynamic SLO decomposition so that VMs receive proper I/O service in each distributed storage node, and 2) efficient and robust local I/O scheduling strategy. To address these issues, we present an adaptive I/O resource scheduling algorithm (called PC) for utility optimization that at runtime adjusts local SLOs. The local SLOs are generated for each VM at each storage node based on access patterns. We also adopt dual clocks to allow automatic switching between two scheduling strategies. When system capacity is sufficient, we interweave requests in an earliest deadline first (EDF) manner. Otherwise resources are allocated proportionately to their normalized revenues. The results of our experiments suggest that the algorithm is adaptive to various access patterns without significant manual pre-settings while maximizing profits.
Cloud-based services are emerging as an economical and convenient alternative for clients who don't want to acquire, maintain and operate their own IT equipment. Instead, customers purchase virtual machines (VMs) with certain service level objectives (SLOs) to obtain computational resources. Existing algorithms for memory and CPU allocation are inadequate for I/O allocation, especially in clustered storage infrastructures where storage is distributed across multiple storage nodes. This paper focuses on: 1) dynamic SLO decomposition so that VMs receive proper I/O service in each distributed storage node, and 2) efficient and robust local I/O scheduling strategy. To address these issues, we present an adaptive I/O resource scheduling algorithm (called PC) for utility optimization that at runtime adjusts local SLOs. The local SLOs are generated for each VM at each storage node based on access patterns. We also adopt dual clocks to allow automatic switching between two scheduling strategies. When system capacity is sufficient, we interweave requests in an earliest deadline first (EDF) manner. Otherwise resources are allocated proportionately to their normalized revenues. The results of our experiments suggest that the algorithm is adaptive to various access patterns without significant manual pre-settings while maximizing profits.
2013, 50(8): 1667-1673.
Abstract:
To solve the “I/O wall” problem in the case of real-time accessing about mass information processing and improve the performance of distributed mass storage systems, an access policy based on storage unit pass-through is proposed. Firstly we analyse the problem of traditional access models. Then we study the mechanism of pass-through pattern, and build the model which divides the storage system into multi-level, distributed storage system. Next, the continuous mapping of physical address, cache address of storage space and logical space address of storage system is realized depending on the different levels and mapping strategies. Thirdly we analyse the time consuming of pass-through storage path in pass-through pattern. Fourthly we test the performance of the storage unit in the simulated environment, and validate the application system which employs the strategies. The validated results show that the method can improve the performance of storage system effectively, and can meet the needs of real-time accessing about massive information processing in the ubiquitous network architecture.
To solve the “I/O wall” problem in the case of real-time accessing about mass information processing and improve the performance of distributed mass storage systems, an access policy based on storage unit pass-through is proposed. Firstly we analyse the problem of traditional access models. Then we study the mechanism of pass-through pattern, and build the model which divides the storage system into multi-level, distributed storage system. Next, the continuous mapping of physical address, cache address of storage space and logical space address of storage system is realized depending on the different levels and mapping strategies. Thirdly we analyse the time consuming of pass-through storage path in pass-through pattern. Fourthly we test the performance of the storage unit in the simulated environment, and validate the application system which employs the strategies. The validated results show that the method can improve the performance of storage system effectively, and can meet the needs of real-time accessing about massive information processing in the ubiquitous network architecture.
2013, 50(8): 1674-1682.
Abstract:
Data warehouse which utilizes the column-oriented store approach appears to be more conducive to data compression. Order-preserving lightweight compression methods show its superiority on the compression of column stored string data. However, they seldom consider the probability of string occurence, which would affect the compression performance. This paper presents a probability-based order-preserving string compression method. First, we propose an improved shared leaf structure. It makes the encoding and decoding index share the same code table, which greatly reduces the time of maintaining the encoding and decoding index. At the same time, we record the probability of the string in the proposed structure, then establish the decoding index according to the probability. It effectively reduces the decompression time of high-frequency strings. Further more, this paper also preserves the information of row-id in the proposed leaf structure according to the column storage characteristics, thus effectively reducing the storage space and creation time for the column index. The experimental results demonstrate the effectiveness of the proposed method.
Data warehouse which utilizes the column-oriented store approach appears to be more conducive to data compression. Order-preserving lightweight compression methods show its superiority on the compression of column stored string data. However, they seldom consider the probability of string occurence, which would affect the compression performance. This paper presents a probability-based order-preserving string compression method. First, we propose an improved shared leaf structure. It makes the encoding and decoding index share the same code table, which greatly reduces the time of maintaining the encoding and decoding index. At the same time, we record the probability of the string in the proposed structure, then establish the decoding index according to the probability. It effectively reduces the decompression time of high-frequency strings. Further more, this paper also preserves the information of row-id in the proposed leaf structure according to the column storage characteristics, thus effectively reducing the storage space and creation time for the column index. The experimental results demonstrate the effectiveness of the proposed method.
2013, 50(8): 1683-1689.
Abstract:
Sentiment classification of documents aims to determine the opinion (e.g., negative or positive) of a given document. Existing studies have shown that, usually, supervised classification approaches perform well in sentiment classification. However, in most cases, the existing labeled data and the unlabeled data don’t belong to the same domain. And the performance of sentiment classification decreases sharply when transferred from one domain to another domain. This causes cross-domain sentiment classification, which is a very significant problem and getting more and more attention. A unified framework is proposed, which integrates several stages for cross-domain sentiment classification. Firstly, we utilize the accurate labels of source-domain documents to get the initial labels of target-domain documents. Then, we build the target domain as a weighted network, and choose some target-domain documents whose opinions are determined more accurately as “source components” and “sink components”. Further, we apply heat conduction process to the weighted network to improve the performance of cross-domain sentiment classification of target-domain data, with the help of “source components” and “sink components”. An experiment is conducted using data from three different domains, and we transfer between two of them. The experiment results indicate that the proposed framework could improve the performance of cross-domain sentiment classification dramatically.
Sentiment classification of documents aims to determine the opinion (e.g., negative or positive) of a given document. Existing studies have shown that, usually, supervised classification approaches perform well in sentiment classification. However, in most cases, the existing labeled data and the unlabeled data don’t belong to the same domain. And the performance of sentiment classification decreases sharply when transferred from one domain to another domain. This causes cross-domain sentiment classification, which is a very significant problem and getting more and more attention. A unified framework is proposed, which integrates several stages for cross-domain sentiment classification. Firstly, we utilize the accurate labels of source-domain documents to get the initial labels of target-domain documents. Then, we build the target domain as a weighted network, and choose some target-domain documents whose opinions are determined more accurately as “source components” and “sink components”. Further, we apply heat conduction process to the weighted network to improve the performance of cross-domain sentiment classification of target-domain data, with the help of “source components” and “sink components”. An experiment is conducted using data from three different domains, and we transfer between two of them. The experiment results indicate that the proposed framework could improve the performance of cross-domain sentiment classification dramatically.
Abstract:
Manifold learning has become a hot issue in the field of machine learning and data mining. Its algorithms often assume that the data resides on a single manifold. And both the theories and algorithms are lacking when the data is supported on a mixture of manifolds. A new method, which is called DC-ISOMAP method, is proposed for the nonlinear dimensionality reduction of data lying on the separated multi-manifold with same intrinsic dimension. Although several algorithms based on spectral clustering or manifold clustering can separate sub-manifolds and get their low-dimensional embeddings, the algorithms do not think about the topological structure of multi-manifolds and must know the number of the clustered sub-manifolds. DC-ISOMAP firstly decomposes a given data set into several sub-manifolds by propagating the tangent subspace of the point with maximum sampling density to a separate sub-manifold, and then the low-dimensional embeddings of each sub-manifold is independently calculated. Finally the embeddings of all sub-manifolds are composed into their proper positions and orientations based on their inter-connections. Experimental results on synthetic data as well as real world images demonstrate that our approaches can construct an accurate low-dimensional representation of the data in an efficient manner.
Manifold learning has become a hot issue in the field of machine learning and data mining. Its algorithms often assume that the data resides on a single manifold. And both the theories and algorithms are lacking when the data is supported on a mixture of manifolds. A new method, which is called DC-ISOMAP method, is proposed for the nonlinear dimensionality reduction of data lying on the separated multi-manifold with same intrinsic dimension. Although several algorithms based on spectral clustering or manifold clustering can separate sub-manifolds and get their low-dimensional embeddings, the algorithms do not think about the topological structure of multi-manifolds and must know the number of the clustered sub-manifolds. DC-ISOMAP firstly decomposes a given data set into several sub-manifolds by propagating the tangent subspace of the point with maximum sampling density to a separate sub-manifold, and then the low-dimensional embeddings of each sub-manifold is independently calculated. Finally the embeddings of all sub-manifolds are composed into their proper positions and orientations based on their inter-connections. Experimental results on synthetic data as well as real world images demonstrate that our approaches can construct an accurate low-dimensional representation of the data in an efficient manner.
2013, 50(8): 1700-1709.
Abstract:
Recently, quite a few works focus on single machine parallel batch scheduling problem with penalties. While most of them concern the makespan objective, little effort is put into the total completion time, which is studied in this paper. We study the following single machine parallel batch scheduling problem: There are n given jobs. Each job has a processing time and a rejection penalty (Some job may be rejected for processing. Some rejection penalty is paid if a job is rejected). There is a single parallel batch machine that can process at most b jobs simultaneously. The jobs processed together form a batch. All jobs processed in the same batch have the same start time and the same completion time, that is, the start time plus the maximum processing time over all the jobs in the batch. We need to determine how to choose jobs for processing, divide these jobs into batches, and sequence these batches so that the sum of completion time of the processed jobs plus the sum of rejection penalties of the rejected ones is minimized. We give an O((b2-b)×n2b2-2b+2×bb2-b-1)-time dynamic programming algorithm, and thus conclude that this problem can be solved in polynomial time when the batch size b is fixed.
Recently, quite a few works focus on single machine parallel batch scheduling problem with penalties. While most of them concern the makespan objective, little effort is put into the total completion time, which is studied in this paper. We study the following single machine parallel batch scheduling problem: There are n given jobs. Each job has a processing time and a rejection penalty (Some job may be rejected for processing. Some rejection penalty is paid if a job is rejected). There is a single parallel batch machine that can process at most b jobs simultaneously. The jobs processed together form a batch. All jobs processed in the same batch have the same start time and the same completion time, that is, the start time plus the maximum processing time over all the jobs in the batch. We need to determine how to choose jobs for processing, divide these jobs into batches, and sequence these batches so that the sum of completion time of the processed jobs plus the sum of rejection penalties of the rejected ones is minimized. We give an O((b2-b)×n2b2-2b+2×bb2-b-1)-time dynamic programming algorithm, and thus conclude that this problem can be solved in polynomial time when the batch size b is fixed.
2013, 50(8): 1710-1721.
Abstract:
Aiming at the problem that the existing no-wait algorithm could only schedule the process with SIPP (sole immediate predecessor process) which limits the application of integrated scheduling algorithm, we have proposed the no-wait integrated scheduling algorithm based on reversed order signal-driven. This algorithm establishes device and scheduling subsystem, both subsystems are driven by the signals transmitted between them; For the NWPG (no-wait process group) with NSIPP (non-sole immediate predecessor process), we define the NWPG as a no-wait subtree and adopt the reversed order method to solve the no-wait problem top to down. In the case that there are several schedulable processes or schedulable process groups, we choose the process or process group by the BPCP (biggest parallelism chosen policy), which looks for the process or process group with longest child-node critical path; If the chosen one is a process, we lock the timespan of the process immediately; If the chosen one is a process group, we determine the locked timespans by the FGP (frontier greedy policy). Because we have adopted the reversed order method and the FGP, the NWPG could be scheduled independently, achieving the no-wait integrated scheduling with unlimited number of SIPP.
Aiming at the problem that the existing no-wait algorithm could only schedule the process with SIPP (sole immediate predecessor process) which limits the application of integrated scheduling algorithm, we have proposed the no-wait integrated scheduling algorithm based on reversed order signal-driven. This algorithm establishes device and scheduling subsystem, both subsystems are driven by the signals transmitted between them; For the NWPG (no-wait process group) with NSIPP (non-sole immediate predecessor process), we define the NWPG as a no-wait subtree and adopt the reversed order method to solve the no-wait problem top to down. In the case that there are several schedulable processes or schedulable process groups, we choose the process or process group by the BPCP (biggest parallelism chosen policy), which looks for the process or process group with longest child-node critical path; If the chosen one is a process, we lock the timespan of the process immediately; If the chosen one is a process group, we determine the locked timespans by the FGP (frontier greedy policy). Because we have adopted the reversed order method and the FGP, the NWPG could be scheduled independently, achieving the no-wait integrated scheduling with unlimited number of SIPP.
2013, 50(8): 1722-1727.
Abstract:
Along with the vigorous development of Web 2.0, lots of comments that represent the voices of customers appeared on the Internet, and the general public's sentiments toward products are increasingly influenced by the underlying viewpoints. Therefore mining the sentiment information from reviews would produce practical values for predicting sales performance and adjusting market strategy. Aiming at this problem, based on the result of the analysis on the characteristics of online book reviews, it proposes a sentiment analysis method. First, a polarity word dictionary is automatically constructed by the part of speech list and the prefix list. Afterwards the sentiments in the reviews can be extracted based on the polarity dictionary. Finally, the paper presents an ARES (autoregressive emotion-sensitive model), to utilize the emotion information acquired by the sentiment analysis method for predicting sales performance. Experiments are conducted on a book data set. By comparing the ARES with alternative models that do not take sentiment information into consideration, as well as a model with a different sentiment analysis method, the results, on the one hand, indicate that our sentiment analysis approach could generate a well summary of the review itself, and on the other hand, confirm the effectiveness of the proposed prediction model.
Along with the vigorous development of Web 2.0, lots of comments that represent the voices of customers appeared on the Internet, and the general public's sentiments toward products are increasingly influenced by the underlying viewpoints. Therefore mining the sentiment information from reviews would produce practical values for predicting sales performance and adjusting market strategy. Aiming at this problem, based on the result of the analysis on the characteristics of online book reviews, it proposes a sentiment analysis method. First, a polarity word dictionary is automatically constructed by the part of speech list and the prefix list. Afterwards the sentiments in the reviews can be extracted based on the polarity dictionary. Finally, the paper presents an ARES (autoregressive emotion-sensitive model), to utilize the emotion information acquired by the sentiment analysis method for predicting sales performance. Experiments are conducted on a book data set. By comparing the ARES with alternative models that do not take sentiment information into consideration, as well as a model with a different sentiment analysis method, the results, on the one hand, indicate that our sentiment analysis approach could generate a well summary of the review itself, and on the other hand, confirm the effectiveness of the proposed prediction model.
2013, 50(8): 1728-1736.
Abstract:
Sentence similarity computing plays an important role in many tasks of natural language processing. Recent approaches to sentence similarity computing have focused on word-level information without considering the semantic structural information; these methods based on the sentence structure are not generally desirable as they are severely affected by the incomplete description of sentence semantic. Hence, similarity computing isn't able to get better results. To solve this problem, this paper proposes a novel similarity computing approach based on Chinese FrameNet. The approach implements to measure the sentences' semantics similarity by multi-frame semantic parsing, importance measure of frames, similar match of frames, similarity computing between frames and so on. From the frame perspective, the multi-frame semantic parsing comprehensively describes sentences' semantics by identifying all the target words, choosing corresponding frames and labeling the frame elements. On that basis, the similarity result can be more accurate by distinguishing the different frames' importance in accordance with the semantic coverage area of the frame. In addition, by means of extracting the semantic core words of the frame element, the approach improves the precision of similarity among the frames of chunk form. The sentences which contain multiple target words are chosen as the corpus of the experiments. In contrast with traditional approaches, the results show that the proposed approach could achieve better similarity results.
Sentence similarity computing plays an important role in many tasks of natural language processing. Recent approaches to sentence similarity computing have focused on word-level information without considering the semantic structural information; these methods based on the sentence structure are not generally desirable as they are severely affected by the incomplete description of sentence semantic. Hence, similarity computing isn't able to get better results. To solve this problem, this paper proposes a novel similarity computing approach based on Chinese FrameNet. The approach implements to measure the sentences' semantics similarity by multi-frame semantic parsing, importance measure of frames, similar match of frames, similarity computing between frames and so on. From the frame perspective, the multi-frame semantic parsing comprehensively describes sentences' semantics by identifying all the target words, choosing corresponding frames and labeling the frame elements. On that basis, the similarity result can be more accurate by distinguishing the different frames' importance in accordance with the semantic coverage area of the frame. In addition, by means of extracting the semantic core words of the frame element, the approach improves the precision of similarity among the frames of chunk form. The sentences which contain multiple target words are chosen as the corpus of the experiments. In contrast with traditional approaches, the results show that the proposed approach could achieve better similarity results.
2013, 50(8): 1737-1743.
Abstract:
Query expansion is one of the important steps in information retrieval. Many studies have shown its effectiveness in improving retrieval performance. Most of the query expansion methods analyze each expansion term separately and select several best terms as the query expansion. However, several studies have shown that combining several individually best terms can not guarantee the best query expansion. In this paper, the impacts of term combinations are considered and the result shows that the retrieval performance is impacted largely by them. To address this problem, this paper tries to select expansion terms as a whole instead of one by one. The query expansion task is formulated as an integer linear programming problem, which is a global optimization problem. In this model, the weights of each candidate term are learned by a supervised method, and a few constraints are added to capture the relation among terms. By solving this global optimization problem, the best query expansion can be found. Finally, the experiment on three TREC collections shows that compared with the baseline query expansion method, the retrieval effectiveness can be greatly improved by selecting expansion terms as a set via integer linear programming.
Query expansion is one of the important steps in information retrieval. Many studies have shown its effectiveness in improving retrieval performance. Most of the query expansion methods analyze each expansion term separately and select several best terms as the query expansion. However, several studies have shown that combining several individually best terms can not guarantee the best query expansion. In this paper, the impacts of term combinations are considered and the result shows that the retrieval performance is impacted largely by them. To address this problem, this paper tries to select expansion terms as a whole instead of one by one. The query expansion task is formulated as an integer linear programming problem, which is a global optimization problem. In this model, the weights of each candidate term are learned by a supervised method, and a few constraints are added to capture the relation among terms. By solving this global optimization problem, the best query expansion can be found. Finally, the experiment on three TREC collections shows that compared with the baseline query expansion method, the retrieval effectiveness can be greatly improved by selecting expansion terms as a set via integer linear programming.
2013, 50(8): 1744-1754.
Abstract:
MPI Alltoall is an important collective operation. In multicore clusters, many processes run in a node. On the one hand, shared memory can be adopted to optimize Alltoall communications of small messages by leader-based schemes. However, as these schemes adopt a fixed number of leader processes, the optimal performance can't be obtained for all small messages. On the other hand, processes within a node contend for the same network resource. In Alltoall communications of large messages, many synchronization messages are used. Nevertheless, the contention makes their latency increase many times and the synchronization overhead can't be ingored. To solve these problems, two optimizations are presented. For small messages, the PLP method adopts changeable numbers of leader processes. For large messages, the LSS method reduces the number of synchronization messages from 3N to 2N. The evaluations prove two methods. For small messages, the PLP method always obtains optimal performance. For large messages, the LSS method brings almost constant improvement percentage. The performance is improved by 25% for 32KB and 64KB messages.
MPI Alltoall is an important collective operation. In multicore clusters, many processes run in a node. On the one hand, shared memory can be adopted to optimize Alltoall communications of small messages by leader-based schemes. However, as these schemes adopt a fixed number of leader processes, the optimal performance can't be obtained for all small messages. On the other hand, processes within a node contend for the same network resource. In Alltoall communications of large messages, many synchronization messages are used. Nevertheless, the contention makes their latency increase many times and the synchronization overhead can't be ingored. To solve these problems, two optimizations are presented. For small messages, the PLP method adopts changeable numbers of leader processes. For large messages, the LSS method reduces the number of synchronization messages from 3N to 2N. The evaluations prove two methods. For small messages, the PLP method always obtains optimal performance. For large messages, the LSS method brings almost constant improvement percentage. The performance is improved by 25% for 32KB and 64KB messages.
2013, 50(8): 1755-1761.
Abstract:
As the scale of HPC systems and parallel applications keep increasing, time of large-scale parallel job startup cannot be ignored anymore. Various efforts have been made to improve the performance of program launching and runtime environment initialization. The experiences and results of starting MPI jobs, on Tianhe-1A supercomputer system are presented. Detailed study of the time costs of job startup in different stages, including control message transferring, file access, and MPI environment initialization, shows that for large scale MPI jobs, the environment initialization time dominates the job startup time. Based on this discovery, some preliminary optimization work has been done to reduce the data exchanged during MPI environment initialization and avoid unnecessary data transfer costs. The optimization improves the job startup performance notably. An optimizing process management design with hierarchical structure for MPI environment initialization is proposed to further improve the scalability of job startup. For completeness, we also compare and analyze the job start time of other process management mechanism in main-stream MPI implementations.
As the scale of HPC systems and parallel applications keep increasing, time of large-scale parallel job startup cannot be ignored anymore. Various efforts have been made to improve the performance of program launching and runtime environment initialization. The experiences and results of starting MPI jobs, on Tianhe-1A supercomputer system are presented. Detailed study of the time costs of job startup in different stages, including control message transferring, file access, and MPI environment initialization, shows that for large scale MPI jobs, the environment initialization time dominates the job startup time. Based on this discovery, some preliminary optimization work has been done to reduce the data exchanged during MPI environment initialization and avoid unnecessary data transfer costs. The optimization improves the job startup performance notably. An optimizing process management design with hierarchical structure for MPI environment initialization is proposed to further improve the scalability of job startup. For completeness, we also compare and analyze the job start time of other process management mechanism in main-stream MPI implementations.
2013, 50(8): 1762-1768.
Abstract:
In parallel computational fluid dynamics ( CFD ) applications with multi-block structured grids , to exploit fully the available resources of CPUs and / or memories for parallel computing platform , it is a common approach to repartition the original grids to obtain more sub-blocks , each with smaller volume.Two strategies of grid repartitioning are proposed for multi-block structured grids CFD application in this paper based on the investigation of existing approaches.Furthermore , a software tool named TH-MeshSplit is developed to facilitate the process of grid repartitioning , in which a simple heuristic strategy is used to solve the combinatorial optimization for the optimal splitting and yield a near-optimal solution for the problem.TH-MeshSplit can run in three modes : fully-automatic mode , half-automatic mode and manual mode , which provides user the freedom of choices and shows a good flexibility of the tool.Some numerical experiment results were reported.By evaluating the quality of grids according to several criteria , it is shown that a resulting grid with high quality can be achieved after the repartitioning and it thus confirms the validity of our grid repartitioning approach.
In parallel computational fluid dynamics ( CFD ) applications with multi-block structured grids , to exploit fully the available resources of CPUs and / or memories for parallel computing platform , it is a common approach to repartition the original grids to obtain more sub-blocks , each with smaller volume.Two strategies of grid repartitioning are proposed for multi-block structured grids CFD application in this paper based on the investigation of existing approaches.Furthermore , a software tool named TH-MeshSplit is developed to facilitate the process of grid repartitioning , in which a simple heuristic strategy is used to solve the combinatorial optimization for the optimal splitting and yield a near-optimal solution for the problem.TH-MeshSplit can run in three modes : fully-automatic mode , half-automatic mode and manual mode , which provides user the freedom of choices and shows a good flexibility of the tool.Some numerical experiment results were reported.By evaluating the quality of grids according to several criteria , it is shown that a resulting grid with high quality can be achieved after the repartitioning and it thus confirms the validity of our grid repartitioning approach.
2013, 50(8): 1769-1777.
Abstract:
The change impact analysis can provide some useful information to software developers and managers during software development process, so it is significantly important. Aspect-oriented document-driven requirements engineering starts from textual requirements and provides a way and process of modularizing requirements in the forms of concerns and their relationships, identifying crosscutting concerns and creating formal requirements documentation. An approach to requirements change impact analysis is given under the framework of aspect-oriented document-driven requirements engineering approach, where concerns and variables relationships have been identifies with the usage of matrix, the impact scope of concerns and variables has been calculated, the impact weights of concerns and variables has been quantified. The effectiveness of our approach has been indicated with a case study.
The change impact analysis can provide some useful information to software developers and managers during software development process, so it is significantly important. Aspect-oriented document-driven requirements engineering starts from textual requirements and provides a way and process of modularizing requirements in the forms of concerns and their relationships, identifying crosscutting concerns and creating formal requirements documentation. An approach to requirements change impact analysis is given under the framework of aspect-oriented document-driven requirements engineering approach, where concerns and variables relationships have been identifies with the usage of matrix, the impact scope of concerns and variables has been calculated, the impact weights of concerns and variables has been quantified. The effectiveness of our approach has been indicated with a case study.
2013, 50(8): 1778-1786.
Abstract:
In recent years, as an important branch of remote sensing image processing technology, remote sensing image fusion technologies have been widely applied, especially in the fields of resource exploration, environmental monitoring, region analysis and so on. The techniques can fuse different images from different sensors to an image which has complete information and accurate expression. Contourlet transform is comprehensively concerned in the discipline of remote sensing image processing for its excellent characteristics such as non-linear approximation, multi-resolution, time-frequency localization, multi-directional and anisotropy. In this paper, combining the directional characteristics of Contourlet transform, we analyze the correlativity attribute and propose a novel image fusion algorithm for remote sensing images based on Contourlet coefficients' correlativity. Firstly, we separately perform Contourlet transform on the intensity component of multi-spectral remote sensing image obtained by IHS transform, and panchromatic remote sensing image. Secondly, we propose the fusion priciple of self-adaption calculating fuesd weighting coefficients. Finally, the target image is obtained by reverse Contourlet transform and reverse IHS transform. Compared with the traditional fusion methods, our algorithm can enhance the spatial resolution of target image. Meanwhile, it preserves the spectral information of multi-spectral image well.
In recent years, as an important branch of remote sensing image processing technology, remote sensing image fusion technologies have been widely applied, especially in the fields of resource exploration, environmental monitoring, region analysis and so on. The techniques can fuse different images from different sensors to an image which has complete information and accurate expression. Contourlet transform is comprehensively concerned in the discipline of remote sensing image processing for its excellent characteristics such as non-linear approximation, multi-resolution, time-frequency localization, multi-directional and anisotropy. In this paper, combining the directional characteristics of Contourlet transform, we analyze the correlativity attribute and propose a novel image fusion algorithm for remote sensing images based on Contourlet coefficients' correlativity. Firstly, we separately perform Contourlet transform on the intensity component of multi-spectral remote sensing image obtained by IHS transform, and panchromatic remote sensing image. Secondly, we propose the fusion priciple of self-adaption calculating fuesd weighting coefficients. Finally, the target image is obtained by reverse Contourlet transform and reverse IHS transform. Compared with the traditional fusion methods, our algorithm can enhance the spatial resolution of target image. Meanwhile, it preserves the spectral information of multi-spectral image well.
2013, 50(8): 1787-1796.
Abstract:
To solve the problem that different viewing conditions results in different color appearance between a reproduced image and its original one during high dynamic range(HDR) image reproduction, a new algorithm based on color appearance mapping for HDR image reproduction is presented. Firstly, the chrominance and luminance of the HDR image are processed separately, through the estimation of the original environment parameters, the HDR image is reproduced under the display environment by a color appearance model, which preserves the chromatic appearance across scene and display environments. Then the luminance image is regionalized adaptively according to the histogram characteristics, and a piecewise linear tone scale function is constructed to allocate the range of display luminance for different regions dynamically, which raises perceptual contrast of the image. Meanwhile, the details of the luminance image is extracted through bilateral filtering technology in order to maintain details. Finally, the processed chromatic and achromatic image is synthesized and the change of color appearance affected by luminance compressing is corrected, and then the reproduced low dynamic range image is obtained, which is mapped in color appearance with the original HDR image. Experiments show that the proposed algorithm gains advantages over the traditional ones in color appearance maintaining, dynamic range compressing, and the performance of details.
To solve the problem that different viewing conditions results in different color appearance between a reproduced image and its original one during high dynamic range(HDR) image reproduction, a new algorithm based on color appearance mapping for HDR image reproduction is presented. Firstly, the chrominance and luminance of the HDR image are processed separately, through the estimation of the original environment parameters, the HDR image is reproduced under the display environment by a color appearance model, which preserves the chromatic appearance across scene and display environments. Then the luminance image is regionalized adaptively according to the histogram characteristics, and a piecewise linear tone scale function is constructed to allocate the range of display luminance for different regions dynamically, which raises perceptual contrast of the image. Meanwhile, the details of the luminance image is extracted through bilateral filtering technology in order to maintain details. Finally, the processed chromatic and achromatic image is synthesized and the change of color appearance affected by luminance compressing is corrected, and then the reproduced low dynamic range image is obtained, which is mapped in color appearance with the original HDR image. Experiments show that the proposed algorithm gains advantages over the traditional ones in color appearance maintaining, dynamic range compressing, and the performance of details.