ISSN 1000-1239 CN 11-1777/TP

Table of Contents

01 June 2015, Volume 52 Issue 6
Cache Optimization Approaches of Emerging Non-Volatile Memory Architecture: A Survey
He Yanxiang, Shen Fanfan, Zhang Jun, Jiang Nan, Li Qing’an, Li Jianhua
2015, 52(6):  1225-1241.  doi:10.7544/issn1000-1239.2015.20150104
With the development of semiconductor technology and CMOS scaling, on-chip caches in modern processors keep growing. The density of traditional static RAM (SRAM) is close to its limit, and SRAM consumes a large amount of leakage power, which severely affects system performance. Designing efficient on-chip storage architectures has therefore become increasingly challenging. To address these issues, researchers have studied many emerging non-volatile memory (NVM) technologies with attractive features such as non-volatility, low leakage power and high density. To explore cache optimization approaches based on emerging NVM, including spin-transfer torque RAM (STT-RAM), phase change memory (PCM), resistive RAM (RRAM) and domain-wall memory (DWM), this paper first surveys the properties of NVM compared with traditional memory devices. It then discusses the advantages, disadvantages and feasibility of building caches from NVM. To highlight their differences and similarities, the cache optimization approaches and policies are classified and summarized in detail. These techniques aim to mitigate the high write power, limited write endurance and long write latency of emerging NVM. Finally, the research prospects of emerging NVM in future storage architectures are discussed.
Directory Cache Design for Multi-Core Processor
Wang Endong, Tang Shibin, Chen Jicheng, Wang Hongwei, Ni Fan, Zhao Yaqian
2015, 52(6):  1242-1253.  doi:10.7544/issn1000-1239.2015.20150140
With the development of the Internet of things, cloud computing and Internet public opinion analysis, big data applications are becoming critical workloads in current data centers. Directory caches are used to guarantee cache coherence in chip multi-processors, which are massively deployed in data centers. Previous research proposed many innovations to improve the capacity utilization and scalability of directory caches, making them more suitable for high-performance computing. Big data workloads, however, are latency sensitive, a requirement that previous work does not satisfy. To meet this requirement, we propose the master-slave directory, a novel directory cache design that optimizes the path of memory instructions. In this design, the master directory captures private data accesses and serves them directly to reduce miss latency, while the slave directory maintains cache coherence for the shared memory space to improve cache capacity utilization and the scalability of the chip multi-processor. Our experiments use the CloudSuite-v1.0 benchmark running on the Simics+GEMS simulator. Compared with a sparse directory of 2× capacity, the master-slave directory reduces hardware overhead by 24.39%, reduces miss latency by 28.45% and improves IPC by 3.5%. Compared with an in-cache directory, the master-slave directory sacrifices 5.14% in miss latency and 1.1% in IPC, but reduces hardware overhead by 42.59%.
MACT: Discrete Memory Access Requests Batch Processing Mechanism for High-Throughput Many-Core Processor
Li Wenming, Ye Xiaochun, Wang Da, Zheng Fang, Li Hongliang, Lin Han, Fan Dongrui, Sun Ninghui
2015, 52(6):  1254-1265.  doi:10.7544/issn1000-1239.2015.20150154
The rapid development of new high-throughput applications, such as Web services, brings huge challenges to traditional processors, which target high-performance applications. High-throughput many-core processors have become a hot topic for such applications. However, the dramatic increase in the number of on-chip cores, combined with the memory-intensive nature of high-throughput applications, has intensified the “memory wall” problem. An analysis of the memory access behavior of high-throughput applications shows that a large proportion of memory accesses are fine-grained, which degrades bandwidth utilization and causes unnecessary energy consumption. Based on this observation, a memory access collection table (MACT) is implemented in a high-throughput many-core processor design to collect discrete memory access requests and handle them in batches under a deadline constraint. With the MACT hardware mechanism, both bandwidth utilization and execution efficiency are improved. QoS is also guaranteed by a time-window mechanism, which ensures that all requests are sent before their deadline. WordCount, TeraSort and Search, which are typical high-throughput application benchmarks, are used in the experiments. The experimental results show that MACT reduces the number of memory access requests by 49%, improves bandwidth efficiency by 24%, and improves the average execution speed by 89%.
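The batching idea in the abstract above can be illustrated with a small software model. This is only a sketch under stated assumptions, not the authors' hardware design: the 64-byte batch granularity, the 16-cycle time window, and the function name `batch_requests` are all hypothetical choices made for illustration.

```python
LINE_BYTES = 64   # assumed batch granularity (hypothetical)
WINDOW = 16       # assumed deadline window in cycles (hypothetical)


def batch_requests(requests):
    """Group fine-grained requests that fall in the same aligned line.

    requests: list of (cycle, address) tuples sorted by cycle.
    Returns a list of (issue_cycle, line_base, [addresses]) batches.
    A batch is issued WINDOW cycles after its first request arrived,
    so no request waits longer than the window.
    """
    pending = {}   # line_base -> (first_arrival_cycle, [addresses])
    issued = []

    def flush(now):
        # Issue every batch whose time window has expired by cycle `now`.
        expired = [b for b, (t0, _) in pending.items() if now - t0 >= WINDOW]
        for base in expired:
            t0, addrs = pending.pop(base)
            issued.append((t0 + WINDOW, base, addrs))

    for cycle, addr in requests:
        flush(cycle)
        base = addr - addr % LINE_BYTES
        if base not in pending:
            pending[base] = (cycle, [])
        pending[base][1].append(addr)

    # Drain whatever is still collected at the end of the trace.
    for base, (t0, addrs) in pending.items():
        issued.append((t0 + WINDOW, base, addrs))
    return issued


if __name__ == "__main__":
    trace = [(0, 0), (1, 8), (2, 64), (20, 16)]
    for batch in batch_requests(trace):
        print(batch)
```

In this toy trace, four discrete requests collapse into three memory transactions, and each batch leaves at most `WINDOW` cycles after its first request arrived.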
A Trace-Driven Simulation of Memory System in Multithread Applications
Zhu Pengfei, Lu Tianyue, Chen Mingyu
2015, 52(6):  1266-1277.  doi:10.7544/issn1000-1239.2015.20150160
Chip multiprocessors (CMPs) have become important for multithread applications because of their high throughput in big data computing. However, growing memory latency increasingly limits system performance because of the memory wall. Two independent simulation methods, trace-driven and execution-driven, are available to system researchers for studying and evaluating the memory system. On the one hand, researchers employ trace-driven simulation to gain simulation speed, because it removes data processing and is faster than its execution-driven counterpart. On the other hand, the lack of data processing induces both global and local trace misplacements, which never occur when multithread applications run on a real machine. Analytical modeling shows that trace misplacements cause remarkable variations in performance metrics. The reasons are that, in trace-driven simulation: 1) locks do not prevent multiple threads from entering a critical section at the same time; 2) barriers do not synchronize threads as needed; 3) the dependences among memory operations are violated. To improve the accuracy of memory system simulation for multithread applications, a methodology is designed to eliminate both global and local trace misplacement in trace-driven simulation. Experiments show that eliminating global trace misplacement of memory operations induces up to a 10.22% reduction in various IPC metrics, while eliminating local trace misplacement of memory operations induces at least a 50% reduction in the arithmetic mean of IPC metrics. The proposed methodology ensures that the behavior of a multithread application remains invariant in trace-driven simulation.
A Data Deduplication-Based Primary Storage System in Cloud-of-Clouds
Mao Bo, Ye Geyan, Lan Yanjia, Zhang Yangsong, Wu Suzhen
2015, 52(6):  1278-1287.  doi:10.7544/issn1000-1239.2015.20150139
With the rapid development of cloud storage technology, more and more companies are uploading data to cloud storage platforms. However, depending solely on one cloud storage provider raises a number of potentially serious problems, such as vendor lock-in, availability and security issues. To address these problems, we propose a deduplication-based primary storage system in cloud-of-clouds, which eliminates redundant data blocks in the cloud computing environment and distributes the data among multiple independent cloud storage providers. The data is stored across the providers by combining replication and erasure coding. Replication is easy to implement and deploy but has high storage overhead. Erasure coding has low storage overhead but incurs computational overhead for encoding and decoding. To exploit the advantages of both schemes and the reference characteristics of deduplicated data, highly referenced data blocks are stored with replication and the other data blocks are stored with erasure coding. Experiments on our lightweight prototype implementation show that the deduplication-based primary storage system in cloud-of-clouds significantly improves performance and cost efficiency over existing schemes.
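The placement rule described above (replicate highly referenced blocks, erasure-code the rest) can be sketched as follows. This is an illustrative sketch only: the use of SHA-256 fingerprints, the threshold of two references, and the name `plan_placement` are assumptions, not details taken from the paper.

```python
import hashlib
from collections import Counter


def plan_placement(blocks, hot_threshold=2):
    """Choose a redundancy scheme per unique block.

    blocks: iterable of bytes objects (the incoming data blocks).
    hot_threshold: assumed reference count at which a block counts
        as "highly referenced" (hypothetical value).
    Returns {fingerprint: "replicate" | "erasure-code"}.
    """
    # Deduplication: identical blocks share one fingerprint, and the
    # counter records how many times each unique block is referenced.
    refs = Counter(hashlib.sha256(b).hexdigest() for b in blocks)
    return {
        fp: "replicate" if count >= hot_threshold else "erasure-code"
        for fp, count in refs.items()
    }


if __name__ == "__main__":
    plan = plan_placement([b"x", b"y", b"x"])
    for fingerprint, scheme in plan.items():
        print(fingerprint[:12], scheme)
```

Only one copy of each unique block is planned for storage; the reference count decides which redundancy scheme protects it.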
A Heterogeneous Cloud Computing Architecture and Multi-Resource-Joint Fairness Allocation Strategy
Wang Jinhai, Huang Chuanhe, Wang Jing, He Kai, Shi Jiaoli, Chen Xi
2015, 52(6):  1288-1302.  doi:10.7544/issn1000-1239.2015.20150168
Resource allocation strategies are currently an important research topic in cloud computing. The most fundamental problem is how to fairly allocate a finite amount of resources to multiple users or applications in complex scenarios under a heterogeneous cloud computing architecture while maximizing resource utilization or efficiency. In classical resource allocation problems, tasks or users are often greedy, so fairness is particularly important when resources are finite. To meet different task requirements and achieve fairness across multiple resource types, we design a heterogeneous cloud computing architecture and present an algorithm for maximizing multi-resource fairness based on the dominant resource (MDRF). We further prove properties of our algorithm such as Pareto efficiency, and define dominant resource entropy (DRE) and dominant resource weight (DRW). DRE accurately describes how well the resource requirements of a user match the resource type of the server allocated to that user's tasks, which makes the system more adaptive and improves resource utilization. DRW, together with the adopted Max-Min strategy that guarantees fairness, determines the priority with which users obtain resources and makes resource allocation more orderly. Experimental results demonstrate that our strategy achieves higher resource utilization and a better match between resource requirements and resource provision. Furthermore, our algorithm lets users obtain more of their dominant resource and improves the quality of service.
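For readers unfamiliar with dominant-resource fairness, the following sketch shows the classic progressive-filling form of max-min fairness on dominant shares. It is not the MDRF algorithm of the paper (DRE and DRW are not modeled); the function name `drf_allocate`, the single resource pool, and the per-task demand vectors are assumptions made for illustration.

```python
def drf_allocate(capacity, demands):
    """Max-min fairness on dominant shares via progressive filling.

    capacity: tuple of total amounts per resource, e.g. (cpus, memory_gb).
    demands: {user: per-task demand tuple}; each demand must be positive
        in at least one resource, otherwise the loop would not terminate.
    Returns {user: number of tasks granted}.
    """
    used = [0.0] * len(capacity)
    tasks = {user: 0 for user in demands}

    def dominant_share(user):
        # Largest fraction of any single resource held by this user.
        return max(tasks[user] * d / c for d, c in zip(demands[user], capacity))

    active = set(demands)
    while active:
        # Serve the user with the smallest dominant share (ties by name).
        user = min(sorted(active), key=dominant_share)
        need = demands[user]
        if all(used[i] + need[i] <= capacity[i] for i in range(len(capacity))):
            for i in range(len(capacity)):
                used[i] += need[i]
            tasks[user] += 1
        else:
            active.remove(user)   # this user's next task no longer fits
    return tasks


if __name__ == "__main__":
    # 9 CPUs and 18 GB; A's tasks are memory-heavy, B's are CPU-heavy.
    print(drf_allocate((9, 18), {"A": (1, 4), "B": (3, 1)}))
```

In this example, A is granted three tasks and B two, so both users end with a dominant share of two thirds: A holds 12 of 18 GB and B holds 6 of 9 CPUs.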
EOFDM: A Search Method for Energy-Efficient Optimization in Many-Core Architecture
Zhu Yatao, Zhang Shuai, Wang Da, Ye Xiaochun, Zhang Yang, Hu Jiuchuan, Zhang Zhimin, Fan Dongrui, Li Hongliang
2015, 52(6):  1303-1315.  doi:10.7544/issn1000-1239.2015.20150153
“Area-power” assignment for optimizing energy consumption is one of the research issues in many-core processors. The distribution of area-power over the space of core number and frequency level can be obtained from an energy-performance model. A progressive search for the optimal “core number and frequency level” configuration can then be carried out in two dimensions. However, existing search methods for energy-efficient optimization converge slowly and incur a large search overhead in the space of core number and frequency level. Moreover, although searching for the optimal core number and frequency level in a space built from an analytical energy-performance model reduces the overhead of real execution, the accuracy of the optimal solution depends heavily on the prediction error of the model. Therefore, a search method based on FDM (EOFDM) is developed to reduce the dimensions of core number and frequency, and to use the real energy and performance of each feasible point to correct the model computation. The experimental results show that, compared with the hill-climbing heuristic (HCH), our method reduces the execution times, the performance overhead and the energy overhead by 39.5%, 46.8% and 48.3% on average, by 48.8%, 51.6% and 50.9% when the number of cores is doubled, and by 45.5%, 49.8% and 54.4% when the number of frequency levels is doubled. Our method therefore improves convergence, search cost and scalability.
Lightweight Error Recovery Techniques of Many-Core Processor in High Performance Computing
Zheng Fang, Shen Li, Li Hongliang, Xie Xianghui
2015, 52(6):  1316-1328.  doi:10.7544/issn1000-1239.2015.20150119
Owing to advances in semiconductor technology, many-core processors with a large number of cores have been widely used in high-performance computing. Compared with multi-core processors, many-core processors provide higher computing density and a higher ratio of computation to power consumption. However, many-core processors need more efficient fault tolerance mechanisms to address serious reliability problems and alleviate performance degradation, while keeping the cost in chip area and power low. In this paper, we present a prototype of the home-grown many-core processor DFMC (deeply fused and heterogeneous many-core). Based on the processor’s architecture and on how applications relate the cores to one another, independent and coordinated lightweight error recovery techniques are proposed. When errors are detected, the related cores can quickly roll back to a consistent recovery line using the coordinated error recovery technique, which is controlled by a centralized unit and connected by a coordinated recovery bus. To preserve application performance, the error recovery techniques are performed by instructions and the recovery states are saved in the cores. Our experimental results show that the techniques are effective: 80% of transient errors can be corrected at the cost of a 1.257% increase in chip area. The lightweight error recovery techniques have very little influence on application performance, chip frequency and chip power consumption, and they improve the fault tolerance of the many-core processor.
Paleyfly: A Scalable Topology in High Performance Interconnection Network
Lei Fei, Dong Dezun, Pang Zhengbin, Liao Xiangke, Yang Mingying
2015, 52(6):  1329-1340.  doi:10.7544/issn1000-1239.2015.20150162
The high performance interconnection network is one of the most important parts of a high performance computing system, and topology design is the key to building larger-scale networks. We therefore contribute a new hierarchical topology, Paleyfly (PF), which not only exploits the strongly regular graph property of the Paley graph but also supports continuous scaling like the Random Regular (RR) graph. Compared with other recent high performance interconnection networks, Paleyfly addresses the limited scalability of Dragonfly (DF), the physical cost of Fat tree (Ft), and the wiring complexity and routing table storage of Random Regular. Based on the strongly regular graph property, which benefits load-balanced routing, we also propose four routing algorithms to deal with congestion. Finally, through simulation we briefly analyze the performance of Paleyfly in comparison with other topologies and under different routing algorithms. Experimental results show that our topology performs better than Random Regular under various network scales and different traffic patterns.
Ant Cluster: A Novel High-Efficiency Multipurpose Computing Platform
Xie Xianghui, Qian Lei, Wu Dong, Yuan Hao, Li Xiang
2015, 52(6):  1341-1350.  doi:10.7544/issn1000-1239.2015.20150201
Driven by the demands of scientific computing and big data processing, high performance computers have become more powerful and their system scales larger than ever before. However, the power consumption of the whole system is becoming a severe bottleneck for further performance improvement. In this paper, after an in-depth analysis of four types of HPC systems, we propose and study two key technologies: reconfigurable micro server (RMS) technology, and a cluster construction technology that combines node autonomy with node cooperation. RMS technology provides a new way to balance the performance, power consumption and size of computing nodes. By combining node autonomy and node cooperation, a large number of small computing nodes can be aggregated into a scalable RMS cluster. Based on these technologies, we propose a new high-efficiency multipurpose computing platform architecture called Ant Cluster and construct a prototype system consisting of 2,048 low-power, ant-like small computing nodes. On this cluster, we implement two real applications. The test results show that, for real-time large-scale fingerprint matching, a single RMS node achieves a 34 times speed-up over a single Intel Xeon core while consuming only 5W. The whole prototype system supports processing hundreds of queries on a database of 10 million fingerprints in real time. For data sorting, our prototype system achieves 10 times more performance per watt than a GPU platform and obtains higher efficiency.
Mitigating Log Cost through Non-Volatile Memory and Checkpoint Optimization
Wan Hu, Xu Yuanchao, Yan Junfeng, Sun Fengyun, Zhang Weigong
2015, 52(6):  1351-1361.  doi:10.7544/issn1000-1239.2015.20150171
A sudden power failure or system crash can leave a file system inconsistent while permanent user data or metadata is being written to its home location on disk, an issue known as the crash-consistency problem. Most existing file systems use consistency techniques such as write-ahead logging (WAL) or copy-on-write (COW) to avoid this situation. The Ext4 file system ensures the consistency of persistent operations through transactions and a journaling mechanism. However, this requires writing file system metadata to disk twice. Metadata updates are small, numerous and highly repetitive, which degrades program performance and shortens the lifetime of flash-based SSDs. This paper proposes employing non-volatile memory (NVM) as an independent log partition that can be accessed directly through the load/store interface. Furthermore, we optimize disk writes by scanning the log in reverse during checkpointing, which reduces repeated metadata updates to the same data block. Preliminary experimental results show that performance improves by up to 50% on HDD and 23% on SSD for write-heavy workloads when NVM is used as the external journal partition device, and that the number of write operations is reduced significantly by the reverse scan checkpoint technique.
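The reverse scan idea above can be shown with a minimal sketch: when the journal holds several updates to the same block, scanning from newest to oldest lets the checkpoint write only the latest version of each block. The journal representation as `(block_no, data)` pairs and the name `reverse_scan_checkpoint` are assumptions for illustration, not the paper's implementation.

```python
def reverse_scan_checkpoint(journal):
    """Return the block writes needed to checkpoint a journal.

    journal: list of (block_no, data) records in commit order,
        oldest first (hypothetical in-memory representation).
    Scanning newest-first means the first record seen for a block is
    its final content, so older records for that block are skipped.
    """
    seen = set()
    writes = []
    for block_no, data in reversed(journal):
        if block_no in seen:
            continue              # superseded by a newer update
        seen.add(block_no)
        writes.append((block_no, data))
    return writes


if __name__ == "__main__":
    log = [(7, "a"), (3, "b"), (7, "c"), (7, "d"), (3, "e")]
    print(reverse_scan_checkpoint(log))
```

Here five journal records touching two blocks result in only two disk writes, one per block, each carrying the newest content.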
Elastic Mobile Cloud Computing: State of the Art and Security Analysis
Li Pengwei, Fu Jianming, Li Shuanbao, Lü Shaoqing, Sha Letian
2015, 52(6):  1362-1377.  doi:10.7544/issn1000-1239.2015.20140227
Elastic mobile cloud computing (EMCC) enables the seamless and transparent use of cloud resource to augment the capability of mobile devices by off-loading parts of mobile devices’ tasks to cloud according to the real-time user requirement. By summarizing the service providing models of EMCC, we divide existing EMCC models into two categories: computing migration-mobile cloud computing(CM-MCC), in which the mobile devices employ the cloud to perform the parts of their computing intensive tasks; and cloud agent-mobile cloud computing(CA-MCC), in which the cloud maintains one or more virtual mobile devices for each mobile device and these virtual mobile devices are synchronized with the mobile device to complete various tasks such as computing, storage, security and communication instead of the mobile device. Then the applicable scenarios, the implementation method, the key issues, and the future research of EMCC models in each category are studied. After that, we analyze the critical security threats of EMCC, including users’ error operations or malicious actions, malicious applications, communications security problems, and cloud computing security issues such as the vulnerabilities of virtual system, multi-tenant problems, malicious cloud service providers. The corresponding defenses of these threats are discussed. At last, we point out that security is a key issue for EMCC.
Survey on Homomorphic Encryption and Its Applications to Cloud Security
Li Shundong, Dou Jiawei, Wang Daoshun
2015, 52(6):  1378-1388.  doi:10.7544/issn1000-1239.2015.20131494
The cloud service model has great economic and technical advantages and wide application prospects. The popularization of cloud services is significant to both the informatization and the development of China. Cloud security is the most serious challenge to the spread and application of cloud services. Homomorphic encryption schemes, especially fully homomorphic ones, are the most important technology for solving the security problems that arise in cloud services, and a focus of the international cryptographic community. In this paper, we summarize the state of the art of homomorphic encryption research; introduce the applications of homomorphic encryption to protecting data confidentiality in cloud computing and to other fields; analyze the merits and faults of various algebraic somewhat homomorphic encryption schemes and of circuit-based fully homomorphic encryption schemes; point out some open problems and new directions in fully homomorphic encryption research; and briefly introduce the concept of secure plaintext computing, its advantages over ciphertext computing, and some problems that need further study.
Application of a Circular Secure Variant of LWE in the Homomorphic Encryption
Yang Xiaoyuan, Zhou Tanping, Zhang Wei, Wu Liqiang
2015, 52(6):  1389-1393.  doi:10.7544/issn1000-1239.2015.20131952
Homomorphic encryption is a powerful cryptographic tool that enables a variety of applications. Fully homomorphic encryption (FHE) permits arbitrary computations on encrypted data. The breakthrough work of Craig Gentry in 2009 showed that FHE schemes are possible and provided the first construction. During the following five years, numerous FHE schemes involving novel mathematical techniques, as well as a number of application schemes, have appeared. The construction and application of homomorphic encryption schemes are of great theoretical and practical significance, and homomorphic encryption has important applications in cloud computing. However, almost all homomorphic encryption schemes share two flaws: the multiplication depth must be set in advance, and they use large secret keys. We construct a circularly secure re-linearization process based on the “special b” variant of the learning with errors problem (bLWE), and then present an efficient homomorphic encryption scheme. Compared with Brakerski et al.’s scheme, our scheme reduces the L+1 secret keys to one and does not need to know the multiplication depth in advance. Finally, we prove the chosen-plaintext attack (CPA) security of the homomorphic scheme and the circular security of the re-linearization process in the standard model by reducing them to the learning with errors (LWE) assumption.
Pseudorandom Number Generators Based on One-Way Functions
Gao Shujing, Qu Yingjie, Song Tingqiang
2015, 52(6):  1394-1399.  doi:10.7544/issn1000-1239.2015.20131954
The pseudorandom number generator (PRNG) is an important cryptographic primitive that was first introduced and formalized as the BMY generator in 1982. A PRNG based on one-way functions is constructed by iterating a one-way function (OWF) on a random seed and generating pseudorandom sequences periodically. The seed length and the properties of the one-way function are two important factors for this kind of PRNG, as they determine its efficiency and security. The security of the latest PRNGs of this type relies on a length-preserving one-way function or a one-way permutation, both of which are hard to obtain. This paper revisits the current randomized iteration technique and improves the iteration process by expanding the outputs of the one-way function. The new technique, called expanded randomized iteration, removes the requirement that the one-way function be length preserving. On the basis of expanded randomized iteration, our construction uses a general compressing regular one-way function and a universal hash function as its main components. As in the BMY case, a hardcore bit of each iteration step is taken as the output of the pseudorandom sequence. Our scheme adopts a structure similar to current ones but relaxes the requirements on the one-way function, reduces the seed length and improves efficiency. Finally, the iteration is proved to be irreversible, and the output of the proposed pseudorandom generator is proved to be indistinguishable from a truly random sequence.
A Communication Aware DAG Workflow Cost Optimization Model and Algorithm
Guo He, Chen Zheng, Yu Yulong, Wang Yuxin, Chen Xin
2015, 52(6):  1400-1408.  doi:10.7544/issn1000-1239.2015.20140205
Communication overhead cannot be neglected in a cloud environment. A cost optimization model for DAG (directed acyclic graph) workflows that ignores the communication overhead among tasks is difficult to apply in an actual cloud environment. Therefore, this paper puts forward a cost optimization model for DAG workflows that accounts for communication overhead. In addition, building on the hierarchical algorithm, which distributes tasks into groups by level and schedules them level by level, the paper proposes a communication-aware cost optimization algorithm (CACO). CACO uses the forward consistent (FC) rules to determine the minimum completion time of the workflow. By using the bottom hierarchical strategy to divide the tasks into separate layers, CACO reduces the cost optimization problem from the whole workflow to its parts. Furthermore, to enlarge the space for cost optimization and improve the results, CACO adopts dynamic programming to collect the discrete “time pieces” produced while selecting services. The simulation results show that, compared with DTL (deadline top level), DBL (deadline bottom level) and TCDBL (temporal consistency deadline bottom level), CACO greatly improves cost optimization when communication overhead is considered.
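The layering step mentioned above can be illustrated with a short sketch that groups DAG tasks by bottom level. The definition used here (exit tasks are level 1, and every other task sits one level above its highest successor) is an assumption for illustration; the paper's exact layering rule, the FC rules and the dynamic programming step are not modeled, and the names `bottom_levels` and `layers` are hypothetical.

```python
from collections import defaultdict


def bottom_levels(successors):
    """Compute the bottom level of each task in a DAG.

    successors: {task: list of successor tasks}; every task must appear
        as a key, with an empty list for exit tasks.
    Assumed definition: an exit task has level 1; any other task has
    level 1 + the maximum level among its successors.
    """
    level = {}

    def visit(task):
        if task not in level:
            level[task] = 1 + max(
                (visit(s) for s in successors[task]), default=0
            )
        return level[task]

    for task in successors:
        visit(task)
    return level


def layers(successors):
    """Group tasks into layers keyed by bottom level."""
    grouped = defaultdict(list)
    for task, lvl in bottom_levels(successors).items():
        grouped[lvl].append(task)
    return {lvl: sorted(tasks) for lvl, tasks in grouped.items()}


if __name__ == "__main__":
    dag = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}
    print(layers(dag))
```

Tasks in the same layer have no dependences on one another, which is what allows a level-by-level scheduler to treat each layer as a separate subproblem.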
A Terrain Skeleton Feature Extraction Method Based on Morphological Encoding
Zhang Huijie, Liu Yaxin, Ma Zhiqiang, He Xinting, Bao Ning
2015, 52(6):  1409-1423.  doi:10.7544/issn1000-1239.2015.20131422
Since current profile recognition methods cannot extract precise skeletons or special terrain features, a new profile recognition method combined with morphology is proposed. In this method, candidate points are extracted by profile recognition and then connected into polygon stripes according to direction coefficients. Fill algorithms that build the scale feature areas from the polygon stripes are then put forward using a morphological strategy. In addition, multiple morphological codes, based on the concept of morphological erosion, are proposed to simplify the scale feature areas and obtain the scale feature lines. To satisfy the need for vector skeleton features in various fields, algorithms for restoration, detection and optimization are proposed to transform the scale model into the vector model. Finally, a series of strategies for keeping the out-branches and recognizing rings are presented, which solve the problem of missing long trunk lines and ring features in the resulting feature lines. These methods have been tested on benchmark data and real elevation data. Overall, the skeleton feature lines produced by our method are better than those of the traditional method.
Night Color Image Enhancement via Optimization of Purpose and Improved Histogram Equalization
Zhao Huaxia, Yu Jing, Xiao Chuangbai
2015, 52(6):  1424-1430.  doi:10.7544/issn1000-1239.2015.20140067
Because light is unevenly distributed at night, night color images are usually of poor quality, with low contrast, low brightness and little texture. Most existing night color image enhancement algorithms cannot both preserve details and eliminate the “halo effect” at high-contrast edges. To solve these problems, we propose an image enhancement algorithm based on purposeful optimization and improved histogram equalization. The algorithm operates on the luminance channel of the HSV color space in three steps: 1) enhance the contrast of the source image and preserve details as much as possible by increasing the image gradient values through optimization; 2) enhance the image with an improved histogram equalization that increases the probability of low-probability pixel values; 3) enhance the image brightness through gamma correction. Subjective and objective evaluations show that our algorithm greatly enhances image contrast and brightness, recovers image details, and efficiently eliminates the “halo effect”. Experiments on different nighttime images demonstrate the effectiveness of our approach for the challenging task of enhancing nighttime images.
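Step 3 of the pipeline above, gamma correction on the luminance channel in HSV space, can be sketched with the Python standard library alone. This shows only the brightness step: the gradient optimization and the improved histogram equalization are not modeled, and the exponent 0.5 and the name `brighten_pixel` are assumptions for illustration.

```python
import colorsys


def brighten_pixel(rgb, gamma=0.5):
    """Apply gamma correction to the V channel of one RGB pixel.

    rgb: (r, g, b) with each component in [0.0, 1.0].
    gamma: assumed exponent; values below 1 brighten dark pixels.
    Hue and saturation are left unchanged, so only the luminance
    channel of the HSV representation is modified.
    """
    h, s, v = colorsys.rgb_to_hsv(*rgb)
    v = v ** gamma
    return colorsys.hsv_to_rgb(h, s, v)


def brighten_image(pixels, gamma=0.5):
    """Apply brighten_pixel to a 2-D list of RGB tuples."""
    return [[brighten_pixel(p, gamma) for p in row] for row in pixels]


if __name__ == "__main__":
    dark_red = (0.25, 0.0, 0.0)
    print(brighten_pixel(dark_red))
```

Because the mapping v → v^gamma fixes 0 and 1 and raises everything in between when gamma is below 1, dark regions are lifted more strongly than bright ones.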
A Reconstruction Method for Spatial Data Using Parallel SNESIM
Zhang Ting, Du Yi, Huang Tao, Li Xue
2015, 52(6):  1431-1442.  doi:10.7544/issn1000-1239.2015.20140356
Spatial data are being applied ever more widely. Interpolation can effectively reconstruct unknown data in space; it is a process of data reproduction that produces higher-resolution data from the original data. Interpolation methods fall into two branches: definite interpolation and indefinite interpolation. The uncertainty of indefinite interpolation shows, on the one hand, in the choice of stochastic interpolation methods and, on the other hand, in the selection of interpolation parameters by probabilistic principles. Multiple-point simulation (MPS) is an important indefinite interpolation method for reconstructing spatial data, and single normal equation simulation (SNESIM), a frequently used MPS method, is currently used for three-dimensional reconstruction of categorical spatial data in many fields. However, the heavy CPU and memory burden of SNESIM has greatly limited its practical application. To overcome this limitation, SNESIM is parallelized using compute unified device architecture (CUDA). A suitable data template size is chosen using the entropy of the training image (TI), and the reconstruction quality is improved by integrating soft data and hard data. Compared with the CPU-based SNESIM method, the CUDA-based one reconstructs spatial data more efficiently.
Hybrid-Fixing: Toward Sound Fixing of Context Inconsistency
Chen Xiaokang, Xu Chang, Jiang Lei
2015, 52(6):  1443-1451.  doi:10.7544/issn1000-1239.2015.20131904
Abstract (742)   HTML (0)   PDF (1968KB) (726)
In pervasive computing, environmental contexts change frequently and rapidly, and context-aware applications adapt their behavior accordingly. However, context inconsistency arises for various reasons, including unpredictable and uncontrollable environmental noise and dynamics, and it can cause application anomalies or even failures. To address this problem, context inconsistency should be detected promptly and then fixed in an automated and sound way. In our previous work, we proposed two techniques, named complete-fixing and CoSound-fixing, to fix context inconsistency automatically for context-aware applications. However, both techniques have limitations: complete-fixing has high time complexity and does not work efficiently, and CoSound-fixing does not achieve a satisfactory fixing success rate. In this paper, we propose a new fixing technique, named hybrid-fixing, which combines static analysis of consistency constraints with dynamic generation of repair actions to ensure the soundness of its generated repair cases, even when complex dependencies exist inside consistency constraints. Experimental results show that, compared with CoSound-fixing, hybrid-fixing significantly increases the fixing success rate for detected context inconsistencies in the presence of such complex dependencies, while incurring only minor time cost and fixing context inconsistency in a fully automated way.
Query Optimization by Statistical Approach for Hive Data Warehouse
Wang Youwei, Wang Weiping, Meng Dan
2015, 52(6):  1452-1462.  doi:10.7544/issn1000-1239.2015.20140403
Abstract (970)   HTML (1)   PDF (4623KB) (836)
Map/Reduce is an efficient parallel programming model that is now widely used to analyze massive data. Hive is an open source data warehouse that uses Map/Reduce to implement its query processing engine. However, when skewed data are processed with Map/Reduce, the workload becomes unevenly distributed across the cluster, with consequences ranging from low runtime efficiency to task failures. To solve this problem, we propose an approach named the computation balanced model (CBM), which optimizes queries by using distribution statistics. The paper makes two main contributions. First, a runtime cost evaluation model is established for two widely used types of queries, the GroupBy and Join queries, under different situations. Second, a highly efficient statistics approach for massive data is designed and implemented, adapted to the data access mechanism of Hive. Experimental results show that CBM reduces the processing time of GroupBy queries by about 8%-45% and that of Join queries by about 12%-46%. The balance of the cluster load distribution improves by about 60%-80% for CPU and 60%-90% for I/O. We believe that the query plans generated with CBM significantly balance the load distribution during the execution of Map/Reduce tasks and greatly improve query efficiency.
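To see why key distribution statistics help with skew, the sketch below compares two ways of assigning GroupBy keys to reducers: a statistics-blind modulo partition and a greedy assignment that places the heaviest keys first. This is not CBM or Hive's partitioner; the function names, the integer keys and the row counts are made up for illustration.

```python
def loads_by_modulo(key_counts, n_reducers):
    """Statistics-blind partitioning: reducer index = key mod n_reducers."""
    loads = [0] * n_reducers
    for key, count in key_counts.items():
        loads[key % n_reducers] += count
    return loads


def loads_by_greedy(key_counts, n_reducers):
    """Statistics-aware partitioning: heaviest key goes to the lightest reducer."""
    loads = [0] * n_reducers
    for _key, count in sorted(key_counts.items(), key=lambda kv: -kv[1]):
        lightest = loads.index(min(loads))
        loads[lightest] += count
    return loads


def imbalance(loads):
    """Ratio of the busiest reducer's load to the mean load (1.0 = balanced)."""
    return max(loads) / (sum(loads) / len(loads))


if __name__ == "__main__":
    # Hypothetical rows per key; even-numbered keys are hot.
    key_counts = {0: 50, 1: 5, 2: 40, 3: 5, 4: 30, 5: 10}
    for name, fn in (("modulo", loads_by_modulo), ("greedy", loads_by_greedy)):
        loads = fn(key_counts, 2)
        print(name, loads, round(imbalance(loads), 2))
```

A greedy assignment of whole keys cannot help when a single key dominates the data, since one key still lands on one reducer; that case needs the key's rows to be split across reducers.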