ISSN 1000-1239 CN 11-1777/TP


    A High Energy Efficiency FFT Accelerator on DSP Chip
    Lei Yuanwu, Chen Xiaowen, Peng Yuanxi
    Journal of Computer Research and Development    2016, 53 (7): 1438-1446.   DOI: 10.7544/issn1000-1239.2016.20160123
    Fast Fourier transform (FFT) is one of the most time-consuming algorithms in digital signal processing (DSP), so its performance and energy efficiency significantly affect many DSP applications. This paper presents a high-energy-efficiency, variable-size FFT accelerator on a DSP chip, based on matrix transposition. Several parallel schemes exploit instruction-level and task-level parallelism for batches of small FFTs or for large Cooley-Tukey FFTs. A “Ping-Pong” structure of multi-bank data memory (MBDM) overlaps data movement with FFT computation. Building on MBDM, a fast matrix transposition algorithm based on basic-block transposition avoids column-wise matrix access and improves DDR bandwidth utilization. A hybrid twiddle-factor generation scheme, combining a lookup table with online CORDIC calculation, reduces the hardware needed for twiddle factors. Experimental results show that the FFT accelerator prototype achieves a power efficiency of 146 GFLOPS/W, an energy-efficiency improvement of about two orders of magnitude over multi-threaded FFTW on an Intel Xeon CPU.
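The basic-block transposition idea in the abstract above can be illustrated in software. The following is a minimal sketch, not the paper's hardware design: it assumes square tiles, and the function name `blocked_transpose` and the `block` parameter are hypothetical. Each tile is read along rows and written into its mirrored position, so no pass walks an entire column of the source matrix.

```python
def blocked_transpose(matrix, block=4):
    """Transpose a rectangular matrix tile by tile.

    Illustrative only: `block` stands in for the basic-block size,
    which the abstract does not specify.
    """
    rows = len(matrix)
    cols = len(matrix[0]) if rows else 0
    # The output has swapped dimensions: cols x rows.
    out = [[None] * rows for _ in range(cols)]
    # Visit the source one tile at a time.
    for r0 in range(0, rows, block):
        for c0 in range(0, cols, block):
            # Inside a tile, read row-wise and write the mirrored element.
            for r in range(r0, min(r0 + block, rows)):
                for c in range(c0, min(c0 + block, cols)):
                    out[c][r] = matrix[r][c]
    return out
```

In a memory hierarchy, the tile size would typically be chosen so that one source tile and one destination tile fit in fast memory together; that choice is an assumption here and is not taken from the paper.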
    XOS: A QoE Oriented Energy Efficient Heterogeneous Multi-Processor Schedule Mechanism
    Gong Xiaoli, Yu Haiyang, Sun Chengjun, Li Tao, Zhang Jin, Ma Jie
    Journal of Computer Research and Development    2016, 53 (7): 1467-1477.   DOI: 10.7544/issn1000-1239.2016.20160113
    Smart mobile devices play an increasingly important part in people’s daily lives. However, the pursuit of ever higher performance conflicts directly with limited battery capacity, and this conflict has begun to hinder the development of smart mobile devices. Heterogeneous multi-processor architectures offer a new solution, because they can balance user experience against energy consumption. Based on a compartmental QoE model, we present XOS, a scheduling mechanism for heterogeneous multi-processor devices that provides a highly energy-efficient solution. In XOS, the operating system recognizes user-interaction tasks from cross-layer information and allocates more computing resources to them to guarantee quality of experience, while limiting the resources of other tasks to reduce energy consumption. A simulation system is built to verify the effectiveness of the XOS model and to optimize it. XOS is then implemented and evaluated on an Odroid-XU3 board running the Android operating system. The results show that tasks scheduled by XOS incur a QoE loss of only 2.7%~7.3% while reducing energy consumption by 8%~48%.
    Green Hierarchical Management for Distributed Datacenter Containers
    Hou Xiaofeng, Song Pengtao, Tang Weichao, Li Chao, Liang Xiaoyao
    Journal of Computer Research and Development    2016, 53 (7): 1493-1502.   DOI: 10.7544/issn1000-1239.2016.20160119
    In recent years, modular datacenters (datacenter containers) have become promising IT infrastructure solutions due to their impressive efficiency and scalability. Pre-fabricated containers can be deployed not only in existing warehouse-scale datacenter facilities for capacity expansion, but also in urban or remote areas to support onsite Internet of Things (IoT) data processing. Combining conventional centralized datacenter servers with distributed containers offers cloud providers new opportunities to exploit local renewable energy resources and to reduce unnecessary data movement. This paper investigates a hierarchical management strategy for emerging geographically distributed datacenter containers. We logically group distributed datacenter containers into multiple classes with different data access patterns. At runtime, a central navigation system monitors the utilization of each container class and performs dynamic sleep scheduling to further improve overall energy efficiency. Experimental results on our scaled-down test-bed show that the proposed mechanism can save 12%~32% of energy cost while ensuring high performance.
    Energy-Efficient Fingerprint Matching Based on Reconfigurable Micro Server
    Qian Lei, Zhao Jinming, Peng Dajia, Li Xiang, Wu Dong, Xie Xianghui
    Journal of Computer Research and Development    2016, 53 (7): 1425-1437.   DOI: 10.7544/issn1000-1239.2016.20160076
    Large-scale fingerprint-based applications need a high-performance fingerprint matching back-end system. Based on reconfigurable micro server (RMS) technology, we propose a fingerprint matching approach in which software and hardware cooperate. Exploiting the reconfigurable hybrid-core computing architecture, the approach accelerates the compute-intensive part of the fingerprint matching algorithm with a highly customized hardware accelerator, and runs the parts that contain complex control flow and many discrete memory accesses on general-purpose processing cores. We implement a prototype of the algorithm and test its performance on an RMS computing node. The results show that a single RMS node achieves about 10,500 fingerprint matches per second while consuming only 5 watts. Compared with related work, the matching performance of a single RMS computing node is 15.5 times that of a single X86 cluster node; its energy efficiency is 583 times that of a single X86 cluster node and 5.4 times that of a Tesla C2075 based GPU server. Our RMS-based method is also more flexible and extensible than an FPGA platform, and is expected to become an effective technical solution for building large-scale fingerprint matching systems.
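The ratios quoted in the abstract above can be cross-checked with a few lines of arithmetic. This is a back-of-the-envelope sketch only: it assumes that "energy efficiency" means matches per joule (throughput divided by power), which the abstract does not define, and the function name `implied_x86_figures` is hypothetical. The X86 figures it returns are inferred from the stated ratios, not reported by the authors.

```python
def implied_x86_figures(rms_rate=10_500.0, rms_power_w=5.0,
                        speedup=15.5, efficiency_gain=583.0):
    """Derive implied X86-node figures from the ratios in the abstract.

    Assumption: energy efficiency = matches per second / watts.
    """
    rms_eff = rms_rate / rms_power_w        # matches per joule on the RMS node
    x86_rate = rms_rate / speedup           # implied X86 matches per second
    x86_eff = rms_eff / efficiency_gain     # implied X86 matches per joule
    x86_power_w = x86_rate / x86_eff        # implied X86 node power in watts
    return {
        "rms_matches_per_joule": rms_eff,
        "x86_matches_per_second": x86_rate,
        "x86_matches_per_joule": x86_eff,
        "x86_power_w": x86_power_w,
    }


figures = implied_x86_figures()
```

Under that assumption the RMS node delivers 2,100 matches per joule, and the stated ratios imply an X86 node drawing roughly 5 × 583 / 15.5 watts, which is a plausible order of magnitude for a cluster server.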
    Energy Optimization Heuristic for Deadline-Constrained Workflows in Heterogeneous Distributed Systems
    Jiang Junqiang, Lin Yaping, Xie Guoqi, Zhang Shiwen
    Journal of Computer Research and Development    2016, 53 (7): 1503-1516.   DOI: 10.7544/issn1000-1239.2016.20160137
    Most existing deadline-constrained energy optimization heuristics for workflows in DVFS-enabled heterogeneous distributed systems tend to become trapped in local optima. In this paper, we propose a new energy optimization heuristic called backward frog-leaping global energy conscious scheduling (BFECS). The algorithm makes full use of the surplus time between the lower bound of the workflow and the constrained deadline. Specifically, it starts from the constrained deadline and leapfrogs towards the lower bound of the workflow with varying leap intervals. Throughout the leapfrogging process, the leap interval is adjusted according to the locally optimal value until the endpoint is reached, and the scheduling sequence with the lowest run-time energy consumption is recorded. Energy consumption is reduced further by slack time reclamation: idle time slots caused by precedence constraints are absorbed by running tasks at a lower, suitable voltage/frequency using DVFS, without violating the precedence constraints of the workflow or missing the deadline. The experimental results show that the proposed algorithm can decrease energy consumption significantly.
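The slack reclamation step described in the abstract above can be sketched as follows. This is a simplified illustration, not the BFECS algorithm itself: it assumes that execution time scales inversely with clock frequency and that the processor offers a small set of discrete frequency levels. The function name `lowest_feasible_frequency` and its parameters are hypothetical.

```python
def lowest_feasible_frequency(exec_time_at_fmax, slack, levels):
    """Pick the lowest frequency that still fits a task into its slot.

    Assumption: execution time scales as fmax / f. `slack` is the idle
    time that follows the task and can be absorbed without delaying
    successors or the deadline.
    """
    fmax = max(levels)
    budget = exec_time_at_fmax + slack      # time available to the task
    for f in sorted(levels):                # try the slowest level first
        stretched = exec_time_at_fmax * fmax / f
        if stretched <= budget + 1e-12:     # tolerance for float rounding
            return f
    return fmax                             # always feasible at full speed


# A task needing 10 time units at full speed, followed by 5 units of slack.
chosen = lowest_feasible_frequency(10.0, 5.0, [0.5, 0.75, 1.0])
```

With these inputs the half-speed level would stretch the task to 20 units, beyond the 15-unit budget, so the sketch selects 0.75. A real scheduler would also have to account for voltage/frequency switching overhead, which is ignored here.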
    P-SMART: An Energy-Efficient NoC Router Based on SMART
    Li Bin, Dong Dezun, Wu Ji, Xia Jun
    Journal of Computer Research and Development    2016, 53 (7): 1447-1453.   DOI: 10.7544/issn1000-1239.2016.20160150
    As the number of on-chip cores in chip multiprocessors (CMPs) increases, the size of networks-on-chip (NoCs) and the network latency increase as well. NoCs consume a growing fraction of chip power as technology and voltage continue to scale down, and static power accounts for a larger fraction of total power. Processor designers currently strive to put under-utilized cores into deep sleep states to improve overall energy efficiency. However, even in state-of-the-art CMP designs, when a core goes to sleep the router attached to it remains active in order to continue forwarding packets. Such a router carries a low traffic load, because no packets travel to or from the sleeping core. SMART (single-cycle multi-hop asynchronous repeated traversal) is an on-chip network that aims to provide a single-cycle data path all the way from the source to the destination. In this paper, based on the SMART NoC, we propose reducing the virtual channels (VCs) of routers attached to sleeping cores, which lowers power consumption with little performance penalty. We evaluate our network using synthetic traffic. The evaluation results show that VC power gating increases network latency by less than 2% when the workload is low, and that power is reduced by about 13.4% compared with a network without bypass paths.
    PLUFS: An Overhead-Aware Online Energy-Efficient Scheduling Algorithm for Periodic Real-Time Tasks in Multiprocessor Systems
    Zhang Dongsong, Wang Jue, Zhao Zhifeng, Wu Fei
    Journal of Computer Research and Development    2016, 53 (7): 1454-1466.   DOI: 10.7544/issn1000-1239.2016.20160163
    Although some existing multiprocessor energy-efficient approaches for periodic real-time tasks achieve greater energy savings by taking the practical overhead of the processor into account, they cannot guarantee the optimal feasibility of periodic tasks. To address the non-negligible overhead of switching processor states in embedded real-time systems, this paper proposes an overhead-aware online energy-efficient real-time scheduling algorithm for multiprocessor systems: periodic tasks with largest utilization first based on switching overhead (PLUFS). PLUFS combines the fluid scheduling concept of the time local (TL) remaining execution plane with the switching overhead of processor states, and makes energy-efficient scheduling decisions for real-time tasks at the start of each TL plane and whenever a periodic task finishes execution within a TL plane. Consequently, PLUFS achieves a reasonable tradeoff between real-time constraints and energy saving while preserving the optimal feasibility of periodic tasks. Mathematical proof and extensive simulation results demonstrate that PLUFS guarantees the optimal feasibility of periodic tasks, saves more energy on average than existing algorithms, and improves on the energy saved by some existing algorithms by about 10% to 20%.
    Optimization Research on Thermal and Power Management for Real-Time Systems
    Li Tiantian, Yu Ge, Song Jie
    Journal of Computer Research and Development    2016, 53 (7): 1478-1492.   DOI: 10.7544/issn1000-1239.2016.20160134
    The power consumption of real-time systems has received much attention from both academia and industry because of the constraints imposed by energy, peak temperature and the deadlines of real-time tasks, and a large body of related research now exists. Traditional, temperature-unaware research usually adopts DVS to scale processor states for optimal power management. However, as the power density of processors rises with the continuous shrinking of chip size, the mutual effect between temperature and power has become non-negligible. As a consequence, many new temperature-aware optimization approaches have been derived from the traditional methods. This paper first gives an overview of the three models (task, thermal and power) on which this research is based. Second, it divides existing research into two categories, temperature-unaware traditional research and temperature-aware optimization research, and further divides the latter into single-task optimization and multi-task scheduling. Third, it compares the approaches in terms of mechanism, optimization goal and effect, scheduling time and other aspects, and analyzes their advantages and disadvantages. Finally, it points out future research directions.
    A Statistic-Based Method for Hard-Disk Power Consumption in Storage System
    Sun Jian, Li Zhanhuai, Zhang Xiao, Wang Huifeng, Zhao Xiaonan
    Journal of Computer Research and Development    2016, 53 (7): 1517-1531.   DOI: 10.7544/issn1000-1239.2016.20160133
    With the rapid growth of big data, the power consumption of storage systems has become a major issue in today’s datacenters. Reducing it is an urgent problem and a hot research topic in computer science. Because the hard disk drive is the primary storage medium in today’s storage systems, modeling hard-disk power consumption is attracting increasing attention. An accurate disk power model can not only solve the problem of power matching among datacenter devices, but also help evaluate energy-efficient solutions. We develop a statistic-based hard-disk power modeling method that estimates the power consumption of storage workloads. The model overcomes the weaknesses of traditional fine-grained models and is more accurate than coarse-grained models. In practical use, it does not need to record internal disk activity or to trace complex parameters. Our power estimates are highly accurate, with an error of 3%, and the model applies to both synchronous and asynchronous IO. Moreover, the model can also be applied to various online storage systems and datacenters.
    Journal of Computer Research and Development    2016, 53 (7): 1423-1424.  