ISSN 1000-1239 CN 11-1777/TP


    Journal of Computer Research and Development    2018, 55 (2): 227-228.  
    NV-Shuffle: Shuffle Based on Non-Volatile Memory
    Pan Fengfeng, Xiong Jin
    Journal of Computer Research and Development    2018, 55 (2): 229-245.   DOI: 10.7544/issn1000-1239.2018.20170742
    In popular big data processing platforms such as Spark, data is commonly collected in a many-to-many fashion during a stage traditionally known as the Shuffle phase, through which data is exchanged across different types of tasks or stages. During this phase, the data must be transferred over the network and persisted to a traditional disk-based file system. Hence, the efficiency of the Shuffle phase is one of the key factors in big data processing performance. To reduce I/O overhead, we propose NV-Shuffle, an optimized Shuffle strategy based on non-volatile memory (NVM). Next-generation NVM technologies, such as Phase Change Memory (PCM) and Spin-Transfer Torque Magnetic Memories (STTMs), introduce new opportunities for reducing I/O overhead because of their non-volatility, high read/write performance, and low energy consumption. In memory-computing-based big data platforms such as Spark, disk-based Shuffle data access is an important factor in application performance. NV-Shuffle uses NVM as persistent memory to store Shuffle data and accesses that data directly, as it would ordinary memory, by introducing NV-Buffer to organize data in place of a traditional file system. We implemented NV-Shuffle in Spark. Our performance results show that NV-Shuffle reduces job execution time by 10%~40% for Shuffle-heavy workloads.
    Heterogeneous Memory Programming Framework Based on Spark for Big Data Processing
    Wang Chenxi, Lü Fang, Cui Huimin, Cao Ting, John Zigman, Zhuang Liangji, Feng Xiaobing
    Journal of Computer Research and Development    2018, 55 (2): 246-264.   DOI: 10.7544/issn1000-1239.2018.20170687
    Due to the boom in big data applications, the amount of data processed by servers is increasing rapidly. To improve processing and response speed, industry is deploying in-memory big data computing systems, such as Apache Spark. However, traditional DRAM cannot satisfy the large memory demands of these systems, for two reasons: first, the energy consumption of DRAM can be as high as 40% of the total; second, the scaling of DRAM manufacturing technology is reaching its limit. As a result, heterogeneous memory integrating DRAM and NVM (non-volatile memory) is a promising candidate for future memory systems. However, because NVM has longer latency and lower bandwidth than DRAM, data must be placed in the appropriate memory module to achieve ideal performance. This paper analyzes the memory access behavior of Spark applications and proposes a heterogeneous memory programming framework based on Spark. The framework can easily be applied to existing Spark applications without rewriting their code. Experiments show that, for Spark benchmarks, placing only 20%~25% of the data in DRAM and the remainder in NVM with our framework achieves 90% of the performance obtained when all data is placed in DRAM. This leads to an improved performance-dollar ratio compared with DRAM-only servers and potential support for larger-scale in-memory computing applications.
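    The idea of keeping only a fraction of the data in DRAM can be illustrated with a minimal sketch. It is not the framework described in the abstract: the function name `place_by_hotness`, the access counts, and the greedy "accesses per byte" ranking are all assumptions made for illustration only.

```python
from typing import Dict, Set, Tuple


def place_by_hotness(
    access_counts: Dict[str, int],
    sizes: Dict[str, int],
    dram_fraction: float,
) -> Tuple[Set[str], Set[str]]:
    """Greedy placement sketch (hypothetical, not the paper's policy).

    Objects are ranked by accesses per byte; the hottest ones are placed
    in DRAM until the DRAM budget (a fraction of total data size) is used,
    and everything else goes to NVM. All sizes must be positive.
    """
    total_bytes = sum(sizes.values())
    budget = dram_fraction * total_bytes

    # Hottest-per-byte objects first.
    ranked = sorted(
        sizes,
        key=lambda name: access_counts.get(name, 0) / sizes[name],
        reverse=True,
    )

    dram: Set[str] = set()
    nvm: Set[str] = set()
    used = 0
    for name in ranked:
        if used + sizes[name] <= budget:
            dram.add(name)
            used += sizes[name]
        else:
            nvm.add(name)
    return dram, nvm


if __name__ == "__main__":
    sizes = {"a": 10, "b": 10, "c": 40, "d": 40}
    counts = {"a": 1000, "b": 5, "c": 400, "d": 40}
    dram, nvm = place_by_hotness(counts, sizes, dram_fraction=0.25)
    print(sorted(dram), sorted(nvm))
```

    A greedy ranking is used here only because it is short; a real system would also need to account for access patterns that change over time and for the cost of migrating data between the two memory types.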
    Design and Verification of NVM Control Architecture Based on High-Performance SOC FPGA Array
    Liu Ke, Cai Xiaojun, Zhang Zhiyong, Zhao Mengying, Jia Zhiping
    Journal of Computer Research and Development    2018, 55 (2): 265-272.   DOI: 10.7544/issn1000-1239.2018.20170695
    Emerging non-volatile memory (NVM) technologies are maturing, with lower latency and higher bandwidth. In the future, these technologies have the potential not only to replace DRAM as main memory but also to serve as external storage. Meanwhile, designing efficient memory systems has become a popular topic in both academia and industry. In this paper, we describe a high-performance NVM verification architecture based on an array of SOC FPGAs. Within the architecture, multiple levels of FPGAs are employed to connect many NVMs. Based on the architecture, we propose a novel master-slave NVM controller and design a hardware prototype accordingly. Experimental results on this prototype show that the architecture can not only test the performance of homogeneous NVM groups but also verify the management scheme of hybrid NVM arrays. Moreover, the high performance of MRAM shows that it has the potential to serve as both cache and main memory.
    Large-Scale Graph Processing on Multi-GPU Platforms
    Zhang Heng, Zhang Libo, Wu Yanjun
    Journal of Computer Research and Development    2018, 55 (2): 273-288.   DOI: 10.7544/issn1000-1239.2018.20170697
    GPU-based nodes have emerged as a promising direction for efficient large-scale graph processing, relying on the high computational power and scalable caching mechanisms of GPUs. Out-of-core graphs are graphs that exceed main and GPU-resident memory capacity. To handle them, most existing GPU-based systems employ compact partitions of fixed-size ordered edge sets (i.e., shards) for data movement and computation. However, when scaling to platforms with multiple GPUs, these systems place high demands on interconnect (PCI-E) bandwidth, suffer from GPU underutilization, and exhibit scalability and performance bottlenecks. This paper presents GFlow, an efficient and scalable graph processing system for handling out-of-core graphs on multi-GPU nodes. In GFlow, we propose a novel 2-level streaming windows method, which stores a graph's attribute data consecutively in the shared memory of multiple GPUs and then streams the graph's topology data (shards) to the GPUs. With the 2-level streaming windows, GFlow streams shards dynamically from SSDs to GPU device memory via the PCI-E fabric and applies on-the-fly updates while processing graphs, thus reducing the amount of data movement required for computation. Detailed evaluations demonstrate that GFlow significantly outperforms most competing out-of-core systems for a wide variety of graphs and algorithms in multi-GPU environments: it yields average speedups of 256X and 203X over the CPU-based GraphChi and X-Stream, respectively, and a 1.3~2.5X speedup over the GPU-based GraphReduce (single-GPU). Meanwhile, GFlow exhibits excellent scalability as the number of GPUs in the node increases.
    Partitioning Acceleration Between CPU and DRAM: A Case Study on Accelerating Hash Joins in the Big Data Era
    Wu Linyang, Luo Rong, Guo Xueting, Guo Qi
    Journal of Computer Research and Development    2018, 55 (2): 289-304.   DOI: 10.7544/issn1000-1239.2018.20170842
    Hardware acceleration has been very effective in improving the energy efficiency of existing computer systems. As traditional hardware accelerator designs (e.g., GPUs, FPGAs, and customized accelerators) remain decoupled from main memory systems, reducing the energy cost of data movement remains a challenging problem, especially in the big data era. The emergence of near-data processing enables acceleration within 3D-stacked DRAM to greatly reduce the data movement cost. However, due to the stringent area, power, and thermal constraints on 3D-stacked DRAM, it is nearly impossible to integrate all the computation units required for sufficiently complex functionality into the DRAM. Therefore, the memory-side accelerator needs to be designed with this partitioning between CPU and accelerator in mind. In this paper, we describe our experience with partitioning the acceleration of hash joins, a key functionality for databases and big data systems, using a data-movement-driven approach on a hybrid system containing both memory-side customized accelerators and processor-side SIMD units. The memory-side accelerators are designed to accelerate execution phases that are bounded by data movement, while the processor-side SIMD units are employed to accelerate execution phases with negligible data movement cost. Experimental results show that the hybrid accelerated system improves energy efficiency by up to 47.52x and 19.81x compared with the Intel Haswell and Xeon Phi processors, respectively. Moreover, our data-movement-driven design approach can easily be extended to guide design decisions when accelerating other emerging applications.
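    For readers unfamiliar with the phases of a hash join, the following is a minimal software-only sketch of a partitioned hash join with separate partition, build, and probe phases. It is a generic textbook formulation, not the accelerator design from the paper; the function names and the default of four partitions are assumptions, and the abstract does not state which phase is mapped to which side of the hybrid system.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Row = Tuple[int, str]  # (join key, payload)


def partition(rows: Iterable[Row], num_partitions: int) -> List[List[Row]]:
    """Partition phase: scatter rows into buckets by hash of the join key."""
    parts: List[List[Row]] = [[] for _ in range(num_partitions)]
    for key, payload in rows:
        parts[hash(key) % num_partitions].append((key, payload))
    return parts


def hash_join(
    build_rows: Iterable[Row],
    probe_rows: Iterable[Row],
    num_partitions: int = 4,
) -> List[Tuple[int, str, str]]:
    """Join two relations on their integer key, one partition pair at a time."""
    results: List[Tuple[int, str, str]] = []
    build_parts = partition(build_rows, num_partitions)
    probe_parts = partition(probe_rows, num_partitions)

    for b_part, p_part in zip(build_parts, probe_parts):
        # Build phase: hash table over the build-side partition.
        table: Dict[int, List[str]] = defaultdict(list)
        for key, payload in b_part:
            table[key].append(payload)

        # Probe phase: look up each probe-side row in the hash table.
        for key, payload in p_part:
            for match in table.get(key, ()):
                results.append((key, match, payload))
    return results


if __name__ == "__main__":
    build = [(1, "a"), (2, "b"), (2, "c")]
    probe = [(2, "x"), (3, "y"), (1, "z")]
    print(sorted(hash_join(build, probe)))
```

    Because matching keys always hash to the same partition index, each partition pair can be joined independently, which is what makes the phases separable in the first place.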
    Persistent Transactional Memory for Databases
    Hillel Avni, Wang Peng
    Journal of Computer Research and Development    2018, 55 (2): 305-318.   DOI: 10.7544/issn1000-1239.2018.20170863
    Hardware transactional memory (HTM) and byte-addressable non-volatile memory (NVM) are already available in new computer equipment. It is tempting, but not trivial, to combine them to implement transactions with the ACID properties (atomicity, consistency, isolation, and durability), using HTM for consistency and isolation and NVM for durability. ACID transactions are especially useful in databases but, because of the size of database transactions, the challenge is to cope with the inherent HTM limitations on size and contention level. In this paper, we first present persistent HTM (PHTM), a software-hardware solution for ACID transactions with HTM. We continue with two methods to mitigate PHTM's limitations. The first is a persistent hybrid TM algorithm called PHyTM, which allows PHTM transactions to execute concurrently with pure-software, unbounded transactions. The second targets workloads where most transactions are too large for PHTM. For this purpose, we propose a new algorithm called split transactions execution (STE), which is tailored to relational database transactions. In a nutshell, this paper discusses the extension of HTM to ACID database transactions on NVM.
    X-DB: Software and Hardware Co-Designed Database System
    Zhang Tieying, Huang Gui, Zhang Yingqiang, Wang Jianying, Hu Wei, Zhao Diankui, He Dengcheng
    Journal of Computer Research and Development    2018, 55 (2): 319-326.   DOI: 10.7544/issn1000-1239.2018.20170868
    The field of database systems has gone through three stages of development. The first stage began when the relational model was proposed by E. F. Codd. The relational model established the foundation of database theory and database systems, and gave rise to many database market giants, such as IBM DB2, Microsoft SQL Server, and Oracle. The second stage was driven by the rapid development of the Internet, which produced NoSQL database systems. NoSQL focuses on system scalability but sacrifices transactional features. The third stage is the modern database era, characterized by new hardware features. Alibaba X-DB is one such database system. X-DB fully utilizes new hardware in different areas, including storage, network, multi-core, parallel, and heterogeneous computing. X-DB co-designs hardware and software and is compatible with the MySQL ecosystem, with the goal of renovating the relational database system.