ISSN 1000-1239 CN 11-1777/TP

    Real-Time Panoramic Video Stitching Based on GPU Acceleration Using Local ORB Feature Extraction
    Du Chengyao, Yuan Jingling, Chen Mincheng, Li Tao
    Journal of Computer Research and Development    2017, 54 (6): 1316-1325.   DOI: 10.7544/issn1000-1239.2017.20170095
    Panoramic video is video recorded from a single viewpoint that captures the full surrounding scene. With the development of VR and live-broadcasting technology, panoramic video capture devices are receiving widespread attention. However, producing panoramic video demands strong CPU and GPU processing power, and traditional panoramic products depend on bulky equipment or offline post-processing, which results in high power consumption, low stability, poor real-time performance, and information security risks. This paper proposes an L-ORB feature detection algorithm that optimizes the feature detection regions of the video frames and simplifies the ORB algorithm's support for scale and rotation invariance. The feature points are then matched with the multi-probe LSH algorithm, and progressive sample consensus (PROSAC) is used to eliminate false matches, yielding the mapping relation for image mosaicking; a multi-band fusion algorithm removes the seams between the videos. In addition, we use the NVIDIA Jetson TX1 heterogeneous embedded system, which integrates an ARM A57 CPU and a Maxwell GPU, exploiting its teraflops of floating-point computing power and its built-in video capture, storage, and wireless transmission modules to build a real-time multi-camera panoramic stitching system, and we use GPU block, thread, and stream parallelism to accelerate the stitching algorithm. Experimental results show that the proposed algorithm improves the feature extraction and matching stages of image stitching, running 11 times faster than the traditional ORB algorithm and 639 times faster than the traditional SIFT algorithm. The system achieves 59 times the performance of a previous embedded implementation while its power dissipation is reduced to 10 W.
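    As a rough illustration of the stitching pipeline this abstract describes, the sketch below assembles the same stages with stock OpenCV primitives; standard ORB, brute-force Hamming matching, and RANSAC stand in for the paper's L-ORB, multi-probe LSH, and PROSAC, so this is an approximation of the idea, not the authors' implementation.
```python
# Sketch of feature-based stitching for two overlapping frames using stock
# OpenCV; the paper's L-ORB, multi-probe LSH, PROSAC, and multi-band blending
# are replaced by standard ORB, BFMatcher, RANSAC, and a naive overlay.
import cv2
import numpy as np

def stitch_pair(left, right, max_features=500):
    orb = cv2.ORB_create(nfeatures=max_features)
    kp1, des1 = orb.detectAndCompute(left, None)
    kp2, des2 = orb.detectAndCompute(right, None)

    # Brute-force Hamming matching stands in for multi-probe LSH.
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = sorted(matcher.match(des1, des2), key=lambda m: m.distance)

    src = np.float32([kp1[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
    dst = np.float32([kp2[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)

    # RANSAC stands in for PROSAC when estimating the inter-frame mapping.
    H, _ = cv2.findHomography(dst, src, cv2.RANSAC, 5.0)

    h, w = left.shape[:2]
    pano = cv2.warpPerspective(right, H, (w * 2, h))
    pano[0:h, 0:w] = left   # naive overlay; the paper uses multi-band fusion here
    return pano
```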
    A Key-Value Database Optimization Method Based on Raw Flash Device
    Qin Xiongjun, Zhang Jiacheng, Lu Youyou, Shu Jiwu
    Journal of Computer Research and Development    2017, 54 (6): 1326-1336.   DOI: 10.7544/issn1000-1239.2017.20170092
    In recent years, NoSQL key-value databases have been widely used. However, current mainstream key-value databases are built either on disk or on top of a traditional file system and flash translation layer, which makes it difficult to exploit the characteristics of flash devices and limits their I/O concurrency; the garbage collection process under such an architecture is also complex. This paper designs and implements Flashkv, a key-value data management architecture based on raw flash devices. Flashkv uses neither a file system nor a flash translation layer; instead, space management and garbage collection are handled by a user-mode management unit. Flashkv makes full use of the concurrency inside the flash device, simplifies the garbage collection process, removes the redundant function modules that exist in both the traditional file system and the flash translation layer, and shortens the I/O path. This paper proposes an I/O scheduling technique based on the characteristics of flash memory, which reduces read and write latency and improves throughput, and a user-mode cache management technique, which reduces the write volume as well as the cost of frequent system calls. Test results show that Flashkv's performance is 1.9 to 2.2 times that of LevelDB while the write volume is reduced by 60% to 65%.
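    The core pattern behind a flash-friendly key-value store is a sequential, log-structured write path with an in-memory index. The toy sketch below illustrates only that pattern; it writes to an ordinary file and omits Flashkv's raw-device access, garbage collection, user-mode cache, and channel-level parallelism, and every name in it is hypothetical.
```python
# Minimal log-structured key-value store: values are appended sequentially
# (flash-friendly) and located through a DRAM-resident index.
import os

class TinyLogKV:
    def __init__(self, path):
        self.f = open(path, "a+b")
        self.index = {}                      # key -> (value offset, value length)

    def put(self, key: bytes, value: bytes):
        record = (len(key).to_bytes(4, "little") + key +
                  len(value).to_bytes(4, "little") + value)
        offset = self.f.seek(0, os.SEEK_END)
        self.f.write(record)                 # sequential append, no in-place update
        self.f.flush()
        self.index[key] = (offset + 4 + len(key) + 4, len(value))

    def get(self, key: bytes):
        if key not in self.index:
            return None
        offset, length = self.index[key]
        self.f.seek(offset)
        return self.f.read(length)

kv = TinyLogKV("/tmp/tinykv.log")
kv.put(b"user:1", b"alice")
print(kv.get(b"user:1"))                     # b'alice'
```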
    A Quantitative Analysis on the “Approximatability” of Machine Learning Algorithms
    Jiang Shuhao, Yan Guihai, Li Jiajun, Lu Wenyan, Li Xiaowei
    Journal of Computer Research and Development    2017, 54 (6): 1337-1347.   DOI: 10.7544/issn1000-1239.2017.20170086
    Machine learning algorithms, such as neural networks, have recently made great progress and are widely used in image recognition, data search, and financial analysis. Their energy consumption becomes critical as problems grow more complex and data dimensionality increases. Because of the inherent error resilience of machine learning algorithms, many researchers apply approximate computing techniques, which trade result accuracy for energy savings, to reduce the energy consumption of these algorithms. We observe that most work is dedicated to exploiting the error resilience of a particular algorithm while ignoring the differences in error resilience among algorithms. Understanding the difference in "approximatability" among algorithms is essential because, when approximate computing techniques are applied, approximatability helps a classification task choose the algorithm that yields the greatest energy savings. Therefore, we choose three common supervised learning algorithms, namely SVM, random forest (RF), and neural network (NN), and evaluate their approximatability with respect to different kinds of energy consumption. We also design several metrics, such as memory storage contamination sensitivity, memory access contamination sensitivity, and energy diversity, to quantify the differences in approximatability among learning algorithms. The conclusions of this evaluation can assist in choosing the appropriate learning algorithm when classification applications apply approximate computing techniques.
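    One simple way to probe how tolerant different learners are to approximation is to perturb their data and watch the accuracy degrade. The sketch below does that for the same three model families with scikit-learn; the injected feature noise is only a crude stand-in for the paper's memory-contamination sensitivity metrics, and the dataset and noise levels are arbitrary choices for illustration.
```python
# Rough error-resilience probe: train SVM, random forest, and a small neural
# network, then measure accuracy as Gaussian noise is injected into the inputs.
import numpy as np
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

X, y = load_digits(return_X_y=True)
Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=0)

models = {"SVM": SVC(), "RF": RandomForestClassifier(), "NN": MLPClassifier(max_iter=500)}
for name, model in models.items():
    model.fit(Xtr, ytr)
    for sigma in (0.0, 0.5, 1.0, 2.0):
        noisy = Xte + np.random.normal(0, sigma, Xte.shape)
        acc = accuracy_score(yte, model.predict(noisy))
        print(f"{name}  noise={sigma:<4}  accuracy={acc:.3f}")
```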
    A Comparison Among Different Numeric Representations in Deep Convolution Neural Networks
    Wang Peiqi, Gao Yuan, Liu Zhenyu, Wang Haixia, Wang Dongsheng
    Journal of Computer Research and Development    2017, 54 (6): 1348-1356.   DOI: 10.7544/issn1000-1239.2017.20170098
    Deep convolutional neural networks have been widely used in industry as well as academia because of their excellent performance. The trend toward deeper and more complex network structures leads to a demand for substantial computation and memory resources. Customized hardware is an appropriate and feasible option that maintains high performance at lower energy consumption, and it can also be deployed in situations where a CPU or GPU cannot be placed. During hardware design we must address questions such as which numeric representation and precision to choose. In this article, we focus on two typical numeric representations, fixed point and floating point, and propose corresponding error models. Using these models, we theoretically analyze the influence of each representation on the hardware overhead of neural networks. Notably, floating point has clear advantages over fixed point under ordinary circumstances: we verify through experiments that floating-point numbers limited to a certain precision are superior in both hardware area and power consumption. Moreover, exploiting the features of the floating-point representation, our customized hardware implementation of the convolution computation reduces power and area by 14.1× and 4.38×, respectively.
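    A small numerical experiment conveys the trade-off studied here: quantize the same weight tensor to 8-bit fixed point and to a reduced-mantissa float, then compare the error. The sketch below is a back-of-the-envelope illustration under assumed bit widths, not the paper's hardware error model.
```python
# Compare quantization error of 8-bit fixed point vs. a float with a 7-bit
# mantissa on a synthetic set of small CNN-like weights.
import numpy as np

w = np.random.randn(1024) * 0.1                  # synthetic weights near zero

# (a) fixed point: 1 sign bit, 0 integer bits, 7 fraction bits.
scale = 2 ** 7
w_fixed = np.clip(np.round(w * scale), -128, 127) / scale

# (b) floating point with the mantissa truncated to 7 explicit bits.
def truncate_mantissa(x, bits=7):
    m, e = np.frexp(x)                           # x = m * 2**e with 0.5 <= |m| < 1
    m = np.round(m * 2 ** bits) / 2 ** bits
    return np.ldexp(m, e)

w_float = truncate_mantissa(w)

print("fixed-point RMS error:", np.sqrt(np.mean((w - w_fixed) ** 2)))
print("low-float   RMS error:", np.sqrt(np.mean((w - w_float) ** 2)))
```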
    Increasing PCM Lifetime by Using Pipelined Pseudo-Random Encoding Algorithm
    Gao Peng, Wang Dongsheng, Wang Haixia
    Journal of Computer Research and Development    2017, 54 (6): 1357-1366.   DOI: 10.7544/issn1000-1239.2017.20170065
    Phase change memory (PCM) is a promising technology due to its low static power, non-volatility, and density potential. However, its low endurance remains the key problem to be solved before it can be widely used in practice. Generally, minimizing the number of modified bits per write operation by writing only the bits that differ is an effective way to extend PCM lifetime, but reaching the minimum without significantly slowing read/write operations remains challenging. To this end, we propose FEBRE, a fast and efficient bit-flipping reduction technique to extend PCM lifetime. The key idea is to apply a novel one-to-many parallel mapping before the differential-write stage: FEBRE employs a new data encoding method that generates multiple highly random encoded vectors from one data item to be written, which increases the chance that one of these vectors is close to the stored data. The other contribution is a pipelined pseudo-random encoding algorithm (PPREA), which reduces write overhead by accelerating the one-to-many mapping. Experiments show that, compared with PRES, our technique reduces bit flips by 5.31% on average and improves encoding and decoding speed by 2.29× and 45%, respectively.
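    The two ideas, differential write and one-to-many encoding, can be illustrated in a few lines: count the cells that would flip, and pick, among several candidate encodings of the new word, the one closest in Hamming distance to what is already stored. The XOR-mask scheme below is a deliberately simple stand-in for the paper's PPREA encoding, and the stored/new words are made-up values.
```python
# Differential write with a toy one-to-many encoding: each mask defines one
# candidate encoding (new_word XOR mask); the mask index is kept as a tag so
# the data can be decoded later.
def bit_flips(a: int, b: int, width: int = 64) -> int:
    return bin((a ^ b) & ((1 << width) - 1)).count("1")

def choose_encoding(new_word: int, stored_word: int,
                    masks=(0x0, 0x5555555555555555,
                           0xAAAAAAAAAAAAAAAA, 0xFFFFFFFFFFFFFFFF)):
    best = min(range(len(masks)),
               key=lambda i: bit_flips(new_word ^ masks[i], stored_word))
    return new_word ^ masks[best], best

stored = 0xFFFF0000FFFF0000
new    = 0x0000FFFF0000FFFF
encoded, tag = choose_encoding(new, stored)
print("flips without encoding:", bit_flips(new, stored))        # 64
print("flips with encoding   :", bit_flips(encoded, stored),    # 0
      "(mask index", tag, ")")
```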
    A Memristor-Based Processing-in-Memory Architecture for Deep Convolutional Neural Networks Approximate Computation
    Li Chuxi, Fan Xiaoya, Zhao Changhe, Zhang Shengbing, Wang Danghui, An Jianfeng, Zhang Meng
    Journal of Computer Research and Development    2017, 54 (6): 1367-1380.   DOI: 10.7544/issn1000-1239.2017.20170099
    Memristors are among the most promising candidates for building processing-in-memory (PIM) structures. Memristor-based PIM with digital or multi-level memristors has been proposed for neuromorphic computing, but the frequent AD/DA conversion and intermediate memory essential to these structures lead to significant energy and area overhead. To address this issue, this work proposes a memristor-based PIM architecture for deep convolutional neural networks (CNNs). It exploits an analog architecture to eliminate data conversion in the neuron layer banks, each of which consists of two special modules named weight sub-arrays (WSAs) and accumulate sub-arrays (ASAs). The partial sums of neuron inputs are generated in the WSAs concurrently and written continuously into the ASAs, where the final results are computed. The noise in the proposed analog architecture is analyzed quantitatively at both the model and circuit levels, and a synthetic solution is presented to suppress it: calibrating the non-linear distortion of the weights with a corrective function, pre-charging the write module to reduce parasitic effects, and eliminating noise with a modified noise-aware training scheme. The proposed design is evaluated on various neural network benchmarks; the results show that energy efficiency and performance can both be improved by about 90% on specific neural networks, without accuracy loss, compared with digital solutions.
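    In the ideal case, the analog computation a memristor crossbar performs is just a matrix-vector product: input voltages applied to the rows produce, on each column, a current equal to the dot product of the inputs with that column's conductances. The sketch below models only that ideal behavior, with made-up dimensions and value ranges, and ignores the noise sources this paper addresses.
```python
# Ideal crossbar model: column currents follow Kirchhoff's current law,
# i_out = v^T * G, where G holds the programmed conductances (weights).
import numpy as np

G = np.random.uniform(1e-6, 1e-4, size=(128, 16))  # conductances (S), one column per neuron
v = np.random.uniform(0.0, 0.3, size=128)           # input voltages (V) on the word lines

i_out = v @ G                                        # analog multiply-accumulate per column
print("column currents (A):", i_out[:4])
```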
    Design of RDD Persistence Method in Spark for SSDs
    Lu Kezhong, Zhu Jinbin, Li Zhengmin, Sui Xiufeng
    Journal of Computer Research and Development    2017, 54 (6): 1381-1390.   DOI: 10.7544/issn1000-1239.2017.20170108
    SSD (solid-state drive) and HDD (hard disk drive) hybrid storage systems are widely used in big-data computing datacenters. Workloads should be able to persist data of different characteristics to SSD or HDD on demand to improve the overall performance of the system. Spark is an efficient, widely used data computing framework, especially for applications with multiple iterations, because it can persist data in memory or on disk; persisting data to disk removes the limit that insufficient memory places on the size of the data set. However, the current Spark implementation does not provide an explicit SSD-oriented persistence interface. Although data can be distributed across different storage media in proportions set by configuration, the user cannot specify an RDD's persistence location according to the data's characteristics, which leads to a lack of targeting and flexibility. This has become a bottleneck for further improving Spark's performance and seriously limits the benefit obtained from the hybrid storage system. To the best of our knowledge, this paper is the first to present a data persistence strategy for SSDs in Spark. We explore the data persistence principle in Spark and optimize its architecture for hybrid storage systems, so that users can specify an RDD's storage medium explicitly and flexibly through the persistence API we provide. Experimental results based on SparkBench show that performance is improved by an average of 14.02%.
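    For reference, stock Spark already exposes persistence levels that choose between memory and disk, as the PySpark snippet below shows (the input path is a placeholder). The paper's contribution is an extended interface that additionally lets the user name SSD versus HDD, which the standard API shown here does not offer.
```python
# Standard PySpark persistence: choose a StorageLevel so that partitions
# spill to disk when memory is insufficient; repeated actions reuse the cache.
from pyspark import SparkContext, StorageLevel

sc = SparkContext(appName="persist-demo")
rdd = sc.textFile("hdfs:///data/input").map(lambda line: line.split(","))

rdd.persist(StorageLevel.MEMORY_AND_DISK)   # memory first, overflow to disk
print(rdd.count())                           # first action materializes and caches the RDD
print(rdd.count())                           # second action reads the persisted data
sc.stop()
```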
    Design and Implementation of Positive and Negative Discriminator of MSD Data for Ternary Optical Processor
    Zhang Honglie, Zhou Jian, Zhang Sulan, Liu Yanju, Wang Xianchao
    Journal of Computer Research and Development    2017, 54 (6): 1391-1404.   DOI: 10.7544/issn1000-1239.2017.20170093
    A discriminator that determines whether a value is positive, negative, or zero is a key component for comparing data in a computer. With the advent of the MSD (modified signed-digit) parallel adder, which uses three-state optical signals to represent numbers in the ternary optical processor, designing a positive/negative/zero discriminator for MSD data has become an important step toward completing the ternary optical processor. Based on the characteristics of MSD data and the correspondence between optical signals and MSD digits, this paper proposes a method to determine whether a multi-bit MSD number is positive, negative, or zero by directly analyzing the group of three-state optical signals that represents it. By applying this method to the result of an MSD subtraction, the relative size of two MSD numbers can be determined. Based on this theory, the paper presents a structure for an MSD data discriminator built from polarizers, liquid crystals, and half-mirrors; with an FPGA as the control circuit, a 3-bit MSD data discriminator is realized. Experiments verify the validity of the discriminator, confirming both the correctness of the underlying theory and the feasibility of the structural design.
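    In software terms, the discrimination rule is simple: an MSD number uses digits {-1, 0, 1} in base 2, so its sign is fixed by the most significant non-zero digit (the remaining lower digits can never outweigh it), and the value is zero only when every digit is zero. The sketch below is a reference model of that rule under this standard MSD definition, not the optical implementation described in the paper.
```python
# Reference model of positive/negative/zero discrimination for an MSD number,
# given its digits most-significant first, each digit in {-1, 0, 1}.
def msd_sign(digits):
    for d in digits:
        if d != 0:
            return 1 if d > 0 else -1   # leading non-zero digit decides the sign
    return 0                            # all digits zero: the value is zero

# Example: [1, -1, 0] encodes 4 - 2 + 0 = 2, so the sign is positive.
print(msd_sign([1, -1, 0]))   # 1
print(msd_sign([0, 0, 0]))    # 0
print(msd_sign([-1, 1, 1]))   # -1  (value -4 + 2 + 1 = -1)
```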