ISSN 1000-1239 CN 11-1777/TP

Table of Contents

01 January 2021, Volume 58 Issue 1
Survey on Automatic Text Summarization
Li Jinpeng, Zhang Chuang, Chen Xiaojun, Hu Yue, Liao Pengcheng
2021, 58(1):  1-21.  doi:10.7544/issn1000-1239.2021.20190785
In recent years, the rapid development of Internet technology has greatly facilitated daily life, but it has also brought an explosion of information. How to obtain the required information from the Internet quickly and effectively is an urgent problem, and automatic text summarization can help alleviate it. As one of the most important fields in natural language processing and artificial intelligence, automatic text summarization uses a computer to produce a concise and coherent summary from a long text or a set of texts, where the summary should accurately reflect the central themes of the source. In this paper, we explain what automatic summarization means, review the development of automatic text summarization techniques, and introduce the two main approaches in detail: extractive and abstractive summarization, covering feature scoring, classification methods, linear programming, submodular functions, graph ranking, sequence labeling, heuristic algorithms, deep learning, and others. We also analyze the datasets and evaluation metrics commonly used in automatic summarization. Finally, we discuss the challenges ahead and predict future trends in research and application.
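As an illustration of the feature-scoring family of extractive methods named in this abstract, the following is a minimal sketch and not the survey's own method. It assumes a naive regular-expression sentence splitter and uses average term frequency as the only sentence feature; the function name `summarize` is hypothetical.

```python
import re
from collections import Counter


def summarize(text, num_sentences=1):
    """Return the highest-scoring sentences, kept in their original order."""
    # Naive sentence split after ., ! or ? (an assumption for this sketch).
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    # Term frequencies over the whole document act as the scoring feature.
    freq = Counter(re.findall(r"[a-z']+", text.lower()))

    def score(sentence):
        tokens = re.findall(r"[a-z']+", sentence.lower())
        # Average frequency, so long sentences are not favored automatically.
        return sum(freq[t] for t in tokens) / len(tokens) if tokens else 0.0

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    keep = sorted(ranked[:num_sentences])
    return [sentences[i] for i in keep]
```

Practical extractive systems combine several features (position, length, title overlap, and so on); a single frequency feature is used here only to keep the idea visible.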
Deep Neural Architecture Search: A Survey
Meng Ziyao, Gu Xue, Liang Yanchun, Xu Dong, Wu Chunguo
2021, 58(1):  22-33.  doi:10.7544/issn1000-1239.2021.20190851
Deep learning has achieved excellent results on tasks across multiple data modalities, such as images, speech, and text. However, manually designing networks for specific tasks is time-consuming and requires considerable expertise and design experience. As network architectures grow increasingly complex, relying on manual design alone becomes increasingly difficult. For this reason, automatically searching for neural network architectures with the help of algorithms has become a hot research topic. Neural architecture search involves three aspects: the search space, the search strategy, and the performance evaluation strategy. The search strategy samples a network architecture from the search space, the performance evaluation strategy evaluates it, and the result is fed back to the search strategy to guide it toward better architectures; the optimal architecture is obtained through continuous iteration. To organize the methods of neural architecture search, we summarize the common methods of recent years in terms of search space, search strategy, and performance evaluation strategy, and analyze their strengths and weaknesses.
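The sample, evaluate, and feed-back loop described in this abstract can be sketched as follows. This is only a toy under stated assumptions: the search space `SEARCH_SPACE`, the scoring function `evaluate` (a stand-in for training and validating a real network), and the mutation-based strategy are all hypothetical and are not taken from the survey.

```python
import random

# Hypothetical search space: one choice per architectural decision.
SEARCH_SPACE = {
    "num_layers": [2, 4, 8],
    "width": [32, 64, 128],
    "activation": ["relu", "tanh"],
}


def evaluate(arch):
    """Toy performance evaluation; a real system would train the network."""
    score = 0.5 if arch["activation"] == "relu" else 0.0
    score -= abs(arch["num_layers"] - 4)
    score -= abs(arch["width"] - 64) / 32
    return score


def mutate(arch, rng):
    """Search strategy step: change one decision of the current best."""
    child = dict(arch)
    key = rng.choice(sorted(SEARCH_SPACE))
    child[key] = rng.choice(SEARCH_SPACE[key])
    return child


def search(num_trials, seed=0):
    rng = random.Random(seed)
    best = {key: rng.choice(options) for key, options in SEARCH_SPACE.items()}
    best_score = evaluate(best)
    for _ in range(num_trials):
        candidate = mutate(best, rng)   # sample from the search space
        score = evaluate(candidate)     # performance evaluation
        if score > best_score:          # feedback guides the next sample
            best, best_score = candidate, score
    return best, best_score
```

Calling `search(500)` returns the best architecture found and its toy score. Real strategies (reinforcement learning, evolution, gradient-based relaxation) differ mainly in how the feedback is used to choose the next candidate.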
Label-Specific Features Learning for Feature-Specific Labels Association Mining
Cheng Yusheng, Zhang Lulu, Wang Yibin, Pei Gensheng
2021, 58(1):  34-47.  doi:10.7544/issn1000-1239.2021.20190674
In multi-label learning, a label may be determined only by its own set of unique features, which are called label-specific features. Using label-specific features in multi-label classification effectively prevents useless features from degrading the performance of the classification model. However, existing label-specific feature methods only extract important features from the label’s perspective and ignore extracting important labels from the feature’s perspective. In fact, it is easier to extract the unique features of labels by focusing on certain labels from the feature’s perspective. Based on this, a novel label-specific feature learning algorithm for multi-label classification is proposed, which combines the label’s important features with the feature’s important labels. First, to ensure the efficiency and accuracy of the model, the extreme learning machine is used to construct the joint learning model. Then, elastic net regularization is applied to the loss function of the extreme learning machine: mutual information is used to construct the correlation matrix of feature-specific labels as the L2 regularization term, and the label-specific features are extracted by the L1 regularization term. The learning model remedies the deficiencies of label-specific features and improves the adaptability of the extreme learning machine in multi-label learning. Comparisons with several state-of-the-art algorithms on several benchmark multi-label datasets show the rationality and effectiveness of the proposed model.
Video Anomaly Detection Based on Space-Time Fusion Graph Network Learning
Zhou Hang, Zhan Yongzhao, Mao Qirong
2021, 58(1):  48-59.  doi:10.7544/issn1000-1239.2021.20200264
The spatial-temporal features of abnormal events in videos are strongly correlated. To address the effect of these correlations on abnormal event detection performance, a video anomaly detection method based on space-time fusion graph network learning is proposed. In this method, a spatial similarity graph and a temporal trend graph are constructed for the video segments from the segment features. The spatial similarity graph is built dynamically by treating the features of the video segments as the vertices of the graph, and its edge weights are formed dynamically by taking into account the relationship between each vertex and its top-k most similar vertices. The temporal trend graph is built by taking into account the time distance among m sequential segments. The space-time fusion graph convolutional network is constructed by adaptively weighting the spatial similarity graph and the temporal trend graph, and is used to learn and generate the video embedding features. A graph sparsity regularization term is added to the ranking loss to reduce the over-smoothing effect of the graph model and improve detection performance. Experiments are conducted on two challenging video datasets: UCF-Crime (University of Central Florida crime dataset) and ShanghaiTech. ROC (receiver operating characteristic) curves and AUC (area under the curve) are used as performance metrics. Our method obtains an AUC of 80.76% on UCF-Crime, 5.35% higher than the baseline, and an AUC of 89.88% on ShanghaiTech, 5.44% higher than the SOTA (state-of-the-art) weakly supervised algorithm. The experimental results show that the proposed method effectively improves the performance of video abnormal event detection.
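The two graphs described in this abstract can be illustrated with a small sketch. The abstract does not give the exact edge-weight formulas, so everything concrete here is an assumption: cosine similarity for the spatial graph, an exponential decay `exp(-|i - j|)` within a window of `m` segments for the temporal graph, and a fixed mixing weight `alpha` in place of the adaptive weighting that the paper learns.

```python
import math


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def spatial_similarity_graph(features, k):
    """Connect each segment to its top-k most similar segments."""
    n = len(features)
    adj = [[0.0] * n for _ in range(n)]
    for i in range(n):
        neighbors = sorted(
            ((cosine(features[i], features[j]), j) for j in range(n) if j != i),
            reverse=True,
        )
        for sim, j in neighbors[:k]:
            adj[i][j] = sim
    return adj


def temporal_trend_graph(n, m):
    """Connect segments at most m steps apart; weight decays with distance."""
    return [
        [math.exp(-abs(i - j)) if 0 < abs(i - j) <= m else 0.0 for j in range(n)]
        for i in range(n)
    ]


def fuse(spatial, temporal, alpha):
    """Weighted sum of the two adjacency matrices (alpha is fixed here)."""
    return [
        [alpha * s + (1.0 - alpha) * t for s, t in zip(row_s, row_t)]
        for row_s, row_t in zip(spatial, temporal)
    ]
```

The fused adjacency matrix would then be the input to a graph convolutional layer; that step is omitted because it depends on the model details in the paper.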
Safe Tri-training Algorithm Based on Cross Entropy
Zhang Yong, Chen Rongrong, Zhang Jing
2021, 58(1):  60-69.  doi:10.7544/issn1000-1239.2021.20190838
Semi-supervised learning methods improve learning performance by using a small amount of labeled data together with a large amount of unlabeled data. The Tri-training algorithm is a classic divergence-based semi-supervised learning method that does not need redundant views of the dataset and places no specific requirements on the base classifiers. It has therefore become the most commonly used divergence-based semi-supervised learning technique. However, the Tri-training algorithm may introduce label noise during learning, which harms the final model. To reduce the prediction bias that this noise causes on the unlabeled data and to learn a better semi-supervised classification model, cross entropy is used in place of the error rate to better reflect the gap between the model’s predictions and the true distribution, and a convex optimization method is incorporated to reduce label noise and safeguard the model’s performance. On this basis, we propose three algorithms: a Tri-training algorithm based on cross entropy, a safe Tri-training algorithm, and a safe Tri-training algorithm based on cross entropy. The validity of the proposed methods is verified on benchmark datasets from the UCI (University of California Irvine) machine learning repository, and their performance is further verified statistically using a significance test. The experimental results show that the proposed semi-supervised learning methods outperform the traditional Tri-training algorithm in classification performance, and that the safe Tri-training algorithm based on cross entropy achieves higher classification performance and generalization ability.
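A small sketch can show why cross entropy may reflect the gap to the true distribution better than the error rate. It does not reproduce the paper's Tri-training procedure; it only compares the two measures on hypothetical predicted probabilities from two classifiers that make the same hard decisions.

```python
import math


def error_rate(probs, labels):
    """Fraction of samples whose most probable class is not the true label."""
    wrong = sum(
        1
        for p, y in zip(probs, labels)
        if max(range(len(p)), key=p.__getitem__) != y
    )
    return wrong / len(labels)


def cross_entropy(probs, labels, eps=1e-12):
    """Mean negative log-probability assigned to the true label."""
    return -sum(math.log(max(p[y], eps)) for p, y in zip(probs, labels)) / len(labels)


# Hypothetical outputs: both classifiers predict every label correctly,
# but the second one is far less confident.
labels = [0, 1]
confident = [[0.9, 0.1], [0.2, 0.8]]
hesitant = [[0.55, 0.45], [0.45, 0.55]]
```

Here `error_rate` is identical for both classifiers, while `cross_entropy` is lower for `confident`, so the second measure separates classifiers that the first cannot.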
Target Community Detection with User Interest Preferences and Influence
Liu Haijiao, Ma Huifang, Zhao Qiqi, Li Zhixin
2021, 58(1):  70-82.  doi:10.7544/issn1000-1239.2021.20190775
Target community detection aims to find cohesive communities that are consistent with a user’s preference. However, existing works either largely ignore the external influence of communities or are not “target-based”, i.e., they are not suitable for a target request. To solve these problems, target community detection with user interest preferences and influence (TCPI) is proposed to locate the most influential, high-quality community related to the user’s preference. First, node structure and attribute information are combined, the maximal k-cliques containing the sample nodes are explored as the cores of potential target communities, and an entropy-weighted attribute weight calculation method is designed to capture the attribute subspace weights of each potential target community. Second, the internal compactness and external separability of a community are defined as the community quality function, and high-quality potential target communities are expanded with each maximal k-clique as the core. Finally, the external impact score of a community is defined, all potential target communities are ranked according to the quality function and the external impact score, and the communities with higher comprehensive quality are selected as the target communities. In addition, a two-level pruning strategy is designed to improve the performance and efficiency of the algorithm after the attribute subspace weights of all maximal k-cliques have been calculated. Experimental results on synthetic networks and real-world network datasets verify the efficiency and effectiveness of the proposed method.
Fault Detection Context Based Equivalent Mutant Identification Algorithm
Yu Chang, Wang Yawen, Lin Huan, Gong Yunzhan
2021, 58(1):  83-97.  doi:10.7544/issn1000-1239.2021.20190817
Although mutation testing has been studied for almost forty years, the problem of equivalent mutants has prevented it from being widely applied in industrial practice. To overcome this problem, an algorithm that uses the fault detection context to predict the equivalence of mutants is proposed. It uses static analysis to extract feature information about the program context around the mutated code, which is called the fault detection context. The context information is then translated into a document model that describes the features of the mutant in natural language. A representation learning network is further used to encode the fault detection context features. Finally, a machine learning model is used to predict the equivalence of each mutant with respect to its fault detection context. An empirical study on 118,000 mutants from 22 C programs is performed to validate the proposed method. The results show that the method achieves 91% precision and 82% recall in classifying mutants as equivalent, and 77% precision and 78% recall in cross-project validation. This implies that the technique based on fault detection context can dramatically improve the efficiency and effectiveness of equivalent mutant detection, which in turn makes the mutation testing process more efficient.
Survey on Network of Distributed Deep Learning Training
Zhu Hongrui, Yuan Guojun, Yao Chengji, Tan Guangming, Wang Zhan, Hu Zhongzhe, Zhang Xiaoyang, An Xuejun
2021, 58(1):  98-115.  doi:10.7544/issn1000-1239.2021.20190881
In recent years, deep learning has achieved better results than traditional algorithms in many fields, such as image, speech, and natural language processing. Demand for training speed and data processing capability in deep learning keeps increasing. However, the computing power of a single server is limited and cannot meet this demand. Distributed deep learning training has become the most effective way to scale up the computing power available for training. At present, distributed deep learning faces a training bottleneck caused by network communication during training, which makes the communication network the most influential factor. There are currently many studies on network performance optimization for distributed deep learning. In this paper, the main performance bottlenecks and optimization schemes are first presented. Then the current state-of-the-art ultra-large-scale distributed training architectures and performance optimization methods are analyzed in detail. Finally, a comparative summary of the performance optimization schemes and the remaining difficulties in distributed deep learning training is given, and future research directions are pointed out.
Video Delivery over Named Data Networking: A Survey
Hu Xiaoyan, Tong Zhongq, Xu Ke, Zhang Guoqiang, Zheng Shaoqi, Zhao Lixia, Cheng Guang, Gong Jian
2021, 58(1):  116-136.  doi:10.7544/issn1000-1239.2021.20190697
The Internet has developed into a network dominated by content delivery services, such as live and on-demand video. Traditional IP networks have several problems in supporting video delivery, such as the complexity and high overhead of deploying multicast, the inability to utilize multipath transmission effectively, and poor support for mobility. Named data networking (NDN), a promising future Internet architecture, intrinsically supports in-network caching and multipath transmission. Consumers actively send interest messages to request data packets from producers, and this consumer-driven communication model enables NDN to naturally support consumer mobility. These features give NDN the potential to deliver videos efficiently. This paper first introduces the background of video delivery and NDN, and then elaborates on schemes that take advantage of NDN to deliver video: first, how strategies in NDN improve video bit rate; second, how they improve video playback stability; third, how they protect video copyright and privacy; and finally, how they transfer new types of video. Based on an analysis of these existing schemes and a comparison of their performance over IP and NDN, the challenges of delivering videos over NDN are pointed out.
Resource Management of Service Function Chain in NFV Enabled Network: A Survey
Zu Jiachen, Hu Guyu, Yan Jiajie, Li Shiji
2021, 58(1):  137-152.  doi:10.7544/issn1000-1239.2021.20190823
With the emergence of new network technologies such as cloud computing, software-defined networking (SDN), and network function virtualization (NFV), future network management is expected to become virtualized and intelligent. NFV realizes service functions based on virtualization technology and replaces the dedicated middleboxes of traditional networks with general-purpose servers, which can greatly reduce the capital expenditure (CAPEX) and operating expense (OPEX) of telecom service providers (TSPs). NFV can also improve flexibility and scalability in the management of network services. Since end-to-end network services are usually composed of different service functions, using virtualization technology to build service function chains (SFCs) and to allocate and schedule resources reasonably is an important research topic. In this paper, against the background of NFV technology, we introduce the infrastructure, technical basis, and application scenarios of SFC in NFV-enabled networks. We then focus on the different stages of SFC orchestration: SFC composition, SFC placement, SFC scheduling, and SFC adaptive scaling, and summarize the related theoretical research. Finally, in view of the existing problems, some solutions are proposed and future research directions are outlined.
FlexTSN: A Flexible TSN Switch Implementation Model
Yang Xiangrui, Yan Jinli, Chen Bo, Peng Jintao, Li Junshuai, Quan Wei, Sun Zhigang
2021, 58(1):  153-163.  doi:10.7544/issn1000-1239.2021.20190784
TSN (time-sensitive networking) has gained increasing attention from both industry and academia because it enables deterministic switching and best-effort switching in the same network. Compared with traditional Ethernet, TSN provides quite different mechanisms, ranging from time synchronization and gate control to time-aware scheduling. These mechanisms enable Ethernet to provide a packet forwarding service with deterministic delay. Currently, the IEEE 802.1 TSN Group is working on more than 17 standards and drafts about TSN, and academic researchers have also put much effort into proposing novel mechanisms, from frame preemption to flow scheduling. However, there are few, if any, general models that enable rapid prototyping of TSN systems, although such models are important for the rapid design and validation of key TSN technologies. In this paper, FlexTSN, a flexible TSN switch model with a loosely coupled modular design for TSN evaluation, is proposed. The TSN switch pipeline is decoupled into general processing modules and time-aware modules to support the rapid building of TSN switches. Moreover, FlexTSN provides a lightweight, highly reliable network management mechanism by extending the PTP synchronization protocol for fine-grained centralized network monitoring and configuration. Furthermore, a simplified CQF (cyclic queuing and forwarding) model is adapted on the basis of the FlexTSN prototype. The evaluation results show that FlexTSN provides clear abstractions for the redesign and rapid evaluation of novel TSN mechanisms.
Accelerating Byzantine Fault Tolerance with In-Network Computing
Yang Fan, Zhang Peng, Wang Zhan, Yuan Guojun, An Xuejun
2021, 58(1):  164-177.  doi:10.7544/issn1000-1239.2021.20190723
Byzantine fault tolerance algorithms are a class of fault-tolerant algorithms that can tolerate various software errors and system vulnerabilities, and they are vital to the reliability of cloud computing. Compared with other fault-tolerant algorithms, such as proof-of-work (PoW), Byzantine fault tolerance algorithms are much more stable; however, their poor performance cannot meet the demands of cloud computing, which requires high throughput and low latency. In-network computing is a data-centric architecture that uses the network to perform some computations, so that data can be processed as it moves, thereby improving system performance. To solve the performance problem of Byzantine fault-tolerant systems, we propose an optimization strategy for Byzantine fault tolerance algorithms based on in-network computing, which offloads some of the computational tasks to the network interface card (NIC). The processor and the NIC form a multi-stage pipeline that improves system throughput. Because in-network computing alone cannot meet the performance goals in all scenarios, we also use multi-threading to scale the system. We evaluate our method on a real testbed, and the experimental results show that, compared with the default Byzantine fault-tolerant system, it achieves a 46% improvement in overall throughput and a 65% decrease in latency. The results show that our solution is feasible and effective.
ECC Multi-Label Code Smell Detection Method Based on Ranking Loss
Wang Jina, Chen Junhua, Gao Jianhua
2021, 58(1):  178-188.  doi:10.7544/issn1000-1239.2021.20190836
A code smell is a software characteristic that indicates bad code or a design problem, and it seriously affects the reliability and maintainability of software systems. In a software system, a code element may be affected by multiple code smells at the same time, which significantly reduces software quality. Multi-label classification suits this case: by placing multiple code smells with high co-occurrence in one label group, the correlation between code smells can be better taken into account. However, existing multi-label code smell detection methods do not consider the influence of the order in which code smells are detected in the same code element. Therefore, an ECC multi-label code smell detection method based on ranking loss is proposed. The method aims to minimize ranking loss and chooses an optimal set of label sequences to optimize the code smell detection order and simulate the mechanism by which code smells arise. It selects random forest as the base classifier and runs multiple iterations of ECC to detect whether a code element simultaneously has long method and long parameter list, complex class and message chain, or message chain and blob. Finally, nine evaluation metrics are used, and experimental results show that the proposed method outperforms the existing multi-label code smell detection method, with an average F1 of 97.16%.
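The ranking loss that this method minimizes can be illustrated with the usual multi-label definition: for each instance, the fraction of (relevant, irrelevant) label pairs that the classifier orders incorrectly, averaged over instances. The sketch below is an assumption-laden illustration, not the paper's implementation: it counts ties as errors, skips instances that have no relevant or no irrelevant label, and the label order in the example data is hypothetical.

```python
def ranking_loss(y_true, scores):
    """Average fraction of (relevant, irrelevant) label pairs ranked wrongly."""
    total, counted = 0.0, 0
    for truth, score in zip(y_true, scores):
        relevant = [i for i, t in enumerate(truth) if t == 1]
        irrelevant = [i for i, t in enumerate(truth) if t == 0]
        if not relevant or not irrelevant:
            continue  # no pairs to compare for this instance
        # A pair is wrong when the relevant label does not score strictly higher.
        bad = sum(1 for r in relevant for n in irrelevant if score[r] <= score[n])
        total += bad / (len(relevant) * len(irrelevant))
        counted += 1
    return total / counted if counted else 0.0


# Hypothetical labels per code element: [long method, long parameter list, blob]
y_true = [[1, 0, 0], [1, 1, 0]]
scores = [[0.9, 0.2, 0.1], [0.8, 0.3, 0.5]]
```

For this data the first element has no wrongly ordered pair and the second has one of two, so `ranking_loss(y_true, scores)` evaluates to 0.25.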
Recommending Interface Patches for Forward Porting of Linux Device Drivers Based on Existing Instances
Li Bin, He Yeping, Ma Hengtai, Rui Jianwu
2021, 58(1):  189-207.  doi:10.7544/issn1000-1239.2021.20200284
Frequent upgrades of the Linux kernel have a broad impact on device drivers. To repair the inconsistency errors that arise when a driver calls a changed kernel interface, the code of old-version drivers must be modified continually for forward porting, which is a persistent and urgent problem. Existing research covers assisted understanding of driver evolution, assisted adaptation through intermediate libraries for driver porting, and assistant information for driver porting, and improves porting efficiency by retrieving assistant information at the statement level. However, existing methods focus only on retrieving the assistant information itself and do not identify which patch materials are effective, so patches still have to be analyzed and constructed manually. To overcome these limitations, we propose a new method that recommends high-quality patches for interface errors in the forward porting of drivers. We observe that different drivers that rely on the same kernel interface services make the same or similar kernel interface calls, and that the development history of other drivers may contain existing instances that reuse the same interfaces and underwent the same interface changes after a kernel upgrade. The method uses the commonality between the erroneous interface statements and similar existing instances in the development history to analyze the characteristics of the error, and extracts targeted interface modification modes and fine-grained content materials to generate the patches to be recommended. Specifically, the effective modification modes are determined by combining boundary point identification, similarity calculation, fine-grained difference comparison, and frequency calculation. A classification algorithm based on the different characteristics of existing instances is proposed for the first time; it distinguishes the different types of modification content, and content materials are then extracted from two data sources accordingly. Finally, edit script technology is used to generate the recommended patches from these materials. Experiments on 9 different types of real drivers show that the method can recommend patches for 7 types of interface errors in driver porting, and that effective patches account for about 67.4%. To some extent, the method supplements and extends existing assistant methods for driver porting.
A Qualitative Evaluation Approach for Requirement Change Technical Debt Based on Marginal Contribution
Zhang Yunjie, Zhang Xuan, Wang Xu, Ren Junmin, Tang Ziqi
2021, 58(1):  208-223.  doi:10.7544/issn1000-1239.2021.20190459
Software technical debt borrows the concept of “debt” from economics to describe the technical compromises made in software development for short-term benefit. In the long term, however, technical debt affects quality, cost, and development efficiency, so it needs to be managed systematically and effectively. Targeting the technical debt caused by changing requirements during the software life cycle, requirement change technical debt is first defined and quantified. Then, the idea of “marginal contribution” from economics is used to obtain the marginal contributions of the changing requirements, which serve as the basis for prioritizing requirement changes. The marginal contribution analysis method thus provides a reference for the implementation value of requirement changes. In the experiment and case study, taking Hadoop as an example, the feasibility of the marginal benefit for requirement changes is verified. Finally, a gradient boosting decision tree is used to study the history of requirement change reports in Spring Framework, and a method for analyzing the marginal contribution of requirement changes is proposed. The fields in the change reports are ranked by their importance to the marginal contribution. The results show that the analysis method can provide valuable results for requirement engineers to measure their workload and risks.
Aspect Extraction Model Based on Interactive Feature Representation
Zeng Biqing, Zeng Feng, Han Xuli, Shang Qi
2021, 58(1):  224-232.  doi:10.7544/issn1000-1239.2021.20190305
Aspect extraction is one of the key tasks in aspect-level sentiment analysis, and its results directly affect the accuracy of aspect-level sentiment classification. In the aspect extraction task, improving model performance with handcrafted features is both time-consuming and labor-intensive. To address problems such as insufficient data scale and insufficient feature information, an aspect extraction model based on interactive feature representation (AEMIFR) is proposed. Compared with other models, AEMIFR combines character-level embeddings and word embeddings to capture the semantic features of words, the morphological features of characters, and the internal relationships between characters and words. Furthermore, AEMIFR obtains the local feature representation and the context-dependent feature representation of the text, learns the interaction between the two representations, strengthens the features that the two representations share, reduces the negative impact of useless features on the model, and learns higher-quality feature representations. Finally, experiments are conducted on the datasets L-14, R-14, R-15, and R-16 from SemEval 2014, SemEval 2015, and SemEval 2016, and competitive results are achieved.