Publish Online: Articles in press have been peer-reviewed and accepted. They are not yet assigned to volumes/issues, but are citable by Digital Object Identifier (DOI).
Abstract:

Subgraph matching is a graph optimization problem that finds all subgraphs of a large target graph matching a given query graph. Although subgraph matching is NP-hard, the problem arises in many fields such as social networks, biochemistry, and cognitive science. Backtracking search algorithms for subgraph matching have high time complexity, so pruning strategies are essential for reducing running time. However, the complex expansion in existing pruning strategies leads to high time and space complexity. To balance efficiency and effectiveness, only limited neighborhood structure information can be used in conflict judgment, which lets many useless states pass the pruning check and wastes time. An efficient, accurate, and adaptive subgraph matching algorithm is proposed. The algorithm captures the detailed structure of the whole graph with a graph neural network, builds structural connections, and generates pruning probabilities for all candidate search states. It replaces the complex expansion-based pruning method with inference by the neural network model, rapidly estimating the probability of pruning during the search. A data sampling mechanism is designed to alleviate the problem of network training collapse. Experiments show that using the proposed pruning method in traditional backtracking search improves search efficiency.
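As an illustration of how a learned pruning estimate can be plugged into classic backtracking search, the following Python sketch uses a placeholder `prune_score` callable standing in for the paper's GNN model; the name, threshold, and interface are assumptions rather than the authors' code.

```python
# Sketch: backtracking subgraph matching with a learned pruning estimate.
# `prune_score` stands in for a GNN-based model (hypothetical here); it returns
# the estimated probability that a partial mapping cannot lead to a full match.

def backtrack(query, target, mapping, prune_score, threshold=0.9):
    """query/target: {vertex: set(neighbors)}; mapping: dict query vertex -> target vertex."""
    if len(mapping) == len(query):           # all query vertices mapped: report a match
        yield dict(mapping)
        return
    u = next(v for v in query if v not in mapping)      # next unmapped query vertex
    for cand in target:                                  # candidate target vertices
        if cand in mapping.values():
            continue
        # structural feasibility: mapped neighbors of u must map to neighbors of cand
        if any(w in mapping and mapping[w] not in target[cand] for w in query[u]):
            continue
        mapping[u] = cand
        # learned pruning: skip states the model deems unlikely to yield a match
        if prune_score(query, target, mapping) < threshold:
            yield from backtrack(query, target, mapping, prune_score, threshold)
        del mapping[u]
```

With `prune_score=lambda *args: 0.0` the sketch degenerates to plain backtracking, which makes the role of the learned estimate explicit.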

Abstract:

Collective spatial keyword queries play an important role in fields such as spatial databases, location-based services, intelligent recommendation, and crowd sensing. Existing collective spatial keyword query methods do not consider time-distance constraints and cost awareness, and cannot meet the query needs of most users under time-distance constraints; existing research results therefore have significant limitations. To make up for these shortcomings, a collective spatial keyword query based on time-distance constraints and cost awareness (TDCCA-CoSKQ) is proposed. To address the issue that existing indexes cannot include both keyword and time information, the TDCIR-tree index is proposed, which combines inverted files with time-attribute label files; TDCIR-tree reduces the cost of query computation. The TDCCA_PP algorithm, comprising TDCCAPruning1, TDCCAPermutation, and TDCCAPruning2, is proposed to address the subsequent screening of collections that meet the TDCCA-CoSKQ query criteria, improving the efficiency of keyword queries. The TDC cost function and its corresponding sorting algorithm are also proposed. The TDC cost function is composed of a distance cost and a time cost, with independent-variable coefficients α and β representing user preferences, which increases users' freedom of choice. It effectively solves the problem that existing cost functions do not fit collective spatial keyword queries under time-distance constraints and cost awareness. Theoretical analysis and experiments show that the proposed method achieves good efficiency and accuracy.
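The abstract specifies that the TDC cost combines a distance cost and a time cost weighted by user-preference coefficients α and β, but not its exact form. The sketch below assumes a simple weighted sum over a candidate object set; the names and the opening-hours model are illustrative only.

```python
from math import hypot

def tdc_cost(objects, query, alpha=0.5, beta=0.5):
    """Hypothetical TDC-style cost: alpha * distance cost + beta * time cost.

    objects: list of dicts with 'x', 'y', 'open', 'close' (hours, 0-24);
    query:   dict with 'x', 'y', 'time' (desired visit hour).
    """
    dist_cost = sum(hypot(o["x"] - query["x"], o["y"] - query["y"]) for o in objects)
    # time cost: how far the desired visit time falls outside each opening window
    time_cost = sum(max(o["open"] - query["time"], 0) + max(query["time"] - o["close"], 0)
                    for o in objects)
    return alpha * dist_cost + beta * time_cost

def best_set(candidate_sets, query, alpha, beta):
    """Rank candidate object sets (each covering the query keywords) by TDC cost."""
    return min(candidate_sets, key=lambda s: tdc_cost(s, query, alpha, beta))
```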

Abstract:

To facilitate researchers’ understanding of the application, acceptance, and funding processes for projects in the artificial intelligence discipline under the National Natural Science Foundation of China (NSFC), this paper provides a statistical analysis of the discipline’s projects in 2024. It first introduces the significant reform measures implemented by the NSFC in 2024. Subsequently, it summarizes and analyzes the application and funding status of projects for both the research and scholar series within the artificial intelligence discipline (F06) during the current year. Special attention is given to the changes in project applications and funding, shifts in the age distribution of applicants, and the distribution of host institutions, under the new reform measures. Finally, the paper provides an outlook on priority development directions in the field of artificial intelligence.

Abstract:

For the past few years, the storage industry has undergone tremendous changes. Semiconductor storage devices such as solid-state drives (SSDs) have flourished and can completely outperform traditional hard disk drives (HDDs), which address data by moving a magnetic head. Nowadays, the mainstream protocols supporting SSDs are NVMe and SAS. NVMe is a high-performance storage protocol designed specifically for SSDs that can maximize their performance, while the SAS protocol fully considers the requirements of data centers, providing high reliability and high scalability while balancing system performance and cost. Compared with increasingly fast storage media, the time overhead of a software stack designed for slow storage devices has become an increasingly significant part of each I/O operation. To address this issue, numerous excellent works have been proposed by academia and industry. For example, Intel's SPDK (storage performance development kit) has greatly shortened the response time of NVMe SSDs to applications by implementing device drivers in user space and polling for I/O completion, dramatically improving the performance of the entire system. However, previous research on optimizing the SAS SSD storage software stack is very limited. Therefore, a user-space optimization of the SAS software stack for SSDs is implemented. Experimental results show that this optimization effectively improves data access efficiency between applications and storage devices. In addition, to accurately evaluate the time cost of storage devices in the I/O stack, a hardware performance testing tool, HwPerfIO, is proposed, which eliminates the impact of most software overhead to measure storage device performance more accurately.

Abstract:

A large number of applications have proven the effectiveness of fuzzing in detecting program vulnerabilities. However, existing fuzzing methods do not analyze performance differences specific to individual testing tasks or adjust their testing policies accordingly; instead, they mostly adopt a unified process, resulting in unsatisfactory testing results. Since the policy should be adjusted based on specific information gathered during testing to achieve better performance, a new execution-context-guided fuzzing method for program defects is proposed, which can break through protection mechanisms. By capturing and analyzing specific contextual information while the tested program processes input test cases, and by rapidly exploring program structural features, the sample mutation policy can be optimized. A prototype tool, CBFuzzer, for execution-context-guided program defect detection is implemented. The experimental results indicate that CBFuzzer can effectively explore the internal structure of programs (including breaking through protection mechanisms), simulate unconventional program state transitions, and expose vulnerability points more efficiently. In comparison with existing fuzzers, CBFuzzer shows improvements ranging from 6.8% to 36.76% in vulnerability exposure, with the increase in the number of actual vulnerabilities detected reaching up to 66.67%. With a small amount of additional testing resources within an acceptable range, CBFuzzer not only improves detection performance for common types of vulnerabilities but also exhibits stronger detection capability for well-concealed vulnerabilities. As of August 10, 2023, a total of 126 new vulnerabilities had been identified with CBFuzzer across 13 testing tasks (reported to the related software developers and submitted to the CVE® organization).

Abstract:

Given the frequent occurrence of cybersecurity incidents, anomaly detection methods have been widely employed to identify malicious behaviors. However, anomalous accesses often exhibit prominent characteristics only in certain attribute fields, rendering the detection results susceptible to interference from attributes where anomalies are less prominent. To address this issue, MNDetector, an anomalous access detection framework that introduces the multiplex network structure into this field, is proposed. Through association analysis, closely associated attribute fields are constructed into single-layer networks, with cross-layer connections added to form a multiplex network. Subsequently, cross-layer walks are performed to obtain node sequences within the same layer and across layers, facilitating node embedding. Finally, a hierarchical generative adversarial network is employed to merge reconstruction losses and discriminative results across different layers, thereby achieving anomalous access detection. Experimental results demonstrate that MNDetector surpasses state-of-the-art detection methods on multiple public datasets, achieving an approximately 8% increase in F1 score compared with commonly used methods. In-depth case studies elucidate the variation in detection outcomes across diverse scenarios by analyzing the distribution of anomalous attributes within fields. Furthermore, an examination from a network-structure perspective clarifies the disparities among the results obtained from different layers, substantiating MNDetector's efficacy in addressing the interference caused by attribute fields with insignificant anomalous characteristics.
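A minimal sketch of the cross-layer walk idea, assuming a dictionary-based multiplex network and a fixed jump probability; both are assumptions, and the paper's walk strategy may differ.

```python
import random

def cross_layer_walk(layers, cross_links, start, walk_len=10, p_jump=0.3):
    """layers: {layer: {node: [neighbors]}};
    cross_links: {(layer, node): [(other_layer, other_node), ...]}.
    Returns a sequence of (layer, node) pairs usable for node embedding."""
    layer, node = start
    seq = [(layer, node)]
    for _ in range(walk_len - 1):
        jumps = cross_links.get((layer, node), [])
        if jumps and random.random() < p_jump:       # jump to a coupled node in another layer
            layer, node = random.choice(jumps)
        else:                                         # stay within the current layer
            neigh = layers[layer].get(node, [])
            if not neigh:
                break
            node = random.choice(neigh)
        seq.append((layer, node))
    return seq
```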

Abstract:

In recent years, large-scale autoregressive Chinese pre-trained language models (PLMs) have demonstrated outstanding performance on various natural language processing (NLP) tasks. However, these models are computationally expensive, and their word-based vocabulary poses significant challenges for practical applications. In addition, most of them use only unidirectional context information, which may result in performance degradation on many tasks, especially tasks requiring a nuanced understanding of context. To address these challenges, we introduce LingLong, a high-quality small-scale Chinese pre-trained language model. LingLong stands out due to its modest scale, comprising only 317 million parameters, making it highly deployable and resource-efficient. We tokenize the training corpus with a character-based vocabulary to mitigate the negative impacts of unknown tokens and word segmentation errors. Moreover, we go beyond the conventional unidirectional context by introducing a novel backward model. This model is trained by reversing the input order of the training data. Combining LingLong and its backward version allows for the use of bidirectional information on downstream tasks. Extensive experimental results validate the effectiveness of LingLong across a diverse set of NLP tasks. LingLong outperforms similar-sized Chinese PLMs on six downstream tasks and surpasses popular large-scale Chinese PLMs on four downstream tasks. These findings underscore the versatility and efficiency of LingLong, opening up possibilities for practical applications and advancements in the Chinese NLP field.
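A toy sketch of how a forward model and a reversed-input backward model can be combined on a downstream task such as candidate reranking; the scorer interface is hypothetical, not LingLong's actual API.

```python
def bidirectional_score(tokens, forward_lm, backward_lm):
    """tokens: list of token ids; forward_lm / backward_lm: callables returning the
    total token log-probability of a sequence (hypothetical interface).  The backward
    model was trained on reversed text, so it is fed the reversed sequence."""
    return forward_lm(tokens) + backward_lm(list(reversed(tokens)))

def rerank(candidates, forward_lm, backward_lm):
    """Pick the candidate that both directions agree is most fluent."""
    return max(candidates, key=lambda c: bidirectional_score(c, forward_lm, backward_lm))
```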

Abstract:

With the rapid development of deep learning, signal modulation recognition based on deep neural networks has gained popularity in wireless communications research. However, deep neural network models are vulnerable to adversarial perturbations, which can render the modulation recognition task ineffective, and there are still theoretical gaps and bottlenecks in wireless communication security research. Because wireless communication is multidimensional, involving factors such as experimental environments, data structures, and signal characteristics, attack and defense methods established in other domains cannot simply be transferred to signal countermeasures. In this paper, we comprehensively summarize the research on adversarial attack and defense technology in the field of signal modulation recognition. As the first Chinese review of its kind, we propose a generic classification framework and threat model for adversarial attacks in this field, and classify the research into two categories: physical self-defense attacks and digital direct-access attacks. We then systematically integrate and visualize the research as two-dimensional diagrams to showcase the methods, models, and techniques of adversarial attack, and provide further details on these methods and models. We present the latest research on adversarial attack methods, adversarial example generation techniques, theoretical formulas, and adversarial detection and defense techniques. We systematically distill the characteristics of the three dimensions of adversarial attacks on wireless communications and summarize the corresponding processing methods. Finally, we summarize future research and development directions for attack and defense security oriented towards signal modulation recognition.

Abstract:

To improve the hiding capacity of information hiding algorithms while ensuring the quality of the generated text, we propose a generative information hiding method based on the couplet carrier. Firstly, we pre-train on couplet text data and build a couplet generation model based on a multi-flow pre-training and fine-tuning framework. Secondly, we use subject words as the input to generate the first line of a couplet; the model can generate multiple first lines for the same subject words. Then, we use the first line as the input to generate the second line. The method mitigates the semantic ambiguity of current couplet generation models by utilizing span-by-span learning, a padding generation mechanism, and a noise perception mechanism to ensure that the generated lines correspond to each other in their metrical patterns. The secret information is hidden through the choices of subject words, candidate first lines, and candidate words for generating the second line. The experimental results show that the method achieves high hiding capacity: the average hiding capacity for 7-word couplets reaches 10.24B, and the generated couplets satisfy the strict formal and content requirements of couplets, such as equal numbers of words, comparable lexicality, parallel structure, and harmonious ping-ze (tonal patterns). The overall performance of the proposed method is better than that of current mainstream generative text information hiding schemes.
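The embedding mechanism described above, hiding bits through choices among candidate subject words, first lines, and next words, can be sketched generically as follows; candidate generation itself is abstracted away and all names are illustrative.

```python
def embed_bits(bitstring, candidate_lists):
    """Hide bits by choosing one candidate from each list.
    A list of n candidates encodes floor(log2(n)) bits."""
    chosen, pos = [], 0
    for cands in candidate_lists:
        k = max(len(cands).bit_length() - 1, 0)          # bits carried by this choice
        idx = int(bitstring[pos:pos + k] or "0", 2) if k else 0
        chosen.append(cands[idx])
        pos += k
    return chosen, bitstring[pos:]                       # leftover bits for the next couplet

def extract_bits(chosen, candidate_lists):
    """Receiver regenerates the same candidate lists and recovers the bits."""
    bits = ""
    for pick, cands in zip(chosen, candidate_lists):
        k = max(len(cands).bit_length() - 1, 0)
        if k:
            bits += format(cands.index(pick), f"0{k}b")
    return bits
```

A list of n candidates carries ⌊log2 n⌋ bits, so the achievable capacity grows with the number of candidates the generation model can offer at each choice point.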

Abstract:

In recent years, large language models (LLMs) represented by ChatGPT have developed rapidly. As the scale of model parameters continues to grow, building and deploying LLMs places higher requirements on data scale and storage access efficiency, posing significant challenges to traditional storage systems. This study first analyzes the storage access characteristics across the three critical stages of LLM workflows: data preparation, model training, and inference. It also explores in depth the major issues and bottlenecks faced by traditional storage systems in LLM scenarios. To address these challenges, the study proposes and implements ScaleFS, a high-performance and scalable distributed metadata design. ScaleFS decouples directory-tree metadata from attribute metadata and combines this with a hierarchical partitioning strategy that balances depth and breadth in the directory tree. This design enables efficient path resolution, load balancing, and system scalability, making ScaleFS capable of effectively managing hundreds of billions of files. Additionally, ScaleFS introduces fine-grained metadata structures, optimizes metadata access patterns, and develops a metadata key-value store tailored to file semantics. These innovations significantly improve metadata access efficiency while reducing disk I/O operations. The experimental results demonstrate that ScaleFS achieves operations per second (OPS) rates 1.04 to 7.12 times higher than HDFS, with latency reduced to only 12.67% to 99.55% of HDFS. Furthermore, at a scale of hundreds of billions of files, ScaleFS outperforms HDFS in most operations even when HDFS operates at a billion-file scale, demonstrating superior scalability and access efficiency. ScaleFS is thus well suited to the demands of LLM scenarios for managing and efficiently accessing massive numbers of files.
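To illustrate the decoupling idea only (not ScaleFS's actual on-disk structures or partitioning strategy), the sketch below keeps directory-tree metadata and attribute metadata in two separate key-value maps, so path resolution never touches attribute records.

```python
# Illustrative decoupling of directory-tree metadata from attribute metadata.
# Path resolution only touches the (parent_id, name) -> inode_id map; attributes
# live in a separate inode_id -> dict map, as in a key-value store.

class DecoupledMetadata:
    def __init__(self):
        self.next_id = 1                    # inode id 0 is the root directory
        self.dentries = {}                  # (parent_id, name) -> inode_id
        self.attrs = {0: {"type": "dir"}}   # inode_id -> attribute record

    def create(self, path, **attributes):
        parent = 0
        parts = path.strip("/").split("/")
        for name in parts[:-1]:             # walk / create intermediate directories
            key = (parent, name)
            if key not in self.dentries:
                self.dentries[key] = self.next_id
                self.attrs[self.next_id] = {"type": "dir"}
                self.next_id += 1
            parent = self.dentries[key]
        self.dentries[(parent, parts[-1])] = self.next_id
        self.attrs[self.next_id] = attributes
        self.next_id += 1

    def lookup(self, path):
        parent = 0
        for name in path.strip("/").split("/"):
            parent = self.dentries[(parent, name)]   # pure tree walk, no attribute I/O
        return self.attrs[parent]
```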

Abstract:

With the increasing demand for people counting, human flow monitoring based on channel state information (CSI) has attracted much attention because of its advantages such as easy deployment, privacy protection, and strong applicability. However, in existing human flow monitoring work, the accuracy of pedestrian recognition is easily affected by crowd density: to ensure monitoring accuracy, monitoring can only be carried out when the crowd is sparse, which limits the practicality of CSI-based human flow monitoring. To solve this problem, a monitoring method that can identify continuous flows of people is proposed. The method first uses phase unwrapping and a linear phase correction algorithm to eliminate random phase offsets and compensate the phase of the original data, then extracts valid data packets from the continuous flow data using standard deviation and variance, and finally feeds the time-domain phase difference information as feature signals into a convolutional, long short-term memory, deep neural network (CLDNN) for pedestrian recognition. In actual tests, the method achieves an outdoor accuracy of 96.7% and an indoor accuracy of 94.1% when the distance between successive pedestrians is no less than 1 m, outperforming existing human flow monitoring methods.
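A small NumPy sketch of the kind of phase preprocessing described above: unwrap the raw CSI phase across subcarriers, remove the linear trend (a common linear phase correction), and take the phase difference between two receive antennas as the feature. The exact pipeline in the paper may differ.

```python
import numpy as np

def clean_phase(raw_phase, subcarrier_idx):
    """Unwrap CSI phase across subcarriers and remove the linear offset
    (slope and intercept), a common linear phase correction."""
    unwrapped = np.unwrap(raw_phase)
    k = (unwrapped[-1] - unwrapped[0]) / (subcarrier_idx[-1] - subcarrier_idx[0])
    b = unwrapped.mean()
    return unwrapped - k * subcarrier_idx - b

def phase_difference_feature(csi_ant1, csi_ant2, subcarrier_idx):
    """Time-domain phase-difference feature between two receive antennas.
    csi_ant1/csi_ant2: iterables of complex CSI vectors, one per packet."""
    p1 = np.array([clean_phase(np.angle(f), subcarrier_idx) for f in csi_ant1])
    p2 = np.array([clean_phase(np.angle(f), subcarrier_idx) for f in csi_ant2])
    return p1 - p2    # shape: (num_packets, num_subcarriers)
```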

Abstract:

Existing multi-view clustering algorithms have limitations in accurately capturing the high-order and complementary information embedded in multi-view data when learning low-dimensional representations. Meanwhile, they fail to capture the local information of the data, and their information extraction methods lack robustness to noise and outliers. To address these challenges, an adaptive tensor singular value shrinkage multi-view clustering algorithm named ATSVS is proposed. ATSVS introduces a novel tensor log-determinant function to enforce a low-rank constraint on the representation tensor, which adaptively shrinks singular values according to their magnitude. Consequently, ATSVS effectively captures the high-order and complementary information within multi-view data from a global perspective. ATSVS then captures the local information of the data by using the l1,2 norm, which combines the advantages of sparse representation and manifold regularization, while improving robustness to noise by imposing sparse constraints on the noise with the l2,1 norm. Experimental results against eleven comparison algorithms on nine different types of datasets show that ATSVS achieves superior clustering performance, significantly outperforming state-of-the-art baselines. ATSVS is thus an effective algorithm for clustering multi-view data.
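The abstract does not give the exact form of the tensor log-determinant function, so the following block records a commonly used log-determinant low-rank surrogate purely for intuition; it is an assumption, not the paper's definition.

```latex
% Illustrative log-determinant surrogate for the rank; the paper's exact
% tensor log-determinant function may differ:
\[
  f_{\log}(\mathcal{Z}) \;=\; \sum_{i} \log\!\Bigl(1 + \frac{\sigma_i(\mathcal{Z})}{\varepsilon}\Bigr),
  \qquad \varepsilon > 0 .
\]
% Its proximal operator shrinks each singular value by an amount that decreases
% as \sigma_i grows, so large (informative) singular values are preserved while
% small, noise-dominated ones are suppressed: the adaptive shrinkage effect.
```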

Abstract:

Recent advances in large language models (LLMs) have significantly raised the requirements for data quality in practical applications. Real-world scenarios often involve heterogeneous data from multiple correlated domains, yet cross-domain data integration remains challenging because privacy and security concerns prohibit centralized sharing, thereby limiting the effective utilization of LLMs. To address this critical issue, we propose a novel framework integrating LLMs with knowledge graphs (KGs) for cross-domain heterogeneous data query. Our approach presents a systematic governance solution under the LLM-KG paradigm. First, we employ domain adapters to fuse cross-domain heterogeneous data and construct the corresponding KGs. To enhance query efficiency, we introduce linear knowledge graphs and develop a homogeneous knowledge graph extraction (HKGE) algorithm for graph reconstruction, significantly improving cross-domain data governance performance. Subsequently, we propose a trusted subgraph matching (TrustHKGM) algorithm to ensure high-confidence multi-domain queries through confidence computation and low-quality node filtering. Finally, we design a multi-domain knowledge-linear graph prompting (MKLGP) algorithm to enable efficient and trustworthy cross-domain query answering within the LLM-KG framework. Extensive experiments on multiple real-world datasets demonstrate the superior effectiveness and efficiency of our approach compared with state-of-the-art solutions.

Abstract:

With the rapid development of large-scale model technology, these models have exhibited remarkable performance in fields such as natural language processing and computer vision, becoming essential tools for addressing complex issues and drawing significant interest from both the scientific community and the industry. Nonetheless, current cloud-platform-based schemes for training and inference of large models face multiple challenges, including high expenses, restricted scalability, and information security risks. As the scale of model parameters expands continually, the need for low-cost, efficient training and inference methods grows ever more pressing. Carrying out collaborative training and inference of large models on edge devices can dramatically decrease latency and bandwidth demands, concurrently reinforcing data privacy and operational efficiency. This strategy furnishes vital technological support for the economical deployment of large models across a variety of contexts, thereby evolving into one of the prominent research hotspots. This article conducts a thorough investigation of research pertinent to large models in the context of edge intelligence, with an in-depth analysis and discourse primarily focused on two aspects: edge-based training and inference of large models. Ultimately, it outlines the challenges confronted in the progression of large model technologies tailored for edge intelligence and delineates future prospects. The ambition is to stimulate a heightened comprehension and intensified attention from both academic and industrial sectors towards technologies involving large models for edge intelligence, thereby encouraging further scholarly exploration in this thriving domain.

Abstract:

Legal intelligence aims to analyze texts within the legal domain automatically by employing various natural language processing (NLP) technologies, and has garnered significant attention from the NLP community. One of the most critical tasks in legal intelligence is legal judgment prediction (LJP). This task seeks to forecast judgment outcomes, such as applicable law articles, charges, and penalties, based on the fact descriptions of legal cases, making it a promising application of artificial intelligence (AI) techniques. However, current LJP methods primarily address cases with a single defendant and neglect the complexities of cases involving multiple defendants. In real-world criminal cases, multiple defendants are often involved, creating intricate interactions that single-defendant LJP technologies cannot handle accurately: they struggle to distinguish the judgment outcomes of different defendants in such scenarios. To advance research on multi-defendant LJP, this paper presents a large-scale multi-defendant LJP dataset with three key characteristics: 1) it is the largest manually annotated dataset for multi-defendant LJP; 2) it requires distinguishing legal judgment predictions for each defendant; 3) it includes comprehensive judgment chains covering criminal relationships, sentencing contexts, law articles, charges, and penalties. Furthermore, this paper conducts an extensive and detailed analysis of the dataset, examining the distributions of law articles, charges, penalties, criminal relationships, sentencing contexts, text length, and number of defendants, and provides statistical insights into multi-defendant judgment results and judgment chains. Additionally, this paper introduces a novel judgment-chain-based method, featuring a strategy for generating judgment chains related to the crime facts and a comparison strategy to differentiate correct judgment chains from easily confused ones, enhancing overall effectiveness. Experimental results reveal that the multi-defendant LJP dataset presents a significant challenge to existing LJP methods and pre-trained models, while the judgment-chain-based LJP method significantly surpasses baseline methods, highlighting the crucial role of judgment chains in improving LJP.

Abstract:

Implicit discourse relation recognition aims at automatically identifying semantic relations (such as Comparison) between two arguments (sentence or clause) in the absence of explicit connectives. Existing methods have confirmed that the introduction of phrase information can effectively boost the performance. However, there are still the following shortcomings: 1) These models typically rely on syntactic parsers and do not fully capture the interactions between words, phrases, and arguments. 2) The problem of data sparsity often occurs during training when incorporating the phrase information. To address the above issues, we propose an implicit discourse relation recognition model based on multi-granularity information interaction (MGII) and develop a chain decoding-inspired data augmentation method (DAM). Specifically, our proposed model is designed to automatically acquire semantic representations of n-grams using a stacked convolutional neural network. It then explicitly models the interactions between words, phrases and arguments based on Transformer layers and ultimately predicts multi-level discourse relationships in a chain-decoding way. Our data augmentation method simultaneously pretrains both the encoding and decoding modules, enabling the effective utilization of massive explicit discourse data, which are naturally annotated by connectives, to mitigate the issue of data sparsity. The proposed method significantly outperforms recent benchmark models on the PDTB datasets. Furthermore, it does not rely on syntactic parsers, demonstrating strong applicability.

Abstract:

Stencil computations are widely adopted in scientific applications. Many HPC platforms utilize the high computing capability of GPUs to accelerate stencil computations. In recent years, stencils have become more complex in terms of stencil order, memory accesses, and computation patterns. To adapt stencil computations to GPU architectures, the academic community has proposed a variety of optimization techniques based on streaming and tiling. Due to the diversity of stencil computational patterns and GPU architectures, no single optimization technique fits all stencil instances, so researchers have proposed stencil auto-tuning mechanisms to conduct parameter searches for a given combination of optimization techniques. However, existing mechanisms introduce huge offline profiling costs and online prediction overhead, and cannot flexibly handle arbitrary stencil patterns. To address these problems, this paper proposes a generalized stencil auto-tuning framework, GeST, which thoroughly optimizes the performance of stencil computations on GPU platforms. Specifically, GeST constructs the global search space through a zero-padding format and quantifies parameter correlations via the coefficient of variation to generate parameter groups. After that, GeST iteratively selects parameter values from the parameter groups, adjusting the sampling ratio according to a reward policy and avoiding redundant execution through hash coding. The experimental results show that GeST can identify better-performing parameter settings in a short time compared with other state-of-the-art auto-tuning works.
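Two of the mechanisms named above, grouping parameters by the coefficient of variation of sampled performance and skipping repeated configurations via hash coding, can be sketched as follows; the benchmark callable, threshold, and grouping rule are placeholders, not GeST's actual implementation.

```python
import hashlib
import statistics

def coeff_of_variation(samples):
    m = statistics.mean(samples)
    return statistics.pstdev(samples) / m if m else 0.0

def group_parameters(perf_samples, threshold=0.1):
    """perf_samples: {param_name: [runtimes measured while varying only that param]}.
    Parameters whose variation strongly affects performance (high CV) are tuned
    together first; low-CV parameters go into a second, cheaper group."""
    high = [p for p, s in perf_samples.items() if coeff_of_variation(s) >= threshold]
    low = [p for p in perf_samples if p not in high]
    return high, low

_seen = set()

def run_once(config, benchmark):
    """Skip configurations that were already executed (hash-coded deduplication)."""
    key = hashlib.sha1(repr(sorted(config.items())).encode()).hexdigest()
    if key in _seen:
        return None
    _seen.add(key)
    return benchmark(**config)    # placeholder: measures the stencil kernel once
```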

Abstract:

Public key encryption with keyword search (PEKS) over lattices plays an important role in ensuring the privacy, confidentiality, and flexibility of outsourced data while resisting quantum attacks. However, most existing lattice-based PEKS schemes are limited by the underlying preimage sampling algorithm, which suffers from high storage overhead or low efficiency. To address these problems, an optimized public key encryption with keyword search scheme is first proposed. The scheme utilizes a new approximate trapdoor sampling algorithm, which outputs an approximate rather than an exact preimage, to improve computational efficiency. A combination of a non-spherical Gaussian sampling technique and an ideal extendable-output function is then used to reduce key and trapdoor storage. Furthermore, an extended scheme with forward and backward security is introduced to address the basic scheme's update and search operation leakage. To prevent newly updated ciphertexts from matching previous trapdoors, i.e., to achieve forward security, the key is periodically updated through a lattice-based delegation mechanism. To prevent subsequent searches from leaking information about deleted files, i.e., to achieve backward security, the addition and deletion of files are realized by combining a bitmap index with a lattice-based homomorphic encryption scheme. Theoretical analysis and experimental results show that, compared with an efficient existing PEKS scheme, the proposed scheme reduces the public key storage overhead by 4.6% and the trapdoor storage overhead by 50.1%, and improves the efficiency of encryption, trapdoor generation, and search by 11.11%, 2.5%, and 26.15%, respectively.

Abstract:

With global population aging and changing lifestyles, the management and treatment of chronic diseases are becoming increasingly important. Chronic diseases, including cardiovascular diseases, diabetes, and chronic respiratory diseases, require long-term or even lifelong health management, the core of which is designing and implementing long-term health plans covering balanced diet, appropriate exercise, regular examinations, and medication management. In recent years, large language models have made progress in the medical field, but they do not focus on chronic disease health management and thus lack understanding of Chinese dietary habits and culture; these medical large language models also have limited capabilities in handling numerical information. To address these issues, this paper constructs a chronic disease health management information system based on a large language model. By integrating foundational knowledge of chronic diseases, health management guidelines, and actual health management plans as domain data, this paper trains the QingTing large language model as the core of the system to effectively answer health-related questions. Additionally, the system introduces a tool enhancement strategy, improving QingTing's ability to handle numerical information in health data by invoking external tools, and adopts retrieval-augmented generation based on an uncertain knowledge graph to enhance the accuracy and reliability of QingTing's answers. Experiments on the system demonstrate that QingTing significantly outperforms other baseline large language models in health management dialogues, and verify the effectiveness of the designed tool enhancement and retrieval-augmented methods.

Abstract:

Supercomputing has rapidly developed from traditional CPU clusters to heterogeneous platforms. With this shift in hardware platforms, optimizing computing software and evaluating its performance face significant challenges. Currently, mainstream international parallel program performance analysis tools generally have low compatibility with the processors of domestic supercomputing heterogeneous systems, often require instrumentation and recompilation of code, and offer low accuracy in single-node performance data collection. To improve on these shortcomings, this paper proposes a floating-point performance data collection method for computing software on heterogeneous systems, and develops and verifies a collection prototype on a domestic supercomputing system verification platform. The prototype achieves effective collection of single-node and multi-node performance indicator data and is non-intrusive to the original program: no code modification of the monitored program is needed, since monitoring works in a plug-in manner, making the method highly versatile. Finally, we conducted comparative experimental analysis with three types of programs, rocHPL, Cannon, and mixbench, and carried out performance data collection on a ResNet (residual network) program for AI computing. The experiments demonstrate that the proposed collection method has high accuracy, achieves the expected collection effect, and provides a useful reference for program optimization, verifying the effectiveness of the proposed method.

Abstract:

With the rapid development of multimedia and network technology, the security of digital image content is becoming more and more prominent. In this paper, we propose a deep perceptual image authentication hashing scheme based on window self-attention feature fusion, which can effectively detect whether the perceptual content of the original image has changed and can be applied to content authentication, tampering recognition, copy detection, and similar scenarios. The scheme uses a convolutional neural network architecture that integrates a window self-attention mechanism to build a hashing model encompassing global and local image features. The model splits the shallow features obtained from the backbone network into chunks and extracts the corresponding window features, then calculates the correlation between each intermediate local feature and the global feature to filter out the final local features, and finally feeds the local and global features into the hash generation module for fusion and compression to obtain the final image hash code. During training, an integrated loss function based on hash loss and classification loss is used to constrain the model and improve robustness and discrimination. The experimental results show that this scheme achieves superior image content authentication performance compared with existing typical perceptual authentication hashing schemes.

Abstract:

Software code caches are widely used in dynamic binary translators to manage dynamically generated code blocks. The translation, refresh, and memory occupancy of code blocks are key metrics for a software code cache. There has been little research on software code caches for system-level dynamic binary translators. Existing system-level dynamic binary translators use a state label scheme to achieve correct and efficient instruction semantics simulation, but this scheme introduces additional problems for software code cache management. Through in-depth analysis of the state label scheme, two types of problems are summarized: conflicts and redundancy. To address these two problems, two code cache optimization schemes based on fine-grained state labels are proposed: a multi-state code cache scheme and a weak state label scheme. The two schemes are implemented in LATX-SYS and evaluated by booting Ubuntu/x86 16.04 and Windows XP/x86 on the LoongArch platform. The evaluation results show that code block refreshes and translations are reduced by 43% and 18%, respectively, and the code block similarity ratio drops from 59.63% to 5.06%. Both the translation overhead and memory occupancy are reduced, and overall system boot time is reduced by 20%. Finally, testing the weak state label scheme on SPEC CPU2000 shows that the number of code blocks is reduced by 13% on average, with only 2%-3% performance overhead introduced.
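A toy illustration of the multi-state code cache idea: translated blocks are keyed by both guest PC and a state label, so blocks translated under different states coexist and a state-specific flush leaves the rest of the cache intact. This is illustrative only, not LATX-SYS's actual data structures.

```python
class MultiStateCodeCache:
    """Translated blocks indexed by (guest_pc, state_label), so a state change no
    longer invalidates blocks translated under another state."""

    def __init__(self, translate):
        self.cache = {}             # (guest_pc, state_label) -> translated block
        self.translate = translate  # backend translator: (pc, state) -> host code

    def lookup(self, guest_pc, state_label):
        key = (guest_pc, state_label)
        block = self.cache.get(key)
        if block is None:                        # miss: translate under this state
            block = self.translate(guest_pc, state_label)
            self.cache[key] = block
        return block

    def flush_state(self, state_label):
        """Only blocks tied to one state need refreshing, not the whole cache."""
        self.cache = {k: v for k, v in self.cache.items() if k[1] != state_label}
```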

Abstract:

3D shape reconstruction aims to recover the 3D structure of a scene from image sequences with different focus levels. Most existing shape-from-focus methods evaluate the focus level of the image sequence at a single scale and guide the reconstruction process by introducing regularization or post-processing, so the limited selection space of depth information often prevents the reconstruction results from converging effectively. To address this issue, this paper proposes a multi-scale cost aggregation framework for shape from focus, MSCAS. Firstly, non-downsampling multi-scale transformation is introduced to enlarge the depth information selection space of the input image sequence; then cost aggregation is performed by combining intra-scale sequence correlation with inter-scale information constraints. Through this expansion-aggregation mode, the scene depth representation information is doubled and cross-scale, cross-sequence representation information is effectively fused. As a general framework, MSCAS can embed existing model-design methods and deep learning methods to achieve performance improvements. The experimental results show that on four datasets MSCAS reduces the root mean square error (RMSE) by 14.91% on average and improves the structural similarity (SSIM) by 56.69% after embedding a model-design shape-from-focus method, and reduces the RMSE by 1.55% and improves the SSIM by 1.61% on average after embedding a deep learning shape-from-focus method. These results verify the effectiveness of the MSCAS framework.

Abstract:

With the rapid advancement of artificial intelligence generation models and deepfakes, the techniques for generating talking face videos using various methods have become increasingly mature. Among them, audio-driven talking face video generation methods have attracted significant attention due to their remarkably realistic and natural output. Such methods utilize audio as a driving source to synthesize videos where the target character’s mouth movements synchronize with the audio, often combining image or video materials. Currently, these technologies are widely applied in fields such as virtual anchors, gaming animation, and film and television production, demonstrating vast prospects for development. However, the potential negative impacts of this technology are also becoming apparent. Improper or abusive use could lead to serious political and economic consequences. In this context, research on identifying various types of facial forgery videos has emerged. This research primarily assesses the authenticity of videos by detecting the veracity of individual video frames or the spatio-temporal consistency of video sequences. Firstly, this paper systematically analyzes the classic algorithms and latest advancements in audio-driven talking face video generation tasks based on the timeline and the development history of foundational models. Secondly, it exhaustively lists the commonly used datasets and evaluation criteria for this task, conducting comprehensive comparisons across multiple dimensions. Subsequently, the paper meticulously analyzes and summarizes the forgery facial video identification task, categorizing it based on whether the discrimination technology focuses on individual video frames or multiple frames, and also summarizes its commonly used datasets and evaluation criteria. Finally, the paper outlines the challenges and future directions in this research field, aiming to provide valuable references and support for subsequent related research.

Abstract:

The NTRU lattice is an important choice for building practical post-quantum lattice-based key encapsulation mechanisms, and optimized software implementations of lattice cryptography are of great significance for the subsequent deployment of post-quantum cryptography. CTRU is an NTRU-based key encapsulation mechanism proposed by Chinese scholars. At present, only a C and AVX2 implementation of the CTRU-768 scheme exists, and there is room for further optimization; moreover, the CTRU-768 implementation cannot be directly extended to the CTRU-512 and CTRU-1024 schemes. This paper presents the first optimized reference C implementations of the CTRU-512 and CTRU-1024 schemes and their variants CNTR-512 and CNTR-1024, together with the corresponding AVX2 parallel implementations, and further optimizes the existing CTRU-768 reference and AVX2 implementations. It employs a mixed-radix number theoretic transform (NTT) to accelerate polynomial multiplication and uses the Karatsuba algorithm to speed up the resulting low-degree polynomial ring multiplications. In addition, combined with the central Barrett reduction, this paper proposes an index-based delayed reduction in the inverse NTT. For the time-consuming polynomial inversion in the CTRU-1024 scheme, the Bernstein fast inversion algorithm is employed. Furthermore, this paper provides a more efficient AVX2 implementation that uses Intel's single instruction multiple data (SIMD) instruction set AVX2 to accelerate the performance bottlenecks of CTRU, employing layer merging and coefficient permutation to reduce load/store instructions. The Bernstein fast polynomial inversion is also vectorized with AVX2, and the time-consuming SHA-3 hash module is implemented in AVX2 assembly. Compared with the latest AVX2 implementation of the CTRU-768 scheme, the AVX2 implementation in this paper improves performance by 8%-11%. For the CTRU scheme, compared with the reference implementation, the AVX2 implementation achieves significant performance improvements on all three parameter sets: key generation, key encapsulation, and key decapsulation are improved by 56%-91%, 74%-90%, and 70%-83%, respectively.
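As a point of reference for the reduction step mentioned above, a scalar centered Barrett reduction can be sketched as follows; the modulus, bit width, and the index-based lazy-reduction schedule of the actual AVX2 code are not reproduced here.

```python
def barrett_reduce_centered(a, q, k=32):
    """Barrett reduction followed by centering: returns r with r ≡ a (mod q) and
    -q//2 <= r <= q//2.  Illustrative scalar version of the reduction used around
    NTT butterflies; the scheme's real moduli and bit widths differ."""
    v = (1 << k) // q                  # precomputed floor(2^k / q)
    t = (a * v) >> k                   # approximates floor(a / q)
    r = a - t * q                      # within a few multiples of q of the result
    r %= q                             # final correction (constant-time tricks omitted)
    return r - q if r > q // 2 else r
```

In a vectorized implementation it is this final correction that gets delayed ("lazily" applied) across butterfly layers; the `%` in the sketch is only for clarity.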

Abstract:

In recent years, large language models (LLMs) have exhibited remarkable performance, profoundly transforming various aspects of human life. As model sizes grow and user demand for long-context inference increases, LLM inference systems face significant storage challenges. These challenges stem primarily from the vast number of model parameters and the key-value cache required for efficient inference, both of which strain GPU memory resources. Additionally, inefficient storage usage in distributed systems often results in over-provisioning and fault tolerance issues, further complicating resource management. Researchers have explored memory optimization, heterogeneous storage, and distributed storage to address GPU memory constraints and improve resource utilization. Memory-optimized LLM inference systems improve GPU memory efficiency and reduce memory footprint through techniques such as efficient key-value cache management, compression, and attention operator optimization. LLM inference systems based on heterogeneous storage expand available capacity by integrating various storage resources, minimizing I/O overhead via tensor placement strategies, asynchronous data transfer, and intelligent memory allocation and prefetching. Distributed LLM systems improve the utilization of multi-machine resources, boosting the execution efficiency and fault tolerance of LLM inference tasks through batching, multi-level scheduling, and redundant replication. Finally, we review existing research and outline future directions for further optimizing storage systems for LLM inference.

Abstract:

Audio recognition has been widely applied in typical scenarios such as autonomous driving and the Internet of Things. In recent years, research on adversarial attacks against audio recognition has attracted extensive attention. However, most existing studies rely on coarse-grained audio features at the instance level, which leads to expensive generation time and weak universal attacking ability in the real world. To address this problem, this paper proposes a phonemic adversarial noise (PAN) generation paradigm, which exploits audio features at the phoneme level to perform fast and universal adversarial attacks. Experiments on a variety of datasets commonly used in speech recognition tasks, such as LibriSpeech, validate the effectiveness of the proposed PAN, its generalization across datasets, its attack transferability across models and tasks, and its effectiveness against civilian Internet of Things audio recognition applications on physical-world devices. Extensive experiments demonstrate that the proposed PAN outperforms the compared baselines by large margins (about 24× speedup and 38% attacking ability improvement on average), and that the proposed sampling strategy and learning method significantly reduce training time and improve attack capability.

Abstract:

Graphs often have rich temporal information and evolve dynamically over time, which can be modeled as temporal graph streams. A temporal graph stream consists of a set of vertices and a series of timestamped and directed edges, where new vertices and edges arrive continuously over time. Temporal motifs are generalizations of subgraph patterns in static graphs which take into account edge orderings and durations in addition to topologies. Counting the number of occurrences of temporal motifs is a fundamental problem for temporal graph analysis. However, traditional streaming subgraph counting methods cannot support temporal matching, and are only suitable for simple graphs that do not contain temporal information. In addition, existing temporal motifs counting methods suffer from poor performance in temporal graph streams. We thus study approximate temporal motif counting via random sampling in temporal graph streams. We propose a generic streaming edge sampling (SES) algorithm to estimate the number of instances of any temporal motif in a given temporal graph stream. We then provide comprehensive analyses of the theoretical bounds and time complexities of SES. Finally, we perform extensive experimental evaluations for SES on four real world datasets. The results show that SES achieves up to three orders of magnitude speedups over the state-of-the-art sampling methods while having comparable estimation errors for temporal motif counting in the streaming setting.
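To make the sampling-and-scaling idea concrete, the sketch below estimates a simple two-edge temporal motif (an edge x→u followed within δ seconds by an edge u→v) with Bernoulli edge sampling; this is a generic illustration, not the SES algorithm or its estimator.

```python
import random
from collections import defaultdict

def streaming_motif_estimate(edge_stream, p=0.1, delta=3600):
    """Estimate the number of two-edge temporal motifs  x->u  followed by  u->v
    within `delta` seconds, using Bernoulli edge sampling.  Each instance survives
    with probability p**2, so the sampled count is scaled by 1/p**2."""
    sampled_in = defaultdict(list)        # u -> timestamps of sampled edges (*, u)
    count = 0
    for (u, v, t) in edge_stream:         # edges arrive in timestamp order
        if random.random() >= p:
            continue                       # edge not sampled
        # every sampled earlier edge (x, u, t0) with t - delta <= t0 < t forms an instance
        sampled_in[u] = [t0 for t0 in sampled_in[u] if t - t0 <= delta]
        count += sum(1 for t0 in sampled_in[u] if t0 < t)
        sampled_in[v].append(t)
    return count / (p * p)
```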

Abstract:

By merging the functions of Boolean logic and non-volatile memory, memristive stateful logic can achieve in-memory computing in the true sense by eliminating data movement during computation, breaking the "memory wall" and "energy wall" of traditional von Neumann computing systems. In recent years, a series of memristor-based in-memory stateful logic gates have been proposed by linking the "conditional switching" process with mathematical logic functions, covering multiple logic functions such as IMP, NAND, NOR, and NIMP. However, automated synthesis and mapping methods for implementing complex in-memory stateful logic computation by cascading stateful logic gates are still embryonic and, in particular, lack investigation of device wear, which limits the application of in-memory stateful logic in edge computing scenarios. To reduce device wear (toggle rate) in complex in-memory stateful logic computation, we propose a stateful logic synthesis and mapping flow based on multiple stateful logic gates for low-wear in-memory computing. Compared with the two state-of-the-art stateful logic synthesis and mapping tools, SIMPLER-MAGIC and LOSSS, the proposed low-wear flow achieves average toggle rate improvements of over 35.55% and 8.48%, respectively, on the EPFL combinational benchmark circuits, and of over 47.26% and 6.72%, respectively, on the LGSynth91 benchmark circuits.

Abstract:

Image-text cross-modal entity linking is an extension of traditional named entity linking. The inputs are images containing entities, which are linked to textual entities in a knowledge base. Existing models usually adopt a dual-encoder architecture: entities of the visual and textual modalities are encoded into separate vectors, their similarities are calculated using the dot product, and each image entity is linked to the most similar text entity. Training usually adopts a cross-modal contrastive learning task: for a given entity vector in one modality, this task pulls the corresponding vector of the other modality closer and pushes away the other-modality vectors of different entities. However, this approach overlooks the difference in representation difficulty between the two modalities: visually similar entities are often harder to distinguish than textually similar entities, leading to incorrect linking of the former. To solve this problem, we propose two new contrastive learning tasks that enhance the discriminative power of the vectors. The first is self-contrastive learning, which aims to improve the distinction between visual vectors. The second is hard-negative contrastive learning, which helps a textual vector distinguish similar visual vectors. We conduct experiments on the open-source dataset WikiPerson. With a knowledge base of 120k entities, our model achieves an accuracy improvement of 4.5% over the previous state-of-the-art model.
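A NumPy sketch of the two auxiliary objectives, assuming L2-normalized (n, d) embedding matrices; the temperature, number of hard negatives, and mining rule are illustrative choices rather than the paper's settings.

```python
import numpy as np

def info_nce(anchor, positive, negatives, tau=0.07):
    """Generic InfoNCE term: -log exp(a·p/t) / (exp(a·p/t) + sum_j exp(a·n_j/t))."""
    pos = np.exp(anchor @ positive / tau)
    neg = np.exp(anchor @ negatives.T / tau).sum()
    return -np.log(pos / (pos + neg))

def self_contrastive_loss(img_vecs, tau=0.07):
    """Push apart visual vectors of different entities: each vector's positive is
    itself, all other image vectors are negatives (assumes L2-normalized rows)."""
    loss = 0.0
    for i, v in enumerate(img_vecs):
        negs = np.delete(img_vecs, i, axis=0)
        loss += info_nce(v, v, negs, tau)
    return loss / len(img_vecs)

def hard_negative_loss(txt_vecs, img_vecs, k=5, tau=0.07):
    """For each text vector, contrast its own image against the k most similar
    other images (the hard negatives)."""
    loss = 0.0
    for i, t in enumerate(txt_vecs):
        sims = img_vecs @ t
        sims[i] = -np.inf                                # exclude the true image
        hard = img_vecs[np.argsort(-sims)[:k]]
        loss += info_nce(t, img_vecs[i], hard, tau)
    return loss / len(txt_vecs)
```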

Abstract:

With the rapid development of natural language processing and deep learning technologies, large language models have been increasingly applied in various fields such as text processing, language understanding, image generation, and code auditing, and have become a research hotspot of common interest in both academia and industry. However, adversarial attack methods allow attackers to manipulate large language models into generating erroneous, unethical, or false content, posing increasingly severe security threats to these models and their wide-ranging applications. This paper systematically reviews recent advances in adversarial attack methods and defense strategies for large language models, providing a detailed summary of the fundamental principles, implementation techniques, and major findings of relevant studies. Building on this foundation, the paper delves into four mainstream attack modes: prompt injection attacks, indirect prompt injection attacks, jailbreak attacks, and backdoor attacks, each analyzed in terms of its mechanisms, impacts, and potential risks. Furthermore, the paper discusses the current research status and future directions of large language model security, and looks ahead to the application prospects of large language models combined with multimodal data analysis and integration technologies. This review aims to enhance understanding of the field and foster more secure and reliable applications of large language models.

Abstract:

With the continuous development and rapid popularization of 5G networks, the number of user devices and the potential demand are increasing sharply. However, the high frequency of 5G signals leads to significant propagation loss, so achieving broader coverage of user devices requires optimizing existing 5G base station sites or guiding the selection of new sites at low cost and high efficiency. State-of-the-art site selection methods mostly use heuristic algorithms to optimize the sites, but their convergence time increases exponentially with the number of candidate 5G base station sites, posing many challenges for site optimization. Therefore, we propose a 5G base station site selection method based on user demand points that fully considers communication among users. Specifically, a planning-area gridding method is proposed to reduce the time complexity of computing the user demand points covered by base stations. Then, the concept of the separation degree among base stations is proposed and measured based on the number of user demand points covered by each base station. We define an objective function that satisfies submodularity and use a greedy algorithm to obtain an optimal base station site selection scheme. Experimental results show that the proposed method outperforms the comparison algorithms on all evaluation metrics and effectively improves the coverage of 5G base station signals. In the same base station planning area, the proposed method achieves the maximum coverage rate with the minimum number of 5G base stations, thereby effectively reducing construction costs.
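The greedy selection over a submodular coverage objective can be illustrated with the classic max-coverage loop below; grid generation and the separation-degree weighting described above are omitted.

```python
def greedy_site_selection(candidate_sites, demand_points, coverage_target=0.95):
    """candidate_sites: {site_id: set of demand-point ids covered by a base station there};
    demand_points: set of all demand-point ids.  Greedily add the site with the largest
    marginal coverage until the target coverage ratio is reached."""
    covered, chosen = set(), []
    while len(covered) < coverage_target * len(demand_points):
        remaining = {s: pts for s, pts in candidate_sites.items() if s not in chosen}
        if not remaining:
            break
        site = max(remaining, key=lambda s: len(remaining[s] - covered))
        if not remaining[site] - covered:      # no site adds coverage; stop early
            break
        chosen.append(site)
        covered |= remaining[site]
    return chosen, len(covered) / len(demand_points)
```

Because the coverage objective is submodular, this greedy loop carries the standard (1 − 1/e) approximation guarantee, which is what makes the approach attractive compared with exponential-time heuristics.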

Abstract:

Sequential recommendation centers on mining users' preferences and behavior patterns from their interaction sequences. Existing works have recognized the inadequacy of single-modal interaction data and have utilized large amounts of multi-modal data, such as item reviews and homepage images, to complement interaction data and improve recommendation performance. However, such multi-modal data are often interspersed with unavoidable noise that may limit the exploration of personalized user preferences. While suppressing inter-modal inconsistent information can reduce noise interference, it is almost impossible to completely eliminate noise from user-generated multi-modal content. To address these challenges, we propose a large language model based trusted multi-modal recommendation (Large-TR) algorithm, which aims to provide trustworthy recommendations in noisy multi-modal data scenarios. Specifically, the algorithm relies on the excellent natural language understanding capability of large language models to efficiently filter the noise in multi-modal data and achieve more accurate, fine-grained modelling of user preferences. Additionally, we design a trustworthy decision mechanism that dynamically evaluates the uncertainty of recommendation results and ensures their usability in high-risk scenarios. Experimental results on four widely used public datasets show that the proposed algorithm performs better than other baseline algorithms. Our source code is available at https://github.com/hhbray/Large-TR.

Abstract:

Given the risk of adversarial attacks on tracking models and the lack of corresponding adversarial detection methods, this paper addresses the problem from the frequency-domain perspective. Combined with the visual imperceptibility of perturbation noise, this paper first proves theoretically that perturbation noise mainly resides in the mid-to-high frequency bands of images. We then quantitatively show that the low-frequency components of a video sequence contribute the most to tracking performance and are the least affected by adversarial attacks. Finally, based on the above proof and analysis, this paper proposes a detection framework based on the difference in tracking performance across frequency bands, in which a frequency-domain decomposition module extracts the low-frequency components of the video sequence. The target tracker and its mirror tracker, which share the same structure and parameters, take the full-frequency and low-frequency components of the video sequence as input, respectively, and a discriminator module determines whether the input video sequence is adversarial by comparing the output differences of the two trackers. The detection framework uses a tracker as the carrier and requires no adversarial training; it achieves adversarial detection solely by comparing the tracking performance across frequency bands. Extensive experimental results show that the framework can effectively detect current mainstream adversarial attacks, such as CSA, TTP, and Spark, with a detection precision of 97.55%, while having little negative impact on the original tracking performance of the tracker. In addition, the framework is generalizable and can be flexibly integrated into multiple trackers, such as SiamRPN++, SiamMask, SiamCAR, and SiamBAN.
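A compact sketch of the detection pipeline, with a Gaussian low-pass standing in for the frequency-domain decomposition module and a placeholder tracker interface returning (x1, y1, x2, y2) boxes; the threshold and the choice of low-pass filter are assumptions.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def low_frequency(frame, sigma=3.0):
    """Stand-in for the frequency-domain decomposition module: a Gaussian low-pass
    keeps the low-frequency content where adversarial perturbations are weakest."""
    return gaussian_filter(frame, sigma=sigma)

def iou(b1, b2):
    x1, y1 = max(b1[0], b2[0]), max(b1[1], b2[1])
    x2, y2 = min(b1[2], b2[2]), min(b1[3], b2[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    a1 = (b1[2] - b1[0]) * (b1[3] - b1[1])
    a2 = (b2[2] - b2[0]) * (b2[3] - b2[1])
    return inter / (a1 + a2 - inter + 1e-9)

def is_adversarial(frames, tracker, mirror_tracker, tau=0.5):
    """tracker / mirror_tracker: callables mapping a frame to a box; both share weights.
    Large disagreement between full-band and low-band tracking flags the sequence."""
    scores = [iou(tracker(f), mirror_tracker(low_frequency(f))) for f in frames]
    return float(np.mean(scores)) < tau
```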

Abstract:

Pre-trained models have mitigated the challenges posed by extensive training data and computational resources, and have also given rise to a new paradigm of model development and application, which we refer to as the model supply chain. In this framework, a pre-trained model is uploaded by its publisher and subsequently transferred, compressed, and deployed by secondary developers to meet various application needs. This emerging model supply chain introduces additional stages and multiple elements, inevitably leading to security concerns and privacy risks. Despite the widespread adoption of model supply chains, there is currently a lack of systematic review of the security threats within them. To address this research gap, in this paper, we provide a comprehensive overview of the deep learning model supply chain, introducing its concept and fundamental structure. We conduct an in-depth analysis of vulnerabilities at various stages of the model's lifecycle, including design, development, deployment, and usage. Furthermore, we compare and summarize prevalent attack methods, alongside introducing corresponding security protection strategies. To assist readers in effectively utilizing pre-trained models, we review and compare publicly available model repositories. Finally, we discuss potential future research avenues in areas such as security checks, real-time detection, and problem tracing, aiming to offer insights for the safer and more reliable development and use of pre-trained models. For the benefit of ongoing research, related papers and open-source code of the methods discussed are accessible at https://github.com/Dipsy0830/DNN-supply-chain-survey.

Abstract:

Model-based diagnosis mainly models the behavior of the system, and once abnormal behavior is observed, a diagnosis algorithm is run on the system model to return a possible explanation. Existing diagnosis algorithms compute a minimal hitting set (MHS) each time a conflict set is identified, and then verify whether this MHS satisfies the system observations. While this approach reduces the generation of redundant solution sets, the difficulty of computing the MHSs of the conflict sets increases exponentially with the number of conflict sets. Since the MHS of a partial collection of conflict sets is not necessarily a diagnosis, it is also time-consuming to check whether the MHS satisfies the system observations. We design a filtering function to remove low-quality conflict sets based on the diagnosis cardinality and quantity, while ensuring that the obtained hitting sets are as likely as possible to be diagnoses. In addition, to facilitate the rapid verification of hitting sets as diagnoses, we propose an efficient decision algorithm based on the logical relationships of the circuit. In the experimental section, we conduct a detailed analysis comparing the runtime and diagnosis yield under varying numbers of fault conditions. Compared to state-of-the-art algorithms, our approach shows runtime efficiency improvements of up to 2-40 times and diagnosis yield enhancements ranging from 5-200 times.
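For readers unfamiliar with the hitting-set view of diagnosis, the brute-force sketch below (toy conflict sets over named components; real diagnosis engines such as the one described here build hitting sets incrementally and filter low-quality conflict sets) only illustrates what a minimal hitting set of a collection of conflict sets is.

```python
from itertools import combinations

# Toy sketch of the hitting-set view of diagnosis: conflict sets are sets of
# component names, and a minimal hitting set picks at least one component from
# every conflict set while containing no smaller hitting set.

def minimal_hitting_sets(conflict_sets):
    universe = sorted(set().union(*conflict_sets))
    found = []
    for size in range(1, len(universe) + 1):
        for cand in combinations(universe, size):
            cand = set(cand)
            hits_all = all(cand & c for c in conflict_sets)
            # Minimality: no already-found (smaller) hitting set is contained in cand.
            if hits_all and not any(h <= cand for h in found):
                found.append(cand)
    return found

conflicts = [{"A", "B"}, {"B", "C"}, {"A", "C"}]     # hypothetical conflict sets
print(minimal_hitting_sets(conflicts))
# e.g. [{'A', 'B'}, {'A', 'C'}, {'B', 'C'}] -- each hits every conflict set
```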

Abstract:

In complex environments and under sudden background noise conditions, speech enhancement tasks are extremely challenging due to the limited capturing of spectrogram features by existing methods, especially the capturing of local information of the spectrogram. Previous works on Transformer models primarily focused on global information of the audio while neglecting the importance of local information. Many models only utilized the magnitude information and ignored the phase information after the audio underwent the Short-Time Fourier Transform (STFT), resulting in suboptimal capturing of spectrogram features and unsatisfactory speech enhancement results. Based on this, this paper proposes a dual-branch speech enhancement neural network with convolutional enhancement window attention. The model adopts a U-Net architecture and simultaneously models the magnitude and phase information of the audio through the dual-branch structure. A complex computation module is introduced for information interaction between the two branches. The convolutional enhancement window attention module is employed in the skip-connection part between the encoder and decoder layers. It performs self-attention based on non-overlapping windows, significantly reducing the computational complexity of the speech enhancement model while capturing local contextual information. The proposed model is evaluated on the publicly available Voicebank-Demand dataset. Compared to the baseline models DCUNET16 and DCUNET20, it achieves improvements of 0.51 and 0.47, respectively, in the perceptual evaluation of speech quality (PESQ) metric. Other evaluation metrics also show significant enhancements. Compared to various existing speech enhancement models, the proposed model outperforms them in various metrics, particularly demonstrating remarkable improvements in PESQ scores.

Abstract:

Multi-anchor graph approaches have attracted more and more attention for their potential in addressing the challenges of large-scale multi-view clustering. However, existing methods leveraging multi-anchor graphs encounter several hurdles when tackling this challenge. Consistency-anchored graph learning methods struggle to handle misaligned anchor graphs and necessitate additional post-processing with a consistency graph, thereby constraining the accuracy and reliability of clustering outcomes. The anchor graph ensemble clustering method fails to harness the complementary information from different views during the independent generation of candidate base clusterings and overlooks the original anchor graphs during fusion, thus impacting the effectiveness and stability of clustering results. To address these challenges, we propose a novel approach based on double-ended joint learning for multi-view clustering. The method fully considers the duality between multi-anchor information and samples in multi-anchor graphs, achieving synchronized clustering between the anchor end and the sample end. Moreover, under the guidance of multi-anchor information, it achieves joint alignment between sample-end clustering and multiple anchor-end clusterings. Unlike existing methods, the approach does not require direct learning of a consistent anchor graph, and can thus handle any type of anchor misalignment issue while mitigating the negative impact of separate graph learning and partitioning on clustering performance. Additionally, it utilizes multiple anchor graphs for anchor-end clustering and sample-end clustering within a unified optimization framework, effectively addressing the limitations of the base clustering and ensemble stages in leveraging multiple anchor graphs. Experimental results demonstrate that the proposed method outperforms several comparative methods in terms of clustering performance and time consumption, effectively enhancing the clustering performance of multi-view data. The relevant code for the proposed method and comparative methods is provided in the supplementary material: http://github.com/lxd1204/DLMC.

Abstract:

The Domain Name System (DNS) recursive resolving service acts as a bridge between users and upstream DNS authoritative servers, enabling users to conveniently resolve domain names through local DNS servers. However, as the first gateway for communication with users, DNS recursive resolving services have become a significant target for attacks on Internet infrastructure. Given the vast scale and variety of DNS recursive service deployments, current DNS security enhancements struggle with implementation complexity and compatibility issues. Despite its importance, there is a noticeable lack of research focused on the deployment of security protection mechanisms for DNS recursive services, as well as on the comprehensive assessment of the associated security threats. To bridge this gap, we categorize the security risks associated with DNS recursive services into five main types: cache poisoning, DNS hijacking, direct attacks on recursive servers, leveraging recursive servers to target other servers, and exploiting software vulnerabilities. Additionally, we provide a summary of the latest research on DNS recursive service security threats and DNS security enhancement mechanisms. Our review also summarizes measurement methods for assessing these security risks. Finally, we analyze the current state of DNS recursive service security and offer insights into future research directions for improving the security monitoring and governance of DNS recursive services.

Abstract:

The purpose of the computing first network is to deeply integrate ubiquitous computation with the network, in order to effectively allocate multi-dimensional basic resources such as computation and storage among clouds, edges, and ends through the network, allowing users to use them as transparently as water and electricity: computing resources can be requested on demand and used at any time. Due to heterogeneous computing resources, dynamic networks, and diverse user needs, effectively scheduling and routing resources in the computing first network has become one of the core challenging problems. To address this problem, we design a multi-tier computing resource system (CRS). Different from existing resource allocation approaches, CRS is a complete computing first network technology solution based on the application layer, considering computing resource awareness and computational routing. The computing resource system is composed of a computing resource awareness strategy and a computing resource routing protocol. The computing resource awareness strategy defines the intra-domain awareness rules within a jurisdiction and the inter-domain awareness rules between different jurisdictions. Based on this, we propose a greedy-based resource routing algorithm (GBRA), which dynamically generates a search tree for each task. The computing resource routing protocol completes the allocation of resources through CRS request messages, authorization notification messages, notification confirmation messages, and CRS response messages. Through extensive simulation experiments, it is demonstrated that, compared with other algorithms, CRS can complete the resource allocation of more tasks within the maximum tolerated response latency. In addition, better load balancing can be achieved among the computing nodes within a jurisdiction.

Abstract:

Lossless networks are increasingly widely used in high performance computing (HPC), data centers, and other fields. Lossless networks use link layer flow control to ensure that packets will not be dropped by switches due to buffer overflow, thus avoiding loss retransmission and greatly improving the latency and throughput performance of applications. However, the negative effects introduced by link layer flow control (congestion spreading, deadlock, etc.) impose challenges on the large-scale deployment of lossless networks. Therefore, the introduction of traffic management technology to improve the scalability of lossless networks has received great attention. We systematically review the research progress of traffic management in typical lossless networks used in HPC and data centers, including InfiniBand and lossless Ethernet. First, we introduce the negative impact of link layer flow control and the goals of traffic management, and summarize the traditional traffic management architecture of lossless networks. Then, according to the traffic management technical route (congestion control, congestion isolation, load balancing, etc.) and the driving location (sender-driven, receiver-driven, etc.), we classify and elaborate on the latest research progress of InfiniBand and lossless Ethernet traffic management, and analyze the corresponding advantages and limitations. Finally, we point out the issues that need to be explored in further research on lossless network traffic management, including a unified architecture for traffic management, joint congestion management within the host and the network, and traffic management for domain-specific applications.

Abstract:

As the scale of available data increases, the importance and impact of machine learning grow. Quantum computing can be realized with the help of the principles of quantum mechanics, and quantum machine learning algorithms, formed by combining quantum computing and machine learning, can theoretically produce exponential acceleration advantages over classical machine learning algorithms. Quantum versions of many classical algorithms have been proposed, which may solve problems that are difficult to solve on classical computers. At present, the number of controllable qubits, noise, and other hardware factors restrict the development of quantum computers. Quantum computing hardware is unlikely to reach the level needed for universal quantum computers in the short term, and current research focuses on algorithms that can run on noisy intermediate-scale quantum (NISQ) computers. Variational quantum algorithms (VQAs) are hybrid quantum-classical algorithms that are suitable for current quantum computing devices, and related research is one of the hotspots in the field of quantum machine learning. Variational quantum circuits (VQCs) are parameterized quantum circuits (PQCs) used in variational quantum algorithms to solve quantum machine learning tasks; they are also called ansatzes or quantum neural networks (QNNs). The framework of a variational quantum algorithm mainly contains five steps: 1) Design the loss function according to the task. Design a parameterized quantum circuit as the model and initialize its parameters. 2) Embed classical data. The classical data is pre-processed and encoded into a quantum state. If quantum data is used as input, it only needs to be pre-processed without encoding. 3) Calculate the loss function through the parameterized quantum circuit. This step is where the quantum advantage comes in. 4) Measure and post-process. Through quantum measurement operations, the quantum superposition state collapses into a classical state, and the classical data can be obtained after post-processing. 5) Optimize the parameters. Update the parameters and optimize the model with classical optimization algorithms, then return to step 3 until the loss function converges after several iterations, yielding a set of optimal parameters. The final result is the output of the optimal model. This paper reviews the basic theory of quantum computing and the basic framework of variational quantum algorithms. We further introduce the application and progress of variational quantum algorithms in the field of quantum machine learning. We review in detail supervised quantum machine learning, including quantum classifiers; unsupervised quantum machine learning, including the quantum circuit Born machine, the variational quantum Boltzmann machine, and the quantum autoencoder; semi-supervised quantum learning, including quantum generative adversarial networks; quantum reinforcement learning; and quantum circuit architecture search. We compare the models and analyze their advantages and disadvantages. We briefly discuss and summarize the related datasets and simulation platforms that can reproduce the introduced models. Finally, we put forward the challenges and future research trends of quantum machine learning algorithms based on variational quantum circuits.
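The five-step loop can be illustrated with a deliberately tiny classical simulation: a single-qubit RY ansatz, the observable Z as the loss, the parameter-shift rule for gradients, and plain gradient descent. These choices are illustrative assumptions; real VQAs use multi-qubit parameterized circuits on hardware or simulator backends.

```python
import numpy as np

# Minimal classical simulation of the VQA loop described above, assuming a one-qubit
# toy model: the "ansatz" is a single RY(theta) rotation on |0>, the loss is the
# expectation value <Z>, gradients come from the parameter-shift rule, and plain
# gradient descent plays the role of the classical optimizer (step 5).

def expectation_z(theta):
    """<psi|Z|psi> for |psi> = RY(theta)|0> = cos(t/2)|0> + sin(t/2)|1>."""
    state = np.array([np.cos(theta / 2), np.sin(theta / 2)])
    z = np.array([[1.0, 0.0], [0.0, -1.0]])
    return float(state @ z @ state)

def parameter_shift_grad(theta):
    # Exact gradient for rotation gates: (f(t + pi/2) - f(t - pi/2)) / 2.
    return (expectation_z(theta + np.pi / 2) - expectation_z(theta - np.pi / 2)) / 2

theta, lr = 0.1, 0.4
for _ in range(60):                         # iterate until the loss converges
    theta -= lr * parameter_shift_grad(theta)
print(round(expectation_z(theta), 4))       # approaches the minimum -1 (theta -> pi)
```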

Abstract:

Time-Sensitive Networking (TSN) has emerged as a primary choice for communication in distributed real-time systems such as industrial automation, avionics, and automotive applications. TSN traffic planning aims to allocate conflict-free transmission times for time-sensitive frames while managing constraints related to network topology, resources, device capabilities, and stream requirements. The traffic planning problem is NP-complete, and there is a need for the rapid development of open-source traffic planning software for both academia and industry. Our paper introduces LOCAP, an architecture for TSN planning with interfaces named Minimum Collection of Planning and General Table of Planning. LOCAP separates planning algorithms from tools, and planning software from hardware details. Based on LOCAP, we implement an open-source TSN planner called OpenPlanner. OpenPlanner integrates multiple algorithms that leverage satisfiability modulo theories and heuristics to solve planning problems. We evaluate the runtime and solution quality of various algorithms using OpenPlanner, highlighting the need for diverse planning algorithms in different TSN applications. To the best of our knowledge, OpenPlanner is the first open-source TSN planner. Its planning results have been deployed on multiple hardware platforms, including OpenTSN, the Yinhe Hengxin TSN chip, and XZ-TTE, and it has been applied in various systems such as satellites, unmanned vehicles, and artillery.
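A toy version of the satisfiability-modulo-theories formulation behind such planners is sketched below, using hypothetical frames, links, and cycle length with the z3 Python bindings; this is not OpenPlanner's Minimum Collection of Planning or General Table of Planning interface. Each frame gets an integer offset inside the cycle, and frames sharing a link must not overlap.

```python
from z3 import Int, Or, Solver, sat

# Hedged toy of an SMT formulation for TSN offset planning (illustrative values only).
CYCLE = 100                                   # hypothetical cycle length (microseconds)
frames = {                                    # frame -> (link it uses, transmission duration)
    "f1": ("swA->swB", 20),
    "f2": ("swA->swB", 30),
    "f3": ("swB->swC", 25),
}

solver = Solver()
offset = {f: Int(f"offset_{f}") for f in frames}
for f, (_, dur) in frames.items():
    solver.add(offset[f] >= 0, offset[f] + dur <= CYCLE)   # frame fits in the cycle

names = list(frames)
for i in range(len(names)):
    for j in range(i + 1, len(names)):
        fi, fj = names[i], names[j]
        if frames[fi][0] == frames[fj][0]:    # same link: forbid temporal overlap
            solver.add(Or(offset[fi] + frames[fi][1] <= offset[fj],
                          offset[fj] + frames[fj][1] <= offset[fi]))

if solver.check() == sat:
    model = solver.model()
    print({f: model.eval(offset[f], model_completion=True).as_long() for f in frames})
```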

Abstract:

Open-vocabulary multi-label action recognition tasks aim to identify various human actions in videos that were not seen during the training phase. Compared to traditional action recognition, this task is more practical as it closely mirrors real-world scenarios and has broader application prospects. However, it poses significant challenges in effectively generalizing models to unseen action categories. To address this issue, this paper proposes an open-vocabulary multi-label action recognition method enhanced by the knowledge of large language models. This method extracts the rich co-occurrence knowledge of action categories implicit in large language models and incorporates this co-occurrence knowledge into prompt learning of visual-language models, facilitating information transfer between base classes and novel classes to improve the recognition performance on novel classes. We set up two ratios of base action classes to novel action classes in experiments, namely 3:1 and 1:1, represented as "75% seen" and "50% seen" respectively. Experimental results on the AVA and MovieNet datasets show that, compared to existing methods, when the base action classes are "75% seen", our method improves the mAP metric for novel action recognition by 1.95% and 1.21% on the AVA and MovieNet datasets, respectively. When faced with the more challenging "50% seen" scenario, our method improves the mAP metric for novel action recognition by 2.59% and 1.06% on the two datasets, respectively.

Abstract:

Knowledge graphs often face the challenge of incompleteness, which can be alleviated by completing missing information through link prediction tasks. However, most knowledge graph completion works focus excessively on extracting embedding features without sufficiently considering the complex semantics contained in the predicted node's neighborhood information, global feature information, and directional feature information, making it difficult to accurately predict the missing information. This paper proposes a general representation learning semantic enhancement framework, ASFR, which utilizes an attention mechanism to extract local association information and structural features of the knowledge graph, and enhances existing knowledge graph representation learning models by incorporating positional information. By embedding these three types of additional knowledge graph information into the entity vectors of the knowledge graph, the quality of the knowledge graph representation vectors is improved. Comparative experiments are conducted against five different categories of classical methods, and the results indicate that this framework can effectively enhance the predictive capability of models, achieving an improvement of 6.89% on three public datasets.

Abstract:

As the fundamental computing components in constructing large-scale supercomputing systems, GPUs are becoming increasingly diverse and heterogeneous in architecture. GPU accelerators from various chip manufacturers exhibit significant variations in their architectural designs. Accelerator diversity and programming model diversity are important technical trends for building large-scale supercomputing systems. Diverse accelerators require developers to provide high-performance software for multiple hardware platforms, resulting in software duplication. To reduce the cost of duplication, the unified programming model SYCL (system-wide compute language) adapts to multiple hardware platforms, but SYCL's performance on a given platform is often not as good as that of the platform's native programming model and needs to be further optimized. In order to apply mature CUDA (compute unified device architecture) programming ideas and high-performance programs to SYCL, it is necessary to study the performance of high-performance CUDA programs ported to SYCL on multiple platforms and the ideas for further optimization. Based on software-hardware co-design, this paper proposes paraTRANS, a common operator optimization system for the code migration process of the cross-heterogeneous programming model SYCL, and gives optimization methods for the migrated SYCL GEMM (general matrix multiplication) in different scenarios. The paper evaluates the performance of SYCL GEMM optimized by paraTRANS, which can achieve 96.95% of CUDA's FLOPS on the original NVIDIA RTX 3090, and 100.47% of CUDA's hardware peak performance percentage on AMD MI100, both close to the level before migration. This paper provides ideas for porting high-performance CUDA code to SYCL and for further optimization.

Abstract:

With run-time configurable hardware, coarse-grained reconfigurable array (CGRA) is a potential platform to provide both program flexibility and energy efficiency for data-intensive applications. To exploit the access parallelism of the multi-bank memory, memory partitioning is usually introduced to CGRAs. However, existing works for memory partitioning on CGRAs either achieve the optimal partitioning solution with expensive addressing overheads or achieve area-and-energy efficient hardware at the sacrifice of more bank consumption. To this end, this paper proposes an efficient memory partitioning approach for loop pipelining on CGRA via access pattern morphing. By performing a memory partitioning and scheduling co-optimization on multi-dimensional arrays, a memory partition-friendly access pattern is formed in the data domain such that it can be partitioned with a minimized number of all-one partitioning hyperplanes, resulting in both optimized partition factor and reduced addressing overhead. To solve the partitioning problem, we first propose a backtracking-based scheduling algorithm to find the partition-friendly pattern with minimized initiation interval. Then, based on the partitioning result, we also propose an energy-area-efficient CGRA architecture by simplifying the address generators in load-store units. The experimental results show that our approach can achieve 1.25 × energy efficiency while keeping a moderate compilation time, as compared to the state-of-the-art.

Abstract:

Continuous-flow microfluidic biochips (CFMBs) have become a hot research topic in recent years due to their ability to perform biochemical assays automatically and efficiently. PathDriver+ was the first work to take the requirements of actual fluid transportation into account in the design process of CFMBs: it implements actual fluid transport and removal and plans separate flow paths for each transport task, which had been neglected in previous work. However, PathDriver+ does not take full advantage of the flexibility of CFMB routing, because it only considers the optimization of flow channel length for global routing in the mesh model, but not for detailed routing. In addition, PathDriver+ only considers the X architecture, while existing work shows that any-angle routing can utilize routing resources more efficiently and shorten the flow channel length. To address the above issues, this paper proposes a flow-path-driven arbitrary-angle routing algorithm, which can improve the utilization of routing resources and reduce the flow channel length while considering actual fluid transportation requirements. The proposed algorithm constructs a search graph based on constrained Delaunay triangulation to improve the search efficiency of routing solutions while ensuring routing quality. Then, a Dijkstra-based flow path routing method is used on the constructed search graph to quickly generate a routing result with a short channel length. In addition, in the routing process, a channel reuse strategy and an intersection optimization strategy are proposed for the flow path reuse and intersection number optimization problems, respectively, to further improve the quality of routing results. The experimental results show that, compared with the latest work PathDriver+, the length of channels, the number of ports used, and the number of channel intersections are significantly reduced by 33.21%, 11.04%, and 44.79%, respectively, the channel reuse rate is improved by 26.88% on average, and the total number of valves introduced at intersections is reduced by 42.01% on average, which demonstrates the effectiveness of the proposed algorithm.

Abstract:

With the advancement of electronic design automation, continuous-flow microfluidic biochips have become one of the most promising platforms for biochemical experiments. This chip manipulates fluid samples in milliliters or nanoliters by utilizing internal microvalves and microchannels, and thus automatically performs basic biochemical experiments, such as mixing and detection. To achieve the correct bioassay function, the microvalves deployed inside the chip are usually managed by a multiplexer-based control logic, and valves receive control signals from a core input through the control channel for accurate switching. Since biochemical reactions typically require high sensitivity, the length of control paths connecting each valve needs to be reduced to ensure immediate signal propagation, and thus to reduce the signal propagation delay. In addition, to reduce the fabrication cost of chips, a vital issue to be addressed in the logic architecture design is how to effectively reduce the total channel length within the control logic. To address the above issues, this paper proposes a deep reinforcement learning-based control logic routing algorithm to minimize the signal propagation delay and total control channel length, thereby automatically constructing an efficient control channel network. The algorithm employs the Dueling Deep Q-Network architecture as the agent of the deep reinforcement learning framework to evaluate the tradeoff between signal propagation delay and total channel length. Besides, the diagonal channel routing is implemented for the first time for control logic, thus fundamentally improving the efficiency of valve switching operations and reducing the fabrication cost of the chip. The experimental results demonstrate that the proposed algorithm can effectively construct a high-performance and low-cost control logic architecture.
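The agent's value head can be sketched as a standard Dueling DQN in PyTorch, shown below with illustrative state and action dimensions; the paper's actual encoding of routing states, reward shaping, and diagonal-channel action space are not reproduced. A shared trunk feeds separate value and advantage streams, and Q(s, a) is recombined as V(s) + A(s, a) - mean_a A(s, a).

```python
import torch
import torch.nn as nn

# Hedged sketch of a Dueling DQN head of the kind the abstract mentions.
class DuelingQNet(nn.Module):
    def __init__(self, state_dim: int, n_actions: int, hidden: int = 128):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(state_dim, hidden), nn.ReLU())
        self.value = nn.Linear(hidden, 1)               # V(s)
        self.advantage = nn.Linear(hidden, n_actions)   # A(s, a)

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        h = self.trunk(state)
        v, a = self.value(h), self.advantage(h)
        return v + a - a.mean(dim=-1, keepdim=True)     # Q(s, a)

# Example: pick a routing action (e.g. a grid or diagonal direction) greedily.
net = DuelingQNet(state_dim=16, n_actions=8)            # hypothetical sizes
q = net(torch.randn(1, 16))
print(q.shape, q.argmax(dim=-1).item())
```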

Abstract:

Timing anomalies are counter-intuitive behaviors observed in worst-case execution time (WCET) analysis. A key aspect of these anomalies is that a locally faster execution does not necessarily lead to a reduction in the overall program execution time. Therefore, WCET analysis must examine all possible execution states conservatively to ensure the safety of the analysis results, making the process extremely challenging. Conversely, if it can be ensured that there are no timing anomalies in the program and platform to be analyzed, the number of states and the time required for WCET analysis can be significantly reduced. Consequently, addressing timing anomalies is a critical challenge in WCET analysis. However, despite more than 20 years of research, the academic community has not reached a unified definition of and consensus on timing anomalies. This article reviews various perspectives from the literature since the concept of timing anomalies was first introduced. We classify these viewpoints based on their definitions and descriptions, and evaluate their respective strengths and weaknesses. Additionally, we investigate the causes of timing anomalies, identifying three main factors: scheduling strategies, cache behavior, and component interactions. Furthermore, we explore current research efforts aimed at detecting and eliminating timing anomalies, highlighting the issues and limitations of these approaches. We suggest that future research on timing anomalies should be integrated with WCET analysis methods to more effectively address these challenges.

Abstract:

Due to the expensive cost of producing paired images, unpaired low-light image enhancement methods are more practical as they do not rely on paired image data. However, their lack of detailed supervised signals leads to visual degradation problems such as globally inconsistent exposure, color distortion, and heavy noise in the output image, which makes them challenging for practical applications. We propose an unpaired low-light enhancement method based on global consistency (GCLLE) to meet practical needs. Firstly, we remodel and fuse the same-scale features of the encoder and decoder through the Global Consistency Preserving Module (GCPM) to correct the contextual information at different scales, ensuring globally consistent exposure adjustment and global structural consistency of the output image, making the light distribution uniform and avoiding distortion. The Local Smoothing and Modulation Module (LSMM) is used to learn a set of local low-order curve mappings, which provides an extended dynamic range and further improves image quality to achieve realistic and natural enhancement. The proposed Deep Feature Enhancement Module (DFEM) uses two-way pooling to fuse deep features, compressing irrelevant information and highlighting more discriminative encoded features, which reduces inaccuracies, makes it easier for the decoder to capture low-intensity signals in the image, and retains more details. Unlike paired enhancement, which focuses on the one-to-one mapping relationship between pixels in paired images, GCLLE enhances images by reducing the stylistic differences between low-light and unpaired normal-light images. Through extensive experiments on the MIT and LSRW datasets, the method proposed in this paper outperforms classical low-light enhancement algorithms in several objective metrics, demonstrating its effectiveness and superiority.

Abstract:

Dynamic functional connections (dFCs) can be regarded as a process of dynamic changes over multiple time windows, used to explore changes in the brain's functional connections across different time periods. They have been widely used in resting-state functional magnetic resonance imaging (rs-fMRI) analysis, providing a new perspective and strategy for the diagnosis of brain diseases. However, common dynamic brain network analysis methods cannot effectively use the potential correlations and temporal ordering within the dynamic data, and they ignore the uncertainty caused by the inconsistent data quality of each window. Therefore, this paper proposes a brain network analysis algorithm based on dynamic evidence neural networks (DE-NNs). This algorithm designs a multi-view evidence acquisition module for dynamic brain networks, which treats each time window of the dynamic brain network as a view. Three different convolution filters are used to extract feature maps from each time window of the dynamic brain network, fully obtaining evidence at the dynamic level. A dynamic evidence fusion mechanism is designed to make full use of the dynamic evidence: a dynamic trust function is constructed according to the temporal order of the dFC data based on evidence-theory combination rules. The evidence generated by multiple windows is fused at the decision level of classification, the uncertainty information is fully considered, and the classification performance is significantly improved. To verify the effectiveness of the proposed DE-NNs, experiments were conducted on three schizophrenia datasets against existing state-of-the-art algorithms. The results show that the accuracy and F1 scores of DE-NNs on the three brain disease diagnosis tasks are significantly improved.
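Decision-level fusion of per-window evidence can be illustrated with a generic Dempster-style combination over class beliefs plus an "unknown" mass. This is a minimal sketch assuming singleton-plus-uncertainty mass functions; the paper's dynamic trust function additionally weights windows by their temporal order, which is omitted here.

```python
import numpy as np

# Generic Dempster-style fusion of per-window evidence (hedged illustration only).
# Each window yields belief masses over the classes plus one residual "unknown" mass.

def combine(m1, m2):
    """m = (class_masses, unknown); both inputs assign mass to singletons + unknown."""
    b1, u1 = m1
    b2, u2 = m2
    conflict = np.sum(np.outer(b1, b2)) - np.sum(b1 * b2)   # mass on disagreeing classes
    scale = 1.0 - conflict                                   # Dempster normalization
    b = (b1 * b2 + b1 * u2 + b2 * u1) / scale
    u = (u1 * u2) / scale
    return b, u

# Three hypothetical windows voting over two classes (e.g. patient vs. control);
# the noisier third window carries more "unknown" mass and thus contributes less belief.
windows = [(np.array([0.6, 0.2]), 0.2),
           (np.array([0.5, 0.3]), 0.2),
           (np.array([0.1, 0.3]), 0.6)]
fused = windows[0]
for w in windows[1:]:
    fused = combine(fused, w)
print(np.round(fused[0], 3), round(fused[1], 3))   # fused class beliefs and uncertainty
```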

Abstract:

In recent years, large models have made unprecedented progress in a variety of domains, such as natural language processing and machine vision. Mixture of Experts (MoE) has emerged as one of the most popular architectures for large models due to its distinct advantages in model parameter scalability, computational cost control, and complex task processing. However, as the parameter scale continues to increase, it is becoming increasingly difficult for the execution efficiency and scalability of the system to meet demand, and this must be addressed urgently. System-level optimization is an effective way to solve this problem and has become a hot research area. In light of this, we review the present research status of MoE system optimization techniques in the era of large models. To begin, we describe the current state of development of MoE large models and analyze the performance bottlenecks they face on the system side. Then, we comprehensively sort out and deeply analyze the most recent research progress along four core system dimensions, ranging from memory occupation and communication latency to computational efficiency and parallel scaling, and compare and elaborate on the key technologies, application scenarios, and optimization directions. Finally, we summarize the current research state of MoE system optimization and outline some future research directions.

Abstract:

Deep learning-based object detection algorithms have been widely applied, while recent research indicates that these algorithms are vulnerable to adversarial attacks, causing detectors to either misidentify or miss the target. Nonetheless, research focusing on the transferability of adversarial attacks in autonomous driving is limited, and few studies address the stealthiness of such attacks in this scenario. To address these limitations in current research, an algorithmic module to enhance attack transferability is designed by drawing an analogy between optimizing adversarial examples and the training process of machine learning models. Additionally, through employing style transfer techniques and neural rendering, a transferable and stealthy attack method (TSA) is proposed and implemented. Specifically, the adversarial examples are first repeatedly stitched together and combined with masks to generate the final texture, which is then applied to the entire vehicle surface. To simulate real-world conditions, a physical transformation function is used to embed the rendered camouflaged vehicle into realistic scenes. Finally, the adversarial examples are optimized using a designed loss function. Simulation experiments demonstrate that the TSA method surpasses existing methods in attack transferability and exhibits a certain level of stealthiness in appearance. Furthermore, physical domain experiments validate that the TSA method maintains effective attack performance in real-world scenarios.

Abstract:

Personalized learning resource recommendation identifies learners' interests and recommends interesting and relevant learning resources accordingly. However, learners' interests are influenced by various factors such as knowledge points, learning resources, and courses, which makes it a challenging task to accurately represent their interests. Additionally, these interests evolve dynamically over time, complicating the task of identifying learning interest patterns. To address this challenge, we propose a learning resource recommendation method based on spatio-temporal multi-granularity interest modeling, which is characterized as follows. An innovative architecture for learning interest representation is designed and implemented that integrates the learning space and the temporal dimension through a heterogeneous graph-based learning space and multi-granularity interest representation. The nodes in this graph represent entities such as knowledge points, learning resources, courses, teachers, and schools, and the edges represent the inter-entity relationships; a graph neural network is utilized to express the multi-granularity interest in these nodes. Moreover, we propose a temporal multi-granularity interest pattern representation method that combines the dimensions of time, learning space, and course preference, and slices the learner's historical behavior sequence to mine interest patterns at different granularities: near-term within-course, mid-term cross-course, and long-term cross-course. Then, a multi-granularity interest adaptive fusion layer is proposed to fuse the multi-granularity interest representations and interest patterns. Based on this, a multi-granularity interest self-supervision task is designed to address the lack of supervision signals for spatio-temporal multi-granularity interests, and relevant learning resources are recommended to learners via a prediction layer. Our experimental results show that on the MOOCCube dataset the proposed method outperforms the best comparison algorithm, HinCRec, by 3.13% and 7.45% in the Recall@20 and NDCG@20 metrics, respectively. On the MOOPer dataset, the proposed method outperforms the best comparison algorithm, HinCRec, by 4.87% and 7.03% in the Recall@20 and NDCG@20 metrics, respectively.

Abstract:

The high-frequency characteristics of rapid single-flux quantum (RSFQ) circuits pose a great challenge to circuit layout design. In order to solve the circuit delay problem caused by these high-frequency characteristics, delay elements such as passive transmission lines can be used in the routing stage. The delay of a passive transmission line is roughly proportional to its length, and its power consumption does not increase with the wirelength, so length-matching routing is a crucial problem for RSFQ circuits. Therefore, this paper proposes an efficient RSFQ circuit routing algorithm considering length matching, including the following key strategies: 1) when generating the initial path, a detour routing method is presented to meet the partial length matching of passive transmission lines without changing the initial routing space; 2) an iterative resource insertion algorithm based on region awareness is utilized to reduce the area of the additional resources that need to be added; 3) a length-matching-driven routing algorithm considering blocking cost is designed, which improves the resource utilization of the routing space. Experimental results show that, compared with existing multi-terminal routing algorithms, the proposed algorithm reduces the area required for routing by 8% and the running time by 36%, thus achieving fast and high-quality routing results.

Abstract:

The problem of topological imbalance in graphs, arising from the non-uniform and asymmetric distribution of nodes in the topological space, significantly hampers the performance of graph neural networks. Current research predominantly focuses on labeled nodes, with relatively less attention given to unlabeled nodes. To address this challenge, we propose a self-supervised learning method based on random walk paths aimed at tackling the issues posed by topological imbalance, including the constraints imposed by homogeneity assumptions, topological distance decay, and annotation attenuation. Our method introduces the concept of multi-hop paths within the subgraph neighborhood, aiming to comprehensively capture relationships and local features among nodes. Firstly, through a strategy of aggregation between paths, we learn both homogeneous and heterogeneous features within multi-hop paths, thereby preserving not only the nodes' original attributes but also their initial structural connections in the random walk sequences. Additionally, by combining a strategy of aggregating subgraph samples based on multiple paths with a structured contrastive loss, we maximize the intrinsic features of local subgraphs for the same node, enhancing the expressive power of graph representations. Experimental results validate the effectiveness and generalization performance of our method across various imbalanced scenarios. This research provides a novel approach and perspective for addressing topological imbalance issues.

Abstract:

Software systems play an indispensable role across various industries, handling large-scale and high-density data. However, the numerous defects within these systems have long troubled developers, constantly threatening the security of data elements. Automated Program Repair (APR) technology aims to assist developers in automatically fixing defects in code during the software development process, thereby saving costs in software system development and maintenance and enhancing the confidentiality, availability, and integrity of data elements within software systems. With the development of Large Language Model (LLM) technology, many powerful code large language models have emerged. These models have demonstrated strong repair capabilities in the APR field, while also addressing the shortcomings of traditional approaches in code comprehension and patch generation, further elevating the level of program repair tools. We thoroughly survey high-quality papers related to APR in recent years, summarizing the latest developments in the field. We then systematically categorize two types of LLM-based APR techniques: cloze style and neural machine translation style. We also conduct an in-depth comparison from various perspectives such as model usage, model size, types of defects repaired, programming languages involved, and the pros and cons of repair approaches. Additionally, we discuss the widely adopted APR datasets and metrics, and outline existing empirical studies. Finally, we summarize current challenges in the APR field along with future research directions.

Abstract:

Research on knowledge-grounded dialogue often suffers from the problem of external knowledge containing redundant or even noisy information irrelevant to the conversation topic, which degrades the performance of the dialogue system. Knowledge selection has become an important approach to solving this issue. However, existing work has not yet investigated in depth several issues it involves, such as how to design a knowledge selector, how to exploit the selected knowledge, and what the suitable scenarios for knowledge-selection conversation methods are. In this paper, we propose a new neural conversation method based on conditional variational attention knowledge selection and a pre-trained language model. This method employs a knowledge selection algorithm based on a CVAE and a multi-layer attention mechanism to select the textual knowledge collection most relevant to the current conversation, which effectively exploits the dialogue responses in the training data to improve the efficiency of knowledge selection. Our model adopts the pre-trained language model Bart as its encoder-decoder architecture and incorporates the selected textual knowledge into the Bart model to fine-tune it during training. The experimental results show that, in contrast to current representative dialogue models, the proposed model can generate more diverse and coherent dialogue responses with higher accuracy.

Abstract:

With the rapid expansion of data centers and the significant increase in network bandwidth, the traditional software network protocol stack incurs high processor overhead and struggles to meet the needs of many data center applications in terms of throughput, latency, and other aspects. Remote direct memory access (RDMA) technology uses the ideas of zero copy, kernel bypass, and processor function offloading to read and write remote host memory data with high bandwidth and low latency. Ethernet-compatible RDMA technology is being applied in data centers, and the Ethernet RDMA NIC, as the main functional bearer device, plays a crucial role in its deployment. This overview analyzes Ethernet RDMA NICs from three aspects: architecture, optimization, and implementation evaluation. 1) We summarize the general architecture of the Ethernet RDMA NIC and introduce its key functional components. 2) We focus on optimization techniques for storage resources, reliable transmission, and applications, including optimization of connection scalability for NIC cache resources and of registration access for host memory resources; optimization of congestion control, flow control, and retransmission mechanisms for lossy Ethernet to achieve reliable transmission; and application-oriented optimization covering different storage types in distributed storage, database systems, and cloud storage systems, as well as multi-tenant performance isolation, security, and programmability for data center applications. 3) We then investigate different implementation and evaluation methods. Finally, a summary and outlook are given.

Abstract:

Multi-view clustering aims to use heterogeneous information from different views to discover the underlying data structure and divide the samples into clusters. Consistency and complementarity are two key elements that affect the performance of multi-view clustering. Consistency emphasizes the semantic similarity between different views, while complementarity emphasizes the mutual supplementation of view-specific information. At present, the study of consistency has been relatively in-depth, but the study of complementarity remains controversial: some methods assume that consistency and complementarity can assist each other, yet constraining them to the same feature space actually causes a conflict between them; other approaches accordingly argue that complementary information should be discarded, but this results in a waste of information. Intuitively, complementarity should exist. The contribution of this paper is to find that existing methods do not have enough insight into the essence of complementarity, i.e., consistency and complementarity are not independent but entangled with each other, which results in the conflict. Motivated by this finding, this paper separates the two kinds of information through disentangling, specifically placing them in different feature subspaces instead of the same feature space, and thus develops a multi-view clustering algorithm that takes into account both consistency and complementarity, effectively extracting the complementary information while avoiding the conflict between consistency and complementarity. Comparative experiments on standard datasets demonstrate the effectiveness of the proposed algorithm.

Abstract:

Quaternion-valued neural networks extend real-valued neural networks to the algebra of quaternions. Quaternion-valued neural networks achieve higher accuracy or faster convergence than real-valued neural networks in some tasks, such as singular point compensation in polarimetric synthetic aperture radar, spoken language understanding, and robot control. The performance of quaternion-valued neural networks is widely supported by empirical studies, but there are few studies on their theoretical properties, especially on why quaternion-valued neural networks can be more efficient than real-valued neural networks. In this paper, we investigate the theoretical properties of quaternion-valued neural networks and their advantages over real-valued neural networks from the perspective of approximation. Firstly, we prove the universal approximation of quaternion-valued neural networks with a non-split ReLU (rectified linear unit)-type activation function. Secondly, we demonstrate the approximation advantages of quaternion-valued neural networks over real-valued neural networks. For split ReLU-type activation functions, we show that one-hidden-layer real-valued neural networks need about 4 times the number of parameters to possess the same maximum number of convex linear regions as one-hidden-layer quaternion-valued neural networks. For the non-split ReLU-type activation function, we prove the approximation separation between one-hidden-layer quaternion-valued neural networks and one-hidden-layer real-valued neural networks, i.e., a quaternion-valued neural network can express a real-valued neural network using the same number of hidden neurons and the same parameter norm, while a real-valued neural network cannot approximate a quaternion-valued neural network unless the number of hidden neurons is exponentially large or the parameters are exponentially large. Finally, simulation experiments support our theoretical findings.
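The parameter-sharing effect behind these approximation results can be seen directly from the Hamilton product: one quaternion weight (4 real parameters) acts on a 4-dimensional input, whereas an unconstrained real-valued linear map on the same input needs 4x4 = 16 parameters. The sketch below uses illustrative values only and shows the algebra together with a split ReLU applied componentwise; it does not reproduce the paper's bounds.

```python
import numpy as np

# Hedged illustration of quaternion parameter sharing (not the paper's construction).
def hamilton(q, p):
    """Hamilton product of quaternions q = (w, x, y, z) and p."""
    w1, x1, y1, z1 = q
    w2, x2, y2, z2 = p
    return np.array([
        w1*w2 - x1*x2 - y1*y2 - z1*z2,
        w1*x2 + x1*w2 + y1*z2 - z1*y2,
        w1*y2 - x1*z2 + y1*w2 + z1*x2,
        w1*z2 + x1*y2 - y1*x2 + z1*w2,
    ])

def quaternion_neuron(weight, x, split_relu=True):
    out = hamilton(weight, x)
    # A "split" ReLU applies ReLU to each of the 4 real components independently.
    return np.maximum(out, 0.0) if split_relu else out

w = np.array([0.5, -0.3, 0.8, 0.1])        # 4 real parameters
x = np.array([1.0, 2.0, -1.0, 0.5])        # 4-D (one-quaternion) input
print(quaternion_neuron(w, x))             # 4-D output produced by only 4 parameters
```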

Abstract:

Traceable attribute-based signature (TABS) inherits the merits of attribute-based signature and can trace the real identity of the signer through a trusted third party, avoiding the abuse of the anonymity of attribute-based signatures. At present, there are very few signature-policy attribute-based signature (SP-ABS) schemes that support traceability in one-to-many authentication scenarios, and most existing schemes suffer from efficiency and security deficiencies; for example, the computational complexity of the verification phase is linearly related to the number of attributes, which is inefficient, and the fact that the policy is provided directly by the verifier to the signer can easily lead to policy privacy leakage. To solve the above problems, a traceable attribute-based signature scheme supporting policy hiding based on SM9 is proposed in this paper. It uses a linear secret sharing scheme (LSSS) with attribute name and attribute value splitting to construct the access structure, supports partial hiding of policies, and can protect the verifier's policy privacy while protecting the signer's identity privacy and attribute privacy. In the verification phase, the scheme requires only a constant number of bilinear pairing and exponentiation operations, achieving efficient fine-grained access control. Finally, the scheme is proved to be unforgeable under the random oracle model based on the q-strong Diffie-Hellman (q-SDH) hard problem.

Abstract:

Automated essay scoring (AES) can effectively alleviate the burden on teachers when evaluating student essays and provide students with objective and timely feedback. It is a crucial application of natural language processing in the field of education. Cross-prompt AES aims to develop a transferable automated scoring model that performs well on essays from a target prompt. However, existing cross-prompt AES models primarily operate in scenarios where target prompt data are available. These models align feature distributions between the source and target prompts to learn invariant feature representations for transfer to the target prompt. Unfortunately, such methods cannot be applied to scenarios where target prompt data are not available. In this paper, we propose a cross-prompt AES method based on Category Adversarial Joint Learning (CAJL). First, we jointly model AES as classification and regression tasks to achieve a combined performance improvement. Second, unlike existing methods that rely on prompt-agnostic features to enhance model generalization, our approach introduces a category adversarial strategy: by aligning category-level features across different prompts, we learn prompt-invariant feature representations and further enhance model generalization. We evaluate our proposed method on the Automated Student Assessment Prize (ASAP) and ASAP++ datasets, predicting both overall essay scores and trait scores. Experimental results demonstrate that our method outperforms six classical methods in terms of the quadratic weighted kappa metric.
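Category adversarial alignment of this kind is commonly realized with a prompt discriminator trained through a gradient reversal layer; the PyTorch sketch below shows that generic construction. The feature dimension, number of prompts, and discriminator head are illustrative assumptions, and the paper's exact CAJL architecture may differ.

```python
import torch
from torch import nn
from torch.autograd import Function

# Generic gradient-reversal sketch of adversarial prompt alignment: a discriminator
# tries to tell which prompt an essay feature came from, while the reversed gradient
# pushes the essay encoder toward prompt-invariant features.

class GradReverse(Function):
    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lambd * grad_output, None   # flip the gradient sign for the encoder

class PromptDiscriminator(nn.Module):
    def __init__(self, feat_dim: int, n_prompts: int):
        super().__init__()
        self.clf = nn.Sequential(nn.Linear(feat_dim, 64), nn.ReLU(),
                                 nn.Linear(64, n_prompts))

    def forward(self, essay_features: torch.Tensor, lambd: float = 1.0):
        reversed_feat = GradReverse.apply(essay_features, lambd)
        return self.clf(reversed_feat)

disc = PromptDiscriminator(feat_dim=128, n_prompts=6)   # hypothetical sizes
logits = disc(torch.randn(4, 128))
print(logits.shape)   # adversarial prompt-classification logits: (4, 6)
```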

Abstract:

With the surge of streaming data, concept drift has become an important and challenging problem in streaming data mining. At present, most ensemble learning methods do not specifically identify the type of concept drift or adopt efficient ensemble adaptation strategies, resulting in uneven performance of models across different concept drift types. To address this, this paper proposes an elastic gradient ensemble for concept drift adaptation (EGE_CD). Firstly, the gradient boosting residuals are extracted and the flow residual ratio is calculated to detect the drift site, and the residual volatility is then calculated to identify the type of drift. Then, the drift learners are identified by using the change in learner loss, and the corresponding learners are deleted by combining different drift types with the residual distribution characteristics to realize elastic gradient pruning. Finally, the incremental learning method is combined with the sliding sampling method to optimize the fitting process of the learners by calculating the optimal fitting rate, and incremental gradient growth is then realized according to the change in the learners' residuals. The experimental results show that the proposed method improves the stability and adaptability of the model to different concept drift types and achieves good generalization performance.

Abstract:

In the information age, the importance of data storage lies in ensuring the reliability, consistency, security, and real-time accessibility of information. Erasure codes (EC) play a crucial role in data storage systems due to their ability to minimize storage overhead and handle multiple component failures. However, the encoding and decoding of EC involve intensive computation, impacting storage system efficiency. This paper focuses on optimizing EC, with a special emphasis on the Galois field (GF) multiplication within multi-layer loops, a time-consuming aspect of EC. We first evaluate the pros and cons of three methods for GF multiplication: the log table searching method, the complete multiplication table searching method, and the shift decomposition method. Subsequently, a 4-bit splitting (SP) method is proposed to reduce memory access overhead during table searching in GF(2^8). We analyze SP in depth and leverage the characteristics of 64-bit modern processor architectures and vector instruction sets to introduce data-level parallelism in the multi-layer loops. This involves amplifying data access granularity and implementing single instruction multiple data (SIMD) vectorization. Based on the open-source Intel storage acceleration library (ISA-L), all optimization methods are implemented and tested on the Sunway processor and the x86 processor. The experimental results show the effectiveness of the proposed optimizations in improving EC performance across different data scalability scenarios. Compared with the original ISA-L, our optimizations exhibit an average performance speedup of 3.28x on the Sunway processor and 2.36x on the x86 processor.
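The split-table idea can be stated in a few lines of Python: a GF(2^8) product coeff*x is split into the products of coeff with the low and high nibbles of x, which are XORed together, so only two 16-entry lookups are needed per byte. The reduction polynomial 0x11d below is a common Reed-Solomon choice and is an assumption of this sketch; SIMD kernels such as ISA-L's perform the same pair of lookups with vector shuffle instructions.

```python
# Hedged sketch of 4-bit split-table GF(2^8) multiplication.
def gf_mul(a: int, b: int, poly: int = 0x11d) -> int:
    """Carry-less multiply in GF(2^8), reducing by `poly` (Russian-peasant method)."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:        # overflow past 8 bits: reduce modulo the field polynomial
            a ^= poly
        b >>= 1
    return r

def split_tables(coeff: int):
    """For one erasure-code coefficient, precompute two 16-entry tables so that
    coeff*x = lo[x & 0xF] ^ hi[x >> 4]."""
    lo = [gf_mul(coeff, x) for x in range(16)]
    hi = [gf_mul(coeff, x << 4) for x in range(16)]
    return lo, hi

def mul_with_tables(lo, hi, x: int) -> int:
    return lo[x & 0x0F] ^ hi[x >> 4]

coeff = 0x57                                   # hypothetical coding coefficient
lo, hi = split_tables(coeff)
assert all(mul_with_tables(lo, hi, x) == gf_mul(coeff, x) for x in range(256))
print("split-table lookup matches direct GF(2^8) multiplication")
```

The split works because multiplication distributes over XOR (the field's addition), so coeff*x = coeff*(x_low) XOR coeff*(x_high << 4).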

Abstract:

Enumerating minimal unsatisfiable subsets (MUSes) is an important issue in theoretical computer science. Given an unsatisfiable Boolean formula, the number of its MUSes is exponentially related to the formula's scale. Contemporary methods aim to identify as many MUSes as possible within appropriate time limits. When dealing with the huge search space, choosing a suitable node to expand can markedly reduce the time spent on shrink and grow operations, so that the algorithm achieves better performance. This paper introduces an incremental information interaction-based MUS solving method, denoted MARCO-MSS4MUS, which utilizes the duality and complementary relationships among MUSes, minimal correction sets (MCSes), and maximal satisfiable subsets (MSSes). Based on the framework of the MARCO algorithm, the proposed method selects a more suitable node to expand via the intersection and union information of previously identified MSSes during the search, i.e., the incremental MSS information is employed as a heuristic for node selection to accelerate the enumeration of MUSes. This process also helps identify more MSSes; in turn, the incremental MSS information helps select a better node for the next exploration, thereby achieving an interaction of incremental information. The paper presents two theorems and two corollaries regarding this interactive incremental information and theoretically analyzes the feasibility of the MARCO-MSS4MUS algorithm. Experiments on standard MUS benchmark instances show the superiority of the proposed algorithm over state-of-the-art methods: both the enumeration efficiency and the number of instances on which the proposed method wins are significantly improved compared with existing methods.

Abstract:

Multi-view subspace clustering aims to exploit the rich information across views to guide the clustering process. The key lies in effectively learning the unified representation and the subspace representation between views. Recently, deep clustering methods have achieved promising results owing to the powerful representation capability of neural networks. However, the multi-source heterogeneity inherent in multi-view data forces existing methods to encode each view independently with a unimodal encoder, which increases the number of model parameters and limits the model's generalization capability. Besides, low-rank subspace representations have been shown to improve clustering performance, while traditional nuclear norm regularization does not distinguish between different singular values, leading to biased estimation. To tackle these two problems, we propose a novel multi-view unified representation learning network (MURLN) for subspace clustering. Specifically, MURLN first uses a Transformer as the encoder architecture, which projects different views into a low-dimensional feature space with the same mapping rule by sharing parameters. In addition, a weighted fusion strategy for intra-view samples is adopted to learn a unified representation rationally. Finally, the weighted Schatten p-norm is introduced as the low-rank constraint on the subspace representation matrix. Extensive experiments on seven multi-view datasets verify the effectiveness and superiority of the proposed method.
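For reference, the weighted Schatten p-norm penalty has the form sum_i w_i * sigma_i(Z)^p over the singular values of the subspace representation matrix. The sketch below evaluates it with an illustrative inverse-magnitude weighting, so large singular values are penalized less than under the plain nuclear norm; the weighting scheme used in MURLN may differ.

```python
import numpy as np

# Weighted Schatten p-norm (to the power p): sum_i w_i * sigma_i^p.
# The inverse-magnitude weights here are an illustrative choice.
def weighted_schatten_p(Z, p=0.5, eps=1e-6):
    s = np.linalg.svd(Z, compute_uv=False)
    w = 1.0 / (s + eps)              # larger singular values -> smaller penalty weight
    return float(np.sum(w * s ** p))

Z = np.random.default_rng(0).normal(size=(20, 20))
low_rank = Z[:, :3] @ Z[:3, :]       # a rank-3 matrix scores much lower
print(weighted_schatten_p(Z), weighted_schatten_p(low_rank))
```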

Abstract:

Network packet processing is a fundamental function of network devices, involving tasks such as packet modification, checksum and hash computation, mirroring, filtering, and packet metering. As domain-specific processors, network processors (NP) provide line-rate performance and programmability for packet processing. However, owing to different design requirements, NP architectures differ, including single-phase and multi-phase NPs, which poses challenges for NP designers. Existing simulation methods mainly target a single NP or a single architecture and cannot explore both. This paper proposes Neptune, an analysis framework for generic network processor microarchitecture modeling and performance simulation. Based on detailed analysis, Neptune adopts the multi-phase NP architecture as its hardware model while retaining the ability to simulate single-phase architectures. Besides, Neptune employs an event-list mechanism and inter-core queues to support the simulation of different data paths and various scheduling strategies in multi-phase NPs. Furthermore, Neptune utilizes the bulk synchronous parallel graph computing mechanism and combines the advantages of event-driven and time-driven simulation, ensuring both accuracy and efficiency. Our experiments show that Neptune achieves over 95% accuracy in simulating both architectures and simulates network processors at 3.31 MIPS, an order of magnitude improvement over PFPSim. We illustrate the universality and capability of the Neptune framework through three cases. First, we evaluate multi-phase and single-phase NPs, showing that the single-phase NP can achieve up to a 1.167x performance improvement. Second, we optimize the packet parsing module using a programmable pipeline and analyze its performance differences. Finally, we use Neptune to test the performance of the packet processing engine under different thread counts, providing insights for software and hardware multi-threading optimization.

Abstract:

The Chiplet-integrated chip based on advanced packaging technology offers a number of advantages in terms of manufacturing cost, design efficiency, and customization, and represents a new and effective way of sustaining chip performance growth in the post-Moore era. As an important method for the quantitative analysis of architectural design, design space exploration (DSE) can assist designers in understanding and evaluating the intricate interrelationships between design parameters. However, applying traditional DSE methods directly to Chiplet design gives rise to issues such as incomplete evaluation, inaccurate simulation, and low efficiency. To solve these problems, we present FireLink, an evaluation framework for Chiplet design space exploration. FireLink supports the modelling and simulation of Chiplet microarchitectures and interconnection networks, and can efficiently evaluate performance, power, area, and cost metrics. Furthermore, experiments were conducted with the Iterative Dichotomiser 3 (ID3) machine learning algorithm in this framework, which is demonstrated to effectively improve the efficiency of DSE. Compared with existing DSE methodologies, FireLink exhibits notable advantages in comprehensiveness of evaluation, completeness of modeling, and efficiency of DSE, so designers can explore a wider design space in a shorter time and select a better Chiplet design scheme.

Abstract:

Transient execution attacks (TEAs) exploit processor optimizations to bypass security checks and exfiltrate sensitive information through covert channels. Among them, the Meltdown and Spectre attacks have become prominent, affecting mainstream commercial processors from Intel, ARM, and AMD. Despite the defensive measures implemented by processor manufacturers, variants of these attacks continue to be discovered and disclosed by researchers. To improve the understanding of TEAs and deploy robust defenses, this paper comprehensively analyzes TEAs under various covert channels. Initially, the common characteristics of TEAs are extracted, and a novel model for TEAs is systematically constructed. Subsequently, we summarize the various types of covert channels involved in existing research, classify TEAs into three types: Meltdown-type attacks driven by out-of-order execution (OoOE), Spectre-type attacks driven by branch misprediction, and microarchitectural data sampling (MDS) type attacks driven by data misprediction, and delineate the key aspects and relationships of each type of attack. Notably, this paper systematically compiles and categorizes MDS-type attacks for the first time. Then, the capabilities of each attack variant are meticulously analyzed and evaluated from three dimensions: covert channel, applicable attack scenarios, and microarchitecture immunity status, which aids security researchers in developing new, more destructive attack types based on the deficiencies of existing attack research. Finally, combined with the above comprehensive and in-depth analysis of processor microarchitectures and covert channels, this paper anticipates the future trajectory of TEA research, hoping to provide strong support for subsequent work.

Abstract:

The LoongArch ISA (instruction set architecture) introduces new memory access instructions with bound checking to reduce the overhead of memory security checks. However, as a new type of memory access instruction, they are not yet supported by the existing GCC (GNU compiler collection) toolchain, so LoongArch-based hardware remains underutilized. Therefore, in this paper, we extend the GCC compiler with the LoongArch bound-checking memory access instructions to optimize memory security checks. Specifically, our work is divided into three parts: 1) designing built-in functions for the memory access instructions; 2) improving the RTL (register transfer language) optimizer of GCC to recognize two semantic patterns of bound-checking memory accesses, namely non-exception handling and exception handling; 3) implementing a new exception signal SIGBCE in the Linux kernel for the bound check exception (BCE) raised by the CPU, and implementing the corresponding signal handling function in glibc (GNU C library) to deal with the bound check exception. Experiments on GCC 12.2.0 and a Loongson 3C5000L server show that the revised compiler correctly employs the new memory access instructions and brings an acceleration of approximately 20% in some security routines. Our work improves the ecosystem of LoongArch and boosts the development of the LoongArch ISA. It also provides a reference for GCC optimization of other specialized instructions.

Abstract:

Ensuring deadlock-free data transmission in the network-on-chip (NoC) is a prerequisite for providing reliable communication services for a multi-processor system-on-chip (MPSoC), and it directly determines the availability of the NoC and even the MPSoC. Existing general-purpose deadlock-free strategies are oriented to arbitrary topologies, making it difficult to exploit the features and advantages of a specific topology; moreover, they may even increase network latency, power consumption, and hardware complexity. In addition, owing to the significant differences between routing-level and protocol-level deadlocks in regular networks, existing solutions struggle to address both types of deadlock simultaneously, affecting MPSoC reliability. This paper proposes a deadlock-free strategy with synchronous Hamiltonian rings based on the inherent Hamiltonian characteristics of the triplet-based many-core architecture (TriBA). The method uses the topology's symmetric axes and Hamiltonian edges to allocate independent store-and-forward buffers for data transmission, preventing protocol-level deadlocks and improving data transfer speed. Additionally, we design a direction determination method for data transmission within the same buffer using cyclic linked-list technology, which ensures data independence and synchronous forward transmission, eliminates routing-level deadlocks, and reduces data transfer latency. Building on the optimization of redundant calculations in look-ahead routing algorithms, we propose a deadlock-free routing mechanism called Hamiltonian shortest path routing (HamSPR) based on the synchronous Hamiltonian ring. GEM5 simulation results show that, compared with existing solutions on TriBA, HamSPR reduces average packet latency and power consumption under synthetic traffic patterns by 18.78%~65.40% and 6.94%~34.15%, respectively, while improving throughput by 8.00%~59.17%. On the PARSEC benchmark, HamSPR achieves maximum reductions of 16.51% in application runtime and 42.75% in average packet latency. Moreover, compared with the 2D-Mesh, TriBA demonstrates an application performance improvement of 1%~10% on the PARSEC benchmark.

Abstract:

Constructing a software and hardware system-level prototype platform for accelerating data center services requires the consideration of factors such as high computing power, scalability, flexibility, and low cost. To enhance data center capabilities, research from the perspective of software-hardware synergy has been conducted on the innovation of heterogeneous computing in cloud platform architecture, hardware implementation, high-speed interconnection, and applications. A reconfigurable and combinable software-hardware acceleration prototype system is designed and built to simplify existing processor-centric system-level computing platform construction methods, enabling rapid deployment and system-level prototype validation of target software-hardware designs. To achieve these objectives, methods such as decoupled reconfigurable architecture device virtualization and remote mapping are utilized to uncover the potential of independent computing units. An ISOF (independent system of FPGA) software-hardware computing platform system is constructed to surpass the capabilities of conventional server designs, enabling low-cost and efficient expansion of computing units while allowing clients to flexibly utilize peripheral resources. To address system-level communication challenges, a communication hardware platform and interaction mechanism between computing units are designed. Additionally, to enhance the agility of the software-hardware system-level platform, ISOF provides a flexible and unified invocation interface. Finally, through the analysis and evaluation of the system-level objectives of the platform, it has been verified that the platform meets the current computing and acceleration requirements, ensuring high-speed, low-latency communication, as well as good throughput and efficient elastic scalability. In addition, improvements have been made in congestion avoidance and packet recovery mechanisms based on high-speed communication, meeting the stability requirements of communication at data center scale.

Abstract:

With the advancement of modern computer technology, the memory wall problem is becoming increasingly severe. Against this background, the last-level cache in the multi-level memory hierarchy has become a key resource affecting system performance. In recent years, various studies have optimized the last-level cache by means of size expansion and dynamic resource management. The way-partitioning technique is the main method of cache resource management; it optimizes system performance by partitioning the cache into ways and allocating them to each application. However, it is coarse-grained and requires all cache sets to follow the same partitioning strategy. In fact, applications may have different space demands on different sets, so the way-partitioning technique restricts cache space utilization and wastes cache resources. In this paper, we propose an on-demand fine-grained cache resource management technique, GroupUCP, whose design idea is to aggregate individual cache sets into groups based on each application's space demand on each set, using dynamic grouping and real-time evaluation. Each group can be allocated space on demand independently, thus improving cache utilization and overall system performance. Experiments demonstrate that GroupUCP achieves finer-grained on-demand resource allocation with fewer hardware resources than the traditional UCP approach and delivers higher system performance improvement on cache-sensitive application combinations that exhibit imbalanced cache space demands.

Abstract:

Aspect sentiment triplet extraction (ASTE) is a challenging subtask within aspect-based sentiment analysis. It aims to extract triplets consisting of aspect terms, opinion terms, and sentiment polarities from texts. Recently, generative extraction techniques have demonstrated remarkable efficacy by sequentially concatenating the target triplets, thereby enabling the autoregressive generation of triplets. However, this concatenation method may introduce sequential dependencies among unrelated triplets and accumulate errors during decoding. To address this issue, we propose a term-prompted and dual-path text generation (TePDuP) method. The method first utilizes machine reading comprehension (MRC) to extract aspect and opinion terms in parallel, and then uses them as prompt prefixes to guide conditional triplet generation, forming a dual-path text generation framework. Meanwhile, during the training phase, we incorporate scheduled sampling as a corrective measure to mitigate the bias stemming from MRC extraction. Furthermore, to enhance performance even further, we merge the outcomes guided by aspect and opinion terms according to their generation probabilities, thereby augmenting the robustness of the model. Experimental results on the ASTE-DATA-V2 dataset show that the proposed method is effective and significantly outperforms other baseline models, and case studies demonstrate that the method alleviates the aforementioned problem to some extent.

Abstract:

In traditional question-answering tasks, models generally require extensive data for training, which entails considerable time and manpower costs for data annotation. Unsupervised question generation represents an effective solution to address the scarcity of training data in question-answering tasks. However, the questions generated using this approach currently suffer from issues such as being difficult to answer, lacking variety, and having unclear semantics. To address these issues, this paper proposes an adaptive multi-module pipeline model named ADVICE, with modules improving existing methods in answerability, question diversity and grammatical correctness. Within the question answerability module, the paper employs coreference resolution and named entity recognition techniques to improve the answerability of questions. For question diversity, the paper designs specific rules for various question types to enhance the diversity of question and answer types. In the grammatical correctness module, a grammar error correction model targeted at questions is trained based on T5 model, and a filtering module is designed to refine the generated question-answer data. Finally, a classifier is trained to automatically select the necessary modules. Experiments demonstrate that the improved question generation method enhances the performance of downstream question-answering models on the SQuAD dataset, with the EM (exact match) score increasing by an average of 2.9% and the F1 score by an average of 4.4%.

Abstract:

WiFi-based respiratory monitoring has become a hot spot in the sensing layer of the IoT, benefiting from its non-contact nature, low cost, and strong privacy protection. However, current WiFi-based respiratory monitoring methods rely on sensitive channel state information (CSI) samples, which require a single monitoring target to remain static, with no moving non-target person nearby and the target close to the WiFi transceiver. These requirements limit the large-scale application of WiFi-based respiratory monitoring. Therefore, we propose a respiratory monitoring range extension method named FDRadio, which is able to work under dynamic interference. In FDRadio, we improve the accuracy and robustness of respiratory monitoring from three aspects: separating dynamic interference sources, eliminating ambient noise, and enhancing the power of the dynamic reflected signal. Specifically, we first expand the channel bandwidth by combining multiple WiFi channels to improve the spatial resolution of WiFi sensing, and employ a wired direct channel to remove the accumulated hardware noise caused by channel combining. Second, we analyze the relationship between monitoring range and ambient noise, and adopt time-diversity techniques to design a two-stage ambient noise reduction process for FDRadio. In addition, we design a novel weight allocation algorithm that maximizes the power of the dynamic reflected signal and enhances the ability to sense the weak chest movement caused by breathing. Finally, the processed CSI samples are converted into a power delay profile (PDP) in the time domain, so that the respiratory signal of the target person can be directly extracted using the distance difference. We have implemented FDRadio on a commercial embedded device and conducted a series of experiments. The experimental results show that the detection error is less than 0.5 bpm within a 7 m monitoring range, even when multiple moving non-target persons are present.
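As a small illustration of the last step, the sketch below converts frequency-domain CSI across (bonded) subcarriers into a power delay profile with an inverse FFT, where the tap spacing is the reciprocal of the combined bandwidth; the subcarrier layout and bandwidth are illustrative, not FDRadio's exact configuration.

```python
import numpy as np

# CSI (frequency domain) -> power delay profile (time domain) via inverse FFT.
# Subcarrier count and bandwidth below are illustrative assumptions.
def csi_to_pdp(csi, bandwidth_hz):
    """csi: complex array over subcarriers. Returns (delays in s, power per tap)."""
    taps = np.fft.ifft(csi)
    pdp = np.abs(taps) ** 2
    delays = np.arange(len(csi)) / bandwidth_hz       # tap spacing = 1 / bandwidth
    return delays, pdp

n_sub, bw = 256, 160e6                                # e.g. several bonded WiFi channels
delay_true = 50e-9                                    # a single 50 ns path
freqs = np.arange(n_sub) * (bw / n_sub)
csi = np.exp(-2j * np.pi * freqs * delay_true)        # ideal single-path CSI
delays, pdp = csi_to_pdp(csi, bw)
print(delays[np.argmax(pdp)])                         # ≈ 5e-08 s
```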

Abstract:

With the continuous development of cloud computing technology, reversible data hiding in encrypted images (RDHEI) has received increasing attention. However, most RDHEI methods are designed for grayscale images, which greatly limits their application scenarios compared with color images. Moreover, since current reversible data hiding methods in the encrypted domain mainly focus on grayscale images and are rarely optimized for the characteristics of color images, applying them directly to color images hardly yields good performance, so it is of great value to further investigate reversible data hiding algorithms for color encrypted images. In this paper, we propose, for the first time, a high-performance RDHEI algorithm for color images based on color channel correlation and entropy encoding (RDHEI-CE) for cloud computing. First, the RGB channels of the color image are separated and their prediction errors are derived separately. Next, the embedding space is generated by adaptive entropy encoding and the prediction-error histogram. The correlation between color channels is then used to further expand the embedding space and embed the secret message in the encrypted image. Finally, the marked encrypted image is scrambled in order to resist ciphertext-only attacks. Experimental results show that, compared with most state-of-the-art RDHEI methods, RDHEI-CE provides a higher embedding rate and better security and broadens the application scenarios of reversible data hiding in the cloud.

Vol.62 No.2 2025    Date of publication: 2025-02-13
Artificial Intelligence
Abstract:

The recommender system plays a significant role in alleviating information overload, allowing users to conveniently obtain products and services on application platforms such as Tmall, TikTok, and Xiaohongshu. However, most recommender systems are centered on accuracy, which leads to adverse effects such as narrowing users' horizons, fewer display opportunities for some merchants, a monotonous content ecosystem on the platform, and an unbalanced allocation of resources and information, for example the filter bubble and the Matthew effect. As a result, strengthening the diversity of recommender systems has become a key research direction for fulfilling the increasingly diversified demands of people's lives. In recent years, research on diversified recommendation has developed rapidly, but it still lacks systematic organization and summarization. This paper systematically reviews the issue of diversified recommendation within recommender systems. First, we present the problem definition, technical framework, classification, and application scenarios of diversified recommendation. Second, we compare and analyze models and algorithms from four perspectives. Subsequently, we summarize the commonly used datasets and metrics for diversified recommendation. Finally, we discuss the problems and challenges in this field to inspire future innovation and promote development.

Abstract:

Knowledge base question answering aims to retrieve relevant information from a knowledge base for model inference and to return accurate answers. In recent years, with the development of deep learning and large language models, knowledge base question answering based on information retrieval has become a research focus, and many novel methods have emerged. We summarize and analyze information-retrieval-based knowledge base question answering methods from different aspects such as model methods and datasets. First, we introduce the research significance and related definitions of knowledge base question answering. Then, following the model processing stages, we explain the key problems and typical solutions of each of the four stages: question parsing, information retrieval, model inference, and answer generation, and summarize the common network modules used in each stage. We then analyze and sort out the interpretability limitations of information-retrieval-based knowledge base question answering. In addition, relevant datasets with different characteristics and baseline models at different stages are classified and summarized. Finally, a summary and outlook are provided for each stage of information-retrieval-based knowledge base question answering, as well as for the overall development direction of the field.

Abstract:

Currently, deep learning has achieved significant success in the field of synthetic speech detection. However, deep models commonly attain high accuracy on test sets that closely match their training distribution but exhibit a substantial drop in accuracy in cross-dataset scenarios. To enhance the generalization capability of models on new datasets, they are often fine-tuned with new data, but this leads to catastrophic forgetting, where the model’s knowledge learned from old data is impaired, resulting in deteriorated performance on the old data. Continuous learning is a prevalent approach to mitigate catastrophic forgetting. In this paper, we propose a continuous learning algorithm called elastic orthogonal weight modification (EOWM) to address catastrophic forgetting for synthetic speech detection. EOWM mitigates knowledge degradation by adjusting the direction and magnitude of parameter updates when the model learns new knowledge. Specifically, it enforces the updates’ direction to be orthogonal to the data distribution of the old tasks while constraining the magnitude of updates for important parameters in the old tasks. Our proposed algorithm demonstrates promising results in cross-dataset experiments within the domain of synthetic speech detection. Compared with fine-tuning, EOWM reduces the equal error rate (EER) on the old dataset from 7.334% to 0.821%, representing a relative improvement of 90%, and on the new dataset, it decreases EER from 0.513% to 0.315%, corresponding to a relative improvement of 40%.
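The sketch below shows an OWM-style projected update with an importance-based magnitude constraint for a single linear layer, which conveys the "orthogonal direction, constrained magnitude" idea behind the approach; it is only a simplified stand-in, not the exact EOWM update rule defined in the paper.

```python
import numpy as np

# OWM-style projection: keep new-task updates orthogonal to old-task inputs,
# then damp the updates of parameters that were important for the old task.
# Importance values and the damping form are illustrative assumptions.
class OrthogonalProjector:
    def __init__(self, dim, alpha=1e-3):
        self.P = np.eye(dim) / alpha  # projector onto the space orthogonal to old inputs

    def observe(self, x):
        """Fold one old-task input x (shape: dim,) into the projector (RLS-style)."""
        Px = self.P @ x
        self.P -= np.outer(Px, Px) / (1.0 + x @ Px)

    def project(self, grad):
        """Project a weight gradient (out_dim, in_dim) away from old-task inputs."""
        return grad @ self.P

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 8))
proj = OrthogonalProjector(dim=8)
for _ in range(32):                      # "old task" inputs
    proj.observe(rng.normal(size=8))
grad = rng.normal(size=(4, 8))           # "new task" gradient
importance = rng.uniform(size=(4, 8))    # e.g. Fisher-style importance of old-task weights
step = proj.project(grad) / (1.0 + importance)   # orthogonal direction, constrained magnitude
W -= 0.1 * step
```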

Abstract:

Traffic data missing is one of the unavoidable problems in intelligent transportation systems. Completing and quantifying the uncertainty of missing values can improve the performance and reliability of traffic data mining tasks in intelligent transportation systems. However, most existing traffic data imputation models mainly focus on point estimation without quantifying the uncertainty, so they cannot meet the need for traffic data reliability in the transportation field. Besides, these methods only focus on modeling spatial-temporal correlation of traffic data, failing to consider the impact of missing values on spatial-temporal correlation. In addition, the uncertainty of traffic data is affected by time, spatial location, and the state of the data, but existing methods cannot comprehensively consider these factors. To address these challenges, we propose a spatial-temporal uncertainty guided traffic data imputation network (STUIN), which simultaneously realizes the imputation of spatial-temporal traffic data and the uncertainty quantification of the imputation results by self-supervised training. Specifically, we innovatively model the hidden states of the neural network as random variables subject to Gaussian distributions, use the variances of Gaussian distributions to model the uncertainty of the hidden states, and introduce a variance-based attention mechanism to characterize the effect of uncertainty on modeling spatio-temporal correlations. In addition, we design a novel spatial-temporal uncertainty initialization module, which incorporates the influence of time, space and missing values when initializing the means and variances of the Gaussian distributions. Experiments on two traffic flow datasets show that STUIN achieves state-of-the-art performance on both the data imputation and uncertainty quantification tasks.
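As a toy illustration of variance-guided attention, the sketch below down-weights neighbors whose Gaussian hidden states have large average variance when aggregating features; the scoring function is an assumption for illustration, not STUIN's exact attention module.

```python
import numpy as np

# Variance-guided attention over Gaussian hidden states (mean, variance):
# more uncertain neighbors receive smaller weights. Illustrative only.
def variance_guided_attention(means, variances, query_idx, tau=1.0):
    """means, variances: (N, d). Returns the aggregated vector for node `query_idx`."""
    q = means[query_idx]                               # (d,)
    scores = means @ q / np.sqrt(means.shape[1])       # dot-product similarity
    penalty = variances.mean(axis=1)                   # average uncertainty per node
    logits = (scores - penalty) / tau                  # uncertain nodes are down-weighted
    weights = np.exp(logits - logits.max())
    weights /= weights.sum()
    return weights @ means                             # (d,)

means = np.random.default_rng(1).normal(size=(5, 8))
variances = np.abs(np.random.default_rng(2).normal(size=(5, 8)))
print(variance_guided_attention(means, variances, query_idx=0))
```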

Abstract:

Knowledge tracing is a pivotal technique for modeling students' knowledge level, and it typically relies on past learning interactions to predict future performance on exercises. These interactions represent a student's process of answering a sequence of questions. Current knowledge tracing methods ignore the number of times a skill has been practiced when modeling students' forgetting behaviors, and few models consider the relations between skills and their influence on performance prediction. To address these issues, we propose a deep knowledge tracing model that integrates skill relations and forgetting degree. First, a relation matrix is constructed using statistical methods to capture the relations between skills. Second, the time intervals between interactions and the number of times a student has practiced the same skill are used to compute a forgetting degree for each skill, to better model students' forgetting behaviors. Finally, skill relations and forgetting degrees are integrated into an attention module to obtain the influence of each past interaction on future performance prediction. Based on the new attention weights, students' performance on future exercises and their knowledge level can be predicted. Experiments on two real-world online education datasets, algebra2005-2006 and ASSISTment2012, demonstrate that the proposed model achieves better prediction results than existing mainstream methods.
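A minimal illustration of how a time gap and a practice count can be folded into a forgetting degree is given below, using an Ebbinghaus-style exponential decay slowed by practice; the exact formulation in the proposed model may differ.

```python
import math

# Toy forgetting degree: larger time gaps increase forgetting; more practice
# strengthens memory and slows the decay. Decay constant is illustrative.
def forgetting_degree(delta_t_hours, practice_count, decay=0.1):
    strength = 1.0 + math.log1p(practice_count)      # memory strength grows with practice
    return 1.0 - math.exp(-decay * delta_t_hours / strength)

for dt, n in [(1, 0), (24, 0), (24, 5), (168, 5)]:
    print(f"gap={dt:>4}h practices={n}: forgetting={forgetting_degree(dt, n):.3f}")
```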

Abstract:

Multimodal sentiment analysis is a multimodal task that uses subjective information from multiple modalities to analyze sentiment. In some scenarios, the sentiment expressed in different modalities is inconsistent or even contradictory, which weakens the effect of multimodal collaborative decision-making. In this paper, a multimodal learning method is proposed to learn modal feature representations with consistent sentiment semantics. In order to improve the common feature representation of different modalities and learn the dynamic interaction between modalities without affecting the original information, we first learn the common feature representation of each modality, and then use cross attention to enable one modality to effectively obtain auxiliary information from the common feature representations of the other modalities. For multimodal fusion, we propose a multimodal attention mechanism that weights and concatenates the modal feature representations, so as to increase the contribution of informative modalities and suppress the influence of weak modalities. The experimental results of the proposed method on the sentiment analysis datasets MOSI, MOSEI, and CH-SIMS are better than those of the compared models, indicating the necessity and rationality of considering sentiment semantic inconsistency in multimodal sentiment analysis.

Abstract:

With the rapid development of global informatization, data mining and knowledge discovery in high-dimensional data have become a hotspot in artificial intelligence and data science. However, the sparse samples and redundant features of high-dimensional data make it challenging to ensure the generalization and interpretability of traditional statistical models and machine learning methods. Hence, we present a robust fuzzy concept-cognitive learning approach for the imbalance between high-dimensional data and weak knowledge evolution ability. The main idea is to explore the knowledge structure and cognitive learning mechanism of high-dimensional data from the concept perspective. We propose a high-dimensional data classification method based on the concept-cognitive learning mechanism in the fuzzy formal context. Furthermore, the cognitive learning process of fuzzy concepts is described from two different perspectives by the positive and negative cognitive learning operators of fuzzy three-way concepts. Finally, the fusion of fuzzy three-way concepts completes the tasks of concept identification and data classification. Extensive experiments on 12 real data sets, compared with 12 state-of-the-art classification methods, verify the robustness and effectiveness of the proposed method. The proposed framework provides a convenient novel tool for research on high-dimensional data knowledge discovery and fuzzy concept-cognitive learning.

Abstract:

In the subway scene, small pedestrian targets contain little feature information due to their low resolution, and it remains challenging for object detectors to detect such objects. The SSD detection algorithm uses the multi-scale detection heads of a pyramid network, which can improve pedestrian detection performance to a certain extent, but it still has limitations for small pedestrian detection in complex environments such as subways. In view of the above problems, we propose an improved SSD algorithm to enhance the detection of small pedestrian targets in subway scenes; we construct a dataset of pedestrian targets in subway scenes, annotate the corresponding labels, and perform data preprocessing. In this study, a pyramid feature enhancement module is added to the feature extraction network, and multi-branch residual units, sub-pixel convolution, and a feature pyramid are combined to obtain multi-scale, multi-receptive-field fusion features. We use a context information fusion module to fuse the low-level features of the image with context features to generate an extended feature layer for detecting small pedestrian targets, and design an anchor-free dynamic positive and negative sample allocation strategy to generate optimal positive samples for small pedestrian targets. The experimental results show that the proposed improved SSD algorithm effectively improves small pedestrian detection performance in subway scenes, and the improvement is more obvious for small pedestrian targets with severe occlusion.

Abstract:

Model-based diagnosis (MBD) finds a growing number of uses in different settings, including software fault localization, debugging of spreadsheets, Web services, and hardware designs, as well as the analysis of biological systems, among many others. Motivated by these uses, significant improvements have been made to MBD algorithms in recent years. Nevertheless, the analysis of larger and more complex systems motivates further improvements to existing approaches. Since computing diagnoses is computationally challenging, several MBD algorithms based on model compaction have been presented, such as the dominated-based compacted model with multiple observations (D-CMMO) approach. In this paper, we propose a new diagnosis model, the cardinality-constrained compacted (CCM) model, to address the problem that a considerable amount of time is needed when multiple observations are given and more than one fault is injected. CCM uses two methods to optimize the MBD solving process. First, we propose to utilize the relationship between faulty system outputs and faulty components to limit the scope of the target solution. Second, the performance of the MaxSAT (maximum satisfiability) solver is effectively improved by enqueueing all assumptions at once. Furthermore, experimental evaluations on the ISCAS85 and ITC99 benchmarks show that, compared with D-CMMO, the latest encoding algorithm for MBD, the above two optimization methods effectively reduce the scope of the MBD problem and the difficulty of searching for the target solution with the MaxSAT solver, and thus return diagnostic solutions in a shorter time. On average, the solving efficiency of the CCM method is improved by 64.5% and 92.8% over D-CMMO on the two benchmarks, respectively.

Abstract:

In recent years, the rapid urbanization and development of the social economy have led to a growing focus on public safety issues. Governments across the world are increasingly promoting the construction of smart cities and intelligent security systems to safeguard the lives and property of citizens and maintain social stability. Person re-identification (ReID) is an essential technology for building smart cities, with significant implications for security monitoring and criminal investigation applications. The goal of person re-identification is to accurately identify specific individuals captured under different cameras. However, due to intra-class differences resulting from various factors such as illumination, viewpoint, occlusion, and pose, person re-identification remains a challenging task in the field of computer vision. Although existing fully supervised person re-identification methods have made significant progress, the scarcity of data and labels poses a bottleneck for further improving model performance. To address this challenge, we introduce a more complex and diverse synthetic dataset with easy-to-obtain labels for auxiliary training, and propose a novel camera-aware asymmetric adversarial learning (CAAL) method that overcomes intra-class variation among multiple cameras and the domain-shift between real data and synthetic data, enabling the learning of camera-invariant feature representations from diverse data sources. Furthermore, to mitigate the impact of misleading information carried by synthetic datasets and prevent the model from overfitting to synthetic data during adversarial training, we propose using an auxiliary network trained on real-world data to constrain the training of the backbone network. Finally, we conduct extensive experiments on two public datasets to demonstrate the effectiveness of the proposed method.

Abstract:

Semi-supervised multi-label learning employs labeled and unlabeled data to train a model, which achieves good results while reducing the labeling cost of multi-label data; it has therefore attracted many researchers to this field. However, in the semi-supervised annotation process, because of the large number of labels, it is common that some labels have no annotated samples; these labels are called the open vocabulary. It is difficult for a model to learn the information of the open vocabulary, which degrades its performance. To address this problem, we propose a semi-supervised open vocabulary multi-label learning method based on graph prompting. Specifically, the method uses a prompt-based graph neural network to fine-tune the pre-trained model and explore the relationship between the open vocabulary and supervised samples. Using images and text, we construct a graph neural network that serves as the text input of the pre-trained model. Furthermore, by leveraging the generalization ability of the pre-trained model on the open vocabulary, pseudo-labels are generated for unsupervised samples. We then use the pseudo-labels to train the classification layer so that the model achieves better performance in classifying the open vocabulary. Experimental results on multiple benchmark datasets, including VOC, COCO, CUB, and NUS, consistently demonstrate that the proposed method outperforms existing methods and achieves state-of-the-art performance.

Network and Information Security
Abstract:

Price manipulation attacks manipulate the on-chain prices of decentralized finance (DeFi) projects by altering digital asset reserves, thereby attacking their liquidation mechanisms to obtain improper profits. Price manipulation attacks have become one of the most significant security threats to the current decentralized finance ecosystem. To defend against them, oracles obtain exchange prices from the real world, which are difficult to manipulate. However, the maintenance expense of an oracle is very high because of frequent on-chain data updates, making it difficult to meet industrial demand. To address these issues, we propose a defense mechanism against price manipulation attacks. The mechanism uses off-chain prices to guide the identification of on-chain price manipulation behaviors and intercepts price-manipulative transactions through a contract proxy. It reduces the frequency of price submissions and the cost of bringing off-chain data on-chain through low-frequency price feeding, trading off the cost of defense against the precision of identification. Experimental results show that our method reduces the overall maintenance cost by more than 30% while achieving a 97.5% success rate in defending against price manipulation attacks.
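The core check can be pictured as follows: simulate the effect of a pending swap on a constant-product pool and flag it if the post-trade price deviates too far from the low-frequency off-chain reference. The pool model and tolerance below are illustrative assumptions, not the contract-proxy implementation itself.

```python
# Toy deviation check against an off-chain reference price for a
# constant-product (x*y=k) pool. Tolerance and pool model are illustrative.
def implied_price(reserve_x, reserve_y):
    """Spot price of asset X in terms of Y for a constant-product pool."""
    return reserve_y / reserve_x

def is_manipulative(reserve_x, reserve_y, dx, offchain_price, tolerance=0.10):
    """Simulate swapping dx of X into the pool and check the post-trade deviation."""
    k = reserve_x * reserve_y
    new_x = reserve_x + dx
    new_y = k / new_x
    post_price = implied_price(new_x, new_y)
    deviation = abs(post_price - offchain_price) / offchain_price
    return deviation > tolerance

# A large swap that drags the on-chain price far from the reference is flagged.
print(is_manipulative(1_000.0, 2_000_000.0, dx=5.0, offchain_price=2_000.0))    # False
print(is_manipulative(1_000.0, 2_000_000.0, dx=400.0, offchain_price=2_000.0))  # True
```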

Abstract:

Multi-objective security games (MOSGs) aim to simultaneously optimize the defender's payoff against multiple heterogeneous attackers, which is of great importance in practical applications. Recently, the space discretization based evolutionary search (SDES) framework was proposed to transform the constrained high-dimensional step-function optimization problem in MOSG into a low-dimensional combinatorial optimization problem and to solve the combinatorial task with a greedy strategy. Although SDES can handle large-scale MOSG tasks in time, it has difficulty converging to the optimal Pareto front, especially in large-scale scenarios. On the one hand, the convergence assumption of the greedy strategy becomes difficult to satisfy as MOSG tasks scale up. On the other hand, SDES uses multiple stage components, including spatial discretization, evolutionary optimization, evaluation, and refinement, which poses a risk of stage coupling: the optimization quality of upstream components directly affects the performance of downstream components. To address these issues, we exploit the priority prior of the protected targets in MOSG tasks to improve solution quality and simplify the SDES framework, resulting in the SDES-P framework. SDES-P redesigns the evaluation component, the core component of SDES, and removes the refinement component. Specifically, SDES-P starts from the infeasible solution with the maximum resources. Then, it divides the protected targets into two groups based on the priority prior, and the higher-priority group gradually releases resources to find feasible solutions. Finally, SDES-P contains an evolutionary local search strategy combined with priority prior knowledge to enhance the quality of the final Pareto front. We show that SDES-P retains the low sample complexity and strong scalability of SDES, and the experimental results show that, regardless of whether the convergence assumption holds, SDES-P finds high-quality Pareto fronts with better convergence and diversity than SDES.

Abstract:

Federated learning with user-level local differential privacy (ULDP) has attracted considerable research attention in recent years. The federated data type, the mechanism for clipping local updates, the allocation of the privacy budget, and user dropout directly constrain the accuracy of the global model, and existing federated learning methods handle these problems poorly. To remedy these deficiencies, we employ ULDP to propose an efficient algorithm, called ULDP-FED, for global federated optimization. ULDP-FED can handle both IID and non-IID federated data. Compared with methods that use fixed clipping thresholds, ULDP-FED uses a threshold dynamic decay strategy to balance the noise error caused by the Gaussian mechanism against the bias caused by update clipping. To allocate each user's privacy budget carefully, in each round ULDP-FED relies on similarity to replace the current local update with a historical noisy update. If a suitable historical update is found, the user only sends its index to the server, which also reduces the communication cost. ULDP-FED is compared with existing methods on the MNIST and CIFAR-10 datasets. The experimental results show that our algorithm outperforms its competitors and achieves accurate federated learning results.
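A minimal sketch of clipping-plus-Gaussian aggregation with a decaying clipping threshold is shown below; the exponential decay schedule and noise scaling are illustrative assumptions rather than the exact ULDP-FED rules.

```python
import numpy as np

# Clip each user update to a round-dependent threshold, average, and add
# Gaussian noise scaled to the clipped sensitivity. Schedule is illustrative.
def dp_aggregate(updates, round_idx, c0=1.0, decay=0.05, noise_mult=1.0, rng=None):
    rng = rng or np.random.default_rng(0)
    c_t = c0 * np.exp(-decay * round_idx)          # clipping threshold decays over rounds
    clipped = []
    for u in updates:
        norm = np.linalg.norm(u)
        clipped.append(u * min(1.0, c_t / max(norm, 1e-12)))   # L2 clipping
    mean = np.mean(clipped, axis=0)
    sigma = noise_mult * c_t / len(updates)        # noise on the average of n clipped updates
    return mean + rng.normal(0.0, sigma, size=mean.shape)

updates = [np.random.default_rng(i).normal(size=10) for i in range(8)]
print(dp_aggregate(updates, round_idx=3))
```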

Abstract:

Recent economic advancements have significantly supported the popularity of indoor positioning systems (IPS) and indoor localization-based services (ILBS). This trend is particularly evident because global navigation satellite systems (GNSS) are ineffective in indoor environments. Traditional IPS, such as WiFi and Bluetooth positioning, face challenges such as low accuracy and are prone to non-line-of-sight (NLOS) and noise interference. In response, we propose a robust near-ultrasonic indoor localization method based on a stacking ensemble model. Initially, the method employs an optimized enhanced cross-correlation technique to effectively mitigate multipath interference in acoustic ranging; compared with conventional methods based on peak extraction or fixed thresholding, this approach significantly improves ranging accuracy in reverberant environments. Subsequently, the time difference of arrival (TDOA) is extracted as a feature. Finally, we utilize a stacking ensemble learning model, incorporating optimized machine learning models, trained on a pre-built dataset. Integrating the extracted features, the method achieves correct localization even under NLOS conditions and large ranging errors. Numerical simulations, ray-tracing acoustic analyses, and empirical validations suggest that our approach notably mitigates the errors prevalent in NLOS and acoustically noisy indoor environments, yielding localization accuracy that exceeds current methods by 50%−90%. The core dataset is available at https://github.com/ChirsJia/JSJYF.
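As a small illustration of the ranging front end, the sketch below estimates a TDOA by cross-correlating a reference near-ultrasonic sweep with a delayed copy; the enhanced cross-correlation used in the paper adds further processing against multipath that is not reproduced here.

```python
import numpy as np

# Basic TDOA estimate from the peak of the cross-correlation between two
# channels; test signal and sampling rate are illustrative.
def tdoa_by_xcorr(ref, other, fs):
    """Delay (seconds) of `other` relative to `ref`, positive if `other` arrives later."""
    corr = np.correlate(other, ref, mode="full")       # c[k] = sum_n other[n+k] * ref[n]
    lag = np.argmax(corr) - (len(ref) - 1)             # peak lag in samples
    return lag / fs

fs = 48_000
t = np.arange(0, 0.05, 1 / fs)
chirp = np.sin(2 * np.pi * (18_000 + 20_000 * t) * t)  # near-ultrasonic sweep as a test signal
delayed = np.roll(chirp, 24)                           # simulate a 24-sample later arrival
delayed[:24] = 0.0
print(tdoa_by_xcorr(chirp, delayed, fs) * 1e3, "ms")   # ≈ 0.5 ms
```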

High Performance Computing
Abstract:

Serverless computing provides developers with a cloud computing paradigm that, in the context of the popularity of container technology and micro-service frameworks, frees them from server operation and hardware resource management. At the same time, serverless computing can adapt to dynamic load changes in real time through elastic scaling, which effectively reduces request response delay and service cost and meets customers' demand for pay-as-you-go cloud services. However, serverless computing faces the cold start delay caused by elastic scaling. Creating warm function instances in advance can effectively reduce the frequency and delay of cold starts; nevertheless, traffic bursts in the cloud environment greatly increase the difficulty of predicting the number of warm function instances. To solve these challenges, a probability distribution based auto-scaling algorithm (PDBAA) is proposed. Using historical monitoring data to predict the probability distribution of future requests, the optimal number of warm function instances is calculated to minimize the request response delay. PDBAA can effectively incorporate the powerful prediction capability of deep learning techniques to further improve performance. Under the Knative framework, the performance of PDBAA is verified on the NASA and WSAL datasets. The simulation results show that, compared with the Knative auto-scaling algorithm and other prediction algorithms, PDBAA improves elastic performance by over 31% and reduces the average response time by over 16%, better handling traffic bursts and effectively reducing the response delay of serverless computing requests.
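A toy version of distribution-based pre-warming is sketched below: the number of warm instances is chosen so that a high quantile of the predicted request distribution is covered. The Poisson load model, quantile, and per-instance capacity are illustrative assumptions, not PDBAA's learned predictor.

```python
import math

# Pick the warm-instance count that covers a high quantile of a predicted
# Poisson-distributed request rate. Model and parameters are illustrative.
def warm_instances(predicted_mean_rps, per_instance_rps, quantile=0.95):
    """Smallest instance count whose capacity covers `quantile` of the load."""
    # Poisson quantile by accumulating the CDF (adequate for moderate means).
    k, cdf, p = 0, 0.0, math.exp(-predicted_mean_rps)
    while cdf + p < quantile:
        cdf += p
        k += 1
        p *= predicted_mean_rps / k
    demand_rps = k                                   # request rate at the chosen quantile
    return max(1, math.ceil(demand_rps / per_instance_rps))

print(warm_instances(predicted_mean_rps=120, per_instance_rps=20))   # e.g. 7 instances
```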

Abstract:

TRSM (triangular matrix equation solver) is a commonly used routine for solving systems of linear equations and is a core algorithm of various scientific computing libraries and mathematical software, widely used in scientific computing, engineering computing, and machine learning. The small-scale irregular TRSM variant targets the efficient handling of small, irregularly shaped inputs that limit the applicability of conventional implementations. With the development of personalization and refinement in high-performance computing, the demand for small-scale irregular TRSM computation in the scientific and industrial communities is becoming more and more obvious. Traditional algorithms are better suited to large-scale, regular TRSM computation, so there is still room for improvement in the computational efficiency of small-scale irregular TRSM. In this paper, we propose a small-scale irregular TRSM optimization scheme that combines the hardware architecture and application characteristics, designing a high-performance kernel from the perspectives of register blocking, boundary processing, and vectorized computation, and building an SI_TRSM (small-scale irregular TRSM) algorithm library covering double-precision real and double-precision complex numbers, which greatly improves performance. Experimental results show that the double-precision small-scale irregular TRSM library developed in this paper improves the average performance of the real-number routines by 29.4 times and the complex-number routines by 24.6 times compared with the corresponding algorithms in the MKL (Intel math kernel library).
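For orientation, the reference computation that such kernels optimize is plain forward substitution, solving L·X = B for a lower-triangular L; the register blocking, boundary handling, and vectorization described above are refinements of this loop and are not shown here.

```python
import numpy as np

# Reference TRSM: solve L @ X = B for lower-triangular L by forward substitution.
def trsm_lower(L, B):
    n, m = B.shape
    X = np.array(B, dtype=float, copy=True)
    for i in range(n):
        X[i, :] -= L[i, :i] @ X[:i, :]      # subtract contributions of already-solved rows
        X[i, :] /= L[i, i]
    return X

rng = np.random.default_rng(0)
L = np.tril(rng.normal(size=(6, 6))) + 6 * np.eye(6)   # well-conditioned lower-triangular
B = rng.normal(size=(6, 3))
assert np.allclose(L @ trsm_lower(L, B), B)
```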

Abstract:

Convolutional neural networks (CNN) have become one of the most important machine learning technologies in the field of image recognition. In recent years, with the increasing demand for CNN deployment at the mobile edge, making CNNs lightweight has become a research hotspot. The mainstream CNN lightweight methods are pruning and quantization, both of which can effectively reduce the computation and storage overhead of CNN inference. However, none of these methods fully exploits the bilateral sparsity (weight sparsity and activation sparsity) and the potential data reuse in CNNs. To solve these problems, we propose a new neural network lightweight method: the k-means algorithm is used to cluster the non-zero values of the convolution kernels and feature maps, and CNN inference then uses only a limited number of cluster values as multipliers to complete all convolutional calculations. Compared with the O(n³) computational complexity of the original convolutional layer, the computational complexity of the lightweight convolutional layer is O(n²), effectively reducing the amount of computation. Similarly, the non-zero weights of the fully connected layer are also clustered, and only the cluster values and the corresponding index vectors are stored on chip, which significantly reduces the storage overhead. Finally, a customized architecture, KCNN, is designed for this lightweight method. The architecture modularizes the different processes of the CNN and, compared with previous accelerators, adds a non-zero clustering module; in addition, several caches are added to exploit the data reuse in the clustered CNN. The experimental results show that, without losing inference accuracy, the overall computation of AlexNet is reduced by 66% and the storage expense is reduced by 85%.
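The clustering step can be pictured with the small sketch below: plain k-means over the non-zero weights of a pruned layer leaves only k distinct multipliers plus an index map, which is the property the KCNN architecture exploits; the layer shape and k are illustrative.

```python
import numpy as np

# Cluster the non-zero weights of a (pretend-pruned) layer with 1-D k-means,
# keeping only a k-entry codebook and an index per non-zero weight.
def cluster_nonzero_weights(weights, k=8, iters=20, rng=None):
    """Returns (codebook, index per non-zero entry, non-zero mask over the flat layer)."""
    rng = rng or np.random.default_rng(0)
    flat = weights.ravel()
    mask = flat != 0
    vals = flat[mask]
    centers = rng.choice(vals, size=k, replace=False).astype(float)
    for _ in range(iters):
        assign = np.argmin(np.abs(vals[:, None] - centers[None, :]), axis=1)
        for c in range(k):
            if np.any(assign == c):
                centers[c] = vals[assign == c].mean()
    return centers, assign, mask

w = np.random.default_rng(1).normal(size=(16, 16))
w[np.abs(w) < 0.8] = 0.0                       # pretend the layer was pruned
codebook, idx, mask = cluster_nonzero_weights(w, k=8)
approx = np.zeros_like(w).ravel()
approx[mask] = codebook[idx]                   # only k distinct multipliers remain
print("distinct non-zero multipliers:", len(np.unique(approx[approx != 0])))
```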
