Publish Online: Articles in press have been peer-reviewed and accepted. They have not yet been assigned to volumes/issues, but are citable by Digital Object Identifier (DOI).
Abstract:

Research on knowledge-grounded dialogue often suffers from external knowledge that contains redundant or even noisy information irrelevant to the conversation topic, which degrades the performance of the dialogue system. Knowledge selection has become an important approach to this issue. However, existing work has not yet investigated in depth questions such as how to design a knowledge selector, how to exploit the selected knowledge, and which scenarios suit knowledge-selection-based conversation methods. In this paper, we propose a new neural conversation method based on conditional variational attention knowledge selection and a pre-trained language model. The method employs a knowledge selection algorithm based on a conditional variational autoencoder (CVAE) and a multi-layer attention mechanism to select the textual knowledge most relevant to the current conversation, effectively exploiting the dialogue responses in the training data to improve the efficiency of knowledge selection. The model adopts the pre-trained language model BART as its encoder-decoder architecture and incorporates the selected textual knowledge into BART while fine-tuning it during training. Experimental results show that, compared with current representative dialogue models, the proposed model generates more diverse and coherent dialogue responses with higher accuracy.
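To make the response-aware selection idea concrete, the following PyTorch sketch scores candidate knowledge with a context-only prior and a response-informed posterior linked by a KL term; all layer sizes, the MLP scorers, and tensor shapes are illustrative assumptions, not the paper's actual networks.

```python
# Minimal sketch of CVAE-style, response-aware knowledge selection (assumed design).
import torch
import torch.nn as nn
import torch.nn.functional as F

class KnowledgeSelector(nn.Module):
    def __init__(self, hidden: int = 256):
        super().__init__()
        # Prior scorer sees only the dialogue context (usable at inference time).
        self.prior = nn.Sequential(nn.Linear(2 * hidden, hidden), nn.Tanh(), nn.Linear(hidden, 1))
        # Posterior scorer also sees the gold response (training only).
        self.posterior = nn.Sequential(nn.Linear(3 * hidden, hidden), nn.Tanh(), nn.Linear(hidden, 1))

    def forward(self, context, knowledge, response=None):
        # context: (B, H), knowledge: (B, K, H), response: (B, H) or None
        B, K, H = knowledge.shape
        ctx = context.unsqueeze(1).expand(B, K, H)
        prior_logits = self.prior(torch.cat([ctx, knowledge], -1)).squeeze(-1)   # (B, K)
        if response is None:
            return F.softmax(prior_logits, -1), None
        resp = response.unsqueeze(1).expand(B, K, H)
        post_logits = self.posterior(torch.cat([ctx, resp, knowledge], -1)).squeeze(-1)
        # The KL term pulls the context-only prior toward the response-informed posterior,
        # which is how training responses guide knowledge selection.
        kl = F.kl_div(F.log_softmax(prior_logits, -1), F.softmax(post_logits, -1),
                      reduction="batchmean")
        return F.softmax(post_logits, -1), kl
```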

Abstract:

Personalized learning resource recommendation derives from identifying learners' interests and recommending interesting and relevant learning resources accordingly. However, learners' interests are influenced by various factors such as knowledge points, learning resources, and courses, which makes accurately representing those interests a challenging task. Additionally, these interests evolve dynamically over time, complicating the identification of learning interest patterns. To address this challenge, we propose a learning resource recommendation method based on spatio-temporal multi-granularity interest modeling, characterized as follows. An innovative architecture for learning interest representation is designed and implemented that integrates the learning space and the temporal dimension: the learning space is modeled as a heterogeneous graph, and interests are represented at multiple granularities. The nodes in this graph represent entities such as knowledge points, learning resources, courses, teachers, and schools, and the edges represent the inter-entity relationships; a graph neural network is used to express multi-granularity interest over these nodes. Moreover, we propose a temporal multi-granularity interest pattern representation method that combines the dimensions of time, learning space, and course preference, and slices the learner's historical behavior sequence to mine interest patterns at different granularities: near-term within-course, mid-term across-course, and long-term across-course. A multi-granularity interest adaptive fusion layer is then proposed to fuse the multi-granularity interest representations and interest patterns. Based on this method, a multi-granularity interest self-supervision task is designed to address the lack of supervision signals for spatio-temporal multi-granularity interests, and relevant learning resources are recommended to learners via a prediction layer. Experimental results show that the proposed method outperforms the best baseline HinCRec on the MOOCCube dataset by 3.13% and 7.45% in Recall@20 and NDCG@20, respectively, and outperforms HinCRec on the MOOPer dataset by 4.87% and 7.03% in Recall@20 and NDCG@20, respectively.
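As a rough illustration of what an adaptive fusion layer over granularity-specific interest vectors can look like (the module name, hidden size, and attention form are assumptions, not the paper's design), a learned softmax weighting can combine the near-term, mid-term, and long-term interest representations:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class InterestFusion(nn.Module):
    """Hypothetical adaptive fusion of several granularity-specific interest vectors."""
    def __init__(self, hidden: int = 128):
        super().__init__()
        self.score = nn.Linear(hidden, 1)

    def forward(self, interests):
        # interests: (B, G, H) -- G granularities, e.g. near-term within-course,
        # mid-term across-course, long-term across-course.
        weights = F.softmax(self.score(interests).squeeze(-1), dim=-1)   # (B, G)
        return torch.einsum("bg,bgh->bh", weights, interests)            # (B, H)

fusion = InterestFusion()
fused = fusion(torch.randn(4, 3, 128))   # 4 learners, 3 interest granularities
print(fused.shape)                       # torch.Size([4, 128])
```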

Abstract:

Airborne systems serve a pivotal function in the aerospace industry. Their exceptional safety requirements make the formal verification of software requirements a critical and pressing issue. However, with the ever-increasing complexity of airborne system requirements and the growing number of onboard devices, formal verification has encountered the serious challenge of state space explosion. To alleviate this problem, this paper proposes a novel approach for the modeling and compositional verification of airborne system requirements based on time partitioning. This approach utilizes the time dimension to decompose a complex verification system into mutually independent components, enabling the independent verification of each component and subsequently synthesizing the overall verification results of the entire system. The feasibility and practicality of the proposed method are demonstrated through a real-world case study. The evaluation further shows that the proposed method not only enables the verification of systems that are beyond the capability of traditional monolithic verification approaches and effectively alleviates state space explosion, but also avoids false alarms that may arise from neglecting time-based partitioning. This innovative method provides a new and promising technical path for the formal verification of software requirements in airborne systems, contributing to enhanced verification accuracy and efficiency, and ultimately ensuring the operational safety and reliability of airborne systems.

Abstract:

In order to resist quantum computing attacks and protect the privacy and data security of underwater nodes, a multi-party key encapsulation mechanism for the Internet of underwater things is proposed based on the hardness assumption of the NTRU cryptosystem. Firstly, pseudo-identities for device serial numbers are generated by combining ocean sensor acoustic sequences with underwater acoustic waveform factorization, and a verifiable-identity ocean acoustic message code is designed. Secondly, a key generation algorithm suitable for underwater communication is designed using the orthogonal frequency division multiplexing (OFDM) frequency-domain oversampling technique and the number theoretic transform (NTT). On this basis, a multi-party public key encryption algorithm with indistinguishability and anonymity under chosen plaintext attack (IND-Anon-CPA) security is constructed using identity-bound hybrid encryption and an ocean-noise-based obfuscation operation. Thirdly, a SeaFO transform based on ocean noise is introduced to develop a multi-party key encapsulation algorithm with indistinguishability and anonymity under chosen ciphertext attack (IND-Anon-CCA) security without a full re-encryption process. Finally, a novel session key update mechanism is devised in which autonomous underwater vehicles verify ciphertext components and check pseudo-identity and timestamp validity using OFDM subcarriers. The new session keys are decapsulated using the SeaFO transform, which not only achieves implicit rejection in multi-party environments but also thwarts adversaries' adaptive corruption of SeaNTRU. Security analysis demonstrates that SeaNTRU resists key replacement attacks, replay attacks, and man-in-the-middle attacks. Experimental results show that SeaNTRU has lower computational cost and communication overhead than existing schemes.

Abstract:

Congestion control is one of the key technologies for realizing high-performance data center networks, and it affects important network performance indicators such as throughput, latency, and packet loss rate. Over the past 20 years, with the continuous expansion of data centers and the increasing demands of upper-layer applications on network performance, the deployment of Remote Direct Memory Access (RDMA) technology over lossless underlying networks has received widespread attention within the industry. However, the Priority-based Flow Control (PFC) mechanism, while maintaining a lossless network, introduces problems such as head-of-line blocking, leading to degraded network performance or even network paralysis. Designing a practical RDMA congestion control mechanism, as a crucial complement to a lossless network, has therefore become a hot issue. By dividing the congestion control process into congestion awareness and congestion regulation, this paper comprehensively reviews the research achievements in this field. Firstly, representative congestion awareness algorithms are elaborated and summarized from the perspectives of explicit feedback and latency. Secondly, representative congestion regulation algorithms are introduced in detail along the dimensions of rate and window, and their advantages and disadvantages are summarized; supplementary discussion covers algorithm optimizations and congestion control algorithms based on reinforcement learning. Finally, the existing challenges in this field are summarized and discussed.

Abstract:

Aiming at the problems of Pareto dominance failure, weak optimization ability, and slow convergence of the multi-objective firefly algorithm in solving many-objective optimization problems, we propose a many-objective firefly algorithm based on reference point guidance and multiple cooperation strategies (MaOFA-RR). The algorithm presets a set of uniformly distributed reference points in the objective space. By examining the distance relationship between fireflies and reference points, it distinguishes guide fireflies from ordinary fireflies in place of Pareto dominance, thereby increasing selection pressure. Three evolutionary strategies are employed to update the positions of the fireflies: guide fireflies explore the local space, while ordinary fireflies either learn from guide fireflies or explore the global space based on a distance threshold, enhancing the algorithm's optimization capability and convergence speed. Finally, the algorithm integrates the idea of opposition-based learning: for each firefly in the population, its opposite position in the solution space is calculated, and adding these opposite solutions to the original population effectively expands the scope of the population search and significantly improves the chance of discovering better solutions. We conduct comprehensive experiments comparing MaOFA-RR with 8 recent many-objective evolutionary algorithms. The experimental results show that MaOFA-RR exhibits efficient performance in handling many-objective optimization problems.
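For readers unfamiliar with opposition-based learning, the sketch below shows the generic mirroring step on a single-objective toy problem (the bounds, population size, and fitness are made up, and MaOFA-RR's actual selection uses reference points rather than a simple sort):

```python
import numpy as np

def opposition_based_population(pop, lower, upper, fitness):
    """Generic opposition-based learning step (a sketch, not MaOFA-RR's exact rule):
    mirror each solution inside the search bounds, pool originals and opposites,
    and keep the better half by fitness (minimisation)."""
    opposite = lower + upper - pop                 # element-wise mirror of each solution
    merged = np.vstack([pop, opposite])
    scores = np.apply_along_axis(fitness, 1, merged)
    keep = np.argsort(scores)[: len(pop)]          # keep the best N of the merged pool
    return merged[keep]

# Toy usage on the sphere function in [-5, 5]^10.
rng = np.random.default_rng(0)
pop = rng.uniform(-5, 5, size=(20, 10))
new_pop = opposition_based_population(pop, -5.0, 5.0, lambda x: float(np.sum(x ** 2)))
print(new_pop.shape)  # (20, 10)
```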

Abstract:

In recent years, the rapid advancement of deep learning technology has introduced innovative solutions to the field of facial de-identification. Compared to traditional image processing techniques, deep generative models have demonstrated significant advantages in this domain, including high-quality image generation and robust model performance. This article reviews and synthesizes the theoretical explorations and research outcomes of deep learning technology in addressing facial de-identification challenges. It begins by outlining the network architectures and fundamental principles employed in deep learning for facial de-identification. It then delves into the de-identification methods based on these technologies, covering key techniques such as facial swapping and feature perturbation, and introduces the standard experimental metrics used to evaluate these methods. Furthermore, the article summarizes the main challenges currently faced by the technology, such as the stability of posture and expression, attribute disentanglement, and the adaptability to video applications, and looks forward to the pressing issues that future research needs to address. Ultimately, this article emphasizes the importance of deep learning technology in the field of facial de-identification and points out the direction for future research. It aims to provide readers with in-depth insights into the field of facial de-identification and inspire new ideas and directions for future studies.

Abstract:

Feature selection is an effective dimensionality reduction technique in machine learning. In the era of big data, data security has become an issue of great concern, and how to perform feature selection under the premise of privacy protection is a challenging scientific problem that urgently needs to be solved. The rough hypercuboid model is an uncertainty approximation computational model that combines rough set theory and hypercuboid learning; by introducing a supervised information granulation technique and multiple feature evaluation criteria, it provides an efficient feature selection method for numerical approximate classification problems. In this paper, we propose a novel multi-party federated feature selection algorithm with privacy protection, based on the rough hypercuboid model and the particle swarm optimization algorithm. Firstly, a centralized (client/server) federated feature selection architecture for multi-party participation is established. Within this architecture, the rough hypercuboid model and the particle swarm optimization algorithm are used to search for the optimal feature subset on each client, and a novel global feature subset evaluation strategy over multiple participants is proposed on the server. Then, the algorithm's ability to select features collaboratively across participants is improved by designing a particle initialization strategy for the federated environment. Finally, experimental results on twelve UCI benchmark datasets show that, compared with six traditional feature selection algorithms, the feature subset selected by the proposed algorithm achieves higher classification performance on each participant while preserving data privacy.
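The client-side search can be pictured as a binary particle swarm over feature masks, as in the hedged sketch below; `evaluate` stands in for the rough-hypercuboid criterion, the inertia and acceleration constants are generic textbook values, and the server-side federated aggregation is not shown.

```python
import numpy as np

def client_bpso_feature_search(X, y, evaluate, n_particles=20, iters=30, seed=0):
    """Sketch of a client-side binary PSO over feature masks.
    `evaluate(mask, X, y)` is a placeholder scoring function to maximise."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    pos = rng.integers(0, 2, size=(n_particles, d)).astype(float)   # 0/1 feature masks
    vel = rng.normal(0, 1, size=(n_particles, d))
    pbest = pos.copy()
    pbest_score = np.array([evaluate(p, X, y) for p in pos])
    gbest = pbest[np.argmax(pbest_score)].copy()
    for _ in range(iters):
        r1, r2 = rng.random((n_particles, d)), rng.random((n_particles, d))
        vel = 0.7 * vel + 1.5 * r1 * (pbest - pos) + 1.5 * r2 * (gbest - pos)
        # Sigmoid transfer: sample each bit with probability sigmoid(velocity).
        pos = (rng.random((n_particles, d)) < 1 / (1 + np.exp(-vel))).astype(float)
        scores = np.array([evaluate(p, X, y) for p in pos])
        improved = scores > pbest_score
        pbest[improved], pbest_score[improved] = pos[improved], scores[improved]
        gbest = pbest[np.argmax(pbest_score)].copy()
    return gbest  # selected feature mask of this client
```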

Abstract:

Existing methods for skeleton-based human action recognition often ignore motion domain knowledge, resulting in a lack of interpretability in the form of logical decision-making that humans can understand. In this paper, we propose a novel skeleton-based human action recognition method that fuses domain knowledge with an adaptive spatio-temporal transformer to improve recognition performance and interpretability. Firstly, inspired by short-term motion knowledge, a temporal multi-branch structure is designed to learn and capture the characteristics of short-term sub-actions. Secondly, a dynamic information fusion module is proposed to learn the weight vectors of different temporal branches and then fuse multiscale short-term motion features. Finally, to learn the relationship between different sub-actions and facilitate motion information interaction between skeleton joints, a multiscale temporal convolution feature fusion module is proposed to capture long-term motion correlations by integrating domain knowledge of long-term motion. Experimental evaluations are conducted on four large action datasets: NTU RGB+D, NTU RGB+D 120, FineGym, and InHARD. The results show that the recognition performance of the proposed method is superior to several data-driven methods, effectively improving the modelling of short-term motion features and the information interaction between skeleton joints while providing interpretability.

Abstract:

As the number of parameters in deep learning models continues to increase, the cost of training keeps rising. To reduce training costs, using spot instances provided by cloud service providers has become a viable solution. Spot instances are priced at only 30% of normal instances, which can significantly lower training costs. However, despite their low cost, spot instances may be reclaimed at any time, posing new challenges to the stability of the model training system. To address fault tolerance in the spot-instance setting, existing work mainly falls into two categories: checkpoint-based and redundancy-based fault tolerance. Checkpoint-based solutions incur substantial overhead, while redundancy-based solutions impose certain limitations on the model's parallelism strategy, leading to suboptimal training efficiency. This paper proposes a training solution for spot instances that leverages the spot instance grace period to back up training progress, thereby reducing fault tolerance overhead. It also employs a bottleneck-alleviation approach to adjust the parallelism strategy, maximizing the use of available cluster resources and enhancing training efficiency. Experimental results show that this solution not only achieves low-cost fault tolerance but also ensures training efficiency, allowing model training tasks to be completed efficiently on spot instances and reducing overall training costs.
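A minimal sketch of the grace-period idea, assuming the provider's reclamation notice reaches the training process as a SIGTERM at the start of the grace period (delivery differs across providers); `train_step` and `save_checkpoint` are placeholders for the real training loop and checkpoint writer, not the paper's implementation:

```python
import signal

def train_with_grace_period_backup(train_step, save_checkpoint, max_steps=10_000):
    """Finish the current step when reclamation is signalled, then use the grace
    period to persist training progress before exiting."""
    reclaiming = {"flag": False}

    def on_reclaim(signum, frame):
        reclaiming["flag"] = True      # only set a flag; do slow I/O outside the handler

    signal.signal(signal.SIGTERM, on_reclaim)
    for step in range(max_steps):
        train_step(step)
        if reclaiming["flag"]:
            save_checkpoint(step)      # back up progress during the grace period
            break
```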

Abstract:

Diffusion models have gained significant attention in recent years due to their potential in various generation tasks, including image and text generation. However, the widespread use of these models has also raised concerns regarding data privacy, particularly the vulnerability of these models to membership inference attacks (MIA). These attacks aim to determine whether a specific data point was part of the model’s training set, posing significant risks to privacy. This paper provides an overview of the latest developments in privacy protection for diffusion models, with a specific focus on MIAs and their challenges. Existing MIA methods often struggle with balancing the effectiveness of short-term and long-term attacks, and their applicability to diffusion models has not been thoroughly explored. To address these issues, we propose a novel temporal membership inference attack method designed to enhance the attack success rate (ASR) for both short-term and long-term attacks. The proposed method leverages gradient information from noise during short-term attacks and temporal noise patterns to bolster the effectiveness of long-term attacks. Experimental results demonstrate that our method improves the ASR by approximately 5% for short-term attacks and 1% for long-term attacks compared to conventional approaches on common diffusion models. This work contributes to the ongoing efforts to understand and mitigate privacy risks in diffusion model applications.

Abstract:

Artificial intelligence applications require highly advanced high-speed video imaging technologies to better perceive the surrounding environment, and deep learning-based snapshot compressive imaging (SCI) offers a promising solution. How to reconstruct high-speed videos from observed data using deep learning techniques is a frontier hotspot in the field. However, existing reconstruction methods focus on mining prior information and neglect the direct influence of masks and image textures on reconstruction difficulty, leaving room for further improvement in reconstruction quality. To address this issue, we propose a reconstruction difficulty perception-based SCI (RdpSCI) method. Based on the observation that masks and image textures jointly determine the information contained in the observed data, RdpSCI is the first to explore the correlation between masks, image textures, and reconstruction difficulty, using it to guide the deep reconstruction network. Specifically, it introduces an improved residual dense network (I-ResDNet) module that incorporates channel shuffling operations into ResDNet to reduce the dependency of feature fusion on the channel partitioning method. I-ResDNet also introduces a reconstruction difficulty weight vector to guide feature fusion, enhancing feature fusion capability without significantly increasing model parameters. Experiments show that, compared with the state-of-the-art methods STFormer and EfficientSCI, RdpSCI achieves improvements of 0.68 dB and 0.54 dB in reconstruction quality on benchmark grayscale and colour datasets, respectively.
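The channel shuffling referred to above is the standard operation popularised by ShuffleNet; the sketch below shows it in isolation, with the caveat that I-ResDNet's exact placement and group count are not specified here:

```python
import torch

def channel_shuffle(x: torch.Tensor, groups: int) -> torch.Tensor:
    """Interleave channels across groups so later group-wise fusion mixes information
    regardless of how channels were partitioned."""
    b, c, h, w = x.shape
    assert c % groups == 0
    return (x.view(b, groups, c // groups, h, w)
             .transpose(1, 2)
             .reshape(b, c, h, w))

x = torch.randn(2, 64, 32, 32)
print(channel_shuffle(x, groups=4).shape)  # torch.Size([2, 64, 32, 32])
```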

Abstract:

Knowledge distillation, as a key technique in deep learning, achieves model compression and acceleration by transferring knowledge from a large teacher model to a smaller student model. Under the premise of maintaining performance, this technology significantly reduces the requirements of computational resources and storage, and facilitates the deployment of high-performance models on resource-constrained edge devices. Firstly, this paper provides a systematic review of the recent research in knowledge distillation and categorizes it from two perspectives: the type of knowledge and teacher-student model architectures. We comprehensively summarize the distillation methods based on three typical types of knowledge: output feature knowledge, intermediate feature knowledge, and relational feature knowledge, as well as distillation methods based on CNN to CNN architecture, CNN to ViT (vision Transformer) architecture, ViT to CNN architecture, and ViT to ViT architecture. Next, the paper explores various learning paradigms such as offline distillation, online distillation, self-distillation, data-free distillation, multi-teacher distillation, and assistant distillation. Then, the paper summarizes distillation optimization methods based on the distillation process, knowledge structure, temperature coefficient, and loss functions. It analyzes improvements in distillation brought by adversarial techniques, automated machine learning, reinforcement learning, and diffusion models, and concludes with the implementation of distillation technology in common applications. Despite significant advancements in knowledge distillation, numerous challenges remain in both practical applications and theoretical research. Finally, the paper provides an in-depth analysis of these issues and offers insights into future development directions.

Abstract:

Software vulnerabilities pose a serious threat to the safe and stable operation of computer systems and software, so research on their automatic detection has received extensive attention. Unlike traditional static vulnerability detection tools that analyze code using predefined rules provided by human experts, graph neural network (GNN)-based vulnerability detection methods automatically learn vulnerable code patterns and have surpassed traditional methods on some datasets. However, in current GNN-based vulnerability detection methods, the design of the GNN model does not take the characteristics of the code itself into account, which leads to poor detection performance on real-world vulnerability datasets. In this paper, we propose LHG-VD, a learnable hierarchical graph representation vulnerability detection method. It introduces a learnable readout function to address the limitations of traditional readout functions, and designs a cross-granularity loss function based on the idea of contrastive learning to preserve the local structural information of the code during graph pooling. Experimental results on real vulnerability datasets show that the F1 score of LHG-VD is 71.5%, an improvement of 4.9% over the slice-level detection method DeepWukong and 8.9% over the function-level detection method AMPLE.
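To illustrate what "learnable readout" means in contrast to fixed mean/max pooling, the following sketch pools node embeddings with learned attention weights; it only conveys the general idea, and LHG-VD's actual readout and hierarchical pooling differ.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LearnableReadout(nn.Module):
    """Attention-weighted sum of node embeddings as a graph-level representation."""
    def __init__(self, hidden: int = 128):
        super().__init__()
        self.gate = nn.Linear(hidden, 1)

    def forward(self, node_feats: torch.Tensor) -> torch.Tensor:
        # node_feats: (N, H) node embeddings of one code graph
        alpha = F.softmax(self.gate(node_feats), dim=0)   # (N, 1) learned node weights
        return (alpha * node_feats).sum(dim=0)            # (H,) graph-level embedding

graph_emb = LearnableReadout()(torch.randn(50, 128))
print(graph_emb.shape)  # torch.Size([128])
```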

Abstract:

As one of the most representative paradigms of distributed computing, an end-edge-cloud collaborative computing system can effectively bring applications such as the Internet of Things, large language models, and digital twins into real life. By mimicking the neural activities of the human brain, brain-inspired computing offers various advantages, including energy efficiency, high speed, high error tolerance, and desirable scalability. By leveraging the event-driven mechanism and the sparsity of spike generation in spiking neural networks, the real-time processing capability and energy efficiency of an end-edge-cloud collaborative computing system can be significantly improved. In this paper, a distributed computing-oriented ISA design for a brain-inspired intelligence CPU is studied. Bearing in mind the delay-sensitivity, low-power, and high-diversity requirements of end devices, we focus on the software-hardware interface, i.e., the ISA, and propose a hardware design that is rooted in current systems, easy to implement and upgrade, safety-aware and self-controllable, and compatible with heterogeneous architectures. Along with a corresponding CPU micro-architecture design, a dozen instructions specifically conceived for brain-inspired computing are proposed on top of a well-established ISA, setting the stage for empowering distributed computing systems with brain-inspired computing. We believe this paper opens up a fruitful research field and hope it stimulates further interest in ISA designs for brain-inspired intelligence CPUs in both academia and industry.

Abstract:

High-performance global routing results can effectively meet design specifications and greatly improve the efficiency of the detailed routing phase. As more and more signals in a chip are transmitted over buses, the bus has gradually become a key factor affecting chip performance. If the bus topology is not considered during the global routing stage, coupling will occur during bus signal transmission, resulting in large timing deviations. Therefore, in order to optimize the consistency of bus topology in 2D global routing, an effective multi-strategy bus-topology-aware global routing algorithm is proposed. First, a congestion-based topology reconstruction strategy is designed to optimize two-pin nets and effectively improve the utilization of routing space. Second, a heuristic reroute model is constructed to realize the rerouting of multi-bit buses. Third, a topology-aware routing algorithm is added to the rip-up and reroute model to adjust the topology of nets within the same bus group and improve the consistency of the bus topology. Finally, an iterative method that adjusts the bus topology cost is designed to further optimize topology consistency. Experimental results show that the proposed algorithm can effectively optimize the bus topology consistency of 2D routing solutions.

Abstract:

Online mental health forums have become an important carrier of mental health services. Detecting psychological distress from a vast number of posts is the basis for psychological intervention. Fully utilizing the social relationships of help-seekers is conducive to judging their mental health status. However, most existing methods rely on explicit social relationships and fail to pay attention to the psychological support relationships between seekers and supporters (analogous to patients and doctors). These relationships are grounded in the seeker's personal experiences, symptom causes, self-cognition, and the supporter's psychological support expertise. This paper therefore takes suicidal ideation as the detection target and proposes the Post-User Psychological Support Heterogeneous Graph (PU-PSHG), which represents the semantics of posts and the doctor-patient-style relationships between seekers and supporters in online mental health forums. Based on PU-PSHG, a Graph-enhanced Suicide Ideation Detection (GSID) model is proposed. Firstly, based on the definition of psychological support relationships, the semantics of two meta-paths (user-to-user and user-to-post) are defined, and a PU-PSHG containing users and posts is constructed; the DeepWalk algorithm is used to learn doctor-patient and community relationships from PU-PSHG. Then, the representation of psychological support relationships is learned through relational features, and the post semantics and doctor-patient relationships are fused based on the heterogeneous relationships. Finally, suicide ideation detection is performed based on the post representations. Experimental results on the CLPsych 2017 shared task show that GSID outperforms existing methods: compared to C-GraphSAGE, GSID improves Non-green F1, All F1, and All Acc by 7.8%, 4.8%, and 1.4%, respectively. Ablation experiments show that removing the three types of relationships from PU-PSHG (the reply relationship between posts, the psychological support relationship between users and posts, and the psychological support relationship between users) decreases Non-green F1 by 3.04%, 3.80%, and 6.17%, respectively.

Abstract:

Self-supervised learning has emerged as a promising approach in addressing the issues of label dependency and poor generalization performance in traditional graph neural networks (GNNs). This method leverages the inherent structure and properties of the data to generate supervisory information without relying on extensive labeled datasets. However, most existing self-supervised learning methods hold the assumption that the graphs are homophilic and consequently fail to generalize well to graphs with high heterophily, where connected nodes may have different class labels and dissimilar features. In this paper, we study the problem by developing an asymmetric self-supervised learning on non-homophilic graphs (MIRROR) framework, which is easy to implement and does not rely on random graph augmentations and homophily assumptions. Inspired by the designs of existing heterophilic graph neural networks, MIRROR learns the node representations by capturing one-hop local neighborhood information and informative distant neighbors. Such two properties are obtained through carefully designed pretext tasks that are optimized based on predicted neighborhood context and estimated high-order mutual information. Extensive experiments on various types of graph benchmarks demonstrate that our proposed framework can achieve better performance compared with competitive baselines on non-homophilic graphs. The superiority in multiple downstream tasks also demonstrates the generalization capability of MIRROR.

Abstract:

Digital signatures play a critical role in information security; however, traditional digital signature algorithms are at risk of becoming obsolete in the post-quantum era. SPHINCS+, as a digital signature framework resistant to quantum computing attacks, is expected to become increasingly important in this new era. Nevertheless, the relatively slow computational speed of SPHINCS+ poses challenges in meeting the high throughput and low latency demands of modern cryptographic applications, significantly limiting its practicality. This paper presents an efficient optimization strategy based on a domestic DCU (Deep Computing Unit) to accelerate the SPHINCS+ algorithm instantiated with the domestic SM3 hash function. By enhancing memory copy efficiency, optimizing the computational processes of SM3 and SPHINCS+, and employing optimal computational parallelism, we implemented the 128-f mode of SPHINCS+-SM3 on the DCU. Experimental results demonstrate that, compared to traditional CPU implementations, our DCU-based implementation achieves a significant increase in throughput, improving signature generation and verification by 2603.87 times and 1281.98 times, respectively. This substantial improvement in computational efficiency and practicality enhances the feasibility of SPHINCS+ and advances the domestic adoption of post-quantum cryptographic algorithms. In scenarios involving high data traffic and large volumes of signature requests, the DCU implementation exhibits significant performance advantages over CPU implementations.

Abstract:

Relation reasoning, an important task in natural language processing, aims to infer possible semantic relations between two or more entities. The reasoning process typically involves deriving new relations from known relations between entities, and the results can be widely applied in various downstream tasks such as knowledge graph completion, relation extraction, and commonsense knowledge question answering. Previous studies often face two main limitations: First, they are primarily based on the closed-world assumption, meaning the relation types are predefined and difficult to expand. Second, even if some methods focus on open domains, they typically only handle 1-hop reasoning, which is insufficient for complex multi-hop reasoning scenarios. To address these issues, we propose and define an open domain 2-hop relation reasoning task and construct a dataset for evaluating this task. Furthermore, we introduce an open domain 2-hop relation reasoning framework, named ORANGE (open domain relation reasoning method on generative model), which includes 3 key modules: entity generation, relation generation, and result aggregation. Firstly, the entity generation module generates unknown entities. Secondly, the relation generation module proposes potential new relations. Finally, the result aggregation module integrates the outputs of the preceding modules to determine the final result. Experimental results demonstrate that, when compared to the best existing methods, our approach achieves a 10.36% improvement in the average score. Moreover, when employing ORANGE’s 3-module relation reasoning framework with large language models, it surpasses the conventional in-context learning prompt strategy, showcasing a 9.58% enhancement in the average score.

Abstract:

In crowd-sensing systems, users complete sensing tasks by providing data, but variations in sensor precision, user behavior, and environmental factors lead to significant differences in data quality. Truth discovery techniques mitigate the impact of low-quality data by aggregating multiple user inputs with weighted mechanisms. However, existing methods often overlook personalized privacy needs, exposing sensitive user information. Meanwhile, encryption-based privacy protection mechanisms, though secure, suffer from high computational and communication overhead, making them impractical for large-scale crowd-sensing applications. To address these challenges, we propose Personalized Differential Privacy Truth Discovery (PDPTD), a framework that integrates local differential privacy (LDP) with truth discovery, ensuring both strong privacy protection and high-quality data aggregation. Specifically, PDPTD employs a randomized response mechanism, enabling users to dynamically adjust their data perturbation levels based on personalized privacy budgets. This flexible approach balances privacy protection and data utility, allowing users to contribute valuable data while safeguarding sensitive information. On the server side, PDPTD incorporates a weighted aggregation strategy that compensates for the information loss caused by perturbation, effectively improving inference accuracy. Additionally, PDPTD introduces a dynamic user weighting mechanism that assigns weights based on data quality. This ensures that even if some users select higher perturbation levels, the system can still infer results close to the true values, maintaining data reliability and consistency. Theoretical analysis and experimental results demonstrate that PDPTD complies with LDP principles while ensuring high accuracy in the final inferred results, making it a practical and efficient solution for large-scale crowd-sensing applications, where balancing privacy, data utility, and computational efficiency is crucial.
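For intuition, the sketch below shows textbook generalized randomized response with a per-user privacy budget; it only illustrates the LDP building block, while PDPTD's actual perturbation and server-side debiasing are more involved.

```python
import numpy as np

def personalized_randomized_response(value: int, k: int, epsilon: float,
                                     rng=np.random.default_rng()) -> int:
    """Generalized randomized response over k discrete candidate values.
    A user with personal budget epsilon reports the true value with probability
    p = e^eps / (e^eps + k - 1) and otherwise reports a uniformly random other value."""
    p_true = np.exp(epsilon) / (np.exp(epsilon) + k - 1)
    if rng.random() < p_true:
        return value
    others = [v for v in range(k) if v != value]
    return int(rng.choice(others))

# A privacy-sensitive user (eps=0.5) perturbs more often than a lenient one (eps=3.0).
print(personalized_randomized_response(2, k=10, epsilon=0.5))
print(personalized_randomized_response(2, k=10, epsilon=3.0))
```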

Abstract:

In recent years, it has become a trend to introduce deep neural networks (DNNs) into mobile devices. Many applications that facilitate daily life, such as voice assistants and activity recognition, have been integrated into smartphones, wearable devices, and embedded systems. However, it is challenging to deploy CPU-bound DNNs on mobile devices with limited resources such as computing power, storage, and battery. Existing methods, such as manually designed DNN compression techniques and automated on-demand DNN compression techniques, are limited to optimizing model structures. This restricts the upper limit of performance optimization for DNN deployment and makes it difficult to adapt to devices with extremely constrained resources. In addition, these statically pre-designed optimization methods do not consider the resource contention and dynamic demand characteristics of the deployment environment in mobile applications; the inability to adjust strategies in real time in dynamic environments results in suboptimal accuracy-efficiency trade-offs. To address these challenges, we propose AdaInfer, a runtime-scalable cross-layer optimization method for DNNs. AdaInfer adaptively selects the optimal comprehensive deployment strategy across the model layer, computational graph layer, and memory layer based on current hardware resource constraints and user performance requirements to optimize multiple performance metrics, and it adjusts the strategy in real time as the scenario changes. Specifically, we design a model-agnostic scalable graph-computation structure and a corresponding cross-layer optimization strategy, which can automatically adjust to maximize deployment efficiency on heterogeneous devices. We then model the runtime adjustment of the algorithm-system cross-layer optimization strategy as a dynamic optimization problem and represent the dynamic environment through a set of runtime-varying resource constraints. We also propose an efficient search strategy to enhance the efficiency and quality of local online searches. In evaluations conducted on three types of mobile and edge devices, five models, and four continuously changing mobile scenarios, experimental results show that compared with previous work, AdaInfer reduces memory usage by up to 42.35% and latency by up to 73.89% without significantly affecting accuracy.

Abstract:

Object detection technology, as a pivotal component in computer vision, plays a vital role in diverse practical applications. Over decades of evolution, the field has progressed from early methods relying on handcrafted feature extraction to the widespread adoption of deep learning models. Currently, there remains a lack of systematic reviews tracing the developmental trajectory of object detection through improvements in deep learning foundation models. Addressing this gap, this paper organizes the technological evolution around the progression of foundation models in artificial intelligence. We systematically survey detection models built upon various foundation models, compare their strengths and weaknesses, and analyze improvement strategies. The paper also surveys evaluation metrics and technological advancements across different eras, with particular emphasis on how deep learning has driven remarkable performance gains. We discuss persistent challenges in handling diverse scenarios, improving real-time efficiency, and enhancing accuracy. Furthermore, we explore prospective research directions, including model generalization capabilities, computational efficiency, and integration with complex tasks, proposing potential enhancement strategies. This work aims to provide a clear perspective on technological evolution to facilitate further research and applications in object detection.

Abstract:
Anonymous credentials, as a privacy-enhancing identity authentication technology, can verify the validity of identities while protecting user privacy. They are widely used in digital identity management systems, e-government, and digital banking. The research on anonymous credential schemes that comply with Chinese commercial cryptography standards has also garnered attention. However, current anonymous credential schemes often rely on centralized issuing authorities, which not only limit their application in decentralized networks but also risk system failures and privacy breaches due to single points of failure. To address these issues, this paper proposes a decentralized anonymous credential system based on SM2 commercial cryptography. The proposed scheme leverages blockchain networks to replace credential issuing authorities and employs zero-knowledge proof algorithms to ensure the secure and reliable distribution of credentials. Additionally, the scheme allows users to disclose their attributes in a fine-grained manner to access resources or request services, thereby avoiding excessive privacy disclosure. This paper also explores the construction methods of zero-knowledge proofs that comply with Chinese commercial cryptography standards and proposes a set membership proof scheme based on SM2, providing a foundational tool for designing the SM2-based decentralized anonymous credential system. Security analysis shows that the SM2-based decentralized anonymous credential scheme meets unforgeability and anonymity. Experimental results indicate that the proposed scheme is efficient.
Abstract:
With the rapid progress of deep learning technology and the continuous exploration of massive datasets, the self-attention module has been widely applied in various fields such as natural language processing, computer vision, and large language models. Although the self-attention module significantly improves the detection accuracy of deep learning models, its huge computational demand makes it particularly difficult to deploy on computing devices with limited computing power. Integer quantization, as one of the key technologies for deploying models on low-power computing chips, faces the problem of high precision loss caused by the structural characteristics of the self-attention module. To address this issue, a thorough analysis of the integer quantization error in the self-attention module is conducted, and two methods, pseudo-softmax vector quantization and block-wise pseudo-softmax vector quantization, are proposed. These two methods aim to significantly improve inference speed while effectively reducing the error caused by integer quantization by performing special integer quantization on the softmax vectors in the self-attention module. Experimental results show that compared with traditional direct quantization methods, the pseudo-softmax vector quantization method can reduce the quantization accuracy loss by 50%, while the block-wise pseudo-softmax vector quantization method can further reduce the accuracy loss by approximately 90%. These results fully demonstrate the effectiveness of the two quantization methods in reducing precision loss, providing strong support for the efficient deployment of the self-attention module on devices with limited computing power.
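The motivation for treating softmax vectors specially can be seen in a small sketch: because softmax outputs are bounded in [0, 1], a fixed scale of 1/255 maps them onto uint8 without a calibration pass. This is only an illustration of the underlying observation; the paper's pseudo-softmax and block-wise variants use their own integer approximations.

```python
import numpy as np

def quantize_softmax_uint8(probs: np.ndarray):
    """Fixed-scale quantization of softmax outputs to uint8."""
    scale = 1.0 / 255.0
    q = np.clip(np.round(probs / scale), 0, 255).astype(np.uint8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

logits = np.random.randn(4, 16).astype(np.float32)
probs = np.exp(logits) / np.exp(logits).sum(-1, keepdims=True)
q, s = quantize_softmax_uint8(probs)
print(np.abs(dequantize(q, s) - probs).max())  # worst-case error is at most scale/2
```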
Abstract:

LVLMs (Large Vision-Language Models) represent a significant advancement in the intersection of natural language processing and computer vision. By integrating pre-trained visual encoders, vision-language adapters, and large language models, LVLMs can understand both visual and textual information, and generate responses in natural language, making them suitable for a range of downstream vision-language tasks such as image captioning and visual question answering. However, these models commonly exhibit hallucinations — generating inaccurate perceptions of image contents. Such hallucinations significantly limit the application of LVLMs in high-stakes domains like medical image diagnosis and autonomous driving. This survey aims to systematically organize and analyze the causes, evaluations, and mitigation strategies of hallucinations to guide research in the field and enhance the safety and reliability of LVLMs in practical applications. It begins with an introduction to the basic concepts of LVLMs and the definition and classification of hallucinations within them. It then explores the causes of hallucinations from four perspectives: training data, training task, visual encoding, and text generation, while also discussing the interactions among these factors. Following this, it discusses mainstream benchmarks for assessing LVLM hallucinations in terms of task setting, data construction, and assessment metrics. Additionally, it examines hallucination mitigating techniques across five aspects: training data, visual perception, training strategy, model inference, and post-hoc corrections. Finally, the review provides directions for future research in the areas of cause analysis, evaluation, and mitigation of hallucinations in LVLMs.

Abstract:

The increasing capabilities of large language models (LLMs) in knowledge storage have underscored their potential utility as knowledge bases. However, any given prompt can merely offer a lower-bound estimate of the knowledge encompassed by the language model. Prior prompt learning methods in the context of Language Models as Knowledge Bases (LMs-as-KBs) have overlooked the influence of query style. We unveil a significant finding: there are learnable preferences within LLMs pertaining to query style. Leveraging this model characteristic, we introduce the Adaptive query style transfer (ARES) method to improve the performance of LMs-as-KBs by adapting to the LLM's preferences. ARES first generates a candidate set of queries through paraphrasing so as to cover various expression styles. Subsequently, an evaluator is trained to learn and discern the LLM's preferences for query styles; it then scores the candidate set and selects the potentially optimal query. Experiments conducted across multiple datasets convincingly demonstrate the efficacy of our approach in enhancing question answering accuracy in LMs-as-KBs scenarios. Furthermore, incremental comparisons with the original model and three baseline methods show average improvements of 2.26%, 1.68%, 1.19%, and 1.17%, respectively, indicating that ARES can be effectively combined with other approaches, leading to enhanced performance across different dimensions.

Abstract:

In the era of large models, training and inference require the support of large-scale computing resources, and anomaly detection over the monitoring data of these resources can effectively safeguard training and inference. As the parameters of large models increase, the scale of the computing resources they use grows, and the multiple types of metrics reflecting the operating state of these resources exhibit increasingly complex temporal changes. Existing multivariate time series anomaly detection methods typically use a preset window size to slide over and slice the multivariate time series data. However, a unified window that ignores the periodic characteristics of different dimensions may truncate the complete periodic patterns of some dimensions, hindering the anomaly detection model from learning the normal patterns of the multivariate time series and resulting in poor anomaly detection performance. To address this issue, this study proposes SELAD, an unsupervised multivariate time series anomaly detection method based on ensemble learning with multi-window extraction. Specifically, the method first extracts the periodic pattern of each dimension of the multivariate time series using a Fourier frequency method, and then performs multi-window extraction to preserve the complete periodic pattern of each dimension. During training, the large model's vast parameter capacity alleviates the memory bottleneck that traditional models face as the sliding window grows, which would otherwise degrade learning. Subsequently, a mixture-of-experts (MoE) design feeds the time series data from the multiple extracted windows into an ensemble learning framework that combines large models and LSTM models for training, in order to learn the normal temporal patterns of each dimension. Finally, anomaly detection is performed based on reconstruction scores. Experiments on four real-world multivariate time series datasets demonstrate that SELAD improves the F1 score by 17.87% to 90.77% compared with existing methods.
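The per-dimension period extraction can be sketched with a simple Fourier-spectrum heuristic followed by period-aligned slicing; SELAD's actual extraction may add smoothing or top-k frequency selection, so treat the following as an assumed, minimal version.

```python
import numpy as np

def dominant_period(series: np.ndarray) -> int:
    """Estimate the dominant period of one metric dimension from its Fourier spectrum."""
    series = series - series.mean()
    spectrum = np.abs(np.fft.rfft(series))
    spectrum[0] = 0.0                        # drop the DC component
    freq = np.argmax(spectrum)               # strongest frequency bin
    return max(2, len(series) // max(freq, 1))

def slice_by_period(series: np.ndarray, period: int) -> np.ndarray:
    """Cut the series into non-overlapping windows of one full period each."""
    n = (len(series) // period) * period
    return series[:n].reshape(-1, period)

t = np.arange(2000)
metric = np.sin(2 * np.pi * t / 100) + 0.1 * np.random.randn(2000)
p = dominant_period(metric)
print(p, slice_by_period(metric, p).shape)   # ~100, (20, 100)
```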

Abstract:

Named Entity Recognition (NER) is a traditional task in natural language processing. The mainstream approach for nested NER is the span-based classification method, which generally concatenates representations of entity boundaries to obtain spans. However, long entities tend to weaken the semantic association between the two entity boundaries, and single-scale spans cannot fully capture how entities behave in different contexts. To address this, an entity semantic enhancement method based on multi-scale box fusion is proposed. The method represents spans as boxes carrying boundary position information. First, multi-scale boxes are obtained by fusing features of different scales, which enhances the semantic features in the boxes and makes them more context-aware. Then, the boundary positions of the boxes are further refined by a position-weighted attention mechanism to make the box information more accurate. Finally, the entity category of each box and its offset relative to the true entity are predicted simultaneously to effectively support the recognition and localization of nested named entities. The method achieves new state-of-the-art F1 scores of 88.63% on the ACE04 English, 88.53% on the ACE05 English, and 73.86% on the Weibo Chinese datasets, demonstrating its effectiveness.

Abstract:

Cancer is an exceptionally complex and highly heterogeneous disease that changes dynamically; its occurrence and development are accompanied by a large number of gene mutations and functional disorders. Identifying biomarkers related to cancer stages is crucial for understanding the pathogenic and developmental mechanisms of cancer. However, existing research on cancer biomarker identification often treats individual genes as isolated nodes and usually focuses only on the binary classification of cancer, ignoring the significant differences among different cancer stages. To overcome these issues, this study first constructs a regression residual network (RRN) for each cancer stage and then analyzes the nodes and edges of the RRN at each stage. Multi-source data mining is then conducted on biological pathways, and the entire process of cancer evolution is characterized along the stages. In this way, biomarkers for both binary classification and multi-stage classification of cancer are obtained, and they are validated on GSE10072 and GSE42171, respectively. The experimental results show that the obtained biomarkers ALDOA and NME1 achieve accuracy competitive with existing methods for lung adenocarcinoma while using only two genes, and the biomarker consisting of 17 edges improves multi-stage classification accuracy by 14.86% compared with existing methods.

Abstract:

Researchers have proposed and implemented many distributed runtime systems to help users build distributed applications. These runtimes are usually only good at processing certain types of workloads in specific scenarios. However, in the things-edge-cloud scenario, the components of things-edge-cloud collaborative applications have heterogeneous quality requirements, runtime environments, and communication protocols, making it difficult to build high-performance and robust things-edge-cloud collaborative applications with a single runtime. Deploying application components independently to different runtimes increases the difficulty of application management and lacks unified performance and fault-tolerance support. The Grip system is proposed to address this problem. Grip supports unified access to and utilization of multiple runtimes by introducing a virtual runtime adapter layer and a virtual runtime API layer; these two virtual layers specify the interfaces that must be implemented when a runtime is integrated. Grip supports unified management of multi-runtime applications through the Griplet and Grip abstractions, and it uses ownership methods to provide mechanisms for user-defined fault tolerance and scaling policies. Experiments show that in the things-edge-cloud environment, compared with using a single runtime such as Ray, Docker, or Kubernetes, the Grip system reduces average end-to-end latency by 31% to 77%, the 90th percentile tail latency by 25% to 78%, and the 95th percentile tail latency by 22% to 78%.

Abstract:

Sequential recommendation systems aim to predict users' next actions based on preferences learned from their historical behaviors. Fundamental challenges remain for sequential recommenders. First, with the popularization of online services, recommenders need to serve both warm-start and cold-start users; however, most existing models that depend on user-item interactions lose their merits because sequential dependencies are hard to learn from limited interactions. Second, users' behaviors in their historical sequences are often implicit and complex due to the objective variability of reality and the subjective randomness of users' intentions, making it difficult to capture dynamic transition patterns from these user-item interactions. In this work, we propose a graph-based interpolation enhanced sequential recommender with a deformable convolutional network (GISDCN). For cold-start users, we reconstruct item sequences into a graph to infer users' possible preferences. To capture the complex sequential dependencies, we employ the deformable convolutional network to generate more robust and flexible filters. Finally, we conduct comprehensive experiments to verify the effectiveness of our model. The experimental results demonstrate that GISDCN outperforms most state-of-the-art models under cold-start conditions.

Abstract:

Fine-tuning-based methods are the mainstream approach for few-shot object detection, but they have serious shortcomings: 1) the extreme lack of new-class samples leads to a biased distribution of new-class features; 2) because the robustness assumption made during fine-tuning does not necessarily hold for new-class samples, the feature extraction network cannot extract unbiased new-class features. To deal with these two issues, a three-stage fine-tuning few-shot object detection method based on cross-module knowledge distillation is proposed. Firstly, a feature distribution calibration strategy is designed to calibrate the feature distribution of new classes during the two-step fine-tuning process. Secondly, the proposed first-bias reduction strategy effectively alleviates the biased estimation of weight parameters during linear probing (the first stage of fine-tuning), and the proposed inverse first-bias reduction effectively alleviates the over-fitting of the feature extraction network during overall fine-tuning (the second stage of fine-tuning). Finally, the proposed cross-module knowledge distillation strategy guides the shallow modules of the model to learn deep features and capture more discriminative new-class features. Extensive experimental results show that the proposed three-stage fine-tuning method effectively improves the accuracy and robustness of few-shot object detection.

Abstract:

Pretraining data detection aims to determine whether a piece of text belongs to the pretraining data of a large language model (LLM) when the model's pretraining data is not publicly disclosed; it can be used to audit whether pretraining data usage complies with legal regulations. Existing methods generally assume that an LLM tends to assign higher token probabilities to training texts than to non-training texts, and thus identify texts with high probabilities as training texts. However, due to the significant overlap in fragments between training and non-training texts, an LLM may also assign relatively high token probabilities to non-training texts, which makes existing methods prone to misclassifying non-training texts as training texts. Inspired by research on the memorization capabilities of LLMs, we propose a novel method to mitigate this issue: the token probabilities given the full context are compared with those given only a short-range context, and the contribution of the long-range context to the token probability increase is computed, with a higher contribution indicating a greater likelihood that the text is part of the pretraining data. The key idea is that when an LLM predicts token probabilities for training texts, the contribution of the long-range context to the probability increase is greater than for non-training texts. Experimental results on multiple datasets demonstrate the effectiveness of the proposed method. The code is available at https://github.com/zhang-wei-chao/Long-Range-Context-for-PDD.
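A hedged sketch of the core signal, using Hugging Face Transformers: compare each token's log-probability under the full preceding context with its log-probability under only a short-range context. The model name "gpt2" and the short-context length are placeholders, and the paper's scoring and aggregation differ from this simplified average.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def long_range_contribution(text: str, model_name: str = "gpt2", short_len: int = 8):
    tok = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name).eval()
    ids = tok(text, return_tensors="pt").input_ids[0]
    contribs = []
    with torch.no_grad():
        for i in range(short_len, len(ids)):
            full = ids[:i].unsqueeze(0)                 # full preceding context
            short = ids[i - short_len:i].unsqueeze(0)   # short-range context only
            p_full = torch.log_softmax(model(full).logits[0, -1], -1)[ids[i]]
            p_short = torch.log_softmax(model(short).logits[0, -1], -1)[ids[i]]
            contribs.append((p_full - p_short).item())
    # A larger average gain from long-range context suggests memorization,
    # i.e. the text is more likely part of the pretraining data.
    return sum(contribs) / len(contribs)
```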

Abstract:

Federated learning, as an emerging distributed neural network training paradigm in edge computing, faces a key challenge of data heterogeneity across clients, which significantly impacts the performance of the global model. While clustering federated learning, a promising solution, has shown improvements in model accuracy, its effectiveness is limited by insufficient utilization of local data distribution statistics. To address this issue, this paper proposes a novel federated learning framework named HS-CFA (Hierarchical Sinkhorn Distance-Based Clustering Federated Algorithm), which is designed to optimize the aggregation weights of local models under heterogeneous data environments, thereby enhancing the performance of the global model. HS-CFA utilizes the entropy-regularized optimal transport cost to capture the characteristics of local data distributions and dynamically adjusts the aggregation weights of local models with a hierarchical clustering strategy. Specifically, HS-CFA employs the Sinkhorn distance as a lightweight optimal transport cost metric to measure distributional dissimilarities across clients. Furthermore, it adopts a hierarchical two-layer clustering mechanism that combines density-based spatial clustering and average aggregation during the server training phase, facilitating the dynamic and adaptive adjustment of local model aggregation weights. Experimental results on multiple benchmark datasets demonstrate that HS-CFA significantly enhances the accuracy and robustness of the global model in scenarios with highly heterogeneous distribution settings.
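
For readers unfamiliar with the entropy-regularized optimal transport cost mentioned above, the following Python sketch shows a textbook Sinkhorn iteration between two discrete client "signatures"; the class-histogram inputs, the 0/1 ground cost, and the regularization strength are illustrative assumptions, not HS-CFA's actual settings.

import numpy as np

def sinkhorn_distance(p, q, cost, reg=0.1, n_iters=200):
    """Entropy-regularized OT cost between two discrete distributions p and q."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    K = np.exp(-cost / reg)              # Gibbs kernel
    u = np.ones_like(p)
    v = np.ones_like(q)
    for _ in range(n_iters):             # Sinkhorn scaling iterations
        v = q / (K.T @ u)
        u = p / (K @ v)
    transport = u[:, None] * K * v[None, :]
    return float(np.sum(transport * cost))

# Toy usage: class-histogram signatures of two clients with a 0/1 ground cost.
p = np.array([0.7, 0.2, 0.1])
q = np.array([0.1, 0.3, 0.6])
cost = 1.0 - np.eye(3)
d = sinkhorn_distance(p, q, cost)        # fed into the hierarchical clustering of clients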

Abstract:

Transformer has gradually become the preferred solution for computer vision tasks, which has promoted the development of its interpretability methods. Traditional interpretation methods mostly use the perturbation mask generated by the Transformer encoder’s final layer to generate an interpretable map. However, these methods ignore uncertain information on the mask and the information loss in the upsampling and downsampling processes, which can result in rough and incomplete positioning of the object area. To overcome the mentioned problems, a Transformer explanation method based on sequential three-way and attention fusion (SAF-Explainer) is proposed. SAF-Explainer mainly includes the sequential three-way mask (S3WM) module and attention fusion (AF) module. The S3WM module processes the mask by applying strict threshold conditions to avoid the uncertainty information in the mask from damaging the interpretation results, so as to effectively locate the object position. Subsequently, the AF module uses attention matrix aggregation to generate a relationship matrix for cross-layer information interaction, which is used to optimize the detailed information in the interpretation results and generate clear and complete interpretation results. To verify the effectiveness of the proposed SAF-Explainer, comparative experiments were conducted on three natural image datasets and one medical image dataset. The results showed that SAF-Explainer has better explainability. This work advances visual explanation techniques by providing more accurate and clinically relevant interpretability for Transformer-based vision systems, particularly in medical diagnostic applications where precise region identification is crucial.

Abstract:

At present, end-to-end speech neural codecs, represented by SoundStream, have demonstrated outstanding performance in reconstructed speech quality. However, these methods require extensive convolutional computations, leading to lengthy encoding times. To address this issue, this paper introduces a neural speech codec method based on Mel spectrogram and squeezed excitation-weighted quantization. This method aims to maintain high speech perceptual quality while reducing computational costs and increasing operational speed, thereby minimizing latency. Specifically, this paper utilizes Mel spectrogram features as input, capitalizes on the temporal compression properties during Mel spectrogram extraction, and combines a lower-layer convolutional encoder to simplify the computation process. Additionally, inspired by squeezed excitation network concepts, this paper extracts excitation weights for each dimension of the output features from the encoder’s final layer. These weights are used as the weighting coefficients for each dimension of the compressed features when calculating codebook distances in the quantizer, thus enabling the learning of correlations between features and enhancing the performance of quantization. Experimental results on the LibriTTS and VCTK datasets indicate that this method significantly enhances the computational speed of the encoder and improves the reconstructed speech quality at lower bit rates (≤3 Kbps). For instance, at a bitrate of 1.5 Kbps, the Real-Time Factor (RTF) of encoding computations can increase by up to 4.6 times. Regarding perceptual quality, at a bitrate of 0.75 Kbps, objective metrics such as Short-Time Objective Intelligibility (STOI) and Virtual Speech Quality Objective Listener (VISQOL) show an average improvement of 8.72% compared to the baseline. Additionally, ablation studies not only demonstrate that the optimization effect of compressed excitation weight methods is inversely correlated with bit rate, but also reveal that, compared to the periodic activation function Snake, the Relu activation function can significantly speed up processing while maintaining comparable speech perceptual quality.
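
The squeeze-excitation-style weighting of codebook distances can be pictured with the Python sketch below; the fixed sigmoid "excitation", the feature dimensions, and the random codebook are made-up stand-ins for the learned components described in the paper.

import numpy as np

def se_weights(features):
    """Squeeze-excitation-style weights per feature dimension (illustrative:
    a global-average 'squeeze' followed by a sigmoid 'excitation'; the paper's
    learned excitation layers are replaced by this fixed stand-in)."""
    squeezed = features.mean(axis=0)                  # (D,) global statistics
    return 1.0 / (1.0 + np.exp(-squeezed))            # (D,) weights in (0, 1)

def weighted_quantize(frame, codebook, w):
    """Pick the codeword minimizing a per-dimension weighted squared distance."""
    d = ((codebook - frame) ** 2 * w).sum(axis=1)     # (K,) weighted distances
    return int(np.argmin(d))

rng = np.random.default_rng(0)
feats = rng.normal(size=(50, 8))       # encoder outputs: 50 frames, 8 dimensions
codebook = rng.normal(size=(256, 8))   # 256 codewords
w = se_weights(feats)
indices = [weighted_quantize(f, codebook, w) for f in feats]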

Abstract:

Collaborative filtering-based recommender systems that rely only on single-behavior data often encounter serious sparsity problems in practical applications, resulting in poor performance. Multi-behavior recommendation (MBR) is a method that seeks to learn user preferences, represented as vector embeddings, from auxiliary behavior interaction data. By leveraging these preferences for target-behavior recommendation, MBR can mitigate the data sparsity challenge and enhance predictive precision. This research introduces MB-HGCN, a novel recommendation method designed to exploit multi-behavior data. The method leverages a hierarchical graph convolutional network to learn user and item embeddings from a coarse-grained global level to a fine-grained behavior-specific level. Our method learns global embeddings from a unified homogeneous graph constructed from the interactions of all behaviors, and these global embeddings are then used to initialize the behavior-specific embedding learning on each behavior graph. Moreover, we emphasize the distinct roles of the user and item behavior-specific embeddings and design two simple-yet-effective strategies to aggregate the behavior-specific embeddings for users and items, respectively. Finally, we adopt multi-task learning for optimization. Extensive experimental results on three real-world benchmark datasets show that our MB-HGCN method substantially outperforms state-of-the-art methods, achieving relative improvements of 73.93% and 74.21% in HR@10 and NDCG@10, respectively, on the Tmall dataset.
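
The coarse-to-fine idea (global embeddings initializing behavior-specific learning) might be pictured with the simplified Python sketch below, which substitutes a LightGCN-style propagation for MB-HGCN's actual layers; the graphs, sizes, and behavior names are toy assumptions.

import numpy as np

def propagate(adj, emb, n_layers=2):
    """Normalized-adjacency propagation (LightGCN-style), used here only to
    illustrate coarse-to-fine embedding learning; MB-HGCN's exact layers differ."""
    deg = adj.sum(axis=1)
    d_inv_sqrt = np.where(deg > 0, deg ** -0.5, 0.0)
    norm_adj = d_inv_sqrt[:, None] * adj * d_inv_sqrt[None, :]
    out, layer = emb.copy(), emb.copy()
    for _ in range(n_layers):
        layer = norm_adj @ layer
        out += layer
    return out / (n_layers + 1)           # average over layers

# The global graph over all behaviors initializes each behavior-specific graph.
rng = np.random.default_rng(0)
n_nodes, dim = 6, 4                                        # users and items as one node set
global_adj = rng.integers(0, 2, (n_nodes, n_nodes)).astype(float)
behavior_adjs = {"view": global_adj * rng.integers(0, 2, global_adj.shape),
                 "buy":  global_adj * rng.integers(0, 2, global_adj.shape)}
emb0 = rng.normal(size=(n_nodes, dim))
global_emb = propagate(global_adj, emb0)
behavior_embs = {b: propagate(a, global_emb) for b, a in behavior_adjs.items()}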

Abstract:

The Industrial Internet of Things (IIoT) faces increasingly severe security threats, and traditional perimeter-based security models are no longer adequate to address evolving and complex demands. Zero trust, an emerging security model centered on the core principle of “never trust, always verify,” has gradually gained attention. However, the research and application of zero trust in the IIoT domain are still in their early stages, necessitating more comprehensive and systematic exploration. This paper provides a systematic review of the development and applications of zero trust in the industrial sector, with a focus on analyzing its core technologies and practical scenarios while identifying current research trends and future directions. The paper introduces the basic concepts and principles of industrial zero trust, establishing a theoretical foundation for subsequent discussions. It then systematically outlines the migration strategies and evaluation methods for industrial zero trust architectures and summarizes key technologies, including authentication, software-defined perimeters, micro-segmentation, secure communication channels, and trust evaluation, collectively forming the core supporting framework of industrial zero trust. Furthermore, this paper delves into the critical role of access control within the zero trust model and its value in fine-grained permission management. By examining typical IIoT application scenarios, the paper further explores the practical advantages of zero trust in complex environments. Finally, it identifies existing challenges in industrial zero trust and discusses potential future development directions.

Abstract:

The multi-level blockchain architecture organizes multiple blockchains into a tree, in which each blockchain can control and manage part of the functions and on-chain data of the next-level blockchains to which it is connected by cross-chain technology. However, cross-chain asset transfer under this architecture is a multi-hop cross-chain problem: the evidence of successful execution of a cross-chain transaction needs to be transmitted and verified over multiple hops on the path from the source chain to the target chain, resulting in longer execution latency of the cross-chain transaction as well as higher evidence transmission and verification overhead. Therefore, this paper proposes a lightweight and efficiently verifiable cross-chain asset transfer method for the multi-level blockchain architecture, which introduces a top-level witness chain connecting each multi-level architecture and deploys a witness contract on each chain, so that the parent chains of the source and target chains in a cross-chain transaction act as witness chains to drive the completion of the cross-chain transaction. This paper also introduces a cross-chain transaction verification evidence based on the Verkle tree: the method organizes the cross-chain transaction information to be processed in the same block into a Verkle tree employing KZG polynomial commitments, adds the KZG commitment and the proof data into the evidence, and proves the execution state of the cross-chain transaction by verifying the evidence, so as to optimize the transmission and verification of the evidence. Theoretical analysis and experiments on a prototype of the method show that, compared with the scheme using SPV, the method reduces the execution latency of cross-chain transactions and the evidence verification overhead without increasing the evidence transmission overhead, making it lightweight and efficiently verifiable.

Abstract:

The rise and development of large-scale neuromorphic platforms require the network-on-chip to support efficient data transmission mechanisms. Although many efforts have been made to develop high-performance topology architectures and routing schemes, they still suffer from a single transmission mode or poor scalability, leaving them inefficient for neuromorphic computing. Inspired by the small-world properties of human brain networks, this paper proposes an efficient region-broadcast (ReB) routing scheme to support unicast, multicast, and broadcast transmission modes. Besides, a synaptic connection indexing method is deployed to accommodate the region-broadcast routing scheme and support this hybrid-mode packet transmission. This method replaces the traditional multicast routing table, effectively improving network scalability and reducing power consumption. Experimental results show that, compared to existing work, the ReB routing scheme reduces the peak spike traffic and the standard deviation of link load by 11.5% and 20.4%, respectively. The ReB scheme brings improvements in latency, throughput, and energy under the validation of synthetic traffic, spiking neural network applications, and brain cortical networks. Various synthetic traffic patterns are used in the experiments, and the datasets used in the spiking neural network applications include MNIST, QTDB, Ev-object, and DVS-Gesture. Finally, the proposed ReB router achieves an excellent bandwidth of 0.24 spike/cycle and consumes an area of only 0.014 mm².

Abstract:

In the field of complex action recognition in videos, the structural design of the model plays a crucial role in its final performance. However, manually designed network structures often rely heavily on the knowledge and experience of researchers. Therefore, neural architecture search (NAS) has received widespread attention from researchers in the field of image processing because of its automated network structure design. Currently, neural architecture search has achieved tremendous development in the image field. Some NAS methods even reduce the number of graphics processing unit (GPU) days required for automated model design to single digits, and the searched model structures show strong competitive potential. This encourages us to extend automated model structure design to the video domain, which, however, faces two serious challenges: 1) how to capture the long-range contextual temporal associations in videos as much as possible; 2) how to reduce the computational surge caused by 3D convolution as much as possible. To address the above challenges, we propose a novel neural architecture search on temporal convolutions for complex action recognition (NAS-TC). NAS-TC is a two-stage framework. In the first stage, we use a classic convolutional neural network (CNN) as the backbone network to complete the computationally intensive feature extraction task. In the second stage, we propose a neural architecture search temporal convolutional (NAS-TC) layer to accomplish relatively lightweight long-range temporal model design and information extraction. This ensures that our method has a more reasonable parameter allocation and can handle minute-level videos. Finally, the proposed method achieves an average performance gain of 2.3% mAP on three complex action recognition benchmark datasets compared with similar methods, and the number of parameters is reduced by 28.5%.

Abstract:

Routing is an essential component in the design of printed circuit boards (PCBs). Existing PCB designs mostly rely on the processing results of electronic design automation tools, and traditional automatic routing research often focuses only on general bus routing without considering bus groups that are determined during the routing process. Due to the absence of general bus grouping, one group may contain more nets than the other groups, so that it occupies larger line width and line clearance than the other bus groups in the original bus routing, thereby posing new challenges to effective and efficient routing. To overcome this drawback, this paper studies PCB group routing and proposes a group routing algorithm based on a weighted directed graph. A Hanan grid graph is constructed, containing only merged edges and their adjacency relationships. Following this, a weighted directed graph is built from the merged-edge information to represent the routing resources on the circuit board. For routing planning, a heuristic search algorithm equipped with multi-wire avoidance features is utilized. The routing situations are then classified into several potential scenarios, each considered separately, to accomplish detailed routing and obtain the final group routing result. Experimental results demonstrate that the algorithm consistently achieves 100% routability on previously tested complex industrial examples without violating the design rule constraints of any benchmark industrial PCB case.
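
The heuristic search over a weighted directed graph of routing resources can be sketched as follows in Python; this is a generic best-first search, and the paper's multi-wire avoidance features and design-rule handling are not modeled here.

import heapq

def route(graph, start, goal, heuristic=lambda n: 0.0):
    """Best-first (A*-style) search over a weighted directed graph.

    graph: {node: [(neighbor, edge_cost), ...]} built from merged-edge resources.
    heuristic: optional lower-bound estimate to the goal (0 gives plain Dijkstra).
    Returns (cost, path), or (inf, []) when the connection is unroutable.
    """
    frontier = [(heuristic(start), 0.0, start, [start])]
    best = {}
    while frontier:
        _, g, node, path = heapq.heappop(frontier)
        if node == goal:
            return g, path
        if best.get(node, float("inf")) <= g:
            continue                       # already reached this node more cheaply
        best[node] = g
        for nxt, w in graph.get(node, []):
            heapq.heappush(frontier, (g + w + heuristic(nxt), g + w, nxt, path + [nxt]))
    return float("inf"), []

# Toy resource graph (node names and edge costs are made up for illustration).
g = {"A": [("B", 1.0), ("C", 2.5)], "B": [("D", 1.2)], "C": [("D", 0.8)], "D": []}
cost, path = route(g, "A", "D")            # -> (2.2, ['A', 'B', 'D'])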

Abstract:

Instruction-Level Parallelism (ILP) is a classic challenge in processor architecture research. Domain-specific architectures, such as the Ascend processor, expose more pipeline details to upper-layer software, and compilers or programmers explicitly control the synchronization between pipelines to optimize ILP. However, the physical synchronization resources between pipelines are limited, which restricts further ILP improvement. To address this issue, a high-performance automatic synchronization primitive insertion method for the Ascend processor is proposed. By introducing the abstraction of "virtual synchronization resources," this method decouples the insertion of synchronization primitives from the selection of physical synchronization resources. Firstly, a heuristic algorithm is proposed to insert virtual synchronization primitives into complex control flow graphs. Then, a large number of virtual synchronization resources are mapped to an extremely limited number of physical synchronization resources through virtual synchronization primitive merging and other techniques. At the same time, redundant synchronization primitives in the program are removed based on the partial-order relationship between instructions, while program correctness and stringent hardware resource constraints are guaranteed. Experiments on the Ascend 910A platform with instruction-level and operator-level benchmark programs show that programs with automatically inserted synchronization primitives achieve performance on par with those whose primitives are manually inserted by expert programmers, while correctness is ensured.

Abstract:

The softwareization of network functions (NFs) provides flexibility for implementing and deploying new network applications. However, due to its more complex program structure and running environment compared with NF hardware, NF software introduces various performance issues, such as short-term throughput anomalies and long-tail delays, which degrade user experience. Once an NF performance problem occurs, it is necessary to quickly locate the problematic modules and determine the cause of the problem through performance measurement. Facing NFs' complex operating environments, ever-expanding code size, and diverse root causes of problems, coarse-grained performance measurement cannot meet the requirements of problem localization and analysis; more efficient fine-grained NF performance measurement is necessary. For the two widely used types of NF performance measurement methods, sampling-based and instrumentation-based, we first show through actual measurement analysis that the sampling-based method is not suitable for fine-grained NF performance measurement, while the instrumentation-based method generates a large amount of additional measurement overhead that affects the measurement results. To this end, we propose a function-level dynamic instrumentation method that combines dynamic library piling and function-level fast breakpoints. Compared with static instrumentation, dynamic instrumentation can be executed on demand at runtime and is more suitable for use in production environments. Our dynamic instrumentation method reduces the instrumentation overhead by an average of 70% compared with baseline fast breakpoints. On this basis, we design and implement LProfile, a packet-level NF performance measurement method based on lightweight probes and storage optimization. Compared with TAU, a general-purpose performance measurement tool, LProfile reduces the single-point measurement overhead by 82%.

Abstract:

Regarded as a breaker of Moore’s law, Chiplet technology carries high expectations in the integrated circuit industry. Chiplet technology combines multiple small chips with specific functions into a Chiplet-integrated chip through high-speed interconnection technology, and its core is the Chiplet interconnection technology that enables Chiplet combination and expansion. This paper analyzes and discusses Chiplet interconnection protocols, interconnection architectures, typical interconnection Chiplets, and testability design based on the interconnection Chiplet. Firstly, this paper provides a detailed comparison and analysis of domestic and foreign Chiplet interconnection protocols, and presents the layers and functions of each protocol. Secondly, it introduces three typical Chiplet interconnection architectures and analyzes the characteristics and advantages of each architecture. Afterwards, Chiplet fault-tolerance mechanisms are introduced, including fault-tolerant encoding of interconnection interfaces, fault-tolerant topologies, and fault-tolerant routing. Then, three types of interconnection Chiplet design schemes are presented, including programmable interconnection Chiplets, path-programmable interconnection Chiplets, and fully customized interconnection Chiplets. Finally, a testability design testing scheme based on the interconnection Chiplet is introduced. This paper focuses on Chiplet interconnection and aims to help readers gain a systematic understanding of Chiplet interconnection technology.

Abstract:

In recent years, Large Language Models (LLMs) have emerged as a critical branch of deep learning network technology, achieving a series of breakthrough accomplishments in the field of Natural Language Processing (NLP) and gaining widespread adoption. However, throughout their entire lifecycle, including pre-training, fine-tuning, and actual deployment, a variety of security threats and risks of privacy breaches have been discovered, drawing increasing attention from both academia and industry. Following the development of the paradigms for handling natural language processing tasks with large language models, namely the pre-training and fine-tuning paradigm, the pre-training and prompt learning paradigm, and the pre-training and instruction-tuning paradigm, this article outlines conventional security threats against large language models, specifically representative studies on three types of traditional adversarial attacks (adversarial example attacks, backdoor attacks, and poisoning attacks). It then summarizes some of the novel security threats revealed by recent research, followed by a discussion of the privacy risks of large language models and the progress of research on them. The content helps researchers and deployers of large language models identify, prevent, and mitigate these threats and risks during model design, training, and application, while also achieving a balance between model performance, security, and privacy protection.

Abstract:

Malicious domain name detection is a critical component of network intrusion detection systems, enabling the rapid identification of network attacks through domain name requests. Machine learning methods overcome the limitations of blacklist mechanisms and improve detection accuracy. However, challenges such as the high variability of domain name structures and the complexity of real-world environments lead to low detection efficiency and poor robustness in practical applications. To address these issues, a malicious domain name detection technology based on domain name semantic graph learning is proposed, leveraging semantic graph association analysis for efficient detection. Specifically, 12 months of domain request data from China Science and Technology Network is first collected, encompassing 3.33 billion access records, including more than 6.5 million malicious domain name entries across 284 attack types. Semantic analysis reveals significant differentiation between domain categories, yet considerable feature overlap in certain regions degrades classifier performance. To tackle this, a domain association graph model based on character-level semantic similarity is proposed. By integrating features of neighboring domains, the model enhances semantic representations in overlapping regions, thereby improving detection performance. The method includes filtering noise characters through structural similarity analysis, constructing a dynamic domain semantic graph using an online aggregation algorithm, and training a multi-head attention-based message-passing graph model with node-degree-weighted samples. Finally, a multi-layer neural network classifier is employed for malicious domain detection. Experimental results demonstrate that the proposed method achieves an average precision rate of 96% and a recall rate of 97% on the dataset of different types of malicious domain names. Furthermore, the model exhibits strong online adaptability, achieving high detection rate and robustness.
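
A minimal Python sketch of the character-level similarity graph idea is given below; the n-gram size, similarity threshold, and neighbor-feature smoothing are illustrative assumptions rather than the paper's exact construction.

from itertools import combinations

def char_ngrams(domain, n=3):
    """Character n-grams of the registrable label (TLD dropped for simplicity)."""
    s = domain.split(".")[0]
    return {s[i:i + n] for i in range(max(len(s) - n + 1, 1))}

def jaccard(a, b):
    return len(a & b) / len(a | b) if a | b else 0.0

def build_graph(domains, threshold=0.3):
    """Connect domains whose character n-gram Jaccard similarity is high."""
    grams = {d: char_ngrams(d) for d in domains}
    edges = {d: set() for d in domains}
    for d1, d2 in combinations(domains, 2):
        if jaccard(grams[d1], grams[d2]) >= threshold:
            edges[d1].add(d2)
            edges[d2].add(d1)
    return edges

def smooth(features, edges, alpha=0.5):
    """Mix each domain's feature vector with the mean of its neighbors' vectors,
    mimicking the idea of enhancing overlapping regions with neighborhood features."""
    out = {}
    for d, vec in features.items():
        nbrs = edges[d]
        if not nbrs:
            out[d] = vec
            continue
        mean = [sum(features[n][i] for n in nbrs) / len(nbrs) for i in range(len(vec))]
        out[d] = [alpha * v + (1 - alpha) * m for v, m in zip(vec, mean)]
    return out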

Abstract:

Data deduplication is a vital technology for efficiently managing big data, widely adopted in cloud storage systems to reduce redundancy and save space. To integrate deduplication with encryption, convergent encryption has become a common approach. This method allows for the encryption of data while still enabling deduplication by producing the same ciphertext for identical plaintexts. However, cloud service providers' outsourcing models and the deterministic nature of convergent encryption can introduce data security issues. The encryption patterns of data can become predictable, potentially exposing sensitive information to attackers, which may create serious security implications. As a result, encrypted data deduplication has emerged as an important research topic in cloud storage security. This paper firstly introduces the concept of data deduplication, encrypted deduplication algorithms, and discusses the security challenges associated with encrypting and deduplicating data in cloud storage. It then reviews the current research status from both attack and defense perspectives, covering three main types of attacks: brute force attacks, which try to decrypt data through extensive guessing; frequency analysis attacks, which exploit frequency characteristics in ciphertexts; and side-channel attacks, which leverage information from response or traffic characteristics. For each attack type, representative defense strategies are analyzed along with their strengths and weaknesses. Finally, the paper highlights the challenges faced by existing encrypted data deduplication defenses and suggests future research directions aimed at improving these techniques.
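
The convergent encryption idea mentioned above can be sketched as follows in Python (using the third-party cryptography package); the deterministic nonce derivation is an illustrative choice, and the resulting determinism is exactly the predictability weakness the abstract discusses. Real message-locked encryption schemes add further safeguards.

import hashlib
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

def _key_and_nonce(material: bytes):
    key = hashlib.sha256(material).digest()                 # 32-byte content-derived key
    nonce = hashlib.sha256(b"ce-nonce" + key).digest()[:12]  # deterministic nonce from the key
    return key, nonce

def convergent_encrypt(plaintext: bytes):
    """key = H(plaintext): identical plaintexts give identical ciphertexts,
    which is exactly what lets the server deduplicate encrypted data."""
    key, nonce = _key_and_nonce(plaintext)
    return AESGCM(key).encrypt(nonce, plaintext, None), key  # the owner keeps the key

def convergent_decrypt(ciphertext: bytes, key: bytes) -> bytes:
    nonce = hashlib.sha256(b"ce-nonce" + key).digest()[:12]
    return AESGCM(key).decrypt(nonce, ciphertext, None)

c1, k1 = convergent_encrypt(b"same block of data")
c2, k2 = convergent_encrypt(b"same block of data")
assert c1 == c2                       # duplicate ciphertexts: the server stores one copy
assert convergent_decrypt(c1, k1) == b"same block of data"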

Abstract:

As the scale of AI grows rapidly, errors in deep learning applications are also increasing. Existing popular deep learning frameworks are mostly built on the dynamically-typed language Python, which lacks type checking mechanisms. This leads to many errors that cannot be eliminated through type checking at the compilation stage. This paper proposes a strongly-typed functional programming style deep learning framework based on the theorem prover Coq. The framework features typed tensor structures and a powerful static type checking capability. The experimental results demonstrate that the framework can automatically, quickly, and effectively detect shape mismatch errors in deep learning models, and it has greater advantages in terms of speed and detection capability compared to other checking tools. Furthermore, this paper designs and implements a set of rewriting rules that translate functional programming models into C code, realizing the translation from functional neural network operator expressions to multi-core parallel OpenMP C code. According to the results of multiple sets of experiments, the C code for operators generated by this method is on par with manually written code. Furthermore, the speed of the generated neural network operator C code with multi-core parallel optimization has been improved by 4-10 times compared with the sequentially executed operator C code. Additionally, the generated C operators are highly secure and can effectively avoid common issues such as out-of-bounds indexing and memory allocation errors in manually written code.

Abstract:

Multimodal sentiment analysis aims to utilize multimodal data such as customer comments to identify users' sentiment tendencies. To realize cross-domain application in the presence of domain bias, unsupervised domain adaptation methods are commonly used. Nevertheless, this type of solution focuses on the extraction of domain-invariant features and neglects the significance of domain-specific features in the target domain. Thus, a meta-optimization based domain-invariant and domain-specific feature disentanglement network is proposed. First, by embedding adapters into a pre-trained large model and fine-tuning them, the image-text fused sentiment feature encoder is constructed. Then, a feature disentanglement module is built on the basis of a factorization operation, which utilizes domain adversary and domain classification, together with collaborative independence constraints, to achieve knowledge-transferable domain-invariant feature embedding while extracting domain-specific features to enhance the performance of sentiment classification in the target domain. To ensure the consistency of the overall optimization tendencies of feature disentanglement and sentiment classification, a meta-learning-based meta-optimization training strategy is put forward to synergistically optimize the sentiment analysis network. Comparative experiments on bidirectional sentiment transfer tasks constructed from the MVSA and Yelp datasets demonstrate that, compared with other advanced image-text sentiment transfer algorithms, the proposed algorithm achieves superior performance in terms of three consensus metrics: precision, recall, and F1 score.

Abstract:

To address the limitations in search expressiveness and the inadequacy of verification mechanisms in existing searchable encryption methods, this paper proposes a Verifiable Boolean Searchable Encryption scheme based on Blockchain Index (VBSE-BI). The scheme first constructs a security model supporting verifiable Boolean search and, based on this model, designs an incremental secure index construction method utilizing blockchain storage structures. This approach achieves efficient search while ensuring the tamper-proof nature of the index structure. Moreover, the scheme introduces an efficient dynamic update mechanism for the secure index, effectively avoiding the significant storage and update overhead caused by auxiliary update structures. To meet the integrity verification requirements of Boolean searches, the scheme defines the unforgeability of Boolean search results and proposes a Boolean operation integrity verification algorithm based on bilinear-map accumulators and the extended Euclidean algorithm. Security analysis demonstrates that the VBSE-BI scheme can resist dynamic chosen-keyword attacks in the random oracle model and satisfies unforgeability under the bilinear q-strong Diffie-Hellman assumption. Compared with similar schemes, VBSE-BI not only supports more expressive Boolean search statements but also significantly reduces the user's computational complexity to O(log n) (where n is the number of keywords). Experimental results show that, by optimizing the verification algorithm, the scheme keeps the user's verification time consistently low (1.0-1.8 s), accounting for only 9.98%-14.03% of the server-side computation time. These findings indicate that VBSE-BI is highly suitable for resource-constrained mobile devices, providing a solid theoretical foundation and efficiency guarantee for the practical application of searchable encryption.

Abstract:

Privacy auditing is a crucial issue in data governance, aiming to detect whether data privacy has been protected effectively. Typically, personal data are protected by perturbation or noise addition so as to satisfy differential privacy guarantees. Especially in machine learning scenarios, an increasing number of differential privacy algorithms have emerged, claiming relatively stringent levels of privacy protection. Although rigorous mathematical privacy proofs are given before such algorithms are released, the actual privacy effect in practice is hardly assured: owing to the complexity of differential privacy theory, the correctness of the proofs may not have been thoroughly examined, and imperceptible errors may occur during programming. All of these can undermine the claimed degree of privacy protection and leak additional privacy. To tackle this issue, privacy auditing for differential privacy algorithms has emerged. This technique aims to measure the actual degree of privacy protection of differential privacy algorithms, facilitating the discovery of mistakes and the improvement of existing differential privacy algorithms. This paper surveys the scenarios and methods of privacy auditing, summarizes the methods from three aspects (data construction, data measurement, and result quantification), and evaluates them through experiments. Finally, this work presents the challenges of privacy auditing and its future directions.
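
As one concrete example of the "result quantification" aspect, the Python sketch below turns the observed error rates of a distinguishing (membership-style) attack into an empirical lower bound on epsilon, in the style of attack-based bounds used in the auditing literature; the exact bound form and the use of point estimates are assumptions of this sketch, not a summary of any single surveyed method.

import math

def empirical_epsilon(fpr: float, fnr: float, delta: float = 1e-5) -> float:
    """Attack-based lower bound on epsilon used in DP auditing.

    An (epsilon, delta)-DP mechanism constrains any distinguishing attack to rates
    satisfying 1 - FNR <= e^eps * FPR + delta (and symmetrically), so observed rates
    imply eps >= ln((1 - delta - FNR) / FPR). In practice the rates come with
    confidence intervals (e.g. Clopper-Pearson); point estimates are used here for brevity.
    """
    candidates = []
    if fpr > 0 and 1 - delta - fnr > 0:
        candidates.append(math.log((1 - delta - fnr) / fpr))
    if fnr > 0 and 1 - delta - fpr > 0:
        candidates.append(math.log((1 - delta - fpr) / fnr))
    return max(candidates) if candidates else 0.0

# An attack that is right most of the time forces a large lower bound; if it exceeds
# the claimed epsilon, the implementation (or the proof) is suspect.
print(empirical_epsilon(fpr=0.02, fnr=0.10, delta=1e-5))   # about 3.8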

Abstract:

Knowledge graphs often face the challenge of incompleteness, which can be alleviated by completing missing information through link prediction tasks. However, most knowledge graph completion methods overly focus on extracting embedding features without sufficiently considering the complex semantics contained in the neighborhood information, global feature information, and directional feature information of the predicted nodes, making it difficult to accurately predict the missing information. This paper proposes a general representation learning semantic enhancement framework, ASFR, which utilizes an attention mechanism to extract local association information and structural features of the knowledge graph, and enhances existing knowledge graph representation learning models by incorporating positional information. By embedding these three types of additional knowledge graph information into the entity vectors of the knowledge graph, the quality of the knowledge graph representation vectors is improved. Comparative experiments are conducted against five different categories of classical methods, and the results indicate that this framework can effectively enhance the predictive capability of models, achieving an improvement of 6.89% on three public datasets.

Abstract:

With the rapid development of large-scale model technology, these models have exhibited remarkable performance in fields such as natural language processing and computer vision, becoming essential tools for addressing complex issues and drawing significant interest from both the scientific community and the industry. Nonetheless, current cloud-platform-based schemes for training and inference of large models face multiple challenges, including high expenses, restricted scalability, and information security risks. As the scale of model parameters expands continually, the need for low-cost, efficient training and inference methods grows ever more pressing. Carrying out collaborative training and inference of large models on edge devices can dramatically decrease latency and bandwidth demands, concurrently reinforcing data privacy and operational efficiency. This strategy furnishes vital technological support for the economical deployment of large models across a variety of contexts, thereby evolving into one of the prominent research hotspots. This article conducts a thorough investigation of research pertinent to large models in the context of edge intelligence, with an in-depth analysis and discourse primarily focused on two aspects: edge-based training and inference of large models. Ultimately, it outlines the challenges confronted in the progression of large model technologies tailored for edge intelligence and delineates future prospects. The ambition is to stimulate a heightened comprehension and intensified attention from both academic and industrial sectors towards technologies involving large models for edge intelligence, thereby encouraging further scholarly exploration in this thriving domain.

Abstract:

Legal intelligence aims to analyze texts within the legal domain automatically by employing various natural language processing (NLP) technologies. This field has garnered significant attention from the NLP community. One of the most critical tasks in legal intelligence is Legal Judgment Prediction (LJP). This task seeks to forecast judgment outcomes, such as applicable law articles, charges, and penalties, based on the fact descriptions of legal cases, making it a promising application of artificial intelligence (AI) techniques. However, current LJP methods primarily address cases with a single defendant, neglecting the complexities of cases involving multiple defendants. In real-world criminal cases, multiple defendants are often involved, creating intricate interactions that single-defendant LJP technologies cannot accurately handle. These existing technologies struggle to distinguish judgment outcomes for different defendants in such scenarios. To advance research in LJP tasks involving multiple defendants, this paper presents a large-scale multi-defendant LJP dataset with three key characteristics: 1) It is the largest manually annotated dataset for multi-defendant LJP; 2) It necessitates distinguishing legal judgment predictions for each defendant; 3) It includes comprehensive judgment chains, covering criminal relationships, sentencing contexts, law articles, charges, and penalties. Furthermore, this paper conducts an extensive and detailed analysis of the dataset, examining the distribution of law articles, charges, penalties, criminal relationships, sentencing contexts, text length, and number of defendants. It also provides statistical insights into multi-defendant judgment results and the chain of judgment based outcomes. Additionally, this paper introduces a novel chain of judgment based method, featuring a strategy for generating judgment chains related to the crime facts and a comparison strategy to differentiate correct judgment chains from easily confused ones, enhancing overall effectiveness. Experimental results reveal that the multi-defendant LJP dataset presents a significant challenge to existing LJP methods and pre-trained models. However, the chain of judgment based LJP method significantly surpasses baseline methods, highlighting the crucial role of judgment chains in improving LJP.

Abstract:

Implicit discourse relation recognition aims at automatically identifying semantic relations (such as Comparison) between two arguments (sentence or clause) in the absence of explicit connectives. Existing methods have confirmed that the introduction of phrase information can effectively boost the performance. However, there are still the following shortcomings: 1) These models typically rely on syntactic parsers and do not fully capture the interactions between words, phrases, and arguments. 2) The problem of data sparsity often occurs during training when incorporating the phrase information. To address the above issues, we propose an implicit discourse relation recognition model based on multi-granularity information interaction (MGII) and develop a chain decoding-inspired data augmentation method (DAM). Specifically, our proposed model is designed to automatically acquire semantic representations of n-grams using a stacked convolutional neural network. It then explicitly models the interactions between words, phrases and arguments based on Transformer layers and ultimately predicts multi-level discourse relationships in a chain-decoding way. Our data augmentation method simultaneously pretrains both the encoding and decoding modules, enabling the effective utilization of massive explicit discourse data, which are naturally annotated by connectives, to mitigate the issue of data sparsity. The proposed method significantly outperforms recent benchmark models on the PDTB datasets. Furthermore, it does not rely on syntactic parsers, demonstrating strong applicability.

Abstract:

Stencil computations are widely adopted in scientific applications. Many HPC platforms utilize the high computation capability of GPUs to accelerate stencil computations. In recent years, stencils have become more complex in terms of stencil order, memory accesses, and computation patterns. To adapt stencil computations to GPU architectures, the academic community has proposed a variety of optimization techniques based on streaming and tiling. Due to the diversity of stencil computational patterns and GPU architectures, no single optimization technique fits all stencil instances. Therefore, researchers have proposed stencil auto-tuning mechanisms that conduct parameter searches for a given combination of optimization techniques. However, existing mechanisms introduce huge offline profiling costs and online prediction overhead, and cannot flexibly adapt to arbitrary stencil patterns. To address these problems, this paper proposes a generalized stencil auto-tuning framework, GeST, which pursues the ultimate performance optimization of stencil computations on GPU platforms. Specifically, GeST constructs the global search space through a zero-padding format and quantifies parameter correlations via the coefficient of variation to generate parameter groups. After that, GeST iteratively selects parameter values from the parameter groups, adjusting the sampling ratio according to a reward policy and avoiding redundant executions through hash coding. The experimental results show that, compared with other state-of-the-art auto-tuning works, GeST can identify better-performing parameter settings in a short time.
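
Two of the mechanics mentioned above, grouping parameters via the coefficient of variation and skipping repeated configurations via hash coding, might look roughly like the Python sketch below; the threshold, the per-parameter runtime samples, and the grouping rule are illustrative and simpler than GeST's actual correlation-based design.

import hashlib
import statistics

def coefficient_of_variation(samples):
    """CV = std / mean, used here to judge how strongly a parameter affects runtime."""
    mean = statistics.fmean(samples)
    return statistics.pstdev(samples) / mean if mean else 0.0

def group_parameters(effect_samples, threshold=0.15):
    """Split tuning parameters into 'sensitive' and 'insensitive' groups by their CV
    (a stand-in for the correlation-based parameter grouping described above)."""
    groups = {"sensitive": [], "insensitive": []}
    for name, samples in effect_samples.items():
        key = "sensitive" if coefficient_of_variation(samples) >= threshold else "insensitive"
        groups[key].append(name)
    return groups

_seen = set()

def should_run(config: dict) -> bool:
    """Hash-code a candidate configuration so the same point is never executed twice."""
    code = hashlib.sha1(repr(sorted(config.items())).encode()).hexdigest()
    if code in _seen:
        return False
    _seen.add(code)
    return True

# Toy usage: runtimes (ms) observed while varying each parameter in isolation.
groups = group_parameters({"block_x": [3.1, 5.9, 9.8], "unroll": [4.0, 4.1, 3.9]})
run_it = should_run({"block_x": 32, "unroll": 4})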

Abstract:

Public key encryption with keyword search (PEKS) over lattice plays an important role in ensuring the privacy, confidentiality, and flexibility of outsourced data while resisting quantum attacks. However, most existing lattice-based PEKS schemes are limited by the underlying preimage sampling algorithm, which suffers from high storage overhead or low efficiency issues. To address the above problems, an optimized public key encryption with keyword search scheme is first proposed. The scheme utilizes a new approximate trapdoor sampling algorithm to improve the computational efficiency. The algorithm outputs an approximate rather than an exact preimage. Then, a combination of non-spherical Gaussian sampling technique and an ideal extendable-output function is used to reduce key and trapdoor storage. Furthermore, an extended scheme with forward security and backward security is introduced to address the basic scheme’s update and search operation leakage. To avoid newly updated ciphertexts matching previous trapdoors, i.e., forward security, the key is periodically updated through a lattice-based delegation mechanism. To prevent subsequent searches from leaking information about deleted files, i.e., backward security, the addition and deletion of files is achieved by combining the bitmap index and lattice-based homomorphic encryption scheme. Theoretical analysis and experimental results exhibit that, compared with the efficient PEKS scheme, the proposed scheme reduces the public key storage overhead by 4.6% and the trapdoor storage overhead by 50.1%, and improves the efficiency of encryption, trapdoor generation, and search by 11.11%, 2.5%, and 26.15%, respectively.

Abstract:

Supercomputing has developed rapidly from traditional CPU clusters to heterogeneous platforms. With this shift in hardware platforms, significant challenges arise in optimizing computing software and evaluating its performance. Currently, mainstream international parallel program performance analysis tools generally have low compatibility with the processors of domestic supercomputing heterogeneous systems, often require instrumentation and recompilation of the code, and achieve low accuracy in single-node performance data collection. To improve on these shortcomings, this article proposes a floating-point performance data collection method for the computing software of heterogeneous systems, and develops and verifies a floating-point performance collection prototype on a domestic supercomputing system verification platform. Effective collection of single-node and multi-node performance indicator data has been achieved, and the method is non-invasive to the original program: monitoring is performed in a plug-in manner without modifying the code of the monitored program, making it highly versatile. Finally, we conduct comparative experimental analysis with three types of programs, rocHPL, Cannon, and mixbench, and carry out performance data collection and monitoring research on a ResNet (residual network) program for AI computing. The results demonstrate that the proposed collection method has high accuracy, achieves the expected collection effect in the experiments, and has good reference value for program optimization, verifying the effectiveness of the proposed method.

Abstract:

Software Code Cache is widely used in dynamic binary translators to manage the dynamically generated code blocks. The translation, refresh, and memory occupancy of code blocks are key metrics for software code cache. There has been little research on software code cache for system-level dynamic binary translators. Existing system-level dynamic binary translators use state label scheme to achieve correct and efficient instruction semantic simulation, but this scheme introduces additional problems for software code cache management. Through in-depth analysis of the state label scheme, two types of problems are summarized: conflicts and redundancies. To address these two problems, two code cache optimization schemes based on fine-grained state label are proposed, including multi-state code cache scheme and weak state label scheme. These two schemes are implemented in LATX-SYS and evaluated with Ubuntu/x86 16.04 and Windows XP/x86 system booting on LoongArch platform. The evaluation results show that the code block refresh and translation are reduced by 43% and 18% respectively. The code block similarity ratio is decreased from 59.63% to 5.06%. The translation overhead and memory occupancy are both reduced. Overall, the system boot time was reduced by 20%. Finally, testing of the weak state label scheme on SPEC CPU2000 shows that the number of code blocks is reduced by an average of 13%, with only 2%-3% performance overhead introduced.

Abstract:

With the rapid advancement of artificial intelligence generation models and deepfakes, the techniques for generating talking face videos using various methods have become increasingly mature. Among them, audio-driven talking face video generation methods have attracted significant attention due to their remarkably realistic and natural output. Such methods utilize audio as a driving source to synthesize videos where the target character’s mouth movements synchronize with the audio, often combining image or video materials. Currently, these technologies are widely applied in fields such as virtual anchors, gaming animation, and film and television production, demonstrating vast prospects for development. However, the potential negative impacts of this technology are also becoming apparent. Improper or abusive use could lead to serious political and economic consequences. In this context, research on identifying various types of facial forgery videos has emerged. This research primarily assesses the authenticity of videos by detecting the veracity of individual video frames or the spatio-temporal consistency of video sequences. Firstly, this paper systematically analyzes the classic algorithms and latest advancements in audio-driven talking face video generation tasks based on the timeline and the development history of foundational models. Secondly, it exhaustively lists the commonly used datasets and evaluation criteria for this task, conducting comprehensive comparisons across multiple dimensions. Subsequently, the paper meticulously analyzes and summarizes the forgery facial video identification task, categorizing it based on whether the discrimination technology focuses on individual video frames or multiple frames, and also summarizes its commonly used datasets and evaluation criteria. Finally, the paper outlines the challenges and future directions in this research field, aiming to provide valuable references and support for subsequent related research.

Abstract:

The NTRU lattice is an important choice for building practical post-quantum lattice-based key encapsulation mechanisms, and optimized software implementations of lattice-based cryptography are of great significance for the subsequent deployment of post-quantum cryptography. CTRU is a lattice-based key encapsulation mechanism over the NTRU lattice proposed by Chinese scholars. At present, only C and AVX2 implementations of the CTRU-768 scheme exist, and there is room for further optimization; moreover, the CTRU-768 implementation cannot be directly extended to the CTRU-512 and CTRU-1024 schemes. This paper presents the first optimized reference C implementations of the CTRU-512 and CTRU-1024 schemes and their variants CNTR-512 and CNTR-1024, together with the corresponding AVX2 parallel optimized implementations, and further optimizes the existing CTRU-768 reference and AVX2 implementations. It employs a mixed-radix number theoretic transform (NTT) to accelerate polynomial multiplication and uses the Karatsuba algorithm to speed up the decomposed low-degree polynomial ring multiplications. In addition, combined with the central Barrett reduction, this paper proposes index-based delayed reduction in the inverse NTT. For the time-consuming polynomial inversion in the CTRU-1024 scheme, the Bernstein fast inversion algorithm is employed. Furthermore, this paper provides a more efficient AVX2 optimized implementation, which uses Intel's single instruction multiple data (SIMD) instruction set AVX2 to accelerate the performance bottlenecks of CTRU. Layer merging and coefficient permutation are used to reduce load/store instructions, the Bernstein fast polynomial inversion algorithm is vectorized with AVX2, and the time-consuming SHA-3 hash module is implemented in AVX2 assembly. Compared with the latest AVX2 implementation of the CTRU-768 scheme, the AVX2 optimized implementation in this paper improves performance by 8%-11%. For the CTRU schemes, compared with the reference implementations, the AVX2 optimized implementations on the three parameter sets achieve significant performance improvements: key generation, key encapsulation, and key decapsulation are improved by 56%-91%, 74%-90%, and 70%-83%, respectively.

Abstract:

Given the risk of adversarial attacks on tracking models and the lack of corresponding adversarial detection methods, this paper addresses the problem from the frequency-domain perspective. Combined with the visual imperceptibility of perturbation noise, this paper first proves theoretically that perturbation noise mainly resides in the mid-to-high frequency bands of images. We then quantitatively show that the low-frequency components of a video sequence contribute the most to tracking performance and are the least affected by adversarial attacks. Finally, based on the above proof and analysis, this paper proposes a detection framework built on the difference in tracking performance across frequency bands, in which a frequency-domain decomposition module extracts the low-frequency components of the video sequence. The target tracker and its mirror tracker, which share the same structure and parameters, take the full-frequency and low-frequency components of the video sequence as input, respectively. The discriminator module determines whether the input video sequence is adversarial by comparing the output differences of the two trackers. This detection framework uses a tracker as its carrier and does not require adversarial training; it achieves adversarial detection solely by comparing tracking performance differences across frequency bands. Extensive experimental results show that the framework not only effectively detects current mainstream adversarial attacks, such as CSA, TTP, and Spark, with a detection precision of 97.55%, but also has little negative impact on the original tracking performance of the tracker. In addition, the framework is generalizable and can be flexibly integrated into multiple trackers, such as SiamRPNpp, SiamMask, SiamCAR, and SiamBAN.
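
The frequency-domain decomposition step can be illustrated with the Python sketch below, which keeps a centered low-frequency disk of a frame's spectrum; the cut-off radius is an arbitrary illustrative choice rather than the paper's setting.

import numpy as np

def low_frequency_component(frame: np.ndarray, keep_ratio: float = 0.25) -> np.ndarray:
    """Keep only frequencies within a centered radius and transform back.
    The full-frequency frame goes to the target tracker and this low-frequency
    version to its mirror tracker; a large gap between their outputs flags
    a suspected adversarial input."""
    h, w = frame.shape
    spectrum = np.fft.fftshift(np.fft.fft2(frame))
    yy, xx = np.ogrid[:h, :w]
    radius = keep_ratio * min(h, w) / 2.0
    mask = (yy - h / 2) ** 2 + (xx - w / 2) ** 2 <= radius ** 2
    return np.real(np.fft.ifft2(np.fft.ifftshift(spectrum * mask)))

frame = np.random.default_rng(0).random((64, 64))   # stand-in grayscale frame
low = low_frequency_component(frame)                # mid/high-band perturbations are removed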

Abstract:

Pre-trained models have mitigated the challenges posed by extensive training data and computational resources, and also give birth to the new paradigm of model development and application, which we refer to as model supply chain. In this framework, a pre-trained model is uploaded by its publisher and subsequently transferred, compressed, and deployed by secondary developers to meet various application needs. This emerging model supply chain introduces additional stages and multiple elements, inevitably leading to security concerns and privacy risks. Despite the widespread adoption of model supply chains, there is currently a lack of systematic review of security threats in them. To address this research gap, in this paper, we provide a comprehensive overview of the deep learning model supply chain, introducing its concept and fundamental structure. We conduct an in-depth analysis of vulnerabilities at various stages of the model’s lifecycle, including design, development, deployment, and usage. Furthermore, we compare and summarize prevalent attack methods, alongside introducing corresponding security protection strategies. To assist readers in effectively utilizing pre-trained models, we review and compare publicly available model repositories. Finally, we discuss potential future research avenues in areas such as security checks, real-time detection, and problem tracing. It aims to offer insights for safer and more reliable development and use of pre-training models. For the benefit of ongoing research, related papers and open-source codes of the methods discussed are accessible at https://github.com/Dipsy0830/DNN-supply-chain-survey.

Abstract:

Model-based diagnosis mainly models the behavior of a system; once abnormal behavior is observed, a diagnosis algorithm is run on the system model to return possible explanations. Existing diagnosis algorithms compute a minimal hitting set (MHS) each time a conflict set is identified and then verify whether this MHS satisfies the system observations. While this approach reduces the generation of redundant solution sets, the difficulty of computing the MHSs of the conflict sets increases exponentially with the number of conflict sets. Moreover, since an MHS computed from a partial collection of conflict sets is not necessarily a diagnosis, checking whether each MHS satisfies the system observations is also time-consuming. We design a filtering function that removes low-quality conflict sets based on diagnosis cardinality and quantity, while ensuring that the obtained hitting sets are diagnoses as often as possible. In addition, to quickly verify whether a hitting set is a diagnosis, we propose an efficient decision algorithm based on the logical relationships of the circuit. In the experimental section, we conduct a detailed analysis comparing runtime and diagnosis yield under varying numbers of faults. Compared with state-of-the-art algorithms, our approach improves runtime efficiency by up to 2-40 times and diagnosis yield by 5-200 times.
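
For concreteness, the Python sketch below enumerates subset-minimal hitting sets of a small collection of conflict sets by brute force; it only illustrates the MHS notion discussed above, and the paper's filtering function and circuit-logic verification are not reproduced.

from itertools import chain, combinations

def minimal_hitting_sets(conflicts, max_size=4):
    """Enumerate subset-minimal hitting sets of a list of conflict sets
    (brute force, exponential in general; fine for illustration only)."""
    universe = sorted(set(chain.from_iterable(conflicts)))
    found = []
    for k in range(1, max_size + 1):                    # smaller candidates first
        for cand in combinations(universe, k):
            s = set(cand)
            if all(s & c for c in conflicts):           # hits every conflict set
                if not any(h <= s for h in found):      # keep only subset-minimal ones
                    found.append(s)
    return found

# Toy conflicts over suspected components; each MHS is only a diagnosis *candidate*
# that must still be verified against the system observations.
conflicts = [{"g1", "g2"}, {"g2", "g3"}, {"g1", "g3"}]
print(minimal_hitting_sets(conflicts))   # e.g. [{'g1', 'g2'}, {'g1', 'g3'}, {'g2', 'g3'}]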

Abstract:

Multi-anchor graph approaches have attracted increasing attention for their potential in addressing the challenges of large-scale multi-view clustering. However, existing methods leveraging multi-anchor graphs encounter several hurdles. Consistency-anchored graph learning methods struggle to handle misaligned anchor graphs and necessitate additional post-processing with a consistency graph, thereby constraining the accuracy and reliability of clustering outcomes. Anchor graph ensemble clustering methods fail to harness the complementary information from different views during the independent generation of candidate base clusterings and overlook the original anchor graphs during fusion, thus impacting the effectiveness and stability of clustering results. To address these challenges, we propose a novel double-ended joint learning approach for multi-view clustering. The method fully considers the duality between multi-anchor information and samples in multi-anchor graphs, achieving synchronized clustering between the anchor end and the sample end. Moreover, under the guidance of multi-anchor information, it achieves joint alignment between sample-end clustering and multiple anchor-end clusterings. Unlike existing methods, the approach does not require directly learning a consistent anchor graph, so it can handle any type of anchor misalignment and mitigates the negative impact of separate graph learning and partitioning on clustering performance. Additionally, it utilizes multiple anchor graphs for anchor-end clustering and sample-end clustering within a unified optimization framework, effectively addressing the limitations of the base clustering and ensemble stages in leveraging multiple anchor graphs. Experimental results demonstrate that the proposed method outperforms several comparative methods in terms of clustering performance and time consumption, effectively enhancing the clustering performance of multi-view data. The code of the proposed method and the comparative methods is provided in the supplementary material: http://github.com/lxd1204/DLMC.

Abstract:

Open-vocabulary multi-label action recognition aims to identify human actions in videos that were not seen during the training phase. Compared with traditional action recognition, this task is more practical because it closely mirrors real-world scenarios, and it has broader application prospects; however, it poses significant challenges in generalizing models to unseen action categories. To address this issue, this paper proposes an open-vocabulary multi-label action recognition method enhanced by knowledge from large language models. The method extracts the rich co-occurrence knowledge of action categories implicit in large language models and incorporates this knowledge into prompt learning for visual-language models, facilitating information transfer between base classes and novel classes so as to improve recognition of novel classes. In our experiments we set two ratios of base action classes to novel action classes, namely 3:1 and 1:1, denoted "75% seen" and "50% seen" respectively. Experimental results on the AVA and MovieNet datasets show that, compared with existing methods, when the base action classes are "75% seen", our method improves the mAP for novel action recognition by 1.95% and 1.21% on AVA and MovieNet, respectively. In the more challenging "50% seen" setting, our method improves the mAP for novel action recognition by 2.59% and 1.06% on the two datasets, respectively.

Abstract:

Deep learning-based object detection algorithms have been widely applied, yet recent research indicates that these algorithms are vulnerable to adversarial attacks that cause detectors to misidentify or miss the target. Nonetheless, research on the transferability of adversarial attacks in autonomous driving is limited, and few studies address the stealthiness of such attacks in this scenario. To address these limitations, we design an algorithmic module that enhances attack transferability by drawing an analogy between optimizing adversarial examples and training machine learning models. Additionally, by employing style transfer techniques and neural rendering, we propose and implement a transferable and stealthy attack method (TSA). Specifically, the adversarial examples are first repeatedly stitched together and combined with masks to generate the final texture, which is then applied to the entire vehicle surface. To simulate real-world conditions, a physical transformation function embeds the rendered camouflaged vehicle into realistic scenes. Finally, the adversarial examples are optimized with a designed loss function. Simulation experiments demonstrate that the TSA method surpasses existing methods in attack transferability and exhibits a certain level of stealthiness in appearance. Furthermore, physical-domain experiments validate that the TSA method maintains effective attack performance in real-world scenarios.

Abstract:

Software systems play an indispensable role across various industries, handling large-scale and high-density data. However, the numerous defects within these systems have long troubled developers and constantly threaten the security of data elements. Automated program repair (APR) technology aims to help developers automatically fix defects in code during the software development process, thereby saving software development and maintenance costs and enhancing the confidentiality, availability, and integrity of data elements within software systems. With the development of large language model (LLM) technology, many powerful code LLMs have emerged. These models have demonstrated strong repair capabilities in the APR field while addressing the shortcomings of traditional approaches in code comprehension and patch generation, further raising the level of program repair tools. We thoroughly survey high-quality papers related to APR in recent years and summarize the latest developments in the field. We then systematically categorize LLM-based APR techniques into two types, cloze style and neural machine translation style, and conduct an in-depth comparison from various perspectives such as model usage, model size, types of defects repaired, programming languages involved, and the pros and cons of repair approaches. Additionally, we discuss widely adopted APR datasets and metrics, and outline existing empirical studies. Finally, we summarize current challenges in the APR field along with future research directions.

Abstract:

Transient execution attacks (TEAs) exploit processor optimizations to bypass security checks and exfiltrate sensitive information through covert channels. Among them, the Meltdown and Spectre attacks have become prominent, affecting mainstream commercial processors from Intel, ARM, and AMD. Despite the defensive measures implemented by processor manufacturers, variants of these attacks continue to be discovered and disclosed by researchers. To improve the understanding of TEAs and deploy robust defenses, this paper comprehensively analyzes TEAs under various covert channels. Initially, the common characteristics of TEAs are extracted and a novel model of TEAs is systematically constructed. Subsequently, we summarize the covert channels involved in existing research, classify TEAs into three types: Meltdown-type attacks driven by out-of-order execution (OoOE), Spectre-type attacks driven by branch misprediction, and microarchitectural data sampling (MDS) type attacks driven by data misprediction, and delineate the key aspects and relationships of each type. Notably, this paper systematically compiles and categorizes MDS-type attacks for the first time. The capabilities of each attack variant are then analyzed and evaluated along three dimensions: covert channel, applicable attack scenarios, and microarchitecture immunity status, which helps security researchers develop new, more destructive attack types based on the deficiencies of existing research. Finally, building on this comprehensive analysis of processor microarchitectures and covert channels, the paper anticipates the future trajectory of TEA research, hoping to provide strong support for subsequent work.

Vol.62 No.7 2025    Date of publication: 2025-07-06
Special Issue on Generative AI-driven Information Systems
Abstract:

Leveraging a sliding window strategy, this study presents an innovative retrieval-augmented generation system aimed at enhancing the factual accuracy and reliability of outputs from large language models (LLMs). By applying a sliding window mechanism during the indexing phase, the study effectively addresses the limitations of fixed context window sizes and static retrieval methods. Three sliding window strategies are proposed to process and segment texts efficiently: fixed window size and fixed step length split (FFS), dynamic window size and fixed step length split (DFS), and dynamic window size and dynamic step length split (DDS). To further enhance retrieval accuracy and relevance, the study employs multiple advanced query techniques, including query expansion and reformulation. Rigorous experimental evaluations are conducted using the state-of-the-art LLaMA-3 model on multiple diverse datasets, encompassing both general knowledge and domain-specific corpora. Results demonstrate optimal performance with a carefully calibrated block size of 1 024 tokens and a step size of 3, significantly improving F1 scores across various tasks. This configuration highlights the importance of balancing document segment length and sliding window step size to maximize information retention and retrieval efficacy. The sliding window strategy effectively preserves contextual information, reduces information loss, and exhibits adaptability across different datasets and query types.
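
As a rough illustration of the FFS strategy described above, the sketch below builds overlapping chunks up to a fixed token budget and slides the window forward by a fixed number of sentences. The function name, the naive whitespace token count, and the choice of sentences as the step unit are assumptions made for illustration; the paper's exact segmentation procedure may differ.

def ffs_chunks(sentences, window_tokens=1024, step_sentences=3):
    """Fixed window size, fixed step length split (FFS), sketched: pack
    sentences until ~window_tokens (counted by whitespace split), then
    slide the start of the window forward by step_sentences."""
    chunks, start = [], 0
    while start < len(sentences):
        chunk, used, i = [], 0, start
        while i < len(sentences) and used + len(sentences[i].split()) <= window_tokens:
            chunk.append(sentences[i])
            used += len(sentences[i].split())
            i += 1
        if not chunk:                    # a single oversized sentence becomes its own chunk
            chunk, i = [sentences[start]], start + 1
        chunks.append(" ".join(chunk))
        if i >= len(sentences):          # the window already reached the end of the text
            break
        start += step_sentences
    return chunks

# Example: three short "sentences" packed into overlapping chunks.
print(ffs_chunks(["Cats sleep a lot.", "Dogs bark.", "Birds sing."], window_tokens=6, step_sentences=1))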

Abstract:

Sequential recommendation centers on mining users' preferences and behavior patterns from their interaction sequences. Existing work has recognized the inadequacy of single-modal interaction data and has utilized a large amount of multi-modal data, including item reviews, homepage images, and other sources, to complement interaction data and improve recommendation performance. However, such multi-modal data are often interspersed with unavoidable noise that may limit the exploration of personalized user preferences. While suppressing inter-modal inconsistent information can reduce noise interference, it is almost impossible to completely eliminate noise from user-generated multi-modal content. To address these challenges, we propose a large language model-based trusted multi-modal recommendation (Large-TR) algorithm, which aims to provide trustworthy recommendations in noisy multi-modal data scenarios. Specifically, the algorithm relies on the excellent natural language understanding capability of large language models to efficiently filter the noise in multi-modal data and model user preferences more accurately and in finer detail. Additionally, we design a trustworthy decision mechanism that dynamically evaluates the uncertainty of recommendation results and ensures their usability in high-risk scenarios. Experimental results on four widely used public datasets show that the proposed algorithm outperforms other baseline algorithms. Our source code is available at https://github.com/hhbray/Large-TR.

Special Issue on Generative AI-driven Information Systems
Abstract:

According to statistics, about 330 million people in China suffer from cardiovascular diseases, and deaths caused by cardiovascular diseases account for 40% of all deaths each year. Under these circumstances, developing heart disease assisted diagnosis systems is particularly important, but their development is limited by the scarcity of large-scale clinical electrocardiogram (ECG) data that are free of patient privacy information and annotated by medical experts. As an emerging discipline, quantum computing can explore larger and more complex state spaces by exploiting quantum superposition and entanglement, which is beneficial for generating high-quality and diverse ECG data similar to real clinical data. Therefore, we propose an ECG generative information system based on quantum generative adversarial networks, abbreviated as ECG-QGAN. The quantum generative adversarial network consists of a quantum bidirectional gated recurrent unit (QBiGRU) and a quantum convolutional neural network (QCNN). The system utilizes quantum entanglement to improve its generative capability and produce ECG data consistent with existing clinical data, so that the heartbeat characteristics of cardiac patients are preserved. The generator and discriminator use QBiGRU and QCNN, respectively, with variational quantum circuits (VQC) designed on the basis of the matrix product state (MPS) and tree tensor network (TTN), which enables the system to capture ECG information more efficiently and generate qualified ECG data with fewer quantum resources. In addition, the system applies quantum dropout to avoid overfitting during training. Finally, experimental results show that the ECGs generated by ECG-QGAN achieve a higher average classification accuracy than those generated by other models. The system is also friendly to current noisy intermediate-scale quantum (NISQ) computers in terms of the number of qubits and circuit depth.

Abstract:

With the explosive growth of scientific literature and the continuous deepening of research fields, researchers face significant information processing challenges when attempting to formulate novel scientific hypotheses. Although large language models (LLMs) possess considerable potential for data processing and knowledge integration, they remain limited in their ability to generate original and insightful scientific hypotheses. Existing research predominantly emphasizes using LLMs to expedite and refine established theories and technologies, often overlooking the initial stage of scientific inquiry in which novel hypotheses are proposed and new theories are developed, a stage vital to scientific advancement. Grounded in the principles of divergent and convergent thinking from the theory of structured intelligence, this study proposes an innovative human-in-the-loop multi-agent framework (HILMA) for the reliable generation of scientific hypotheses. The HILMA framework incorporates a real-time, systematic knowledge retrieval enhancement mechanism that dynamically integrates the latest research advances to construct citation network subgraphs, providing LLMs with comprehensive and up-to-date surveys of scientific knowledge. Additionally, the framework enhances hypothesis generation through a multi-agent argumentation approach that simulates the scientific peer review process, while also leveraging the intuition and expertise of human experts to further refine and diversify the generated hypotheses. A series of human-machine evaluations shows that this method has significant advantages over existing baselines in generating high-quality scientific hypotheses and holds promise as a key facilitator of technological innovation.

Special Issue on Generative AI-driven Information Systems
Abstract:

With the global population aging and lifestyles changing, the management and treatment of chronic diseases are becoming increasingly important. Chronic diseases, including cardiovascular diseases, diabetes, and chronic respiratory diseases, require long-term or even lifelong health management, the core of which is to design and implement long-term health plans covering balanced diet, appropriate exercise, regular examinations, and medication management. In recent years, large language models have made progress in the medical field, but they have not focused on chronic disease health management, lack an understanding of Chinese dietary habits and culture, and have limited capabilities in handling numerical information. To address these issues, we construct a chronic disease health management information system based on a large language model. By integrating foundational knowledge of chronic diseases, health management guidelines, and actual health management plans as domain data, we train the QingTing large language model as the core of the system for effectively answering health-related questions. Additionally, the system introduces a tool enhancement strategy that improves QingTing's ability to handle numerical information in health data by invoking external tools, and adopts retrieval-augmented generation based on an uncertain knowledge graph to enhance QingTing's accuracy and reliability. Experiments on the system demonstrate that QingTing significantly outperforms other baseline large language models in health management dialogues, and verify the effectiveness of the designed tool enhancement and retrieval-augmented methods.

Abstract:

The recent popularity of large language models (LLMs) has had a significant impact on countless fields, particularly through their open ecosystem of APIs, open-source models, and plugins. However, despite their widespread deployment, there is a general lack of research that thoroughly discusses and analyzes the potential risks they conceal. We therefore conduct a preliminary but pioneering study covering the robustness, consistency, and credibility of LLM systems. With most of the related literature in the era of LLMs still uncharted, we propose an automated workflow that copes with a large number of queries and responses. Overall, we issue over a million queries to mainstream LLMs including ChatGPT, LLaMA, and OPT. The core of our workflow is a data primitive, followed by an automated interpreter that evaluates these LLMs under different adversarial metric systems. As a result, we draw several, perhaps unfortunate, conclusions that are rarely reported by this community. Briefly: 1) minor but inevitable errors in user-generated query input may, by chance, cause an LLM to respond unexpectedly; 2) LLMs possess poor consistency when processing semantically similar queries. In addition, as a side finding, we observe that ChatGPT is still capable of yielding the correct answer even when the input is polluted to an extreme degree. While this phenomenon demonstrates the powerful memorization of LLMs, it raises serious concerns about using such data for LLM-involved evaluation in academic research. To deal with this, we propose a novel index associated with a dataset that roughly decides the feasibility of using such data for LLM-involved evaluation. Extensive empirical studies support the aforementioned claims.
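
The two findings above can be probed with a very small harness like the one below, which injects minor character-level noise into a query and measures how often the answers to the perturbed copies agree with the answer to the clean query. The `ask` callable is a placeholder for any LLM client the reader already has; the perturbation rate and the exact-match criterion are simplifying assumptions, not the paper's metric systems.

import random

def perturb(query, rate=0.05, seed=0):
    # Replace a small fraction of alphabetic characters with random letters,
    # mimicking minor but inevitable errors in user-generated input.
    rng = random.Random(seed)
    chars = list(query)
    for i, ch in enumerate(chars):
        if ch.isalpha() and rng.random() < rate:
            chars[i] = rng.choice("abcdefghijklmnopqrstuvwxyz")
    return "".join(chars)

def consistency_rate(ask, query, n=5):
    # Fraction of perturbed queries whose answers exactly match the clean answer.
    reference = ask(query)
    answers = [ask(perturb(query, seed=i + 1)) for i in range(n)]
    return sum(a == reference for a in answers) / n

# Example with a trivial stand-in "model" that only looks at query length.
print(consistency_rate(lambda q: len(q) % 7, "What is the capital of France?"))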

Network and Information Security
Abstract:

The domain name system (DNS) recursive resolving service acts as a bridge between users and upstream DNS authoritative servers, enabling users to conveniently resolve domain names through local DNS servers. However, as the first gateway for communication with users, DNS recursive resolving services have become a significant target for attacks on Internet infrastructure. Given the vast scale and variety of DNS recursive service deployments, current DNS security enhancements struggle with implementation complexity and compatibility issues. Despite the importance of this problem, there is a noticeable lack of research focused on the deployment of security protection mechanisms for DNS recursive services and on comprehensive assessment of the associated security threats. To bridge this gap, we categorize the security risks associated with DNS recursive services into five main types: cache poisoning, DNS hijacking, direct attacks on recursive servers, leveraging recursive servers to attack other servers, and exploiting software vulnerabilities. Additionally, we summarize the latest research on DNS recursive service security threats and DNS security enhancement mechanisms, as well as measurement methods for assessing these security risks. Finally, we analyze the current state of DNS recursive service security and offer insights into future research directions for improving the security monitoring and governance of DNS recursive services.

Abstract:

With the rapid development of deep learning, signal modulation recognition based on deep neural networks has gained popularity in wireless communications research. However, deep neural network models have been observed to be vulnerable to adversarial perturbations, which can render the modulation recognition task ineffective, and there remain theoretical gaps and bottlenecks in wireless communication security research. Because of the multidimensional nature of wireless communication, including factors such as experimental environments, data structures, and signal characteristics, established attack and defense methods from other domains cannot simply be transferred to signal countermeasures. In this paper, we comprehensively summarize research on adversarial attack and defense technology in the field of signal modulation recognition. We propose a generic classification framework and threat model for adversarial attacks in this field, and we classify the research into two categories: physical self-defense attacks and digital direct access attacks. We then systematically integrate and visualize the research as two-dimensional diagrams that showcase the methods, models, and techniques of adversarial attacks, and we provide further details on attack methods, adversarial example generation techniques, theoretical formulations, and adversarial detection and defense techniques. We systematically distill the characteristics of the three dimensions of adversarial attacks on wireless communications and summarize the corresponding processing methods. Finally, we discuss future research and development directions for attack and defense security oriented towards signal modulation recognition.

Abstract:

With the rapid development of cloud computing, quantum computing, and other advanced technologies, data privacy faces increasingly severe threats. In recent years especially, more and more users have been storing their sensitive data and applications in the cloud to take advantage of convenient services and powerful computing capabilities. However, traditional security technologies cannot fully guarantee the security of cloud computing, and introducing fully homomorphic encryption algorithms is one of the effective ways to address this issue. Fully homomorphic encryption based on lattice theory offers natural resistance to quantum attacks and supports arbitrary computation on encrypted data, effectively guaranteeing data security in the quantum computing era. Although fully homomorphic encryption shows significant potential, it suffers from an explosion in computation and storage volume. To address this problem and speed up the widespread adoption of fully homomorphic encryption algorithms, researchers in both algorithms and hardware have proposed a variety of solutions, and significant progress has been made. This work summarizes the progress of mainstream fully homomorphic encryption technology over the past five years, analyzes and compares algorithm libraries and fully homomorphic hardware accelerators, and finally offers a perspective on the future of fully homomorphic encryption technology.

Abstract:

The purpose of the computing first network (CFN) is to deeply integrate ubiquitous computation with the network, so that multi-dimensional basic resources such as computation and storage can be effectively allocated among clouds, edges, and ends, allowing users to consume them as transparently as water and electricity: computing resources can be requested on demand and used at any time. Owing to heterogeneous computing resources, dynamic networks, and diverse user needs, effectively scheduling and routing resources has become one of the core challenges of the computing first network. To address this problem, we design a multi-tier computing resource system (CRS). Unlike existing resource allocation schemes, CRS is a complete application-layer CFN solution that takes both computing resource awareness and computational routing into account. The computing resource system is composed of a computing and network resource awareness strategy and a computing resource routing protocol. The awareness strategy defines intra-domain awareness rules within a jurisdiction and inter-domain awareness rules between jurisdictions. Based on this, we propose a greedy-based resource routing algorithm (GBRA) that dynamically generates a search tree for each task. The computing resource routing protocol completes resource allocation through CRS request, authorization notification, notification confirmation, and CRS response messages. Extensive simulation experiments demonstrate that, compared with other algorithms, CRS completes resource allocation for more tasks within the maximum tolerated response latency and achieves better load balancing among the computing nodes within a jurisdiction.
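
For intuition only, the sketch below shows a generic greedy selection of computing nodes for one task: nodes with the most available capacity (ties broken by lower latency) are chosen until the task's demand is covered. It is not the paper's GBRA or its search tree; the node attributes, the scoring rule, and the data layout are assumptions.

def greedy_select(task_demand, nodes):
    # nodes: {node_id: (available_capacity, latency_ms)}
    chosen, remaining = [], task_demand
    for node, (capacity, latency) in sorted(
            nodes.items(), key=lambda kv: (-kv[1][0], kv[1][1])):
        if remaining <= 0:
            break
        take = min(capacity, remaining)
        if take > 0:
            chosen.append((node, take))
            remaining -= take
    return chosen if remaining <= 0 else None   # None: demand cannot be met

# One task needing 12 capacity units across three illustrative nodes.
print(greedy_select(12, {"edge-1": (8, 5), "cloud-1": (32, 40), "edge-2": (6, 7)}))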

Graphics and Image Processing
Abstract:

3D shape reconstruction aims to recover the 3D structure of a scene from image sequences captured at different focus levels. Most existing methods evaluate the focus level of the image sequence at a single scale and guide the reconstruction process by introducing regularization or post-processing, so the limited selection space of depth information often prevents the reconstruction results from converging effectively. To address this issue, we propose MSCAS, a multi-scale cost aggregation framework for shape from focus (SFF). First, a non-downsampling multi-scale transformation is introduced to enlarge the depth information selection space of the input image sequence; cost aggregation is then performed by combining intra-scale sequence correlation with inter-scale information constraints. This expansion-aggregation mode enriches the scene depth representation and effectively fuses cross-scale and cross-sequence representation information. As a general framework, MSCAS can embed existing model-design methods and deep learning methods to improve their performance. Experimental results show that, across four datasets, MSCAS reduces the root mean square error (RMSE) by 14.91% on average and improves the structural similarity (SSIM) by 56.69% when embedding model-design SFF methods, and reduces RMSE by 1.55% and improves SSIM by 1.61% on average when embedding deep learning SFF methods. These results verify the effectiveness of the MSCAS framework.
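
For readers unfamiliar with shape from focus, the sketch below is the single-scale baseline that MSCAS builds on, not the multi-scale aggregation itself: a per-pixel focus measure is computed for every image in the focal stack, and each pixel takes the focal depth of the image where that measure peaks. The Laplacian-style measure and the array shapes are conventional assumptions.

import numpy as np

def focus_volume(stack):
    # stack: (n_images, H, W). A simple modified-Laplacian focus measure per image.
    vol = np.zeros(stack.shape, dtype=float)
    for k, img in enumerate(stack.astype(float)):
        lap_x = np.abs(2 * img - np.roll(img, 1, axis=1) - np.roll(img, -1, axis=1))
        lap_y = np.abs(2 * img - np.roll(img, 1, axis=0) - np.roll(img, -1, axis=0))
        vol[k] = lap_x + lap_y
    return vol

def depth_from_focus(stack, focal_depths):
    # Per pixel, pick the focal depth whose image maximizes the focus measure.
    idx = np.argmax(focus_volume(stack), axis=0)          # (H, W) indices into the stack
    return np.asarray(focal_depths, dtype=float)[idx]     # (H, W) depth map

# Tiny synthetic example: 3 images of size 4x4 at three focal depths.
stack = np.random.rand(3, 4, 4)
print(depth_from_focus(stack, focal_depths=[0.1, 0.2, 0.3]).shape)   # (4, 4)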

Abstract:

With the rapid development of multimedia and network technology, the security of digital image content has become an increasingly prominent concern. In this paper, we propose a deep perceptual image authentication hashing scheme based on window self-attention feature fusion, which can effectively detect whether the perceptual content of the original image has changed and can be applied to content authentication, tampering recognition, copy detection, and similar scenarios. The scheme uses a convolutional neural network architecture that integrates a window self-attention mechanism to build a hashing model covering global and local image features. It partitions the shallow features obtained from the backbone network into blocks and extracts the corresponding window features, then calculates the correlation between each intermediate local feature and the global feature to filter out the final local features, and finally feeds the local and global features into the hash generation module for fusion and compression to obtain the final image hash code. During training, an integrated loss function combining hash loss and classification loss constrains the model to improve robustness and discrimination. Experimental results show that this scheme achieves superior image content authentication performance compared with existing typical perceptual authentication hashing schemes.
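
Downstream of the scheme above, content authentication typically reduces to comparing two binary hash codes. The sketch below shows that comparison step only: a Hamming distance and a threshold decision. The threshold value is an illustrative assumption that would be tuned on validation data, not a parameter reported by the paper.

def hamming_distance(hash_a, hash_b):
    # hash_a, hash_b: equal-length strings of '0'/'1' bits.
    assert len(hash_a) == len(hash_b)
    return sum(a != b for a, b in zip(hash_a, hash_b))

def is_same_content(hash_a, hash_b, threshold=8):
    # At or below the threshold: treat as the same perceptual content (authentic copy);
    # above it: treat as different or tampered content.
    return hamming_distance(hash_a, hash_b) <= threshold

print(is_same_content("10110100" * 8, "10110101" * 8))   # 8 differing bits -> True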

High Performance Computing
Abstract:

The magnetic confinement fusion particle-in-cell (PIC) gyrokinetic simulation code VirtEx can study the confinement and transport of fusion-produced alpha particles, which is key to realizing fusion energy. Alpha particle simulation relies heavily on the computational kernel for kinetic ions, which has more complex memory access patterns than the electron kernel and contains both irregular accesses and atomic write-back operations, making it a memory-intensive application. The MT-3000, a new heterogeneous acceleration device provided by Tianhe's new-generation supercomputing platform, offers powerful computational performance through its extremely high computational density, and porting alpha particle simulations to this device is a great challenge. To fully exploit the computational power of the acceleration array in the MT-3000, we combine application characteristics with several optimization methods, including recomputation of intermediate variables, a customized software cache design, memory locality optimization, and hotspot function merging, all designed and implemented to reduce the total amount of memory access in the program. A medium-scale benchmark with gyrokinetic ions shows an overall speedup of 4.2 times, with speedups of 10.9, 13.3, and 16.2 times on the hotspot functions Push, Locate, and Charge, respectively, and good scalability with 88.4% parallel efficiency on 5 898 240 accelerator cores across 3 840 nodes.

Abstract:

Graph partitioning is a key technique for parallel processing of big graphs. Existing graph partitioning algorithms struggle to balance partition quality and efficiency. Offline partitioning algorithms tend to achieve high partition quality but are time-consuming, while online (or streaming) partitioning algorithms are relatively efficient but suffer from suboptimal partition quality. To address the above problem, we propose a distributed streaming partitioning algorithm with a buffering mechanism in this paper. The algorithm utilizes a multi-loader and multi-partitioner architecture, where multiple loaders read graph data in parallel to improve loading efficiency. Each partitioner maintains a buffer to store graph vertices received from the corresponding loader and sorts them in descending order based on vertex degree, providing better decision-making data for partitioning. Four streaming heuristic rules are pre-configured in the partitioners, which perform parallel partitioning on vertices in the buffer with different goals in mind. A reflow mechanism is employed to fine-tune the partitioning results and improve quality. Experimental results in a distributed system environment show that the proposed algorithm improves partition quality (edge-cut ratio) by more than 18.8% compared with the best existing online partitioning algorithm. Additionally, it reduces the proportion of graph data loading time in the total partitioning time from an average of 30.8% in a single-loader single-partitioner architecture to an average of 20.1%.
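
The buffered streaming idea can be sketched as follows: vertices accumulate in a buffer, are sorted by degree in descending order when the buffer is flushed, and each is then assigned by a simple neighbor-affinity heuristic with a balance penalty. This is one generic heuristic in the spirit of streaming partitioners, not the paper's four rules, its multi-loader architecture, or its reflow mechanism; the names and the penalty weight are assumptions.

def buffered_stream_partition(vertex_stream, adjacency, k, buffer_size=4):
    # adjacency: {vertex: iterable of neighbor vertices}
    assignment, loads, buf = {}, [0] * k, []

    def flush():
        # Higher-degree vertices first: they carry more information for the heuristic.
        for v in sorted(buf, key=lambda u: len(adjacency.get(u, ())), reverse=True):
            affinity = [0] * k
            for nbr in adjacency.get(v, ()):
                if nbr in assignment:
                    affinity[assignment[nbr]] += 1
            # Place with already-assigned neighbors, penalized by partition load.
            best = max(range(k), key=lambda p: affinity[p] - 0.5 * loads[p])
            assignment[v] = best
            loads[best] += 1
        buf.clear()

    for v in vertex_stream:
        buf.append(v)
        if len(buf) >= buffer_size:
            flush()
    flush()
    return assignment

adj = {1: [2, 3], 2: [1, 3], 3: [1, 2], 4: [5], 5: [4]}
print(buffered_stream_partition([1, 2, 3, 4, 5], adj, k=2))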

Abstract:

Deploying Transformer models under the conventional pre-train-then-fine-tune paradigm is challenging for multi-task serving, because a full model copy must be maintained for each downstream task, quickly exhausting the storage budget. Recent algorithmic advances in parameter-efficient Transformers (PET) show enormous potential to mitigate this storage overhead: the pre-trained model is shared among tasks and only a small portion of task-specific parameters are fine-tuned. Unfortunately, existing serving systems neither have flexible PET task management mechanisms nor can they efficiently serve queries to different tasks in batches. Therefore, we propose PetS, a unified framework for multi-task PET serving. Specifically, different PET tasks are expressed by a unified representation in the same framework, which enables flexible PET task management. Based on this representation, we design a specialized PET inference engine that batches queries from different tasks together and executes them with task-agnostic shared operators and task-specific PET operators. Equipped with this inference engine, PetS scales to many more tasks on a single GPU device. To further improve system throughput, we propose a coordinated batching strategy that takes query length, PET task type, and system load balancing into consideration. To improve throughput on multiple GPU instances, we also propose a PET-migration-based load balancing strategy. We evaluate PetS on single-GPU platforms, including edge, desktop, and server GPUs. Comprehensive experiments demonstrate that PetS supports up to 26 times more concurrent tasks and improves serving throughput by 1.53 times and 1.63 times on desktop and server GPU nodes, respectively. On multiple GPUs, our load-balancing strategy also provides up to 29% speedup.
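
The coordinated batching idea can be illustrated with a toy scheduler that groups pending queries by PET type and by a coarse length bucket, so that padding waste stays small and each batch can run the shared operators once plus its task-specific PET operators. The grouping rule, the bucket width, and the query tuple layout are illustrative assumptions, not PetS's actual strategy.

from collections import defaultdict

def coordinated_batches(queries, max_batch=32, length_bucket=64):
    # queries: list of (task_id, pet_type, seq_length) tuples.
    groups = defaultdict(list)
    for task_id, pet_type, seq_length in queries:
        groups[(pet_type, seq_length // length_bucket)].append((task_id, pet_type, seq_length))
    batches = []
    for _, items in sorted(groups.items()):
        for i in range(0, len(items), max_batch):    # cap batch size for memory/latency
            batches.append(items[i:i + max_batch])
    return batches

demo = [(0, "adapter", 40), (1, "adapter", 50), (2, "lora", 45), (3, "adapter", 200)]
print(coordinated_batches(demo, max_batch=2))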
