2025 Vol. 62 No. 3
In recent years, large language models (LLMs) have exhibited remarkable performance, profoundly transforming various aspects of human life. As these models grow in size and user demand for long-context inference increases, LLM inference systems face significant storage challenges. These challenges stem primarily from the vast number of model parameters and the key-value (KV) cache required for efficient inference, both of which strain GPU memory resources. Additionally, inefficient storage usage in distributed systems often results in over-provisioning and fault-tolerance issues, further complicating resource management. This survey examines three lines of work, memory optimization, heterogeneous storage, and distributed storage, synthesizing research efforts that address GPU memory constraints and enhance resource utilization. Memory-optimized LLM inference systems improve GPU memory efficiency and reduce memory footprint through techniques such as efficient KV cache management, compression, and attention operator optimization. Heterogeneous-storage-based LLM inference systems expand storage capacity by integrating various storage resources, minimizing I/O overhead via tensor placement strategies, asynchronous data transfer, and intelligent memory allocation and prefetching. Distributed LLM systems improve the utilization of multi-machine resources, boosting execution efficiency and fault tolerance in LLM inference tasks through batching, multi-level scheduling, and redundant replication. Finally, we review existing research and outline future directions for further optimizing storage in LLM inference systems.
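As a concrete illustration of one technique named above, the following minimal sketch (our own assumption, not a system described in the survey) shows block-based KV-cache allocation in Python, the kind of management memory-optimized inference systems use to curb GPU-memory fragmentation; the class name, block size, and API are illustrative.

```python
# Minimal sketch (assumption): a block-based ("paged") KV-cache allocator of the kind
# memory-optimized LLM inference systems use to reduce GPU-memory fragmentation.
from collections import defaultdict

class PagedKVCache:
    def __init__(self, num_blocks: int, block_size: int):
        self.block_size = block_size                 # tokens stored per block
        self.free_blocks = list(range(num_blocks))   # pool of physical block ids
        self.block_table = defaultdict(list)         # sequence id -> list of block ids
        self.lengths = defaultdict(int)              # tokens written per sequence

    def append_token(self, seq_id: int) -> tuple[int, int]:
        """Reserve a (block, offset) slot for one new token of a sequence."""
        offset = self.lengths[seq_id] % self.block_size
        if offset == 0:                              # current block full (or first token)
            if not self.free_blocks:
                raise MemoryError("KV cache exhausted; evict or offload a sequence")
            self.block_table[seq_id].append(self.free_blocks.pop())
        self.lengths[seq_id] += 1
        return self.block_table[seq_id][-1], offset

    def free_sequence(self, seq_id: int) -> None:
        """Return all blocks of a finished sequence to the free pool."""
        self.free_blocks.extend(self.block_table.pop(seq_id, []))
        self.lengths.pop(seq_id, None)

cache = PagedKVCache(num_blocks=4, block_size=2)
for _ in range(3):
    print(cache.append_token(seq_id=0))              # (block, offset) slots for sequence 0
cache.free_sequence(0)
```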
With the rapid development of natural language processing and deep learning, large language models (LLMs) have been increasingly applied in fields such as text processing, language understanding, image generation, and code auditing, and have become a research hotspot in both academia and industry. However, adversarial attack methods allow attackers to manipulate LLMs into generating erroneous, unethical, or false content, posing increasingly severe security threats to these models and their wide-ranging applications. This paper systematically reviews recent advances in adversarial attack methods and defense strategies for LLMs, summarizing the fundamental principles, implementation techniques, and major findings of relevant studies. Building on this foundation, it examines four mainstream attack modes: prompt injection attacks, indirect prompt injection attacks, jailbreak attacks, and backdoor attacks, analyzing each in terms of its mechanism, impact, and potential risks. Furthermore, the paper discusses the current state and future directions of LLM security research and looks ahead to the prospects of combining LLMs with multimodal data analysis and integration technologies. This review aims to deepen understanding of the field and foster more secure and reliable applications of large language models.
In recent years, large language models (LLMs) represented by ChatGPT have developed rapidly. As the scale of model parameters continues to grow, building and deploying LLMs imposes higher requirements on data scale and storage access efficiency, which poses significant challenges to traditional storage systems. This study first analyzes the storage access characteristics of the three critical stages of LLM workflows: data preparation, model training, and inference. It then explores in depth the major issues and bottlenecks faced by traditional storage systems in LLM scenarios. To address these challenges, the study proposes and implements ScaleFS, a high-performance and scalable distributed metadata design. ScaleFS decouples directory tree metadata from attribute metadata and combines this with a hierarchical partitioning strategy that balances depth and breadth in the directory tree. This design enables efficient path resolution, load balancing, and system scalability, making it capable of effectively managing hundreds of billions of files. Additionally, ScaleFS introduces fine-grained metadata structures, optimizes metadata access patterns, and develops a metadata key-value store tailored to file semantics. These innovations significantly improve metadata access efficiency while reducing disk I/O operations. Experimental results demonstrate that ScaleFS achieves 1.04 to 7.12 times the operations per second (OPS) of HDFS, with latency reduced to 12.67% to 99.55% of HDFS's. Furthermore, at a scale of hundreds of billions of files, ScaleFS outperforms HDFS in most operations even when HDFS operates at a billion-file scale, demonstrating superior scalability and access efficiency. ScaleFS is thus well suited to the demands of LLM scenarios for managing and efficiently accessing massive file datasets.
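To make the decoupled metadata design easier to picture, here is a minimal sketch, under our own assumptions about key layout rather than ScaleFS's actual format, of separating directory-entry records from attribute records in a key-value store so that path resolution touches only the former.

```python
# Minimal sketch (assumption): decoupling directory-tree metadata (name -> inode id)
# from attribute metadata (inode id -> attributes) in a key-value store, so path
# resolution touches only small directory-entry records. Key formats are illustrative,
# not ScaleFS's actual on-disk layout.
store = {}   # stands in for a metadata key-value store

def put_dentry(parent_ino: int, name: str, ino: int) -> None:
    store[("dentry", parent_ino, name)] = ino

def put_attrs(ino: int, attrs: dict) -> None:
    store[("attr", ino)] = attrs

def resolve(path: str, root_ino: int = 0) -> int:
    """Resolve a path component by component using only dentry records."""
    ino = root_ino
    for name in filter(None, path.split("/")):
        ino = store[("dentry", ino, name)]
    return ino

put_dentry(0, "datasets", 1);  put_attrs(1, {"mode": "drwxr-xr-x"})
put_dentry(1, "train.bin", 2); put_attrs(2, {"size": 1 << 30})
ino = resolve("/datasets/train.bin")
print(ino, store[("attr", ino)])    # attributes are fetched only after resolution
```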
Recent advances in large language models (LLMs) have significantly raised the requirements for data quality in practical applications. Real-world scenarios often involve heterogeneous data from multiple correlated domains, yet cross-domain data integration remains challenging because privacy and security concerns prohibit centralized sharing, limiting the effective utilization of LLMs. To address this issue, we propose a novel framework that integrates LLMs with knowledge graphs (KGs) for cross-domain heterogeneous data query, presenting a systematic governance solution under the LLM-KG paradigm. First, we employ domain adapters to fuse cross-domain heterogeneous data and construct the corresponding KG. To enhance query efficiency, we introduce knowledge line graphs and develop a homogeneous knowledge graph extraction (HKGE) algorithm for graph reconstruction, significantly improving cross-domain data governance performance. Subsequently, we propose a trusted subgraph matching algorithm, TrustHKGM, to ensure high-confidence multi-domain queries through confidence computation and low-quality node filtering. Finally, we design a multi-domain knowledge line graph prompting (MKLGP) algorithm to enable efficient and trustworthy cross-domain query answering within the LLM-KG framework. Extensive experiments on multiple real-world datasets demonstrate the superior effectiveness and efficiency of our approach compared with state-of-the-art solutions.
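The confidence-based filtering idea behind TrustHKGM can be illustrated with a small hedged sketch; the triple format, scores, and threshold below are our own placeholders, not the paper's actual scoring model.

```python
# Minimal sketch (assumption): confidence-based filtering before subgraph matching,
# in the spirit of "confidence computation and low-quality node filtering".
# The triples and threshold are illustrative; the actual scoring is richer.
triples = [
    ("patient_42", "diagnosed_with", "diabetes", 0.92),
    ("patient_42", "prescribed", "metformin", 0.88),
    ("patient_42", "prescribed", "aspirin", 0.35),   # low-confidence, cross-domain noise
]

def trusted_subgraph(triples, threshold=0.6):
    """Keep only edges whose confidence meets the threshold."""
    return [(h, r, t) for h, r, t, conf in triples if conf >= threshold]

def query(triples, head, relation):
    return [t for h, r, t in trusted_subgraph(triples) if h == head and r == relation]

print(query(triples, "patient_42", "prescribed"))   # ['metformin']
```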
By merging the functions of Boolean logic and non-volatile memory, memristive stateful logic can achieve in-memory computing in the true sense by eliminating data movement during computation, breaking the "memory wall" and "energy wall" of traditional von Neumann computing systems. In recent years, a series of memristor-based in-memory stateful logic gates have been proposed by linking the conditional switching process with mathematical logic functions, covering multiple logic functions such as IMP, NAND, NOR, and NIMP. However, automated synthesis and mapping methods for implementing complex in-memory stateful logic computation by cascading stateful logic gates are still embryonic, and in particular lack investigations into device wear, which limits the application of in-memory stateful logic in edge computing scenarios. To reduce device wear (toggle rate) in complex in-memory stateful logic computation, we propose a stateful logic synthesis and mapping process based on multiple stateful logic gates for low-wear in-memory computing. Compared with the two state-of-the-art stateful logic synthesis and mapping tools, SIMPLER MAGIC and LOSSS, the proposed low-wear process improves the toggle rate by an average of over 35.55% and 8.48%, respectively, on the EPFL combinational benchmark circuits, and by an average of over 47.26% and 6.72%, respectively, on the LGSynth91 benchmark circuits.
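To show what the toggle-rate (wear) metric measures, the following sketch, based on our own simplified gate model rather than the paper's synthesis flow, counts device state switches while cascading stateful NOR gates on a simulated crossbar.

```python
# Minimal sketch (assumption): counting device toggles (the wear metric) while
# executing a cascade of in-memory stateful NOR gates on a simulated crossbar.
class Crossbar:
    def __init__(self):
        self.cell = {}        # memristor name -> logic state (0 or 1)
        self.toggles = 0      # total state switches, i.e. the wear metric

    def load(self, name, value):
        self.cell[name] = value          # operand already resident in memory; no wear counted

    def write(self, name, value):
        if self.cell.get(name) != value: # a state switch wears the device
            self.toggles += 1
        self.cell[name] = value

    def stateful_nor(self, a, b, out):
        self.write(out, 1)                                   # initialize output cell to logic 1
        self.write(out, 0 if (self.cell[a] or self.cell[b]) else 1)

xb = Crossbar()
xb.load("a", 1); xb.load("b", 0)
xb.stateful_nor("a", "b", "t0")   # t0 = NOR(a, b) = 0
xb.stateful_nor("t0", "t0", "y")  # y = NOT(t0) = a OR b = 1
print("result:", xb.cell["y"], "toggles:", xb.toggles)
```

A synthesis and mapping flow like the one proposed here would choose gate types, cascade order, and operand placement so that this toggle count stays as low as possible.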
Over the past few years, the storage industry has undergone tremendous changes. Semiconductor storage devices such as solid state drives (SSDs) have flourished and can completely outperform traditional hard disk drives (HDDs), which address data by moving a magnetic head. Nowadays, the mainstream protocols supporting SSDs are NVMe and SAS. NVMe is a high-performance storage protocol designed specifically for SSDs that can maximize their performance, while the SAS protocol fully considers the requirements of data centers, providing high reliability and high scalability while balancing system performance and cost. Compared with increasingly fast storage media, the time overhead of a software stack designed for slow storage devices has become an increasingly significant part of each I/O. To address this issue, numerous excellent works have been proposed by academia and industry. For example, Intel's SPDK (storage performance development kit) greatly shortens the response time of NVMe SSDs to applications by implementing device drivers in user space and polling for I/O completion, substantially improving the performance of the entire system. However, previous research on optimizing the SAS SSD storage software stack is very limited. Therefore, this work implements a user-space optimization of the SAS software stack for SSDs. Experimental results show that this optimization effectively improves data access efficiency between applications and storage devices. In addition, to accurately evaluate the time cost of storage devices in the I/O stack, a hardware performance testing tool, HwPerfIO, is proposed, which eliminates the impact of most software overhead to measure storage device performance more accurately.
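The user-space polling idea that SPDK popularized, and that this work carries over to the SAS path, can be sketched as follows; the device and completion queue are simulated stand-ins, not the actual driver interface.

```python
# Minimal sketch (assumption): polling a completion queue in user space instead of
# sleeping on an interrupt. The queue here is simulated; a real user-space driver
# would poll a hardware completion ring mapped into its address space.
import time, threading, collections

completion_queue = collections.deque()

def device(io_id, latency_s=0.0005):
    """Pretend device: 'completes' the I/O after a fixed latency."""
    time.sleep(latency_s)
    completion_queue.append(io_id)

def submit_and_poll(io_id):
    threading.Thread(target=device, args=(io_id,)).start()
    polls = 0
    while not completion_queue:       # busy-poll: no context switch, no interrupt
        polls += 1
    print(f"I/O {completion_queue.popleft()} done after {polls} polls")

submit_and_poll(1)
```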
To facilitate researchers’ understanding of the application, acceptance, and funding processes for projects in the artificial intelligence discipline under the National Natural Science Foundation of China (NSFC), this paper provides a statistical analysis of the discipline’s projects in 2024. It first introduces the significant reform measures implemented by the NSFC in 2024. Subsequently, it summarizes and analyzes the application and funding status of projects for both the research and scholar series within the artificial intelligence discipline (F06) during the current year. Special attention is given to the changes in project applications and funding, shifts in the age distribution of applicants, and the distribution of host institutions, under the new reform measures. Finally, the paper provides an outlook on priority development directions in the field of artificial intelligence.
Image-text cross-modal entity linking is an extension of traditional named entity linking. The inputs are images containing entities, which are linked to textual entities in a knowledge base. Existing models usually adopt a dual-encoder architecture that encodes entities of the visual and textual modalities into separate vectors, calculates their similarities by dot product, and links each image entity to the most similar text entity. Training usually adopts a cross-modal contrastive learning task: for an entity vector in one modality, it pulls closer the corresponding vector of the other modality and pushes away the other-modality vectors of other entities. However, this approach overlooks the difference in representation difficulty between the two modalities: visually similar entities are often harder to distinguish than textually similar entities, leading to incorrect linking of the former. To solve this problem, we propose two new contrastive learning tasks that enhance the discriminative power of the vectors. The first is self-contrastive learning, which improves the distinction between visual vectors. The second is hard-negative contrastive learning, which helps textual vectors distinguish similar visual vectors. We conduct experiments on the open-source dataset WikiPerson. With a knowledge base of
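The two proposed contrastive tasks can be sketched loosely as follows; the vectors are random placeholders, the self-contrastive term reuses each visual vector as its own positive (a real setup would use augmented views), and the exact loss formulation in the paper may differ.

```python
# Minimal sketch (assumption): cross-modal InfoNCE plus a self-contrastive term among
# visual vectors and a hard-negative term that contrasts each textual vector against
# its most similar non-matching visual vectors. Shapes and temperature are illustrative.
import numpy as np

def nce(query, keys, pos_idx, temperature=0.07):
    """-log softmax probability of the positive key for each query."""
    logits = query @ keys.T / temperature
    logits -= logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(log_probs[np.arange(len(query)), pos_idx])

rng = np.random.default_rng(0)
v = rng.normal(size=(8, 64)); v /= np.linalg.norm(v, axis=1, keepdims=True)  # visual vectors
t = rng.normal(size=(8, 64)); t /= np.linalg.norm(t, axis=1, keepdims=True)  # textual vectors
idx = np.arange(8)

cross_modal = nce(v, t, idx)          # pull matched text closer, push other text away
self_contrastive = nce(v, v, idx)     # sharpen distinctions among visual vectors

hard_losses = []
for i in idx:                         # hard-negative task for each textual vector
    negs = [j for j in np.argsort(-(t[i] @ v.T)) if j != i][:3]   # hardest non-matches
    keys = np.vstack([v[i], v[negs]])                             # true match first
    hard_losses.append(nce(t[i:i+1], keys, [0]))
print(cross_modal + self_contrastive + np.mean(hard_losses))
```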
With the continuous development and rapid popularization of 5G networks, the number of user devices and their potential demand are increasing sharply. However, the high frequency of 5G signals leads to significant propagation losses. To achieve broader coverage of user devices, it is necessary to optimize existing 5G base station sites or guide the selection of new sites with low cost and high efficiency. State-of-the-art site selection methods mostly use heuristic algorithms to optimize the sites, but their convergence time increases exponentially with the number of candidate 5G base station sites, posing many challenges for site optimization. Therefore, we propose a 5G base station site selection method based on user demand points, which fully considers users' communication demands. Specifically, a planning area gridding method is proposed to reduce the time complexity of computing the user demand points covered by base stations. Then, the concept of separation degree among base stations is proposed and measured based on the number of user demand points covered by each base station. We formulate an objective function that satisfies submodularity and apply a greedy algorithm to obtain the base station site selection scheme. Experimental results show that the proposed method outperforms the comparative algorithms on all evaluation metrics and can effectively improve the coverage of 5G base station signals. In the same planning area, it achieves the maximum coverage rate with the minimum number of 5G base stations, thereby effectively reducing the construction cost of 5G base stations.
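The greedy selection over a submodular coverage objective described above can be sketched as follows; the candidate coverage sets are toy placeholders for what the gridded planning area and propagation model would produce.

```python
# Minimal sketch (assumption): greedy maximization of a submodular coverage objective.
# Grid cells stand in for user demand points; each candidate site covers a fixed set.
def greedy_site_selection(candidates, demand_points, budget):
    """candidates: dict site -> set of covered demand points."""
    chosen, covered = [], set()
    for _ in range(budget):
        # pick the site with the largest marginal coverage gain
        site = max(candidates, key=lambda s: len(candidates[s] - covered))
        gain = len(candidates[site] - covered)
        if gain == 0:
            break
        chosen.append(site)
        covered |= candidates[site]
    return chosen, len(covered) / len(demand_points)

demand = set(range(10))
cands = {"A": {0, 1, 2, 3}, "B": {3, 4, 5}, "C": {6, 7, 8, 9}, "D": {1, 2}}
print(greedy_site_selection(cands, demand, budget=3))   # (['A', 'C', 'B'], 1.0)
```

Because the coverage objective is submodular, this greedy choice of the largest marginal gain at each step comes with the classical approximation guarantee, which is what makes it attractive compared with slow-converging heuristics.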
In recent years, large-scale autoregressive Chinese pre-trained language models (PLMs) have demonstrated outstanding performance on various natural language processing (NLP) tasks. However, these models are computationally expensive, and their word-based vocabulary poses significant challenges for practical applications. In addition, most of them use only unidirectional context information, which may result in performance degradation on many tasks, especially tasks requiring a nuanced understanding of context. To address these challenges, we introduce LingLong, a high-quality small-scale Chinese pre-trained language model. LingLong stands out due to its modest scale, comprising only 317 million parameters, making it highly deployable and resource-efficient. We tokenize the training corpus with a character-based vocabulary to mitigate the negative impacts of unknown tokens and word segmentation errors. Moreover, we go beyond the conventional unidirectional context by introducing a novel backward model. This model is trained by reversing the input order of the training data. Combining LingLong and its backward version allows for the use of bidirectional information on downstream tasks. Extensive experimental results validate the effectiveness of LingLong across a diverse set of NLP tasks. LingLong outperforms similar-sized Chinese PLMs on six downstream tasks and surpasses popular large-scale Chinese PLMs on four downstream tasks. These findings underscore the versatility and efficiency of LingLong, opening up possibilities for practical applications and advancements in the Chinese NLP field.
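The backward model's data preparation can be illustrated with a short sketch, assuming character-level tokenization and simple sequence reversal; the function names are ours, not LingLong's actual training pipeline.

```python
# Minimal sketch (assumption): preparing training text for the backward model by
# reversing the token order, so that combining the forward and backward models
# exposes bidirectional context downstream. Character-based tokenization mirrors
# the vocabulary choice described above.
def char_tokenize(text: str) -> list[str]:
    return list(text)        # one token per character; no OOV or word segmentation errors

def make_backward_example(text: str) -> list[str]:
    return list(reversed(char_tokenize(text)))   # train the backward LM on reversed sequences

print(char_tokenize("深度学习"))          # ['深', '度', '学', '习']
print(make_backward_example("深度学习"))  # ['习', '学', '度', '深']
```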
Subgraph matching is a graph optimization problem that finds all subgraphs of a large target graph matching a given query graph. Although subgraph matching is NP-hard, the problem arises in many fields such as social networks, biochemistry, and cognitive science. Backtracking search algorithms for subgraph matching have high time complexity, so pruning strategies are essential to reduce running time. However, the complex expansion in existing pruning strategies leads to high time and space complexity; to balance efficiency and effectiveness, only limited neighborhood structure information can be used for conflict judgment, which lets many useless states pass the pruning check and wastes time. We propose an efficient, accurate, and adaptive subgraph matching algorithm. It captures the detailed structure of the whole graph with a graph neural network, builds structural connections, and generates pruning probabilities for all candidate search states. It replaces the complex expansion-based pruning method with inference by the neural network model, rapidly estimating the probability of pruning during the search. A data sampling mechanism is designed to alleviate network training collapse. Experiments show that using our pruning method in traditional backtracking search improves search efficiency.
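A hedged sketch of GNN-guided pruning inside backtracking search follows; the scorer is a stub standing in for the trained graph neural network, and the matcher is a bare-bones illustration rather than the paper's algorithm.

```python
# Minimal sketch (assumption): backtracking subgraph matching where a learned model
# scores each candidate search state and low-probability states are pruned.
def prune_score(state, target):           # stub for the GNN's estimated survival probability
    return 1.0                            # 1.0 = never prune, so this toy search stays exact

def match(query, target, state=None, threshold=0.1):
    """query/target: adjacency dicts. Yields mappings query node -> target node."""
    state = state or {}
    if len(state) == len(query):
        yield dict(state); return
    u = next(n for n in query if n not in state)               # next query node to map
    for v in target:
        if v in state.values():
            continue
        # consistency check: mapped query neighbors must map to target neighbors of v
        if any(w in state and state[w] not in target[v] for w in query[u]):
            continue
        state[u] = v
        if prune_score(state, target) >= threshold:            # GNN-guided pruning
            yield from match(query, target, state, threshold)
        del state[u]

query = {"a": {"b"}, "b": {"a"}}                                # a single edge
target = {1: {2, 3}, 2: {1}, 3: {1}}
print(list(match(query, target)))
```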
Graphs often carry rich temporal information and evolve dynamically over time, which can be modeled as temporal graph streams. A temporal graph stream consists of a set of vertices and a series of timestamped, directed edges, where new vertices and edges arrive continuously over time. Temporal motifs generalize subgraph patterns in static graphs by taking into account edge orderings and durations in addition to topology. Counting the occurrences of temporal motifs is a fundamental problem in temporal graph analysis. However, traditional streaming subgraph counting methods cannot support temporal matching and are only suitable for simple graphs without temporal information, while existing temporal motif counting methods perform poorly on temporal graph streams. We therefore study approximate temporal motif counting via random sampling in temporal graph streams. We propose a generic streaming edge sampling (SES) algorithm to estimate the number of instances of any temporal motif in a given temporal graph stream, and provide comprehensive analyses of its theoretical bounds and time complexity. Finally, we perform extensive experimental evaluations of SES on four real-world datasets. The results show that SES achieves up to three orders of magnitude speedup over state-of-the-art sampling methods while having comparable estimation errors for temporal motif counting in the streaming setting.
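The flavor of streaming edge sampling with an inverse-probability estimator can be shown on the simplest temporal motif, a two-edge path within a time window; this toy is our own illustration and omits the bounds and optimizations of SES.

```python
# Minimal sketch (assumption): fixed-probability streaming edge sampling with an
# inverse-probability estimator for the simplest temporal motif, a 2-edge path
# u->v->w whose edges arrive within a time window delta.
import random

def estimate_2path_motifs(stream, p=0.5, delta=10, seed=1):
    random.seed(seed)
    sampled = []            # retained edges: (src, dst, timestamp)
    count = 0.0
    for (u, v, t) in stream:
        if random.random() < p:
            # count motif instances this edge completes among previously sampled edges
            count += sum(1 for (a, b, s) in sampled if b == u and 0 < t - s <= delta)
            sampled.append((u, v, t))
    # each 2-edge instance is observed only if both edges are sampled: probability p*p
    return count / (p * p)

stream = [(1, 2, 1), (2, 3, 4), (2, 4, 6), (3, 4, 20), (1, 2, 21), (2, 3, 25)]
print(estimate_2path_motifs(stream, p=1.0))   # with p=1 this is the exact count: 3.0
```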
With the increasing demand for people counting, human flow monitoring based on channel state information (CSI) has attracted much attention because of its advantages such as easy deployment, privacy protection, and strong applicability. However, in existing human flow monitoring work, the accuracy of pedestrian recognition is easily affected by crowd density: to ensure accuracy, monitoring can only be carried out when the crowd is sparse, which limits the practicality of CSI-based human flow monitoring. To solve this problem, a monitoring method that can identify continuous flows of people is proposed. The method first uses phase unwrapping and a linear phase correction algorithm to eliminate the random phase offset and compensate the phase of the original data, then extracts valid data packets from the continuous flow data by standard deviation and variance, and finally feeds the time-domain phase difference information as feature signals into a convolutional, long short-term memory, deep neural network (CLDNN) for pedestrian recognition. In real tests, the method achieves an outdoor accuracy of 96.7% and an indoor accuracy of 94.1% when the distance between consecutive pedestrians is not less than 1 m, outperforming existing human flow monitoring methods.
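The phase pre-processing step can be sketched as follows, assuming NumPy's unwrap plus a linear fit across subcarriers to remove the random (linear-in-subcarrier) phase offset; real CSI processing spans multiple antennas and packets.

```python
# Minimal sketch (assumption): phase unwrapping and linear phase correction for one
# packet's subcarriers. Carrier/sampling frequency offsets appear as a term that is
# linear in the subcarrier index, so a linear fit removes the random offset.
import numpy as np

def correct_phase(raw_phase, subcarrier_idx):
    unwrapped = np.unwrap(raw_phase)                         # remove 2*pi jumps
    slope, intercept = np.polyfit(subcarrier_idx, unwrapped, 1)
    return unwrapped - (slope * subcarrier_idx + intercept)  # subtract the linear trend

k = np.arange(30)                                            # 30 subcarriers
true_phase = 0.3 * np.sin(k / 5.0)                           # motion-induced component
raw = np.angle(np.exp(1j * (true_phase + 0.8 * k + 2.0)))    # add linear offset, wrap to (-pi, pi]
print(np.round(correct_phase(raw, k)[:5], 3))
```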
Existing multi-view clustering algorithms have limitations in accurately capturing the high-order and complementary information embedded in multi-view data when learning low-dimensional representations. Meanwhile, they fail to capture the local information of the data, and their information extraction methods lack robustness to noise and outliers. To address these challenges, an adaptive tensor singular value shrinkage multi-view clustering algorithm named ATSVS is proposed. ATSVS introduces a novel tensor log-determinant function to enforce a low-rank constraint on the representation tensor, which enables adaptive shrinkage of singular values according to their magnitude. Consequently, ATSVS effectively captures the high-order and complementary information within multi-view data from a global perspective. ATSVS then captures the local information of the data by using the l1,2 norm, which combines the advantages of sparse representation and manifold regularization, and improves robustness to noisy points by imposing sparse constraints on the noise with the l2,1 norm. Experimental results against eleven comparison algorithms on nine different types of datasets show that ATSVS achieves superior clustering performance, significantly outperforming state-of-the-art baselines. ATSVS is thus an effective algorithm for clustering multi-view data.
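As a hedged illustration of magnitude-adaptive shrinkage, a commonly used log-determinant surrogate and its singular-value gradient are written out below; the exact tensor log-determinant function defined by ATSVS may differ.

```latex
% Hedged illustration (assumption): a typical log-determinant surrogate for (tensor)
% rank and its effect on singular values; ATSVS's exact definition may differ.
\[
  \operatorname{LogDet}_{\varepsilon}(\mathcal{Z})
    \;=\; \sum_{i} \log\!\Bigl(1 + \frac{\sigma_i(\mathcal{Z})}{\varepsilon}\Bigr),
  \qquad \varepsilon > 0,
\]
\[
  \frac{\partial}{\partial \sigma_i}\,
  \log\!\Bigl(1 + \frac{\sigma_i}{\varepsilon}\Bigr)
    \;=\; \frac{1}{\sigma_i + \varepsilon}.
\]
% Large singular values (dominant structure) receive a small penalty gradient and are
% shrunk only slightly, while small singular values (noise) are shrunk aggressively,
% which is the "adaptive shrinkage based on magnitude" behaviour described above.
```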
Audio recognition has been widely applied in typical scenarios such as autonomous driving and the Internet of things. In recent years, research on adversarial attacks in audio recognition has attracted extensive attention. However, most existing studies rely on coarse-grained audio features at the instance level, which leads to expensive generation time and weak universal attack ability in the real world. To address this problem, we propose a phonemic adversarial noise (PAN) generation paradigm, which exploits audio features at the phoneme level to perform fast and universal adversarial attacks. Experiments on datasets commonly used in speech recognition, such as LibriSpeech, validate the effectiveness of the proposed PAN, its generalization across datasets, its attack transferability across models and across tasks, and its effectiveness against consumer Internet-of-things audio recognition applications on physical-world devices. Extensive experiments demonstrate that PAN outperforms the comparative baselines by large margins (about 24 times speedup and 38% improvement in attack ability on average), and that the proposed sampling strategy and learning method significantly reduce training time and improve attack capability.
Given the frequency of cybersecurity incidents, anomaly detection methods have been widely employed to identify malicious behaviors. However, anomalous accesses often exhibit prominent characteristics only in certain attribute fields, making detection results susceptible to interference from attributes where anomalies are less prominent. To address this issue, MNDetector, an anomalous access detection framework that introduces the multiplex network structure into this field, is proposed. Through association analysis, closely associated attribute fields are built into single-layer networks, and cross-layer connections are added to form a multiplex network. Cross-layer walks are then performed to obtain node sequences within and across layers, facilitating node embedding. Finally, a hierarchical generative adversarial network merges the reconstruction losses and discriminative results of different layers to achieve anomalous access detection. Experimental results demonstrate that MNDetector surpasses state-of-the-art detection methods on multiple public datasets, achieving an approximately 8% increase in
To improve the hiding capacity of information hiding algorithms while ensuring the quality of the generated text, we propose a generative information hiding method based on the couplet carrier. First, we pre-train on couplet text data and build a couplet generation model based on a multi-flow pre-training and fine-tuning framework. Second, we use subject words as input to generate the first line of a couplet, and the model can generate multiple first lines for the same subject words; the first line is then used as input to generate the second line. The method mitigates the semantic ambiguity of current couplet generation models by utilizing span-by-span learning, a padding generation mechanism, and a noise perception mechanism to ensure that the generated lines correspond to each other in their metrical patterns. The secret information is hidden through the choices of subject words, candidate first lines, and candidate words for generating the second line. Experimental results show that the method obtains a high hiding capacity, with the average hiding capacity of 7-word couplets reaching 10.24B, and the generated couplets satisfy the strict formal and content requirements of couplets, such as an equal number of characters, matched parts of speech, parallel structure, and harmonious ping-ze (tonal patterns). The overall performance of the proposed method is better than current mainstream generative text information hiding schemes.
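The selection-based embedding described above can be sketched as follows; the candidate lists are placeholders for the generation model's top-k outputs, and the bit-to-index mapping is our own simplification.

```python
# Minimal sketch (assumption): embedding secret bits by index choices among candidates
# at each generation step. Candidate lists are placeholders; a real system would take
# them from the couplet generation model's candidate outputs.
def embed_bits(bitstring, candidate_lists):
    """Consume log2(len(candidates)) bits per step by picking the matching candidate."""
    choices, pos = [], 0
    for candidates in candidate_lists:
        width = len(candidates).bit_length() - 1          # bits encodable at this step
        idx = int(bitstring[pos:pos + width] or "0", 2)
        choices.append(candidates[idx])
        pos += width
    return choices, pos                                    # selections and bits consumed

def extract_bits(choices, candidate_lists):
    bits = ""
    for chosen, candidates in zip(choices, candidate_lists):
        width = len(candidates).bit_length() - 1
        bits += format(candidates.index(chosen), f"0{width}b")
    return bits

steps = [["spring", "autumn", "river", "moon"], ["breeze", "blossom"]]  # 2 bits + 1 bit
secret = "101"
chosen, used = embed_bits(secret, steps)
assert extract_bits(chosen, steps) == secret[:used]
print(chosen)      # ['river', 'blossom']
```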
A large number of applications in practice have proven the effectiveness of fuzz testing in detecting program vulnerabilities. However, existing fuzzing methods do not analyze performance differences specific to individual testing tasks or adjust their testing policies accordingly; they mostly adopt a unified process, which leads to unsatisfactory results. Better performance requires modifying the policy based on specific information gathered during testing. We therefore propose a new execution-context-guided fuzzing method for program defects that can break through protection mechanisms. By capturing and analyzing specific contextual information while the tested program processes input test cases, and by rapidly exploring program structural features, the sample mutation policy can be optimized. A prototype tool, CBFuzzer, for execution-context-guided program defect fuzzing is implemented. Experimental results indicate that CBFuzzer can effectively explore the internal structure of programs (including breaking through protection mechanisms), simulate unconventional program state transitions, and expose vulnerability points more efficiently. Compared with baselines, CBFuzzer improves vulnerability exposure by 6.8% to 36.76%, with the largest increase in the number of actual vulnerabilities detected reaching 66.67%. With a small, acceptable amount of additional testing resources, CBFuzzer not only improves detection of regular vulnerability types but also exhibits higher detection capability for well-concealed vulnerabilities. As of August 10, 2023, a total of 126 new vulnerabilities had been identified with CBFuzzer across 13 testing tasks (reported to the related software developers and submitted to the CVE organization).
Collective spatial keyword queries play an important role in fields such as spatial databases, location-based services, intelligent recommendation, and crowd intelligence perception. Existing collective spatial keyword query methods do not consider time-distance constraints and cost awareness, and cannot meet the needs of most users querying under time-distance constraints, so existing research results have significant limitations. To make up for these shortcomings, a collective spatial keyword query based on time-distance constraints and cost awareness (TDCCA-CoSKQ) is proposed. To address the issue that existing indexes cannot include both keyword information and time information, the TDCIR-Tree index is proposed, which combines inverted files and time attribute label files; TDCIR-Tree can reduce the cost of query computation. The TDCCA_PP algorithm is proposed to address the subsequent screening of collections that meet the query criteria for TDCCA-CoSKQ, including