Top Downloaded
To help researchers understand the application, acceptance, and funding processes for projects in the artificial intelligence discipline under the National Natural Science Foundation of China (NSFC), this paper presents a statistical analysis of the discipline’s projects in 2024. It first introduces the major reform measures implemented by the NSFC in 2024. It then summarizes and analyzes the application and funding status of projects in both the research series and the scholar series within the artificial intelligence discipline (F06) for the year, with particular attention to how the new reform measures have affected project applications and funding, the age distribution of applicants, and the distribution of host institutions. Finally, the paper offers an outlook on priority development directions in the field of artificial intelligence.
Recommender systems play a significant role in alleviating information overload, allowing users to conveniently find products and services on application platforms such as Tmall, TikTok, and Xiaohongshu. However, most recommender systems are built around accuracy alone, which narrows users’ horizons, gives some merchants fewer opportunities for exposure, homogenizes the platform’s content ecosystem, and unbalances the allocation of resources and information, triggering effects such as the filter bubble and the Matthew effect. As a result, strengthening the diversity of recommender systems has become a key research topic for meeting increasingly diversified user demands. Research on diversified recommendation has developed rapidly in recent years, but it still lacks systematic organization and summarization. This paper systematically reviews diversified recommendation within recommender systems. Firstly, we present the problem definition, technical framework, classification, and application scenarios of diversified recommendation. Secondly, we compare and analyze models and algorithms from four perspectives. Subsequently, we summarize the datasets and metrics commonly used for diversified recommendation. Finally, we discuss the open problems and challenges in this field to inspire future innovation and promote its development.
The central processing unit is the most important computing infrastructure today. To maximize profit, architects design the processor microarchitecture by trading off multiple objectives, including performance, power, and area. However, because the workloads running on processors contain enormous numbers of instructions, evaluating a single microarchitecture design point takes minutes to hours. Furthermore, the microarchitecture design space is huge, so exhaustive exploration of the whole design space is unrealistic. Therefore, many machine-learning-assisted acceleration methods for design space exploration have been proposed, either to reduce the number of design points that must be evaluated or to accelerate the evaluation of an individual design point. However, a comprehensive survey that summarizes and systematically classifies these recent acceleration methods has been missing. This survey systematically summarizes and classifies five kinds of acceleration methods for processor microarchitecture design space exploration: workload selection in the software design space, partial simulation of workload instructions, design point selection, simulation tools, and performance models. It systematically compares the similarities and differences among published work on these acceleration methods and covers the complete exploration process, from software workload selection to hardware microarchitecture design. Finally, research directions are summarized and future development trends are discussed.
Knowledge base question answering aims to retrieve relevant information from a knowledge base for model inference and return accurate answers. In recent years, with the development of deep learning and large language models, knowledge base question answering based on information retrieval has become a research focus, and many novel methods have emerged. We summarize and analyze information-retrieval-based knowledge base question answering methods from several aspects, including models and datasets. Firstly, we introduce the research significance and relevant definitions of knowledge base question answering. Then, following the model processing pipeline, we explain the key problems and typical solutions at each of four stages, namely question parsing, information retrieval, model inference, and answer generation, and summarize the network modules commonly used at each stage. We then analyze and organize the limited interpretability of information-retrieval-based knowledge base question answering methods. In addition, relevant datasets with different characteristics and baseline models at different stages are classified and summarized. Finally, a summary and outlook are given for each stage of information-retrieval-based knowledge base question answering and for the overall development direction of the field.
Currently, deep learning has achieved significant success in the field of synthetic speech detection. However, deep models commonly attain high accuracy on test sets that closely match their training distribution but suffer a substantial drop in accuracy in cross-dataset scenarios. To improve generalization to new datasets, models are often fine-tuned with new data, but this leads to catastrophic forgetting: the knowledge learned from old data is impaired, and performance on the old data deteriorates. Continual learning is a prevalent approach to mitigating catastrophic forgetting. In this paper, we propose a continual learning algorithm called elastic orthogonal weight modification (EOWM) to address catastrophic forgetting in synthetic speech detection. EOWM mitigates knowledge degradation by adjusting the direction and magnitude of parameter updates when the model learns new knowledge. Specifically, it constrains the update direction to be orthogonal to the data distribution of the old tasks while limiting the update magnitude for parameters that are important to the old tasks. Our algorithm demonstrates promising results in cross-dataset experiments on synthetic speech detection. Compared with fine-tuning, EOWM reduces the equal error rate (EER) on the old dataset from 7.334% to 0.821%, a relative improvement of 90%, and decreases the EER on the new dataset from 0.513% to 0.315%, a relative improvement of 40%.
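The two constraints described in this abstract (an orthogonal update direction plus an importance-weighted limit on update magnitude) can be illustrated with a small sketch. The Python code below is a hypothetical illustration, not the authors’ implementation: it builds an OWM-style projector from old-task inputs and applies an EWC-like importance damping; the function names `owm_projector` and `eowm_step` and the parameters `importance`, `alpha`, and `lam` are assumptions for illustration only.

```python
import torch

def owm_projector(old_inputs: torch.Tensor, alpha: float = 1e-3) -> torch.Tensor:
    # Recursive-least-squares style construction of a projector whose range is
    # approximately orthogonal to the span of old-task inputs (rows of old_inputs).
    d = old_inputs.shape[1]
    P = torch.eye(d) / alpha
    for x in old_inputs:
        x = x.unsqueeze(1)                    # (d, 1)
        k = P @ x / (1.0 + x.T @ P @ x)       # (d, 1) gain vector
        P = P - k @ (x.T @ P)                 # rank-1 downdate
    return P

def eowm_step(weight, grad, P, importance, lr=1e-2, lam=1.0):
    # Project the gradient so the update direction is (approximately) orthogonal
    # to the old-task input subspace, then shrink updates on parameters that the
    # importance estimate (same shape as weight, e.g. Fisher-style) marks as
    # critical for the old task.
    projected = grad @ P.T
    damped = projected / (1.0 + lam * importance)
    return weight - lr * damped
```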
Missing traffic data is an unavoidable problem in intelligent transportation systems. Imputing missing values and quantifying their uncertainty can improve the performance and reliability of traffic data mining tasks. However, most existing traffic data imputation models focus on point estimation without quantifying uncertainty, so they cannot meet the reliability requirements of the transportation field. Moreover, these methods focus only on modeling the spatial-temporal correlation of traffic data and fail to consider the impact of missing values on that correlation. In addition, the uncertainty of traffic data is affected by time, spatial location, and the state of the data, but existing methods cannot consider these factors comprehensively. To address these challenges, we propose a spatial-temporal uncertainty guided traffic data imputation network (STUIN), which simultaneously imputes spatial-temporal traffic data and quantifies the uncertainty of the imputed results through self-supervised training. Specifically, we model the hidden states of the neural network as random variables following Gaussian distributions, use the variances of these distributions to model the uncertainty of the hidden states, and introduce a variance-based attention mechanism to characterize the effect of uncertainty on modeling spatial-temporal correlations. In addition, we design a novel spatial-temporal uncertainty initialization module, which incorporates the influence of time, space, and missing values when initializing the means and variances of the Gaussian distributions. Experiments on two traffic flow datasets show that STUIN achieves state-of-the-art performance on both the data imputation and uncertainty quantification tasks.
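The variance-based attention idea can be roughly illustrated as follows. This sketch assumes Gaussian hidden states carried as separate `mean`/`var` tensors and an additive uncertainty penalty on the attention scores; it is not the paper’s actual formulation, and all names are illustrative.

```python
import torch
import torch.nn.functional as F

def variance_aware_attention(mean, var, temperature=1.0):
    # mean, var: (batch, length, dim) Gaussian mean and variance of hidden states.
    # Dot-product scores between hidden-state means, penalized by the summed
    # variance of each key position so that more uncertain states contribute less.
    scores = mean @ mean.transpose(-1, -2) / mean.shape[-1] ** 0.5   # (B, L, L)
    uncertainty = var.sum(dim=-1, keepdim=True).transpose(-1, -2)    # (B, 1, L)
    weights = F.softmax((scores - uncertainty) / temperature, dim=-1)
    return weights @ mean                                            # attended means
```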
Deep-learning-based point cloud segmentation algorithms can effectively segment point clouds in high-dimensional space by designing complex feature extraction modules. However, the lack of feature mining for boundary point sets results in suboptimal boundary segmentation accuracy. Some studies have applied contrastive learning to point cloud segmentation to address the insufficient segmentation performance in boundary regions, but the disordered and sparse nature of point clouds has not been fully exploited, and feature extraction is not accurate enough. To solve these problems, we propose CL2M, which uses a self-attention mechanism to learn more accurate features of point clouds at different locations and introduces contrastive learning to improve the segmentation accuracy of point cloud boundaries. During contrastive boundary learning, labels in the semantic space are deeply mined, and a contrastive boundary learning module based on label distribution is designed so that the label distribution of the point cloud in high-dimensional space carries more semantic information. The model makes full use of the label distribution to compute distances between distributions and can accurately divide positive and negative samples, reducing the cumulative errors caused by conventional hard partitioning. Results on two public datasets show that CL2M outperforms existing point cloud segmentation models on several evaluation metrics, verifying the effectiveness of the model.
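A minimal sketch of the label-distribution idea follows. It assumes each point’s distribution is estimated over its k nearest neighbours and that pairs are split by an L1 distance threshold `tau`; both choices are assumptions for illustration, not details taken from the paper.

```python
import torch
import torch.nn.functional as F

def label_distribution(neighbor_labels, num_classes):
    # neighbor_labels: (N, k) integer class labels of each point's k nearest neighbours.
    one_hot = F.one_hot(neighbor_labels, num_classes).float()   # (N, k, C)
    return one_hot.mean(dim=1)                                   # (N, C) per-point label distribution

def positive_pair_mask(dist_a, dist_b, tau=0.3):
    # Soft positive/negative split: a pair is positive when its neighbourhood
    # label distributions are close (small L1 distance), instead of a hard
    # same-label / different-label partition.
    return (dist_a - dist_b).abs().sum(dim=-1) < tau
```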
Multimodal sentiment analysis uses subjective information from multiple modalities to analyze sentiment. In some scenarios, the sentiment expressed in different modalities is inconsistent or even contradictory, which weakens multimodal collaborative decision-making. In this paper, a multimodal learning method is proposed to learn modal feature representations with consistent sentiment semantics. To improve the common feature representations of different modalities and learn dynamic interactions between modalities without affecting the original information, we first learn a common feature representation for each modality and then use cross attention so that one modality can effectively obtain auxiliary information from the common feature representations of the other modalities. For multimodal fusion, we propose a multimodal attention mechanism that concatenates the modal feature representations after weighting them, amplifying the contribution of informative modalities and suppressing the influence of weak modalities. On the sentiment analysis datasets MOSI, MOSEI, and CH-SIMS, the proposed method outperforms the compared models, indicating the necessity and rationality of accounting for sentiment semantic inconsistency in multimodal sentiment analysis.
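The weighted-concatenation fusion can be sketched as a small PyTorch module. This is a hypothetical sketch, not the paper’s exact design: the class name `ModalityAttentionFusion`, the single linear scoring layer, and the rescaling choice are all assumptions.

```python
import torch
import torch.nn as nn

class ModalityAttentionFusion(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.score = nn.Linear(dim, 1)   # scores each modality's feature vector

    def forward(self, feats):            # feats: list of (batch, dim) tensors, one per modality
        stacked = torch.stack(feats, dim=1)                    # (batch, M, dim)
        weights = torch.softmax(self.score(stacked), dim=1)    # (batch, M, 1) modality attention
        weighted = stacked * weights * stacked.size(1)         # re-scale so weights average to 1
        return weighted.flatten(start_dim=1)                   # (batch, M * dim): weighted concatenation
```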
In recent years, large language models (LLMs) have been widely applied to a range of downstream tasks and have demonstrated remarkable text understanding, generation, and reasoning capabilities across various fields. However, jailbreak attacks are emerging as a new threat to LLMs. Jailbreak attacks can bypass the security mechanisms of LLMs, weaken the effect of safety alignment, and induce harmful outputs from aligned LLMs. Issues such as abuse, hijacking, and leakage caused by jailbreak attacks have posed serious threats to dialogue systems and applications built on LLMs. We present a systematic review of jailbreak attacks in recent years and categorize them into three distinct types based on their underlying mechanisms: manually designed attacks, LLM-generated attacks, and optimization-based attacks. We summarize the core principles, implementation methods, and findings of the relevant studies and trace the evolutionary trajectory of jailbreak attacks on LLMs, offering a reference for future research. Moreover, we give a concise overview of existing security measures, introducing pertinent techniques from the perspectives of internal defense and external defense that aim to mitigate jailbreak attacks and enhance the content security of LLM generation. Finally, we discuss the open challenges and frontier directions in the field of jailbreak attacks on LLMs and examine the potential of multimodal approaches, model editing, and multi-agent methods for tackling jailbreak attacks, providing insights and research prospects for further advancing LLM security.
With the rapid development of large-model technology, these models have exhibited remarkable performance in fields such as natural language processing and computer vision, becoming essential tools for addressing complex problems and drawing significant interest from both the scientific community and industry. Nonetheless, current cloud-based schemes for training and serving large models face multiple challenges, including high cost, limited scalability, and information security risks. As the scale of model parameters continues to grow, the need for low-cost, efficient training and inference methods becomes ever more pressing. Performing collaborative training and inference of large models on edge devices can dramatically reduce latency and bandwidth demands while reinforcing data privacy and operational efficiency. This strategy provides vital technological support for deploying large models economically across a variety of settings and has therefore become a prominent research hotspot. This article surveys research on large models in the context of edge intelligence, with analysis and discussion focused on two aspects: edge-based training and edge-based inference of large models. Finally, it outlines the challenges facing large-model technologies for edge intelligence and sketches future prospects, aiming to raise awareness and attention in both academia and industry and to encourage further research in this thriving domain.
The fusion of deep learning and the Internet of things has significantly promoted the development of the AIoT ecosystem. On the one hand, the huge amounts of multi-modal data collected by AIoT devices provide deep learning with abundant training data, which is all the more important in the era of big models. On the other hand, advances in deep learning make AIoT devices smarter, showing great potential for promoting social development and the convenience of human life. As the main enablers of deep learning in AIoT, federated learning makes effective use of the training data provided by AIoT devices to train deep learning models while protecting data privacy, and collaborative inference overcomes the deployment obstacles caused by the limited computation resources of AIoT devices. We introduce the concept of AIoT-oriented collaborative intelligence. Aiming at efficient and secure knowledge transmission and computation resource provisioning, we review work published over the past 10 years on the architecture, algorithms, privacy, and security of federated learning and collaborative inference, and describe the inner connections between the two. The algorithm part summarizes federated learning and collaborative inference algorithms for AIoT use cases and their optimization goals. The architecture part reviews work on deep learning accelerators, deep learning compilation, deep learning frameworks, inter-device communication, and inter-device collaboration from the perspective of AI computing systems. The privacy and security part describes the privacy and security threats faced by AIoT-oriented collaborative intelligence and the corresponding defenses. We also provide insights into the future development of AIoT-oriented collaborative intelligence with respect to device sharing, model sharing, the interplay of privacy and security mechanisms, and the coordination of incentive mechanisms.
NAND flash is widely used in mobile devices thanks to its excellent characteristics, including large capacity, light weight, and shock resistance. The flash friendly file system (F2FS), designed around flash characteristics, is a typical log-structured file system (LFS). It employs a log-structured write mechanism to improve random write performance, uses roll-forward recovery for fast consistency protection, and is commonly used as a file system for mobile devices. However, file system performance is affected by fragmentation and segment cleaning. The out-of-place update mechanism of LFS, combined with the highly concurrent, random, small synchronous writes of mobile applications, exacerbates fragmentation, leading to sluggish I/O responses and device freezes. We first introduce the relevant concepts of log-structured file systems on mobile devices and then survey the research status of fragmentation and segment cleaning in LFS. Firstly, we analyze how fragmentation arises and what impact it has, and summarize research on reducing fragmentation from the perspectives of preventing fragments and reorganizing fragments. Secondly, we examine the impact of mixing hot and cold data on segment cleaning, and summarize research on distinguishing hot and cold data through static and dynamic classification, as well as on segment cleaning from the perspectives of managing data placement and adjusting the timing, frequency, and targets of cleaning. Finally, we outline the main challenges and future research prospects of log-structured file systems on mobile devices.
In recent years, the rapid urbanization and development of the social economy have led to a growing focus on public safety issues. Governments across the world are increasingly promoting the construction of smart cities and intelligent security systems to safeguard the lives and property of citizens and maintain social stability. Person re-identification (ReID) is an essential technology for building smart cities, with significant implications for security monitoring and criminal investigation applications. The goal of person re-identification is to accurately identify specific individuals captured under different cameras. However, due to intra-class differences resulting from various factors such as illumination, viewpoint, occlusion, and pose, person re-identification remains a challenging task in the field of computer vision. Although existing fully supervised person re-identification methods have made significant progress, the scarcity of data and labels poses a bottleneck for further improving model performance. To address this challenge, we introduce a more complex and diverse synthetic dataset with easy-to-obtain labels for auxiliary training, and propose a novel camera-aware asymmetric adversarial learning (CAAL) method that overcomes intra-class variation among multiple cameras and the domain-shift between real data and synthetic data, enabling the learning of camera-invariant feature representations from diverse data sources. Furthermore, to mitigate the impact of misleading information carried by synthetic datasets and prevent the model from overfitting to synthetic data during adversarial training, we propose using an auxiliary network trained on real-world data to constrain the training of the backbone network. Finally, we conduct extensive experiments on two public datasets to demonstrate the effectiveness of the proposed method.
In recent years, the rapid development of artificial intelligence technology, particularly deep learning, has led to its widespread application in fields such as computer vision and natural language processing. However, recent research indicates that potential security risks associated with these advanced AI models could compromise their reliability. In light of this concern, this survey reviews cutting-edge research on security attacks, attack detection, and defense strategies for artificial intelligence models. Regarding model security attacks, the work focuses on the principles and technical status of adversarial attacks, model inversion attacks, and model theft attacks. The model attack detection methods covered include defensive distillation, regularization, outlier detection, and robust statistics. The model defense strategies examined encompass adversarial training, model structure defenses, query control defenses, and other technical means. This survey summarizes and extends the techniques and methodologies relevant to securing artificial intelligence models, providing a theoretical foundation for their secure application. It also helps researchers better understand the current state of the art in this field and make informed decisions when selecting future research directions.
Graph contrastive learning is widely employed in recommender systems because of its effectiveness in mitigating data sparsity. However, most current recommendation algorithms based on graph contrastive learning learn from only a single perspective, which severely limits the model’s generalization capability, and the over-smoothing problem inherent in graph convolutional networks also affects the model’s stability. We therefore propose a multi-perspective graph contrastive learning recommendation method with a layer attention mechanism. On the one hand, the method applies three contrastive learning schemes from two perspectives. At the view level, it constructs a perturbation-enhanced view by adding random noise to the original graph and an SVD-enhanced view by singular value decomposition (SVD) and recombination, and then performs view-level contrastive learning on these two enhanced views. At the node level, it performs contrastive learning on candidate nodes and candidate structural neighbors using the semantic information between nodes, and optimizes a multi-task objective with three contrastive auxiliary tasks and a recommendation task to improve the quality of node embeddings and thus the model’s generalization ability. On the other hand, when learning user and item node embeddings with a graph convolutional network, a layer attention mechanism is employed to aggregate the final node embeddings, as sketched below, which strengthens higher-order connectivity and mitigates over-smoothing. When compared with ten classic models on four publicly available datasets (LastFM, Gowalla, Ifashion, and Yelp), the results indicate that this method achieves an average improvement of 3.12% in
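The layer attention aggregation can be sketched as follows: instead of the uniform layer averaging used in LightGCN-style models, a learned softmax weight per propagation layer combines the layer-wise node embeddings. This is a hypothetical sketch under the assumption of a single learnable logit per layer, not the paper’s exact parameterization.

```python
import torch
import torch.nn as nn

class LayerAttention(nn.Module):
    def __init__(self, num_layers: int):
        super().__init__()
        # One learnable logit per GCN propagation layer.
        self.layer_logits = nn.Parameter(torch.zeros(num_layers))

    def forward(self, layer_embeddings):          # list of (num_nodes, dim) tensors
        stacked = torch.stack(layer_embeddings)   # (L, num_nodes, dim)
        weights = torch.softmax(self.layer_logits, dim=0).view(-1, 1, 1)
        return (weights * stacked).sum(dim=0)     # (num_nodes, dim) final embeddings
```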
Google's knowledge graph technology has drawn considerable research attention in recent years. However, because few technical details have been publicly disclosed, it is difficult to understand the connotation and value of this technology. In this paper, we introduce the key techniques involved in constructing a knowledge graph in a bottom-up way, starting from a clear definition and a technical architecture of the knowledge graph. Firstly, we describe in detail the definition and connotation of the knowledge graph and then propose a technical framework for knowledge graph construction, in which the construction process is divided into three layers according to the level of abstraction of the input knowledge materials: the information extraction layer, the knowledge integration layer, and the knowledge processing layer. Secondly, the research status of the key technologies for each layer is surveyed comprehensively and examined critically, gradually revealing how knowledge graph technology works, its state-of-the-art progress, and its relationship with related disciplines. Finally, five major research challenges in this area are summarized, and the corresponding key research issues are highlighted.
With rapid advancements in edge computing, sensing, AI, and communication technologies, vehicles are undergoing an unprecedented transformation. We introduce a new computing paradigm for the autonomous driving era—vehicle computing. In this paradigm, data and control layers are separated, creating an open computing platform that supports multi-party collaboration and data sharing, breaking away from the limitations of traditional, closed vehicle systems. This paradigm enables vehicles to transcend conventional transportation roles, evolving into versatile mobile computing platforms that support a wide range of advanced applications and third-party services. We define the core concepts of vehicle computing, analyze the revolutionary evolution of software and computing architectures within vehicles, and present promising application examples, as well as a novel business model enabled by this paradigm. The five core functionalities of vehicle computing, i.e., computation, communication, energy management, sensing, and data storage, and their related cutting-edge technologies are thoroughly explored. We conclude by discussing key technical challenges and promising opportunities within vehicle computing, aiming to inspire further academic and industry research in this innovative field.
Large models represented by ChatGPT have attracted wide attention from industry and academia for their excellent performance on text generation and semantic understanding tasks. The number of parameters in large models has increased tens of thousands of times within three years and is still growing, which brings new challenges to storage systems. First, we analyze the storage challenges of large model training and point out that large model training has unique computation patterns, storage access patterns, and data characteristics, which make traditional storage techniques inefficient for large model training tasks. Then, we describe three types of storage acceleration techniques and two types of fault-tolerance techniques. The storage acceleration techniques for large model training include: 1) distributed storage techniques based on large-model computation patterns, which design the partitioning, storage, and transfer strategies for model data in distributed clusters according to how computation tasks are partitioned and the dependencies between them; 2) heterogeneous storage techniques aware of large-model access patterns, which develop data prefetching and transfer strategies across heterogeneous devices by exploiting the predictability of storage access patterns in large model training; 3) data reduction techniques, which shrink the data volume in the training process according to the characteristics of large model data. The storage fault-tolerance techniques for large model training include: 1) parameter checkpointing, which stores the large model parameters on persistent storage devices; 2) redundant computation, which computes the same version of the parameters repeatedly on multiple GPUs. Finally, we give a summary and suggestions for future research.
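As a toy illustration of the parameter checkpointing technique mentioned above, the sketch below snapshots model parameters to host memory and writes them to persistent storage in a background thread so training can proceed. The function name, file layout, and threading scheme are illustrative assumptions, not a production fault-tolerance design.

```python
import threading
import torch

def async_checkpoint(model, step, path_prefix="ckpt"):
    # Snapshot parameters to host memory first, so the training loop only pays
    # for the device-to-host copy, not for the disk write.
    cpu_state = {k: v.detach().cpu().clone() for k, v in model.state_dict().items()}

    def _write():
        torch.save({"step": step, "model": cpu_state}, f"{path_prefix}_{step}.pt")

    writer = threading.Thread(target=_write, daemon=True)
    writer.start()
    return writer   # join() it before exiting to guarantee the file is on disk
```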