Abstract:
In recent years, large language models (LLMs) have shown potential in multivariate time series forecasting. However, existing LLM-based cross-modal methods still suffer from insufficient modality alignment and limited generalization in few-shot learning and long-term forecasting tasks. To address these issues, this paper proposes an enhanced cross-modal LLM fine-tuning framework, termed Enhanced CALF, which integrates dynamic attention and hierarchical distillation. First, the framework constructs a dynamic attention cross-modal matching module that introduces adaptive weight generation and alignment prediction mechanisms; this module dynamically adjusts the allocation of attention weights according to data distribution characteristics and inter-modality correlation strengths, improving the precision of cross-modal feature alignment. Second, a multi-level knowledge distillation and contrastive learning module is built: projection mappings are introduced at each Transformer layer, and a hierarchical distillation loss is combined with an adaptive-temperature contrastive loss to achieve hierarchical feature transfer spanning local details to global semantics, enhancing the consistency of cross-modal representations. Finally, an adaptive alignment mechanism is designed that dynamically adjusts the weights of the total loss function based on quantitative inter-modality alignment scores, thereby optimizing the training process. Experiments on seven real-world datasets demonstrate that Enhanced CALF outperforms existing baseline models in long-term forecasting and few-shot learning tasks.
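The abstract gives no formulas for the adaptive alignment mechanism. As one hedged reading of "dynamically adjusts the weights of the total loss function through quantitative evaluation of inter-modality alignment scores", a cosine-similarity-based weighting between text-branch and time-series-branch features might look like the following minimal sketch; the function names and the specific weighting rule are illustrative assumptions, not taken from the paper:

```python
import math

def cosine(u, v):
    # plain cosine similarity between two feature vectors
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def adaptive_loss_weights(text_feat, ts_feat):
    # Hypothetical alignment score in [0, 1]: rescaled cosine similarity
    # between the two modality representations.
    score = 0.5 * (cosine(text_feat, ts_feat) + 1.0)
    # When alignment is poor (low score), up-weight the alignment /
    # distillation loss; when modalities already agree, favor the
    # forecasting (task) loss. Total loss would then be
    #   L = w_task * L_forecast + w_align * L_align.
    w_task = score
    w_align = 1.0 - score
    return w_task, w_align
```

For perfectly aligned features the task loss dominates (`w_align = 0`), while anti-aligned features push all weight onto the alignment term; any real implementation would likely smooth or clip these weights across training steps.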