    Citation: Dual-Perspective Multi-Level Cross-Modal Recommendation Based on Large Language Models[J]. Journal of Computer Research and Development. DOI: 10.7544/issn1000-1239.202550040

    Dual-Perspective Multi-Level Cross-Modal Recommendation Based on Large Language Models

    • Multimodal recommendation systems aim to provide more accurate and personalized recommendation services. However, existing research still faces the following issues: 1) Feature distortion: input embeddings are produced by small pre-trained language models and deep convolutional neural networks, yielding inaccurate feature representations. 2) Single encoding perspective: the multimodal encoding layers of current models encode from only one perspective, either memory or expansion, leading to information loss. 3) Poor multimodal alignment: embeddings from different modalities lie in different spaces and must be mapped into a shared space for alignment, but existing methods, which rely on simple multiplication with behavioral information, fail to capture the complex relationships between modalities and thus cannot align them precisely. To address these issues, a novel model called DPRec is proposed. It encodes from both the memory and expansion perspectives and introduces hypergraphs for precise multi-level cross-modal alignment. Experiments on three real-world datasets validate its effectiveness.
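
    The abstract describes mapping embeddings from different modalities into a shared space for alignment. Below is a minimal illustrative sketch of that general idea only, not the DPRec architecture itself: it projects text and image item embeddings into a common space and aligns them with a symmetric contrastive loss. All module names, dimensions, and the choice of an InfoNCE-style objective are assumptions for illustration.

    ```python
    # Sketch only: shared-space projection plus contrastive alignment.
    # This is NOT the paper's method; hyperparameters and design are assumed.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class CrossModalAligner(nn.Module):
        def __init__(self, text_dim: int, image_dim: int,
                     shared_dim: int = 128, temperature: float = 0.07):
            super().__init__()
            self.text_proj = nn.Linear(text_dim, shared_dim)    # map text embeddings to shared space
            self.image_proj = nn.Linear(image_dim, shared_dim)  # map image embeddings to shared space
            self.temperature = temperature

        def forward(self, text_emb: torch.Tensor, image_emb: torch.Tensor) -> torch.Tensor:
            # Normalize so similarity is cosine-based.
            t = F.normalize(self.text_proj(text_emb), dim=-1)
            v = F.normalize(self.image_proj(image_emb), dim=-1)
            logits = t @ v.T / self.temperature               # pairwise similarities across the batch
            targets = torch.arange(t.size(0), device=t.device)
            # Symmetric contrastive loss: each item's text should match its own image and vice versa.
            return 0.5 * (F.cross_entropy(logits, targets) + F.cross_entropy(logits.T, targets))

    # Usage: align a batch of 32 items whose text and image encoders output 768-d and 512-d vectors.
    aligner = CrossModalAligner(text_dim=768, image_dim=512)
    loss = aligner(torch.randn(32, 768), torch.randn(32, 512))
    loss.backward()
    ```

    A simple learned projection like this captures only pairwise text-image relations per item; the abstract's point is that richer structures (here, hypergraphs operating at multiple levels) are needed to model the more complex relationships between modalities.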