Large Language Model-Based Trusted Multi-Modal Recommendation

Yan Meng; Xu Cai; Huang Haibin; Zhao Wei; Guan Ziyu

doi:10.7544/issn1000-1239.202440433

Journal of Computer Research and Development, 2025, Accepted Manuscript. CSTR: 32373.14.issn1000-1239.202440433

Citation: Yan Meng, Xu Cai, Huang Haibin, Zhao Wei, Guan Ziyu. Large Language Model-Based Trusted Multi-Modal Recommendation[J]. Journal of Computer Research and Development. DOI: 10.7544/issn1000-1239.202440433

School of Computer Science and Technology, Xidian University, Xi’an 710126

Funds: This work was supported by the National Natural Science Foundation of China (62133012, 61936006, 62103314, 62073255, 62303366), the Key Research and Development Program of Shaanxi Province (2020ZDLGY04-07), the Innovation Capability Support Program of Shaanxi Province (2021TD-05), and the Natural Science Basic Research Program of Shaanxi Province (2023-JC-QN-0648).

Author Bio:
Yan Meng: born in 1997. PhD candidate. Her main research interests include multi-modal learning, recommendation, and knowledge graphs

Xu Cai: born in 1994. PhD, associate professor, master supervisor. His main research interests include trustworthy learning and multi-view learning

Huang Haibin: born in 2000. Master candidate. His main research interests include recommendation and multi-modal learning

Zhao Wei: born in 1979. PhD, professor, PhD supervisor. His research interests include signal processing, pattern recognition, and intelligent systems

Guan Ziyu: born in 1982. PhD, professor, PhD supervisor. His main research interests include attributed graph mining, expertise modeling, and recommender systems
Received Date: June 02, 2024
Revised Date: August 30, 2024
Accepted Date: January 08, 2025
Available Online: January 08, 2025

Abstract

Sequential recommendation centers on mining users' preferences and behavior patterns from their interaction sequences. Existing works have recognized that single-modal interaction data are inadequate and have drawn on large amounts of multi-modal data, such as item reviews and homepage images, to complement interaction data and improve recommendation performance. However, such multi-modal data inevitably contain noise that can limit the exploration of personalized user preferences. Although suppressing information that is inconsistent across modalities can reduce noise interference, it is almost impossible to eliminate noise from user-generated multi-modal content completely. To address these challenges, we propose a Large language model-based Trusted multi-modal Recommendation (Large-TR) algorithm, which aims to provide trustworthy recommendations in noisy multi-modal data scenarios. Specifically, the algorithm relies on the strong natural language understanding capability of a large language model to filter noise in multi-modal data efficiently and to model user preferences more accurately and in finer detail. In addition, we design a trustworthy decision mechanism that dynamically evaluates the uncertainty of recommendation results and ensures their usability in high-risk scenarios. Experimental results on four widely used public datasets show that the proposed algorithm outperforms the baseline algorithms. Our source code is available at https://github.com/hhbray/Large-TR.
Keywords:
- sequential recommendation
- multi-modal
- user-generated content
- trustworthy decision
- large language model
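
The abstract describes a trustworthy decision mechanism that evaluates the uncertainty of each recommendation, but it does not say how that uncertainty is computed. As a rough illustration only, and not the authors' implementation, the sketch below uses a generic Dirichlet-based subjective-logic formulation: non-negative evidence for K candidate items is mapped to per-item belief masses and a single uncertainty mass, and the system abstains when uncertainty is too high. The function names, the evidence vectors, and the `max_uncertainty` threshold are all assumptions introduced for this example.

```python
from typing import List, Optional, Tuple


def subjective_opinion(evidence: List[float]) -> Tuple[List[float], float]:
    """Map non-negative evidence for K candidates to (beliefs, uncertainty).

    Assumed formulation (not taken from the paper): alpha_k = e_k + 1,
    S = sum(alpha), b_k = e_k / S, u = K / S, so sum(b) + u equals 1.
    """
    if not evidence or any(e < 0 for e in evidence):
        raise ValueError("evidence must be a non-empty list of non-negative numbers")
    k = len(evidence)
    strength = sum(evidence) + k  # Dirichlet strength S
    beliefs = [e / strength for e in evidence]
    uncertainty = k / strength
    return beliefs, uncertainty


def trusted_recommend(evidence: List[float],
                      max_uncertainty: float = 0.5) -> Optional[int]:
    """Return the index of the top candidate, or None when too uncertain.

    max_uncertainty is a hypothetical threshold chosen for illustration.
    """
    beliefs, uncertainty = subjective_opinion(evidence)
    if uncertainty > max_uncertainty:
        return None  # abstain, e.g. defer to a fallback in a high-risk setting
    return max(range(len(beliefs)), key=beliefs.__getitem__)


if __name__ == "__main__":
    # Strong evidence for item 0: low uncertainty, so a recommendation is made.
    print(trusted_recommend([9.0, 0.0, 1.0]))
    # Almost no evidence for any item: high uncertainty, so the system abstains.
    print(trusted_recommend([0.2, 0.1, 0.1]))
```

With little total evidence the uncertainty mass approaches 1, so the abstain branch is what would keep low-confidence results out of high-risk scenarios under this assumed formulation.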
