Citation: Yan Meng, Xu Cai, Huang Haibin, Zhao Wei, Guan Ziyu. Large Language Model-Based Trusted Multi-Modal Recommendation[J]. Journal of Computer Research and Development. DOI: 10.7544/issn1000-1239.202440433
Sequential recommendation centers on mining users' preferences and behavior patterns from their interaction sequences. Existing work has recognized that single-modal interaction data is inadequate and has drawn on large amounts of multi-modal data, such as item reviews and homepage images, to complement interaction data and improve recommendation performance. However, such multi-modal data inevitably contains noise, which can limit the exploration of personalized user preferences. Suppressing information that is inconsistent across modalities reduces noise interference, but it is almost impossible to eliminate noise from user-generated multi-modal content completely. To address these challenges, we propose a Large language model-based Trusted multi-modal Recommendation (Large-TR) algorithm, which aims to provide trustworthy recommendations in noisy multi-modal data scenarios. Specifically, the algorithm relies on the natural language understanding capability of the large language model to filter noise in multi-modal data efficiently and to model user preferences more accurately and in finer detail. In addition, we design a trustworthy decision mechanism that dynamically evaluates the uncertainty of recommendation results and ensures their usability in high-risk scenarios. Experimental results on four widely used public datasets show that the proposed algorithm outperforms the baseline algorithms. Our source code is available at https://github.com/hhbray/Large-TR.
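The abstract does not spell out how the trustworthy decision mechanism computes uncertainty. As a rough illustration only, the sketch below assumes a subjective-logic style formulation, in which non-negative evidence per candidate item is turned into belief masses plus one uncertainty mass, and a recommendation is withheld when that uncertainty is too high. The function names, the evidence values, and the `max_uncertainty` threshold are all hypothetical and are not taken from Large-TR.

```python
from typing import List, Optional, Tuple


def subjective_opinion(evidence: List[float]) -> Tuple[List[float], float]:
    """Turn non-negative per-item evidence into belief masses and an uncertainty mass.

    Assumed formulation (not confirmed by the abstract):
    S = sum(e_k + 1), b_k = e_k / S, u = K / S, so sum(b_k) + u == 1.
    """
    if not evidence or any(e < 0 for e in evidence):
        raise ValueError("evidence must be a non-empty list of non-negative numbers")
    num_items = len(evidence)
    strength = sum(evidence) + num_items  # Dirichlet strength S
    beliefs = [e / strength for e in evidence]
    uncertainty = num_items / strength
    return beliefs, uncertainty


def trusted_top1(
    evidence: List[float], max_uncertainty: float = 0.5
) -> Tuple[Optional[int], float]:
    """Return (item index, uncertainty), or (None, uncertainty) when too uncertain.

    max_uncertainty is a hypothetical threshold chosen for illustration.
    """
    beliefs, uncertainty = subjective_opinion(evidence)
    if uncertainty > max_uncertainty:
        return None, uncertainty  # abstain rather than risk an unreliable recommendation
    best = max(range(len(beliefs)), key=beliefs.__getitem__)
    return best, uncertainty


if __name__ == "__main__":
    print(trusted_top1([8.0, 1.0, 1.0]))  # strong evidence for item 0
    print(trusted_top1([0.0, 0.0, 0.0]))  # no evidence at all, so the call abstains
```

With no evidence the uncertainty mass equals 1, so the sketch abstains; as evidence accumulates the uncertainty shrinks and the highest-belief item is returned. In a high-risk setting, an abstention could be routed to a fallback policy instead of being shown to the user.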