Semi-supervised Open Vocabulary Multi-label Learning via Graph Prompting
Graphical Abstract
Abstract
Semi-supervised multi-label learning exploits both labeled and unlabeled data to train a model, achieving good performance while reducing the annotation cost of multi-label data; it has therefore attracted many researchers to this field. However, when the label space is large, it is common in the semi-supervised annotation process for some labels to have no labeled samples; such labels form an open vocabulary. The model struggles to learn the information of these open-vocabulary labels, which degrades its performance. To address this problem, this paper proposes a semi-supervised open vocabulary multi-label learning method based on graph prompting. Specifically, the method uses a graph neural network as a prompt to fine-tune the pre-trained model and to explore the relationship between the open vocabulary and the supervised samples. Using both images and text, we construct a graph whose neural-network output serves as the text input of the pre-trained model. Furthermore, by leveraging the generalization ability of the pre-trained model on the open vocabulary, pseudo-labels are generated for the unlabeled samples. These pseudo-labels are then used to train the classification layer so that the model classifies open-vocabulary labels more accurately. Experimental results on multiple benchmark datasets, including VOC, COCO, CUB, and NUS, consistently demonstrate that the proposed method outperforms existing methods and achieves state-of-the-art performance.
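
The pipeline summarized above can be sketched in a few lines of PyTorch. The snippet below is only an illustrative assumption of how graph prompting and pseudo-labeling could fit together with a frozen CLIP-style encoder; all class names, tensor shapes, the GCN-style propagation, and the fixed confidence threshold are hypothetical choices of this sketch, not the authors' implementation.

    # Minimal sketch (assumed names/shapes): graph-prompted labels + pseudo-labels.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class GraphPrompt(nn.Module):
        """One round of message passing over a learned label graph; the updated
        node embeddings act as soft text prompts for a frozen pre-trained model."""
        def __init__(self, num_labels: int, dim: int):
            super().__init__()
            self.node_emb = nn.Parameter(torch.randn(num_labels, dim) * 0.02)
            self.adj = nn.Parameter(torch.eye(num_labels))  # learnable label graph
            self.proj = nn.Linear(dim, dim)

        def forward(self) -> torch.Tensor:
            adj_norm = F.softmax(self.adj, dim=-1)          # row-normalized edges
            return self.proj(adj_norm @ self.node_emb)       # (num_labels, dim)

    def pseudo_labels(image_feats, prompt_feats, threshold=0.5):
        """Score unlabeled images against prompt embeddings; scores above the
        (assumed) threshold are treated as pseudo-positive labels."""
        image_feats = F.normalize(image_feats, dim=-1)
        prompt_feats = F.normalize(prompt_feats, dim=-1)
        scores = torch.sigmoid(image_feats @ prompt_feats.t())
        return (scores > threshold).float()

    # Usage sketch with random tensors standing in for frozen image features.
    num_labels, dim = 20, 512
    prompter = GraphPrompt(num_labels, dim)
    unlabeled_feats = torch.randn(8, dim)
    targets = pseudo_labels(unlabeled_feats, prompter())     # (8, num_labels)
    classifier = nn.Linear(dim, num_labels)                  # classification layer
    loss = F.binary_cross_entropy_with_logits(classifier(unlabeled_feats), targets)
    loss.backward()

In this sketch the pre-trained encoder stays frozen; only the graph prompt and the classification layer receive gradients, mirroring the prompt-tuning and pseudo-label training steps described in the abstract.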