ISSN 1000-1239 CN 11-1777/TP

• Artificial Intelligence •

### Sentence Annotation of Product Images Based on Keyword Refinement and Syntactic Trees

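1. 1(Computer School, Wuhan University, Wuhan 430072); 2(School of Software, East China Jiaotong University, Nanchang 330013); 3(School of Big Data and Computer Science, Guizhou Normal University, Guiyang 550001); 4(Baidu Online Network Technology (Beijing) Co., Ltd., Beijing 100085) (zhanghongbin@whu.edu.cn)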
• Published: 2016-11-01
• Funding:
This work was supported by the National Natural Science Foundation of China (61133012), the National Social Science Major Tender Project (11&ZD189), the Humanity and Social Science Foundation of Ministry of Education (16YJAZH029), the Science and Technology Research Project of Jiangxi Provincial Department of Science and Technology (20121BBG70050, 20142BBG70011), the Humanity and Social Science Foundation of Jiangxi Provincial Universities (XW1502, TQ1503), the Visiting Scholar Special Fund for the Development Plan of Young and Middle-Aged Teachers of General Universities in Jiangxi Province, and the Social Science Planning Project of Jiangxi Province (16TQ02).

### Caption Generation from Product Image Based on Tag Refinement and Syntactic Tree

Zhang Hongbin1,2, Ji Donghong1, Yin Lan1,3, Ren Yafeng1, Niu Zhengyu4

1. 1(Computer School, Wuhan University, Wuhan 430072); 2(School of Software, East China Jiaotong University, Nanchang 330013); 3(School of Big Data and Computer Science, Guizhou Normal University, Guiyang 550001); 4(Baidu Online Network Technology (Beijing) Co., Ltd., Beijing 100085)
• Online: 2016-11-01

Abstract: Automatic caption generation from product images is an interesting and challenging task in image annotation. Two key problems severely limit it: interference from noisy words and inaccurate syntactic structures. For the first problem, a novel tag refinement (TR) approach is presented in two stages. In the first tag refinement, the absolute rank (AR) feature is applied to strengthen the weights of key words. In the second tag refinement, the semantic correlation score of each word is calculated in turn, and the words most tightly correlated with the image content are collected for caption generation. A novel natural language generation (NLG) algorithm named word sequence blocks building (WSBB) is designed accordingly to generate N-gram word sequences. For the second problem, a syntactic tree (ST) approach is presented: a complete syntactic tree is constructed recursively from the N-gram word sequences and predefined syntactic subtrees. Finally, the sentence is generated by traversing all leaf nodes of the syntactic tree. Experimental results show that both tag refinement and the syntactic tree help to improve annotation performance. More importantly, the generated sentence better retains both semantic information compatibility and syntactic mode compatibility. It contains abundant semantic information as well as a coherent syntactic structure.
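
The final step described in the abstract, building a tree recursively from word sequences and predefined subtrees and then reading the sentence off its leaf nodes, can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration only: the `Node` class, the `build` and `leaves` helpers, the category labels, the subtree template, and the example word blocks are hypothetical and are not taken from the paper, whose actual subtrees and WSBB output are not given in this abstract.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Node:
    """A syntactic tree node (hypothetical structure, for illustration)."""
    label: str                                          # category label, e.g. "NP"
    children: List["Node"] = field(default_factory=list)
    words: List[str] = field(default_factory=list)      # filled only on leaves


def build(template, blocks: Dict[str, List[str]]) -> Node:
    """Recursively instantiate a predefined subtree template.

    A template is either a slot name (str), which becomes a leaf filled
    with the word sequence stored under that name, or a pair
    (label, [child templates]), which becomes an internal node.
    """
    if isinstance(template, str):
        return Node(label=template, words=list(blocks[template]))
    label, child_templates = template
    return Node(label=label,
                children=[build(child, blocks) for child in child_templates])


def leaves(node: Node) -> List[str]:
    """Collect the words of all leaf nodes in left-to-right order."""
    if not node.children:
        return list(node.words)
    collected: List[str] = []
    for child in node.children:
        collected.extend(leaves(child))
    return collected


# Hypothetical predefined subtree: S -> NP PP, NP -> ADJ NOUN, PP -> PREP NP.
TEMPLATE = ("S", [("NP", ["ADJ", "NOUN"]),
                  ("PP", ["PREP", ("NP", ["MOD", "HEAD"])])])

# Hypothetical word sequences standing in for refined, N-gram output.
blocks = {
    "ADJ": ["black"],
    "NOUN": ["leather", "handbag"],
    "PREP": ["with"],
    "MOD": ["silver"],
    "HEAD": ["zipper"],
}

tree = build(TEMPLATE, blocks)
sentence = " ".join(leaves(tree))
print(sentence)  # black leather handbag with silver zipper
```

Keeping the words only on leaves means word order is fixed entirely by the shape of the template, so swapping in a different predefined subtree changes the syntax of the sentence without touching the selected words.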