Automatic caption generation from product images is an interesting and challenging research task in image annotation. Two key problems heavily affect this research: interference from noisy words and inaccurate syntactic structures. For the first problem, a novel idea of tag refinement (TR) is presented. An absolute rank (AR) feature is applied to strengthen the weights of key words; this step is called the first tag refinement. The semantic correlation score of each word is then calculated in turn, and the words most tightly correlated with the image content are collected for caption generation; this step is called the second tag refinement. A novel natural language generation (NLG) algorithm named word sequence blocks building (WSBB) is designed accordingly to generate N-gram word sequences. For the second problem, a novel idea of a syntactic tree (ST) is presented: a complete syntactic tree is constructed recursively from the N-gram word sequences and predefined syntactic subtrees. Finally, a sentence is generated by traversing all leaf nodes of the syntactic tree. Experimental results show that both the tag refinement and the syntactic tree help to improve annotation performance. More importantly, the generated sentence better retains semantic information compatibility and syntactic mode compatibility at the same time. Moreover, the sentence contains abundant semantic information as well as a coherent syntactic structure.
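The pipeline described above can be illustrated with a minimal sketch. It is not the authors' implementation: the AR re-weighting formula (scaling a weight by 1 + 1/rank), the reuse of boosted weights as correlation scores, the `Node` class, and all function names are assumptions made for illustration. WSBB and the recursive tree construction are not reproduced; the tree here is built by hand and only the leaf traversal is shown.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Node:
    """Syntactic tree node: a leaf carries a word, an inner node carries children."""
    label: str
    word: str = ""
    children: List["Node"] = field(default_factory=list)


def first_refinement(weights: Dict[str, float], ranks: Dict[str, int]) -> Dict[str, float]:
    # Assumption: a word at rank r (1 = best) has its weight scaled by 1 + 1/r,
    # so higher-ranked key words are strengthened more.
    return {w: weights[w] * (1.0 + 1.0 / ranks[w]) for w in weights}


def second_refinement(correlation: Dict[str, float], keep: int) -> List[str]:
    # Keep the `keep` words with the highest semantic correlation scores.
    ranked = sorted(correlation, key=lambda w: correlation[w], reverse=True)
    return ranked[:keep]


def leaves(node: Node) -> List[str]:
    # Depth-first, left-to-right traversal that collects the leaf words.
    if not node.children:
        return [node.word]
    words: List[str] = []
    for child in node.children:
        words.extend(leaves(child))
    return words


# Hypothetical tag weights and ranks for one product image.
weights = {"red": 0.5, "dress": 0.8, "sale": 0.4}
ranks = {"dress": 1, "red": 2, "sale": 3}

boosted = first_refinement(weights, ranks)
# Assumption: the boosted weights stand in for semantic correlation scores.
kept = second_refinement(boosted, keep=2)

# Hand-built tree over the kept words; leaf order yields the sentence.
tree = Node("NP", children=[Node("JJ", word="red"), Node("NN", word="dress")])
sentence = " ".join(leaves(tree))
```

In this toy run the noisy tag "sale" is dropped by the second refinement, and the word order of the output comes from the tree's leaf order, not from the ranking.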