The Automatic Identification of Butterfly Species

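Xie Juanying (谢娟英); Hou Qi (侯琦); Shi Yinghuan (史颖欢); Lü Peng (吕鹏); Jing Liping (景丽萍); Zhuang Fuzhen (庄福振); Zhang Junping (张军平); Tan Xiaoyang (谭晓阳); Xu Shengquan (许升全)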

doi:10.7544/issn1000-1239.2018.20180181

Abstract: Existing butterfly identification studies rely on data sets that cover only a few species and contain only specimen (standard pattern) photographs, with no images of butterflies in their living environments. To address these limitations, we release a butterfly image data set that contains both kinds of photographs: 4270 standard pattern images of 1176 species, covering every butterfly species in the Monograph of Chinese Butterflies, and 1425 images of 111 species taken in living environments. We propose an automatic butterfly identification system based on the deep learning technique Faster R-CNN, which both detects the positions of butterflies in images from living environments and recognizes their species. In the experiments, we remove the species that have only one living environment image and split the remaining living environment images half and half: one half serves as the testing subset, and the other half is combined either with all standard pattern images or with only the standard pattern images of the same species, giving two different training subsets. The training subsets are augmented in nine ways, including up-down and left-right flipping, rotation by different angles, adding noise, blurring to different degrees, and raising or lowering contrast. Identification systems with three different network structures are trained and evaluated by mean average precision (mAP). The experimental results show that the Faster R-CNN based system can detect the positions of butterflies in images from living environments and recognize their species, that the lowest mAP among the models is close to 60%, and that it can detect and identify multiple butterflies in a single image.
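The data partition described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' code: the function name `build_splits`, the `(image_id, species)` pair format, the fixed random seed, and the rule that an odd image count gives the extra image to the testing half are all assumptions introduced here.

```python
import random
from collections import defaultdict


def build_splits(eco, specimens, seed=0):
    """Sketch of the half-half protocol (hypothetical helper).

    eco, specimens: lists of (image_id, species) pairs; this format is an
    assumption made for illustration only.
    Returns (train_all, train_matched, test_eco).
    """
    # Group living environment images by species.
    by_species = defaultdict(list)
    for image_id, species in eco:
        by_species[species].append(image_id)

    rng = random.Random(seed)
    train_eco, test_eco = [], []
    for species in sorted(by_species):
        ids = sorted(by_species[species])
        if len(ids) < 2:
            continue  # drop species with a single living environment image
        rng.shuffle(ids)
        # Assumption: for odd counts the extra image goes to the testing half.
        half = len(ids) // 2
        train_eco += [(i, species) for i in ids[:half]]
        test_eco += [(i, species) for i in ids[half:]]

    kept = {species for _, species in train_eco}
    # Training set 1: half of the living environment images + all specimen images.
    train_all = train_eco + list(specimens)
    # Training set 2: the same half + specimen images of the matching species only.
    train_matched = train_eco + [(i, s) for i, s in specimens if s in kept]
    return train_all, train_matched, test_eco
```

Calling `build_splits(eco, specimens)` on two such lists returns the two training subsets and the shared testing subset. Augmentation would then be applied to the training subsets only, so that the testing images stay untouched.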
