ISSN 1000-1239 CN 11-1777/TP

Journal of Computer Research and Development ›› 2018, Vol. 55 ›› Issue (8): 1609-1618. doi: 10.7544/issn1000-1239.2018.20180181

Special Issue: 2018 Special Topic on Frontier Advances in Data Mining

The Automatic Identification of Butterfly Species

Xie Juanying1, Hou Qi1, Shi Yinghuan2, Lü Peng3, Jing Liping4, Zhuang Fuzhen5, Zhang Junping6, Tan Xiaoyang7, Xu Shengquan8

  1. School of Computer Science, Shaanxi Normal University, Xi’an 710119
  2. Department of Computer Science & Technology, Nanjing University, Nanjing 210023
  3. School of Computer Science & Technology, Shandong University of Finance and Economics, Jinan 250014
  4. School of Computer & Information Technology, Beijing Jiaotong University, Beijing 100044
  5. Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190
  6. School of Computer Science, Fudan University, Shanghai 200433
  7. College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 210016
  8. College of Life Sciences, Shaanxi Normal University, Xi’an 710119
  • Online: 2018-08-01

Abstract: Existing butterfly image data sets cover only a limited number of species, and their images are almost always standard pattern images, with no images of butterflies in their living environments. To overcome these limitations, we build a butterfly image data set covering all species in Monograph of Chinese Butterflies, with 4270 standard pattern images of 1176 species and 1425 living-environment images of 111 species. Using the deep learning technique Faster R-CNN, we develop an automatic butterfly identification system that both detects the positions of butterflies in living-environment images and recognizes their species. We first remove from the data set the species that have only one living-environment image, then split the remaining living-environment images into two halves. One half serves as the testing subset. The other half is combined either with all the standard pattern images or with only the standard pattern images of the species that also appear in the living-environment images, giving two different training subsets. To build the training subsets for Faster R-CNN, nine methods are used to augment the training images, including vertical and horizontal flipping, rotation by different angles, adding noise, blurring, and contrast adjustment. Prediction models based on three network structures are trained, and the mAP (mean average precision) criterion is used to evaluate them. The experimental results show that the Faster R-CNN based identification system performs well: even its lowest mAP reaches 60%, and it can simultaneously locate more than one butterfly in a single living-environment image and recognize the species of each.
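The data preparation described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code. The function names, the `(image_id, species)` pair format, the per-species split, and the handling of odd image counts are all assumptions made for illustration; the abstract only states that single-image species are removed and that the remaining living-environment images are split half and half. The two flip helpers stand in for two of the nine augmentation methods and operate on an image represented as a plain list of pixel rows.

```python
import random
from collections import defaultdict


def split_field_images(images, seed=0):
    """Split (image_id, species) pairs into training and testing halves.

    Hypothetical helper: species with a single living-environment image are
    dropped, and the rest are split half and half within each species
    (the per-species split is an assumption, not stated in the abstract).
    """
    by_species = defaultdict(list)
    for image_id, species in images:
        by_species[species].append(image_id)

    rng = random.Random(seed)  # fixed seed so the split is reproducible
    train, test = [], []
    for species in sorted(by_species):
        ids = sorted(by_species[species])
        if len(ids) < 2:
            continue  # only one living-environment image: species removed
        rng.shuffle(ids)
        half = len(ids) // 2  # assumption: odd counts give the extra image to training
        test += [(i, species) for i in ids[:half]]
        train += [(i, species) for i in ids[half:]]
    return train, test


def flip_up_down(pixels):
    """Vertical flip: reverse the order of the pixel rows."""
    return pixels[::-1]


def flip_left_right(pixels):
    """Horizontal flip: reverse the pixels within each row."""
    return [row[::-1] for row in pixels]
```

In such a sketch, the returned training half would then be merged with the chosen set of standard pattern images and augmented before training, while the testing half is left untouched.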

Key words: butterflies, automatic identification, object detection, deep learning, classification
