ISSN 1000-1239 CN 11-1777/TP

Journal of Computer Research and Development ›› 2015, Vol. 52 ›› Issue (9): 2025-2032. doi: 10.7544/issn1000-1239.2015.20140694

• Software Technology •


An Efficient Filtering-and-Refining Retrieval Method for Big Audio Data

Zhang Xingzhong1, Wang Yunsheng1, Zeng Zhi2, Niu Baoning1   

  1. College of Computer Science and Technology, Taiyuan University of Technology, Taiyuan 030024
  2. Institute of Automation, Chinese Academy of Sciences, Beijing 100190
  • Online: 2015-09-01

Abstract: Fast audio retrieval is challenging because audio data is high-dimensional and the volume of audio on the Internet keeps growing. Audio fingerprinting greatly reduces dimensionality while keeping recordings identifiable, but fingerprints are still too high-dimensional for retrieval to scale to big audio data, so the number of candidate recordings that must be checked has to be kept small. This paper proposes a robust and fast retrieval method for big audio data that combines audio fingerprinting with a filtering-and-refining strategy. For the filtering step, a low-dimensional audio middle fingerprint is derived by applying the bag-of-features (BoF) technique to the classical Philips audio fingerprint; it quickly narrows the search to the most likely candidates and is about 130 times faster than Fibonacci Hashing retrieval. For the refining step, a threshold-based matching algorithm compares samples taken at fixed intervals from the two recordings, which greatly reduces the amount of matching computation and yields a further speed gain of up to 140 times. Experimental results show that, in batch retrieval of audio clips of different lengths against a collection of about 100,000 recordings, the average retrieval time is less than 1 s. For queries subjected to MP3 conversion, resampling, and random cropping, the recall rates are all above 99.47%, and the theoretical accuracy is close to 100%.
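The filtering step can be illustrated with a minimal sketch. The abstract does not give the construction of the middle fingerprint, so everything below is an assumption made for illustration: sub-fingerprints are treated as 32-bit integers (as in the Philips scheme), the codebook is a small hand-picked list rather than a learned one, each sub-fingerprint is assigned to its nearest codeword by Hamming distance, and candidates are ranked by the L1 distance between normalized histograms. The names `bof_histogram` and `filter_candidates` are hypothetical.

```python
from collections import Counter


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two 32-bit sub-fingerprints."""
    return bin(a ^ b).count("1")


def bof_histogram(subfps, codebook):
    """Assign each sub-fingerprint to its nearest codeword and return the
    normalized histogram of codeword indices (a stand-in for the paper's
    'middle fingerprint'; assumes `subfps` is non-empty)."""
    counts = Counter(
        min(range(len(codebook)), key=lambda i: hamming(fp, codebook[i]))
        for fp in subfps
    )
    return [counts[i] / len(subfps) for i in range(len(codebook))]


def l1(h1, h2):
    """L1 distance between two histograms of equal length."""
    return sum(abs(x - y) for x, y in zip(h1, h2))


def filter_candidates(query_subfps, database, codebook, keep=1):
    """Return the `keep` database entries whose histograms are closest
    to the query's histogram. `database` maps name -> histogram."""
    q = bof_histogram(query_subfps, codebook)
    return sorted(database, key=lambda name: l1(q, database[name]))[:keep]


# Hypothetical toy data: a 4-word codebook and two indexed recordings.
CODEBOOK = [0x00000000, 0xFFFFFFFF, 0x0F0F0F0F, 0xF0F0F0F0]
track_a = [0x00000001, 0x00000003, 0xFFFFFFFE, 0x0F0F0F0F]
track_b = [0xF0F0F0F0, 0xF0F0F0F1, 0xF0F0F0F0, 0xFFFFFFFF]
database = {
    "a": bof_histogram(track_a, CODEBOOK),
    "b": bof_histogram(track_b, CODEBOOK),
}

# A query that is a bit-noisy copy of track_a still lands on "a".
noisy_query = [0x00000000, 0x00000007, 0xFFFFFFFF, 0x0F0F0F0E]
print(filter_candidates(noisy_query, database, CODEBOOK, keep=1))
```

The point of the design is that the histogram has only as many entries as the codebook, so comparing it against every indexed recording is cheap; only the few survivors need to be checked against the full fingerprint.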

Key words: big audio data, efficient retrieval, Philips audio fingerprint, filtering-and-refining, audio middle fingerprint
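The refining step described in the abstract, threshold-based matching on samples taken at fixed intervals, might look roughly like the sketch below. It is one possible reading, not the authors' algorithm: it compares only every `step`-th 32-bit sub-fingerprint, measures similarity as bit error rate (BER), and abandons an alignment as soon as the accumulated bit errors exceed what the threshold allows. The function names, the `step` value, and the 0.35 threshold are illustrative assumptions.

```python
def sampled_ber(query, candidate, offset, step, threshold):
    """BER between `query` and `candidate[offset:]`, computed only on every
    `step`-th sub-fingerprint. Returns None as soon as the accumulated
    errors reach the budget implied by `threshold` (early rejection)."""
    if not query:
        raise ValueError("query must contain at least one sub-fingerprint")
    positions = range(0, len(query), step)
    total_bits = 32 * len(positions)
    budget = threshold * total_bits
    errors = 0
    for i in positions:
        errors += bin(query[i] ^ candidate[offset + i]).count("1")
        if errors >= budget:
            return None  # this alignment cannot pass the threshold
    return errors / total_bits


def refine(query, candidate, step=2, threshold=0.35):
    """Try every alignment of `query` inside `candidate` and return the
    (offset, ber) pair with the lowest sampled BER, or None if no
    alignment stays below `threshold`."""
    best = None
    for offset in range(len(candidate) - len(query) + 1):
        ber = sampled_ber(query, candidate, offset, step, threshold)
        if ber is not None and (best is None or ber < best[1]):
            best = (offset, ber)
    return best


# Hypothetical toy data: the query is candidate[1:5] with one bit flipped.
candidate = [0xFFFFFFFF, 0x00000000, 0xAAAAAAAA,
             0x0000FFFF, 0x55555555, 0xFFFFFFFF]
query = [0x00000001, 0xAAAAAAAA, 0x0000FFFF, 0x55555555]
print(refine(query, candidate))  # (1, 0.015625)
```

With `step=2` only half of the sub-fingerprints are compared per alignment, and mismatching alignments are usually dropped after the first sample, which is where the reduction in matching computation would come from in a scheme of this kind.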