    Citation: Zhang Xingzhong, Wang Yunsheng, Zeng Zhi, Niu Baoning. An Efficient Filtering-and-Refining Retrieval Method for Big Audio Data[J]. Journal of Computer Research and Development, 2015, 52(9): 2025-2032. DOI: 10.7544/issn1000-1239.2015.20140694

    An Efficient Filtering-and-Refining Retrieval Method for Big Audio Data

    • Abstract: Fast audio retrieval is challenging because audio data is high-dimensional and the volume of audio on the Internet keeps growing. Audio fingerprinting greatly reduces the dimensionality while keeping audio identifiable, but the fingerprints are still too high-dimensional to scale to big audio data, so the number of candidates that must be checked has to be kept small. This paper proposes a robust and fast retrieval method for big audio data that combines audio fingerprinting with the filtering-and-refining approach. Building on the classical Philips audio fingerprint, the bag-of-features (BoF) technique is applied to derive a low-dimensional audio middle fingerprint that quickly filters the database down to the most likely candidates; this narrows the search scope and gives a speedup of about 130 times over Fibonacci Hashing retrieval. A threshold-based matching algorithm that compares samples of two audio fingerprints at fixed intervals is also developed; it greatly reduces the matching computation and raises retrieval speed further, by up to 140 times. Experimental results show that when clips of different lengths are retrieved in batches from about 100,000 audio tracks, the average retrieval time is less than 1 s. After MP3 conversion, resampling, and random cropping, the recall rates are all above 99.47%, and the theoretical accuracy is close to 100%.
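The filtering-and-refining idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the codebook, the sampling interval, or the threshold, so everything concrete here is an assumption. Fingerprints are modelled as lists of 32-bit sub-fingerprints in the style of the Philips scheme, the "codeword" is simply the top 16 bits of each sub-fingerprint, candidates are ranked by histogram intersection, and `step=8` and `threshold=0.35` are illustrative defaults. All function names are hypothetical.

```python
import random
from collections import Counter

BITS = 32  # assumed sub-fingerprint width (Philips-style fingerprint)


def bof_histogram(fp, codeword_bits=16):
    """Coarse bag-of-features: count the top bits of each sub-fingerprint.

    Assumption: a real system would learn a codebook; truncation is a stand-in.
    """
    shift = BITS - codeword_bits
    return Counter(x >> shift for x in fp)


def filter_candidates(query, histograms, keep=5):
    """Filter step: rank tracks by histogram intersection, keep the best few."""
    q = bof_histogram(query)
    scores = {
        name: sum(min(n, h[c]) for c, n in q.items()) / len(query)
        for name, h in histograms.items()
    }
    return sorted(scores, key=scores.get, reverse=True)[:keep]


def sampled_ber(query, track, offset, step):
    """Bit error rate computed over every `step`-th sub-fingerprint only."""
    idx = range(0, len(query), step)
    errors = sum(bin(query[i] ^ track[offset + i]).count("1") for i in idx)
    return errors / (len(idx) * BITS)


def refine(query, track, step=8, threshold=0.35):
    """Refine step: best alignment whose sampled BER is under the threshold."""
    best = None
    for offset in range(len(track) - len(query) + 1):
        ber = sampled_ber(query, track, offset, step)
        if ber <= threshold and (best is None or ber < best[1]):
            best = (offset, ber)
    return best  # (offset, ber), or None if no alignment passes


def search(query, database, histograms, keep=5):
    """Filter first, then run the costly alignment only on the survivors."""
    for name in filter_candidates(query, histograms, keep):
        hit = refine(query, database[name])
        if hit is not None:
            return name, hit[0], hit[1]
    return None
```

In this sketch the filter touches only one small histogram per track, and the refine step reads one in every `step` sub-fingerprints at each alignment, which is where the two speedups described in the abstract would come from. The actual gains reported in the paper depend on the authors' codebook and parameters, which this sketch does not reproduce.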

       
