• 中国精品科技期刊
  • CCF推荐A类中文期刊
  • 计算领域高质量科技期刊T1类
高级检索

基于维度分组降维的高维数据近似k近邻查询

李松, 胡晏铭, 郝晓红, 张丽平, 郝忠孝

李松, 胡晏铭, 郝晓红, 张丽平, 郝忠孝. 基于维度分组降维的高维数据近似k近邻查询[J]. 计算机研究与发展, 2021, 58(3): 609-623. DOI: 10.7544/issn1000-1239.2021.20200285
引用本文: 李松, 胡晏铭, 郝晓红, 张丽平, 郝忠孝. 基于维度分组降维的高维数据近似k近邻查询[J]. 计算机研究与发展, 2021, 58(3): 609-623. DOI: 10.7544/issn1000-1239.2021.20200285
Li Song, Hu Yanming, Hao Xiaohong, Zhang Liping, Hao Zhongxiao. Approximate k-Nearest Neighbor Query of High Dimensional Data Based on Dimension Grouping and Reducing[J]. Journal of Computer Research and Development, 2021, 58(3): 609-623. DOI: 10.7544/issn1000-1239.2021.20200285
Citation: Li Song, Hu Yanming, Hao Xiaohong, Zhang Liping, Hao Zhongxiao. Approximate k-Nearest Neighbor Query of High Dimensional Data Based on Dimension Grouping and Reducing[J]. Journal of Computer Research and Development, 2021, 58(3): 609-623. DOI: 10.7544/issn1000-1239.2021.20200285
李松, 胡晏铭, 郝晓红, 张丽平, 郝忠孝. 基于维度分组降维的高维数据近似k近邻查询[J]. 计算机研究与发展, 2021, 58(3): 609-623. CSTR: 32373.14.issn1000-1239.2021.20200285
引用本文: 李松, 胡晏铭, 郝晓红, 张丽平, 郝忠孝. 基于维度分组降维的高维数据近似k近邻查询[J]. 计算机研究与发展, 2021, 58(3): 609-623. CSTR: 32373.14.issn1000-1239.2021.20200285
Li Song, Hu Yanming, Hao Xiaohong, Zhang Liping, Hao Zhongxiao. Approximate k-Nearest Neighbor Query of High Dimensional Data Based on Dimension Grouping and Reducing[J]. Journal of Computer Research and Development, 2021, 58(3): 609-623. CSTR: 32373.14.issn1000-1239.2021.20200285
Citation: Li Song, Hu Yanming, Hao Xiaohong, Zhang Liping, Hao Zhongxiao. Approximate k-Nearest Neighbor Query of High Dimensional Data Based on Dimension Grouping and Reducing[J]. Journal of Computer Research and Development, 2021, 58(3): 609-623. CSTR: 32373.14.issn1000-1239.2021.20200285

基于维度分组降维的高维数据近似k近邻查询

基金项目: 国家自然科学基金项目(61872105);黑龙江省自然科学基金项目(LH2020F047);黑龙江省留学归国人员科学基金项目(LC2018030);黑龙江省教育厅科学技术研究项目(12531z004)
详细信息
  • 中图分类号: TP391

Approximate k-Nearest Neighbor Query of High Dimensional Data Based on Dimension Grouping and Reducing

Funds: This work was supported by the National Natural Science Foundation of China (61872105), the Natural Science Foundation of Heilongjiang Province of China (LH2020F047), the Scientific Research Foundation for Returned Scholars Abroad of Heilongjiang Province of China (LC2018030), and the Technology Research Project of Heilongjiang Provincial Education Department (12531z004).
  • 摘要: 针对现有的高维空间近似k近邻查询算法在数据降维时不考虑维度间关联关系的问题, 首次提出了基于维度间关联规则进行维度分组降维的方法.该方法通过将相关联维度分成一组进行降维来减少数据信息的损失, 同时针对Hash降维后产生的数据偏移问题, 设置了符号位并基于符号位的特性对结果进行精炼; 为提高维度间关联规则挖掘的效率, 提出了一种新的基于UFP-tree的频繁项集挖掘算法.通过将数据映射成二进制编码来进行查询, 有效地提高了近似k近邻查询效率, 同时基于信息熵筛选编码函数, 提高了编码质量; 在查询结果精炼的过程, 基于信息熵对候选集数据的编码位进行权重的动态设定, 通过比较动态加权汉明距离和符号位碰撞次数返回最终近似k近邻结果.理论和实验研究表明, 所提方法能够较好地处理高维空间中近似k近邻查询问题.
    Abstract: Aiming at the problem that the existing high-dimensional space AkNN query algorithm does not consider the association relationship between dimensions when reducing the data dimensionality, we propose the method which groups the dimensions based on association rules between dimensions to reduce the data dimensionality first. The algorithm reduces the loss of data information by dividing the related dimensions into a group for dimensionality reduction. At the same time, in order to solve the problem of data offset caused by Hash reduction, the sign bits are set and the query result is refined based on the characteristics of the sign bits. To improve the efficiency of mining association rules between dimensions, a new frequent itemset mining algorithm based on the UFP-tree is proposed in this paper. In order to improve the efficiency of the AkNN query, we map the data into binary codes and query based on the codes. And the coding functions are filtered by information entropy to improve the coding quality. In the process of refining the query results, weights are dynamically set based on the information entropy of the encoded bits of the candidate set data, and the final AkNN results are returned by comparing the dynamic weighted Hamming distance and the number of sign bit collisions. Theoretical and experimental studies show that the proposed method can effectively deal with AkNN query problems in high-dimensional spaces.
  • 期刊类型引用(10)

    1. 汤书苑,周一青,李锦涛,刘畅,石晶林. 基于特征校准的双注意力遮挡行人检测器. 西安电子科技大学学报. 2024(06): 25-39 . 百度学术
    2. 曹玉东,陈冬昊,曹睿,赵朗. 融合Mask R-CNN的在线多目标行人跟踪方法. 计算机工程与科学. 2023(07): 1216-1225 . 百度学术
    3. 曹健,陈怡梅,李海生,蔡强. 基于深度学习的道路小目标检测综述. 计算机工程. 2023(10): 1-12 . 百度学术
    4. 徐放,苗夺谦,张红云. 基于多粒度的Transformer目标检测算法. 计算机科学. 2023(11): 143-150 . 百度学术
    5. 陈立,张帆,郭威,黄赟. 面向遥感图像的多阶段特征融合目标检测方法. 电子学报. 2023(12): 3520-3528 . 百度学术
    6. 宋思雨,苗夺谦. 基于多粒度空间混乱的细粒度图像分类算法. 智能系统学报. 2022(01): 144-150 . 百度学术
    7. 潘云磊. 基于改进蚁群算法的行人运动特征跟踪提取方法. 河北北方学院学报(自然科学版). 2022(07): 1-6 . 百度学术
    8. 欧群雍,谭同德,袁红斌. 结合CNN和Bi-LSTM的多行人目标检测跟踪方法. 无线电工程. 2022(09): 1633-1641 . 百度学术
    9. 袁小鑫,胡军,黄永洪. 针对目标检测器的假阳性对抗样本. 计算机研究与发展. 2022(11): 2534-2548 . 本站查看
    10. 史进玲,张江维,程菊明. 多标记的不完备多粒度决策系统的最优粒度选择. 许昌学院学报. 2021(02): 103-108 . 百度学术

    其他类型引用(15)

计量
  • 文章访问数: 
  • HTML全文浏览量:  0
  • PDF下载量: 
  • 被引次数: 25
出版历程
  • 发布日期:  2021-02-28

目录

    /

    返回文章
    返回