ISSN 1000-1239 CN 11-1777/TP

Journal of Computer Research and Development (计算机研究与发展), 2018, Vol. 55, Issue (11): 2343-2360. doi: 10.7544/issn1000-1239.2018.20170629

• Artificial Intelligence •

基因表达数据中的局部模式挖掘研究综述

姜涛1,李战怀2   

  1. School of Computer and Information Engineering, Henan University of Economics and Law, Zhengzhou 450046; 2. School of Computer Science, Northwestern Polytechnical University, Xi’an 710129
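  • Published: 2018-11-01
  • Funding: 
    National Natural Science Foundation of China (61702161, 61732014, 61502146, 91746115, 61602153); Key Research, Development and Promotion Special Project of Henan Province (Science and Technology Research) (182102210213); Key Scientific Research Project of Higher Education Institutions of Henan Province (18A520003, 18A520015, 18B510004)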

A Survey on Local Pattern Mining in Gene Expression Data

Jiang Tao1, Li Zhanhuai2   

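  1. School of Computer and Information Engineering, Henan University of Economics and Law, Zhengzhou 450046; 2. School of Computer Science, Northwestern Polytechnical University, Xi’an 710129 (jiangtaoxxx@126.com)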
  • Online: 2018-11-01

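Abstract: The DNA microarray is an important breakthrough in experimental molecular biology. It lets researchers monitor changes in the expression levels of many genes under many experimental conditions simultaneously, and thereby provides technical support for discovering gene co-expression networks, developing drugs, preventing disease, and so on. Researchers have proposed a large number of clustering algorithms to analyze gene expression data, but standard clustering algorithms (one-way clustering) can discover only a small amount of knowledge, because genes are unlikely to be co-expressed under all experimental conditions or to show the same expression level, yet they may participate in multiple genetic pathways. Biclustering methods emerged in response. They shift the analysis of gene expression data from global patterns to local patterns, changing the practice of clustering data only on the basis of all objects or all attributes. This survey reviews the field in terms of the definition of local patterns, local pattern types and criteria, and local pattern mining and querying. It introduces the current state and progress of research on local pattern mining in gene expression data, summarizes in detail the quantitative and qualitative criteria for local pattern mining and the related mining systems, analyzes the existing problems, and discusses future research directions in depth.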

Keywords: DNA microarray, gene expression, local pattern, order-preserving submatrix, biclustering, data mining

Abstract: As a major breakthrough in experimental molecular biology, the DNA microarray enables simultaneous monitoring of the expression levels of thousands of genes across many experimental conditions. Analyzing microarray data supports the discovery of gene co-expression networks, the design of new drugs, disease prevention, and related tasks. Researchers have designed many clustering methods for gene expression datasets, but standard (one-way) clustering methods uncover only a limited amount of useful knowledge. A subset of genes is co-regulated and co-expressed only under a subset of experimental conditions, and not necessarily at the same expression level, while a gene may belong to several genetic pathways that are not apparent. Biclustering methods were proposed to handle this situation. They shift gene expression analysis from global pattern mining to local pattern discovery, so that clustering no longer has to rely on all objects or all attributes of the data. This paper surveys the state of the art, covering the definition of local patterns, their types and criteria, and methods for mining and querying them. It summarizes the quantitative and qualitative mining criteria and the related software, identifies problems in existing algorithms and tools, and discusses future research directions.

Key words: DNA microarray, gene expression, local pattern, order-preserving submatrix, biclustering, data mining

CLC number: