ISSN 1000-1239 CN 11-1777/TP

Journal of Computer Research and Development, 2018, Vol. 55, Issue (8): 1683-1693. doi: 10.7544/issn1000-1239.2018.20180365

Special Issue: 2018 Special Topic on Frontier Advances in Data Mining


U-Statistics and Ensemble Learning Based Method for Gene-Gene Interaction Detection

Guo Yingjie1, Liu Xiaoyan1, Wu Chenxi2, Guo Maozu1,3, Li Ao1

  1 (School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001); 2 (Department of Mathematics, Rutgers University, Piscataway, NJ, USA 08854); 3 (Beijing Key Laboratory of Intelligent Processing for Building Big Data (Beijing University of Civil Engineering and Architecture), Beijing 100044)
  • Online: 2018-08-01

Abstract: In genome-wide association studies (GWAS) of qualitative traits, most existing methods make strong assumptions about the form of interaction between genes, which limits their power. Many methods for detecting gene-gene interaction have been developed recently; among them, gene-based methods have grown in popularity because they offer advantages in both statistical power and biological interpretability. In this paper, we propose GBUtrees, a hypothesis testing framework for gene-based gene-gene interaction detection built on U statistics and tree-based ensemble learners. For each base learner, we construct a statistic that measures the deviation from additive structure in its prediction of the log odds ratio of a qualitative trait; we then average this statistic over learners trained on different subsamples, which gives it the form of a U statistic. GBUtrees thus benefits from both the non-linear modeling power of tree-based ensemble models and the asymptotic normality of U statistics. The method makes no assumption about the form of interaction, which strengthens its power to detect different kinds of interactions. Based on simulation studies of eight disease models and on real data from the RA pathway in the WTCCC dataset, we conclude that it is effective in detecting different kinds of interactions and can be useful for genome-wide association studies.
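
The idea of averaging a per-learner "deviation from additivity" over subsamples can be illustrated with a minimal sketch. It is not the GBUtrees implementation: the base learner here is a hypothetical stand-in (smoothed empirical log odds per genotype cell, not a tree ensemble), each gene is reduced to a single genotype coded 0/1/2, and the function names, the squared double-centered contrast, and the random-subsample (incomplete U statistic) average are all assumptions made for illustration.

```python
import math
import random
from itertools import product

LEVELS = (0, 1, 2)  # assumed genotype coding: copies of the minor allele


def fit_cell_log_odds(sample, alpha=0.5):
    """Stand-in base learner (NOT a tree ensemble): smoothed empirical
    log odds of being a case in each (g1, g2) genotype cell."""
    cases = {cell: alpha for cell in product(LEVELS, LEVELS)}
    controls = {cell: alpha for cell in product(LEVELS, LEVELS)}
    for g1, g2, y in sample:
        if y == 1:
            cases[(g1, g2)] += 1
        else:
            controls[(g1, g2)] += 1
    return {cell: math.log(cases[cell] / controls[cell]) for cell in cases}


def interaction_contrast(log_odds):
    """Mean squared double-centered residual of a log-odds table.
    It is zero exactly when log_odds[(a, b)] = f1(a) + f2(b)."""
    n = len(LEVELS)
    grand = sum(log_odds.values()) / (n * n)
    row = {a: sum(log_odds[(a, b)] for b in LEVELS) / n for a in LEVELS}
    col = {b: sum(log_odds[(a, b)] for a in LEVELS) / n for b in LEVELS}
    return sum(
        (log_odds[(a, b)] - row[a] - col[b] + grand) ** 2
        for a, b in product(LEVELS, LEVELS)
    ) / (n * n)


def subsample_averaged_statistic(data, subsample_size, n_subsamples, seed=0):
    """Average the contrast over learners fit on random subsamples drawn
    without replacement (an incomplete U statistic with this kernel)."""
    rng = random.Random(seed)
    values = [
        interaction_contrast(fit_cell_log_odds(rng.sample(data, subsample_size)))
        for _ in range(n_subsamples)
    ]
    return sum(values) / len(values)


def simulate(n, interaction, seed=0):
    """Toy data: log odds additive in g1 and g2 plus interaction * g1 * g2."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        g1, g2 = rng.choice(LEVELS), rng.choice(LEVELS)
        logit = -0.5 + 0.4 * g1 + 0.4 * g2 + interaction * g1 * g2
        y = 1 if rng.random() < 1.0 / (1.0 + math.exp(-logit)) else 0
        data.append((g1, g2, y))
    return data


if __name__ == "__main__":
    for strength in (0.0, 1.0):
        data = simulate(2000, strength, seed=1)
        stat = subsample_averaged_statistic(data, 500, 50, seed=2)
        print(f"interaction={strength}: statistic={stat:.4f}")
```

The sketch only computes the averaged statistic. Turning it into a hypothesis test would additionally need a variance estimate that justifies the normal approximation, which is not attempted here.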

Key words: U statistics, ensemble learning, gene-gene interaction, single nucleotide polymorphism, genome-wide association studies
