Abstract:
In qualitative genome-wide association studies (GWAS), most existing methods make strong assumptions about the form of interaction between genes, which limits their power. Many methods for detecting gene-gene interaction have recently been developed; among them, gene-based methods have grown in popularity because they offer advantages in both statistical power and biological interpretability. In this paper, we propose GBUtrees, a hypothesis testing framework for gene-based gene-gene interaction detection built on U-statistics and tree-based ensemble learners. For each base learner, we construct a statistic that measures the deviation from additive structure in the predicted log odds ratio of a qualitative trait; we then average this statistic over learners trained on different subsamples, which gives it the form of a U-statistic. GBUtrees thus benefits from both the non-linear modeling power of tree-based ensemble models and the asymptotic normality of U-statistics. Our method makes no assumption about the form of interaction, which strengthens its power to detect different kinds of interactions. Based on simulation studies of eight disease models and real data from the RA pathway in the WTCCC dataset, we conclude that GBUtrees is effective in detecting different kinds of interactions and can be useful for genome-wide association studies.
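
The averaging construction described above can be illustrated with a minimal sketch. It is not the GBUtrees statistic itself, and it rests on several assumptions. The deviation from additivity is measured by a second-order difference of the predicted log odds over a 2x2 grid of genotype values, which is zero for any additive predictor. The base learner is a hypothetical stand-in (a smoothed per-cell log-odds table for two SNPs coded 0/1/2) rather than a tree. The average is taken over randomly drawn subsamples, so it behaves like an incomplete U-statistic. All function names, the choice of genotype levels 0 and 2, and the default parameters are illustrative.

```python
import numpy as np


def interaction_contrast(f, a, a2, b, b2):
    """Second-order difference of f over a 2x2 grid of inputs.

    Equals zero whenever f(u, v) = g(u) + h(v), i.e. when f is additive.
    """
    return f(a, b) - f(a2, b) - f(a, b2) + f(a2, b2)


def fit_cell_log_odds(x1, x2, y, levels=3, smooth=0.5):
    """Hypothetical stand-in base learner (not a tree).

    Returns a predictor of the smoothed log odds of y == 1 for each
    genotype cell (x1, x2), with x1 and x2 coded as integers 0..levels-1.
    """
    cases = np.full((levels, levels), smooth)
    controls = np.full((levels, levels), smooth)
    # Count cases and controls per genotype cell; the smoothing constant
    # keeps every cell strictly positive.
    np.add.at(cases, (x1[y == 1], x2[y == 1]), 1.0)
    np.add.at(controls, (x1[y == 0], x2[y == 0]), 1.0)
    table = np.log(cases / controls)
    return lambda u, v: table[u, v]


def subsample_averaged_contrast(x1, x2, y, n_subsamples=200,
                                subsample_size=200, seed=0):
    """Average the contrast over learners fit on random subsamples.

    Subsamples are drawn without replacement, so the average has the
    form of an incomplete U-statistic (assumption of this sketch).
    """
    rng = np.random.default_rng(seed)
    n = len(y)
    values = np.empty(n_subsamples)
    for i in range(n_subsamples):
        idx = rng.choice(n, size=subsample_size, replace=False)
        f = fit_cell_log_odds(x1[idx], x2[idx], y[idx])
        # Contrast between the two homozygous genotypes (0 and 2) at each
        # SNP; this choice of levels is illustrative only.
        values[i] = interaction_contrast(f, 0, 2, 0, 2)
    return values.mean()
```

In this sketch, a value far from zero relative to its sampling variability would suggest a departure from additivity on the log-odds scale; a formal test would additionally need a variance estimate, which is omitted here.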