    U-Statistics and Ensemble Learning Based Method for Gene-Gene Interaction Detection

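    • Abstract: In genome-wide association studies (GWAS), most methods make strong assumptions about the form of the interaction between a disease and single-nucleotide polymorphism (SNP) loci, which reduces their detection power. In recent years, gene-gene interaction detection methods that take the gene as the unit of analysis have attracted attention for their advantages in statistical power and biological interpretability. To address the limitations of existing methods in the types of interaction they can detect, we propose GBUtrees, a hypothesis testing method based on U-statistics and ensemble learners. It constructs a statistic that characterizes how far the relationship between a disease trait and two genes deviates from an additive model, and uses it to detect gene-gene interactions at the gene level. The average of this statistic over different subsamples fits U-statistic theory, so the asymptotic normality of U-statistics can be used to obtain the distribution of the constructed statistic. GBUtrees makes no assumption about the form of the interaction, which strengthens its ability to detect different forms of interaction. Results on simulated and real data show that the method can effectively detect different types of interaction and can be applied to genome-wide association studies.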

       

      Abstract: In genome-wide association studies (GWAS) of qualitative traits, most existing methods make strong assumptions about the form of interaction between genes, which limits their power. Recently, many methods for detecting gene-gene interactions have been developed; among them, gene-based methods have grown in popularity because they offer advantages in both statistical power and biological interpretability. In this paper, we propose GBUtrees, a hypothesis testing framework for gene-based gene-gene interaction detection built on U-statistics and tree-based ensemble learners. For each base learner, we construct a statistic that measures the deviation from additive structure in the predicted log odds ratio of a qualitative trait, then average it over learners trained on different subsamples so that it takes the form of a U-statistic. GBUtrees benefits from both the non-linear modeling power of tree-based ensemble models and the asymptotic normality of U-statistics. The method makes no assumption about the form of interaction, which strengthens its power to detect different kinds of interactions. Based on simulation studies of eight disease models and on real data from the RA pathway in the WTCCC dataset, we conclude that it is effective in detecting different kinds of interactions and can be useful for genome-wide association studies.
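
      The subsample-averaging idea in the abstract can be illustrated with a minimal sketch. This is not the GBUtrees implementation: the tree-based base learner is swapped for a smoothed 2x2 table of cell frequencies, each "gene" is reduced to a single binary variable, and every name and parameter below (`interaction_kernel`, `subsample_average`, `simulate`, `k`, `n_subsamples`, the simulated effect sizes) is an assumption made for illustration. The kernel measures the deviation from additivity on the log-odds scale for one subsample; averaging it over random subsamples of fixed size gives an incomplete U-statistic.

```python
import numpy as np


def interaction_kernel(g1, g2, y):
    """Deviation from additivity on the log-odds scale for one subsample.

    Assumption: a smoothed 2x2 frequency table stands in for the
    tree-based base learner described in the abstract.
    """
    logit = np.empty((2, 2))
    for a in (0, 1):
        for b in (0, 1):
            cell = (g1 == a) & (g2 == b)
            # Add-one-half smoothing keeps the log odds finite in sparse cells.
            cases = y[cell].sum() + 0.5
            controls = cell.sum() - y[cell].sum() + 0.5
            logit[a, b] = np.log(cases / controls)
    # Interaction contrast: zero when the log odds are additive in g1 and g2.
    return logit[1, 1] - logit[1, 0] - logit[0, 1] + logit[0, 0]


def subsample_average(g1, g2, y, k=200, n_subsamples=500, seed=0):
    """Average the kernel over random size-k subsamples (incomplete U-statistic)."""
    rng = np.random.default_rng(seed)
    n = len(y)
    values = np.empty(n_subsamples)
    for i in range(n_subsamples):
        idx = rng.choice(n, size=k, replace=False)  # subsample without replacement
        values[i] = interaction_kernel(g1[idx], g2[idx], y[idx])
    return values.mean()


def simulate(n, beta_int, seed):
    """Hypothetical logistic disease model with an optional interaction term."""
    rng = np.random.default_rng(seed)
    g1 = rng.integers(0, 2, size=n)
    g2 = rng.integers(0, 2, size=n)
    log_odds = -0.5 + 0.5 * g1 + 0.5 * g2 + beta_int * g1 * g2
    y = (rng.random(n) < 1.0 / (1.0 + np.exp(-log_odds))).astype(int)
    return g1, g2, y


additive = subsample_average(*simulate(4000, 0.0, seed=1))
interacting = subsample_average(*simulate(4000, 2.0, seed=1))
print(f"additive model:    {additive:.3f}")
print(f"interaction model: {interacting:.3f}")
```

      Under the additive model the averaged statistic should stay close to zero, while under the interaction model it should move toward the simulated interaction effect. A full test would also need a variance estimate for the averaged statistic so that the asymptotic normality of U-statistics can be used to compute a p-value; that step is left out of this sketch.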

       
