    Research on Generating Incremental ETL Processes Automatically

    • Abstract: ETL processes load data from data sources into a data warehouse. They fall into two types: full ETL processes and incremental ETL processes. A full ETL process is easy to design but can only handle full data. An incremental ETL process loads only the data newly created in the data sources, but it is difficult to design by hand. This paper studies how to generate incremental ETL processes automatically: drawing on existing methods for the incremental maintenance of materialized views, it presents an approach that derives an incremental ETL process from an existing full ETL process, which reduces the cost of designing incremental ETL processes. Existing research focuses on the incremental maintenance of materialized views defined with selection, projection, join, and aggregation operators and does not cover the difference operator. Because difference operators are used frequently in ETL processes, and as a foundation for this work, the incremental maintenance of materialized views defined with difference operators is also discussed in detail.
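    To make the idea of incrementally maintaining a view defined with a difference operator concrete, the following is a minimal sketch and not the paper's method. It assumes set semantics (no duplicate rows), a view V = R - S, and that the inserted and deleted rows of both sources are available as sets. The names `apply_delta` and `maintain_difference` are hypothetical and chosen only for illustration.

```python
def apply_delta(base, inserted, deleted):
    """Return the new state of a relation after applying its delta."""
    return (base - deleted) | inserted


def maintain_difference(view, r_old, s_old, r_ins, r_del, s_ins, s_del):
    """Incrementally maintain view = R - S under set semantics (illustrative only).

    Only rows that appear in some delta are examined, instead of
    recomputing R - S from scratch.
    """
    r_new = apply_delta(r_old, r_ins, r_del)
    s_new = apply_delta(s_old, s_ins, s_del)

    # A row can enter the view only if it was inserted into R
    # or deleted from S; it must then be in R and absent from S.
    to_insert = {t for t in (r_ins | s_del) if t in r_new and t not in s_new}

    # A row can leave the view only if it was deleted from R
    # or inserted into S.
    to_delete = {
        t for t in (r_del | s_ins)
        if t in view and (t not in r_new or t in s_new)
    }

    return (view - to_delete) | to_insert, r_new, s_new


# Usage: compare the incremental result with a full recomputation.
r_old, s_old = {1, 2, 3}, {2}
view_old = r_old - s_old
view_new, r_new, s_new = maintain_difference(
    view_old, r_old, s_old,
    r_ins={4, 5}, r_del={1}, s_ins={3, 5}, s_del={2},
)
print(view_new == r_new - s_new)
```

    Note that, unlike selection or projection, the difference operator needs membership checks against the current contents of both sources, so the deltas alone are not sufficient in this sketch.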

       
