Abstract:
ETL processes are used to collect data from data sources into a data warehouse. They can be divided into two kinds: full ETL processes and incremental ETL processes. A full ETL process is easy to design, but it can only process the full data set. An incremental ETL process loads only the data newly created in the data sources, but it is difficult to design manually. Drawing on existing methods for the incremental maintenance of materialized views, this paper puts forward an approach that automatically generates an incremental ETL process from a full ETL process. Existing research focuses on the incremental maintenance of materialized views defined with the selection, projection, join, and aggregation operators, but not the difference operator. Since difference operators are used frequently in ETL processes, the incremental maintenance of materialized views defined with difference operators is also discussed in detail.
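
To illustrate why the difference operator needs separate treatment, the following minimal sketch refreshes a view V = R - S from insert-only deltas. It is not the algorithm of this paper: it assumes set semantics, assumes the new rows of R are not already in R, and uses hypothetical names (`full_difference`, `incremental_difference`). It relies on the standard set identity (R ∪ ΔR) - (S ∪ ΔS) = (V - ΔS) ∪ (ΔR - (S ∪ ΔS)).

```python
def full_difference(r, s):
    """Full recomputation of the view V = R - S (set semantics)."""
    return r - s


def incremental_difference(view, s_old, delta_r, delta_s):
    """Refresh V = R - S from insert-only deltas without rescanning R.

    Returns (new_view, inserted_rows, deleted_rows).
    """
    s_new = s_old | delta_s
    # Rows newly inserted into S invalidate matching rows already in the view.
    deleted = view & delta_s
    # Rows newly inserted into R enter the view only if absent from the updated S.
    inserted = delta_r - s_new
    return (view - deleted) | inserted, inserted, deleted


# Illustrative data (assumed, not from the paper).
r = {1, 2, 3}
s = {3}
view = full_difference(r, s)

delta_r = {4, 5}
delta_s = {2, 5}
new_view, inserted, deleted = incremental_difference(view, s, delta_r, delta_s)
```

Unlike selection, projection, or join over insert-only sources, an insertion into S here causes a deletion from the view, so an incremental process for a difference must propagate both inserted and deleted rows.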