Abstract:
As an important part of Web data integration, Web data fusion ensures the quality of integrated data and is a precondition for accurate analysis and mining. However, data fusion is usually treated as a black box, which makes the fusion process hard to interpret and debug. To describe the fusion process and the origin of each conflict resolution, we construct a provenance mechanism based on data provenance. Data provenance describes how data is generated and how it evolves over time; it shows not only which input tuples contribute to a result but also how they contribute. We study semiring provenance for data fusion. First, we propose an approximate iterative approach to optimize the computation of semiring provenance. Then, to speed up convergence, we present a Newton-like approach. Because recursion complicates the computation, we analyze the characteristics of semiring provenance and show that the Kleene sequence and the Newton-like sequence converge after only n steps. Experiments show that the techniques in this paper are effective and feasible.
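To make the Kleene sequence mentioned above concrete, the following is a minimal sketch of fixpoint iteration over a semiring. It is an illustration only, not the paper's algorithm: the tropical (min, +) semiring stands in for a provenance semiring, and the names `Semiring`, `TROPICAL`, `kleene`, `base`, and `edges` are hypothetical. The system solved is X_v = base[v] + sum of w * X_u over edges (v, u, w), where + and * are the semiring operations, and the iteration starts from the semiring zero.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Semiring:
    zero: object      # identity of plus
    plus: Callable    # semiring addition
    times: Callable   # semiring multiplication


# Assumption: the tropical semiring (min, +) is used purely as an example.
TROPICAL = Semiring(zero=float("inf"), plus=min, times=lambda a, b: a + b)


def kleene(semiring, edges, base):
    """Iterate k_0 = zero, k_{i+1} = f(k_i) until the sequence stabilizes.

    Returns the fixpoint and the number of steps needed to reach it.
    The loop is capped at n + 1 rounds, where n is the number of variables.
    """
    variables = list(base)
    current = {v: semiring.zero for v in variables}
    limit = len(variables) + 1
    for step in range(1, limit + 1):
        nxt = {}
        for v in variables:
            acc = base[v]
            for src, dst, weight in edges:
                if src == v:
                    acc = semiring.plus(acc, semiring.times(weight, current[dst]))
            nxt[v] = acc
        if nxt == current:
            # The previous round already produced the fixpoint.
            return current, step - 1
        current = nxt
    return current, limit


# Three variables; "c" is the only one with a non-zero base value.
base = {"a": float("inf"), "b": float("inf"), "c": 0}
edges = [("a", "b", 1), ("b", "c", 2), ("a", "c", 5)]
solution, steps = kleene(TROPICAL, edges, base)
print(solution, steps)
```

In this small example the sequence stabilizes within as many steps as there are variables. Whether such a bound holds in general depends on the semiring and on the form of the equations.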