Abstract:
With data science technology now applied extensively across many fields, data, as an important enterprise asset, has become increasingly valuable and important. Most enterprises develop data quality detection systems tailored to the characteristics of their industry to address their own data quality issues. The assessment models of these systems have different features, and their definitions of data quality dimensions also differ. This thesis attempts to define these models and data quality dimensions in a generic form, with the aim of providing a standard for enterprises developing data quality assessment systems. Drawing on an analysis of research results in this field by domestic and foreign scholars, and on years of experience developing data quality detection and assessment systems, the thesis first proposes a general mathematical model for data quality detection and assessment. Next, based on this model, it uses ontology technology to define transformation rules that map the general mathematical model to an ontology model. Furthermore, because most data is stored in relational databases, it takes the relational database as an example and, based on the proposed mathematical model and transformation rules, extracts and constructs a data quality assessment ontology. This model supports the definition of complex quality rules, is standardized, and can detect and assess data from different sources and in different formats. Finally, an application system is implemented for the PetroChina oil field development data quality assessment project to verify the correctness, scientific soundness, rationality, and extensibility of the proposed model. Because the proposed data quality detection and assessment model is independent of any particular domain, it is general-purpose.