
    General Data Quality Assessment Model and Ontological Implementation

    • Abstract: With the deepening application of data science across many fields, data, as an important enterprise asset, is increasingly showing its value and importance. Most enterprises have developed data quality detection systems tailored to the characteristics of their industries to address their own data quality problems. The assessment models of these systems each have their own features, and their definitions of data quality dimensions also differ. This work attempts to define these models and data quality dimensions in a general form, with the aim of providing a standard for enterprises developing data quality assessment systems. Drawing on an analysis of the results achieved by domestic and international researchers in this area and on years of experience in developing data quality detection and assessment systems, it first proposes a general mathematical model for data quality detection and assessment. Next, building on this model, it uses ontology technology to define transformation rules that map the general mathematical model to an ontology model. Then, since most data is stored in relational databases, it takes the relational data model as an example and, following the proposed mathematical model and transformation rules, extracts and constructs a data quality assessment ontology. The resulting model supports the definition of complex quality rules, is standardized, and can detect and assess the quality of data from different sources and in different formats. Finally, a system is implemented for PetroChina's oilfield development data quality assessment project to verify the correctness, scientific soundness, rationality, and extensibility of the proposed model. Because the proposed data quality detection and assessment model is domain-independent, it is general.

       
