Incremental knowledge-based system for schema mapping

Anam, S

doi:10.25959/23239799.v1

Anam_whole_thesis.pdf (2.01 MB)

thesis

posted on 2023-05-27, 11:08 authored by Anam, S

Schemas describe the data structures of various domains such as purchase orders, conferences, health and music. A large number of schemas are available on the Web. Since different schema elements may have the same semantics but exist in distinct schemas, it is important to manage their semantic heterogeneity. Schema matching is usually used to determine mappings between semantically corresponding elements of different schemas. It can be conducted manually, semi-automatically or automatically. Manual matching is a time-consuming, error-prone and expensive process. Fully automated matching is not possible because of the complexity of the schemas. This research investigated semi-automatic schema matching systems to reduce the manual work required for schema mapping. In general, these systems use machine learning and knowledge engineering approaches. Machine learning approaches require training datasets for building matching models. However, it is usually very difficult to obtain appropriate training datasets for large datasets and to change the trained models once the mapping has been done. Knowledge engineering approaches require domain experts and time-consuming knowledge acquisition. To address these problems, an incremental knowledge engineering approach, Ripple-Down Rules (RDR), is promising because it allows its knowledge to grow incrementally. However, acquiring matching rules is still a time-intensive task. To overcome the limitations of these independent approaches, a hybrid approach called Hybrid-RDR has been developed by combining a machine learning approach with the Censor Production Rules (CPR) based RDR approach. First, the most suitable machine learning algorithm, J48, was determined by comparing eleven machine learning approaches, including decision trees, rules, Naive Bayes and AdaBoostM1; it was then combined with CPR-based RDR to build the Hybrid-RDR approach. This approach constructs a matching model using J48.
When new data become available, the model may suggest incorrect matches for some cases, which are corrected by incrementally adding rules to the knowledge base. The approach reuses previous match operations (rules) and handles schema matching problems through an incremental knowledge acquisition process, so users do not need to add, delete or modify schema matching results manually. The Hybrid-RDR approach performs element-level matching, which considers only the names of schema elements. Structure-level matching, which considers the hierarchical structure of the schema, is required to adjust incorrect matches produced by element-level matching. A Knowledge-based Schema Matching System (KSMS) has also been developed that performs element-level matching using Hybrid-RDR and structure-level matching using the Similarity Flooding algorithm. This algorithm is based on the idea that two nodes are similar when their neighbor elements are similar. The final mappings are generated by combining the results of element-level matching and structure-level matching using aggregation functions. To evaluate the performance of the system, experiments using real-world schemas found on the Web were conducted. Experimental results show that the system achieves good performance in both element-level matching and structure-level matching. This research has resolved the ongoing problem of elements having different names within different schemas. The KSMS allows different schemas to be matched to produce accurate mappings.
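
The combination step described above, in which an element-level score and a structure-level score are merged by an aggregation function, can be illustrated with a minimal sketch. It is not the thesis implementation: the element-level score here is a plain string similarity rather than Hybrid-RDR, the structure-level score is a one-step comparison of neighbor names rather than the full Similarity Flooding algorithm, and the function names, the weighted-average aggregation, the weight and the example element names are all assumptions made for illustration.

```python
from difflib import SequenceMatcher


def name_similarity(a: str, b: str) -> float:
    """Element-level score in [0, 1]: case-insensitive string similarity of two names.

    Stand-in for the Hybrid-RDR matcher, which is not reproduced here.
    """
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def neighbor_similarity(neighbors_a: list[str], neighbors_b: list[str]) -> float:
    """Structure-level score in [0, 1]: average best name match among neighbors.

    A one-step simplification of the idea that two nodes are similar when
    their neighbors are similar; it does not iterate as Similarity Flooding does.
    """
    if not neighbors_a or not neighbors_b:
        return 0.0
    best = [max(name_similarity(x, y) for y in neighbors_b) for x in neighbors_a]
    return sum(best) / len(best)


def aggregate(element_score: float, structure_score: float, weight: float = 0.5) -> float:
    """Weighted average of the two scores (one possible aggregation function)."""
    return weight * element_score + (1.0 - weight) * structure_score


if __name__ == "__main__":
    # Hypothetical elements from two purchase-order schemas.
    element = name_similarity("shipTo", "deliverTo")
    structure = neighbor_similarity(["street", "city", "zip"], ["Street", "City", "postcode"])
    print(f"element={element:.2f} structure={structure:.2f} "
          f"combined={aggregate(element, structure):.2f}")
```

In this sketch, two elements whose names differ can still receive a high combined score when their neighbors match closely, which is the role the abstract assigns to structure-level matching.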

History

Publication status

Unpublished

Rights statement

Repository Status

Restricted

Keywords

Schema matching; schema mapping; string similarity metrics; text processing techniques; machine learning approaches; Ripple Down Rules; Hybrid-RDR approach

Licence

In Copyright
