Incremental knowledge-based system for schema mapping
Thesis posted on 2023-05-27, 11:08, authored by Anam, S
Schemas describe the data structures of various domains such as purchase orders, conferences, health and music. A large number of schemas are available on the Web. Since different schema elements may have the same semantics but exist in distinct schemas, it is important to manage their semantic heterogeneity. Schema matching is usually used to determine mappings between semantically corresponding elements of different schemas. It can be conducted manually, semi-automatically or automatically. Manual matching is a time-consuming, error-prone and expensive process. Fully automated matching is not possible because of the complexity of the schemas. This research investigated semi-automatic schema matching systems to reduce the manual work involved in schema mapping. In general, these systems use machine learning and knowledge engineering approaches. Machine learning approaches require training datasets for building matching models. However, it is usually very difficult to obtain appropriate training data for large datasets and to change the trained models once mapping has been done. Knowledge engineering approaches require domain experts and time-consuming knowledge acquisition. To solve these problems, an incremental knowledge engineering approach, Ripple-Down Rules (RDR), is promising because it allows its knowledge to grow incrementally. However, acquiring matching rules is still a time-intensive task. To overcome the limitations of these independent approaches, a hybrid approach called Hybrid-RDR has been developed by combining a machine learning approach with the Censor Production Rules (CPR) based RDR approach. First, the most suitable machine learning algorithm, J48, was identified by comparing eleven machine learning algorithms, including decision trees, rule learners, Naive Bayes and AdaBoostM1. It was then combined with CPR-based RDR to build the Hybrid-RDR approach. This approach constructs a matching model using J48.
When new data are available, the model may suggest incorrect matches for some cases; these are corrected by incrementally adding rules to the knowledge base. The approach reuses previous match operations (rules) and handles schema matching problems through an incremental knowledge acquisition process, so users do not need to add, delete or modify schema matching results manually. The Hybrid-RDR approach works for element-level matching, which considers only the names of schema elements. Structure-level matching, which considers the hierarchical structure of the schema, is required to adjust incorrect matches produced by element-level matching. A Knowledge-based Schema Matching System (KSMS) has also been developed that performs element-level matching using Hybrid-RDR and structure-level matching using the Similarity Flooding algorithm. This algorithm rests on the idea that two nodes are similar when their neighboring elements are similar. The final mappings are generated by combining the results of element-level and structure-level matching using aggregation functions. To evaluate the performance of the system, experiments were conducted using real-world schemas found on the Web. The experimental results show that the system achieves good performance at both element-level and structure-level matching. This research addresses the ongoing problem of elements having different names within different schemas. The KSMS allows different schemas to be matched to produce accurate mappings.
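The interplay of structure-level matching and aggregation described above can be illustrated with a minimal sketch. It is not the KSMS implementation: the string-similarity function stands in for the Hybrid-RDR element-level matcher, the propagation step is a simplified form of Similarity Flooding over parent-child edges only, and the weighted average is just one possible aggregation function. All function names, the `weight` parameter and the two toy purchase-order schemas are assumptions made for illustration.

```python
from difflib import SequenceMatcher
from itertools import product


def name_similarity(a, b):
    """Assumed stand-in for element-level matching: name similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def nodes(edges):
    """Collect the element names that appear in a list of (parent, child) edges."""
    return sorted({name for edge in edges for name in edge})


def flood(edges1, edges2, initial, iterations=20):
    """Simplified similarity flooding over parent-child edges.

    initial maps every (element1, element2) pair to a starting similarity.
    """
    # Two pairs are neighbours when their members are linked by an edge
    # in both schemas: (parent1, parent2) <-> (child1, child2).
    neighbours = {pair: [] for pair in initial}
    for (p1, c1), (p2, c2) in product(edges1, edges2):
        neighbours[(p1, p2)].append((c1, c2))
        neighbours[(c1, c2)].append((p1, p2))

    sigma = dict(initial)
    for _ in range(iterations):
        raw = {}
        for pair in sigma:
            # Each neighbour spreads its similarity evenly over its own neighbours.
            incoming = sum(sigma[n] / len(neighbours[n]) for n in neighbours[pair])
            raw[pair] = initial[pair] + incoming
        top = max(raw.values()) or 1.0  # guard against an all-zero map
        sigma = {pair: value / top for pair, value in raw.items()}
    return sigma


def aggregate(element, structure, weight=0.5):
    """Assumed aggregation function: weighted average of the two score maps."""
    return {
        pair: weight * element[pair] + (1 - weight) * structure[pair]
        for pair in element
    }


def match(edges1, edges2, weight=0.5):
    """Run element-level scoring, structural propagation and aggregation."""
    element = {
        (a, b): name_similarity(a, b)
        for a, b in product(nodes(edges1), nodes(edges2))
    }
    structure = flood(edges1, edges2, element)
    return aggregate(element, structure, weight)


# Hypothetical schemas, written as (parent, child) edges.
schema_a = [("PurchaseOrder", "OrderNo"), ("PurchaseOrder", "Customer")]
schema_b = [("PO", "OrderNum"), ("PO", "Client")]
scores = match(schema_a, schema_b)
```

In this sketch a pair such as ("Customer", "Client"), whose names differ, can still gain similarity from the pair formed by the two parents, which is the adjustment role the abstract assigns to structure-level matching.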
Rights statement: Copyright 2016 the author