Efficient provenance storage for relational queries
Version 2 2025-01-15, 01:14
Version 1 2023-05-23, 08:58
conference contribution
posted on 2025-01-15, 01:14, authored by Z Bao, H Kohler, L Wang, X Zhou, S Sadiq
Provenance information is vital in many application areas as it helps explain data lineage and derivation. However, storing fine-grained provenance information can be expensive. In this paper, we present a framework for storing provenance information relating to data derived via database queries. In particular, we first propose a provenance tree data structure which matches the query structure and thereby presents a possibility to avoid redundant storage of information regarding the derivation process. Then we investigate two approaches for reducing storage costs. The first approach utilizes two ingenious rules to achieve reduction on provenance trees. The second one is a dynamic programming solution, which provides a way of optimizing the selection of query tree nodes where provenance information should be stored. The optimization algorithm runs in polynomial time in the query size and is linear in the size of the provenance information, thus enabling provenance tracking and optimization without incurring large overheads. Experiments show that our approaches guarantee significantly lower storage costs than existing approaches.
History
Publication title
Proceedings of the 21st ACM International Conference on Information and Knowledge Management
Volume
39
Pagination
1352-1361
ISBN
978-1-4503-1156-4
Department/School
Information and Communication Technology
Publisher
Association for Computing Machinery
Publication status
Published
Place of publication
United States of America
Event title
21st ACM International Conference on Information and Knowledge Management
Event Venue
Maui, Hawaii
Date of Event (Start Date)
2012-10-29
Date of Event (End Date)
2012-11-02
Rights statement
Copyright 2012 ACM
Socio-economic Objectives
220499 Information systems, technologies and services not elsewhere classified