University of Tasmania

Efficient provenance storage for relational queries

Version 2 2025-01-15, 01:14
Version 1 2023-05-23, 08:58
conference contribution
posted on 2025-01-15, 01:14 authored by Z Bao, H Kohler, L Wang, X Zhou, S Sadiq
Provenance information is vital in many application areas as it helps explain data lineage and derivation. However, storing fine-grained provenance information can be expensive. In this paper, we present a framework for storing provenance information relating to data derived via database queries. In particular, we first propose a provenance tree data structure which matches the query structure and thereby presents a possibility to avoid redundant storage of information regarding the derivation process. Then we investigate two approaches for reducing storage costs. The first approach utilizes two ingenious rules to achieve reduction on provenance trees. The second one is a dynamic programming solution, which provides a way of optimizing the selection of query tree nodes where provenance information should be stored. The optimization algorithm runs in polynomial time in the query size and is linear in the size of the provenance information, thus enabling provenance tracking and optimization without incurring large overheads. Experiments show that our approaches guarantee significantly lower storage costs than existing approaches.
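The idea of choosing query tree nodes at which to store provenance can be illustrated with a small sketch. The code below is not the paper's algorithm; it is a toy dynamic program under an assumed cost model, in which each node of a query tree has a hypothetical `store_cost`, and a node's provenance is covered either by storing it at that node or by covering all of its children. The names `QueryNode`, `store_cost`, and `choose_storage_nodes`, and all cost values, are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class QueryNode:
    """One operator in a query tree (hypothetical toy model)."""
    name: str
    store_cost: int  # assumed cost of storing provenance at this node
    children: List["QueryNode"] = field(default_factory=list)


def choose_storage_nodes(node: QueryNode) -> Tuple[int, List[str]]:
    """Return (minimum total cost, names of the nodes chosen for storage).

    Bottom-up dynamic program: a leaf must store its own provenance;
    an internal node stores at itself only if that is no more expensive
    than the best solutions of all its children combined.
    """
    if not node.children:
        return node.store_cost, [node.name]

    child_cost = 0
    child_nodes: List[str] = []
    for child in node.children:
        cost, names = choose_storage_nodes(child)
        child_cost += cost
        child_nodes.extend(names)

    if node.store_cost <= child_cost:
        return node.store_cost, [node.name]
    return child_cost, child_nodes


# Illustrative plan: a join over a selection on R and a scan of S.
# All costs are made up for the example.
plan = QueryNode("join", 10, [
    QueryNode("select_R", 4, [QueryNode("scan_R", 7)]),
    QueryNode("scan_S", 5),
])

best_cost, stored_at = choose_storage_nodes(plan)
print(best_cost, stored_at)
```

Each node is visited once, so this toy version runs in time linear in the number of query tree nodes. The cost model and complexity of the actual method are described in the paper itself.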

History

Publication title

Proceedings of the 21st ACM International Conference on Information and Knowledge Management

Volume

39

Pagination

1352-1361

ISBN

978-1-4503-1156-4

Department/School

Information and Communication Technology

Publisher

Association for Computing Machinery

Publication status

  • Published

Place of publication

United States of America

Event title

21st ACM International Conference on Information and Knowledge Management

Event Venue

Maui, Hawaii

Date of Event (Start Date)

2012-10-29

Date of Event (End Date)

2012-11-02

Rights statement

Copyright 2012 ACM

Socio-economic Objectives

220499 Information systems, technologies and services not elsewhere classified
