RDR-based Open IE for the web document

Kim, MH; Compton, P; Kim, YS

conference contribution

posted on 2023-05-23, 09:19 authored by Kim, MH, Compton, P, Kim, YS

The Web contains a massive amount of information embedded in text, and obtaining information from Web text is a major research challenge. One research focus is Open Information Extraction (OIE), which aims to develop relation-independent information extraction. OIE systems seek to extract all potential relations from the text rather than a few predefined relations. Existing OIE systems such as TEXTRUNNER usually take a machine-learning-based approach, which requires large volumes of training data.

This paper presents a Ripple-Down Rules Open Information Extraction system based on processing example cases and manually adding rules when needed. The key advantages of this approach are that it can handle the freer writing style that occurs in Web documents and can correct errors introduced by natural language preprocessing tools, whereas systems like TEXTRUNNER depend on the quality of the entity-tagging preprocessing in the training data. We evaluated the Ripple-Down Rules approach against two OIE systems, TEXTRUNNER and StatSnowball. In these studies, the Ripple-Down Rules approach, with minimal low-cost rule addition, achieves much higher precision and somewhat improved recall compared to these other Open Information Extraction systems.
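The idea of processing example cases and adding a rule only when a case is handled wrongly can be illustrated with a minimal sketch of a single-classification Ripple-Down Rules tree. This is not the system described in the paper: the `Rule` class, the `evaluate` and `add_rule` helpers, and the case features (`tags`, `negated`) are all hypothetical names chosen for illustration, and the conclusions are simple labels rather than extracted relation tuples.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Rule:
    """One node of a single-classification RDR tree (illustrative only)."""
    condition: Callable[[dict], bool]
    conclusion: str
    cornerstone: Optional[dict] = None   # the case that prompted this rule
    if_true: Optional["Rule"] = None     # exception branch, tried when the rule fires
    if_false: Optional["Rule"] = None    # alternative branch, tried when it does not


def evaluate(root: Rule, case: dict):
    """Walk the tree; the conclusion comes from the last rule that fired."""
    node, last_fired, last_seen, fired = root, None, None, False
    while node is not None:
        last_seen = node
        fired = node.condition(case)
        if fired:
            last_fired = node
            node = node.if_true
        else:
            node = node.if_false
    conclusion = last_fired.conclusion if last_fired else None
    return conclusion, last_seen, fired


def add_rule(root: Rule, case: dict, condition, conclusion: str) -> Rule:
    """Attach a correcting rule at the end of the path this case followed."""
    _, last_seen, fired = evaluate(root, case)
    new_rule = Rule(condition, conclusion, cornerstone=case)
    if fired:
        last_seen.if_true = new_rule    # refine the rule that fired
    else:
        last_seen.if_false = new_rule   # add an alternative after a rule that did not fire
    return new_rule


# Default rule: with no other knowledge, conclude that there is no relation.
root = Rule(lambda c: True, "no_relation")

# Hypothetical case 1: a noun-verb-noun pattern that the default rule gets wrong.
case1 = {"tags": ("NNP", "VBD", "NNP")}
add_rule(root, case1, lambda c: c["tags"] == ("NNP", "VBD", "NNP"), "relation")

# Hypothetical case 2: same pattern but negated, so the new rule is now wrong.
case2 = {"tags": ("NNP", "VBD", "NNP"), "negated": True}
add_rule(root, case2, lambda c: c.get("negated", False), "no_relation")

# Hypothetical case 3: a pattern no added rule covers.
case3 = {"tags": ("DT", "NN")}

for case in (case1, case2, case3):
    print(case, "->", evaluate(root, case)[0])
```

Each rule is added only in the context where an error was observed, so earlier cases keep their conclusions. This locality is what makes incremental, case-by-case rule addition inexpensive in the Ripple-Down Rules style.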

History

Publication title

Proceedings of the 6th International Conference on Knowledge Capture 2011

Pagination

105-112

ISBN

978-1-4503-0396-5

Department/School

School of Information and Communication Technology

Publisher

Association for Computing Machinery

Place of publication

New York, USA

Event title

6th International Conference on Knowledge Capture 2011

Event Venue

Alberta, Canada

Date of Event (Start Date)

2011-06-26

Date of Event (End Date)

2011-06-29

Repository Status

Restricted

Socio-economic Objectives

Application software packages

Keywords

knowledge acquisition; expert systems; open information extraction; ripple-down rules

Licence

In Copyright
