Elimination of Redundant Information for Web Data Mining

Taib, SM; Yeom, SJ; Kang, BH


conference contribution

Posted on 2023-05-26, 08:09; authored by Taib, SM; Yeom, SJ; Kang, BH

These days, billions of Web pages are created with HTML or other markup languages. Compared with traditional text-based documents, they share few uniform structures and vary widely in authoring style. However, users usually focus on the particular section of a page that presents the information most relevant to their interest. Therefore, Web document classification needs to group and filter pages based on their contents and relevant information. Much research on Web mining reports on mining Web structure and extracting information from Web contents. However, this work has focused on detecting tables that convey specific data, not the tables that are used as a mechanism for structuring the layout of Web pages. Case modeling of tables can be constructed based on structure abstraction. Furthermore, Ripple Down Rules (RDR) are used to implement knowledge organization and construction, because RDR supports simple rule maintenance based on cases and local validation.
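The abstract does not spell out the rule base or the table features the authors used, so the following is only a minimal sketch of how a single-classification Ripple Down Rules tree could separate layout tables from data tables. The feature names (`rows`, `has_th`, `nested_tables`), the rule conditions, and the `Rule` and `classify` helpers are all hypothetical and chosen for illustration; they are not taken from the paper.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# A "case" is a structural abstraction of one HTML table.
# The feature names below are hypothetical, not the paper's.
Case = dict


@dataclass
class Rule:
    condition: Callable[[Case], bool]
    conclusion: str
    if_true: Optional["Rule"] = None   # exception: refines this rule when it fires
    if_false: Optional["Rule"] = None  # alternative: tried when this rule does not fire


def classify(root: Rule, case: Case) -> Optional[str]:
    """Walk the RDR tree; the last rule that fired gives the conclusion."""
    conclusion = None
    node = root
    while node is not None:
        if node.condition(case):
            conclusion = node.conclusion
            node = node.if_true
        else:
            node = node.if_false
    return conclusion


# Default rule: assume a table is used for page layout.
root = Rule(lambda c: True, "layout")
# Exception added for a case with header cells and several rows.
root.if_true = Rule(lambda c: c["has_th"] and c["rows"] >= 2, "data")
# Exception to the exception: tables that wrap other tables are layout.
root.if_true.if_true = Rule(lambda c: c["nested_tables"] > 0, "layout")

nav_bar = {"rows": 1, "has_th": False, "nested_tables": 0}
price_list = {"rows": 10, "has_th": True, "nested_tables": 0}
page_wrapper = {"rows": 3, "has_th": True, "nested_tables": 2}
```

Each correction is attached as an exception under the rule that produced the wrong conclusion, so it can only affect cases that reach that rule. This is one way to read the abstract's "local validation": adding the nested-table exception cannot change how `nav_bar` is classified.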

History

Volume

1

Pagination

200-205

Publisher

IEEE

Publication status

Published

Event title

International Conference on Information Technology

Event Venue

Las Vegas, USA

Date of Event (Start Date)

2005-04-04

Date of Event (End Date)

2005-04-06

Repository Status

Open
