University of Tasmania

Noise Elimination from the Web Documents by Using URL paths and Information Redundancy

conference contribution
posted on 2023-05-26, 07:21 authored by Byeong Kang, Kim, YS
Noise data in Web documents significantly affects the performance of Web information management systems. Many researchers have proposed noise elimination methods based on document structure. In this paper, we propose a different approach that eliminates redundant information from Web documents sharing the same URL path. We propose a redundant word/phrase filtering method for single or multiple tokenizations. We conducted two experiments to examine the efficiency and effectiveness of our filtering approaches. Experimental results show that our approach performs well on both criteria.
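
The general idea of filtering information that repeats across pages under one URL path can be illustrated with a minimal sketch. This is not the authors' implementation: the grouping key (host plus directory part of the path), whitespace tokenization, the `min_share` threshold, and the function names `url_path_key`, `ngrams`, and `filter_redundant` are all assumptions made for illustration. The parameter `n` selects single-word (`n=1`) or multi-word phrase filtering.

```python
from collections import Counter, defaultdict
from urllib.parse import urlparse
import posixpath


def url_path_key(url):
    """Assumed grouping key: host plus the directory part of the URL path."""
    parts = urlparse(url)
    return parts.netloc + posixpath.dirname(parts.path)


def ngrams(tokens, n):
    """All contiguous n-token phrases in a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def filter_redundant(docs, n=1, min_share=0.8):
    """docs maps URL -> text; returns URL -> list of kept tokens.

    Assumed rule: a phrase is redundant when it occurs in at least
    min_share of the documents that share a URL path. Every token
    covered by a redundant phrase is dropped.
    """
    groups = defaultdict(list)
    for url in docs:
        groups[url_path_key(url)].append(url)

    cleaned = {}
    for urls in groups.values():
        tokenized = {u: docs[u].split() for u in urls}
        # Document frequency: count each phrase once per document.
        df = Counter()
        for toks in tokenized.values():
            df.update(set(ngrams(toks, n)))
        # A group with a single document has nothing to compare against.
        redundant = set()
        if len(urls) > 1:
            redundant = {p for p, c in df.items() if c / len(urls) >= min_share}
        for u, toks in tokenized.items():
            drop = set()
            for i, phrase in enumerate(ngrams(toks, n)):
                if phrase in redundant:
                    drop.update(range(i, i + n))
            cleaned[u] = [t for i, t in enumerate(toks) if i not in drop]
    return cleaned
```

Calling `filter_redundant` on a dictionary of pages from the same directory removes tokens such as shared navigation labels while leaving page-specific text in place. A lower `min_share` filters more aggressively, at the risk of removing genuine content that happens to recur.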

History

Publication status

  • Published

Event title

The 2006 International Conference on Information & Knowledge Engineering

Event Venue

Las Vegas, US

Date of Event (Start Date)

2006-06-26

Date of Event (End Date)

2006-06-29

Repository Status

  • Open
