University Of Tasmania

Noise Elimination from the Web Documents by Using URL paths and Information Redundancy

conference contribution
posted on 2023-05-26, 07:21, authored by Byeong Kang, Kim, YS
Noise data in Web documents significantly affects the performance of Web information management systems. Many researchers have proposed noise elimination methods based on document structure. In this paper, we propose a different approach that eliminates redundant information across Web documents that share the same URL path. We propose a redundant word/phrase filtering method for single or multiple tokenizations. We conducted two experiments to examine the efficiency and effectiveness of our filtering approaches. The experimental results show that our approach achieves high performance on both criteria.
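
The general idea in the abstract, that text repeated across pages under the same URL path is likely to be noise such as navigation or footer text, can be illustrated with a minimal Python sketch. This is not the authors' algorithm: the grouping key (host plus directory of the URL path), the whitespace tokenizer, the `min_ratio` threshold, and the function names `url_path_key` and `filter_redundant_tokens` are all assumptions made for illustration. The sketch handles single-word tokens only, not phrases.

```python
from collections import Counter, defaultdict
from urllib.parse import urlparse
import posixpath


def url_path_key(url):
    """Hypothetical grouping key: host plus the directory part of the URL path."""
    parts = urlparse(url)
    return parts.netloc + posixpath.dirname(parts.path)


def filter_redundant_tokens(docs, min_ratio=0.8):
    """Drop tokens that occur in at least min_ratio of the documents in a group.

    docs maps URL -> plain text. Returns URL -> list of kept tokens.
    min_ratio is an assumed threshold, not a value taken from the paper.
    """
    # Group URLs that share the same path key.
    groups = defaultdict(list)
    for url in docs:
        groups[url_path_key(url)].append(url)

    cleaned = {}
    for urls in groups.values():
        tokenized = {u: docs[u].split() for u in urls}
        # Document frequency: in how many documents of the group each token appears.
        doc_freq = Counter()
        for tokens in tokenized.values():
            doc_freq.update(set(tokens))
        threshold = min_ratio * len(urls)
        for u, tokens in tokenized.items():
            if len(urls) < 2:
                # A single document gives no redundancy evidence; keep it unchanged.
                cleaned[u] = tokens
            else:
                cleaned[u] = [t for t in tokens if doc_freq[t] < threshold]
    return cleaned


# Made-up example pages; "Home Contact" plays the role of shared navigation text.
docs = {
    "http://example.com/news/a.html": "Home Contact storm hits coast",
    "http://example.com/news/b.html": "Home Contact election results announced",
    "http://example.com/news/c.html": "Home Contact market rallies",
    "http://example.com/about/x.html": "Home Contact about us",
}
cleaned = filter_redundant_tokens(docs)
```

In this example the three pages under `/news` share the tokens `Home` and `Contact`, so those are filtered out there, while the lone page under `/about` is left unchanged because there is nothing to compare it against.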


Publication status

  • Published

Event title

The 2006 International Conference on Information & Knowledge Engineering

Event Venue

Las Vegas, US

Date of Event (Start Date)


Date of Event (End Date)


Repository Status

  • Open
