File(s) under permanent embargo
An evaluation study of web monitoring : web monitoring vs. web crawling
Thesis posted on 2023-05-26, 18:53, authored by Kim, YS
Nowadays people use web search engines to find information. Even though these engines endeavour to provide information in a complete and timely manner, there are significant delays and under-coverage in their services. However, people sometimes want to obtain new information from personally selected web pages without missing anything and with little delay. Web monitoring tries to fulfil this goal by revisiting the selected web pages frequently. Initially, web monitoring research focused on the monitoring method itself, but the emphasis then shifted to the problems of information overload and scheduling under limited resources. This dissertation focuses on the following research problems to improve the efficiency of web monitoring systems. Firstly, it analyses how efficiently a document classification system that uses an incremental knowledge acquisition method, called MCRDR (Multiple Classification Ripple-Down Rules), was used to resolve individual information overload problems. Secondly, it discusses how MCRDR knowledge bases, standard web search engines, and appropriate web page locating heuristics can be employed in unison to locate relevant web pages to monitor. Thirdly, it demonstrates that the web monitoring system exhibits better performance in respect of service coverage and delay than commercial web search engines. Lastly, it proposes a prioritization method for monitored web pages that decides the order of the monitoring sequence using the estimated service coverage and delay of web search engines, where the estimates are obtained from various predictor variables identified from web crawling policies and from statistical regression methods.
Rights statement: Copyright 2009 the Author - The University is continuing to endeavour to trace the copyright owner(s) and in the meantime this item has been reproduced here in good faith. We would be pleased to hear from the copyright owner(s).
Thesis (PhD)--University of Tasmania, 2009. Includes bibliographical references.
Ch. 1. Introduction -- Ch. 2. Web monitoring - possibilities and limitations -- Ch. 3. Managing information overload using MCRDR document -- Ch. 4. Relevant web page retrieval from search engines reusing MCRDR knowledge bases -- Ch. 5. Monitoring web page location heuristics -- Ch. 6. Service coverage and delay in search engines -- Ch. 7. Modelling prioritization -- Ch. 8. Study conclusions