Web information publishers want to know how well, and how quickly, their information is indexed by major search engines, because search engines are a main access point to their web sites. Because each search engine's crawling policy is usually a commercial secret, it is useful to estimate each engine's coverage and delay from known predictor variables. This paper proposes forecasting models for the service coverage and delay of search engines for Australian government web sites, built with statistical regression methods and with predictor variables identified from crawling policies described in academic papers. Logistic regression was employed to forecast coverage, and Poisson regression to forecast delay. Our results show that different explanatory variables were selected when constructing each engine's models, and that the importance of these variables varies significantly among search engines.
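The paper's fitted models are not reproduced here. As an illustration only, the following is a minimal sketch of the two model types named above: a logistic regression for a binary "indexed or not" coverage outcome and a Poisson regression for a delay measured as a count of days. Both are generalized linear models with canonical links, so one fitting routine serves both. The predictor, the data, and the helper names `fit_glm` and `predict` are hypothetical, and the fit uses plain batch gradient ascent rather than the iteratively reweighted least squares that statistical packages typically use.

```python
import math
from typing import Callable, List, Sequence

def _dot(w: Sequence[float], x: Sequence[float]) -> float:
    return sum(wi * xi for wi, xi in zip(w, x))

def sigmoid(z: float) -> float:
    # Inverse of the logit link: maps a linear score to a probability in (0, 1).
    return 1.0 / (1.0 + math.exp(-z))

def fit_glm(X: List[List[float]], y: List[float],
            inverse_link: Callable[[float], float],
            lr: float, epochs: int = 20000) -> List[float]:
    """Fit a canonical-link GLM by batch gradient ascent on the log-likelihood.

    For logistic (logit link) and Poisson (log link) regression alike, the
    gradient is sum_i (y_i - mu_i) * x_i, where mu_i = inverse_link(w . x_i).
    """
    rows = [[1.0] + list(x) for x in X]  # prepend an intercept column
    w = [0.0] * len(rows[0])
    n = len(rows)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for x, yi in zip(rows, y):
            residual = yi - inverse_link(_dot(w, x))
            for j, xj in enumerate(x):
                grad[j] += residual * xj
        w = [wj + lr * gj / n for wj, gj in zip(w, grad)]
    return w

def predict(w: List[float], x: Sequence[float],
            inverse_link: Callable[[float], float]) -> float:
    return inverse_link(_dot(w, [1.0] + list(x)))

# Hypothetical coverage data: one predictor per page (e.g. a popularity score
# scaled to [0, 1]); label 1 means the page was found in the engine's index.
coverage_X = [[0.1], [0.2], [0.3], [0.4], [0.5], [0.6], [0.7], [0.8], [0.9]]
coverage_y = [0, 0, 1, 0, 1, 0, 1, 1, 1]
coverage_w = fit_glm(coverage_X, coverage_y, sigmoid, lr=1.0)

# Hypothetical delay data: days until a page appeared in the index.
delay_X = [[0.0], [0.25], [0.5], [0.75], [1.0]]
delay_y = [1, 2, 3, 5, 8]
delay_w = fit_glm(delay_X, delay_y, math.exp, lr=0.05)  # log link: mu = exp(w . x)

if __name__ == "__main__":
    print("P(indexed | score=0.9):", round(predict(coverage_w, [0.9], sigmoid), 3))
    print("expected delay | x=0.5:", round(predict(delay_w, [0.5], math.exp), 2), "days")
```

The smaller step size for the Poisson fit is a deliberate choice: the exponential inverse link makes the log-likelihood's curvature grow with the predicted count, so a step that is safe for the logistic model could diverge here.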
Publication status: Published
Event title: Pacific Rim Knowledge Acquisition Workshop (PKAW-08). Proceedings
Event venue: Hanoi, Vietnam
Date of event (start date): 2008-12-01
Date of event (end date): 2008-12-01
Rights statement: The original publication is available at www.springerlink.com