Multicasting a Web Repository

Wang Lam and Hector Garcia-Molina

Abstract

Web crawlers generate significant loads on Web servers, and are difficult to operate. Instead of running crawlers at many "client" sites, we propose a central crawler and Web repository that then multicasts appropriate subsets of the central repository to clients. Loads at Web servers are reduced because a single crawler visits the servers, as opposed to all the client crawlers. In this paper we model and evaluate such a central Web multicast facility. We develop multicast algorithms for the facility, comparing them with ones for "broadcast disks." We also evaluate how performance changes as several factors, such as object granularity and client batching, are varied.

Summary

This paper introduces a performance metric, client delay, which measures the time from a client's initial request for data until the client has received all the data it requested. We consider how several data schedulers perform under varying client loads, and how system performance depends on the degree to which clients' requests overlap. We also examine the tradeoff between efficiency and performance when clients are made to wait so that they can be grouped into prescheduled batches with more shared requests. Finally, we quantify how the granularity of a data request affects the performance of a multicast system.
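To make the client delay metric concrete, here is a minimal Python sketch of a toy multicast simulation. It is not the model or any scheduler from the paper. It assumes the server multicasts one object per time slot, that every client that has arrived and still needs that object receives it, and that an object counts as received at the end of its slot. The names `simulate` and `most_requested` are hypothetical, and the "most requested first" rule is only an illustrative stand-in for a data scheduler.

```python
from collections import Counter


def most_requested(active):
    """Illustrative scheduler (an assumption, not the paper's algorithm):
    pick the object needed by the most waiting clients.
    Ties are broken by the smallest object id so results are deterministic."""
    counts = Counter(obj for need in active.values() for obj in need)
    return min(counts, key=lambda obj: (-counts[obj], obj))


def simulate(requests, pick):
    """Compute the client delay of each client, in time slots.

    requests: list of (arrival_slot, set_of_object_ids), one entry per client.
    pick:     function mapping {client: objects still needed} to one object.

    Client delay = slot at which the last requested object has been
    received, minus the slot at which the request was made.
    """
    pending = [set(objs) for _, objs in requests]
    delays = {}
    # A client that asks for nothing is trivially satisfied on arrival.
    for client, need in enumerate(pending):
        if not need:
            delays[client] = 0

    slot = 0
    while len(delays) < len(requests):
        # Clients that have arrived and are still missing some object.
        active = {
            client: need
            for client, need in enumerate(pending)
            if client not in delays and requests[client][0] <= slot
        }
        if active:
            obj = pick(active)
            for client, need in active.items():
                need.discard(obj)
                if not need:
                    # Received at the end of this slot (toy-model assumption).
                    delays[client] = slot + 1 - requests[client][0]
        slot += 1
    return [delays[client] for client in range(len(requests))]


if __name__ == "__main__":
    # Three clients with overlapping requests; the third arrives one slot late.
    demo = [(0, {"a", "b"}), (0, {"b", "c"}), (1, {"c"})]
    print(simulate(demo, most_requested))
```

In this toy run, a single transmission of a shared object serves every waiting client that requested it, so the more the clients' requests overlap, the fewer slots are needed to satisfy all of them.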

Citation (BibTeX)

@inproceedings{lg-webdb01,
 author      = {Wang Lam and Hector Garcia-Molina},
 title       = {Multicasting a {Web} Repository},
 booktitle   = {Fourth International Workshop on the Web and Databases (WebDB)},
 year        = {2001},
 pages       = {25--30},
 note        = {Available at http://dbpubs.stanford.edu/pub/2001-28}
}
@techreport{lg-multicast-extended,
 author      = {Wang Lam and Hector Garcia-Molina},
 title       = {Multicasting a {Web} Repository (extended version)},
 institution = {Stanford University},
 year        = {2000},
 note        = {Available at http://dbpubs.stanford.edu/pub/2001-55}
}

Wang Lam - source Tue Aug 5 01:40:38 2003 - generated Thu Aug 5 02:57:15 2004