Multicasting a Changing Repository

Wang Lam and Hector Garcia-Molina

Abstract

Web crawlers generate significant loads on Web servers, and are difficult to operate. Instead of repeatedly running crawlers at many "client" sites, we propose a central crawler and Web repository that multicasts appropriate subsets of the central repository, and their subsequent changes, to subscribing clients. Loads at Web servers are reduced because a single crawler visits the servers, as opposed to all the client crawlers. In this paper we model and evaluate such a central Web multicast facility for subscriber clients, and for mixes of subscriber and one-time downloader clients. We consider different performance metrics and multicast algorithms for such a multicast facility, and develop guidelines for its design under various conditions.

Summary

If the server's data changes over time, clients may want to subscribe to data items: they request any items they do not yet have, then remain connected to the server to request new versions of those items as they become available. For these subscriber clients, metrics based on data freshness and age become important. We consider how to schedule data for subscriber clients under these freshness and age metrics (averaged across clients, and as an average weighted by request size) for a variety of client loads. We also consider how a server should operate when providing data for a mix of one-time-download clients and subscriber clients.
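
To make the two kinds of averaging concrete, here is a minimal Python sketch. It is not taken from the paper: it assumes the commonly used definitions in which a copy is fresh when it matches the server's latest version, and the age of a stale copy is the time elapsed since the first server change the client has not yet received. The names Client, change_times, freshness, age, mean and weighted_mean are hypothetical, and treating a missing copy as stale since the item first appeared is an additional assumption.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass
class Client:
    wanted: List[str]        # items this client subscribes to (its request)
    held: Dict[str, int]     # item -> version number the client holds


# change_times[item][v] is the time at which version v appeared on the server
# (hypothetical bookkeeping, assumed for this sketch).
ChangeTimes = Dict[str, List[float]]


def freshness(client: Client, change_times: ChangeTimes) -> float:
    """Fraction of the client's wanted items that match the latest version."""
    fresh = 0
    for item in client.wanted:
        latest = len(change_times[item]) - 1
        if client.held.get(item, -1) == latest:
            fresh += 1
    return fresh / len(client.wanted)


def age(client: Client, change_times: ChangeTimes, now: float) -> float:
    """Mean age over wanted items; an up-to-date copy has age 0."""
    total = 0.0
    for item in client.wanted:
        have = client.held.get(item, -1)   # -1 means no copy at all (assumption)
        latest = len(change_times[item]) - 1
        if have < latest:
            # Stale since the first version the client has not received.
            total += now - change_times[item][have + 1]
    return total / len(client.wanted)


def mean(values: Sequence[float]) -> float:
    """Plain average across clients."""
    return sum(values) / len(values)


def weighted_mean(values: Sequence[float], weights: Sequence[float]) -> float:
    """Average across clients, weighted (here) by request size."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


change_times = {"a": [0.0, 10.0], "b": [0.0], "c": [0.0, 4.0, 8.0]}
clients = [
    Client(wanted=["b"], held={"b": 0}),
    Client(wanted=["a", "b", "c"], held={"a": 0, "b": 0, "c": 1}),
]
sizes = [len(c.wanted) for c in clients]
ages = [age(c, change_times, now=12.0) for c in clients]
print("mean age:", mean(ages), "size-weighted age:", weighted_mean(ages, sizes))
```

In this toy setup the client with the larger request holds the stale copies, so weighting by request size yields a higher average age than the plain per-client average. That difference is why the two averages can favor different schedules.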

Citation (BibTeX)

@inproceedings{lg-icde03,
 author      = {Wang Lam and Hector Garcia-Molina},
 title       = {Multicasting a Changing Repository},
 booktitle   = {Proceedings of the 19th International Conference on Data Engineering (ICDE 2003)},
 month       = {March},
 year        = {2003},
 publisher   = {IEEE Computer Society},
 isbn        = {0-7803-7665-X},
 pages       = {215--226},
 note        = {Available at http://dbpubs.stanford.edu/pub/2003-48}
}
@techreport{lg-subscribers-extended,
 author      = {Wang Lam and Hector Garcia-Molina},
 title       = {Multicasting a Changing Repository (extended version)},
 institution = {Stanford University},
 year        = {2000},
 note        = {Available at http://dbpubs.stanford.edu/pub/2001-55}
}

Wang Lam - source Tue Aug 5 01:40:42 2003 - generated Thu Aug 5 02:57:15 2004