Multicasting a Changing Repository

Wang Lam and Hector Garcia-Molina

Web crawlers generate significant loads on Web servers, and are difficult to operate. Instead of repeatedly running crawlers at many "client" sites, we propose a central crawler and Web repository that multicasts appropriate subsets of the central repository, and their subsequent changes, to subscribing clients. Loads at Web servers are reduced because a single crawler visits the servers, as opposed to all the client crawlers. In this paper we model and evaluate such a central Web multicast facility for subscriber clients, and for mixes of subscriber and one-time downloader clients. We consider different performance metrics and multicast algorithms for such a multicast facility, and develop guidelines for its design under various conditions.


If the server's data changes over time, clients may be interested in subscribing to data items: requesting the items they do not yet have, and remaining connected to the server to request new versions of those items as they become available. For these subscriber clients, metrics based on data freshness and age become important. We consider how to schedule data for subscriber clients under these freshness and age metrics, both averaged across clients and averaged with each client weighted by its request size, for a variety of client loads. We also consider how a server should operate when providing data for a mix of one-time-download clients and subscriber clients.
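The two ways of averaging mentioned above can be illustrated with a minimal sketch. The abstract does not define freshness or age, so the definitions here are assumptions: freshness is taken to be the fraction of a client's requested items that match the server's current version, and age is the time since each stale item diverged from the server, averaged over the request (fresh items count as age zero). The names `ClientCopy`, `freshness`, `age`, `client_average`, and `size_weighted_average` are hypothetical and do not come from the paper.

```python
from dataclasses import dataclass, field


@dataclass
class ClientCopy:
    """State of one subscriber's local copy (hypothetical model)."""
    request_size: int                    # number of items the client subscribed to
    fresh_items: int                     # items that match the server's current version
    stale_since: list = field(default_factory=list)  # divergence time of each stale item


def freshness(c: ClientCopy) -> float:
    # Assumed definition: fraction of the client's items that are up to date.
    return c.fresh_items / c.request_size


def age(c: ClientCopy, now: float) -> float:
    # Assumed definition: mean time since divergence; fresh items contribute 0.
    return sum(now - t for t in c.stale_since) / c.request_size


def client_average(values):
    # Every client counts equally, regardless of how much it requested.
    return sum(values) / len(values)


def size_weighted_average(values, sizes):
    # Each client counts in proportion to its request size.
    return sum(v * s for v, s in zip(values, sizes)) / sum(sizes)


# Two illustrative clients: a large request that is fully fresh and a
# single-item request that went stale at time 6.0, observed at time 10.0.
clients = [
    ClientCopy(request_size=4, fresh_items=4),
    ClientCopy(request_size=1, fresh_items=0, stale_since=[6.0]),
]
now = 10.0
sizes = [c.request_size for c in clients]
fresh_values = [freshness(c) for c in clients]
age_values = [age(c, now) for c in clients]

avg_freshness = client_average(fresh_values)
weighted_freshness = size_weighted_average(fresh_values, sizes)
avg_age = client_average(age_values)
weighted_age = size_weighted_average(age_values, sizes)
```

In this example the two averages disagree: the small stale client pulls the per-client average freshness well below the size-weighted one, which is one reason a scheduler might behave differently depending on which metric it targets.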
