Computer Science References

Comprehensive Exams
Behavior of NULLs in SQL
Multicast Data Dissemination
Stanford WebBase
Online Reference Material

Multicast Data Dissemination

Despite the growth of the Internet, disseminating popular data to a large number of users over a network remains an expensive proposition. The expense comes from the data server repeatedly sending a separate copy of a requested piece of data to every client that needs it. Consequently, a popular server becomes a victim of its own success.

Instead, a data server could use its network's multicast capability to send a single copy of each packet, addressed to reach an entire group of clients at once. Although successful delivery is not guaranteed, this network feature can still dramatically reduce the server's network consumption.
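As a rough illustration of the savings, a server that unicasts an item of S bytes to N clients sends N·S bytes, whereas a multicast sender hands the network one copy and lets the network replicate it. The sketch below shows only the sending side, using Python's standard `socket` module. The group address `239.1.2.3`, the port `5007`, and the helper names are hypothetical choices for illustration, not part of any system described on this page.

```python
import socket

# Hypothetical group and port, for illustration only.
# 239.0.0.0/8 is the administratively scoped IPv4 multicast range (RFC 2365).
GROUP = "239.1.2.3"
PORT = 5007

def make_multicast_sender(ttl: int = 1) -> socket.socket:
    """Create a UDP socket configured to send datagrams to an IPv4 multicast group."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    # The TTL bounds how many router hops each datagram may cross;
    # a TTL of 1 keeps traffic on the local subnet.
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, ttl)
    return sock

def send_to_group(sock: socket.socket, payload: bytes) -> int:
    # One sendto() call: the network, not the server, copies the datagram
    # to every receiver that has joined GROUP. Delivery is best-effort.
    return sock.sendto(payload, (GROUP, PORT))
```

A server would call `send_to_group(make_multicast_sender(), data)` once per packet, regardless of how many clients have joined the group; receivers join separately (for example with the `IP_ADD_MEMBERSHIP` socket option).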

Multicast is already available on the Internet's multicast backbone (MBone), on Internet2, and on native IPv6 networks. It is supported by modern operating systems, including Windows XP, Mac OS X, FreeBSD, GNU/Linux, and current commercial flavors of Unix.

In work with Professor Hector Garcia-Molina, we study the challenges in designing and building such a multicast data server. We must make it efficient, fast, reliable, and scalable for a variety of clients.

Papers on Data Scheduling

How should the server use its network connection to order the transmission of a large number of data requests? Clients' requests may vary in size, and may coincide with other clients' requests to varying degrees. Given a stream of client requests, the server must decide which of its requested data items to send next.
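One simple way to make this decision concrete is a greedy rule: send whichever item satisfies the most waiting clients per byte transmitted. The Python sketch below is a hypothetical heuristic chosen for illustration, not the scheduling algorithm from the papers in this section. The class and method names are invented, and the sketch assumes a single multicast transmission reaches every client currently waiting for that item (no loss).

```python
from collections import defaultdict
from typing import Dict, Optional, Set

class GreedyScheduler:
    """Hypothetical greedy multicast scheduler: most waiting clients per byte first."""

    def __init__(self, sizes: Dict[str, int]) -> None:
        self.sizes = sizes                                     # item id -> size in bytes
        self.pending: Dict[str, Set[str]] = defaultdict(set)  # item id -> waiting clients

    def request(self, client: str, item: str) -> None:
        self.pending[item].add(client)

    def next_item(self) -> Optional[str]:
        waiting = [it for it, clients in self.pending.items() if clients]
        if not waiting:
            return None
        # Highest score (waiting clients per byte) wins; ties go to the smaller item id.
        return min(waiting, key=lambda it: (-len(self.pending[it]) / self.sizes[it], it))

    def transmit(self) -> Optional[str]:
        """Pick the next item, multicast it once, and mark all its waiting clients served."""
        item = self.next_item()
        if item is not None:
            self.pending[item].clear()
        return item

s = GreedyScheduler({"a": 100, "b": 10, "c": 10})
for client in ("c1", "c2", "c3"):
    s.request(client, "a")
s.request("c1", "b")
s.request("c2", "c")
s.request("c3", "c")
order = [s.transmit() for _ in range(4)]
```

Here `order` is `["c", "b", "a", None]`: item `c` goes first because two clients wait on only 10 bytes, while the large item `a` waits even though it has the most requesters. A real scheduler would also weigh how long each request has waited, so that large items are not starved.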

Multicasting a Web Repository (WebDB 2001, Santa Barbara)
Multicasting a Changing Repository (ICDE 2003, Bangalore)

Papers on Network Transmission

How should the server cope with its clients' very different network connections? Some clients may have more reliable (less loss-prone) connections than others; some clients may have higher-throughput connections from the server than others. The server must optimize performance for such heterogeneous clients while ensuring that every client will still receive the entirety of the data that it requests.
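One standard technique for serving clients with different loss patterns is forward error correction: instead of resending each client's particular missing packet, the server multicasts a repair packet that helps every client that lost any one packet of a block. The Python sketch below uses a single XOR parity packet per block. It is an illustrative stand-in, not necessarily the transmission scheme in the papers in this section, and it assumes equal-length data packets and at most one loss per block.

```python
from functools import reduce
from typing import List, Optional

def xor_bytes(a: bytes, b: bytes) -> bytes:
    # Assumes a and b have equal length (zip would silently truncate otherwise).
    return bytes(x ^ y for x, y in zip(a, b))

def add_parity(block: List[bytes]) -> List[bytes]:
    """Append one XOR parity packet to a non-empty block of equal-length packets."""
    return block + [reduce(xor_bytes, block)]

def recover(received: List[Optional[bytes]]) -> List[bytes]:
    """Rebuild the data packets of an encoded block that lost at most one packet.

    `received` is the encoded block with None in place of a lost packet.
    """
    missing = [i for i, p in enumerate(received) if p is None]
    if len(missing) > 1:
        raise ValueError("one parity packet can repair only one loss per block")
    if missing:
        received = list(received)
        # The XOR of all surviving packets equals the lost packet.
        received[missing[0]] = reduce(xor_bytes, [p for p in received if p is not None])
    return received[:-1]  # drop the parity packet
```

Because the same parity packet repairs a different lost packet at each client, one multicast transmission can complete the block for many heterogeneous clients at once. Schemes that repair several losses per block generalize this idea with stronger erasure codes.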

Reliably Networking a Multicast Repository (SRDS 2003, Florence)
Slicing Broadcast Disks (technical report)

Implementation

An in-development version of a file-sharing multicast facility is available upon request.

Implementing Multicast Data Dissemination (technical report)

Dissertation

The dissertation was submitted to Stanford University on 24 Sep 2004, and is available online.

Multicast Data Dissemination (Ph.D. dissertation)

Stanford WebBase

The Stanford WebBase is a World Wide Web hypertext repository designed to aid research and analysis. This project aims to develop a large-scale repository that cleanly and effectively supports a variety of research on World Wide Web pages, while conserving disk and main memory usage. This repository seeks to allow the building of new feature indices on Web pages, the graph analysis of a large slice of the Web, and the flexible multicast distribution of Web data (and computational workload).

Implementation work with Taher Haveliwala, Sriram Raghavan, Gary Wesley.
Crawler originally by Junghoo Cho, with work from Pranav Kantawala.
Initial multicast-tree code from Ashish Goel, Kameshwar Munagala.
A part of the Digital Library project: Professor Hector Garcia-Molina, Dr. Andreas Paepcke.

For information about this project, including how to obtain Web data and software from our repository, please see the project's main page.

About the Implementation

The Stanford WebBase crawler must crawl the Web scalably, quickly, easily, and efficiently, while minimizing load on the Web servers being crawled. Users must be able to tap the WebBase distribution infrastructure easily to fetch and process large volumes of Web data for experiments or analysis. For example, WebBase's local indexing uses the distribution facility, just as outside users would, to create compact indexes from Web crawls. We describe how we implement WebBase to provide these features, and measure the performance and scalability of the resulting running system.
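A common way for a crawler to limit its load on any one Web server is to enforce a minimum delay between requests to the same host. The Python sketch below shows one such per-host politeness queue. It is a hypothetical illustration, not the WebBase crawler's actual design; the class name, the `min_delay` parameter, and its value are assumptions.

```python
import heapq
import time
from typing import Callable, Dict, List, Optional, Tuple
from urllib.parse import urlsplit

class PoliteFrontier:
    """Hypothetical URL frontier spacing fetches from one host by min_delay seconds."""

    def __init__(self, min_delay: float = 1.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.min_delay = min_delay                     # assumed per-host delay in seconds
        self.clock = clock                             # injectable clock, handy for testing
        self.next_slot: Dict[str, float] = {}          # host -> next free fetch time
        self.heap: List[Tuple[float, int, str]] = []   # (ready time, insertion order, URL)
        self.counter = 0

    def add(self, url: str) -> None:
        host = urlsplit(url).hostname or ""
        # Give this URL the host's next free slot, then reserve the slot after it.
        ready = max(self.clock(), self.next_slot.get(host, float("-inf")))
        self.next_slot[host] = ready + self.min_delay
        heapq.heappush(self.heap, (ready, self.counter, url))
        self.counter += 1

    def pop_ready(self) -> Optional[str]:
        """Return a URL whose host may be contacted now, or None if none is due yet."""
        if self.heap and self.heap[0][0] <= self.clock():
            return heapq.heappop(self.heap)[2]
        return None
```

Because each host's requests are spaced out while different hosts are interleaved, the crawler can keep many connections busy overall without concentrating its traffic on a single server.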

Stanford WebBase Components and Applications (technical report at dbpubs; abstract and citation to be posted here soon)
slides: (OpenOffice.org Presentation, source format) and (generated PDF)
(OpenOffice.org is a free office suite available for many platforms.)

Online Reference Material

CiteSeer search for computer science papers
DBLP (mirror at ACM SIGMOD) search database-related bibliographies
Specifications: HTTP/1.1 (RFC 2616) - HTML 4.01 - CSS Level 1 - XPath 1.0 - XSLT 1.0
SQL for PostgreSQL 7.3

Wang Lam - source Sun Oct 23 00:15:22 2005 - generated Sun Oct 23 19:55:08 2005