Javier Sanchez
Rohit Singh

Status: Draft proposal
Last modified: 02/06/2001

Abstract

Right now, searching for documents on the Internet has two main difficulties:

Suppose a proxy routes the web traffic of a community of users who share common interests, for example CS students and faculty. Then the proxy can be used to:

This could be extended to provide tailored suggestions to the user based on her explicitly specified surfing/search preferences; better still, the proxy could deduce these preferences on its own.
 

User Experience and Motivation

Scenario #1: Suppose Drew is interested in articles on ubiquitous computing, but she does not know the right keywords or search engines, nor does she have the time for an extensive search, so she cannot get the most relevant results with a traditional approach. However, the proxy Drew uses (say, the CS department proxy) has information about the pages visited by other users. Some of these users are also interested in ubiquitous computing, and the proxy knows which pages they visited on that topic. The proxy can therefore suggest these pages to Drew, since they are more likely to be relevant to what she needs. Note that the proxy need not explicitly "know" what Drew is looking for and then match it. The power of this approach comes from the fact that users in a community often look for the same resources, and this information can easily be aggregated and de-personalized.

Another motivation is that Drew is interested not only in what she can find about ubiquitous computing today but also in new sites on the topic that she did not find today (and that were not yet known to the proxy either). In the traditional approach, this would require a new search or someone sending her a link. But if the proxy that monitors the traffic of the CS community "knows" that Drew is interested in the topic, it can notify her about new links that have not been reported to her before.
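The peer-based suggestions and new-link notifications described in Scenario #1 could work roughly as in the minimal Python sketch below. It rests on our own assumptions, not a settled design: the proxy keeps a per-user set of visited URLs, measures how similar two users are by the Jaccard overlap of those sets, and scores each page a user has not seen by the summed similarity of the peers who visited it. The class `VisitLog`, the helper `jaccard`, the `min_similarity` threshold, and the example URLs are all hypothetical. A per-user `reported` set means that later calls return only links not suggested before.

```python
from collections import defaultdict


def jaccard(a, b):
    """Overlap between two sets of URLs: |a & b| / |a | b| (0.0 if both empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


class VisitLog:
    """Hypothetical proxy-side record of which pages each user has visited."""

    def __init__(self):
        self.visits = defaultdict(set)    # user -> URLs the user has visited
        self.reported = defaultdict(set)  # user -> URLs already suggested to the user

    def record(self, user, url):
        self.visits[user].add(url)

    def suggest(self, user, min_similarity=0.2, limit=5):
        """Suggest unseen, not-yet-reported pages visited by similar peers."""
        mine = self.visits.get(user, set())
        already = self.reported[user]
        scores = defaultdict(float)
        for peer, theirs in self.visits.items():
            if peer == user:
                continue
            sim = jaccard(mine, theirs)
            if sim < min_similarity:
                continue  # peer's browsing does not overlap enough with the user's
            for url in theirs - mine - already:
                scores[url] += sim  # pages seen by several similar peers score higher
        ranked = sorted(scores, key=lambda url: (-scores[url], url))[:limit]
        already.update(ranked)  # later calls only report links that are new to the user
        return ranked


if __name__ == "__main__":
    log = VisitLog()
    for url in ("ubicomp.example/intro", "ubicomp.example/sensors"):
        log.record("drew", url)
        log.record("alex", url)
    log.record("alex", "ubicomp.example/smart-rooms")
    log.record("kim", "sports.example/scores")
    print(log.suggest("drew"))  # ['ubicomp.example/smart-rooms']
    print(log.suggest("drew"))  # [] (nothing new since the last call)
```

Note that nothing here requires the proxy to know Drew's topic explicitly; the overlap in browsing alone drives the suggestions.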

Scenario #2: In the previous scenario, Drew did very little to contribute to the information the proxy accumulates; the accumulation is entirely automated. But we can do better. Suppose Drew ranks the pages she visits according to their usefulness or relevance to her search. Then others, say Cameron, can take advantage of this ranking. The ranking is especially useful because it was produced by Cameron's peers, so it is much more likely to be relevant to her. Of course, this requires users to proactively rank (or otherwise review) the pages they visit.
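One simple way to combine the explicit rankings of Scenario #2 is to average the scores that peers give each page. The sketch below assumes a hypothetical 1 to 5 rating scale and hypothetical names (`PeerRatings`, `rate`, `top_pages`). It also applies a `min_votes` cutoff so that a single rating does not dominate, and it leaves out the viewer's own ratings so that Cameron sees only what her peers thought.

```python
from collections import defaultdict
from statistics import mean


class PeerRatings:
    """Hypothetical store of explicit usefulness ratings (1-5) that users give pages."""

    def __init__(self):
        # url -> {user: score}; re-rating a page replaces the user's earlier score
        self.ratings = defaultdict(dict)

    def rate(self, user, url, score):
        if not 1 <= score <= 5:
            raise ValueError("score must be between 1 and 5")
        self.ratings[url][user] = score

    def top_pages(self, viewer=None, min_votes=2, limit=10):
        """Rank pages by mean peer score, skipping the viewer's own ratings
        and pages with fewer than min_votes peer ratings."""
        ranked = []
        for url, by_user in self.ratings.items():
            peer_scores = [s for u, s in by_user.items() if u != viewer]
            if len(peer_scores) >= min_votes:
                ranked.append((mean(peer_scores), len(peer_scores), url))
        # Highest mean first; ties broken by more votes, then by URL.
        ranked.sort(key=lambda r: (-r[0], -r[1], r[2]))
        return [(url, avg) for avg, _, url in ranked[:limit]]


if __name__ == "__main__":
    board = PeerRatings()
    board.rate("drew", "ubicomp.example/smart-rooms", 5)
    board.rate("alex", "ubicomp.example/smart-rooms", 4)
    board.rate("drew", "ubicomp.example/old-survey", 2)
    board.rate("alex", "ubicomp.example/old-survey", 3)
    board.rate("kim", "ubicomp.example/blog", 5)  # only one vote: not ranked
    print(board.top_pages(viewer="cameron"))
    # [('ubicomp.example/smart-rooms', 4.5), ('ubicomp.example/old-survey', 2.5)]
```

A plain mean is the simplest choice; weighting each rater by how similar they are to the viewer would be a natural refinement.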
 

System Architecture

[hold on, we're getting there] Basically, there will be a client-side plug-in (like the Google Toolbar). The user will need to obtain a username to enable user-specific services; this may not be necessary if the user chooses to use only de-personalized services (Scenario #1). On the proxy, there will be a document classification, matching, and retrieval module. At this stage, we have not elaborated further on the proxy design.
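For the document classification part of the proxy module, one standard candidate is a multinomial naive Bayes classifier that assigns each visited page to a topic such as "ubiquitous computing". The sketch below is only an illustration under our own assumptions: pages arrive as plain text, topics come from a small hand-labelled training set, tokenization is a crude lowercase regex, and the names `tokenize` and `NaiveBayesClassifier` are hypothetical. It uses add-one (Laplace) smoothing and ignores words never seen in training.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase alphanumeric words; a deliberately crude tokenizer."""
    return re.findall(r"[a-z0-9]+", text.lower())


class NaiveBayesClassifier:
    """Multinomial naive Bayes over page text, with add-one (Laplace) smoothing."""

    def __init__(self):
        self.doc_counts = Counter()              # topic -> number of training pages
        self.word_counts = defaultdict(Counter)  # topic -> word frequencies
        self.vocab = set()

    def train(self, text, topic):
        words = tokenize(text)
        self.doc_counts[topic] += 1
        self.word_counts[topic].update(words)
        self.vocab.update(words)

    def classify(self, text):
        """Return the topic with the highest log-posterior (None if untrained)."""
        words = [w for w in tokenize(text) if w in self.vocab]  # ignore unseen words
        total_docs = sum(self.doc_counts.values())
        best_topic, best_score = None, -math.inf
        for topic, n_docs in self.doc_counts.items():
            counts = self.word_counts[topic]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)  # log prior
            for w in words:
                score += math.log((counts[w] + 1) / denom)  # smoothed log likelihood
            if score > best_score:
                best_topic, best_score = topic, score
        return best_topic


if __name__ == "__main__":
    clf = NaiveBayesClassifier()
    clf.train("ubiquitous computing sensors smart rooms", "ubicomp")
    clf.train("wearable computing and context aware sensors", "ubicomp")
    clf.train("football scores and league tables", "sports")
    clf.train("basketball playoff scores", "sports")
    print(clf.classify("context aware smart sensors"))  # ubicomp
```

Working in log space avoids numeric underflow when a page contains many words.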

Team Members

  1. Javier Sanchez
  2. Rohit Singh

Technical Challenges and Open Issues

A lot of work has been done on related topics in data mining, information retrieval, and text document matching. We hope to be able to use many of the algorithms and much of the code from these domains. We will need to borrow public-domain code for document classification, search, and retrieval.
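As an example of the kind of information-retrieval technique we expect to borrow, the sketch below ranks pages against a query by the cosine similarity of their TF-IDF vectors. It is a self-contained illustration, not a chosen implementation: the function names (`tokenize`, `tfidf`, `build_index`, `cosine`, `search`) are hypothetical, the smoothed IDF formula log((1 + N) / (1 + df)) + 1 is one common variant among several, and query words that do not appear in any indexed page are simply ignored.

```python
import math
import re
from collections import Counter


def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf(words, idf):
    """Sparse TF-IDF vector (dict) for a token list; unknown words are dropped."""
    tf = Counter(w for w in words if w in idf)
    return {w: count * idf[w] for w, count in tf.items()}


def build_index(docs):
    """Compute smoothed IDF weights and a TF-IDF vector for each doc_id -> text."""
    tokenized = {doc_id: tokenize(text) for doc_id, text in docs.items()}
    n = len(tokenized)
    df = Counter()
    for words in tokenized.values():
        df.update(set(words))  # document frequency: count each word once per page
    idf = {w: math.log((1 + n) / (1 + c)) + 1 for w, c in df.items()}
    vectors = {doc_id: tfidf(words, idf) for doc_id, words in tokenized.items()}
    return idf, vectors


def cosine(a, b):
    dot = sum(v * b.get(w, 0.0) for w, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search(query, idf, vectors, limit=3):
    """Return up to `limit` doc ids with nonzero cosine similarity, best first."""
    q = tfidf(tokenize(query), idf)
    scored = [(cosine(q, vec), doc_id) for doc_id, vec in vectors.items()]
    scored = [(s, doc_id) for s, doc_id in scored if s > 0]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for _, doc_id in scored[:limit]]


if __name__ == "__main__":
    pages = {
        "p1": "ubiquitous computing in smart rooms",
        "p2": "wearable computing devices",
        "p3": "football league scores",
    }
    idf, vectors = build_index(pages)
    print(search("ubiquitous computing", idf, vectors))  # ['p1', 'p2']
```

Rare words such as "ubiquitous" receive a higher IDF weight than common ones such as "computing", which is why p1 outranks p2 in the example.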
 

Demo

The demonstration will include two main deliverables: