Paths: Overview

Rough Draft March 10, 2000
(Based on material presented to IW-Infrastructure group, February 23rd)
Comments are greatly appreciated: emrek@cs.stanford.edu. Also, you might be interested in an architecture overview.

What are Paths?

A Path is a data flow through a graph of operators and connectors.

Operators perform computation on data
Connectors move data between machines

A path can be distributed across a cluster or a wide-area network, or contained on a single machine. An important feature of paths is that the composability of operators makes it easy to generate new paths/services, even automatically.

The data that flows through a path is packetized as Application Data Units (ADUs). Although typing information is associated with each ADU, the structure and content of the ADU are defined at the application level, and not by Paths.
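As a rough illustration of the ideas so far, here is a minimal single-machine sketch in Python. Everything in it (the `ADU` fields, `run_path`, `local_connector`, `upper_case`) is a hypothetical name chosen for this example, not part of the Paths implementation. It only shows a typed but otherwise opaque ADU flowing through a chain of operators and connectors.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class ADU:
    """Application Data Unit: a type tag plus an application-defined payload."""
    type_name: str   # typing information carried with the ADU
    payload: bytes   # structure and content are opaque to Paths


# Assumption: both operators and connectors are modeled as ADU -> ADU functions.
Stage = Callable[[ADU], ADU]


def local_connector(adu: ADU) -> ADU:
    """Hypothetical in-process connector: hands the ADU over unchanged."""
    return adu


def upper_case(adu: ADU) -> ADU:
    """Hypothetical operator: computes on the payload, keeps the type."""
    return ADU(adu.type_name, adu.payload.upper())


def run_path(stages: List[Stage], adu: ADU) -> ADU:
    """Push one ADU through a linear chain of operators and connectors."""
    for stage in stages:
        adu = stage(adu)
    return adu


result = run_path([upper_case, local_connector], ADU("text", b"hello"))
print(result)
```

A real path is a graph rather than a list, and connectors would cross machine boundaries; the linear, in-process form is used here only to keep the sketch short.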

Applications of Paths include transformational services to support heterogeneous end-devices, telephony-style communication between heterogeneous devices, "clusterization" of database queries, internet services, composable applications, etc.

Paths can perhaps be thought of as "application-level active networks," or even as data-flow computing, where the building blocks are application modules rather than machine instructions.

Operators

Operators are composable, mobile pieces of code, along with an (XML) description of the operator. These descriptions are primarily a strongly-typed interface: the number and types of their inputs and outputs. They also include information on where to get the code from, and how to run it.
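To make the shape of such a description concrete, here is a sketch in Python that parses a made-up XML description. The element and attribute names (`operator`, `input`, `output`, `code`, `location`, `runner`) and the URL are assumptions invented for this example; this draft does not define the actual schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical description of a GIF-to-JPG type transformer.
DESCRIPTION = """
<operator name="gif-to-jpg">
  <input type="image/gif"/>
  <output type="image/jpeg"/>
  <code location="http://example.org/operators/gif2jpg" runner="java"/>
</operator>
"""


def parse_operator(xml_text: str) -> dict:
    """Extract the typed interface and code location from a description."""
    root = ET.fromstring(xml_text.strip())
    code = root.find("code")
    return {
        "name": root.get("name"),
        # The interface: number and types of inputs and outputs.
        "inputs": [e.get("type") for e in root.findall("input")],
        "outputs": [e.get("type") for e in root.findall("output")],
        # Where to get the code from, and how to run it.
        "code_location": code.get("location"),
        "runner": code.get("runner"),
    }


info = parse_operator(DESCRIPTION)
print(info["name"], info["inputs"], "->", info["outputs"])
```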

Operators can be:

Type Transformers, performing data-type transformations on data, such as GIF to JPG or XML to HTML.
Semantic Transformers, performing some operation on data that does not change its type, e.g., mathematical computation, sorting, or filtering.
Aggregators and Disseminators
Data sources and data sinks: e.g., clients consume data, databases "produce" data.

Connectors

Connectors, like operators, are also mobile code. However, connectors are described by their transport characteristics, and are type-neutral. Some of the characteristics that might describe a connector include its reliability, latency, in-order delivery, QoS, and security levels.
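One way to picture this is a connector chosen purely by its transport characteristics. The following Python sketch is illustrative only: the `ConnectorProfile` fields, the candidate connectors, and the "lowest latency wins" rule are all assumptions made for the example. Note that nothing in it mentions data types, reflecting the type-neutrality of connectors.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class ConnectorProfile:
    """Hypothetical transport description of one connector."""
    name: str
    reliable: bool
    in_order: bool
    latency_ms: float
    encrypted: bool


def pick_connector(candidates: List[ConnectorProfile], *,
                   reliable: bool = False, in_order: bool = False,
                   max_latency_ms: float = float("inf"),
                   encrypted: bool = False) -> Optional[ConnectorProfile]:
    """Return the lowest-latency connector meeting every requirement."""
    ok = [c for c in candidates
          if (c.reliable or not reliable)
          and (c.in_order or not in_order)
          and c.latency_ms <= max_latency_ms
          and (c.encrypted or not encrypted)]
    return min(ok, key=lambda c: c.latency_ms) if ok else None


# Made-up candidates and latency figures, for illustration only.
CANDIDATES = [
    ConnectorProfile("stream", True, True, 40.0, False),
    ConnectorProfile("datagram", False, False, 5.0, False),
    ConnectorProfile("secure-stream", True, True, 60.0, True),
]

print(pick_connector(CANDIDATES, reliable=True))
```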

Types

The typing system consists of a multiple-inheritance type hierarchy with key/value pairs as type attributes. A type can also be related to some other type (e.g., a list type would have a "contains" relationship to the type of data it contains). Types do not generally specify any structure for the data they describe. It is assumed that if two applications agree on the name of a type, they also agree on its structure and semantic interpretation (one way to make this more explicit is to embed a language-specific typename into the path-type).

Here is an illustration of one simple type hierarchy.

It's important to note that the Paths architecture does not specify any particular ontology or type hierarchy.
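The typing system described above could be modeled roughly as follows. This Python sketch is an assumption-laden illustration: the `PathType` class, the type names, and the rule that attributes are inherited from parents are all invented for the example, and the hierarchy shown is not one that Paths prescribes.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class PathType:
    """A named type with several parents, attributes, and relations."""
    name: str
    parents: Tuple["PathType", ...] = ()
    attributes: Dict[str, str] = field(default_factory=dict)
    relations: Dict[str, "PathType"] = field(default_factory=dict)

    def is_a(self, other: "PathType") -> bool:
        """True if this type is `other` or descends from it via any parent."""
        if self.name == other.name:
            return True
        return any(parent.is_a(other) for parent in self.parents)

    def attribute(self, key: str) -> Optional[str]:
        """Own attributes first, then parents in order (an assumed rule)."""
        if key in self.attributes:
            return self.attributes[key]
        for parent in self.parents:
            value = parent.attribute(key)
            if value is not None:
                return value
        return None


# Hypothetical hierarchy, for illustration only.
data = PathType("data")
image = PathType("image", (data,), {"media": "visual"})
compressed = PathType("compressed", (data,))
jpeg = PathType("jpeg", (image, compressed), {"lossy": "yes"})
# A list type related to its element type through a "contains" relation.
jpeg_list = PathType("list-of-jpeg", (data,), relations={"contains": jpeg})

print(jpeg.is_a(image), jpeg.is_a(compressed), image.is_a(jpeg))
```

Types here carry only a name, attributes, and relations; as in the text, nothing describes the structure of the data itself.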

Since operators, especially type transformers, are to a large extent defined by their type interfaces, it's important to understand how much information can and should be represented in a data type. This is an open issue.

Path Creation and Instantiation

Vocabulary:

Partial Path: an unconnected (or partially connected) path (two adjacent operators in a path are considered unconnected if there is a type mismatch between their inputs and outputs).
Logical Path: A completely connected path.
Physical Path: A logical path whose operators have been assigned to hosts for execution.
Path Instance: A running path. Once a path is running, information about it (its member operators, etc.) is not explicitly maintained, and must be discovered through introspection.

The process of creating a path is divided into four stages:

Path Finder: given some request for a path, the path finder generates a logical path.
Path Placer: given a logical path, the path placer assigns each operator to run on a specific host, generating a physical path.
Path Dispatcher: the path dispatcher splits the physical path and dispatches the pieces to the appropriate hosts.
Path Instantiator: the path instantiator instantiates operators and connectors and begins data flow.

Because the path creation process is itself a path, new stages can be added and existing stages can be replaced easily. Examples of possibly useful stages include adding caching operators around operators in a logical path, parallelizing operators within a path, and adding supervisor and logging operators.
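The stages above, and the ease of adding one, can be sketched as a chain of functions. This Python example is a toy under stated assumptions: operators are plain strings, hosts are assigned round-robin, "instantiation" just returns log lines, and `add_logging` is a hypothetical extra stage. None of these details come from the Paths implementation.

```python
from typing import Callable, Dict, List, Tuple


def path_finder(request: List[str]) -> List[str]:
    """Stand-in finder: assumes the request is already type-correct."""
    return list(request)  # logical path


def add_logging(logical: List[str]) -> List[str]:
    """Optional extra stage: follow every operator with a logging operator."""
    out: List[str] = []
    for op in logical:
        out += [op, f"log({op})"]
    return out


def path_placer(logical: List[str]) -> List[Tuple[str, str]]:
    """Assign each operator to a host (round-robin is an assumption)."""
    hosts = ["host-a", "host-b"]
    return [(op, hosts[i % len(hosts)]) for i, op in enumerate(logical)]


def path_dispatcher(physical: List[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Split the physical path into one piece per host."""
    per_host: Dict[str, List[str]] = {}
    for op, host in physical:
        per_host.setdefault(host, []).append(op)
    return per_host


def path_instantiator(per_host: Dict[str, List[str]]) -> List[str]:
    """Pretend to start each operator; a real one would begin data flow."""
    return [f"started {op} on {host}"
            for host, ops in per_host.items() for op in ops]


def create_path(request: List[str], stages: List[Callable]) -> List[str]:
    """Run the creation stages in order, each feeding the next."""
    value = request
    for stage in stages:
        value = stage(value)
    return value


DEFAULT = [path_finder, path_placer, path_dispatcher, path_instantiator]
WITH_LOGGING = [path_finder, add_logging, path_placer,
                path_dispatcher, path_instantiator]

print(create_path(["source", "render", "sink"], DEFAULT))
```

Swapping `DEFAULT` for `WITH_LOGGING` changes the resulting path without touching any other stage, which is the property the paragraph above relies on.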

Automatic Path Creation

Automatic Path Creation (APC) is the process of taking a partial path of unconnected operators and connecting them, adding type transformers to the path as necessary to ensure type-correctness. E.g., given a path consisting of a data sink which only accepts images and a data source which generates text, APC would add to the path an operator which renders the text into an image.

APC is one example of the Path Finder operator in the previous section. The output of APC is a logical path.
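The text-to-image example can be sketched in Python as follows. This is not the actual APC algorithm, which this draft does not spell out; it is one plausible approach, assuming each operator has a single input and output type and using a breadth-first search over known type transformers to bridge each mismatch. The `Op` tuple and all operator names are hypothetical.

```python
from collections import deque
from typing import List, NamedTuple, Optional


class Op(NamedTuple):
    name: str
    in_type: Optional[str]   # None for a data source
    out_type: Optional[str]  # None for a data sink


def bridge(src: str, dst: str, transformers: List[Op]) -> Optional[List[Op]]:
    """Shortest chain of transformers turning src into dst, or None."""
    queue = deque([(src, [])])
    seen = {src}
    while queue:
        current, chain = queue.popleft()
        if current == dst:
            return chain  # empty chain if the types already match
        for t in transformers:
            if t.in_type == current and t.out_type not in seen:
                seen.add(t.out_type)
                queue.append((t.out_type, chain + [t]))
    return None


def auto_connect(partial: List[Op], transformers: List[Op]) -> List[Op]:
    """Turn a partial path into a logical path by bridging mismatches."""
    logical = [partial[0]]
    for nxt in partial[1:]:
        chain = bridge(logical[-1].out_type, nxt.in_type, transformers)
        if chain is None:
            raise ValueError(f"cannot connect {logical[-1].name} to {nxt.name}")
        logical.extend(chain)
        logical.append(nxt)
    return logical


TRANSFORMERS = [
    Op("render_text", "text", "image"),
    Op("xml_to_html", "xml", "html"),
]
partial_path = [Op("text_source", None, "text"), Op("image_sink", "image", None)]

logical_path = auto_connect(partial_path, TRANSFORMERS)
print([op.name for op in logical_path])
```

The search inserts operators purely on the basis of type mismatches, so it shares the reliance on transformational semantics discussed below.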

Limitations of APC include its reliance on transformational semantics to decide when to insert operators into the path, and its non-linear running time, ~O(n²), where n is the length of the path.

Despite this, APC seems like a useful tool to enable any-to-any communication between heterogeneous devices and services.

Application Development using Paths

I can see two different ways of using paths to develop applications:

Paths provide only one piece of an application: e.g., in the global clipboard (for the Interactive Room project) the interaction and control logic is handled by a traditional application. Paths are used solely for data transformation.
The Path is the application: In the Dada project, we developed a simple system where all application/control logic was contained within the path. Only the actual storage and display of the data was handled by non-operators.

Regardless of how paths are used to develop applications, there will be the question of how to connect non-path-aware and non-path devices/programs to paths. To a certain extent, devices can be wrapped by an operator and made to look like part of a path. But how do programs communicate with a path if they are not a part of it? Are paths long-lived, allowing clients to send them requests, or are paths short-lived, with a single path per client request? If the latter, who handles creating a path for the client's request? My assumption is that the answer to these questions depends on the particulars of the functionality being provided by the path. The architecture should support all options, and the choice should not affect the programming model for individual operators, only the interaction between the "outside world" and the "paths world."

Some Open Issues

The line between type transformers and semantic transformers is very fuzzy. It depends on how much information you are willing to encode in your type system. Is there a point at which you cannot add more semantics to your type system, i.e., are some operators fundamentally semantic transformers? Where should we draw the line?
How much do we rely on APC? Is there a way to get around the limitations of transformational semantics, e.g., a multi-stage APC (first generate a skeleton of semantic transformations, aggregations, etc., and then fill in the blanks with type transformers)?
Longevity of paths: Under what circumstances are short-lived paths (per request?) preferred to long-lived ones? What about the opposite?