Agent Architectures
Patrick Doyle AI Qual Summary June 3, 1997


Atlantis

[Gat, 1991-1993] ATLANTIS (A Three-Layer Architecture for Navigating Through Intricate Situations). Gat's main thesis is that the planning-versus-reactivity debate is really a debate about the use of internal state, which is in turn a debate about making predictions about the world. He proposes that internal state should be maintained only at a high level of abstraction, and that it should be used to guide a robot's actions but not to control them directly.

Internal State, Predictions, and Abstraction

The fundamental problem of state is that every piece of internal state carries with it an implicit prediction that the information contained in this state will continue to be valid for some time to come. These predictions, coupled with the unpredictability of the world, are the root of the problem. The constructive observation is that state should only be used to record facts that are likely to be at least usually correct.

An abstraction is a description of something; the level of an abstraction is an inverse measure of the description's precision. The more abstract the information, the more broadly applicable it is. Gat gives the example of the location of his house (he knows roughly, but not exactly, where it is). By combining abstract state information with local sensors, he is able to find the house or the mustard in his refrigerator.

Actions based on stored internal state sometimes fail. In humans, occasional errors of this sort are not a problem for two reasons: First, people use the information in their world models to guide their actions but not to control them directly. Second, people can tell when things go wrong and can take corrective action. We carefully engineer our environment to eliminate the opportunity to make the sort of mistakes that are difficult to recover from -- falling off cliffs, for example.

The Atlantis Architecture

The ATLANTIS action model is based on operators whose execution consumes negligible time and which therefore do not themselves bring about changes in the world, but instead initiate processes that then cause change. These processes are called activities, and the operators which initiate (and terminate) them are called decisions.

The architecture consists of three components: the controller, the sequencer, and the deliberator.

The Controller

The controller is responsible for moment-by-moment control of the robot's actuators in response to the current value of the robot's sensors. The controller is a purely reactive control system.

To support the kinds of transfer functions needed to control the reactive robot, the ALFA (A Language For Action) language was developed. ALFA is similar in spirit to REX, but the two languages provide very different abstractions. ALFA programs consist of computational modules which are connected to each other and to the outside world by means of communications channels. Both the computations performed and their interconnections are specified within module definitions, allowing modules to be inserted and removed without having to restructure the communications network.
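A minimal Python sketch of the module-and-channel idea (the class names, channels, and transfer function here are hypothetical illustrations, not part of ALFA, which is its own language):

    class Channel:
        """A communication path holding the latest value written to it."""
        def __init__(self, name, initial=0.0):
            self.name = name
            self.value = initial

    class Module:
        """A computation plus its own wiring: which channels it reads/writes."""
        def __init__(self, name, inputs, outputs, transfer):
            self.name = name
            self.inputs = inputs        # channels read on each tick
            self.outputs = outputs      # channels written on each tick
            self.transfer = transfer    # maps input values to output values

        def step(self):
            results = self.transfer(*[ch.value for ch in self.inputs])
            for ch, value in zip(self.outputs, results):
                ch.value = value

    # Because the wiring lives inside the module definition, a module can be
    # inserted or removed without restructuring the rest of the network.
    range_ch = Channel("range", initial=5.0)
    speed_ch = Channel("speed")
    avoid = Module("avoid", [range_ch], [speed_ch],
                   lambda r: (0.0 if r < 1.0 else 1.0,))
    avoid.step()
    print(speed_ch.value)   # 1.0: nothing close ahead, full speed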

The Sequencer

The sequencer is responsible for controlling sequences of primitive activities and deliberative computations. Controlling sequences is difficult primarily because the sequencer must be able to deal effectively with unexpected failures. This requires careful maintenance of a great deal of state information because the sequencer must be able to remember what has been done in the past to decide what to do now.

The fundamental design principle underlying the sequencer is the notion of cognizant failure. A cognizant failure is a failure which the system can detect somehow. Rather than design algorithms which never fail, we instead use algorithms which (almost) never fail to detect a failure.

The sequencer initiates and terminates primitive activities by activating and deactivating sets of modules in the controller. It can also send parameters to the controller via channels.

The sequencer is modeled after Firby's Reactive Action Package (RAP) system. The system maintains a task queue, which is simply a list of tasks the system must perform. Each task contains a list of methods for performing that task, together with annotations describing under what circumstances each method is applicable. A method is either a primitive action or a list of sub-tasks to be installed on the task queue. The system works by successively expanding tasks on the queue until they either finish or fail. When a task fails, an alternate method is tried.
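A toy sketch of this expansion loop, assuming tasks are named by strings and methods are either callables (primitive actions) or lists of subtask names; the real RAP system's applicability annotations are omitted:

    def run(queue, methods, world):
        """Expand tasks until the queue is empty; on failure, try the next method."""
        while queue:
            task = queue.pop(0)
            for method in methods.get(task, []):
                if callable(method):          # a primitive action: run it
                    if method(world):
                        print("did", task)
                        break                 # success; skip remaining methods
                else:                         # a list of subtasks: expand in place
                    queue[:0] = method
                    break
            else:
                print("task failed, no method left:", task)

    methods = {
        "fetch-cup":    [["goto-kitchen", "grasp-cup"]],
        "goto-kitchen": [lambda w: True],
        "grasp-cup":    [lambda w: False,   # first method fails...
                         lambda w: True],   # ...so the alternate is tried
    }
    run(["fetch-cup"], methods, world={})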

The main difference between the original RAP system and the ATLANTIS sequencer is that the latter controls activities rather than atomic actions. This requires a few modifications to the RAP system. First, the system must ensure that two activities that can interfere with one another are not active simultaneously. Second, if a primitive activity must be interrupted, the system must be sure it is cleanly terminated (since conflicts are controlled by locking resources).

The Deliberator

The deliberator is responsible for maintaining world models and constructing plans. It performs all manner of time-consuming computations at a high level of abstraction. The results are placed in a database where they are used as an additional source of input data by the sequencer, rather than directly controlling the robot.

Note that all deliberative computations are initiated (and may be terminated before completion) by the sequencer. The deliberator typically consists of a set of LISP programs implementing traditional AI algorithms. The function of the sequencer and controller is to provide an interface which connects to physical sensors and actuators on one end, and to classical AI algorithms on the other.


BB1

[Hayes-Roth, 1985-present] BB1 is a blackboard system for general intelligent agent control.

Definitions in BB1

control problem
Which of its potential actions should an AI system perform at each point in the problem-solving process? The control problem is fundamental to all cognitive processes and intelligent systems.


KSAR
Knowledge Source Activation Record. It is similar to an item on a task agenda; it represents a unique triggering of a particular knowledge source by a particular blackboard event. When a KSAR is chosen by the scheduling mechanism, its action executes in the context of its triggering information.


goal
An attribute of a control decision; a predicate (or function) of KSAR attributes.


criterion
A KSAR expiration condition (predicate).


control decision attributes
Characteristics possessed by control decision KSARs, including: Name, Goal, Criterion, Weight, Rationale, Source, Status, etc.


Overview

The central issue in agent design is the control problem: which of its potential actions should a system perform next in the problem-solving process? In solving this problem, an agent system will decide what problems to solve, what knowledge to bring to bear, how to evaluate alternative solutions, when problems are solved, and when to change its focus of attention. In solving the control problem an agent determines its own cognitive behavior.

AI should approach the control problem as a real-time planning problem. BB1 operationalizes intelligent problem solving as achievement of the following goals:

  1. Make explicit control decisions that solve the control problem.
  2. Decide what actions to perform by reconciling independent decisions about what actions are desirable and what actions are feasible.
  3. Adopt variable grain-size control heuristics.
  4. Adopt control heuristics that focus on whatever action attributes are useful in the current problem-solving situation.
  5. Adopt, retain, and discard individual control heuristics in response to dynamic problem-solving situations.
  6. Decide how to integrate multiple control heuristics of varying importance.
  7. Dynamically plan strategic sequences of actions.
  8. Reason about the relative priorities of domain and control actions.

Notice that, in contrast with SOAR, this approach does not consider learning an integral part of intelligent action (it does not purport to address intelligence, just the control problem), and it also makes an explicit distinction between domain and control activity.

The BB1 architecture achieves these goals; its important features are described in the sections below.

Basic Principles of Blackboard Design

The blackboard approach is a problem-solving framework originally developed for the HEARSAY-II speech understanding system. It entails three basic assumptions:

  1. All solution elements generated during problem-solving are recorded in a structured, global database called the blackboard.

    Solution elements are organized according to two axes: level of abstraction, and solution intervals, which are different regions of the solution on some problem-specific dimensions.

  2. Solution elements are generated and recorded on the blackboard by independent processes called knowledge sources.

    Knowledge sources have a condition-action structure. Ordinarily the condition is a particular configuration of elements on the blackboard. The action ordinarily entails the creation or modification of solution elements on the blackboard. Generally the triggering of a knowledge source produces a knowledge-source activation record (KSAR).

    Knowledge sources are independent in that they do not invoke one another and ordinarily have no knowledge of each other's expertise, behavior, or existence. The blackboard architecture achieves simultaneous independence and cooperation among knowledge sources by permitting them to influence one another's problem-solving behavior only indirectly via the information on the blackboard.

  3. On each problem solving cycle, a scheduling mechanism chooses a single KSAR to execute its action.

    The scheduling mechanism determines which KSARs execute and in what order. Commonly the scheduler uses a variety of criteria (e.g., knowledge-source reliability, triggering information credibility, expected value of the action) to choose the KSAR.

Extensions of the Blackboard Control Architecture (BB1)

BB1 extends the basic assumptions of a blackboard system as follows:

  1. The blackboard control architecture defines explicit domain and control blackboards.

The important point is that the designer can specify the solution intervals, levels of abstraction, and vocabulary on the domain blackboard, while those of the control blackboard are domain-independent and fixed by the architecture.
  2. The blackboard control architecture defines explicit domain and control knowledge sources.

    Note that some of the control knowledge sources are domain-dependent; all KSs are represented as data structures that are available for interpretation and modification.

  3. The blackboard control architecture defines a simple, adaptive scheduling mechanism to manage both domain and control KSARs.

    Three basic control KSs iterate a three-step problem-solving cycle: enumerate pending KSARs, choose one KSAR, and execute the action of that KSAR.

Control Problem-Solving

Control problem solving takes place on the control blackboard. The architecture specifies domain-independent solution intervals (to wit, the position in the sequence of problem-solving cycles) and knowledge sources. It also allows domain-dependent control knowledge sources.

The abstraction levels on the control blackboard are Problem, Strategy, Focus, Policy, To-Do-Set, and Chosen-Action. Decisions at the first four levels determine desirable actions; To-Do-Set decisions describe feasible actions, and Chosen-Action controls the actions scheduled for execution. In general, heuristics at the top four levels are used to choose among pending KSARs at the To-Do-Set level to schedule them for execution at the Chosen-Action level.

Problem Decisions

A single Problem decision represents the problem the system has decided to solve, and guides the entire problem-solving episode. Creating a Problem decision on the blackboard initiates problem-solving by triggering other domain or control knowledge sources. Changing the Problem's status to solved terminates problem-solving activities.

A Problem's Goal prescribes actions whose knowledge sources apply to its domain (for example, KS Problem-Domain = multiple-task planning). Its Criterion characterizes an acceptable solution (for example, in OPM, this is four requirements: complete solution, all requested tasks planned, all constraints met, plan is efficient).

Strategy Decisions

Strategy decisions establish general sequential plans for problem-solving. Ideally, a single Strategy decision created early in the problem-solving process guides the remainder of the process. Under some circumstances, though, conflicting or complementary Strategies may replace one another or operate simultaneously.

A Strategy's Goal is a prescribed sequence of actions, expressed as a predicate on one or more KSAR attribute-description pairs. For example, the Goal Action-Level = successive domain levels causes the Strategy to generate, in turn, Focus decisions at successive levels. The Strategy's Refocus-Criterion is a predicate that, when true, indicates the Strategy should prescribe a new Focus (an example would be complete and satisfactory solution in Focus).

(This is all rather unclear. I think the idea is that the strategy creates a Focus on some level, which holds until some criterion is met, at which point it creates a Focus for another level, and so on until all are satisfied. This implies (to me) a linear approach to strategy, much as you'd expect in a hierarchical planner, by examining successively detailed levels of the plan.)

Focus Decisions

Focus decisions establish local problem-solving objectives to execute KSARs with particular attributes and values. Sometimes a sequence of Focus decisions implements a previously created Strategy decision; other Focus decisions operate independently of one another and prior Strategy decisions. Focus decisions are explicitly temporary and operate during restricted problem-solving time intervals. Several competing or complementary Focus decisions may operate simultaneously.

Focus decisions are used to rate KSARs. A Focus decision's Goal is a predicate or function on one or more KSAR attribute-value pairs (for example, Action-Level = Outcome prescribes execution of KSARs whose actions occur at the Outcome level). Its Criterion is a predicate that, when true, indicates that the Goal is no longer useful (for example, complete solution at the Outcome level). The Weight indicates the expected utility of the prescribed actions.

Policy Decisions

Policy decisions establish global scheduling criteria favoring KSARs with particular attributes and values. In contrast to Focus decisions, Policy decisions usually remain operative from their creation until the end of the problem-solving episode. The only exception is when some other knowledge source independently determines that a Policy should become inoperative.

Like Focus decisions, Policy decisions are used to rate KSARs and can influence scheduling.

To-Do-Set Decisions

To-Do-Set Decisions identify all pending KSARs on each problem-solving cycle. They record a cycle-by-cycle history of all problem-solving actions the system can perform. Because knowledge source Triggers are event-based while Pre-Conditions are state-based (and possibly transient), a To-Do-Set's Goal distinguishes a Triggered-List and an Invocable-List of pending KSARs.

Update-To-Do-Set, a basic control KS (see below), places newly triggered KSARs on the Triggered-List. When their Pre-Conditions become true, it moves them to the Invocable-List. If any Pre-Conditions subsequently become false, it moves them back to the Triggered-List. A To-Do-Set's Only-Cycle is the only problem-solving cycle on which it is active.

(I take this to mean that a To-Do-Set decision is actually a device for making a record of all triggered actions in a given cycle. -- POD)
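A sketch of that bookkeeping, assuming each KSAR carries a re-testable Pre-Condition (the representation is mine, not BB1's):

    class KSAR:
        def __init__(self, name, precondition):
            self.name = name
            self.precondition = precondition   # state -> bool, re-tested each cycle

    def refresh_lists(triggered, invocable, state):
        """Re-test state-based Pre-Conditions and move KSARs accordingly."""
        for ksar in list(triggered):
            if ksar.precondition(state):       # became true: promote
                triggered.remove(ksar)
                invocable.append(ksar)
        for ksar in list(invocable):
            if not ksar.precondition(state):   # became false again: demote
                invocable.remove(ksar)
                triggered.append(ksar)

    triggered = [KSAR("plan-route", lambda s: s["map-ready"])]
    invocable = []
    refresh_lists(triggered, invocable, {"map-ready": True})
    print([k.name for k in invocable])         # ['plan-route']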

Chosen-Action Decisions

Chosen-Action decisions identify KSARs scheduled to execute on each problem-solving cycle. Thus they record a cycle-by-cycle history of all problem-solving actions the system actually performs. A Chosen-Action's Goal is the scheduled KSAR. Its Rationale lists the Focus and Policy decisions that led to its selection. Its Consequences list the blackboard events produced by executing the KSAR's action. Its Only-Cycle is the only cycle on which it is operative.

Basic Control KSs in Scheduling

The scheduling mechanism comprises three domain-independent, basic control knowledge sources: Update-To-Do-Set, Choose-KSAR, and Interpret-KSAR. The KSARs they generate are invoked directly when their Conditions are satisfied; since they have 'new' or 'changed' decisions in their Conditions, they are guaranteed never to be invoked more than once.

Update-To-Do-Set

Update-To-Do-Set is invoked when the Status of the most recently created Chosen-Action changes to executed. Its Action increments the problem-solving cycle and creates a new To-Do-Set (in practice, it simply updates a single set -- it's space inefficient to keep making new ones) as follows.

Update-To-Do-Set sets the new To-Do-Set's Only-Cycle to the current cycle. It sets its Goal to the Goal of the most recent prior To-Do-Set, removing the just-executed KSAR from its Invocable-List. It then examines all KSs that specify the current Problem's Problem-Domain and all blackboard events listed as the Chosen-Action's Consequences. For every unique pairing of a KS and an event that satisfies the KS's Trigger, Update-To-Do-Set creates a new KSAR and adds it to the To-Do-Set's Triggered-List. Those whose Pre-Conditions are true are moved to the Invocable-List. In this way it evaluates the feasibility of all the system's own potential actions except those of the basic control knowledge sources.

Update-To-Do-Set also computes and records each KSAR's Ratings against operative Focus and Policy decisions, and its Priority.

Choose-KSAR

Choose-KSAR is triggered by the creation of a new To-Do-Set. Its Action schedules a KSAR according to the current scheduling rule (recorded in Policy1 -- I take this to mean that Policy1 has the rule, which may be something like sum of all weights according to any Policy and Focus decisions; very ad hoc, if you ask me).

After choosing a KSAR, Choose-KSAR reevaluates the Pre-Conditions on the chosen KSAR. If any of them is no longer true, it moves the KSAR back to the Triggered-List and chooses another KSAR. When it finds a KSAR whose Pre-Conditions are true, it creates a Chosen-Action whose Goal is the chosen KSAR, whose Rationale is the list of Foci and Policies that favor it, whose Status is Scheduled, and whose Only-Cycle is the current cycle.

Interpret-KSAR

Interpret-KSAR is triggered by the creation of a new Chosen-Action. Its Action interprets and executes the Action of the KSAR recorded as the Chosen-Action's Goal. It also records the Chosen-Action's Consequences (resulting blackboard events) and changes its Status to 'Executed.' Thus it executes all system actions except those of the three basic control KSs.
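Putting the three basic control KSs together, one problem-solving cycle might look like the sketch below. The rating rule is an assumption: the text says only that Policy1 records the current scheduling rule, which may be something like a weighted sum over operative Focus and Policy decisions.

    def rating(ksar, decisions):
        """Sum the weights of operative Focus/Policy decisions the KSAR satisfies."""
        return sum(d["weight"] for d in decisions if d["goal"](ksar))

    def bb1_cycle(pending, decisions, state, cycle):
        # Update-To-Do-Set: partition pending KSARs by their Pre-Conditions.
        invocable = [k for k in pending if k["precondition"](state)]
        # Choose-KSAR: take the best-rated KSAR whose Pre-Conditions still hold.
        while invocable:
            chosen = max(invocable, key=lambda k: rating(k, decisions))
            if chosen["precondition"](state):
                break
            invocable.remove(chosen)      # moved back to the Triggered-List
        else:
            return None
        # Interpret-KSAR: execute the action and record the Chosen-Action.
        consequences = chosen["action"](state)
        return {"Goal": chosen["name"], "Status": "Executed",
                "Consequences": consequences, "Only-Cycle": cycle}

    ksar = {"name": "refine-plan",
            "precondition": lambda s: True,
            "action": lambda s: ["plan-refined"]}
    focus = {"weight": 2, "goal": lambda k: k["name"].startswith("refine")}
    print(bb1_cycle([ksar], [focus], state={}, cycle=7))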

Comparisons with Other Systems

HEARSAY-II had a fixed scheduling system that preferred reliable, efficient knowledge sources and credible triggering information. It had no capacity for modifying its scheduling system, so it was not flexible in that respect -- it did not support dynamic control, change in heuristic control granularity, etc.

HEARSAY-III [Erman, London, Fickas] is a general blackboard environment that has separate domain and control blackboards, with a default low-level scheduler that simply schedules any pending KSAR. It provides no specific mechanism for supporting effective control behavior; however, BB1 could be implemented in HEARSAY-III.

BB1 as an Agent Architecture

"Intelligent agents" continuously perform three functions: perception of dynamic conditions in the environment; action to affect conditions in the environment; and reasoning to interpret perceptions, solve problems, draw inferences, and determine actions. In order to function effectively in AIS niches, agents must possess a pervasive property of human behavior: adaptation.

Adaptation of Perceptual Strategy

Two main ideas here. First, the agent maximizes its vigilance while avoiding perceptual overload, by using feedback control and predictive control to manage the global data rate underlying the perceptual strategy. Second, it acquires useful data, while avoiding distraction, by using dynamic control plans to adapt the relevance and abstraction parameters underlying perceptual strategy.

Adaptation of Control Strategies

Important here is the diagram: the axes are environmental uncertainty and constraint on effective action.

Adaptation of Reasoning Methods

Again with the diagrams. Axes are the availability of run-time data and the availability of a model.

BB1 in CAIT

Intelligent agents in CAIT use the BB1 dynamic control architecture as their cognitive architecture. Basically, BB1 runs a three-step cycle: (a) an agenda manager notices situation-appropriate behaviors triggered by recent perceptual and reasoning events; (b) a scheduler chooses the best available behavior based on the current control plan; and (c) the executor executes the chosen behavior.

The control plan is a key feature of BB1. Because it can be more or less abstract, the scheduler may be able to choose among several logically acceptable behaviors. There can be multiple simultaneous control plans, whose combined weights determine which available action is best. Each component plan can have its own temporal, sequential, or hierarchical structure. Finally, since the control plan is a data structure, executed actions can modify it, thereby modifying the criteria used to schedule future actions.

Problems with BB1




Maes' Network Architecture

[Maes, 1989] The network architecture is an approach to solving the problem of action selection in an agent. It integrates conceptual elements of connectionism and traditional AI, though it cannot be classified as a "hybrid" in the traditional sense: it uses a connectionist computational model on a symbolic, structured representation.

In particular, from connectionism it takes robustness, intrinsic parallelism, density, and global emergent computation from local interactions, without relying entirely on learning and classification. From symbolic AI, it uses representation and structuring principles, so the links and nodes have specific, well-understood causal meanings, but it avoids the brittleness and rigidity of traditional AI systems.

The architecture is motivated by Minsky's "Society of Mind" theory, which argues that intelligent systems can be built as societies of interacting, mindless agents, each having its own specific competence.

Architecture of the Network

The network consists of a set of nodes connected via links. Each node represents a competence module, which has an add list, a delete list, a precondition list, and an activation level. Nodes are connected to other nodes via three kinds of links: successor links, predecessor links, and conflictor links.

Activation energy is spread to the nodes from the environment, from the global goals of the system, and between nodes. In addition, inhibition (removal of activation) is produced by protected goals and via conflictor links.

The basic algorithm is to spread activation from the goals, the state, and the protected goals; compute activation among modules; use a decay function to keep the total activation constant; and lastly, pick the executable node with the highest activation level that is over some threshold. If no such node exists, the activation threshold is reduced by 10%.

Two points about balance. To keep activation fair with respect to the number of preconditions, etc., the amount of activation spread is divided by n, where n is the number of (preconditions/add list propositions/delete list propositions) in the target, depending on the kind of spreading. Also, when spreading does occur to multiple competence modules, the amount of energy spread is evenly divided among the target modules.
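A sketch of the selection loop and the fairness rule (the full spreading rules and the decay function are elided; the dictionary representation is mine):

    def spread(amount, targets, n_of):
        """Divide energy evenly among targets, then by each target's n."""
        if not targets:
            return
        share = amount / len(targets)
        for t in targets:
            t["activation"] += share / max(n_of(t), 1)

    def select(modules, threshold):
        """Pick the strongest executable module over threshold, else lower it."""
        executable = [m for m in modules
                      if m["executable"] and m["activation"] >= threshold]
        if executable:
            return max(executable, key=lambda m: m["activation"]), threshold
        return None, threshold * 0.9      # reduce the threshold by 10%

    mods = [{"activation": 4.0, "executable": True},
            {"activation": 9.0, "executable": False}]
    spread(3.0, [mods[0]], n_of=lambda m: 3)   # adds 3.0 / 1 target / n=3 = 1.0
    choice, threshold = select(mods, threshold=5.0)
    while choice is None:                      # lower threshold until one qualifies
        choice, threshold = select(mods, threshold)
    print(choice["activation"])                # 5.0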

In the reasoning process, goal-relevance of the selected action is obtained through the input from the goals and the backward spreading of activation. Situation relevance and opportunistic behavior are obtained through the input of the state and the spreading of activation forward.

Learning in Behavior Networks

The learning algorithm models much of what is called operant conditioning in the animal behavior literature. Rather than learning new behaviors, it instead learns their effects, the better to decide when they should be applied.

The learning task can be defined as follows. Given a set of competence modules with an initial condition list, a set of initial links (with reliability estimates) and a set of motivations (positive and negative ones), and given a noisy, dynamic, non-deterministic environment, the task is to have every module learn its conflictor links, successor links, and predecessor links with their respective reliabilities.

Modifications to the Behavior Network

All links are augmented with weights, fractions of the form S/T, where S represents the number of successes and T the number of trials with that link. E.g., if there is a predecessor link for a condition from one module to another, then the weight reflects how often activating the latter achieved the condition for the former. The weights of the links represent the reliability of a particular causal relation between modules.

The weights of links are used in the spreading activation process: when one module spreads activation energy to another one, it not only spreads an amount proportional to its own activation level, but also multiplies this amount by the weight of the link.

A second change made to the model is that instead of selecting the most active module whose activation level surpasses the threshold, we now wait until the network comes to an equilibrium, and then select one of the executable modules randomly with a probability proportional to their activation levels.

The Learning Algorithm

Link weights are initially set to some value S0/T0. The smaller the value of T0, the more significant any changes will be, so 1/2 is very different from 5000/10000, for example.

Creating Predecessor or Conflictor Links

Every competence module constantly monitors its conditions. Whenever a condition changes state, the module holds the most recently active module responsible for this change of state. If there was not yet a link from the monitoring module to that module for the condition that changed value, a new link is created with initial weight S'/T'. If the condition changed from being observable (true) to being unobservable, the link created is a conflictor link. If the condition became observable, the link is a predecessor link.

Reinforcing Predecessor or Conflictor Links

If there already exists a link between the monitoring module and the most recently active module, then the weight of the link is increased; specifically, the weight goes from S/T to (S+1)/(T+1). Notice that such a value will converge to 1 if there is a strong correlation.

Weakening Predecessor or Conflictor Links

Every module monitors its existing predecessor and conflictor links. Whenever it notices a counter-example for a link, i.e., the condition did not change state as the predecessor or conflictor module became active, the link weight is changed from S/T to S/(T+1).

Handling Successor Links

Every module also monitors which module is active right after the monitoring module has been active itself. If it did not yet have a successor link to that module, it creates one with initial weight S'/T'. If there was already a link, its weight is increased from S/T to (S+1)/(T+1). The weights of all other successor links are decreased from S/T to S/(T+1).
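The weight arithmetic in one place, as a sketch (the monitoring machinery is elided; S0/T0 = 1/2 is used as the example initial value):

    class LinkWeight:
        """Reliability of a link, kept as the fraction S/T."""
        def __init__(self, s0=1, t0=2):      # initial S0/T0; small T0 = plastic
            self.s, self.t = s0, t0
        def reinforce(self):                 # S/T -> (S+1)/(T+1)
            self.s += 1
            self.t += 1
        def weaken(self):                    # S/T -> S/(T+1)
            self.t += 1
        @property
        def value(self):
            return self.s / self.t

    w = LinkWeight()
    for _ in range(18):
        w.reinforce()                        # a consistent correlation...
    print(round(w.value, 2))                 # ...drives the weight toward 1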


The Oz Architecture

[Bates et al., 1984-present] The first (?) comprehensive believable agent architecture, used in the Oz project at Carnegie Mellon University. Their central goal is to allow users to suspend disbelief; thus instead of demanding that their agents be especially active and smart, they only require that they not be clearly stupid or unreal.

The primary capabilities Oz agents should have are: perception, reactivity, goal-directed behavior, emotion, social behavior, natural language analysis, and natural language generation.

Oz Worlds

An Oz world has four components. There is a simulated physical environment, a set of automated agents to populate the world, a user interface to allow one or more users to participate in the world, and a planner concerned with the long-term structure of the user's experience.

Tok

Tok is the Oz agent architecture. It assigns the agent's tasks to several communicating components. Perception, while partially task specific, is also in part handled by a pair of systems called the Sensory Routines and the Integrated Sense Model. Reactivity and goal-directed behavior are handled by Hap. Emotion and social relationships are the domain of Em. Language analysis and generation are performed by Gump and Glinda, respectively.

Perception

Raw sense data is extracted from the world and recorded by Sensory Routines. The relationships between objects are represented as links, thus creating a topological graph of the newly encountered world fragment. The new data is marked with the agent's internal notion of time. When Hap behaviors execute, this low-level memory of raw sense data can be queried for information such as "have I seen food in the kitchen in the last ten minutes?".

After raw data is recorded in the Sensory Routines, an attempt is made to merge them into the Integrated Sense Model (ISM), which maintains the agent's best guess about the physical structure of the whole world.

Note that sense data are either active or passive in Oz worlds. Active sense data model events of sufficient intensity and intrusiveness that they will be sensed by the agent. Passive data model events which require action on the part of the agent in order to be sensed.

Hap

Hap is Tok's goal-directed, reactive engine. It is similar in intent to the work of Brooks and Agre and Chapman, but aims to capture a "richer range" of high-level actions. It continuously chooses the agent's next action based on perception, current goals, emotional state, behavioral features and other aspects of internal state. Goals in Hap contain an atomic name and a set of parameters that are instantiated when the goal becomes active; for example, (goto <object>).

Goals do not characterize world states to accomplish, and Hap does no explicit planning. Instead, sets of actions (called plans) are chosen from an unchanging plan memory which may contain one or more plans for each goal. These plans are either ordered or unordered collections of subgoals and actions which can be used to accomplish the invoking goal. Plans have testable preconditions which are true when the plan could apply in the current state of the world. If a plan fails, Hap will attempt any alternate plans for the given goal.

Active Plan Tree

Hap stores all active goals and plans in a structure called the active plan tree (APT). This is an AND-OR tree, with alternating layers of goals and plans representing the current execution state. Each plan node in the tree has some number of goal nodes as its children. These goal nodes are the subgoals which must be satisfied for the plan to succeed. Each goal node either has no children (and is therefore a leaf node) or its child is the current plan for the goal. The APT expands and contracts as goals and plans succeed and fail.

A plan node accomplishes its purpose (it succeeds) once all of its children succeed. It fails if there is no way for one of its subgoals to succeed. A plan can have an explicit success test which must also evaluate to true for the plan to succeed. This test is evaluated after the plan has finished. Plan nodes can either be sequential or parallel; the subgoals of a sequential plan node must be accomplished in order, while the subgoals of a parallel plan may be accomplished in any order.
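A sketch of the AND-OR success semantics (class names are mine; priorities, context conditions, and failure propagation are elided):

    class Goal:
        """OR node: succeeds via its success-test or via its current plan."""
        def __init__(self, name, success_test=lambda world: False, plan=None):
            self.name, self.success_test, self.plan = name, success_test, plan
        def succeeded(self, world):
            return self.success_test(world) or \
                   (self.plan is not None and self.plan.succeeded(world))

    class Plan:
        """AND node: succeeds when all subgoals do (and its own test holds)."""
        def __init__(self, subgoals, success_test=lambda world: True):
            self.subgoals, self.success_test = subgoals, success_test
        def succeeded(self, world):
            return all(g.succeeded(world) for g in self.subgoals) and \
                   self.success_test(world)

    world = {"door-open": True, "inside": True}
    enter = Goal("enter", plan=Plan([
        Goal("open-door", success_test=lambda w: w["door-open"]),
        Goal("walk-in",   success_test=lambda w: w["inside"]),
    ]))
    print(enter.succeeded(world))    # True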

At the root of the tree is a special node, parent to all of the top-level goals of the agent. These top-level goals are classified according to whether they are persistent. Persistent goals are continuous goals and are never removed from the tree.

Each goal node has an associated success-test and priority. The success-test is used to recognize when a goal's purpose in the enclosing plan is spontaneously achieved. It is an expression about the state of the world. If it succeeds, the node and the subtree rooted at the node are removed. Success tests are sufficient but not necessary for a goal to succeed. The priority is a number representing how important the goal is to the agent. Preferring higher priority goals is one of the methods used by the architecture to arbitrate between multiple goals.

Each plan node carries an associated context condition and specificity. The context condition must always be true for the plan to have any chance of making sense and achieving its goal. Thus it is necessary but not sufficient for the plan to work. Specificity is a measure of how specific each plan is.

Every instance of a goal has a priority number used when choosing a goal to execute and an importance number used by Em when considering the significance of the goal. These annotations are assigned to instances rather than types of goals because goals may have different values depending on the context in which they arise.

In the Lyotard system, related goals and plans are clustered into behaviors. Each behavior represents a recognizable, internally coherent unit of action. These behaviors are usually activated by a single goal.

Plan Memory

Plan memory is a set of production rules. The left-hand side of each production is composed of a goal expression and a precondition expression. The goal expression contains the goal name and zero or more variables. For a subgoal to match a goal expression, the names must match and the number of variables must match. In addition, the precondition expression must be true in the world state for the production to be applicable. The precondition expression can reference the values from the goal expression variables in its tests.

The right-hand side of a production contains its context condition, specificity and plan expression. Together they are instantiated to create the plan and subgoal nodes for the APT. Each plan step contains a goal expression, a priority modifier and a success-test expression. The goal expression and success test expression are used directly to create the appropriate goal node, and the priority modifier is added to the priority of the parent goal node to create the priority of this node. In this way a subgoal can be more, less or the same level of importance as the goal that spawned it.

Theory of Activity

The execution loop consists of three steps: adjust the APT based on changes in the world, pick a leaf goal to execute, and execute the goal. Executing the goal can take the form of performing a primitive act, or expanding a subgoal.

After sense data is processed, Hap revises the APT. All context conditions and success tests are evaluated. Goals whose success test is true and plans whose context condition is false are removed. Next a leaf node is chosen by a goal arbiter that prefers high priority goals and prefers continuing a line of expansion among goals of equal priority. If the chosen goal is a primitive action, it is executed. Otherwise, it is a subgoal, in which case the plan library is indexed and the plan arbiter chooses a plan for the new goal from those whose preconditions are true. The plan arbiter will not choose plans which have already failed to achieve this goal instance, and prefers more specific plans over less specific ones. After execution, the execution loop repeats.
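A sketch of just the two arbitration preferences described above (the representations are mine):

    from dataclasses import dataclass

    @dataclass
    class LeafGoal:
        name: str
        priority: int
        in_current_line: bool = False   # part of the ongoing expansion?

    def goal_arbiter(leaves):
        # Prefer high priority; among equals, continue the current line.
        return max(leaves, key=lambda g: (g.priority, g.in_current_line))

    @dataclass
    class CandidatePlan:
        name: str
        specificity: int
        already_failed: bool = False    # failed plans are never re-chosen

    def plan_arbiter(plans):
        live = [p for p in plans if not p.already_failed]
        return max(live, key=lambda p: p.specificity) if live else None

    print(goal_arbiter([LeafGoal("sing", 2), LeafGoal("flee", 5)]).name)  # flee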

Em

Em models social and emotional aspects of the agent, based on the work of Ortony. Like that work, Em develops emotions from a cognitive base: external events are compared with goals, actions are compared with standards, and objects are compared with attitudes.

As Hap runs, goals are created, succeed, and fail. As these events occur, Hap informs Em, and Em in turn uses this information to generate many of its emotions. Happiness and sadness occur when the agent's goals succeed or fail. The degree of happiness or sadness depends upon the importance of the goal. Hope and fear occur when Em believes there is some chance of an active goal succeeding or failing. The amount is determined by a function of the goal's importance and the believed likelihood of success or failure.

Pride, shame, reproach, and admiration arise when an action is either approved or disapproved. These judgments are made according to the agent's standards, which represent moral beliefs and personal standards of performance. Pride and shame occur when the agent itself performs the action; admiration and reproach in response to others' actions.

Anger, gratitude, remorse and gratification arise from combinations of other emotions. So, for example, sadness and reproach combine to produce the composite emotion of anger.

Em's final two emotions, love and hate, arise from noticing objects toward which the agent has positive or negative attitudes. Emotions (but not attitudes) should fade with time, and Em models this decay.
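A sketch of Em's pattern: intensities computed from goal importance and believed likelihood, composites built from simpler emotions, and decay over time. The specific combination functions are assumptions; the text names only the inputs.

    def hope(importance, p_success):
        return importance * p_success          # combination rule assumed

    def fear(importance, p_failure):
        return importance * p_failure

    def anger(sadness, reproach):
        return min(sadness, reproach)          # composite emotion; rule assumed

    def decayed(intensity, rate=0.9):
        return intensity * rate                # emotions fade; attitudes do not

    print(anger(sadness=0.8, reproach=0.5))    # 0.5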

Behavioral Features

Behavioral features modulate the activity of Hap. Em adjusts the features to express emotional influence on behavior. It continuously evaluates a set of functions that control certain features based on the agent's emotional state. Hap modifies the features when it wants to force a style of action. Note that either Hap or Em can change the features to control how Hap achieves its goals.

Features may influence several aspects of Hap's execution. They may trigger demons that create new top-level goals. They may occur in the preconditions, success tests, and context conditions of plans, and so influence how Hap chooses to achieve its goals. Finally, they may affect the precise style in which an action is performed.

Examples of features in Lyotard include curious, belligerent, persistent, timid, and arrogant.


Pengi

[Agre and Chapman, 1988-1990] Human beings routinely engage in the process of planning, which involves rearrangement, interpolation, disambiguation, and substitution. Computer-based planners instead often presume an intelligent Planning system that feeds its results to a dumb Executor. But this sort of approach does not scale to the properties of the real world -- it is complex, uncertain, and immediate.

The hypothesis is that most activity derives from very simple sorts of mental machinery interacting with the immediate situation, producing apparently planful activity without building explicit models of the world.

Definitions in Pengi

routines
Patterns of interaction between an agent and its world. It is not a plan or procedure, and is typically not represented by the agent; an agent can enter into a particular routine without representing it because of regularities in the world. (Example of kicking blocks and running.)


indexical-functional aspects
Properties of the immediate situation. Registering and acting on aspects is an alternative to representing and reasoning about complex domains, and avoids combinatorial explosion. An example is I'm-adjacent-to-my-chosen-projectile.


indexical-functional entities
Basically, entities referred to by terms whose targets vary with the situation (e.g., the-block-I'm-pushing). Conceptually, entities are the noun phrases to the aspects' sentences.


Pengo: The Game

Pengo is a video game played on a two-dimensional maze made of unit-sized ice blocks. The player navigates a penguin around the maze. Bees chase the penguin, and kill him if they get close enough. The penguin and the bees can change the maze by kicking ice blocks to make them slide; if a block slides into a creature, it dies.

Indexical-Functional Aspects

Pengi's activity is guided by relevant properties of the immediate situation (called indexical-functional aspects). The indexicality of these aspects arises from their dependence on the player's current circumstances. They are functional because they are relative to the player's purposes.

Note that this approach does not represent individuals (the-bee-I'm-chasing might change from moment to moment), and therefore avoids the overhead of instantiation: binding constants to variables. In most KR systems, to decide to kick a block at a bee requires reasoning from some general statement that in every situation satisfying certain requirements there will be some bee and some block that should be kicked at it. To make this concretely useful, it must be instantiated with particular individuals.

Entities are not logical categories because they are indexical: their extension depends on the circumstances.

Machinery of the System

The simple architecture used in Pengi consists of a central system and peripheral systems. The central system is responsible, roughly, for cognition: registering and acting on relevant aspects of the situation. The peripheral systems are responsible for perception and effector control.

It's assumed that combinational networks are an adequate central system for most activities.

Visual Routines

[Ullman, 1984] developed a theory of vision based on visual routines, which are patterns of interaction between a central system and a visual routines processor (VRP). The VRP maintains several modified internal copies of the 2½-D sketch produced by early vision. It can perform operations on these images, such as coloring in regions, tracing curves, keeping track of locations in the image, indexing interesting features, and tracking moving objects.

Outputs from the central network guide the VRP's operations, and the VRP's outputs are inputs to the network. A visual routine, then, is a process whereby the VRP, guided by the network, finds entities and registers aspects of the situation, finally injecting them into the inputs of the network.

Action Arbitration

Actions are suggested on the basis of local plausibility, but suggested actions may conflict. In such cases a single action must be selected. Usually, which action to take will depend on aspects of the situation; sometimes, certain actions always take precedence over others. The system employs levels of arbitration: actions are suggested and may be overruled; the overruling can itself be overruled, counter-proposals may be made, and so on.
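A sketch of the arbitration idea, reduced to its simplest form (the pair representation is mine; Pengi realizes this with combinational logic, not an interpreter):

    def arbitrate(suggestions):
        """suggestions: (level, action) pairs; a suggestion at a higher
        arbitration level overrules those below it."""
        level, action = max(suggestions, key=lambda s: s[0])
        return action

    # flee overrules chase-bee; kick-block overrules the overruling
    print(arbitrate([(0, "chase-bee"), (1, "flee"), (2, "kick-block")]))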

This approach is more efficient than Planning because it requires neither representation nor search of future worlds. Note that this implies a lack of state; this is not to say that state is never needed, but that it is much less critical than is often assumed.

Plans

There are two fundamental kinds of plans: plans-as-programs, in which plan use is simply a matter of executing a fixed sequence of actions, and plans-as-communications, which require improvisation in a dynamic environment.

Plans as Programs

This view takes plan use to be like program execution, in which a set of primitives (such as PUT-ON) are composed into a structure which is then executed in a mechanical manner, with little or no new reasoning performed during execution. This view implies domain-independent plan execution.

Four things weighing against this view are:

  1. It poses computationally intractable problems.
  2. It is inadequate for a world characterized by unpredictable events.

    In particular, if interactions don't take place as expected, the plan will fail. Reasons to abort or revise a plan can be divided into two categories: contingencies (the door is closed) and opportunities (Fred's driving that way already). Plan executives with no domain knowledge cannot effectively make use of such opportunities, and similarly cannot detect all contingencies, because some will be independent of precondition failures.
  3. It requires that plans be too detailed.
  4. It fails to address the problem of relating the plan text to the concrete situation.

Perhaps it is better to think of agents as participating in the flow of events. An embodied agent must lead a life, not solve problems; the real world does not provide such neat packages.

Plans as Communication

The details of the thesis are obscure to me. The idea is that, rather than being given a worked-out set of instructions to follow, instead the user is given a set of situated recommendations or guidelines, that are situationally appropriate either for the time they are given or the time they are meant to be used.

The domain of video games is explored. Video game instructions from one player to another can be highly compact because their possible meanings are constrained by the shared situation in which they are given and used.

Communication doesn't pick up a "meaning" from my head and set it down in yours. Instead, communication is part of the work of maintaining a common reality.


PRODIGY

[Carbonell, Knoblock, and Minton, 1990s] The PRODIGY system is an integrated architecture unifying problem solving, planning, and multiple learning methods.

Definitions in PRODIGY

operator
Operators map states into states. A PRODIGY operator is an FOL expression describing the operator's preconditions, coupled with conditional add and delete lists representing changes.


inference rule
Inference rules add information explicitly to the state that was implicit in the deductive closure. PRODIGY represents inference rules as simplified operators without delete lists, though their epistemological status is different.


control rule
Control rules map a set of candidate decisions into a smaller or prioritized decision set. Control rules can access any knowledge source because they have access to the internal state of the problem solver. Control rules may be entered by hand or acquired by the learning process.


control knowledge
Any kind of information that reduces search.


solution
A sequence of instantiated operators which, if applied to the initial state, will produce a state satisfying the goal.


problem solving trace
The entire search tree, including failed attempts, generated in the process of discovering the final solution sequence.


Hypotheses of PRODIGY

PRODIGY as an Integrated Architecture

PRODIGY has the following characteristics:

Architectural Overview

A summary of each PRODIGY module:

These modules all use a common representation based on first-order logic, and all access global knowledge sources. Any learned knowledge produced by one module is directly exploitable by the general problem solver and open to examination by another learning module.

The Problem Solver

PRODIGY's basic reasoning engine is a general-purpose problem solver and planner that searches for sequences of operators (i.e., plans) that accomplish a set of goals from an initial state description. Search control in PRODIGY is governed by a set of control rules that apply at each decision point: selection of which node to expand, which goal to work on, which operator to apply, and which objects to use. Search control rules may be general or domain specific, hand-coded or automatically acquired, and may consist of heuristic preferences or definitive selections. In the absence of any search control, PRODIGY defaults to depth-first means-ends analysis.

Each problem-solving operator has a precondition expression that must be satisfied before the operator can be applied, and an effects-list that describes how the application of the operator changes the world. Precondition expressions are well-formed formulae in PDL, a form of predicate logic encompassing negation, conjunction, disjunction, and existential quantification, as well as universal quantification over sets. The effects-list indicates atomic formulae that should be added or deleted from the current state when the operator is applied.

At any particular moment, the current state of the world is represented by a database of ground atomic formulae. There are two types of relations used in the system, primitive relations and defined relations. Primitive relations are directly observable or "closed-world," so their truth values can be immediately ascertained by a call to the state database. Primitive relations may be added to and deleted from the state database by operators. Static primitive relations (time relations, for example) cannot be changed by any operator and can be computed on demand.

Defined relations (such as IDLE or CLAMPABLE) are inferred on demand using inference rules. The purpose of defined relations is to represent useful abstractions in the domain, allowing operator preconditions to be expressed more concisely. Inference rules only change the system's internal knowledge state, whereas operators change both the external world and the system's internal state.

Control Structure

The problem-solving process begins with a search tree containing one node representing the initial state, and a set of initial goals that must be achieved to satisfy the goal expression. The tree expands by repeating the following steps:

  1. Decision phase -- There are four types of decisions. First is what node to expand next (defaulting to depth-first). Each node is a set of goals and a state describing the world. After a node is chosen, one of the node's goals must be selected, and then an operator relevant to the goal must be chosen. Finally, a set of bindings for the parameters of the operator must be chosen.

  2. Expansion phase -- If the instantiated operator's preconditions are satisfied, the operator is applied. Otherwise, PRODIGY subgoals on the unmatched preconditions. In either case a new node is created.

Search terminates after creating a node whose state satisfies the top-level goal expression.
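A toy rendition of the decision/expansion cycle under strong simplifying assumptions: propositional operators over sets of atoms, no variable bindings, and no control rules (PRODIGY's PDL and node-selection machinery are elided):

    def apply_ops(state, plan, ops):
        """Apply a sequence of named operators to a state of atoms."""
        for name in plan:
            pre, adds, dels = ops[name]
            state = (state - dels) | adds
        return state

    def solve(state, goals, ops, depth=8):
        """Depth-bounded means-ends analysis; returns operator names or None."""
        if goals <= state:
            return []                                  # goal expression holds
        if depth == 0:
            return None
        goal = next(iter(goals - state))               # decide: which goal
        for name, (pre, adds, dels) in ops.items():    # decide: which operator
            if goal not in adds:
                continue
            prefix = [] if pre <= state else \
                     solve(state, pre, ops, depth - 1) # subgoal on preconditions
            if prefix is None:
                continue
            mid = apply_ops(state, prefix, ops)
            if not pre <= mid:
                continue
            rest = solve((mid - dels) | adds, goals, ops, depth - 1)
            if rest is not None:
                return prefix + [name] + rest          # expansion succeeded
        return None

    ops = {"get-key":   (set(),        {"have-key"},  set()),
           "open-door": ({"have-key"}, {"door-open"}, set())}
    print(solve(frozenset(), {"door-open"}, ops))      # ['get-key', 'open-door']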

Control Rules

As PRODIGY attempts to solve a problem, it must make decisions about which operator to use and which subgoal to pursue. These decisions can be influenced by control rules for the following purposes:

  1. To increase the efficiency of the problem solver's search.
  2. To improve the quality of the solutions that are found (e.g., preferences for more reliable or less costly solutions).
  3. To direct the problem solver along paths it would not explore otherwise.

PRODIGY's reliance on explicit control rules distinguishes it from most domain-independent problem solvers. PRODIGY expects that any important decisions will be guided by the presence of appropriate control knowledge. If no control rules are relevant, PRODIGY makes a quick, arbitrary choice. If a wrong choice is made and costly backtracking proves necessary, PRODIGY will attempt to learn the control knowledge that must be missing. The rationale for this casual commitment strategy is that control rules should be present for significant decisions. (The problem solver should not attempt to be clever without knowledge.)

Control rules can be employed to guide the four decisions mentioned in the previous section. Each control rule has a left-hand side condition testing applicability and a right-hand side indicating whether to SELECT, REJECT, or PREFER a particular candidate.

To make a control decision, given a set of candidates, PRODIGY first applies applicable selection rules to select a subset of the candidates. Next, rejection rules further filter this set by explicit elimination of particular remaining candidates. Finally preference rules are used to order the remaining alternatives. The most preferred candidate is one over which no other candidate is preferred; if backtracking is needed, candidates are tried in order of preference.
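A sketch of the three-stage filtering, with one simplification flagged in the comments: true preference is pairwise (the most preferred candidate is one over which no other is preferred), which is approximated here by counting how many preference rules favor each candidate:

    from dataclasses import dataclass
    from typing import Callable

    @dataclass
    class ControlRule:
        kind: str                    # "SELECT" | "REJECT" | "PREFER"
        test: Callable               # candidate -> bool

    def decide(candidates, rules):
        selected = [c for c in candidates
                    if any(r.test(c) for r in rules if r.kind == "SELECT")]
        if not selected:             # no selection rule fired: keep everyone
            selected = list(candidates)
        remaining = [c for c in selected
                     if not any(r.test(c) for r in rules if r.kind == "REJECT")]
        prefer = [r for r in rules if r.kind == "PREFER"]
        # Approximation of pairwise preference: order by how many preference
        # rules favor a candidate; backtracking tries them in this order.
        return sorted(remaining,
                      key=lambda c: sum(r.test(c) for r in prefer), reverse=True)

    rules = [ControlRule("REJECT", lambda op: op == "burn-down-house"),
             ControlRule("PREFER", lambda op: op.startswith("unstack"))]
    print(decide(["stack-A-B", "unstack-C-A", "burn-down-house"], rules))
    # ['unstack-C-A', 'stack-A-B']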

Explanation-Based Learning in PRODIGY

Explanation-based learning is a way for a problem solver to improve effectiveness by learning methods for solving problems. However, EBL is not guaranteed to improve problem solving. Indeed, in many cases performance may actually degrade. The trouble is that there is a hidden cost associated with such learned control knowledge -- the cost of determining whether it is applicable. To produce real efficiency improvement, an EBL program must generate control knowledge that is effective -- its benefits must outweigh its costs. PRODIGY considers what kinds of concepts are worth learning, how best to represent the knowledge, and how useful it is with respect to the set of training problems. In this sense PRODIGY is a deliberative learner, rather than a reflexive learner such as SOAR.

EBL and the Utility Problem

EBL begins with a high-level target concept and a training example for that concept. Using a domain theory (a set of axioms describing the domain) one can explain why the training example is an instance of the target concept. By finding the weakest conditions under which the explanation holds, EBL will produce a learned description that is both a generalization of the training example and a specialization of the target concept. The learned description must satisfy the operationality criterion, a test ensuring that the description will be an efficient recognizer for the target concept.

The actual purpose of EBL is not to learn more about the target concept, but to re-express the target concept in a more "operational" manner. The factors that can contribute to performance degradation are the ones visible in the utility formula below: low application frequency, small average savings per application, and high match cost.

Addressing the Utility Problem

After each problem-solving episode, explanations are generated in a three-stage process. First, the system considers what to learn from a problem-solving trace, then it considers how to best represent the learned information, and finally it empirically tests the utility of the resulting control rules to insure that they are actually useful.

Selecting a Target Concept

In PRODIGY, EBL can either be interleaved with problem solving or postponed until after the current problem has been solved. In either case, the EBL process begins by examining the problem-solving trace to pick out examples of PRODIGY's target concepts. There are four kinds of concepts; each kind has a variation for nodes, goals, operators and bindings.

  1. SUCCEEDS: A control choice succeeds if it leads to a solution. Learning about successes results in preference rules.

  2. FAILS: A choice fails if there is no solution consistent with that choice. Learning about failures results in rejection rules.

  3. SOLE-ALTERNATIVE: A choice is a sole alternative if all other candidates fail. Learning about sole alternatives results in selection rules.

  4. GOAL-INTERFERENCE: A choice results in goal interference if all solutions consistent with that choice delete a condition that must subsequently be reachieved. Learning about goal interactions results in preference rules.

After an example of a target concept is selected, PRODIGY constructs an explanation using a set of architecture-level axioms that serve as its theory of the problem solver, and a set of domain-level axioms that are automatically derived from the domain operators.

PRODIGY constructs the explanation by specializing the target concept in accordance with the example using a method called explanation-based specialization (EBS) [Minton]. To specialize a target concept, EBS retrieves an axiom that implies the concept (all axioms are implications) and recursively specializes all its nonprimitive subconcepts. The algorithm operates in a manner similar to that of a top-down parser. In order to choose among appropriate axioms, PRODIGY allows explicit discriminator functions to be associated with the theory. These functions aid the explanation construction process by examining the search tree and selecting the axioms corresponding to the example.

PRODIGY learns meta-level concepts (such as SUCCEEDS) that describe arbitrary problem-solving phenomena, rather than learning from solution sequences. This is a significant difference between it and other systems.

Compression: Improving an Explanation

The purpose of compression is to reduce the match cost of the learned descriptions (and thereby increase the utility of the resulting search control rule). After the initial learned description is formed, PRODIGY's compression module employs partial evaluation, simple logical transformations, and finally a theorem prover which can use domain-specific simplification axioms.

Evaluating the Utility of an Explanation

Utility is given by this formula:

Utility = (AverageSavings x ApplicationFrequency) - AverageMatchCost

After learning a control rule, the system produces an initial estimate of the rule's utility based on the training example that produced it. PRODIGY compares the cost of matching the rule against the cost of exploring the portion of the search tree that need not be considered because of it. If the new rule appears useful, it is included in the active set. The utility estimate is empirically validated by maintaining statistics on the rule during execution, and if utility becomes negative the rule is forgotten.
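The bookkeeping this implies, as a sketch (the field names and the demo numbers are mine):

    class RuleStats:
        """Per-rule statistics for empirically validating the utility estimate."""
        def __init__(self):
            self.attempts = 0            # times the rule was matched against
            self.applications = 0        # times it actually fired
            self.total_savings = 0.0     # search cost avoided when it fired
            self.total_match_cost = 0.0

        def record(self, match_cost, savings=0.0, applied=False):
            self.attempts += 1
            self.total_match_cost += match_cost
            if applied:
                self.applications += 1
                self.total_savings += savings

        def utility(self):
            # Utility = (AverageSavings x ApplicationFrequency) - AverageMatchCost
            if self.attempts == 0:
                return 0.0
            average_savings = self.total_savings / max(self.applications, 1)
            frequency = self.applications / self.attempts
            average_match_cost = self.total_match_cost / self.attempts
            return average_savings * frequency - average_match_cost

    stats = RuleStats()
    stats.record(match_cost=0.2, savings=5.0, applied=True)
    stats.record(match_cost=0.2)           # matched but did not apply
    print(stats.utility() > 0)             # True: keep the rule; else forget it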

Abstraction in PRODIGY

PRODIGY is capable of planning at different levels of abstraction. Basically, it picks an "appropriate" level of abstraction, then refines its abstract plans. Abstraction spaces are formed by ignoring certain properties in the original problem space. That is, each level is an abstract model of lower ones, rather than simply dropping preconditions or goals.

(There is some lengthy discussion of using the ALPINE system [Knoblock] in generating abstraction spaces, which I don't summarize here. -- POD)

Additional Learning Methods in PRODIGY

Applications of PRODIGY

PRODIGY has been applied to many domains: robotic path planning and assembly, the blocks world, the STRIPS domain, an augmented version of the STRIPS domain, matrix algebra manipulation, machine-shop scheduling, multi-agent robot planning, and others.


PRS

[Georgeff and Lansky, 1987] The Procedural Reasoning System (PRS) is a reasoning system that can reason and plan in continuously changing environments. Emphasis is placed on the fact that it can reason effectively about its actions and plans while taking into account bounded resources. The system was initially developed to deal with malfunction handling in the space shuttle's reaction control system (RCS).

PRS is compared to Firby's RAP, which is similar but lacks "sufficiently powerful" mechanisms for balancing decision-making requirements against time and information constraints. Brooks and the situated automata approach, they argue, are reactive but do not deal with general problem-solving and commonsense reasoning issues. All are needed in a general agent architecture.

Definitions in PRS

intrinsic goal
A goal that is not a means to an already intended end.


operational goal
A subgoal of some already existing intention.


Requirements for Situated Reasoning Systems

The essential features of a situated reasoner are:


The System Architecture

PRS consists of

  1. a database containing current beliefs or facts about the world
  2. a set of current goals to be realized
  3. a set of plans (called Knowledge Areas) describing how certain sequences of actions and tests may be performed to achieve given goals or to react to particular situations
  4. an intention structure containing those plans that have been chosen for [eventual] execution

An interpreter (or inference mechanism) manipulates these components, selecting appropriate plans based on the system's beliefs and goals, placing those selected on the intention structure, and executing them.

The system interacts with the environment through the database (which acquires new beliefs in response to changes in the environment) and through the actions it performs as it carries out its intentions.

The System Database

The contents of the database represent the current beliefs of the system. These include static facts about the world (e.g., the boiling point of water) as well as beliefs acquired during execution, such as current observations about the world or conclusions derived by the system from such observations. The knowledge contained in the database is represented in first-order calculus.

State descriptions that describe internal system states are called metalevel expressions. These typically describe the beliefs, goals, and intentions of the system, as well as other processing information.

Goals

Goals are expressed as conditions over some interval of time (i.e., some sequence of world states) and are described by applying various temporal operators to state descriptions. This allows the representation of a wide variety of goals, including goals of achievement, maintenance, and tests of conditions. A given action (or sequence of actions) is said to succeed in achieving a goal if its execution results in a behavior that satisfies the goal description.

As with state descriptions, goal descriptions are not restricted to specifying desired behaviors of the external environment but can also characterize the internal behavior of the system. Such descriptions are called metalevel goal specifications.

Knowledge Areas

Knowledge about how to accomplish goals or react to situations is represented by declarative procedure specifications called Knowledge Areas (KAs). Each KA consists of a body, which describes the steps of the procedure, and an invocation condition, which specifies under what situations the KA is applicable. Together, the invocation condition and body of a KA express a declarative fact about the results and utility of performing certain sequences of action under certain conditions.

The body of a KA is a plan or plan schema. It is represented as a graph with one distinguished start node and one or more end nodes. The arcs in the graph are labeled with the subgoals to be achieved in carrying out the plan. Successful execution of a KA consists of achieving each of the subgoals labeling a path from the start to an end node. If multiple arcs emanate from a node, any arc may be followed, and if it fails other arcs may be tried. This formalism allows conditional selection, iteration, and recursion, among other things.
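
A sketch of executing a KA body under these rules, in Python; the graph encoding and the achieve callback are illustrative, and for simplicity the sketch assumes an acyclic body (real KAs also permit iteration and recursion):

  # body: maps each node to a list of (subgoal, next_node) arcs.
  def execute_body(body, start, end_nodes, achieve):
      """Succeed by achieving the subgoals along some start-to-end path."""
      if start in end_nodes:
          return True
      for subgoal, nxt in body.get(start, []):
          # Any arc may be followed; if it fails, try the others.
          if achieve(subgoal) and execute_body(body, nxt, end_nodes, achieve):
              return True
      return False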

The invocation condition contains a triggering part describing the events that must occur for the KA to be executed. Usually these consist of the acquisition of some new goal or some change in system beliefs, and may involve both.

Some KAs have no bodies; these so-called primitive KAs have some primitive action associated with them that is directly performable by the system.

PRS not only has KAs with procedural knowledge about a particular task domain, but also includes metalevel KAs -- that is, information about the manipulation of the beliefs, desires, and intentions of PRS itself. For example, typical metalevel KAs encode various methods for choosing among multiple applicable KAs, modifying and manipulating intentions, and computing the amount of reasoning that can be undertaken given the real-time domain constraints.

The Intention Structure

The intention structure contains all those tasks the system has chosen for execution, either immediately or at some later time. These tasks are called intentions. A single intention consists of some initial KA together with all the [sub] KAs that are being used in an attempt to successfully execute that KA.

At any given moment, the intention structure may contain a number of intentions, some of which may be suspended or deferred, some of which may be waiting for certain conditions, and some of which may be metalevel intentions for deciding among various alternative courses of action.

The set of intentions comprising the intention structure forms a partial ordering with possibly multiple least elements (roots). An intention earlier in the ordering must either be realized or dropped (i.e., removed from the intention structure) before intentions appearing later in the structure can be executed. This precedence relationship between intentions allows the system to establish priorities, etc.

The System Interpreter

The interpreter runs the entire system. At any particular time, certain goals are active in the system and certain beliefs are held in the database. Given these, a subset of KAs will be applicable (i.e., invoked). One or more of these applicable KAs will be chosen for execution and thus placed in the intention structure.

To determine a KA's applicability, beliefs and goals are matched with invocation conditions via unification, rather than using deduction. However, PRS is always able to perform deductions by invoking appropriate metalevel KAs. These metalevel KAs are themselves interruptible, to maintain the system's reactivity.

Once selected, the chosen KAs are inserted into the intention structure. If a selected KA arose due to the acquisition of a new intrinsic goal or a new belief, it is inserted into the structure as a new intention. Otherwise, the KA instance must have arisen as a result of some operational goal of some existing intention, and will be pushed onto the stack of KAs comprising that intention.

Finally, one of the roots of the intention structure is selected for further execution. The next step of that intention will comprise either a primitive action or one or more unelaborated subgoals. If the former, the action is performed; if the latter, these subgoals are posted as new operational goals of the system.

At this point the interpreter cycle begins again: the newly established goals and beliefs trigger new KAs, one or more of these are selected and placed on the intention structure, and finally an intention is selected from that structure and partially executed.
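
The cycle can be summarized in a toy Python sketch. Everything here (the dict-based KAs, the list-of-stacks intention structure, adopting every applicable KA) is a deliberate simplification; in particular, real PRS lets metalevel KAs choose among applicable KAs:

  def cycle(beliefs, goals, kas, intentions, act):
      # 1. KAs triggered by a current belief or goal are applicable
      #    (matching is by unification in PRS; string equality here).
      applicable = [ka for ka in kas
                    if ka["trigger"] in beliefs or ka["trigger"] in goals]
      # 2. Place selected KAs on the intention structure.
      for ka in applicable:
          intentions.append([ka])           # each becomes a new intention
      # 3. Execute one step of one root intention.
      if intentions:
          stack = intentions[0]
          ka = stack.pop()
          if ka["is_primitive"]:
              act(ka["body"])               # perform a primitive action
          else:
              goals.update(ka["body"])      # post operational subgoals
          if not stack:
              intentions.pop(0)             # intention fully executed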

Integrating Decision-Making and Means-Ends Reasoning

PRS is designed to function acceptably as an embedded reasoning system no matter how little decision knowledge is provided; the more knowledge, the more effective it is, but always with a guaranteed upper bound on reaction time.

This is accomplished by incorporating into the interpreter certain fixed decision-making processes that are stringently bounded in execution time, yet which can be overridden whenever the system can bring more powerful decision-making knowledge to bear. In PRS, the fixed decision-making processes are hardwired into the basic system interpreter, while the knowledge to override or elaborate these decisions is contained in appropriate metalevel KAs.

The way metalevel KAs are brought to bear on a problem is through their invocation conditions. Typically, such conditions depend on the system's internal state (though they may also depend on the world state). Such conditions might include, for example, the applicability of multiple KAs in the current situation, the failure to achieve a certain goal, or the awakening of some previously suspended intention.

In the process of execution, metalevel KAs typically make choices about the adoption of lower-level KAs, the posting of alternative subgoals, or the best way to respond to goal failures. Thus, when decisions have to be made in the absence of information, the interpreter defaults to some fixed decision procedure. However, if metalevel KAs can bring knowledge to bear on the problem, this will override the default.

This keeps the basic PRS interpreter simple, since it need consider only the most common situations. Note, though, that the decision-making behavior of PRS is strongly influenced by the choice of invocation conditions for metalevel KAs: if they cause metalevel KAs to be invoked often, the system will spend more time making decisions than acting (behaving cautiously); if metalevel KAs are rarely invoked, the system will behave boldly with respect to its world.

Intentions in Detail

Intention States

In PRS, an intention can be in one of three possible states: active, suspended, or conditionally suspended. An active intention is one that is available for execution as soon as it becomes a root of the intention structure. A suspended intention is one that PRS has adopted but for which no decision has been made as to when it should be carried out; that is, it must be explicitly activated before it can be executed. A conditionally suspended intention (or simply conditional intention) is one that is temporarily suspended until some other specified condition (its activation condition) is satisfied.

Intentions are suspended by means of metalevel KAs. When a conditional intention is reactivated, it is necessary to decide whether or not to reorder the intention structure, and what to do, if anything, with current executing intentions. In the absence of any decision knowledge, it is desirable to ensure immediate response to the condition that activated the intention. Thus the basic interpreter is designed to begin execution of the reactivated intention immediately, suspending all currently executing intentions. Of course, metalevel KAs can override this decision.

Various metalevel KAs exist for reordering intentions on the structure. In the absence of any decision knowledge the basic interpreter inserts new intentions at the root of the existing structure, and removes them when completed. However, metalevel KAs can alter the ordering based on priorities, time, etc.

The Role of Intentions

Commitment to previously adopted plans or intentions has been claimed to be a critical component of rational agency. Unless some metalevel KA intervenes, PRS will perform its means-ends reasoning and other planning in the context of its existing intentions. So, for example, if it has a plan for achieving a goal, it will not reconsider other ways of achieving it while it works on the plan. The gain is in reduced decision time -- in highly dynamic domains it is not possible to continually reassess one's plans of action.

Establishing and Dropping Goals

Once a goal is established, PRS will try to solve it. If one solution fails, it will try another, and so on until it has exhausted all possibilities (or a metalevel KA changes the situation). The basic system interpreter tries each possible solution exactly once; metalevel KAs can change the search strategy, cause it to attempt a solution more than once, etc.

Planning in Real Time

Guaranteed Reactivity

Response time is the time the system takes to recognize and respond to an external event; reaction time is the time it takes merely to recognize or notice a change in its environment. A bound on reaction time is thus a prerequisite for providing a bound on response time.

If p is an upper bound on the execution time of primitive actions, n is an upper bound on the number of events that can occur per time unit, and t is the most time the PRS interpreter takes to select the set of applicable KAs, then the maximum reactivity delay is DR = p/(1 - nt), where we assume that t < 1/n.
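
For instance (with made-up values), if p = 0.1, t = 0.1, and n = 4 events per time unit, then nt = 0.4 and DR = 0.1/(1 - 0.4), or about 0.17 time units. As nt approaches 1, the interpreter spends all its time selecting KAs and the delay grows without bound.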

Comments

In testing, PRS met every criterion outlined by Laffey et al. for evaluating real-time reasoning systems: high performance, guaranteed response, temporal reasoning capabilities, support for asynchronous inputs, interrupt handling, continuous operation, handling of noisy data, and shifting of the focus of attention.

The features of PRS that the authors believe contributed most to its success in the RCS domain were (1) its partial planning strategy, (2) its reactivity, (3) its use of procedural knowledge, and (4) its meta-level (reflective) capabilities. In particular, the manner in which the system integrates its means-ends reasoning with the use of decision knowledge is considered an important part of rational activity.


RCS

[Albus, 1991] The Real-time Control System (RCS) is a reference model architecture for intelligent real-time control systems. It partitions the control problem into four basic elements: task decomposition, world modeling, sensory processing, and value judgment.

RCS-1 was developed by Barbera in the mid-1970s. RCS-2 was developed for NIST by Barbera and others in the mid-1980s. RCS-3 was created by Albus in 1988 for DARPA's Multiple Autonomous Undersea Vehicle project and adapted for the NASA Standard Reference Model Telerobot Control System Architecture.




Situated Automata

[Rosenschein, Kaelbling, late 1980's] The theoretical foundation of the situated automata approach is based on modeling the world as a pair of interacting automata, one corresponding to the physical environment and the other to the embedded agent.

Definitions in Situated Automata

embedded agent
A computer system that senses and acts on its environment, monitors complex dynamic conditions and affects the environment in goal-directed ways.


perception
The problem of acquiring information about the world.


action
The problem of acting appropriately relative to perceptions.


primitive goals
There are three kinds: achievement goals, maintenance goals, and execution goals.


universal plan
[Schoppers] A universal plan is a function that, for a given goal, maps every possible input situation of the agent into an action that leads to (in an informal sense) the goal.


Design of Situated Automata

The world and the agent are each modeled as an automaton with local state that varies as a function of signals projected from the other. Situated-automata theory provides a principled way of interpreting data values in the agent as encoding facts about the world expressed in some language whose semantics are clear to the designer. Such semantics of data structures are defined in terms of objective correlations with external reality.

In this approach, a machine variable x is said to carry the information that p in world state s, written s |= K(x, p), if for all world states in which x has the same value as it does in s, the proposition p is true.
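
For example, if a machine variable door-status takes the value OPEN only in those world states where the door really is open, then door-status = OPEN carries the information that the door is open.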

Perception-Action Split

Separating perception from action is a useful way to structure the design process.

Brooks has instead advocated horizontal decompositions that cut across perception and action; this allows the designer to consider simultaneously those limited aspects of perception and action needed to support particular behaviors. This discourages the pursuit of "spurious generality" that can stall real progress.

These attractive features are offset, however, by the degree to which horizontal decomposition encourages linear thinking. It does not encourage the development of elements that can recombine with one another to produce complex patterns of behavior. The alternative is a vertical decomposition based on separate system modules that recover information from multiple sources and use it for multiple purposes; this is a more attractive approach to making efficient use of programmer effort.

Goals

One way to characterize an agent's states semantically is in terms of the information they embody. The perception component delivers information, and the action component maps this information to action. In many cases, however, it is more natural to describe actions as functions both of information and goals.

There are two kinds of goals: static and dynamic. Static goals are statements the agent's behavior is designed to make true. Dynamic goals correspond to the more usual sense of the word, and can be defined as follows: having the goal p is carrying the information that p implies a fixed top-level goal called N, for "Nirvana." Formally, the goal operator G is defined (writing == for equivalence) as follows:

G(x, p) == K(x, p -> N).

In this model, x has the goal p if x carries the information that p implies Nirvana. Note that p can be an indexical statement whose truth varies with time. Under this definition, goals and information are dual concepts.

The deductive closure axiom:

K(x, p -> q) -> (K(x, p) -> K(x, q)).

The subgoaling axiom:

K(x, p -> q) -> (G(x, q) -> G(x, p)).

This says that if the agent has q as a goal, and carries information that q is implied by some other, more specific condition p, the agent is justified in adopting p as a goal.

Software Tools for Design

Rex is a language with LISP's recursive power, used at compile time to specify a synchronous digital circuit. However, it is only a low-level, operational language, more akin to standard programming languages than to declarative AI languages. Ruler is a higher-level, declarative language based on "informational" semantics, used to specify the perception component of the agent. Gapps is based on "goal" semantics and is meant to be used to specify the action component of the agent.

Gapps

Gapps is a language for specifying behaviors of computer agents that retains the advantage of declarative specification, but generates run-time programs that are reactive, do parallel actions, and carry out strategies made up of very low-level actions.

The Gapps compiler takes as input a declarative specification of the agent's top-level goal and a set of goal-reduction rules, and transforms them into the description of a circuit that has the output of the perception component as its input, and the output of the agent as a whole as its output. The agent's output may be divided into a number of separately controllable actions.

Goals and Programs

There are three primitive goal types: goals of execution, achievement, and maintenance.

Goals of execution are of the form do(a), where a is an instantaneous action the agent can perform; the goal is simply to perform the action. Maintenance goals, called maint(p), strive to maintain the truth of proposition p as long as possible. Achievement goals, of the form ach(p), try to bring about the truth of p as soon as possible.

The set of goals is made up of the primitive goal types, closed under the Boolean operators. The notions of achievement and maintenance are dual, so ~ach(p) == maint(~p) and ~maint(p) == ach(~p).

We need the notion of an action leading to a goal. Informally, an action a leads to a goal G (a ~> G) if it constitutes a correct step towards the satisfaction of the goal. For an achievement goal, the action must be consistent with the goal's eventually being true; for a maintenance goal, if the condition is already true, the action must imply that it will be true at the next instant of time.

The leads to operator must have the following additional properties:

Goal reduction rules are of the form (defgoalr G G'), and mean that goal G can be reduced to G'; that is, G' is a specialization of G and implies G. By the definition of leads to, any action that leads to G' will also lead to G.

A program is a finite set of condition-action pairs, in which the condition is a run-time expression (a piece of Rex circuitry) and an action is a vector of run-time expressions, one corresponding to each primitive output field.

A program P consisting of condition-action pairs { <c1, a1>, ..., <cn, an> } is said to weakly satisfy a goal G if, whenever a condition ci is true, the corresponding action ai leads to G.

The domain of a program is the class of situations in which it specifies some action, i.e.,

dom(P) = c1 v c2 v ... v cn

(the disjunction of the program's conditions).

A goal G is strongly satisfied by program P if it is weakly satisfied by P and dom(P) = true; that is, if for every situation, P supplies an action that leads to G. It may be that multiple conditions are true simultaneously; in that case a program execution may choose among them nondeterministically.

Recursive Goal Evaluation Procedure

A Gapps program is made up of a set of goal-reduction rules and a top-level goal expression. The general form of a goal reduction rule is (defgoalr goal-pat goal-expr) where

goal-pat ::= (ach pat rex-params) | (maint pat rex-params)

goal-expr ::= (do index rex-expr) |
(and goal-expr goal-expr) |
(or goal-expr goal-expr) |
(not goal-expr) |
(if rex-expr goal-expr goal-expr) |
(ach pat rex-expr) |
(maint pat rex-expr)

where index is a keyword, pat is a compile-time pattern with unifiable variables, rex-expr is a Rex expression specifying a run-time function of input variables, and rex-params is a structure of variables that becomes bound to the result of a rex-expr.

The main evaluation procedure operates upon these rules as you would expect (I won't type them all in). For example, do creates a primitive condition-action rule that places the function defined by rex-expr at the index element of the action vector:

make-primitive-program(i, rex-expr) =
  { < true, < 0, ..., rex-expr, ..., 0 > > }

This is the function called by eval(G) when G is a do; the 0's are placeholders for the output fields the rule leaves unspecified.

Generating a Circuit

Once a goal expression has been evaluated, yielding a program, a circuit is generated that instantiates the program. (The paper gives an example of inputs feeding into a cascading sequence of IF circuits.) Because any action whose associated condition is true is sufficient for correctness, the conditions are tested in an arbitrary order chosen at compile time; the output of the circuit is the action corresponding to the first condition that is true. (If no condition is satisfied, an error action is output to signal the programmer that he has made an error.) If, at the final stage of circuit generation, there are still 0 (placeholder) components in an action vector, they are instantiated with arbitrary values.
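
A sketch of this compilation step, using Python closures in place of Rex circuitry (compile_program and error_action are my names, not Gapps'):

  def compile_program(pairs, error_action):
      """Compile condition-action pairs into a cascaded-IF selector."""
      def circuit(inputs):
          # Conditions are tested in an arbitrary but fixed order;
          # any true condition's action is correct, so take the first.
          for condition, action in pairs:
              if condition(inputs):
                  return action(inputs)
          return error_action               # no condition held: an error
      return circuit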

Prioritized Goal Lists

It is possible to specify goals in terms of prioritized lists or prioritized conjunctions. In the former, the goal consists of a list of subgoals ordered preferentially (so the first is preferable to the second, etc.); this is implemented at circuit generation time by concatenating programs in priority order, and executing the first action whose corresponding condition is satisfied.

Prioritized conjunctions are similar to prioritized sets of goals in which the most preferred goal is the entire conjunction, and less preferred goals are conjunctions of shorter and shorter prefixes of the goal sequence. This is a convenient high-level construct for programming flexible reactive behavior without the need for the programmer to explicitly envision every combination of conditions in the world.
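
A small sketch of this expansion (illustrative tuple encoding of conjunctions):

  def prioritized_conjunction(goals):
      """Expand a prioritized conjunction into a prioritized goal list:
      the full conjunction first, then shorter and shorter prefixes."""
      return [("and",) + tuple(goals[:k])
              for k in range(len(goals), 0, -1)]

  # prioritized_conjunction(["g1", "g2", "g3"]) yields
  # [("and", "g1", "g2", "g3"), ("and", "g1", "g2"), ("and", "g1")]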

Extending Gapps

Universal Planning with Goal-Reduction Schema

The program resulting from the Gapps-evaluation of a goal can be thought of as a universal plan, mapping situations to actions in service of the top-level goal. However, Schoppers' approach differs from Gapps in that the user specifies the capabilities of the agent in an operation-description language. This language allows the user to specify a set of atomic capabilities of the agent, called operators, and the expected effect that executing each of the operators will have on the world, depending possibly on the state of the world in which the operator was executed.


SOAR

[Laird, Newell, et al., 1982-present] SOAR is an architecture for a system that is intended to be capable of general intelligence.

Definitions in SOAR

problem space
[SOAR] A space in which a set of operators applies to a current state to generate a new state.


problem space hypothesis
[SOAR] Search in problem spaces is the fundamental organization for all goal-oriented symbolic activity. This is a fundamental hypothesis underlying the SOAR architecture.


universal subgoaling
In SOAR, subgoals can be set up for any decision in the system.


current situation
The combination of the current goal, problem space, state, and operator.


preference
Search control data indicating how SOAR should behave in the current situation. Preferences are on only a few concepts: acceptability, rejection, better (best, worse, worst), and indifference. Preferences are statements about the selection of an object for a slot (or set of slots).


automatic subgoaling
The SOAR architecture introduces subgoals to resolve impasses reached by the decision procedure; the process of detecting impasses and generating subgoals is entirely architectural and task-domain independent. Described as the "most novel" feature of the architecture.


decision cycle
Controls the replacement of objects in working memory. It consists of two phases, the elaboration phase (in which new objects, new augmentations, and preferences are added), and the decision procedure, which either performs replacements of objects in slots, or creates new subgoals.


condition
Antecedent in a production rule that examines the context stack and the rest of working memory. Condition patterns are based on constants, variables, negations, pattern-ands, and disjunctions of constants.


refractory inhibition
SOAR's only conflict resolution scheme; all successfully instantiated productions fire, but an instantiation of a production fires only once.


Overview

SOAR is the latest in a long line of intelligent reasoners that began with the Logic Theorist in 1956; the line runs through GPS and the OPS series (among others). SOAR's roots include the concept of a cognitive architecture, the idea of an instructable production system, and the extension of problem spaces to routine behavior.

SOAR uses explicit symbolic representation of its tasks, which are manipulated by symbolic processes. It uses a general scheme of goals and subgoals to represent what the system wants to achieve and to control its behavior. In this section the fundamental design choices are outlined.

Principle: Uniform task representation by problem spaces. Every task in SOAR (choosing states, selecting operators, etc.) is formulated as finding a desired state in a problem space. Thus all tasks take the form of heuristic search. If immediate knowledge is inadequate, subgoals are set up to make the decision.

Principle: Any decision can be an object of goal-oriented attention. In fact, subgoals can be set up for any problematic decision; this is called universal subgoaling. Since subgoals can be made of subgoals, and goals can be made to search for any needed information, SOAR can reflect on its own problem solving behavior to arbitrary levels.

Principle: Uniform representation of all long-term knowledge by a production system. All long-term knowledge in SOAR is represented in a production system. Productions can contain control knowledge or procedural knowledge. All data structures examinable by productions (that is, data in declarative form) are held in the production system's short-term working memory. The long-term storage of this knowledge is in productions whose actions generate these data structures.

SOAR's production system (based upon OPS5) fires all satisfied productions in parallel, without conflict resolution. Productions can only add data to working memory. Modification and removal of data elements is done by the architecture. SOAR 5 retracts the actions of production instantiations that no longer match.

Principle: Search control knowledge expressed in preferences. Search control knowledge arises in data generated by firing productions. One such data element, the preference, represents knowledge about how SOAR should behave in the current situation. A fixed decision procedure interprets the accumulated preferences to determine the next action. This procedure is simply the embodiment of the semantics of basic preference concepts, and contains no task-dependent knowledge.

Principle: Goals are created only to cope with impasses. When problem solving cannot continue -- that is, when the decision procedure fails because it doesn't know what to do next -- the system has reached an impasse. At this point the architecture creates a goal for overcoming the impasse. Thus only the architecture introduces subgoals; this principle is called automatic subgoaling.

Principle: Continuous monitoring of goal termination. SOAR continuously monitors all active goals for their termination. When they terminate, it immediately proceeds from the point of termination, and all working memory local to the terminated goals is removed. This is normally an expensive process in AI systems, and its efficient realization in SOAR depends largely on automatic subgoaling.

Principle: Basic problem-solving methods arise from task knowledge. SOAR realizes weak methods (e.g., hill-climbing, alpha-beta search, etc.) as search control productions that express knowledge about the task.

Principle: Continuous learning-by-experience through chunking. SOAR learns automatically by permanently caching the results of subgoals as productions. In essence, it stores the preferential result of the subgoal, abstracting only the relevant features, so the next time it is encountered it can use the preference immediately. This is directly related to the human phenomenon of chunking for which it is named.

The SOAR Architecture

The Problem Solving Architecture

Because SOAR accomplishes all of its tasks in problem spaces, it is a problem solving architecture.

To realize a task as a search in problem space requires a fixed set of task-implementation functions involving the retrieval or generation of

  1. problem spaces
  2. problem space operators
  3. an initial state (representing the current situation)
  4. new states that result from applying operators to existing states

To control the search requires a fixed set of search-control functions that select

  1. a problem space
  2. a state from those directly available
  3. an operator to apply to the state

The task-implementation and search-control functions together are sufficient for problem space search. Usually they are interleaved, with task implementation generating new problem spaces with associated states and operators, and search control choosing from the alternatives.

Working Memory

Working memory holds all processing information for the current problem. It has three components:

  1. context stack -- lists the hierarchy of active goals, problem spaces, states and operators
  2. objects -- such as goals and states (and their subobjects)
  3. preferences -- that encode procedural search knowledge

The context stack contains the hierarchy of active contexts. Each context contains four slots, one for each of the four roles: goal, problem space, state, and operator. Each slot either contains an object or the undecided symbol. The top context contains the highest goal in the hierarchy; goals in lower contexts are subgoals. Since each context has only one goal during its existence, the context stack doubles as a goal stack.

The basic representation is object-oriented. Each object (goals, problem spaces, etc.) contains an identifier symbol and a set of augmentations, which are essentially attribute/value pairs. The augmentations of an object may change over time.

Preferences are more complex data structures with a specific collection of eight architecturally-defined relations between objects.

Working memory can be modified by the firing of productions during the elaboration phase (which only adds elements), by the decision procedure (which replaces the objects in slots), and by the working-memory manager (which removes the elements of terminated subgoals).

The Processing Structure

The processing structure implements the functions required for search in a problem space, to wit, using task-implementation knowledge to generate objects, and using search-control knowledge to select between them.

Search-control functions are all realized by a single generic control act: the replacement of an object in a slot by another object from working memory. Specifically, the representation of a problem is changed by replacing the current problem space with a new one. Returning to a prior state is accomplished by replacing the current state with a preexisting one in working memory. An operator is selected by replacing the current one (often undecided) by a new one. A step is taken in the problem space when the current operator is applied to the current state, yielding a new one, which is then selected to replace the existing one.

Replacement can occur anywhere in the context stack. When an object in a slot is replaced, all of the objects in the slots below it are reinitialized to undecided, since they depend upon objects higher up for their validity. All contexts below the context of the object are terminated because they depended on the original context for their existence.

Replacement is driven by the decision cycle. Each cycle involves two parts. The first part, the elaboration phase, adds new objects, new augmentations of old objects, and preferences to working memory. Then the decision procedure examines the accumulated preferences and the context stack, and it either replaces some object in a slot, or it creates a subgoal in response to an impasse.

The Elaboration Phase

The elaboration phase adds new objects, augmentations of existing objects, and preferences, based upon the contents of working memory. Elaborations are generated in parallel, but may still require multiple steps, as information generated in one step may allow further elaborations in later steps. This is a monotonic process (it does not delete or modify working-memory elements) that runs until it reaches quiescence (which usually happens quickly, but in practice there is a cutoff number of iterations). Note however that this is only syntactic monotonicity, so semantic nonmonotonicity or conflicts may arise.

The elaboration phase is encoded as productions of the form

If C1 and C2 and ... and Cm then add A1, A2, ... An.

The C's are conditions that examine the context stack and working memory. The A's are actions that add augmentations or preferences.

A production is successfully instantiated if the conjunction of its conditions is satisfied with a consistent binding of variables. There can be any number of concurrently successful instantiations of a production; all successful instantiations fire concurrently during the elaboration phase. The only conflict resolution principle used in SOAR is refractory inhibition -- an instantiation of a production is fired only once. (Instead of control at the level of conflict resolution in productions, SOAR performs control via the decision procedure.)
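
A Python sketch of this firing discipline, assuming Production objects with match (yielding variable bindings) and fire methods; real SOAR matching uses a Rete network and is far more elaborate:

  def elaboration_phase(productions, wm, max_steps=100):
      """Fire all satisfied instantiations in parallel until quiescence."""
      fired = set()                          # refractory inhibition record
      for _ in range(max_steps):             # practical iteration cutoff
          additions = []
          for prod in productions:
              for bindings in prod.match(wm):
                  key = (prod.name, tuple(sorted(bindings.items())))
                  if key not in fired:       # each instantiation fires once
                      fired.add(key)
                      additions.extend(prod.fire(bindings))
          if not additions:
              return wm                      # quiescence reached
          wm |= set(additions)               # monotonic: additions only
      return wm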

The Decision Procedure

After the elaboration phase has reached quiescence, the decision procedure determines which slot in the context stack should have its content replaced, and by which object. This is done by examining the context stack from the oldest context to the newest (i.e., from the highest goal to the lowest one). Within each context, each role is considered, in the order problem space, state, and operator. This process terminates when a slot is found that requires an action. Changing a higher slot results in reinitializing the lower slots, so they do not need to be processed.

The ordering on slots reflects a fixed desirability ordering on changes: it is always preferable to change a higher slot. Processing is controlled by the knowledge contained in the preferences in working memory at the end of the elaboration phase. Each preference is a statement about the selection of an object for a slot (or set of slots). There are three primitive concepts available for making preference statements: acceptability, rejection, and desirability (better, best, worse, worst, indifferent).

The acceptability and rejection preferences determine which objects will be considered; the desirability preferences place a partial order on that set. The result of processing a slot, if successful, is a single object that is new, acceptable, not rejected, and more desirable than any other such choice.

All desirability rankings require a reference object to compare against. For better, worse, and some indifferent preferences, the reference is another object being considered for the slot. For best, worst, and the other indifferent preferences, the reference is an abstract anchor point.

Note that rejection is not the logical negation of acceptability; rejection removes an object from all consideration. Conflicts of desirability may arise, so a distinction is made between being better and being dominant (when one object is better than another, and the other object is not better than it). A version of the closed world assumption is invoked, so that positive knowledge is required to state a preference; no assertions are made by default.

Once the set of maximal choices is computed, the decision procedure makes a choice for the slot, with the current choice acting as the default. If there are no maximal choices, the current one is maintained. If the elements of the maximal set are indifferent, one is chosen at random.
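
A Python sketch of the per-slot computation, with an illustrative encoding (sets of objects, and `better` as a set of ordered pairs):

  import random

  def decide_slot(acceptable, rejected, better, indifferent, current):
      """Return the new object for a slot, or an impasse marker."""
      candidates = acceptable - rejected
      # x is maximal if no candidate dominates it (i.e., is better than
      # x while x is not better than it in return).
      maximal = {x for x in candidates
                 if not any((y, x) in better and (x, y) not in better
                            for y in candidates if y != x)}
      if not maximal:
          return current                     # no maximal choice: keep default
      if len(maximal) == 1:
          return maximal.pop()
      if all(x == y or (x, y) in indifferent
             for x in maximal for y in maximal):
          return random.choice(list(maximal))  # mutually indifferent
      return "impasse"                       # a tie or conflict remains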

If the value of the slot is changed, the change is installed and all of the lower slots are changed to undecided. If there is no final choice, or all of the slots have been exhausted, then the decision procedure fails and an impasse occurs. There are four kinds of impasses in SOAR:

  1. tie -- There are multiple maximal choices that are not mutually indifferent and do not conflict. These are competitors for the same slot for which insufficient knowledge exists to discriminate among them.

  2. conflict -- There are conflicting choices in the set of maximal choices (for example, x and y are both in the maximal set, but preferences state both (x > y) and (y > x)).

  3. no-change -- The current value of every slot is maintained.

  4. rejection -- The current value is rejected and there are no maximal choices; that is, there are no viable choices for the slot. This usually means all the alternatives have been tried and found inadequate.

These rules are mutually exclusive, so at most one impasse will occur during the decision procedure. SOAR's response to an impasse is to set up a subgoal to resolve it.

Impasses and Subgoals

In general, impasses occur because of incomplete or inconsistent information; incomplete information may yield a rejection, tie, or no-change impasse, while inconsistent information produces a conflict impasse.

When an impasse occurs, a subgoal and a new context are created. In this way SOAR can deliberately search for information to resolve the impasse. All types of knowledge (task-implementation and search-control) can be encoded in the problem space for a subgoal.

If the impasse is a tie, problem solving to find a best object usually results in creating desirability preferences, so the subgoal is aimed at using search-control knowledge to select among the objects. Ties can be resolved in several ways: one object is determined to lead to the goal, so a best preference is created; one object is found to be better than another, so a better preference is created; there is no difference between objects, so an indifference preference is created; or one object is found to lead away from the goal, so a worst preference is created.

If a no-change impasse arises, the problem solving in the subgoal usually involves operator implementation, terminating when an acceptable preference is created for a new state in the parent's problem space. Subgoals can create problem spaces or initial states when the required knowledge is more easily encoded as a goal to be achieved through problem space search, rather than as a set of elaboration productions.

In general, impasses are resolved by the addition of preferences that change the results of the decision procedure. When an impasse is resolved, the subgoal created for it is done and can be terminated. Subgoals will also be terminated if the roles in their supergoals are changed, since they depended upon the original context. Resolution of an impasse terminates all goals below it.

When subgoals are terminated, the working-memory manager removes all leftover memory elements (terminated subgoals and processing elements they've created) in essentially the same way LISP removes inaccessible CONS cells. Only the results of the subgoal are retained -- the objects and preferences in working memory that are still linked to the unterminated contexts.

The architecture defines the concept of goal termination rather than goal success or failure. This is because there are many reasons why a goal should disappear, and many ways the preferences reflect this.

Automatic subgoal termination is normally an expensive, though desirable, process in goal systems. In SOAR it is not expensive because the architecture creates all goals, so it has the knowledge and organization necessary to terminate them.

Default Knowledge for Subgoals

SOAR has a body of task-independent knowledge about its operation and how to attain goals encoded as a set of (about fifty) productions. With it, SOAR exhibits reasonable default behavior, which of course can be overridden by additional knowledge that adds other preferences.

Common Search-Control Knowledge

SOAR provides search-control rules for three common situations requiring preference creation.

  1. Backup from a failed state. If there is a reject preference for the current state, create an acceptable preference for the previous state. This implements a basic kind of backtracking.

  2. Make all operators acceptable. If there is a fixed set of operators that can apply in a problem space, they should be candidates for every state. This is done by creating acceptable preferences for those operators.

  3. No operator retry. Since SOAR is deterministic, it will produce the same result whenever a given operator is applied to a given state. Thus once an operator has created a result for a state in some context, create a preference to reject that operator whenever that state is current for a context with the same goal and problem space.

Diagnose Impasses

When an impasse occurs, the architecture creates a new goal and context containing some specific knowledge about the nature of the impasse. Then task-dependent search-control knowledge can be used to decide how to invoke appropriate problem-solving behavior. If no task-dependent knowledge is available, there are default rules.

  1. Tie impasse. Assume additional knowledge is needed to discriminate among the tied objects. The selection problem space is made acceptable to work on the problem, but a worst preference is generated for it, so any other problem space will be preferred.

  2. Conflict impasse. Assume additional knowledge is needed to resolve the conflict. The same approach is taken as with ties.

  3. No-change impasse.

    1. For goal, problem space, and state roles. Assume the next higher object in the context is responsible for the impasse. Create a reject preference for that object in the context or supercontext. This action is taken only when a problem space is not selected for the subgoal that was generated because of the impasse; this allows it to be overridden through problem solving in a problem space selected for the no-change impasse. If there is a no-change impasse for the top goal, no further progress is possible.

    2. For the operator role. Try to find a state in the problem space to which the operator will apply.

  4. Rejection impasse. The same approach is taken as in 3.1 above.

The Selection Problem Space

This space is used to resolve ties and conflicts. The states of the space contain the candidate objects from the supercontext (the items associated with the subgoal). There is only one operator in the selection space, evaluate-object. It is a general operator that is instantiated with each tying (or conflicting) object. Each state in the selection space is a set of evaluations, each of which allows the creation of preferences involving the objects being evaluated. This requires task-specific knowledge, so either productions must exist that evaluate the objects, or subgoals will be created to perform the evaluations. Indifferent preferences are created for all of the evaluate-object operators so that a selection can be made between them.

If all evaluate-object operators are rejected, and no selection can be made, problem solving in the selection space has failed. In this case, default knowledge makes all of the tied alternatives indifferent, allowing problem solving to continue.

The Evaluation Subgoal

In the selection problem space, each evaluate-object operator must evaluate the item with which it is instantiated. If no task-dependent knowledge is available to do this, a no-change impasse will occur, leading to a subgoal to try to implement the operator.

One task-independent technique for doing this is look-ahead -- try the item out temporarily to gather information. This is the default approach. To apply it, productions reconstruct the task context, making acceptable preferences for the objects selected in the context and augmenting the new goal with information from the original goal. Once the task context has been reconstructed, the item being evaluated is selected (it gets a best preference). This allows it to be tried, and possibly an evaluation can be produced based upon its progress toward the goal.

When knowledge is available to evaluate states in the task space, the new state produced in the evaluation subgoal will be evaluated, and that evaluation can be backed up to serve as the evaluation of the operator that produced it. If no knowledge to evaluate states is available except when a goal is achieved, a depth-first search arises. In a two-player game, mini-max search arises. (Other searches can arise depending on when and what knowledge becomes available to evaluate the states in the task space.)

Chunking

Chunking is an explanation-based learning scheme incorporated into SOAR. It is based on the observation that (1) an impasse arises because of a lack of directly available knowledge, and (2) problem solving in the resultant subgoal produces new information on how to resolve the impasse.

In chunking, chunks are acquired that generate the results of a goal, given the goal and its parameters, where "parameters" is defined to be those aspects of the system existing prior to the goal's creation that were examined during the processing of the goal. Chunks are compiled into single productions, and can be generated either for terminal subgoals or for every subgoal, depending on SOAR's configuration.

Chunking in SOAR is powerful because SOAR generates goals automatically for problems in any aspect of its problem-solving behavior. Subgoals are created exactly in situations where SOAR requires learning, since they are created exactly when available knowledge is insufficient for the next step in problem solving.

The Chunking Mechanism

A chunk production summarizes the processing in a subgoal. Its actions generate those working-memory elements that eliminated the impasse responsible for the subgoal (and thus terminated it). The conditions test those aspects of the current task that were relevant to the actions being performed. The chunk is created when the subgoal terminates. The chunk's actions are based on the results of the subgoal -- those working-memory elements created in the subgoal that are accessible from a supergoal. An augmentation results if its identifier either existed before the subgoal was created or is in another result. A preference results if all of its specified context objects either existed before the subgoal was created or are in another result.

The chunk's conditions are based on a dependency analysis on traces of productions that were fired during the subgoal. Each trace contains the working-memory elements that the production matched (called the condition elements) and those it generated (action elements). Only productions that actually add something to working memory have their traces saved.

Once a trace is created, it is stored on a list associated with the goal in which the production fired. This is taken to be the lowest goal in the hierarchy matched by conditions of the production.

When the subgoal terminates, its results are put into independent subgroups, where two results are considered dependent if they are linked together or both link to a third object. Each subgroup forms the basis for the actions of one production, and the conditions of each production are determined by independent dependency analyses. The overall effect of breaking the results into subgroups is to produce more productions, each with fewer conditions and actions, and therefore more generality than if a single production contained them all.

For each set of results, the dependency analysis finds those traces that have one of the results as an action element. The condition elements of those traces are then divided up into those that existed before the subgoal was created, and those created in the subgoal. The ones created before the subgoal become conditions of the chunk. The others are recursively analyzed as if they were results, to determine the pre-subgoal elements responsible for their creation.
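
A Python sketch of this backward dependency analysis (the trace representation is my own illustration):

  def chunk_conditions(results, traces, pre_existing):
      """Trace results back to the working-memory elements that
      existed before the subgoal; those become the chunk's conditions.
      traces: list of (condition_elements, action_elements) pairs."""
      conditions, pending, seen = set(), list(results), set()
      while pending:
          elem = pending.pop()
          if elem in seen:
              continue
          seen.add(elem)
          for cond_elems, act_elems in traces:
              if elem in act_elems:          # a trace produced this element
                  for c in cond_elems:
                      if c in pre_existing:
                          conditions.add(c)  # pre-subgoal: chunk condition
                      else:
                          pending.append(c)  # created in subgoal: recurse
      return conditions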

Once the set of chunk productions is determined, they are generalized by replacing the identifiers in the working-memory elements with variables. Different identifiers are replaced by different variables which are forced to match distinct identifiers. Note that this may sometimes err by creating productions that will not match when two elements just happen to have the same identifier, but it always errs by being too constraining.

The final step in chunking is to perform a pair of optimizations on the chunk productions. The first simplifies productions learned for the implementation of a complex operator. As part of creating the new state, much of the prior state's substructure may need to be copied; the chunk for such a subgoal will have its own condition and associated action for each copied substructure, which can produce a large number of condition-action pairs identical except for their variable names. The optimization removes the copying conditions from the original production and instead creates, for each type of substructure being copied, a new production with a single condition-action pair that copies substructures of that type.

The other optimization is to order the production conditions to try to make matching faster. Each condition is like a query, returning all working-memory elements matching it, and the overall match returns all production instantiations that match the conjunctive queries. The efficiency of the process depends heavily on the order of the queries, so by automatically ordering them SOAR can improve overall efficiency.

Applications of SOAR

SOAR has applications in many areas, including:

SOAR still lacks: deliberate planning, automatic task acquisition, creation of representations, recovery from overgeneralization, and interaction with an external environment.

Robo-SOAR

[Laird et al., 1989] Robo-SOAR controls a Puma robot arm using a camera vision system. The arm manipulates blocks on a table in front of it, while the camera gives a view from above. The arm must align blocks in desired configurations; however, there is a light next to the table, and if the light goes on, the arm must push a button as soon as possible.

Note that in such a system, reactive behavior is supported by productions that test the environment without examining the current operator and act in response to the environment's status. Such productions are quick, but they sacrifice control knowledge in decision making.

Hero-SOAR

Hero-SOAR controls a Hero 2000 robot, a mobile robot with an arm. Its job is to pick up cups and deposit them in a waste basket.

TacAir-SOAR

[Tambe, et al., 1995] An extension of SOAR that controls aircraft in military simulations. The purpose is to allow the construction of a variety of military aircraft that can participate in large-scale (i.e., multiple thousands of units), distributed combat simulations.

TacAir-SOAR raises some of the standard issues in building intelligent agents for simulated environments; they need to be flexible, adaptable, and not rely too heavily on plans (because of the environment's dynamism), but at the same time they can avoid dealing with complex, noisy sensors. Issues instead are: multi-agent coordination, agent modeling, reactive planning.

TacAir-SOAR is the first AI system to have participated directly in an operational military exercise.

Hypotheses of SOAR

SOAR embodies eleven basic hypotheses about the structure of an architecture for general intelligence:




Subsumption Architecture

Rodney Brooks of MIT created the subsumption architecture as a contrast to high-level approaches to agent design. It uses a bottom-up approach, beginning with the simplest behaviors and adding more complex ones on top, rather than trying to design a complete and comprehensive system at the highest level. The architecture is decomposed into task-achieving behaviors (e.g., identifying objects, avoiding objects, building maps) rather than functional modules (e.g., planning, perception, motor control).

Definitions in Subsumption

task-achieving behaviors
[Subsumption] An alternative to breaking a system down into functional units (e.g., sensing, planning, etc.) is to break it down into components that are in charge of distinct tasks (e.g., map-building, wandering, object identification).


module
[Subsumption] A finite state machine, together with a set of instance variables. Layers are composed of modules connected by wires.


layer
[Subsumption] A decomposed element of an agent architecture. An architecture is made up of layers, each devoted to a particular task or set of tasks, and each one having control over the ones below.


wire
[Subsumption] A path along which modules communicate with one another by sending messages. Input wires may be suppressed by a higher layer, and output wires may be inhibited (see the sketch following these definitions).


symbol system hypothesis
[Subsumption] Intelligence operates on a system of symbols. The central reasoning engine operates on domain-independent groups of symbols to handle perception, motor skills, etc.


physical grounding hypothesis
[Subsumption] To build a system that is intelligent, it is necessary to have its representations grounded in the physical world.
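
The suppression and inhibition mechanics on wires, defined above, can be sketched in a few lines of Python (illustrative only; real modules are asynchronous finite state machines):

  def input_value(own_signal, suppressing_signal=None):
      # A higher layer may suppress an input wire, substituting its
      # own signal for the one the module would normally receive.
      return suppressing_signal if suppressing_signal is not None else own_signal

  def output_value(own_signal, inhibited=False):
      # A higher layer may inhibit an output wire, blocking the
      # module's message entirely.
      return None if inhibited else own_signal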


Requirements

The subsumption system is meant to be used to build mobile robots. Brooks sees four main requirements for the design of a robot architecture:

Requirement: multiple goals
Problem: the control system must be responsive to high-priority goals while still servicing low-level goals (e.g., it must be able to get off the railroad tracks, but must also make sure it maintains its balance)
Subsumption solution: individual layers work concurrently on different problems; the subsumption mechanism mediates actions in case of conflicting goals

Requirement: multiple sensors
Problem: the robot must handle multiple, possibly contradictory, sources of input
Subsumption solution: since there isn't necessarily a central representation, this is not really a problem; each layer may process and use sensors according to its needs and irrespective of other layers

Requirement: robustness
Problem: the robot should cope with the loss of some sensors and should handle drastic changes in the environment with graceful degradation; ideally, it will also gracefully degrade as parts of its system become inoperative
Subsumption solution: multiple sensors are handled well; if higher levels can't act, lower levels will continue to perform their fast-reaction actions

Requirement: extensibility
Problem: the robot should improve as new sensors and capabilities are added to the system, along with more processing power
Subsumption solution: each layer can be given its own processor, since layers are very loosely coupled; conversely, one layer may be spread across many processors


Dogmatic Principles

Brooks believes that the symbol system hypothesis is fundamentally flawed. Instead, he argues for the physical grounding hypothesis: to build a system that is intelligent it is necessary to have its representations grounded in the physical world.

The former decomposes intelligence into functional information processing modules whose combination produces system behavior. The latter decomposes intelligence into behavior-generating modules whose coexistence and cooperation give rise to emergent complex behaviors.

(Basically, this argument strikes me as a form of the top-down vs. bottom-up argument. Classical top-down AI never gets to the bottom levels and so doesn't achieve real-world operations, while Brooks has a hard time getting to the upper levels, though his implemented basic procedures do produce some behavior. -- POD)

There are several problems with the symbol system hypothesis, according to Brooks. It implies that intelligence operates on symbols that are representations divorced from the real world. Descriptions come in terms of named, typed entities and their relationships, which makes sets of symbols poorly suited to adapting from one problem to another (makes me think of elaboration tolerance issues -- POD). Such systems become increasingly complex as they deal with belief, knowledge, different problem domains, etc.

The "nouvelle" approach of the physical grounding hypothesis avoids these problems. Traditional symbolic representations are unnecessary; the key observation is the world is its own best model. The trick is to sense it appropriately. Everything must be explicit and directly related to the world.

Brooks contrasts "traditional AI" and "nouvelle AI" in two ways. Traditional AI has tried to demonstrate sophisticated reasoning in impoverished domains. Nouvelle AI tries to demonstrate less sophisticated reasoning in noisy, complex domains. Traditional AI uses heuristics to guide search, while nouvelle AI relies upon the interaction of small behavioral units to create global behavior.

Design Principles

Brooks bases his design decisions upon nine principles:

Levels of Competence

Standard decompositions of the robot problem break it into "vertical slices": information is processed in each slice and passed on to the next for a different kind of processing. Brooks argues this is a bad approach because every piece must be built before the system can function, and any change to a piece must preserve its interface; if it does not, the other modules must change as well.

Brooks' approach is to decompose the problem into levels of competence. Each level contains certain desired behaviors, and higher levels subsume the behaviors of the lower levels. The levels used in Brooks' earlier agents are:

Level Competence
0 avoid contact with objects
1 wander around without hitting things
2 "explore" by seeing distant places and moving toward them
3 build maps and plan routes
4 notice changes in "static" environment
5 reason about objects in world; perform object-related tasks
6 formulate and execute plans that involve changing the world state
7 reason about the behavior of objects in the world and modify plans accordingly

Layers are built from the bottom up, and each layer is completed before the next is added. Layers never need to be altered once they are debugged. Each layer operates unaware of the higher layers, which may interfere with its data paths.
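
To make the layering concrete, here is a minimal sketch in Python. It is my own, and a drastic simplification: in the real architecture higher layers suppress individual wires rather than whole layers, and there is no central arbiter. Sensor names and thresholds are invented:

    from typing import Optional

    def level0_avoid(sonar: list) -> Optional[str]:
        # Competence 0: turn away if any sonar reading is too close.
        return "turn-away" if min(sonar) < 0.3 else None

    def level1_wander(tick: int) -> Optional[str]:
        # Competence 1: pick a new random heading every few seconds.
        return "new-heading" if tick % 100 == 0 else None

    def level2_explore(sonar: list) -> Optional[str]:
        # Competence 2: head for the most distant visible place.
        return "goto-far-place" if max(sonar) > 5.0 else None

    def control_step(sonar: list, tick: int) -> str:
        # All layers run on every step.  A higher layer's proposal
        # suppresses the one below it, while the level-0 avoidance
        # machinery keeps running underneath everything.
        heading = level1_wander(tick) or "keep-heading"
        explore = level2_explore(sonar)
        if explore is not None:
            heading = explore          # layer 2 suppresses layer 1
        avoid = level0_avoid(sonar)
        if avoid is not None:
            heading = avoid            # the low-level reflex still acts
        return heading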

Design of the Architecture

A subsumption architecture consists of a series of layers, starting from a bottom layer of basic functionality and adding new task layers above it as many times as needed. Each layer consists of a small set of processors that send messages to one another. A processor is a finite state machine together with a data store. Processors send messages along wires, with no acknowledgment of receipt (so messages can conceivably be lost).

Processors (also called modules) run asynchronously and have no central control module. The inputs to modules can be suppressed and the outputs inhibited by wires coming from other modules. This is how higher-level modules control those on lower levels.

Finite State Machines

Each module is a finite state machine, augmented with some data storage (registers). It has a number of input and output lines. Input lines have single-element buffers, so the most recent message is always available for inspection. (Messages can be lost if a more recent one overwrites an older one before it is examined.) Each module has a distinguished input called reset.
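
One way to picture such a module in code -- a sketch under invented names, not Brooks' implementation:

    class Module:
        """An augmented finite state machine: an FSM plus registers."""

        def __init__(self, n_inputs, n_registers):
            self.state = "NIL"                 # distinguished start state
            self.inputs = [None] * n_inputs    # single-element buffers
            self.registers = [0] * n_registers
            self.outputs = {}                  # line -> [(module, input index)]

        def receive(self, line, message):
            # A newer message silently overwrites an unread older one,
            # which is how messages can be lost.
            self.inputs[line] = message

        def emit(self, line, message):
            for target, input_line in self.outputs.get(line, []):
                target.receive(input_line, message)

        def reset(self):
            # The distinguished reset input returns the machine to NIL.
            self.state = "NIL"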

Each state of the FSM is named. Initially, all modules start in the distinguished state NIL. There are four types of states:

State Function
output an output message is computed from input and registers, and put on an output line; then a new specified state is entered
side effect a register's value is changed to a new value computed from inputs and registers; then a new specified state is entered
conditional dispatch a predicate on the module's registers and inputs is computed, and one of two subsequent states is entered depending on the outcome
event dispatch a sequence of (condition, state) pairs is monitored until one of the conditions is true, at which point the machine enters the corresponding state; conditions are combinations of arrivals of messages on input lines and expirations of time delays
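
Continuing the sketch above, the four state types might drive a module as follows; the tagged-tuple encoding of states is my own:

    def step(module, table):
        """Advance `module` one step; `table` maps state name -> spec."""
        kind, *args = table[module.state]
        if kind == "output":
            compute, line, nxt = args
            module.emit(line, compute(module.inputs, module.registers))
            module.state = nxt
        elif kind == "side-effect":
            reg, compute, nxt = args
            module.registers[reg] = compute(module.inputs, module.registers)
            module.state = nxt
        elif kind == "conditional-dispatch":
            pred, if_true, if_false = args
            module.state = (if_true if pred(module.inputs, module.registers)
                            else if_false)
        elif kind == "event-dispatch":
            # Wait until one (condition, state) pair fires; conditions
            # combine message arrivals and timer expirations.
            for condition, nxt in args[0]:
                if condition(module):
                    module.state = nxt
                    break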

Communications

Layers consist of modules and wires. An output line from one module is connected to the input lines of one or more other modules. Outputs may be inhibited, and inputs may be suppressed.

A wire may terminate at the output site of a module. Any signal traveling along this wire inhibits output messages from the module along that line for a predetermined time. (NOTE that this was true only in early versions. Later versions inhibited only as long as a message was being sent along the inhibiting wire.) Any messages sent from the module along the output during that time are lost.

Similarly, a wire may terminate at an input site of a module. In addition to preventing input data from reaching the module along that wire, the signal sent along this wire is actually fed through as if it were the original input. Thus it suppresses the usual input and provides a replacement. If more than one suppressing wire is present they are ORed together.

Lastly, an input may be connected directly to a register in a FSM.
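
A sketch of these wire semantics, with time modeled as an integer tick (my simplification) and inhibition following the early fixed-window convention:

    class Wire:
        def __init__(self):
            self.inhibited_until = -1   # output-site inhibition window
            self.suppressor = None      # input-site replacement message

        def inhibit(self, now, duration):
            # A signal at the output site: messages sent along the wire
            # during the window are simply lost.
            self.inhibited_until = now + duration

        def suppress(self, message):
            # A signal at the input site is fed through as if it were
            # the original input (multiple suppressors are ORed together).
            self.suppressor = message

        def deliver(self, message, now):
            if now <= self.inhibited_until:
                return None             # inhibited: message lost
            if self.suppressor is not None:
                return self.suppressor  # suppressed: input replaced
            return message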

The New Subsumption

Later versions of the subsumption architecture allowed behaviors to be specified. These behaviors were groups of tightly-bound AFSMs (augmented finite state machines) that could selectively be activated and deactivated. The behavior compiler takes behaviors and produces subsumption AFSMs; these are then compiled into machine language.

Within a behavior, there can be message passing, suppression, and inhibition (and between behaviors, of course). Behaviors act as abstraction barriers, as one cannot reach into another. Behaviors differ from AFSMs in that they can share registers and alarm clocks.
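
A sketch of the grouping, reusing the Module and step() sketches above; the activation gating is my guess at what the compiled behaviors amount to, not the behavior compiler's actual output:

    class Behavior:
        """A tightly-bound group of AFSMs that can be switched on and off."""

        def __init__(self, modules, tables):
            self.modules = modules    # the AFSMs in this behavior
            self.tables = tables      # state tables, one per module
            self.active = False

        def activate(self):
            self.active = True

        def deactivate(self):
            self.active = False

        def run_once(self):
            if not self.active:
                return                # a deactivated behavior is inert
            for module, table in zip(self.modules, self.tables):
                step(module, table)   # step() from the earlier sketch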

Implemented Systems

Allen

The first subsumption robot. It used an offboard LISP machine to simulate the architecture, with sonar range sensors and odometry onboard. Allen's first layer let it avoid obstacles; the second made it wander randomly, choosing a new direction every few seconds; the third made it look for distant places with sonar and try to move toward them.

Tom and Jerry

Identical, small robots designed to demonstrate how little machinery is necessary to support subsumption. They ran a three-layer subsumption architecture that fit on a single 256-gate PAL chip. They were toy cars with infrared proximity sensors onboard. The first two layers were similar to Allen's: avoiding obstacles and wandering to new places, implemented as summed vector forces.

The third layer detected moving objects with the three sensors and created a following behavior. When something was detected, the robot was attracted and moved toward it. The lower level avoidance behaviors kept them from colliding.

Herbert

Herbert had an onboard, 24-processor system in which the AFSM connections were actual copper wires. Herbert used infrared proximity sensors and a laser light-striping system to navigate. It was programmed to wander around office areas, go into offices, and take soda cans from the desks there. A set of 15 behaviors drove its arm to physically search for soda cans, locate them, and pick them up.

Interestingly, there was no internal communication between the behavior modules; an arbitration network attached to the outputs drove the actuators. The object-finder drove the robot to line its arm up in front of a can; when the arm sensed that the body was no longer moving, it began to move, setting off a cascading sequence of actions that ended with it grasping the can.

Genghis

Genghis is a 1 kg, six-legged robot that walks under subsumption control. It can handle obstacles and rough terrain, and can follow sources of infrared radiation, such as people. Its control system is highly distributed, with each leg driven by its own group of machines.

Squirt

A very small, 50 g robot. It operates by hiding in dark corners and venturing out toward noises, but only after the noise is gone, trying to find a new place to hide near where the noise originated.

The lowest-level behavior monitors a light sensor and causes Squirt to move in a spiral pattern searching for darkness. It stops when it finds a dark spot.

The second level of behavior triggers when a dark spot is located. It monitors two microphones; Squirt waits until it hears a sharp noise followed by several minutes of silence, then moves toward the noise, suppressing its desire to stay in the dark. After this motion times out, it returns to looking for a dark place.
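
As a concrete instance of one level suppressing another, a sketch of Squirt's two behaviors; the sensor names, thresholds, and timings here are invented:

    def hide_step(light):
        # Level 1: spiral in search of darkness; stop when it is found.
        return "stop" if light < 0.1 else "spiral"

    def squirt_step(light, heard_sharp_noise, minutes_of_silence,
                    now, state):
        # Level 2: after a sharp noise followed by several minutes of
        # silence, move toward the noise, suppressing level 1 until the
        # motion times out.
        if state.get("venture_until", -1) > now:
            return "move-toward-noise"         # level 1 suppressed
        if heard_sharp_noise and minutes_of_silence >= 3:
            state["venture_until"] = now + 30  # invented timeout (seconds)
            return "move-toward-noise"
        return hide_step(light)                # level 1 resumes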

Toto

The first system using the new subsumption. Toto uses an intermediate-level set of behaviors to recognize particular types of landmarks, such as walls, corridors, and clutter. Another network is made up of identical behaviors, each waiting for a new landmark to be recognized; when this happens, a free behavior allocates itself to be the "place" of that landmark. Behaviors that correspond to physically adjacent locations have neighbor links activated between them.

Nodes in this graph structure become active according to their correspondence to the current location. (This strikes me as similar to various neural network models -- POD) When a behavior representing the goal of reaching some place is activated, activation spreads outward from the goal along the neighbor links; this process continues as the robot moves. The generation and use of these maps is totally distributed.
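
A sketch of the spreading-activation step; the graph encoding and decay factor are my assumptions, not from the Toto papers:

    def next_place(neighbors, goal, current, decay=0.5, steps=20):
        """neighbors maps each place-behavior to its adjacent places (the
        neighbor links).  Activation spreads outward from the goal; the
        robot then heads for the neighbor of its current place with the
        highest activation, i.e., the one nearest the goal."""
        activation = {place: 0.0 for place in neighbors}
        activation[goal] = 1.0
        for _ in range(steps):
            updated = {}
            for place in neighbors:
                incoming = max((activation[n] for n in neighbors[place]),
                               default=0.0)
                updated[place] = max(activation[place], decay * incoming)
            activation = updated
        return max(neighbors[current], key=lambda n: activation[n])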

Seymour

Seymour contains a video processing system that uses multiple, unreliable vision routines to produce reliable object-following behavior. One process tracks a moving blob; another overlays the blob image with the places where motion is detected. The robot then tries to keep the blob within a fixed location in image coordinates.


Systems Not on the List

Some other important recent work on agents and agent systems that isn't included here but is worth examining includes:



Sources

The sources for this document are the papers listed on my Depth Area Reading List. Please note that most of the text consists of direct quotations or paraphrases of material to be found in these papers. These notes are intended only for my use in reviewing this material.


Patrick Doyle pdoyle@cs.stanford.edu