Agent Architectures
Patrick Doyle | AI Qual Summary | June 3, 1997
[Gat, 1991-1993] ATLANTIS (A Three-Layer Architecture for Navigating Through Intricate Situations). Gat's main thesis is that the argument of planning versus reactivity is really an argument about the use of internal state, which is an argument about making predictions about the world. He proposes that internal state should be maintained only at a high level of abstraction, and that it should be used to guide a robot's actions but not to control them directly.
The fundamental problem of state is that every piece of internal state carries with it an implicit prediction that the information contained in this state will continue to be valid for some time to come. These predictions, coupled with the unpredictability of the world, are the root of the problem. The constructive observation is that state should only be used to record facts that are likely to be at least usually correct.
Abstraction is simply a description of something. The level of an abstraction is simply an inverse measure of the precision of the description. The more abstract, the more broadly applicable the information. Gat gives the example of the location of his house (he knows roughly, but not exactly, where it is). By combining abstract state information with local sensors, he is able to find the house or the mustard in his refrigerator.
Actions based on stored internal state sometimes fail. In humans, occasional errors of this sort are not a problem for two reasons: First, people use the information in their world models to guide their actions but not to control them directly. Second, people can tell when things go wrong and can take corrective action. We carefully engineer our environment to eliminate the opportunity to make the sort of mistakes that are difficult to recover from -- falling off cliffs, for example.
The ATLANTIS action model is based on operators whose execution consumes negligible time; they do not themselves bring about changes in the world but instead initiate processes which then cause change. These processes are called activities, and the operators which initiate (and terminate) them are called decisions.
The architecture consists of three components: the controller, the sequencer, and the deliberator.
The controller is responsible for moment-by-moment control of the robot's actuators in response to the current value of the robot's sensors. The controller is a purely reactive control system.
To support the kinds of transfer functions needed to control the reactive robot, the ALFA (A Language For Action) language was developed. ALFA is similar in spirit to REX but the two languages provide very different abstractions. ALFA programs consist of computational modules which are connected to each other and to the outside world by means of communications channels. Both the computations performed and their interconnections are specified within module definitions, allowing modules to be inserted and removed without having to restructure the communications network.
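A rough sketch of the module-and-channel idea, in Python rather than ALFA (all names here are invented for illustration, not taken from Gat's papers):

```python
# Sketch only: modules read input channels, apply a transfer function, and
# write output channels each tick.

class Channel:
    def __init__(self, value=0.0):
        self.value = value

class Module:
    def __init__(self, inputs, outputs, transfer):
        self.inputs, self.outputs, self.transfer = inputs, outputs, transfer

    def step(self):
        result = self.transfer(*[ch.value for ch in self.inputs])
        for ch, val in zip(self.outputs, result):
            ch.value = val

# Because each module names only the channels it uses, modules can be added
# or removed without restructuring the rest of the network.
sonar, speed = Channel(1.5), Channel()
avoid = Module([sonar], [speed], lambda dist: (0.0 if dist < 0.5 else 1.0,))
avoid.step()
print(speed.value)  # 1.0: no obstacle within half a meter
```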
The sequencer is responsible for controlling sequences of primitive activities and deliberative computations. Controlling sequences is difficult primarily because the sequencer must be able to deal effectively with unexpected failures. This requires careful maintenance of a great deal of state information because the sequencer must be able to remember what has been done in the past to decide what to do now.
The fundamental design principle underlying the sequencer is the notion of cognizant failure. A cognizant failure is a failure which the system can detect somehow. Rather than design algorithms which never fail, we instead use algorithms which (almost) never fail to detect a failure.
The sequencer initiates and terminates primitive activities by activating and deactivating sets of modules in the controller. It can also send parameters to the controller via channels.
The sequencer is modeled after Firby's Reactive Action Package (RAP) system. The system maintains a task queue, which is simply a list of tasks the system must perform. Each task contains a list of methods for performing that task, together with annotations describing under what circumstances each method is applicable. A method is either a primitive action or a list of sub-tasks to be installed on the task queue. The system works by successively expanding tasks on the queue until they either finish or fail. When a task fails, an alternate method is tried.
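The expansion loop might be sketched as follows (illustrative Python, not the actual RAP or ATLANTIS code; methods are reduced to simple dicts and the applicability annotations to a callable):

```python
# Each task name maps to a list of methods; a method is either a primitive
# action or a list of subtasks. A failed method falls through to the next one.

def run_sequencer(task_queue, methods, applicable, execute):
    while task_queue:
        task = task_queue.pop(0)
        for method in methods[task]:
            if not applicable(method):            # annotation: usable now?
                continue
            if method.get("primitive"):
                if execute(method["action"]):     # primitive activity succeeded
                    break
            else:
                task_queue[:0] = method["subtasks"]  # expand onto the queue
                break
        else:
            # Cognizant failure: no method worked, so the failure is noticed
            # and can be handled (here it is simply reported).
            print("task failed:", task)
```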
The main difference between the original RAP system and the ATLANTIS sequencer is that the latter controls activities rather than atomic actions. This requires a few modifications to the RAP system. First, the system must ensure that two activities that can interfere with one another are not active simultaneously. Second, if a primitive activity must be interrupted the system must be sure it is cleanly terminated (since conflicts are controlled by locking resources).
The deliberator is responsible for maintaining world models and constructing plans. It performs all manner of time-consuming computations at a high level of abstraction. The results are placed in a database where they are used as an additional source of input data by the sequencer, rather than directly controlling the robot.
Note that all deliberative computations are initiated (and may be terminated before completion) by the sequencer. The deliberator typically consists of a set of LISP programs implementing traditional AI algorithms. The function of the sequencer and controller is to provide an interface which connects to physical sensors and actuators on one end, and to classical AI algorithms on the other.
[Hayes-Roth, 1985-present] BB1 is a blackboard system for general intelligent agent control.
The central issue in agent design is the control problem: which of its potential actions should a system perform next in the problem-solving process? In solving this problem, an agent system will decide what problems to solve, what knowledge to bring to bear, how to evaluate alternative solutions, when problems are solved, and when to change its focus of attention. In solving the control problem an agent determines its own cognitive behavior.
AI should approach the control problem as a real-time planning problem. BB1 operationalizes intelligent problem solving as achievement of the following goals:
Notice that, in contrast with SOAR, this approach does not consider learning an integral part of intelligent action (it does not purport to address intelligence, just the control problem), and it also makes an explicit distinction between domain and control activity.
The BB1 architecture achieves these goals. Its important features include
The blackboard approach is a problem-solving framework originally developed for the HEARSAY-II speech understanding system. It entails three basic assumptions:
BB1 extends the basic assumptions of a blackboard system as follows:
Control problem solving takes place on the control blackboard. The architecture specifies domain-independent solution intervals (to wit, the position in the sequence of problem-solving cycles) and knowledge sources. It also allows domain-dependent control knowledge sources.
The abstraction levels on the control blackboard are Problem, Strategy, Focus, Policy, To-Do-Set, and Chosen-Action. Decisions at the first four levels determine desirable actions; To-Do-Set decisions describe feasible actions, and Chosen-Action controls the actions scheduled for execution. In general, heuristics at the top four levels are used to choose among pending KSARs at the To-Do-Set level to schedule them for execution at the Chosen-Action level.
A single Problem decision represents the problem the system has decided to solve, and guides the entire problem-solving episode. Creating a Problem decision on the blackboard initiates problem-solving by triggering other domain or control knowledge sources. Changing the Problem's status to solved terminates problem-solving activities.
A Problem's Goal prescribes actions whose knowledge sources apply to its domain (for example, KS Problem-Domain = multiple-task planning). Its Criterion characterizes an acceptable solution (for example, in OPM, this is four requirements: complete solution, all requested tasks planned, all constraints met, plan is efficient).
Strategy decisions establish general sequential plans for problem-solving. Ideally, a single Strategy decision created early in the problem-solving process guides the remainder of the process. Under some circumstances, though, conflicting or complementary Strategies may replace one another or operate simultaneously.
A Strategy's Goal is a prescribed sequence of actions as a predicate on one or more KSAR attribute-description pairs. For example, the Goal Action-Level = successive domain levels causes the Strategy to generate, in turn, Focus decisions at successive levels. The Strategy's Refocus-Criterion is a predicate that, when true, indicates the Strategy should prescribe a new Focus (an example would be a complete and satisfactory solution in the current Focus).
(This is all rather unclear. I think the idea is that the strategy creates a Focus on some level, which holds until some criterion is met, at which point it creates a Focus for another level, and so on until all are satisfied. This implies (to me) a linear approach to strategy, much as you'd expect in a hierarchical planner, by examining successively detailed levels of the plan.)
Focus decisions establish local problem-solving objectives to execute KSARs with particular attributes and values. Sometimes a sequence of Focus decisions implements a previously created Strategy decision; other Focus decisions operate independently of one another and prior Strategy decisions. Focus decisions are explicitly temporary and operate during restricted problem-solving time intervals. Several competing or complementary Focus decisions may operate simultaneously.
Focus decisions are used to rate KSARs. A Focus decision's Goal is a predicate or function on one or more KSAR attribute-value pairs (for example, Action-Level = Outcome prescribes execution of KSARs whose actions occur at the Outcome level). Its Criterion is a predicate that, when true, indicates that the Goal is no longer useful (for example, complete solution at the Outcome level). The Weight indicates the expected utility of the prescribed actions.
Policy decisions establish global scheduling criteria favoring KSARs with particular attributes and values. In contrast to Focus decisions, Policy decisions usually remain operative from their creation until the end of the problem-solving episode. The only exception is when some other knowledge source independently determines that a Policy should become inoperative.
Like Focus decisions, Policy decisions are used to rate KSARs and can influence scheduling.
To-Do-Set Decisions identify all pending KSARs on each problem-solving cycle. They record a cycle-by-cycle history of all problem-solving actions the system can perform. Because knowledge source Triggers are event-based while Pre-Conditions are state-based (and possibly transient), a To-Do-Set's Goal distinguishes a Triggered-List and an Invocable-List of pending KSARs.
Update-To-Do-Set, a basic control KS (see below), places newly triggered KSARs on the Triggered-List. When their Pre-Conditions become true, it moves them to the Invocable-List. If any Pre-Conditions subsequently become false, it moves them back to the Triggered-List. A To-Do-Set's Only-Cycle is the only problem-solving cycle on which it is active.
(I take this to mean that a To-Do-Set decision is actually a device for making a record of all triggered actions in a given cycle. -- POD)
Chosen-Action decisions identify KSARs scheduled to execute on each problem-solving cycle. Thus they record a cycle-by-cycle history of all problem solving actions the system actually performs. A Chosen-Action's Goal is the scheduled KSAR. Its Rationale lists the Focus and Policy decisions that lead to its selection. Its Consequences list the blackboard events produced by executing the KSAR's action. Its Only-Cycle is the only cycle on which it is operative.
The scheduling mechanism comprises three domain-independent, basic control knowledge sources: Update-To-Do-Set, Choose-KSAR, and Interpret-KSAR. The KSARs they generate are invoked directly when their Conditions are satisfied; since they have 'new' or 'changed' decisions in their Conditions, they are guaranteed never to be invoked more than once.
Update-To-Do-Set is invoked when the Status of the most recently created Chosen-Action changes to executed. Its Action increments the problem-solving cycle and creates a new To-Do-Set (in practice, it simply updates a single set -- it's space inefficient to keep making new ones) as follows.
Update-To-Do-Set sets the new To-Do-Set's Only-Cycle to the current cycle. It sets its Goal to the Goal of the most recent prior To-Do-Set, removing the just-executed KSAR from its Invocable-List. It then examines all KSs that specify the current Problem's Problem-Domain and all blackboard events listed as the Chosen-Action's Consequences. For every unique pairing of a KS and an event that satisfies the KS's Trigger, Update-To-Do-Set creates a new KSAR and adds it to the To-Do-Set's Triggered-List. Those whose Pre-Conditions are true are moved to the Invocable-List. It thus evaluates the feasibility of all the system's potential actions except those of the basic control knowledge sources.
Update-To-Do-Set also computes and records each KSAR's Ratings against operative Focus and Policy decisions, and its Priority.
Choose-KSAR is triggered by the creation of a new To-Do-Set. Its Action schedules a KSAR according to the current scheduling rule (recorded in Policy1 -- I take this to mean that Policy1 has the rule, which may be something like sum of all weights according to any Policy and Focus decisions; very ad hoc, if you ask me).
After choosing a KSAR, Choose-KSAR reevaluates the Pre-Conditions on the chosen KSAR. If any of them is no longer true, it moves the KSAR back to the Triggered-List and chooses another KSAR. When it finds a KSAR whose Pre-Conditions are true, it creates a Chosen-Action whose Goal is the chosen KSAR, Rationale is the list of Foci and Policies that favor it, Status is Scheduled, and Only-Cycle is the current cycle.
Interpret-KSAR is triggered by the creation of a new Chosen-Action. Its Action interprets and executes the Action of the KSAR recorded as the Chosen-Action's Goal. It also records the Chosen-Action's Consequences (resulting blackboard events) and changes its Status to 'Executed.' Thus it executes all system actions except those of the three basic control KSs.
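One scheduling cycle, condensed into a sketch (illustrative Python; KSARs are dicts and Focus/Policy decisions are (predicate, weight) pairs, which is far simpler than the real blackboard machinery):

```python
# Sketch of one cycle: Update-To-Do-Set, Choose-KSAR, Interpret-KSAR.

def bb1_cycle(triggered, invocable, decisions):
    # Update-To-Do-Set: move KSARs whose Pre-Conditions now hold onto the
    # Invocable-List (the full system also triggers new KSARs from the
    # previous Chosen-Action's Consequences).
    for ksar in list(triggered):
        if ksar["precondition"]():
            triggered.remove(ksar)
            invocable.append(ksar)

    # Choose-KSAR: rate each invocable KSAR against the operative Focus and
    # Policy decisions and schedule the best one.
    def rating(ksar):
        return sum(weight for goal, weight in decisions if goal(ksar))
    chosen = max(invocable, key=rating)
    invocable.remove(chosen)

    # Interpret-KSAR: execute the chosen action; its return value stands in
    # for the blackboard events recorded as Consequences.
    return chosen["action"]()
```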
HEARSAY-II had a fixed scheduling system that preferred reliable, efficient knowledge sources and credible triggering information. It had no capacity for modifying its scheduling system, so it was not flexible in that respect -- it did not support dynamic control, change in heuristic control granularity, etc.
HEARSAY-III [Erman, London, Fickas] is a general blackboard environment that has separate domain and control blackboards, with a default low-level scheduler that simply schedules any pending KSAR. It provides no specific mechanism for supporting effective control behavior; however, BB1 could be implemented in HEARSAY-III.
"Intelligent agents" continuously perform three functions: perception of dynamic conditions in the environment; action to affect conditions in the environment; and reasoning to interpret perceptions, solve problems, draw inferences, and determine actions. In order to function effectively in AIS niches, agents must possess a pervasive property of human behavior: adaptation.
Two main ideas here. First, the agent maximizes its vigilance while avoiding perceptual overload, by using feedback control and predictive control to manage the global data rate underlying the perceptual strategy. Second, it acquires useful data, while avoiding distraction, by using dynamic control plans to adapt the relevance and abstraction parameters underlying perceptual strategy.
Important here is the diagram: the axes are environmental uncertainty and constraint on effective action.
Again with the diagrams. Axes are the availability of run-time data and the availability of a model.
Intelligent agents in CAIT use the BB1 dynamic control architecture as their cognitive architecture. Basically, BB1 runs a three-step cycle: (a) an agenda manager notices situation-appropriate behaviors triggered by recent perceptual and reasoning events; (b) a scheduler chooses the best available behavior based on the current control plan; and (c) the executor executes the chosen behavior.
The control plan is a key feature of BB1. Because it can be more or less abstract, the scheduler may be able to choose among several logically acceptable behaviors. There can be multiple simultaneous control plans, whose combined weights determine which available action is best. Each component plan can have its own temporal, sequential, or hierarchical structure. Finally, since the control plan is a data structure, executed actions can modify it, thereby modifying the criteria used to schedule future actions.
[Maes, 1989] The network architecture is an approach to solving the problem of action selection in an agent. It integrates conceptual elements of connectionism and traditional AI approaches, though it cannot be classified as a "hybrid" in the traditional sense. It uses a connectionist computational model on a symbolic, structured representation.
In particular, from connectionism it takes robustness, intrinsic parallelism, density, and global emergent computation from local interactions, without relying entirely on learning and classification. From symbolic AI, it uses representation and structuring principles, so the links and nodes have specific, well-understood causal meanings, but it avoids the brittleness and rigidity of traditional AI systems.
The architecture is motivated by Minsky's "Society of Mind" theory, which argues that intelligent systems can be built as societies of interacting, mindless agents, each having its own specific competence.
The network consists of a set of nodes connected via links. Each node represents a competence module, which has an add list, a delete list, a precondition list, and an activation level. Nodes are connected to other nodes via three kinds of links: successor links, predecessor links, and conflicter links.
Activation energy is spread to the nodes from the environment, from the global goals of the system, and between nodes. In addition, inhibition (removal of activation) is produced by protected goals and via conflicter links. Specifically,
The basic algorithm is to spread activation from goals, state and protected goals; compute activation among modules; use a decay function to keep overall activation constant; lastly, pick the executable node with the highest activation level that is over some threshold. If no such node exists, the activation threshold is reduced by 10%.
Two points about balance. To keep activation fair with respect to the number of preconditions, etc., the amount of activation spread is divided by n, where n is the number of (preconditions/add list propositions/delete list propositions) in the target, depending on the kind of spreading. Also, when spreading does occur to multiple competence modules, the amount of energy spread is evenly divided among the target modules.
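A toy rendering of one selection cycle (illustrative Python; the representation and normalization are simplified relative to Maes's actual rules):

```python
# Modules are dicts with precondition/add lists and an activation level.

def select_action(modules, state, goals, threshold):
    # Spread activation forward from the state and backward from the goals,
    # dividing the energy among the target modules and by the size of the
    # matched list in each target (the "fairness" rules above).
    for prop in state:
        targets = [m for m in modules if prop in m["preconditions"]]
        for m in targets:
            m["activation"] += 1.0 / (len(targets) * len(m["preconditions"]))
    for prop in goals:
        targets = [m for m in modules if prop in m["add"]]
        for m in targets:
            m["activation"] += 1.0 / (len(targets) * len(m["add"]))

    # Decay so that total activation stays roughly constant.
    total = sum(m["activation"] for m in modules)
    if total > 0:
        for m in modules:
            m["activation"] *= len(modules) / total

    # Pick the executable module with the highest activation over threshold;
    # the caller lowers the threshold by 10% and retries if none qualifies.
    runnable = [m for m in modules
                if all(p in state for p in m["preconditions"])
                and m["activation"] > threshold]
    return max(runnable, key=lambda m: m["activation"]) if runnable else None
```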
In the reasoning process, goal-relevance of the selected action is obtained through the input from the goals and the backward spreading of activation. Situation relevance and opportunistic behavior are obtained through the input of the state and the spreading of activation forward.
The learning algorithm models much of what is called operant conditioning in the animal behavior literature. Rather than learning new behaviors, it instead learns their effects, the better to decide when they should be applied.
The learning task can be defined as follows. Given a set of competence modules with an initial condition list, a set of initial links (with reliability estimates) and a set of motivations (positive and negative ones), and given a noisy, dynamic, non-deterministic environment, the task is to have every module learn its conflictor links, successor links, and predecessor links with their respective reliabilities.
All links are augmented with weights, fractions of the form S/T, where S represents the number of successes, and T the number of trials with that link. E.g., if there is a predecessor link for a condition from one module to another, then the weight reflects how often activating the latter achieved the condition for the former. The weights of the links represent the reliability of a particular causal relation between modules.
The weights of links are used in the spreading activation process: when one module spreads activation energy to another one, it not only spreads an amount proportional to its own activation level, but also multiplies this amount by the weight of the link.
A second change made to the model is that instead of selecting the most active module whose activation level surpasses the threshold, we now wait until the network comes to an equilibrium, and then select one of the executable modules randomly with a probability proportional to their activation levels.
Link weights are initially set to some value S0/T0. The smaller the value of T0, the more significant any changes will be, so 1/2 is very different from 5000/10000, for example.
Every competence module constantly monitors its conditions. Whenever a condition changes state, the module holds the module that was most recently active as responsible for this change of state. If there was no link yet from the module to the last module for the condition that changed value, a new link is created with initial weight S'/T'. If the change of state was such that the condition changed from being observable (true) to being unobservable, then the link created is a conflictor link. If the condition became observable, the link is a predecessor link.
If there already exists a link between the monitoring module and the most recently active module, then the weight of the link is increased; specifically, the weight goes from S/T to (S+1)/(T+1). Notice that such a value will converge to 1 if there is a strong correlation.
Every module monitors its existing predecessor and conflictor links. Whenever it notices a counter-example for a link, i.e., the condition did not change state as the predecessor or conflictor module became active, the link weight is changed from S/T to S/(T+1).
Every module also monitors which module is active right after the monitoring module has been active itself. If it did not yet have a successor link to that module, it creates one with initial weight S'/T'. If the link already existed, its weight is increased from S/T to (S+1)/(T+1). The weights of all other successor links are decreased from S/T to S/(T+1).
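The bookkeeping reduces to something like this (illustrative Python, with weights stored as (S, T) pairs):

```python
def reinforce(link):      # the condition changed as the link predicts
    s, t = link["weight"]
    link["weight"] = (s + 1, t + 1)

def weaken(link):         # counter-example: the predicted change did not occur
    s, t = link["weight"]
    link["weight"] = (s, t + 1)

def reliability(link):    # used to scale activation spread along the link
    s, t = link["weight"]
    return s / t

# With a strong correlation, repeated reinforcement drives S/T toward 1, while
# counter-examples pull it down; the initial T0 controls how quickly new
# evidence moves the estimate (1/2 adapts much faster than 5000/10000).
```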
[Bates et al., 1984-present] The first (?) comprehensive believable agent architecture, used in the Oz project at Carnegie-Mellon University. Their central goal is to allow users to suspend disbelief; thus instead of demanding that their agents be especially active and smart, they only require that they not be clearly stupid or unreal.
The primary capabilities Oz agents should have are: perception, reactivity, goal-directed behavior, emotion, social behavior, natural language analysis, and natural language generation.
An Oz world has four components. There is a simulated physical environment, a set of automated agents to populate the world, a user interface to allow one or more users to participate in the world, and a planner concerned with the long-term structure of the user's experience.
Tok is the Oz agent architecture. It assigns the agent's tasks to several communicating components. Perception, while partially task specific, is also in part handled by a pair of systems called the Sensory Routines and the Integrated Sense Model. Reactivity and goal-directed behavior are handled by Hap. Emotion and social relationships are the domain of Em. Language analysis and generation are performed by Gump and Glinda, respectively.
Raw sense data is extracted from the world and recorded by Sensory Routines. The relationships between objects are represented as links, thus creating a topological graph of the newly encountered world fragment. The new data is marked with the agent's internal notion of time. When Hap behaviors execute, this low-level memory of raw sense data can be queried for information such as "have I seen food in the kitchen in the last ten minutes?".
After raw data is recorded in the Sensory Routines, an attempt is made to merge them into the Integrated Sense Model (ISM), which maintains the agent's best guess about the physical structure of the whole world.
Note that sense data are either active or passive in Oz worlds. Active sense data model events of sufficient intensity and intrusiveness that they will be sensed by the agent. Passive data model events which require action on the part of the agent in order to be sensed.
Hap is Tok's goal-directed, reactive engine. It is similar in intent to the work of Brooks and Agre and Chapman, but aims to capture a "richer range" of high-level actions. It continuously chooses the agent's next action based on perception, current goals, emotional state, behavioral features and other aspects of internal state. Goals in Hap contain an atomic name and a set of parameters that are instantiated when the goal becomes active; for example, (goto <object>).
Goals do not characterize world states to accomplish, and Hap does no explicit planning. Instead, sets of actions (called plans) are chosen from an unchanging plan memory which may contain one or more plans for each goal. These plans are either ordered or unordered collections of subgoals and actions which can be used to accomplish the invoking goal. Plans have testable preconditions which are true when the plan could apply in the current state of the world. If a plan fails, Hap will attempt any alternate plans for the given goal.
Hap stores all active goals and plans in a structure called the active plan tree (APT). This is an AND-OR tree, with alternating layers of goals and plans representing the current execution state. Each plan node in the tree has some number of goal nodes as its children. These goal nodes are the subgoals which must be satisfied for the plan to succeed. Each goal node either has no children (and is therefore a leaf node) or its child is the current plan for the goal. The APT expands and contracts as goals and plans succeed and fail.
A plan node accomplishes its purpose (it succeeds) once all of its children succeed. It fails if there is no way for one of its subgoals to succeed. A plan can have an explicit success test which must also evaluate to true for the plan to succeed. This test is evaluated after the plan has finished. Plan nodes can either be sequential or parallel; the subgoals of a sequential plan node must be accomplished in order, while the subgoals of a parallel plan may be accomplished in any order.
At the root of the tree is a special node, parent to all of the top-level goals of the agent. These top-level goals are classified according to whether they are persistent. Persistent goals are continuous goals and are never removed from the tree.
Each goal node has an associated success-test and priority. The success-test is used to recognize when a goal's purpose in the enclosing plan is spontaneously achieved. It is an expression about the state of the world. If it succeeds, the node and the subtree rooted at the node are removed. Success tests are sufficient but not necessary for a goal to succeed. The priority is a number representing how important the goal is to the agent. Preferring higher priority goals is one of the methods used by the architecture to arbitrate between multiple goals.
Each plan node carries an associated context condition and specificity. The context condition must always be true for the plan to have any chance of making sense and achieving its goal. Thus it is necessary but not sufficient for the plan to work. Specificity is a measure of how specific each plan is.
Every instance of a goal has a priority number used when choosing a goal to execute and an importance number used by Em when considering the significance of the goal. These annotations are assigned to instances rather than types of goals because goals may have different values depending on the context in which they arise.
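The nodes of the tree might be represented roughly like this (illustrative Python; the field names are invented, not Tok's):

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional

@dataclass
class GoalNode:
    name: str
    priority: int                          # used by the goal arbiter
    importance: int                        # used by Em
    success_test: Callable[[], bool]       # goal achieved spontaneously?
    plan: Optional["PlanNode"] = None      # current plan, or None if a leaf

@dataclass
class PlanNode:
    context_condition: Callable[[], bool]  # must stay true or plan is removed
    specificity: int                       # preferred by the plan arbiter
    sequential: bool                       # subgoals in order vs. any order
    subgoals: List[GoalNode] = field(default_factory=list)
```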
In the Lyotard system, related goals and plans are clustered into behaviors. Each behavior represents a recognizable, internally coherent unit of action. These behaviors are usually activated by a single goal.
Plan memory is a set of production rules. The left-hand side of each production is composed of a goal expression and a precondition expression. The goal expression contains the goal name and zero or more variables. For a subgoal to match a goal expression, the names must match and the number of variables must match. In addition, the precondition expression must be true in the world state for the production to be applicable. The precondition expression can reference the values from the goal expression variables in its tests.
The right-hand side of a production contains its context condition, specificity and plan expression. Together they are instantiated to create the plan and subgoal nodes for the APT. Each plan step contains a goal expression, a priority modifier and a success-test expression. The goal expression and success test expression are used directly to create the appropriate goal node, and the priority modifier is added to the priority of the parent goal node to create the priority of this node. In this way a subgoal can be more, less or the same level of importance as the goal that spawned it.
The execution loop consists of three steps: adjust the APT based on changes in the world, pick a leaf goal to execute, and execute the goal. Executing the goal can take the form of performing a primitive act, or expanding a subgoal.
After sense data is processed, Hap revises the APT. All context conditions and success tests are evaluated. Goals whose success test is true and plans whose context condition is false are removed. Next a leaf node is chosen by a goal arbiter that prefers high priority goals and prefers continuing a line of expansion among goals of equal priority. If the chosen goal is a primitive action, it is executed. Otherwise, it is a subgoal, in which case the plan library is indexed and the plan arbiter chooses a plan for the new goal from those whose preconditions are true. The plan arbiter will not choose plans which have already failed to achieve this goal instance, and prefers more specific plans over less specific ones. After execution, the execution loop repeats.
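In outline (illustrative Python; the arbiters and the world interface are passed in as stand-ins for Hap's actual machinery):

```python
def hap_step(apt, plan_library, world, goal_arbiter, plan_arbiter):
    # 1. Adjust the APT: remove goals whose success test is true and plans
    #    whose context condition has become false.
    apt.prune(world)

    # 2. Goal arbiter: prefer high-priority goals, then continuing the
    #    current line of expansion among goals of equal priority.
    goal = goal_arbiter(apt.leaf_goals())

    # 3. Execute: perform a primitive act, or index the plan library and let
    #    the plan arbiter pick among plans whose preconditions hold, skipping
    #    plans that already failed for this goal instance.
    if goal.is_primitive():
        world.perform(goal.action)
    else:
        goal.plan = plan_arbiter(plan_library.lookup(goal), world)
```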
Em models social and emotional aspects of the agent, based on the work of Ortony. Like that work, Em develops emotions from a cognitive base: external events are compared with goals, actions are compared with standards, and objects are compared with attitudes.
As Hap runs, goals are created, succeed, and fail. As these events occur, Hap informs Em, and Em in turn uses this information to generate many of its emotions. Happiness and sadness occur when the agent's goals succeed or fail. The degree of happiness or sadness depends upon the importance of the goal. Hope and fear occur when Em believes there is some chance of an active goal succeeding or failing. The amount is determined by a function of the goal's importance and the believed likelihood of success or failure.
Pride, shame, reproach, and admiration arise when an action is either approved or disapproved. These judgments are made according to the agent's standards, which represent moral beliefs and personal standards of performance. Pride and shame occur when the agent itself performs the action; admiration and reproach in response to others' actions.
Anger, gratitude, remorse and gratification arise from combinations of other emotions. So, for example, sadness and reproach combine to produce the composite emotion of anger.
Em's final two emotions, love and hate, arise from noticing objects toward which the agent has positive or negative attitudes. Emotions (but not attitudes) should fade with time, and Em models this decay.
Behavioral features modulate the activity of Hap. Em adjusts the features to express emotional influence on behavior. It continuously evaluates a set of functions that control certain features based on the agent's emotional state. Hap modifies the features when it wants to force a style of action. Note that either Hap or Em can change the features to control how Hap achieves its goals.
Features may influence several aspects of Hap's execution. They may trigger demons that create new top-level goals. They may occur in the preconditions, success tests, and context conditions of plans, and so influence how Hap chooses to achieve its goals. Finally, they may affect the precise style in which an action is performed.
Examples of features in Lyotard include curious, belligerent, persistent, timid, and arrogant.
[Agre and Chapman, 1988-1990] Human beings routinely engage in the process of planning, which involves rearrangement, interpolation, disambiguation, and substitution. Computer-based planners instead often presume an intelligent Planning system feeding its results to a dumb Executor. But this sort of approach does not scale to the properties of the real world, which is complex, uncertain, and immediate.
The hypothesis is that most activity derives from very simple sorts of mental machinery interacting with the immediate situation, producing apparently planful activity without building explicit models of the world.
Pengo is a video game played on a two-dimensional maze made of unit-sized ice blocks. The player navigates a penguin around the maze. Bees chase the penguin, and kill him if they get close enough. The penguin and the bees can change the maze by kicking ice blocks to make them slide; if a block slides into a creature, it dies.
Pengi's activity is guided by relevant properties of the immediate situation (called indexical-functional aspects). The indexicality of these aspects arises from their dependence on the player's current circumstances. They are functional because they are relative to the player's purposes.
Note that this approach does not represent individuals (the-bee-I'm-chasing might change from moment to moment), and therefore avoids the overhead of instantiation: binding constants to variables. In most KR systems, to decide to kick a block at a bee requires reasoning from some general statement that in every situation satisfying certain requirements there will be some bee and some block that should be kicked at it. To make this concretely useful, it must be instantiated with particular individuals.
Entities are not logical categories because they are indexical: their extension depends on the circumstances.
The simple architecture used in Pengi consists of a central system and peripheral systems. The central system is responsible, roughly, for cognition: registering and acting on relevant aspects of the situation. The peripheral systems are responsible for perception and effector control.
It's assumed that combinational networks are an adequate central system for most activities.
[Ullman, 1984] developed a theory of vision based on visual routines which are patterns of interaction between a central system and a visual routines processor (VRP). The VRP maintains several modified internal copies of the 2-D sketch produced by early vision. It can perform operations on these images, such as coloring in regions, tracing curves, keeping track of locations in the image, indexing interesting features, and tracking moving objects.
Outputs from the central network guide the VRP's operations, and its outputs are inputs to the network. A visual routine, then, is a process whereby the VRP, guided by the network, finds entities and registers aspects of the situation, and finally injects them into the inputs of the network.
Actions are suggested on the basis of local plausibility, but suggested actions may conflict. In such cases, only one action must be selected. Usually, which action to take will depend on aspects of the situation; sometimes, certain actions always take precedence over others. The system employs levels of arbitration: actions are suggested and may be overruled; the overruling itself can be overruled, counter-proposals may be made, and so on.
This approach is more efficient than Planning because it neither requires representation nor search of future worlds. Note this implies a lack of state; this is not to say that state is never needed, but that it is much less critical than is often assumed.
There are two fundamental kinds of plans: plans-as-programs, in which plan use is simply a matter of executing a fixed sequence of actions, and plans-as-communications, which require improvisation in a dynamic environment.
This view takes plan use to be like program execution, in which a set of primitives (such as PUT-ON) are composed into a structure which is then executed in a mechanical manner, with little or no new reasoning performed during execution. This view implies domain-independent plan execution.
Four things weighing against this view are:
Perhaps it is better to think of agents as participating in the flow of events. An embodied agent must lead a life, not solve problems; the real world does not provide such neat packages.
The details of the thesis are obscure to me. The idea is that, rather than being given a worked-out set of instructions to follow, the user is given a set of situated recommendations or guidelines that are situationally appropriate either for the time they are given or the time they are meant to be used.
The domain of video games is explored. Video game instructions from one player to another can be highly compact because their possible meanings are constrained by
Communication doesn't pick up a "meaning" from my head and set it down in yours. Instead, communication is part of the work of maintaining a common reality.
[Carbonell, Knoblock, and Minton, 1990's] The PRODIGY system is an integrated architecture unifying problem solving, planning and multiple learning methods.
PRODIGY has the following characteristics:
A summary of each PRODIGY module:
These modules all use a common representation based on first-order logic, and all access global knowledge sources. Any learned knowledge produced by one module is directly exploitable by the general problem solver and open to examination by another learning module.
PRODIGY's basic reasoning engine is a general-purpose problem solver and planner that searches for sequences of operators (i.e., plans) that accomplish a set of goals from an initial state description. Search control in PRODIGY is governed by a set of control rules that apply at each decision point: selection of which node to expand, which goal to work on, which operator to apply, and which objects to use. Search control rules may be general or domain specific, hand-coded or automatically acquired, and may consist of heuristic preferences or definitive selections. In the absence of any search control, PRODIGY defaults to depth-first means-ends analysis.
Each problem-solving operator has a precondition expression that must be satisfied before the operator can be applied, and an effects-list that describes how the application of the operator changes the world. Precondition expressions are well-formed formulae in PDL, a form of predicate logic encompassing negation, conjunction, disjunction, and existential quantification, as well as universal quantification over sets. The effects-list indicates atomic formulae that should be added or deleted from the current state when the operator is applied.
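For concreteness, a blocks-world operator in this spirit might look as follows (a hand-made illustration; the syntax is a stand-in, not PRODIGY's actual PDL):

```python
# A single operator: a precondition expression plus add/delete effects.
PICKUP = {
    "name": "PICKUP",
    "preconditions": "(and (clear ?x) (on-table ?x) (arm-empty))",
    "effects": {
        "add":    ["(holding ?x)"],
        "delete": ["(on-table ?x)", "(clear ?x)", "(arm-empty)"],
    },
}
```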
At any particular moment, the current state of the world is represented by a database of ground atomic formulae. There are two types of relations used in the system, primitive relations and defined relations. Primitive relations are directly observable or "closed-world," so their truth values can be immediately ascertained by a call to the state database. Primitive relations may be added to and deleted from the state database by operators. Primitive relations that are static (such as time relations, for example) cannot be changed by any operator and can be computed on demand.
Defined relations (such as IDLE or CLAMPABLE) are inferred on demand using inference rules. The purpose of defined relations is to represent useful abstractions in the domain, allowing operator preconditions to be expressed more concisely. Inference rules only change the system's internal knowledge state, whereas operators change both the external world and the system's internal state.
The problem-solving process begins with a search tree containing one node representing the initial state, and a set of initial goals that must be achieved to satisfy the goal expression. The tree expands by repeating the following steps:
Search terminates after creating a node whose state satisfies the top-level goal expression.
As PRODIGY attempts to solve a problem, it must make decisions about which operator to use and which subgoal to pursue. These decisions can be influenced by control rules for the following purposes:
PRODIGY's reliance on explicit control rules distinguishes it from most domain-independent problem solvers. PRODIGY expects that any important decisions will be guided by the presence of appropriate control knowledge. If no control rules are relevant, PRODIGY makes a quick, arbitrary choice. If a wrong choice is made and costly backtracking proves necessary, PRODIGY will attempt to learn the control knowledge that must be missing. The rationale for this casual commitment strategy is that control rules should be present for significant decisions. (The problem solver should not attempt to be clever without knowledge.)
Control rules can be employed to guide the four decisions mentioned in the previous section. Each control rule has a left-hand side condition testing applicability and a right-hand side indicating whether to SELECT, REJECT, or PREFER a particular candidate.
To make a control decision, given a set of candidates, PRODIGY first applies applicable selection rules to select a subset of the candidates. Next, rejection rules further filter this set by explicit elimination of particular remaining candidates. Finally preference rules are used to order the remaining alternatives. The most preferred candidate is one over which no other candidate is preferred; if backtracking is needed, candidates are tried in order of preference.
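The three-stage filtering can be sketched like this (illustrative Python; rules are reduced to (kind, condition, test) records, and preference is collapsed to a score rather than a pairwise relation):

```python
def decide(candidates, rules, state):
    applicable = [r for r in rules if r["condition"](state)]

    # Selection rules pick out a subset; if none fires, keep all candidates.
    selected = [c for c in candidates
                if any(r["kind"] == "SELECT" and r["test"](c) for r in applicable)]
    if not selected:
        selected = list(candidates)

    # Rejection rules eliminate particular remaining candidates.
    remaining = [c for c in selected
                 if not any(r["kind"] == "REJECT" and r["test"](c) for r in applicable)]

    # Preference rules order what is left; backtracking tries them in order.
    prefer = [r for r in applicable if r["kind"] == "PREFER"]
    remaining.sort(key=lambda c: -sum(1 for r in prefer if r["test"](c)))
    return remaining
```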
Explanation-based learning is a way for a problem solver to improve effectiveness by learning methods for solving problems. However, EBL is not guaranteed to improve problem solving. Indeed, in many cases performance may actually degrade. The trouble is that there is a hidden cost associated with such learned control knowledge -- the cost of determining whether it is applicable. To produce real efficiency improvement, an EBL program must generate control knowledge that is effective -- its benefits must outweigh its costs. PRODIGY considers what kinds of concepts are worth learning, how best to represent the knowledge, and how useful it is with respect to the set of training problems. In this sense PRODIGY is a deliberative learner, in contrast to a reflexive learner such as SOAR.
EBL begins with a high-level target concept and a training example for that concept. Using a domain theory (a set of axioms describing the domain) one can explain why the training example is an instance of the target concept. By finding the weakest conditions under which the explanation holds, EBL will produce a learned description that is both a generalization of the training example and a specialization of the target concept. The learned description must satisfy the operationality criterion, a test ensuring that the description will be an efficient recognizer for the target concept.
The actual purpose of EBL is not to learn more about the target concept, but to re-express the target concept in a more "operational" manner. The following factors can contribute to performance degradation:
After each problem-solving episode, explanations are generated in a three-stage process. First, the system considers what to learn from a problem-solving trace, then it considers how best to represent the learned information, and finally it empirically tests the utility of the resulting control rules to ensure that they are actually useful.
In PRODIGY, EBL can either be interleaved with problem solving or postponed until after the current problem has been solved. In either case, the EBL process begins by examining the problem-solving trace to pick out examples of PRODIGY's target concepts. There are four kinds of concepts; each kind has a variation for nodes, goals, operators and bindings.
After an example of a target concept is selected, PRODIGY constructs an explanation using a set of architecture-level axioms that serve as its theory of the problem solver, and a set of domain-level axioms that are automatically derived from the domain operators.
PRODIGY constructs the explanation by specializing the target concept in accordance with the example using a method called explanation-based specialization (EBS) [Minton]. To specialize a target concept, EBS retrieves an axiom that implies the concept (all axioms are implications) and recursively specializes all its nonprimitive subconcepts. The algorithm operates in a manner similar to that of a top-down parser. In order to choose among appropriate axioms, PRODIGY allows explicit discriminator functions to be associated with the theory. These functions aid the explanation construction process by examining the search tree and selecting the axioms corresponding to the example.
PRODIGY learns meta-level concepts (such as SUCCEEDS) that describe arbitrary problem-solving phenomena, rather than learning from solution sequences. This is a significant difference between it and other systems.
The purpose of compression is to reduce the match cost of the learned descriptions (and thereby increase the utility of the resulting search control rule). After the initial learned description is formed, PRODIGY's compression module employs partial evaluation, simple logical transformations, and finally a theorem prover which can use domain-specific simplification axioms.
Utility is given by this formula:
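Presumably this is Minton's standard estimate, Utility = (AvrSavings x ApplicFreq) - AvrMatchCost: the average search time saved when the rule applies, times how often it applies, minus the average cost of matching it.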
After learning a control rule, the system produces an initial estimate of the rule's utility based on the training example that produced it. PRODIGY compares the cost of matching the rule against the cost of exploring the portion of the search tree that need not be considered because of it. If the new rule appears useful, it is included in the active set. The utility estimate is empirically validated by maintaining statistics on the rule during execution, and if utility becomes negative the rule is forgotten.
PRODIGY is capable of planning at different levels of abstraction. Basically, it picks an "appropriate" level of abstraction, then refines its abstract plans. Abstraction spaces are formed by ignoring certain properties in the original problem space. That is, each level is an abstract model of lower ones, rather than simply dropping preconditions or goals.
(There is some lengthy discussion of using the ALPINE system [Knoblock] in generating abstraction spaces, which I don't summarize here. -- POD)
PRODIGY has been applied to many domains: robotic path planning and assembly, the blocks world, the STRIPS domain, an augmented version of the STRIPS domain, matrix algebra manipulation, machine-shop scheduling, multi-agent robot planning, and others.
[Georgeff and Lansky, 1987] The Procedural Reasoning System (PRS) is a reasoning system that can reason and plan in continuously changing environments. Emphasis is placed on the fact that it can reason effectively about its actions and plans while taking into account bounded resources. The system was initially developed to deal with malfunction handling in the space shuttle's reaction control system (RCS).
PRS is compared to Firby's RAP, which is similar but lacks "sufficiently powerful" mechanisms for balancing decision-making requirements against time and information constraints. Brooks and the situated automata approach, they argue, are reactive but do not deal with general problem-solving and commonsense reasoning issues. All are needed in a general agent architecture.
The essential features of a situated reasoner are:
PRS consists of a database of current beliefs, a set of current goals, a library of plans called Knowledge Areas (KAs), and an intention structure containing the plans chosen for execution.
An interpreter (or inference mechanism) manipulates these components, selecting appropriate plans based on the system's beliefs and goals, placing those selected on the intention structure, and executing them.
The system interacts with the environment through the database (which acquires new beliefs in response to changes in the environment) and through the actions it performs as it carries out its intentions.
The contents of the database represent the current beliefs of the system. These include static facts about the world (e.g., the boiling point of water) as well as beliefs acquired during execution, such as current observations about the world or conclusions derived by the system from such observations. The knowledge contained in the database is represented in first-order calculus.
State descriptions that describe internal system states are called metalevel expressions. These typically describe the beliefs, goals, and intentions of the system, as well as other processing information.
Goals are expressed as conditions over some interval of time (i.e., some sequence of world states) and are described by applying various temporal operators to state descriptions. This allows the representation of a wide variety of goals, including goals of achievement, maintenance, and tests of conditions. A given action (or sequence of actions) is said to succeed in achieving a goal if its execution results in a behavior that satisfies the goal description.
As with state descriptions, goal descriptions are not restricted to specifying desired behaviors of the external environment but can also characterize the internal behavior of the system. Such descriptions are called metalevel goal specifications.
Knowledge about how to accomplish goals or react to situations is represented by declarative procedure specifications called Knowledge Areas (KAs). Each KA consists of a body, which describes the steps of the procedure, and an invocation condition, which specifies under what situations the KA is applicable. Together, the invocation condition and body of a KA express a declarative fact about the results and utility of performing certain sequences of action under certain conditions.
The body of a KA is a plan or plan schema. It is represented as a graph with one distinguished start node and one or more end nodes. The arcs in the graph are labeled with the subgoals to be achieved in carrying out the plan. Successful execution of a KA consists of achieving each of the subgoals labeling a path from the start to an end node. If multiple arcs emanate from a node, any arc may be followed, and if it fails other arcs may be tried. This formalism allows conditional selection, iteration, and recursion, among other things.
The invocation condition contains a triggering part describing the events that must occur for the KA to be executed. Usually these consist of the acquisition of some new goal or some change in system beliefs, and may involve both.
Some KAs have no bodies; these so-called primitive KAs have some primitive action associated with them that is directly performable by the system.
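A hypothetical KA in this style (an invented example loosely themed on the RCS domain; the encoding is illustrative, not PRS syntax):

```python
# The body is a graph: arcs are labeled with subgoals, and any path from the
# start node to an end node whose labels are achieved executes the KA
# successfully. Multiple arcs out of a node give conditional alternatives.
KA_REOPEN_VALVE = {
    "invocation": "new goal: (achieve (open valve-1))",   # triggering part
    "start": "n0",
    "ends": {"n2"},
    "arcs": [
        ("n0", "n1", "(achieve (pressure-normal valve-1))"),
        ("n1", "n2", "(achieve (position valve-1 open))"),
        ("n0", "n2", "(achieve (manual-override valve-1))"),  # alternative path
    ],
}
```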
PRS not only has KAs with procedural knowledge about a particular task domain, but also includes metalevel KAs -- that is, information about the manipulation of the beliefs, desires, and intentions of PRS itself. For example, typical metalevel KAs encode various methods for choosing among multiple applicable KAs, modifying and manipulating intentions, and computing the amount of reasoning that can be undertaken given the real-time domain constraints.
The intention structure contains all those tasks the system has chosen for execution, either immediately or at some later time. These tasks are called intentions. A single intention consists of some initial KA together with all the [sub] KAs that are being used in an attempt to successfully execute that KA.
At any given moment, the intention structure may contain a number of intentions, some of which may be suspended or deferred, some of which may be waiting for certain conditions, and some of which may be metalevel intentions for deciding among various alternative courses of action.
The set of intentions comprising the intention structure forms a partial ordering with possibly multiple least elements (roots). An intention earlier in the ordering must either be realized or dropped (i.e., removed from the intention structure) before intentions appearing later in the structure can be executed. This precedence relationship between intentions allows the system to establish priorities, etc.
The interpreter runs the entire system. At any particular time, certain goals are active in the system and certain beliefs are held in the database. Given these, a subset of KAs will be applicable (i.e., invoked). One or more of these applicable KAs will be chosen for execution and thus placed in the intention structure.
To determine a KA's applicability, beliefs and goals are matched with invocation conditions via unification, rather than using deduction. However, PRS is always able to perform deductions by invoking appropriate metalevel KAs. These metalevel KAs are themselves interruptible, to maintain the system's reactivity.
Once selected, the chosen KAs are inserted into the intention structure. If a selected KA arose due to the acquisition of a new intrinsic goal or a new belief, it is inserted into the structure as a new intention. Otherwise, the KA instance must have arisen as a result of some operational goal of some existing intention, and will be pushed onto the stack of KAs comprising that intention.
Finally, one of the roots of the intention structure is selected for further execution. The next step of that intention will comprise either a primitive action or one or more unelaborated subgoals. If the former, the action is performed; if the latter, these subgoals are posted as new operational goals of the system.
At this point the interpreter cycle begins again: the newly established goals and beliefs trigger new KAs, one or more of these are selected and placed on the intention structure, and finally an intention is selected from that structure and partially executed.
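Schematically (illustrative Python; the matching, metalevel-choice, and execution machinery are passed in as stand-in callables):

```python
def prs_cycle(database, goals, kas, intentions, match_kas, choose, step):
    # New beliefs and goals invoke KAs by unifying with invocation conditions.
    applicable = match_kas(kas, database, goals)

    # One or more applicable KAs (possibly selected by metalevel KAs) are
    # placed on the intention structure, either as new intentions or pushed
    # onto an existing one.
    for ka in choose(applicable):
        intentions.add(ka)

    # A root intention is selected and partially executed: a primitive action
    # is performed, or new subgoals are posted for the next cycle.
    goals.extend(step(intentions.select_root(), database))
```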
PRS is designed to function acceptably as an embedded reasoning system no matter how little decision knowledge is provided; the more knowledge, the more effective it is, but always with a guaranteed upper bound on reaction time.
This is accomplished by incorporating into the interpreter certain fixed decision-making processes that are stringently bounded in execution time, yet which can be overridden whenever the system can bring more powerful decision-making knowledge to bear. In PRS, the fixed decision-making processes are hardwired into the basic system interpreter, while the knowledge to override or elaborate these decisions is contained in appropriate metalevel KAs.
The way metalevel KAs are brought to bear on a problem is through their invocation criteria. Typically, such conditions depend on the system's internal state (though they may also depend on the world state). Such conditions might include, for example, the applicability of multiple KAs in the current situation, the failure to achieve a certain goal, or the awakening of some previously suspended intention.
In the process of execution, metalevel KAs typically make choices about the adoption of lower-level KAs, the posting of alternative subgoals, or the best way to respond to goal failures. Thus, when decisions have to be made in the absence of information, the interpreter defaults to some fixed decision procedure. However, if metalevel KAs can bring knowledge to bear on the problem, this will override the default.
This allows PRS to have a simple, basic interpreter, without it needing to consider more than the most common situations. Note, though, that the decision-making behavior of PRS is strongly influenced by the choice of invocation conditions of meta-level KAs; if they are such that meta-KAs are often invoked, the system will spend more time making decisions than acting (i.e., behaving cautiously). If, on the other hand, meta-level KAs are rarely invoked the system will behave in a bold manner with respect to its world.
In PRS, an intention can be in one of three possible states: active, suspended, or conditionally suspended. An active intention is one that is available for execution as soon as it becomes a root of the intention structure. A suspended intention is one that PRS has adopted but for which no decision has been made as to when it should be carried out; that is, it must be explicitly activated before it can be executed. A conditionally suspended intention (or simply conditional intention) is one that is temporarily suspended until some other specified condition (its activation condition) is satisfied.
Intentions are suspended by means of metalevel KAs. When a conditional intention is reactivated, it is necessary to decide whether or not to reorder the intention structure, and what to do, if anything, with current executing intentions. In the absence of any decision knowledge, it is desirable to ensure immediate response to the condition that activated the intention. Thus the basic interpreter is designed to begin execution of the reactivated intention immediately, suspending all currently executing intentions. Of course, metalevel KAs can override this decision.
Various metalevel KAs exist for reordering intentions on the structure. In the absence of any decision knowledge the basic interpreter inserts new intentions at the root of the existing structure, and removes them when completed. However, metalevel KAs can alter the ordering based on priorities, time, etc.
The role of commitment to previously adopted plans or intentions has been claimed to be a critical component of rational agency. Unless some metalevel KA intervenes, PRS will perform its means-ends reasoning and other planning in the context of its existing intentions. So, for example, if it has a plan for achieving a goal, it will not reconsider other ways of achieving it while it works on the plan. The gain is in reducing decision time -- in highly dynamic domains it is not possible to continually reassess one's plans of action.
Once a goal is established, PRS will try to solve it. If its solution should fail, it will try another one, and so on until it has exhausted all possibilities (or a meta-level KA changes the situation). The basic system interpreter tries each possible solution exactly once. Obviously the meta-level KAs can change the search strategy, cause it to attempt a solution more than once, etc.
Response time is the time the system takes to recognize and respond to an external event. Thus a bound on reaction time (the time the system takes to notice changes in its environment) is a prerequisite for providing a bound on response time.
If p is an upper bound on the execution time of primitive actions, n is an upper bound on the number of events that can occur within a time unit, and t is the most time the PRS interpreter takes to select the set of applicable KAs, then the maximum reactivity delay is D_R = p / (1 - n*t), where we assume that t < 1/n.
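As a quick sanity check of this bound (the numbers below are made up, not from the paper):

```python
p = 0.10   # upper bound on primitive-action execution time (seconds)
n = 4      # upper bound on events per second
t = 0.02   # worst-case time to select the set of applicable KAs (seconds)

assert t < 1.0 / n                 # required for the bound to hold
D_R = p / (1 - n * t)              # maximum reactivity delay
print(D_R)                         # 0.10 / 0.92, about 0.109 seconds
```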
In testing, PRS met every criterion outlined by Laffey et al. for evaluating real-time reasoning systems: high performance, guaranteed response, temporal reasoning capabilities, support for synchronous inputs, interrupt handling, continuous operation, handling of noisy data, and shifting of focus of attention.
The features of PRS that the authors believe contributed most to its success in the RCS domain were (1) its partial planning strategy, (2) its reactivity, (3) its use of procedural knowledge, and (4) its meta-level (reflective) capabilities. In particular, the manner in which the system integrates its means-ends reasoning with the use of decision knowledge is considered an important part of rational activity.
[Albus, 1991] The Real-time Control System (RCS) is a reference model architecture for intelligent real-time control systems. It partitions the control problem into four basic elements: task decomposition, world modeling, sensory processing, and value judgment.
RCS-1 was developed by Barbera in the mid-1970s. RCS-2 was developed for NIST by Barbera and others in the mid-1980s. RCS-3 was created by Albus in 1988 for DARPA's Multiple Autonomous Undersea Vehicle project and adapted for the NASA Standard Reference Model Telerobot Control System Architecture.
[Rosenschein, Kaelbling, late 1980's] The theoretical foundation of the situated automata approach is based on modeling the world as a pair of interacting automata, one corresponding to the physical environment and the other to the embedded agent.
The world and the agent are each modeled as an automaton with local state that varies as a function of signals projected from the other. Situated-automata theory provides a principled way of interpreting data values in the agent as encoding facts about the world expressed in some language whose semantics are clear to the designer. Such semantics of data structures are defined in terms of objective correlations with external reality.
In this approach, a machine variable x is said to carry the information that p in world state s, written s |= K(x, p), if for all world states in which x has the same value as it does in s, the proposition p is true.
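A toy rendering of this correlation semantics (my own, not from the papers), with the reachable world states simply enumerated:

```python
def carries_information(reachable, value_of_x, p, s):
    """True iff x carries the information that p in state s: p holds in every
    reachable world state where x has the same value it has in s."""
    v = value_of_x(s)
    return all(p(w) for w in reachable if value_of_x(w) == v)

# toy example: a bump-sensor bit that is set only when the robot touches a wall
states = [{"x": 0, "at_wall": False}, {"x": 1, "at_wall": True}]
print(carries_information(states, lambda w: w["x"], lambda w: w["at_wall"], states[1]))
# True: the value x = 1 is correlated with the fact "the robot is at a wall"
```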
Decomposing an agent into separate perception and action components is a useful way to structure the design process.
Brooks has instead advocated horizontal decompositions that cut across perception and action; this allows the designer to consider simultaneously those limited aspects of perception and action needed to support particular behaviors. This discourages the pursuit of "spurious generality" that can stall real progress.
These attractive features are offset, however, by the degree to which horizontal decomposition encourages linear thinking. It does not encourage the development of elements that can recombine with one another to produce complex patterns of behavior. The alternative is a vertical decomposition based on separate system modules that recover information from multiple sources and use it for multiple purposes; this is a more attractive approach to making efficient use of programmer effort.
One way to characterize an agent's states semantically is in terms of the information they embody. The perception component delivers information, and the action component maps this information to action. In many cases, however, it is more natural to describe actions as functions both of information and goals.
There are two kinds of goals: static and dynamic. Static goals are statements the agent's behavior is designed to make true. Dynamic goals correspond to the more usual sense of the word, and can be defined as follows: having a goal p is carrying the information that p implies a fixed top-level goal called N, for "Nirvana." Formally, the goal operator G is defined by G(x, p) == K(x, p -> N).
In this model, x has the goal p if x carries the information that p implies Nirvana. Note that p can be an indexical statement whose truth varies with time. Under this definition, goals and information are dual concepts.
The deductive closure axiom: the information an agent carries is closed under logical consequence, i.e., from K(x, p) and K(x, p -> q) it follows that K(x, q).
The subgoaling axiom: if G(x, q) and K(x, p -> q), then G(x, p).
This says that if the agent has q as a goal, and carries the information that q is implied by some other, more specific condition p, the agent is justified in adopting p as a goal. (This follows directly from the definition: G(x, q) is K(x, q -> N), which together with K(x, p -> q) and deductive closure gives K(x, p -> N), i.e., G(x, p).)
Rex is a language with LISP's recursive power, used at compile time to specify a synchronous digital circuit. However, it is only a low-level, operational language, more akin to standard programming languages than to declarative AI languages. Ruler is a higher-level, declarative language based on "informational" semantics, used to specify the perception component of the agent. Gapps is based on "goal" semantics and is meant to be used to specify the action component of the agent.
Gapps is a language for specifying behaviors of computer agents that retains the advantage of declarative specification, but generates run-time programs that are reactive, do parallel actions, and carry out strategies made up of very low-level actions.
The Gapps compiler takes as input a declarative specification of the agent's top-level goal and a set of goal-reduction rules, and transforms them into the description of a circuit that has the output of the perception component as its input, and the output of the agent as a whole as its output. The agent's output may be divided into a number of separately controllable actions.
There are three primitive goal types: goals of execution, achievement, and maintenance.
Goals of execution are of the form do(a), where a is an instantaneous action the agent can perform; the goal is simply to perform the action. Maintenance goals, called maint(p), strive to maintain the truth of proposition p as long as possible. Achievement goals, of the form ach(p), try to bring about the truth of p as soon as possible.
The set of goals is made up of the primitive goal types, closed under the Boolean operators. The notions of achievement and maintenance are dual, so ~ach(p) == maint(~p) and ~maint(p) == ach(~p).
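A small sketch (mine) of how this duality, together with De Morgan's laws, lets negation be pushed inward through a goal expression. Goals are nested tuples, and ("neg", p) stands for the negation of a proposition:

```python
def negate(goal):
    """Push a goal-level negation inward using ~ach(p) == maint(~p),
    ~maint(p) == ach(~p), and De Morgan's laws for 'and' and 'or'."""
    op, *args = goal
    if op == "ach":
        return ("maint", ("neg", args[0]))
    if op == "maint":
        return ("ach", ("neg", args[0]))
    if op == "and":
        return ("or", negate(args[0]), negate(args[1]))
    if op == "or":
        return ("and", negate(args[0]), negate(args[1]))
    raise ValueError(f"unknown goal operator: {op}")

print(negate(("and", ("ach", "p"), ("maint", "q"))))
# ('or', ('maint', ('neg', 'p')), ('ach', ('neg', 'q')))
```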
We need the notion of an action leading to a goal. Informally, an action a leads to a goal G (a ~> G) if it constitutes a correct step towards the satisfaction of the goal. For an achievement goal, the action must be consistent with the goal's eventually being true; for a maintenance goal, if the condition is already true, the action must imply that it will be true at the next instant of time.
The leads to operator must have the following additional properties:
Goal reduction rules are of the form (defgoalr G G'), and mean that goal G can be reduced to G'; that is, G' is a specialization of G and implies G. By the definition of leads to, any action that leads to G' will also lead to G.
A program is a finite set of condition-action pairs, in which the condition is a run-time expression (a piece of Rex circuitry) and an action is a vector of run-time expressions, one corresponding to each primitive output field.
A program P consisting of condition-action pairs { <c1, a1>, ... <cn, an>} is said to weakly satisfy a goal G if, for every condition ci, if that condition is true, then the corresponding action ai leads to G.
The domain of a program is the class of situations in which it does specify an action, i.e., dom(P) = c1 v c2 v ... v cn.
A goal G is strongly satisfied by program P if it is weakly satisfied by P and dom(P) = true; that is, if for every situation, P supplies an action that leads to G. It may be that multiple conditions are true simultaneously; in that case a program execution may choose among them nondeterministically.
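A minimal sketch (mine) of these definitions, with conditions as ordinary run-time predicates rather than Rex circuitry:

```python
import random

def dom(program, situation):
    """True iff the program specifies some action in this situation."""
    return any(cond(situation) for cond, _ in program)

def execute(program, situation):
    """Pick (arbitrarily) among the actions whose conditions currently hold."""
    enabled = [act for cond, act in program if cond(situation)]
    if not enabled:
        raise RuntimeError("the program is undefined in this situation")
    return random.choice(enabled)     # nondeterministic choice among true conditions

# toy program: two condition-action pairs with overlapping conditions
program = [(lambda s: s["obstacle"], "turn"),
           (lambda s: True,          "go-forward")]
print(dom(program, {"obstacle": False}))     # True: the program is total here
print(execute(program, {"obstacle": True}))  # either "turn" or "go-forward"
```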
A Gapps program is made up of a set of goal-reduction rules and a top-level goal expression. The general form of a goal reduction rule is (defgoalr goal-pat goal-expr) where
goal-pat ::= (ach pat rex-params) | (maint pat rex-params)
goal-expr ::= (do index rex-expr) |
(and goal-expr goal-expr) |
(or goal-expr goal-expr) |
(not goal-expr) |
(if rex-expr goal-expr goal-expr) |
(ach pat rex-expr) |
(maint pat rex-expr)
where index is a keyword, pat is a compile-time pattern with unifiable variables, rex-expr is a Rex expression specifying a run-time function of input variables, and rex-params is a structure of variables that becomes bound to the result of a rex-expr.
The main evaluation procedure operates on these rules as you would expect (I won't type them all in); the do case, for example, creates a primitive condition-action rule that requires the function defined by rex-expr to occur at the index element of the action vector; this is what eval(G) produces when G is a do.
Once a goal expression has been evaluated, yielding a program, a circuit is generated that instantiates the program. (An example is given in the paper of inputs feeding into a cascading sequence of IF circuits.) Because any action whose associated condition is true is sufficient for correctness, the conditions are tested in an arbitrary order that is chosen at compile time. The output of the circuit is the action corresponding to the first condition that is true. (If no condition is satisfied, an error action is output to signal the programmer that an error has been made.) If, at the final stage of circuit generation, there are still unspecified components in an action vector, they are instantiated with an arbitrary value.
It is possible to specify goals in terms of prioritized lists or prioritized conjunctions. In the former, the goal consists of a list of subgoals ordered preferentially (so the first is preferable to the second, etc.); this is implemented at circuit generation time by concatenating programs in priority order, and executing the first action whose corresponding condition is satisfied.
Prioritized conjunctions are similar to prioritized sets of goals in which the most preferred goal is the entire conjunction, and less preferred goals are conjunctions of shorter and shorter prefixes of the goal sequence. This is a convenient high-level construct for programming flexible reactive behavior without the need for the programmer to explicitly envision every combination of conditions in the world.
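A sketch (again mine) of the prioritized-list construct: the subgoal programs are concatenated in priority order at compile time, and at run time the first action whose condition holds wins. A prioritized conjunction reduces to such a list whose less-preferred entries are built from shorter and shorter prefixes.

```python
def prioritized(programs):
    """programs: one program per subgoal, ordered from most to least preferred."""
    combined = [pair for prog in programs for pair in prog]     # concatenation
    def run(situation):
        for cond, act in combined:
            if cond(situation):
                return act               # first satisfied condition wins
        return "error"                   # no condition holds: signals a programming error
    return run

run = prioritized([[(lambda s: s["light_on"], "push-button")],   # most preferred
                   [(lambda s: True,          "stack-blocks")]]) # fallback
print(run({"light_on": True}), run({"light_on": False}))
# push-button stack-blocks
```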
The program resulting from the Gapps-evaluation of a goal can be thought of as a universal plan, mapping situations to actions in service of the top-level goal. However, Schoppers' approach differs from Gapps in that the user specifies the capabilities of the agent in an operation-description language. This language allows the user to specify a set of atomic capabilities of the agent, called operators, and the expected effect that executing each of the operators will have on the world, depending possibly on the state of the world in which the operator was executed.
[Laird, Newell, et al., 1982-present] SOAR is an architecture for a system that is intended to be capable of general intelligence.
SOAR is the latest in a long line of intelligent reasoners that began with the Logic Theorist in 1956. The line runs through GPS and the OPS series (and others). SOAR's roots include the concept of a cognitive architecture, the idea of an instructable production system, and the extension of problem spaces to routine behavior.
SOAR uses explicit symbolic representation of its tasks, which are manipulated by symbolic processes. It uses a general scheme of goals and subgoals to represent what the system wants to achieve and to control its behavior. In this section the fundamental design choices are outlined.
Principle: Uniform task representation by problem spaces. Every task in SOAR (choosing states, selecting operators, etc.) is formulated as finding a desired state in a problem space. Thus all tasks take the form of heuristic search. If immediate knowledge is inadequate, subgoals are set up to make the decision.
Principle: Any decision can be an object of goal-oriented attention. In fact, subgoals can be set up for any problematic decision; this is called universal subgoaling. Since subgoals can be made of subgoals, and goals can be made to search for any needed information, SOAR can reflect on its own problem solving behavior to arbitrary levels.
Principle: Uniform representation of all long-term knowledge by a production system. All long-term knowledge in SOAR is represented in a production system. Productions can contain control knowledge or procedural knowledge. All data structures examinable by productions (that is, data in declarative form) are in the production system's short-term working memory. The long-term storage of this knowledge is in productions whose actions generate these data structures.
SOAR's production system (based upon OPS5) fires all satisfied productions in parallel, without conflict resolution. Productions can only add data to working memory. Modification and removal of data elements is done by the architecture. SOAR 5 retracts the actions of production instantiations that no longer match.
Principle: Search control knowledge expressed in preferences. Search control knowledge arises in data generated by firing productions. One such data element, the preference, represents knowledge about how SOAR should behave in the current situation. A fixed decision procedure interprets the accumulated preferences to determine the next action. This procedure is simply the embodiment of the semantics of basic preference concepts, and contains no task-dependent knowledge.
Principle: Goals are created only to cope with impasses. When problem solving cannot continue -- that is, when the decision procedure fails because it doesn't know what to do next -- the system has reached an impasse. At this point the architecture creates a goal for overcoming the impasse. Thus only the architecture introduces subgoals; this principle is called automatic subgoaling.
Principle: Continuous monitoring of goal termination. SOAR continuously monitors all active goals for their termination. When they terminate, it immediately proceeds from the point of termination, and all working memory local to the terminated goals is removed. This is normally an expensive process in AI systems, and its efficient realization in SOAR depends largely on automatic subgoaling.
Principle: Basic problem-solving methods arise from task knowledge. SOAR realizes weak methods (e.g., hill-climbing, alpha-beta search, etc.) as search control productions that express knowledge about the task.
Principle: Continuous learning-by-experience through chunking. SOAR learns automatically by permanently caching the results of subgoals as productions. In essence, it stores the preferential result of the subgoal, abstracting only the relevant features, so the next time it is encountered it can use the preference immediately. This is directly related to the human phenomenon of chunking for which it is named.
Because SOAR accomplishes all of its tasks in problem spaces, it is a problem solving architecture.
To realize a task as a search in a problem space requires a fixed set of task-implementation functions involving the retrieval or generation of problem spaces, initial states, and operators, the application of operators to states to yield new states, and the recognition of goal attainment.
To control the search requires a fixed set of search-control functions that select among the candidate problem spaces, states, and operators.
The task-implementation and search-control functions together are sufficient for problem space search. Usually they are interleaved, with task implementation generating new problem spaces with associated states and operators, and search control choosing from the alternatives.
Working memory holds all processing information for the current problem. It has three components: the context stack, objects, and preferences.
The context stack contains the hierarchy of active contexts. Each context contains four slots, one for each of the four roles: goal, problem space, state, and operator. Each slot either contains an object or the undecided symbol. The top context contains the highest goal in the hierarchy; goals in lower contexts are subgoals. Since each context has only one goal during its existence, the context stack doubles as a goal stack.
The basic representation is object-oriented. Each object (goals, problem spaces, etc.) contains an identifier symbol and a set of augmentations, which are essentially attribute/value pairs. The augmentations of an object may change over time.
Preferences are more complex data structures with a specific collection of eight architecturally-defined relations between objects.
Working memory can be modified by the actions of productions (during the elaboration phase), by the decision procedure (which changes the contents of slots in the context stack), and by the working-memory manager (which removes elements belonging to terminated subgoals).
The processing structure implements the functions required for search in a problem space, to wit, using task-implementation knowledge to generate objects, and using search-control knowledge to select between them.
Search-control functions are all realized by a single generic control act: the replacement of an object in a slot by another object from working memory. Specifically, the representation of a problem is changed by replacing the current problem space with a new one. Returning to a prior state is accomplished by replacing the current state with a preexisting one in working memory. An operator is selected by replacing the current one (often undecided) by a new one. A step is taken in the problem space when the current operator is applied to the current state, yielding a new one, which is then selected to replace the existing one.
Replacement can occur anywhere in the context stack. When an object in a slot is replaced, all of the objects in the slots below it are reinitialized to undecided, since they depend upon objects higher up for their validity. All contexts below the context of the object are terminated because they depended on the original context for their existence.
Replacement is driven by the decision cycle. Each cycle involves two parts. The first part, the elaboration phase, adds new objects, new augmentations of old objects, and preferences to working memory. Then the decision procedure examines the accumulated preferences and the context stack, and it either replaces some object in a slot, or it creates a subgoal in response to an impasse.
The elaboration phase adds new objects, augmentations of existing objects, and preferences, based upon the contents of working memory. Elaborations are generated in parallel, but may still require multiple steps, as information generated in one step may allow further elaborations in later steps. This is a monotonic process (it does not delete or modify working-memory elements) that runs until it reaches quiescence (which usually happens quickly, but in practice there is a cutoff number of iterations). Note however that this is only syntactic monotonicity, so semantic nonmonotonicity or conflicts may arise.
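Schematically, a decision cycle might look like the following (my rendering, not SOAR code; the Production interface is invented for illustration):

```python
from dataclasses import dataclass
from typing import Callable, Set, Tuple

@dataclass
class Production:
    name: str
    match: Callable[[Set[Tuple]], Set[Tuple]]   # returns instantiations (bindings)
    fire: Callable[[Tuple], Set[Tuple]]         # returns new working-memory elements

def decision_cycle(wm: Set[Tuple], productions, decide, cutoff: int = 100):
    fired = set()                               # refractory inhibition: each
    for _ in range(cutoff):                     # instantiation fires only once
        new: Set[Tuple] = set()
        for prod in productions:
            for inst in prod.match(wm):
                if (prod.name, inst) not in fired:
                    fired.add((prod.name, inst))
                    new |= prod.fire(inst)      # augmentations and preferences only
        if not (new - wm):
            break                               # quiescence (or the cutoff is hit)
        wm |= new                               # monotonic: elements are only added
    # decision procedure: replace a slot in the context stack, or create a
    # subgoal in response to an impasse
    return decide(wm)
```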
The elaboration phase is encoded as productions of the form C1, C2, ..., Cn --> A1, A2, ..., Am.
The C's are conditions that examine the context stack and working memory. The A's are actions that add augmentations or preferences.
A production is successfully instantiated if the conjunction of its conditions is satisfied with a consistent binding of variables. There can be any number of concurrently successful instantiations of a production; all successful instantiations fire concurrently during the elaboration phase. The only conflict resolution principle used in SOAR is refractory inhibition -- an instantiation of a production is fired only once. (Instead of control at the level of conflict resolution in productions, SOAR performs control via the decision procedure.)
After the elaboration phase has reached quiescence, the decision procedure determines which slot in the context stack should have its content replaced, and by which object. This is done by examining the context stack from the oldest context to the newest (i.e., from the highest goal to the lowest one). Within each context, each role is considered, in the order problem space, state, and operator. This process terminates when a slot is found that requires an action. Changing a higher slot results in reinitializing the lower slots, so they do not need to be processed.
The ordering on slots is a fixed desirability ordering on changes in which it is always preferable to change slots higher up. Processing is controlled by the knowledge contained in the preferences in working memory at the end of the elaboration phase. Each preference is a statement about the selection of an object for a slot (or set of slots). There are three primitive concepts available for making preference statements: acceptability, rejection, and desirability.
The acceptability and rejection preferences determine which objects will be considered; the desirability preferences place a partial order on that set. The result of processing a slot, if successful, is a single object that is new, acceptable, not rejected, and more desirable than any other such choice.
All desirability rankings require a reference object to compare against. For better, worse, and some indifferent preferences, the reference is another object being considered for the slot. For best, worst, and the other indifferent preferences, the reference is an abstract anchor point.
Note that rejection is not the logical negation of acceptability; rejection removes an object from all consideration. Conflicts of desirability may arise, so a distinction is made between being better and being dominant (when one object is better than another, and the other object is not better than it). A version of the closed world assumption is invoked, so that positive knowledge is required to state a preference; no assertions are made by default.
Once the set of maximal choices is computed, the decision procedure makes a choice for the slot, with the current choice acting as the default. If there are no maximal choices, the current one is maintained. If the elements of the maximal set are indifferent, one is chosen at random.
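A toy version (mine) of this part of the decision procedure. Real SOAR preferences are richer, and the detection of ties and conflicts, which would raise an impasse rather than choose randomly, is omitted here:

```python
import random

def decide_slot(candidates, acceptable, rejected, better, current):
    """better: set of (x, y) pairs meaning x is more desirable than y."""
    considered = (set(candidates) & acceptable) - rejected
    def dominated(x):
        # x is ruled out only by a dominant object: better than x, and x not better than it
        return any((y, x) in better and (x, y) not in better for y in considered)
    maximal = [x for x in considered if x != current and not dominated(x)]
    if not maximal:
        return current                 # no maximal new choice: keep the current object
    return random.choice(maximal)      # indifferent maximal choices: pick one at random

print(decide_slot({"op-a", "op-b", "op-c"},
                  acceptable={"op-a", "op-b", "op-c"},
                  rejected={"op-c"},
                  better={("op-a", "op-b")},
                  current="undecided"))   # op-a
```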
If the value of the slot is changed, the change is installed and all of the lower slots are changed to undecided. If there is no final choice, or all of the slots have been exhausted, then the decision procedure fails and an impasse occurs. There are four kinds of impasses in SOAR: tie (several candidates remain and none dominates), conflict (the desirability preferences conflict), no-change (no slot can be given a new value), and rejection (all candidates are rejected).
These rules are mutually exclusive, so at most one impasse will occur during the decision procedure. SOAR's response to an impasse is to set up a subgoal to resolve it.
In general, impasses occur because of incomplete or inconsistent information; incomplete information may yield a rejection, tie, or no-change impasse, while inconsistent information produces a conflict impasse.
When an impasse occurs, a subgoal and a new context are created. In this way SOAR can deliberately search for information to resolve the impasse. All types of knowledge (task-implementation and search-control) can be encoded in the problem space for a subgoal.
If the impasse is a tie, problem solving to find a best object usually results in creating desirability preferences, so the subgoal is aimed at using search-control knowledge to select among the objects. Ties can be resolved in several ways: one object is determined to lead to the goal, so a best preference is created; one object is found to be better than another, so a better preference is created; there is no difference between objects, so an indifference preference is created; or one object is found to lead away from the goal, so a worst preference is created.
If a no-change impasse arises, the problem solving in the subgoal usually involves operator implementation, terminating when an acceptable preference is created for a new state in the parent's problem space. Subgoals can create problem spaces or initial states when the required knowledge is more easily encoded as a goal to be achieved through problem space search, rather than as a set of elaboration productions.
In general, impasses are resolved by the addition of preferences that change the results of the decision procedure. When an impasse is resolved, the subgoal created for it is done and can be terminated. Subgoals will also be terminated if the roles in their supergoals are changed, since they depended upon the original context. Resolution of an impasse terminates all goals below it.
When subgoals are terminated, the working-memory manager removes all leftover memory elements (terminated subgoals and processing elements they've created) in essentially the same way LISP removes inaccessible CONS cells. Only the results of the subgoal are retained -- the objects and preferences in working memory that are still linked to the unterminated contexts.
The architecture defines the concept of goal termination rather than goal success or failure. This is because there are many reasons why a goal should disappear, and many ways the preferences reflect this.
Automatic subgoal termination is normally an expensive, though desirable, process in goal systems. In SOAR it is not expensive because the architecture creates all goals, so it has the knowledge and organization necessary to terminate them.
SOAR has a body of task-independent knowledge about its operation and how to attain goals encoded as a set of (about fifty) productions. With it, SOAR exhibits reasonable default behavior, which of course can be overridden by additional knowledge that adds other preferences.
SOAR provides search-control rules for three common situations requiring preference creation.
When an impasse occurs, the architecture creates a new goal and context containing some specific knowledge about the nature of the impasse. Then task-dependent search-control knowledge can be used to decide how to invoke appropriate problem-solving behavior. If no task-dependent knowledge is available, there are default rules.
The selection problem space is used to resolve ties and conflicts. The states of the space contain the candidate objects from the supercontext (the items associated with the subgoal). There is only one operator in the selection space, evaluate-object. It is a general operator that is instantiated with each tying (or conflicting) object. Each state in the selection space is a set of evaluations, each of which allows the creation of preferences involving the objects being evaluated. This requires task-specific knowledge, so either productions must exist that evaluate the objects, or subgoals will be created to perform the evaluations. Indifferent preferences are created for all of the evaluate-object operators so that a selection can be made between them.
If all evaluate-object operators are rejected, and no selection can be made, problem solving in the selection space has failed. In this case, default knowledge makes all of the tied alternatives indifferent, allowing problem solving to continue.
In the selection problem space, each evaluate-object operator must evaluate the item with which it is instantiated. If no task-dependent knowledge is available to do this, a no-change impasse will occur, leading to a subgoal to try to implement the operator.
One task-independent technique for doing this is look-ahead -- try the item out temporarily to gather information. This is the default approach. To apply it, productions reconstruct the task context, making acceptable preferences for the objects selected in the context and augmenting the new goal with information from the original goal. Once the task context has been reconstructed, the item being evaluated is selected (it gets a best preference). This allows it to be tried, and possibly an evaluation can be produced based upon its progress toward the goal.
When knowledge is available to evaluate states in the task space, the new state produced in the evaluation subgoal will be evaluated, and that evaluation can be backed up to serve as the evaluation of the operator that produced it. If no knowledge to evaluate states is available except when a goal is achieved, a depth-first search arises. In a two-player game, mini-max search arises. (Other searches can arise depending on when and what knowledge becomes available to evaluate the states in the task space.)
Chunking is an explanation-based learning scheme incorporated into SOAR. It is based on the observation that (1) an impasse arises because of a lack of directly available knowledge, and (2) problem solving in the resultant subgoal produces new information on how to resolve the impasse.
In chunking, chunks are acquired that generate the results of a goal, given the goal and its parameters, where "parameters" is defined to be those aspects of the system existing prior to the goal's creation that were examined during the processing of the goal. Chunks are compiled into single productions, and can be generated either for terminal subgoals or for every subgoal, depending on SOAR's configuration.
Chunking in SOAR is powerful because SOAR generates goals automatically for problems in any aspect of its problem-solving behavior. Subgoals are created exactly in situations where SOAR requires learning, since they are created exactly when available knowledge is insufficient for the next step in problem solving.
A chunk production summarizes the processing in a subgoal. Its actions generate those working-memory elements that eliminated the impasse responsible for the subgoal (and thus terminated it). The conditions test those aspects of the current task that were relevant to the actions being performed. The chunk is created when the subgoal terminates. The chunk's actions are based on the results of the subgoal -- those working-memory elements created in the subgoal that are accessible from a supergoal. An augmentation results if its identifier either existed before the subgoal was created or is in another result. A preference results if all of its specified context objects either existed before the subgoal was created or are in another result.
The chunk's conditions are based on a dependency analysis on traces of productions that were fired during the subgoal. Each trace contains the working-memory elements that the production matched (called the condition elements) and those it generated (action elements). Only productions that actually add something to working memory have their traces saved.
Once a trace is created, it is stored on a list associated with the goal in which the production fired. This is taken to be the lowest goal in the hierarchy matched by conditions of the production.
When the subgoal terminates, its results are put into independent subgroups, where two results are considered dependent if they are linked together or both link to a third object. Each subgroup forms the basis for the actions of one production, and the conditions of each production are determined by independent dependency analyses. The overall effect of breaking the results into subgroups is to produce more productions with fewer conditions and actions in each, and therefore more generality than if a single production contained them all.
For each set of results, the dependency analysis finds those traces that have one of the results as an action element. The condition elements of those traces are then divided up into those that existed before the subgoal was created, and those created in the subgoal. The ones created before the subgoal become conditions of the chunk. The others are recursively analyzed as if they were results, to determine the pre-subgoal elements responsible for their creation.
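A sketch (mine) of this backward dependency analysis, with working-memory elements reduced to opaque strings:

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Set

@dataclass
class Trace:
    """What a production matched (conditions) and what it added (actions)."""
    conditions: FrozenSet[str]
    actions: FrozenSet[str]

def chunk_conditions(results: Set[str], traces: List[Trace],
                     pre_subgoal: Set[str]) -> Set[str]:
    conds: Set[str] = set()
    pending = set(results)
    seen: Set[str] = set()
    while pending:
        elem = pending.pop()
        if elem in seen:
            continue
        seen.add(elem)
        for tr in traces:
            if elem in tr.actions:
                for c in tr.conditions:
                    if c in pre_subgoal:
                        conds.add(c)      # existed before the subgoal: a chunk condition
                    else:
                        pending.add(c)    # created in the subgoal: analyze recursively
    return conds

traces = [Trace(frozenset({"goal-A", "wme-1"}), frozenset({"wme-2"})),
          Trace(frozenset({"wme-2", "wme-3"}),  frozenset({"result-R"}))]
print(chunk_conditions({"result-R"}, traces, pre_subgoal={"goal-A", "wme-1", "wme-3"}))
# the three pre-subgoal elements: goal-A, wme-1, wme-3 (set order may vary)
```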
Once the set of chunk productions is determined, they are generalized by replacing the identifiers in the working-memory elements with variables. Different identifiers are replaced by different variables which are forced to match distinct identifiers. Note that this may sometimes err by creating productions that will not match when two elements just happen to have the same identifier, but it always errs by being too constraining.
The final step in chunking is to perform a pair of optimizations on the chunk productions. The first optimization simplifies productions learned for the implementation of a complex operator. As part of creating the new state, much of the prior state's substructure may need to be copied, and the chunk for this subgoal will have its own condition and associated action for each such substructure, which could result in a large number of condition-action pairs that are identical except for variable names. The optimization eliminates this problem by removing the substructure-copying conditions from the original production and instead creating, for each type of substructure being copied, a new production with a single condition-action pair that copies substructures of that type.
The other optimization is to order the production conditions to try to make matching faster. Each condition is like a query, returning all working-memory elements matching it, and the overall match returns all production instantiations that match the conjunctive queries. The efficiency of the process depends heavily on the order of the queries, so by automatically ordering them SOAR can improve overall efficiency.
SOAR has applications in many areas, including:
SOAR still lacks: deliberate planning, automatic task acquisition, creation of new representations, recovery from overgeneralization, and interaction with an external environment.
[Laird et al., 1989] Robo-SOAR controls a Puma robot arm using a camera vision system. The arm manipulates blocks on a table in front of it, while the camera gives a view from above. The arm must align blocks in desired configurations; however, there is a light next to the table, and if the light goes on, the arm must push a button as soon as possible.
Note that in such a system, reactive behavior is supported by productions that test the environment without examining the current operator, and act in response to the environment's status. Such productions are quick, but they bypass the control knowledge normally used in decision making.
Hero-SOAR controls a Hero 2000 robot, a mobile robot with an arm. Its job is to pick up cups and deposit them in a waste basket.
[Tambe, et al., 1995] An extension of SOAR that controls aircraft in military simulations. The purpose is to allow the construction of a variety of military aircraft that can participate in large-scale (i.e., multiple thousands of units), distributed combat simulations.
TacAir-SOAR raises some of the standard issues in building intelligent agents for simulated environments; they need to be flexible, adaptable, and not rely too heavily on plans (because of the environment's dynamism), but at the same time they can avoid dealing with complex, noisy sensors. Issues instead are: multi-agent coordination, agent modeling, reactive planning.
TacAir-SOAR is the first AI system to have participated directly in an operational military exercise.
SOAR embodies eleven basic hypotheses about the structure of an architecture for general intelligence:
Rodney Brooks of MIT created the subsumption architecture to contrast with high-level approaches to agent design. It uses a bottom-up approach, beginning with the most simple behaviors and adding more complex ones on top, rather than trying to design a complete and comprehensive system at the highest level. His architecture is decomposed into task-achieving behaviors (e.g., identifying objects, avoiding objects, building maps) rather than functional modules (e.g., planning, perception, motor control).
The subsumption system is meant to be used to build mobile robots. Brooks sees four main requirements for the design of a robot architecture:
Requirement | Problem | Subsumption Solution |
---|---|---|
multiple goals | the control system must be responsive to high-priority goals while still servicing low-level goals (e.g., it must be able to get off the railroad tracks, but must also make sure it maintains its balance) | individual layers work concurrently on different problems; the subsumption mechanism mediates actions in case of conflicting goals |
multiple sensors | the robot must handle multiple, possibly contradictory, sources of input | since there isn't necessarily a central representation, this is not really a problem; each layer may process and use sensors according to its needs and irrespective of other layers |
robustness | the robot should cope with loss of some sensors, and should be able to handle drastic changes in the environment with graceful degradation; ideally, it will also gracefully degrade as parts of its system become inoperative | handles multiple sensors well; if higher levels can't act, lower levels will continue to perform their fast-reaction actions |
extensibility | the robot should improve as new sensors and capabilities are added to the system, along with more processing power | each layer can be given its own processor, since they are very loosely coupled; conversely, one layer may be spread across many processors |
Brooks believes that the symbol system hypothesis is fundamentally flawed. Instead, he argues for the physical grounding hypothesis: to build a system that is intelligent it is necessary to have its representations grounded in the physical world.
The former decomposes intelligence into functional information processing modules whose combination produces system behavior. The latter decomposes intelligence into behavior-generating modules whose coexistence and cooperation give rise to emergent complex behaviors.
(Basically, this argument strikes me as a form of the top-down vs. bottom-up argument. Classical top-down AI never gets to the bottom levels and so doesn't achieve real-world operations, while Brooks has a hard time getting to the upper levels, though his implemented basic procedures do produce some behavior. -- POD)
There are several problems with the symbol system hypothesis, according to Brooks. It implies that intelligence operates on symbols that are representations divorced from the real world. Descriptions come in terms of named, typed entities and their relationships. This makes sets of symbols poorly suited to adapting to one problem or another (makes me think of elaboration tolerance issues -- POD). Such systems become increasingly complex as they deal with belief, knowledge, different problem domains, etc.
The "nouvelle" approach of the physical grounding hypothesis avoids these problems. Traditional symbolic representations are unnecessary; the key observation is the world is its own best model. The trick is to sense it appropriately. Everything must be explicit and directly related to the world.
Brooks contrasts "traditional AI" and "nouvelle AI" in two ways. Traditional AI has tried to demonstrate sophisticated reasoning in impoverished domains. Nouvelle AI tries to demonstrate less sophisticated reasoning in noisy, complex domains. Traditional AI uses heuristics to guide search, while nouvelle AI relies upon the interaction of small behavioral units to create global behavior.
Brooks bases his design decisions upon nine principles:
Standard decompositions of the robot problem involve breaking it down into "vertical slices" -- information is processed in each slice and passed on to the next slice for a separate kind of processing. Brooks argues this is a bad approach because it means that every piece must be built before the system can function, and because whenever a piece is changed, the change must not alter its interface; if it does, corresponding changes must be made to the other modules.
Brooks' approach is to decompose the problem into levels of competence. Each level contains certain desired behaviors, and higher levels subsume the behaviors of the lower levels. The levels used in Brooks' earlier agents are:
Level | Competence |
---|---|
0 | avoid contact with objects |
1 | wander around without hitting things |
2 | "explore" by seeing distant places and moving toward them |
3 | build maps and plan routes |
4 | notice changes in "static" environment |
5 | reason about objects in world; perform object-related tasks |
6 | formulate and execute plans that involve changing the world state |
7 | reason about the behavior of objects in the world and modify plans accordingly |
Layers are built from the bottom up, and each layer is completed before the next is added. Layers never need to be altered once they are debugged. Each layer operates unaware of the higher layers, which may interfere with its data paths.
A subsumption architecture consists of a series of layers, going from the bottom-level layer of basic functionality, and adding a new task layer above it as many times as needed. Each layer consists of a small set of processors that send messages to one another. A processor is a finite state machine together with a data store. Processors send messages along wires to one another, though there is no acknowledgment when messages are received (so messages can conceivably be lost).
Processors (also called modules) run asynchronously and have no central control module. The inputs to modules can be suppressed and the outputs inhibited by wires coming from other modules. This is how higher-level modules control those on lower levels.
Each module is a finite state machine, augmented with some data storage (registers). It has a number of input and output lines. Input lines have single-element buffers, so the most recent message is always available for inspection. (Messages can be lost if a more recent one overwrites an older one before it is examined.) Each module has a distinguished input called reset.
Each state of the FSM is named. Initially, all modules start in the distinguished state NIL. There are four types of states:
State | Function |
---|---|
output | an output message is computed from input and registers, and put on an output line; then a new specified state is entered |
side effect | a register's value is changed to a new value computed from inputs and registers; then a new specified state is entered |
conditional dispatch | a predicate on the module's registers and inputs is computed, and one of two subsequent states is entered depending on the outcome |
event dispatch | a sequence of (condition, state) pairs is monitored until one condition becomes true, at which point the machine goes to the corresponding state; conditions are combinations of message arrivals on input lines and expirations of time delays |
Layers consist of modules and wires. An output line from one module is connected to the input lines of one or more other modules. Outputs may be inhibited, and inputs may be suppressed.
A wire may terminate at the output site of a module. Any signal traveling along this wire inhibits output messages from the module along that line for a predetermined time. (NOTE that this was true only in early versions. Later versions inhibited only as long as a message was being sent along the inhibiting wire.) Any messages sent from the module along the output during that time are lost.
Similarly, a wire may terminate at an input site of a module. In addition to preventing input data from reaching the module along that wire, the signal sent along this wire is actually fed through as if it were the original input. Thus it suppresses the usual input and provides a replacement. If more than one suppressing wire is present they are ORed together.
Lastly, an input may be connected directly to a register in a FSM.
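A miniature rendering (mine, not Brooks's) of these wire mechanics, modeling the early fixed-time versions of inhibition and suppression described above:

```python
import time

class Wire:
    def __init__(self):
        self.value = None                # single-element buffer: latest message only

class Module:
    """A (much simplified) augmented finite state machine."""
    def __init__(self, transfer):
        self.input = Wire()
        self.output = Wire()
        self.transfer = transfer         # computes an output message from the input
        self._inhibited_until = 0.0
        self._suppress = None            # (replacement value, expiry time)

    def inhibit(self, seconds):          # higher layer blocks this module's output
        self._inhibited_until = time.time() + seconds

    def suppress(self, value, seconds):  # higher layer replaces this module's input
        self._suppress = (value, time.time() + seconds)

    def step(self):
        now = time.time()
        val = self.input.value
        if self._suppress and now < self._suppress[1]:
            val = self._suppress[0]      # the suppressing signal feeds through as input
        out = self.transfer(val)
        if now >= self._inhibited_until:
            self.output.value = out      # otherwise the message is simply lost

# e.g., a motor module that drives the actuators with whatever heading it receives
motors = Module(lambda heading: heading)
motors.input.value = "random-heading"              # set by the wander layer
motors.suppress("away-from-obstacle", 0.5)         # avoid layer takes over the input
motors.step()
print(motors.output.value)                         # "away-from-obstacle" for 0.5 s
```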
Later versions of the subsumption architecture allowed behaviors to be specified. These behaviors were groups of tightly-bound AFSMs (augmented finite state machines) that could selectively be activated and deactivated. The behavior compiler takes behaviors and produces subsumption AFSMs; these are then compiled into machine language.
Within a behavior, there can be message passing, suppression, and inhibition (and between behaviors, of course). Behaviors act as abstraction barriers, as one cannot reach into another. Behaviors differ from AFSMs in that they can share registers and alarm clocks.
Allen was the first subsumption robot. It used an offboard LISP machine to simulate the architecture, and had sonar range sensors and odometry onboard. Allen's first layer let it avoid obstacles; the second made it wander randomly, choosing a new direction every few seconds; the third made it look for distant places (with sonar) and try to move to them.
Tom and Jerry were identical, small robots designed to demonstrate how little machinery is necessary to support subsumption. They had a three-layer subsumption program, but it fit on a single 256-gate PAL chip. They were toy cars with infrared proximity sensors onboard. The first two layers were similar to Allen's: avoid obstacles and wander to new places, implemented as summed vector forces.
The third layer detected moving objects with the three sensors and created a following behavior. When something was detected, the robot was attracted and moved toward it. The lower level avoidance behaviors kept them from colliding.
Herbert had an onboard, 24-processor system, in which the AFSM connections were actually copper wires. Herbert used infrared proximity sensors and laser light systems to navigate. It was programmed to wander around office areas, go into offices, and take soda cans from their desks. A set of 15 behaviors drove its arm to physically search for soda cans, locate them, and pick them up.
Interestingly, there was no internal communication between the behavior modules. An arbitration network attached to the outputs drove the actuators. The object-finder drove the robot to line its arm up in front of a can; when the arm sensed that there was no motion, it started to move, which caused cascading sequences of actions until it actually grasped the can.
Genghis is a 1 kg, six-legged robot that walks under subsumption control. It can handle obstacles and rough terrain, and can follow sources of radiation. Its control is highly distributed: the controllers for the individual legs are spread across different parts of the system.
Squirt is a very small, 50 g robot. It operates by hiding in dark corners and venturing out when it has heard a noise, but only after the noise is gone, trying to find a new place to hide near where the noise originated.
The lowest-level behavior monitors a light sensor and causes Squirt to move in a spiral pattern searching for darkness. It stops when it finds a dark spot.
The second level of behavior triggers when a dark spot is located. It monitors two microphones. Squirt waits until it hears a sharp noise, followed by several minutes of silence. Squirt moves toward the noise, suppressing its desire to stay in the dark, and after this motion times out, it returns to looking for a dark place.
Toto was the first system built using the new, behavior-based subsumption. It uses an intermediate-level set of behaviors to recognize particular types of landmarks such as walls, corridors, and clutter. Another network is made up of identical behaviors, each waiting for a new landmark to be recognized; when this happens, a behavior allocates itself to be the "place" of that landmark. Behaviors that correspond to physically adjacent places have neighbor links activated between them.
Nodes in this graph structure become active according to their correspondence to the current location. (This strikes me as similar to various neural network models -- POD) When a behavior to go to some place is activated, activation spreads from the goal via the neighbor links; this process continues as the robot moves. The generation and use of these maps is totally distributed.
Seymour contains a video processing system that uses multiple, unreliable vision routines to produce reliable behavior for following objects. One process tracks a moving blob; the other overlays the blob image with the places motion is detected. The robot then tries to keep the blob within a fixed location in image coordinates.
Some other important recent work on agents and agent systems that isn't included here but is worth examining is:
The sources for this document are the papers listed on my Depth Area Reading List. Please note that most of the text consists of direct quotations or paraphrases of material to be found in these papers. These notes are intended only for my use in reviewing this material.
Patrick Doyle | pdoyle@cs.stanford.edu |