Natural Language
Patrick Doyle | AI Qual Summary | May 14, 1997
To understand something is to transform it from one representation into another, where this latter representation is chosen to correspond to a set of available actions that could be performed, and for which a mapping is designed so that for each event an appropriate action will be performed.
Natural language is only one medium for human-machine interaction, but has several obvious and desirable properties:
There are several major reasons why natural language understanding is a difficult problem. They include:
Speech act theory offers a view of language in which utterances can be understood as acts rather than representations. In giving a command or making a promise, for example, a speaker is entering into an interaction pattern, playing a certain role, committing both the speaker and the hearer to future actions.
The implications of this view include:
We often use indirect speech acts (e.g., Do you know what time it is? or It's cold in here!)
Searle presents a linguistic theory of indirect speech acts that rests on certain conversational postulates shared by all speakers. These include
Language processing can be divided into two tasks: understanding and generation.
The steps in the process of natural language understanding, each described below, are signal processing, syntactic parsing, semantic analysis, and discourse/pragmatic analysis.
Signal processing is the task of taking a spoken bit of language and turning it into a sequence of words. The problems involved include digitizing the signal and then distinguishing word segments that can be assembled into whole words.
The linguistic elements handled by signal processing are phonemes. Identifying these sounds in real time is relatively easy, since the data rate of auditory signals is low. However, assembling them into words can be difficult, e.g.:
Syntactic parsing determines the structure of the sentence being analyzed, extracting whatever information the word order contains; it is computationally less expensive than semantic processing.
A grammar is a declarative representation that defines the syntactic facts of a language. The most common way to represent grammars is as a set of production rules, and the simplest structure for a parser to build is a parse tree, which records the rules and how they are matched.
Sometimes backtracking is required (e.g., The horse raced past the barn fell), and sometimes multiple interpretations may exist for the beginning of a sentence (e.g., Have the students who missed the exam -- ).
Example: Syntactic processing interprets the difference between "John hit Mary" and "Mary hit John."
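To make this concrete, here is a minimal sketch (in Python; the grammar and lexicon are invented for illustration, not taken from these notes) of a set of production rules and a recursive-descent parser that builds parse trees:

    # Toy production rules: each nonterminal maps to its alternative
    # right-hand sides.  Preterminals map to word sets in the lexicon.
    GRAMMAR = {
        "S":  [["NP", "VP"]],
        "NP": [["Det", "N"], ["Name"]],
        "VP": [["V", "NP"], ["V"]],
    }
    LEXICON = {
        "Det":  {"the", "a"},
        "N":    {"horse", "barn", "exam"},
        "Name": {"john", "mary"},
        "V":    {"hit", "raced", "fell"},
    }

    def parse(symbol, words, i):
        """Yield (parse_tree, next_position) for every way `symbol`
        can derive a prefix of words[i:]."""
        if symbol in LEXICON:                    # preterminal: match one word
            if i < len(words) and words[i] in LEXICON[symbol]:
                yield (symbol, words[i]), i + 1
            return
        for rhs in GRAMMAR[symbol]:              # try each production in turn
            for children, j in expand(rhs, words, i):
                yield (symbol, children), j

    def expand(rhs, words, i):
        """Derive each symbol of a right-hand side in sequence."""
        if not rhs:
            yield [], i
            return
        for tree, j in parse(rhs[0], words, i):
            for rest, k in expand(rhs[1:], words, j):
                yield [tree] + rest, k

    # "John hit Mary" and "Mary hit John" get different trees, since the
    # subject NP precedes the verb and the object NP follows it.
    for tree, end in parse("S", "john hit mary".split(), 0):
        if end == 3:                             # consumed the whole input
            print(tree)

The generators make backtracking implicit: when a production fails downstream, the enumeration simply abandons it and moves on to the next alternative, as in the garden-path example above.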
After (or sometimes in conjunction with) syntactic processing, we must still produce a representation of the meaning of a sentence, based upon the meanings of the words in it. The following steps are usually taken to do this:
Example: Semantic processing determines the differences between such sentences as "The pig is in the pen" and "The ink is in the pen."
To understand most sentences, it is necessary to know the discourse and pragmatic context in which they were uttered. In general, for a program to participate intelligently in a dialog, it must be able to represent its own beliefs about the world, as well as the beliefs of others (and their beliefs about its beliefs, and so on).
The context of goals and plans can be used to aid understanding. Plan recognition has served as the basis for many understanding programs -- PAM is an early example.
Speech acts can be axiomatized just like other operators, except that they require modal operators to describe states of belief, knowledge, and so on.
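For instance, an INFORM act might be axiomatized schematically (after Cohen and Perrault's plan-based account of speech acts; the predicate names here are illustrative, not from these notes):

    INFORM(S, H, P)
        precondition:  BELIEVE(S, P)
        effect:        BELIEVE(H, BELIEVE(S, P))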
Chomsky delineated four types of grammars, numbered 0 to 3: type 0 (unrestricted), type 1 (context-sensitive), type 2 (context-free), and type 3 (regular).
Transformational grammars were introduced by Chomsky in 1957 in Syntactic Structures. An utterance is characterized as the surface manifestation of a "deeper" structure representing the "meaning" of the sentence. The deep structure can undergo a variety of transformations of form (e.g., word order, endings, etc.) on its way up, while retaining its essential meaning.
Transformational grammars have three components. The first is a phrase-structure grammar generating strings of morphemes representing simple, declarative, active sentences, each with an associated phrase marker or derivation tree. The second is a set of transformational rules rearranging these strings and adding or deleting morphemes to form representations of the full variety of legal sentences. Finally, a sequence of morphophonemic rules would map each sentence representation to a string of phonemes.
(The first level strikes me as highly similar to Schank's conceptual dependency representations -- POD)
A simple example: the active sentence "John opened the door" and its passive counterpart "The door was opened by John" share a single deep structure; a passive transformation reorders the constituents and inserts "was" and "by" on the way to the surface form.
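A minimal sketch (in Python; the tree encoding and morphology are simplifications invented for illustration) of such a transformational rule operating on a phrase marker:

    # Phrase markers from the base grammar are encoded as nested tuples.
    # The passive transformation reorders constituents and inserts the
    # auxiliary "was" and the preposition "by", preserving deep structure.
    def passivize(tree):
        _, subject, (_, verb, obj) = tree        # ("S", NP, ("VP", V, NP))
        return ("S", obj, ("VP", ("was", verb + "ed"), ("PP", "by", subject)))

    print(passivize(("S", "John", ("VP", "open", "the door"))))
    # -> ('S', 'the door', ('VP', ('was', 'opened'), ('PP', 'by', 'John')))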
Semantic grammars combine syntactic, semantic, and pragmatic knowledge into a single set of rules in the form of a grammar. This is usually just a context-free grammar in which the choice of nonterminals and production rules is governed by semantic as well as syntactic function. The approach works well for restricted natural-language interfaces.
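For example, a semantic grammar for a toy database front-end might look like this (the domain and rules are invented for illustration); the nonterminals are semantic categories rather than purely syntactic ones:

    SEMANTIC_GRAMMAR = {
        "QUERY":     [["what", "is", "the", "ATTRIBUTE", "of", "SHIP"]],
        "ATTRIBUTE": [["length"], ["speed"], ["home", "port"]],
        "SHIP":      [["the", "SHIPNAME"]],
        "SHIPNAME":  [["kennedy"], ["enterprise"]],
    }
    # Parsing "what is the length of the kennedy" binds ATTRIBUTE=length
    # and SHIP=kennedy, which maps directly onto a database query.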
[Halliday, 1961] Systemic grammars distinguish three general functions of language, all of which are ordinarily served by every act of speech.
The ideational function serves for the expression of content. It says something about the speaker's experience of the world. Analyzing a clause in terms of its ideational function involves asking questions like: What kind of process does the clause describe -- an action, a mental process, or a relation? Who is the actor? Are there other participants in the process?
The interpersonal function relates to the purpose of the utterance. The speaker may be asking a question, answering one, making a request, giving information, or expressing an opinion. The mood system of English grammar expresses these possibilities in terms of categories such as statement, question, command, and exclamation.
Lastly, the textual function reflects the need for coherence in language use. Concepts for use in textual terms include theme (the element the speaker chooses to put at the beginning of a clause) and information (the distinction between what is new in a message and what is assumed to be given).
[Fillmore, 1968] Fillmore characterized the relationships between the verb and noun phrase as "semantically relevant syntactic relationships" and called them cases. The case assignment comes from the deep structure, even though the surface structure is different. (So, for example, the sentence "John opened the door with the key" is semantically equivalent to "The door was opened by John with the key.")
The structure that is built by the parse contains some semantic information, although further interpretation may be necessary. Grammar rules are written to describe syntactic rather than semantic regularities. But the structures the rules produce correspond to semantic relations rather than to strictly syntactic ones. For example, "John opened the door with the key" and "The door was opened by John with the key" yield the same case structure.
Parsing using case grammars is usually expectation-driven. Once the verb of the sentence has been located, it can be used to predict the noun phrases that will occur and to determine the relationship of those phrases to the rest of the sentence.
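A rough sketch (in Python; the case frame, animacy test, and lexicon are invented for illustration) of such expectation-driven case assignment:

    # The verb's case frame supplies the expectations; surface position
    # plus a semantic test (animacy) decides which deep case each noun
    # phrase fills.
    CASE_FRAMES = {"open": ("agent", "object", "instrument")}
    ANIMATE = {"john", "mary"}

    def assign_cases(verb, subject, obj, with_pp=None):
        assert verb in CASE_FRAMES               # expectations come from the verb
        cases = {"object": obj}
        # An inanimate subject of "open" fills the instrument case, so
        # "The key opened the door" and "John opened the door with the
        # key" receive the same deep-case structure.
        head = subject.split()[-1]
        cases["agent" if head in ANIMATE else "instrument"] = subject
        if with_pp:
            cases["instrument"] = with_pp
        return cases

    print(assign_cases("open", "john", "the door", with_pp="the key"))
    # -> {'object': 'the door', 'agent': 'john', 'instrument': 'the key'}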
A wide range of recent approaches to natural language processing can be described in terms of unification grammars. Most such systems share a common basis -- context-free grammars -- and are extended using the concept of unification (or matching) on partially-specified tree structures.
The principal problem with earlier augmented parsing systems (such as ATNs) is that they force certain search strategies; ATNs usually take a top-down approach. This works poorly for other strategies, such as island-driven search centered on words recognized with high confidence. If the search isn't sequential, many of the registers in an ATN will not be set appropriately, and arcs can't be followed as needed.
Restrictions can be placed on augmented systems to remedy this problem. In particular, if registers are not allowed to be reassigned once they have a value, many problems disappear, because a system can then be developed that delays the evaluation of tests until all the necessary registers are set. Instead of treating register assignments like variable assignments in a programming language, unification-based systems use the notion of unification between logical variables.
The most notable advantages of this approach stem from the independence of the grammar from a particular parsing algorithm, so each rule in the grammar can be better isolated as a unit of study and modification.
A simple unification system can be described as follows. It is an extension of a standard CFG in which each rule is annotated with a set of unification equations. Specifically, two register-value structures unify if each of the specified register values is consistent with the same register's value in the other structure. Registers specified in one and not in the other are simply copied.
Example (the structures here are illustrative): the register-value structures

    [cat: NP, agr: [num: sing]]   and   [agr: [num: sing, per: 3]]

unify to produce the structure

    [cat: NP, agr: [num: sing, per: 3]]
All unification equations are of the form structure = structure. The structure can be defined precisely by defining each slot name as a function from its containing constituent to its value; this can be represented with directed, acyclic graphs in which each constituent and value is a node, and the slots are labeled arcs.
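A minimal sketch (in Python; the register names are invented for illustration) of unification over register-value structures represented as nested dictionaries:

    def unify(a, b):
        """Unify two register-value structures.  Shared registers must
        unify recursively; a register present in only one structure is
        simply copied into the result.  Returns None on failure."""
        if isinstance(a, dict) and isinstance(b, dict):
            result = dict(a)
            for reg, bval in b.items():
                if reg in result:
                    merged = unify(result[reg], bval)
                    if merged is None:
                        return None          # inconsistent values
                    result[reg] = merged
                else:
                    result[reg] = bval       # copied unchanged
            return result
        return a if a == b else None         # atomic values must agree

    print(unify({"cat": "NP", "agr": {"num": "sing"}},
                {"agr": {"num": "sing", "per": 3}}))
    # -> {'cat': 'NP', 'agr': {'num': 'sing', 'per': 3}}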
As described above, ATNs (augmented transition networks) are RTNs (recursive transition networks) that allow tests on their arcs, and have registers to record input values or functions of register values during processing. The basic ATN parsing algorithm traverses the network arc by arc, applying each arc's test to the current input and performing its register actions when the test succeeds.
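As a rough illustration of this machinery (a simple left-to-right interpreter, not the full bottom-up algorithm; the states, arcs, tests, and vocabulary below are invented):

    # Each arc is (test, register_action, next_state).  Tests examine the
    # current word; actions record values in registers.
    ATN = {
        "S0": [(lambda w: w in {"john", "mary"},
                lambda regs, w: regs.update(subj=w), "S1")],
        "S1": [(lambda w: w in {"hit", "saw"},
                lambda regs, w: regs.update(verb=w), "S2")],
        "S2": [(lambda w: w in {"john", "mary"},
                lambda regs, w: regs.update(obj=w), "DONE")],
    }

    def run_atn(words):
        state, regs = "S0", {}
        for w in words:
            for test, action, nxt in ATN.get(state, []):
                if test(w):                  # arc test on the current word
                    action(regs, w)          # register assignment
                    state = nxt
                    break
            else:
                return None                  # no arc applies: parse fails
        return regs if state == "DONE" else None

    print(run_atn("john hit mary".split()))
    # -> {'subj': 'john', 'verb': 'hit', 'obj': 'mary'}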
[Schank] Schank developed conceptual dependency (CD) as a representation for the meanings of phrases and sentences. The basic tenet of CD theory is
For any two sentences that are identical in meaning, regardless of language, there should be only one representation of that meaning in CD.
Schank thus accepts the early machine translation concept of an interlingua or intermediate language. The other, more dubious (POD) idea in CD theory is
Any information in a sentence that is implicit must be made explicit in the representation of the meaning of the sentence.
Primitive acts in conceptual dependencies include: PTRANS (physical transfer), ATRANS (abstract transfer, e.g., of possession), PROPEL (physical force), MTRANS (mental transfer, like tell), etc. There are 11 of these primitive acts [1977]. Relations among concepts are called dependencies, of which there are a fixed number (e.g., "R" for the recipient-donor dependency).
Here is the translation of the sentence John gives Mary a book:
                  o           R +---> Mary
    John <===> ATRANS <--- book <---|
                                    +---> John
In the CD, "o" means that the book is the object of the ATRANS, and the three-pointed "R" relation is a recipient-donor dependency between Mary, John, and the book.
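As a data structure, a conceptualization like this one might be sketched as follows (in Python; the slot layout is an assumption for illustration, not Schank's notation):

    from dataclasses import dataclass

    @dataclass
    class Conceptualization:
        actor: str
        act: str                    # one of the ~11 primitive acts
        object: str = None
        recipient: str = None       # the "R" dependency: to ...
        donor: str = None           # ... from

    give = Conceptualization(actor="John", act="ATRANS", object="book",
                             recipient="Mary", donor="John")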
Conceptual dependencies also model primitive states, which are integer-valued variables such as HEALTH or MENTAL STATE. States and actions can be combined, so for example John told Mary that Bill was happy can be represented, roughly, as an MTRANS by John to Mary of the conceptualization that Bill's MENTAL STATE has a positive value.
In CDs, syntactic and semantic knowledge are combined into a single
interpretation system that is driven by semantic knowledge. Syntactic
parsing is subordinated to semantic interpretation, which is usually
used to set up strong expectations for particular sentence structures.
Parsing is heavily driven by a set of expectations that are set up on
the basis of the sentence's main verb. Because the representation of
a verb in CD is at a lower level than that of a verb in a case
grammar, CD usually provides a greater degree of predictive power.
Simply using bilingual dictionaries doesn't solve the machine translation problem; there are other troubles as well.
A proposed approach for translating between arbitrary languages was
to go through an intermediate "universal language" or
interlingua. However, without commonsense world knowledge,
certain kinds of translations are impossible. Bar-Hillel gives the example of "The pen is in the box" as opposed to "The box is in the pen." No translator without world knowledge could translate these sentences correctly, since choosing between the senses of "pen" (writing instrument vs. enclosure) requires knowing the typical sizes of the objects involved.
Since speech is our most natural form of communication, using spoken language to access computers has long been an important research goal. Research in the 1960s focused on isolated word recognition; speech understanding systems came later.
A physical problem with such systems is noise. Microphone and background noise can cause interference with the recording of the spoken utterance. Another problem is that the speaker doesn't always pronounce the same word in the same way. Sometimes whole syllables are dropped or "swallowed" at word boundaries. In fact, finding the boundaries between words in a connected-speech signal is one of the difficult problems in speech understanding.
With increased information available to the hearer beyond the acoustic signal (such as context), expectations about the content can be formed.
In the early 1970's, the Advanced Research Projects Agency of the
U.S. Department of Defense funded a five-year program in speech
understanding research (the SUR project). The systems to be
designed were to accept normally spoken sentences in a constrained
domain with a 1,000 word vocabulary, and were to respond in reasonable
time with less than 10% error. (This was one of the few times that AI
programs had any design objectives specified before development.)
[Kaplan, 1973] GSP (the General Syntactic Processor) is a versatile system for the parsing and generation of natural-language strings. GSP can directly emulate several other syntactic processors, including Woods' ATN grammar. It is not in itself an approach to language processing, but a system in which various approaches can be described. The basic idea is a chart that represents both the grammar and the input sentence as a modified tree (modified by making it binary and interchanging nodes and arcs). Chart manipulation mechanisms operate on the grammar itself.
[Lindsay, 1963] SAD-SAM (Syntactic Appraiser and Diagrammer -- Semantic Analyzing Machine) was programmed by Robert Lindsay in 1963 at CMU. It used a Basic English vocabulary (1,700 words) and followed a context-free grammar. It parsed input from left to right, built derivation trees, and passed them to SAM, which extracted the semantically relevant information to build family trees and find answers to questions.
[Bert Green, 1963] BASEBALL was an information retrieval program with a large database of facts about all American League games over a given year. It accepted input questions from the user, limited to one clause with no logical connectives.
[Bertram Raphael, 1968]
SIR (Semantic Information Retrieval) was a prototype "understanding" machine, since it could accumulate facts and then make deductions about them in order to answer questions.
[Daniel Bobrow, 1968] STUDENT was a pattern-matching natural language program written by Bobrow as his doctoral thesis work at MIT. It could solve high-school level algebra story problems.
[Weizenbaum, 1966] The most famous pattern-matching natural language program, ELIZA was built at MIT in 1966. The program assumes the role of a Rogerian, or "nondirective," therapist in its dialog with the user.
It operated by matching the left sides of its rules against the user's last sentence, and using the appropriate right side to generate a response. Rules were indexed by keywords so only a few had to be matched against a particular sentence. Some rules had no left side, so they could apply anywhere with replies like "Tell me more about that." Note that these rules are "approximate" matchers. This accounts for ELIZA's major strength, its ability to say something reasonable most of the time, as well as its major weakness, the superficiality of its understanding and its ability to be led completely astray.
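The flavor of these rules can be sketched in a few lines (in Python; the patterns and replies are invented, not Weizenbaum's):

    import re

    # Rules are tried in order; the catch-all at the end has, in effect,
    # no left side, so it applies to any input.
    RULES = [
        (re.compile(r"\bi am (.*)", re.I), "Why do you say you are {0}?"),
        (re.compile(r"\bmy (\w+)", re.I),  "Tell me more about your {0}."),
        (re.compile(r""),                  "Tell me more about that."),
    ]

    def respond(sentence):
        for pattern, reply in RULES:
            m = pattern.search(sentence)
            if m:
                return reply.format(*m.groups())

    print(respond("I am unhappy"))      # -> Why do you say you are unhappy?
    print(respond("The sky is blue"))   # -> Tell me more about that.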
[William Woods, 1973] LUNAR answered questions about the rock samples brought back from the moon using two databases -- the chemical analyses and the literature references. Specifically, it helped geologists access, compare, and evaluate chemical analysis data on moon rocks and soil composition obtained from the Apollo-11 mission. It operated by translating a question entered in English into an expression in a formal query language. The translation was done with an ATN parser coupled with a rule-driven semantic interpretation procedure.
[Winograd, 1972] SHRDLU carried on a dialog with a user in which the system simulated a robot manipulating a set of simple objects on a tabletop. Knowledge was represented as procedures within the system.
The design of the system was based on the belief that, to understand language, a program must deal in an integrated way with syntax, semantics, and reasoning. The basic viewpoint guiding its implementation was that meanings (of words, phrases, and sentences) can be embodied in procedural structures and that language is a way of activating appropriate procedures in the hearer.
The implemented SHRDLU system consisted of four basic elements: a parser, a recognition grammar for English, programs for semantic analysis (to change a sentence into a sequence of commands to the robot or into a query of its database), and a problem solver (which knows how to accomplish tasks in the blocks world).
SHRDLU's grammar is based on the notion of systemic grammar, a system of choice networks that specifies the features of a syntactic unit, how the unit functions, and how it influences other units.
By functionally integrating its knowledge of syntax and semantics, SHRDLU can avoid exploring alternative choices in an ambiguous situation. The parser is oriented toward making an original correct choice rather than performing exhaustive backtracking.
SHRDLU combined a sophisticated syntax analysis with a fairly general deductive system, operating in a "world" with visible analogs of perception and action. It was "comprehensive" in the sense that it dealt with all the aspects of language comprehension. Further naturalness came from the substantial body of programs dealing with linguistic phenomena of conversation such as context, pronouns, substitute nouns, and ellipsis. Ideas incorporated included the procedural embedding of meaning and systemic grammar, both described above.
[Schank, 1973] MARGIE (Memory, Analysis, Response Generation, and Inference on English) was developed at Stanford in 1973. It provided an intuitive model of the process of natural language understanding; see Conceptual Parsing above.
MARGIE consisted of three components. The first, the conceptual analyzer, converted English sentences into a CD representation using production-like rules called "requests." The middle phase was an inference system that accepted CD propositions and deduced facts from them, given the current system memory. Inference knowledge was represented in a semantic net. (So, for example, if told John hit Mary the system might infer Mary might get hurt.)
The final component was a text-generation module that took internal CD representations and converted them into English-like output.
[Schank] SAM (Script Applier Mechanism) makes use of frame-like data structures called scripts, which represent stereotyped sequences of events, to understand simple stories. Prototype frames make it possible to use expectations about the usual properties of known concepts and about what typically happens in a variety of familiar situations to help understand sentences about those objects and situations.
[Wilensky, 1978] PAM (Plan Applier Mechanism) understands stories by determining the goals that are to be achieved in the story and attempting to match the actions of the story with the methods that it knows will achieve the goals.
Plans are the means by which goals are accomplished; understanding a plan-based story involves discerning the goals of the main actor and the methods by which the actor chooses to fulfill those goals.
PAM tries to determine the main goal and the D-goals that will satisfy it. It analyzes input conceptualizations for their potential realization of one of the plan boxes called by one of the determined D-goals. PAM uses two kinds of knowledge structures in understanding goals: named plans and themes (such as LOVE), which contain background information from which it can be predicted that individuals will have certain goals.
The distinction between plan-based and script-based stories is simple: in a script-based story, parts or all of the story correspond to one or more scripts available to the story understander; in a plan-based story, the understander must discern the goals of the main actor and the actions that accomplish those goals.
[Hendrix, 1977] LIFER, built at SRI, is an off-the-shelf system for building "natural language front-ends" for applications in any domain. It has a set of interactive functions for specifying a language and parser.
Speech understanding for voice chess [CMU, 1976]. HEARSAY uses a blackboard architecture with knowledge sources posting constraints. HEARSAY-II understood spoken queries about computer science abstracts stored in a database. HEARSAY-III is a general blackboard architecture.
The HEARSAY project was meant to overcome the limitations of syntax-directed methods of parsing from left to right.
HEARSAY uses three knowledge sources: acoustics and phonetics, syntax of legal utterances, and semantics of the domain. The search space was constrained by using expectations about likely utterances.
The blackboard was divided into a number of levels corresponding to a hierarchical breakdown of speech into units such as segments, syllables, words, and phrases. Hypotheses about these sentence units were posted to the appropriate levels, along with a time frame that indicated when the unit was hypothesized to occur in the utterance.
Based on the idea of cooperating, independent knowledge sources, hypotheses were posted to a global blackboard. This is a modular architecture; knowledge sources don't address each other directly, thus allowing different combinations of knowledge sources and control strategies to be tried.
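A minimal sketch of the posting mechanism (in Python; the levels, hypotheses, and scoring are invented for illustration):

    from collections import defaultdict

    # level name -> list of (start, end, hypothesis, score) tuples
    blackboard = defaultdict(list)

    def syllable_ks(bb):
        """A knowledge source: combine time-adjacent segment hypotheses
        into syllable hypotheses.  It reads and writes the blackboard
        but never calls another knowledge source directly."""
        for t0, t1, seg_a, s_a in bb["segment"]:
            for u0, u1, seg_b, s_b in bb["segment"]:
                if t1 == u0:
                    bb["syllable"].append((t0, u1, seg_a + seg_b,
                                           (s_a + s_b) / 2))

    blackboard["segment"] += [(0, 1, "sh", 0.9), (1, 2, "ow", 0.8)]
    syllable_ks(blackboard)
    print(blackboard["syllable"])   # -> [(0, 2, 'show', 0.85...)]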
Speech understanding for document retrieval [CMU, 1980]. Uses a precompiled network knowledge structure. The net is searched using beam search.
The HARPY system was developed at CMU after extensive evaluation of HEARSAY-I and DRAGON. HARPY uses a single precompiled network knowledge structure. The network contains knowledge at all levels: acoustic, phonemic, lexical, syntactic, and semantic. The network contains all possible legal sentences in an efficient representation (though it also represents nonsense sentences as a result).
HARPY does a beam search so it never needs to backtrack, and only extends paths in the beam as long as they retain sufficiently high scores. HARPY was the most accurate and efficient performer of all the ARPA SUR systems.
The system is very sensitive to missing or poorly labeled segments, since the network is composed of a sequential phonetic representation. Its design cannot easily accommodate the pragmatics of the utterance, which other systems use to constrain search.
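The search itself can be sketched as follows (in Python; the network and scoring function are invented for illustration):

    import heapq

    def beam_search(network, start, score, n_steps, beam_width=3):
        """Extend partial paths through a precompiled transition network,
        keeping only the `beam_width` best at each step -- so, as in
        HARPY, no backtracking is ever needed."""
        beam = [(0.0, [start])]                  # (cumulative score, path)
        for _ in range(n_steps):
            candidates = [(s + score(path[-1], nxt), path + [nxt])
                          for s, path in beam
                          for nxt in network.get(path[-1], [])]
            if not candidates:
                break
            beam = heapq.nlargest(beam_width, candidates, key=lambda c: c[0])
        return max(beam, key=lambda c: c[0])

    net = {"start": ["w1", "w2"], "w1": ["end"], "w2": ["end"]}
    print(beam_search(net, "start", lambda a, b: 0.5, 2))
    # -> (1.0, ['start', 'w1', 'end'])  (ties broken arbitrarily)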
Hear What I Mean [BBN, 1976]. Answers questions in the role of a travel agent, using a database of facts about trips and expenses.
This was the most demanding task of any of the ARPA SUR domains. All processing passed through the control strategy component which used knowledge sources as subroutines (in contrast to HEARSAY which has autonomous knowledge sources).
HWIM's organization permits more direct experimentation with control strategies -- manipulating, for example, how many word islands are generated, direction of expansion, and interaction of system components.
In this joint system, SDC built the bottom end (speech processing) and SRI the top end (language understanding). The task was to have the computer act as a consultant in a maintenance workstation setting.
Important features included concern with the nature of real
man-machine discourse, a language definition system for specifying the
input language to be understood, techniques for focusing the system's
attention on certain aspects of the dialog, top-down process control
stressing phrase-level hypothesizing, knowledge representation using
partitioned semantic networks, and experimental evaluation of system
design parameters.
Ronny Kohavi, AI Qual Note #16: Natural Language.
Feigenbaum and Cohen, Handbook of Artificial Intelligence, Vol. I, Chapters 4-5; Vol. IV, Chapter 19.
Ginsberg, Essentials of Artificial Intelligence (?), Chapter 17.
See the AI Qual Reading List for further details on these sources.
Patrick Doyle | pdoyle@cs.stanford.edu