Summary of "Understanding Natural Language"
Article by Barr and Feigenbaum
Summary by Sunil Vemuri for CS 523, March 11, 1996
(updated to include information from other sources June 7, 1996)
Overview
- Communication is an essential component of intelligent behavior.
- Much of the world's knowledge is in natural language.
- Article presents a brief sketch of the history of research and gives an idea of the
state of the art.
- Most successful NLP systems have limited domain or limited vocabulary. Cyc attempted
to solve the problem in an unrestricted domain.
- Computational linguistics, "the use of computers in the study of language," is a related
field.
- for a survey paper, this one needs more coverage of the field ... maybe it was
covered in other chapters
Applications
- Machine Translation of text from one language to another
- Natural language interfaces to databases
- Reading for the blind (disabilities)
- Natural Language Generation/Explanation
Machine Translation
Started in 1946, based on discussions between Warren Weaver and A. Donald Booth about
the similarity of code breaking (e.g., Enigma codes) and translation.
Early research speculated that computers could be used for machine translation: "translation
from one language to another".
Much of this work done in the 50s and 60s.
Proposed the notion of an interlingua - an intermediate "universal language" that
all humans share. If such an interlingua exists, machine translation could be done
through it.
Problem turned out to be much harder than expected because one cannot simply translate
a sentence word by word, but must understand the meaning of the sentence.
Systems:
- TAUM-METEO
- translates weather reports from English to French
- language highly stylized and regular
- SPANAM (Vasconcellos and Leon, 1985)
- Spanish to English.
- Poor quality, but could be cost effective (i.e. hiring someone to massage the source
language prior to translation and hiring someone to massage the output after translation
is more cost effective than hiring a bilingual translator)
Current work at ISI/USC:
- PANGLOSS Project (1994)
- Attempt to build a large-scale lightweight ontology for the purpose of machine translation.
- They have Spanish-English and Japanese-English translation.
Bar-Hillel's example that showed some of the difficulty in translation:
The pen is in the box.
The box is in the pen.
("pen" means a writing instrument in the first sentence but an enclosure in the second;
choosing correctly requires world knowledge, not grammar alone)
Machine Translation sort of died in 1966 with the ALPAC report:
"There has been no machine translation of general scientific text, and none is in
immediate prospect"
Area was revitalized somewhat with the introduction of transformational grammars and
better knowledge representations, which led to better inferencing capabilities (i.e.,
an interlingua)
Grammars
Formal Grammars
Stuff covered in undergrad automata class.
formal language, context-free grammar, context sensitive grammar
Chomsky hierarchy:
0: Unrestricted. Turing recognizable
1: context-sensitive grammar
2: context-free grammar
3: regular grammar
For regular and context-free grammars, there are practical parsing algorithms to determine
whether or not a given string is an element of the language, and if so, to assign
to it a syntactic structure in the form of a derivation tree.
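To make the "practical parsing algorithms" claim concrete, here is a minimal CYK
recognizer over a toy grammar in Chomsky normal form (grammar and code are my own
illustration, not from the article -- sv):

  # CYK recognition for a CNF grammar: rules are A -> B C or A -> 'word'.
  GRAMMAR = {("NP", "VP"): "S", ("Det", "N"): "NP", ("V", "NP"): "VP"}
  LEXICON = {"the": "Det", "dog": "N", "cat": "N", "saw": "V"}

  def cyk(words):
      n = len(words)
      # table[i][j] = nonterminals deriving words[i..j] (a derivation-tree root)
      table = [[set() for _ in range(n)] for _ in range(n)]
      for i, w in enumerate(words):
          table[i][i].add(LEXICON[w])
      for span in range(2, n + 1):
          for i in range(n - span + 1):
              j = i + span - 1
              for k in range(i, j):          # split point
                  for b in table[i][k]:
                      for c in table[k + 1][j]:
                          if (b, c) in GRAMMAR:
                              table[i][j].add(GRAMMAR[(b, c)])
      return "S" in table[0][n - 1]

  print(cyk("the dog saw the cat".split()))  # True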
Natural languages are not generally context-free.
Transformational Grammars
The combination of a grammar and transformational rules. Proposed by Chomsky in "Syntactic
Structures" (see pp. 246-247 for an example).
- First part was a phrase-structure grammar
- Second, a sequence of transformational rules rearranging strings, adding/deleting
morphemes
- Third, a sequence of morphophonemic rules would map sentence representation to phonemes
- obligatory transformations - transformations that ensure agreement in number between
the subject and verb
- optional transformations - transformations that preserve meaning, but change the
structure
Generative Grammar (definition)
- generates the infinite number of possible sentences in the language
- assigns to each a structural description that captures the underlying knowledge
of the language held by the idealized user.
Context-sensitive (phrase-structure) grammars are inadequate for English. This is
why transformations were introduced.
- CSGs made the description of English clumsy and complex
- CSGs would assign the same structure to sentences that mean different things
- No way of identifying sentences with the same meaning but different surface structure
Katz-Postal hypothesis - the application of transformational rules to deep structures
must preserve meaning.
Two competing positions emerged:
- interpretive semantics (Chomsky's position) - keeps the syntactic component an autonomous
system.
- generative semantics - syntax and semantics cannot be sharply separated and, consequently,
a distinct level of syntactic deep structure does not exist.
Systemic Grammar
A theory in which linguistic structure is studied as it relates to the function or use
of language, often termed pragmatics. (Halliday)
Functions of a language
- ideational function - serves for the expression of content
- interpersonal function - purpose of the utterance (asking question, answering, request,
opinion, giving information)
- textual function - the need for coherence in language use (theme, context)
Categories of Systemic Grammar
- The units of the language (sentence, clause, group, word, morpheme)
- The structure of the units
- Classification of the units (verbal, nominal, adverbial)
- The system (list of choices representing options to the speaker)
Case Grammars
Similar to the notion of case in traditional grammars. In English, case forms are
mainly personal pronouns (I, my, me) and possessive endings ("'s").
Revision of the framework of transformational grammars.
Noun phrases and verb phrases are associated with each other in a particular relationship
Posits a small number of fixed cases (agent, counter-agent, object, result,
instrument, source, goal, experiencer).
"The Case for Case" (Fillmore, 1968) - what case analysis can provide answers for:
- Which noun phrase is the subject
- Different cases may not be conjoined
- Buy/Sell, Teach/Learn. Same meaning, different case frames
Case frames can be represented as semantic nets.
Systems using case at the deepest level may represent the meaning of sentences in
a way that collapses structurally distinct but semantically identical phrases.
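For instance, a case frame can be rendered as a small semantic net, and the buy/sell
pair above can share one underlying frame (the encoding is my own sketch -- sv):

  # One frame underlying both "Mary sold John a book" and
  # "John bought a book from Mary": same cases, different surface verbs.
  transfer_frame = {
      "predicate": "TRANSFER",
      "agent": "Mary",         # the seller
      "object": "book",
      "goal": "John",          # the buyer / recipient
      "instrument": "money",
  }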
Other Grammars (not in Paper)
Augmented Phrase Structure Grammars (APSG)
- provide an alternate way of moving from simple CF grammars to ones sufficiently powerful
for specifying a natural language
- don't have details about how APSGs differ from PSGs
- used in DIAGRAM and TEAM systems
Definite Clause Grammars (DCG)
- Pereira and Warren
- Grammar rules written as logical formulas, usually in PROLOG.
- Process of recognizing whether a string belongs to the language of the DCG becomes
equivalent to proving a theorem in logic.
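A sketch of the idea (mine, not from the paper -- sv): Prolog compiles a rule like
s --> np, vp into the clause s(S0, S) :- np(S0, S1), vp(S1, S) over string positions,
so recognizing a string really is proving a theorem. The same positional encoding
carries over to Python directly:

  # Each "nonterminal" maps a start position to the set of reachable end positions.
  def word(w):
      return lambda toks, i: {i + 1} if i < len(toks) and toks[i] == w else set()

  def seq(p, q):   # conjunction: thread the position through p, then q
      return lambda toks, i: {k for j in p(toks, i) for k in q(toks, j)}

  def alt(p, q):   # two clauses for the same nonterminal
      return lambda toks, i: p(toks, i) | q(toks, i)

  det = alt(word("the"), word("a"))
  n = alt(word("dog"), word("cat"))
  np = seq(det, n)
  s = seq(np, seq(word("chased"), np))

  toks = "the dog chased a cat".split()
  print(len(toks) in s(toks, 0))  # True: the whole string is an s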
Unification Grammar
- Functional Unification Grammar (FUG) - using unification as the main operation in
parsing. Sounds quite similar to DCG
- Kay's work on this predates DCG
Parsing
- The "delinearization" of linguistic input
- Want to form structures like derivation tree
- The best understood aspect of NLP
Issues in parser design
- Uniformity (general parser vs. domain specific)
- Multiple sources of Knowledge
- mixing syntactic parsing with other processing
- word recognition, word meaning
- Precision vs. flexibility
- tolerance to poor syntactic structure
- Type of structure returned
Parsing strategies
- Backtracking vs. parallel processing
- Top-down vs. bottom-up processing
- Choosing how to expand or combine
- how are constituents combined
- Multiple knowledge sources
- many relevant facts, which ones to use?
- phonemic, lexical, syntactic, semantic, etc.
Actual parsing systems
- Template matching
- match input against a predefined template
- SIR, STUDENT, ELIZA
- Simple phrase-structure grammar parsers
- can deal with a moderately large subset of English
- structure corresponds directly to the grammar rules
- Transformational grammar parsers
- try to put theory of transformational grammars into practice
- tend to fail because of combinatorial explosion
- Extended grammar parsers
- one of the most successful methods
- Augmented Transition net grammars (ATNs)
- Well-Formed Substring Table (WFST) [Kuno and Oettinger]
- a record of all complete constituents that have been found thus far
- can be used to prevent a top-down parser from looking for and finding the same constituent
multiple times (see the sketch after this list)
- charts and chart parsers
- Semantic grammar parsers
- successful
- use semantic categories instead of part of speech
- <DESTINATION>, <FLIGHT> instead of <NOUN>
- LIFER, SOPHIE
- Grammarless parsers
- based on special procedures, semantics-based techniques
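As promised above, a minimal illustration of a well-formed substring table (grammar,
sentence, and code are my own sketch -- sv). The memo cache plays the role of the
WFST: the backtracking top-down recognizer never re-derives a constituent at the
same position, and ambiguity shows up as multiple end positions:

  from functools import lru_cache

  GRAMMAR = {
      "S":  [["NP", "VP"]],
      "NP": [["Det", "N"], ["Det", "N", "PP"]],
      "VP": [["V", "NP"], ["V", "NP", "PP"]],
      "PP": [["P", "NP"]],
  }
  LEXICON = {"Det": {"the"}, "N": {"man", "telescope"}, "V": {"saw"}, "P": {"with"}}
  TOKENS = "the man saw the man with the telescope".split()

  @lru_cache(maxsize=None)               # this cache is the WFST
  def spans(cat, i):
      """End positions j such that TOKENS[i:j] can be a `cat`."""
      if cat in LEXICON:
          found = i < len(TOKENS) and TOKENS[i] in LEXICON[cat]
          return frozenset({i + 1}) if found else frozenset()
      ends = set()
      for rhs in GRAMMAR[cat]:
          reached = {i}
          for sym in rhs:
              reached = {k for j in reached for k in spans(sym, j)}
          ends |= reached
      return frozenset(ends)

  print(spans("VP", 2))                  # frozenset({5, 8}): two PP attachments
  print(len(TOKENS) in spans("S", 0))    # True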
Augmented Transition Network (ATN)
- Based on finite-state transition diagrams
- Can be viewed as a grammar formalism or a machine
- First, let's define a Recursive Transition Network
Recursive Transition Networks (RTN)
- A collection of finite-state transition networks (together equivalent in power to a
pushdown automaton)
- Each network can be thought of as a "subroutine" that other networks can "call"
- Simpler for humans to understand
- Modular
- can be nondeterministic and can be parallelized
Augmented Transition Network [Woods]
- an extension of the RTN
- registers that store partially formed derivation trees
- arcs can have arbitrary tests
- actions can be associated with arcs
- as powerful as a Turing Machine
- computationally implementable and efficient
- suited for transformational grammars (actions allow transformations)
- used in LUNAR, text generation systems, speech understanding
- does not handle ungrammatical sentences well (poor flexibility)
- proved to be a convenient means for specifying a large grammar
One can essentially think of an ATN as an RTN with a frame attached to each arc.
The frame can contain any information that would be helpful for parsing (slots on the
frame = registers), for example, partially formed derivation trees. Also, one can
have procedural attachments within the frame to compute the values of slots/registers.
Part of the advantage of ATNs is that they allow the grammar designer to maintain
a simple grammar structure and delegate some of the parsing details (e.g., matching
noun and verb number) to the procedural attachments. One can encode parsing details
in the grammar, as is done with definite clause grammars, but the grammar becomes quite
cumbersome and awkward, and its generally simple structure is lost.
Another major advantage of ATNs is that they are quite efficient.
Several textbooks (Ginsberg, Luger & Stubblefield) discuss ATNs in detail.
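A toy fragment to show the flavor (the network is collapsed into straight-line code
and the lexicon is invented -- sv): registers hold pieces of the parse, and an arc
test enforces subject-verb number agreement without complicating the grammar:

  LEX = {
      "the": ("Det", None),
      "dog": ("N", "sg"), "dogs": ("N", "pl"),
      "barks": ("V", "sg"), "bark": ("V", "pl"),
  }

  def parse_s(tokens):
      """S network: Det N V, with an agreement test on the V arc."""
      regs = {}                                   # the ATN's registers
      cats = [LEX[t] for t in tokens]
      if len(cats) != 3 or [c for c, _ in cats] != ["Det", "N", "V"]:
          return None
      regs["SUBJ"], regs["NUM"] = tokens[1], cats[1][1]
      if cats[2][1] != regs["NUM"]:               # arc test consults a register
          return None                             # test fails, arc is blocked
      regs["VERB"] = tokens[2]
      return regs

  print(parse_s("the dog barks".split()))   # {'SUBJ': 'dog', 'NUM': 'sg', 'VERB': 'barks'}
  print(parse_s("the dogs barks".split()))  # None: agreement test fails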
The General Syntactic Processor
- Kaplan, 1973
Charts
- represent both the grammar and the input sentence
- Precedence made explicit, Dominance removed
- basically a tree that has been modified in two ways
- rearranged to produce a binary tree
- nodes and arcs interchanged
- charts can be used to represent a string of trees
- provides a representation for multiple interpretations of a given word or phrase
(see figure D3-4, p. 270)
- article tries to discuss how parsing is done, but the description is too brief to
get a grasp of what is really going on
Grammatical Rules
- a series of states, with transitions between the states accomplished by following
arcs
- can be represented in the same way as a chart and can use chart manipulation mechanisms
too.
Control Structure of GSP
- REGISTERS - similar to registers in ATNs, pointers to structures
- LEVELSTACK - Used for recursion
- NDLIST - (nondeterminism list), list of choice points in a grammar
- PROCSTACK - list of suspended processes
GSP is similar to an ATN with extensions:
- GSP uses a chart
- Grammar is encoded as a chart too
- process can be suspended and resumed
Semantic Analysis
The general idea of semantic interpretation is to take NL utterances and map them
onto some representation of the meaning or intermediate analysis. A few terms used
in semantic interpretation are:
- Domain - that piece of the world (real or imaginary) that the system knows about
- Context - determined by a combination of the previous utterances in the discourse,
the location in space and time of the utterance, and the beliefs, desires, and intentions
(both private and shared) of the discourse participants.
- Task - what the user is trying to do with his utterances
Some of the problems in semantic analysis include ambiguity and underdeterminism.
Lexical ambiguity refers to words and phrases conveying more than one sense. Scope
ambiguity refers to how to bind quantifiers ("all", "every", "not", "and", "or",
"most"). Underdeterminism refers to cases where all the information has not been specified
explicitly, but may be suggested implicitly.
Conceptual Dependency theory (Schank) is one of the more famous examples of semantic
representations.
Compositional Semantics
- the semantics of any phrase is a function of the semantics of its subphrases.
Compositional semantics allows us to handle an infinite grammar with a finite (and
often small) set of rules.
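A minimal sketch (encoding mine -- sv): each word denotes a function, and a phrase's
meaning is computed only by applying the meanings of its subphrases to one another:

  # Word meanings as lambda terms; phrase meanings built purely by application.
  likes = lambda obj: lambda subj: f"Likes({subj},{obj})"
  john, mary = "John", "Mary"

  vp = likes(mary)   # meaning of the VP "likes Mary"
  s = vp(john)       # meaning of the S "John likes Mary"
  print(s)           # Likes(John,Mary)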
Semantic interpretation alone cannot be certain of the right interpretation of a phrase
or sentence. So we divide the work -- semantic interpretation is responsible for
combining meanings compositionally to get a set of possible interpretations, and
disambiguation is responsible for choosing the best one. This disambiguation is the contract
of the pragmatic analyzer.
In semantic interpretation, we have a quasi-logical form which we wish to convert
to a logical form. Once in a logical form (e.g., FOPL), we buy the semantics of logic,
which allows us to draw deductions, inferences, etc. LUNAR was the first system
to attempt this, as it had an extended notational variant of FOPL.
Ambiguity
- Lexical ambiguity
- word has more than one meaning
- Syntactic ambiguity
(aka structural ambiguity) - the syntax of a sentence leads to multiple alternative
semantic interpretations (e.g., "I smelled a wumpus in 2,2"). Sentences with multiple
parses have syntactic ambiguity.
- Semantic ambiguity
- multiple interpretations of the same phrase. Sentences which are syntactically unambiguous
can still be semantically ambiguous ("John and Jane are cat people. They have 10 cats
each.", "The Cat people attacked before the Martians").
- Referential ambiguity
- basically the problem of anaphora: what does "it" refer to in a given phrase?
This is the contract of a discourse disambiguator.
- Pragmatic ambiguity
- speaker and hearer disagree on what the current situation is (e.g., "I'll meet
you next Friday" ... what does 'next' mean?).
- Local ambiguity
- a substring can be parsed several ways. Basically a form of syntactic ambiguity
(I don't know why Russell & Norvig distinguish between these). "The radio broadcasts
information" vs. "the radio broadcasts inform"
- Vagueness
- natural language is sometimes vague ("It's hot")
Disambiguation typically requires information about the world, the situation, time
and place as well as information about the speaker's and hearer's intentions, desires
and beliefs. To formalize this, we use a combination of four models for disambiguation:
- world model - the probability that a fact occurs in the world
- mental model - probability that the speaker forms the intention of communicating this
fact to the hearer
- language model - probability that a certain string of words will be chosen
- acoustic model - probability that a particular sequence of sounds will be generated
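Read as a generative chain (my gloss of the Russell & Norvig decomposition -- sv),
the four models multiply by the chain rule, and disambiguation picks the
interpretation that maximizes
P(fact) * P(intention | fact) * P(words | intention) * P(signal | words)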
Probabilistic context-free grammar
- Each context-free production rule has a probability associated with it.
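A tiny sketch with invented probabilities (-- sv): the rule probabilities for each
nonterminal sum to 1, and the probability of a derivation is the product of the
probabilities of the rules it uses:

  RULES = {
      "S":  [(["NP", "VP"], 1.0)],
      "NP": [(["Det", "N"], 0.7), (["Name"], 0.3)],
      "VP": [(["V", "NP"], 1.0)],
  }

  def tree_prob(tree):
      """tree = (category, children) with children a list, or a word at a leaf."""
      cat, children = tree
      if isinstance(children, str):      # leaf; lexical probabilities omitted
          return 1.0
      rhs = tuple(child[0] for child in children)
      prob = next(p for r, p in RULES[cat] if tuple(r) == rhs)
      for child in children:
          prob *= tree_prob(child)
      return prob

  t = ("S", [("NP", [("Name", "John")]),
             ("VP", [("V", "saw"), ("NP", [("Det", "the"), ("N", "dog")])])])
  print(tree_prob(t))  # 1.0 * 0.3 * 1.0 * 0.7 = 0.21 (up to float rounding)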
Discourse Interpretation
When people use language, they write or speak not in single isolated utterances, but
in extended sequences of them. Natural languages take advantage of this by providing
ways for one utterance to use all aspects of the context
of previous utterances to augment what is explicitly said in the utterance itself.
Context is determined by a combination of previous utterances in the discourse.
Early work assumed one could find pronoun reference by linearly searching back within
the text. This was proven to be false. It will work in many cases, but not in general.
The structure of the discourse is the primary factor. Also, the referring entity might not be mentioned explicitly (i.e. commonsense knowledge: "John ate the pie".
Implicit in this sentence is that he put the food in his mouth). Conceptual Dependency
theory attempts to alleviate this by representing implicit entities explicitly.
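A sketch of the naive linear-search strategy (entities and features invented -- sv):
resolve a pronoun to the most recent gender/number-compatible mention. It works on
many cases like this one, but ignores the discourse structure that actually governs
reference:

  ENTITIES = {"John": ("male", "sg"), "Mary": ("female", "sg"), "pie": ("neuter", "sg")}
  PRONOUNS = {"he": ("male", "sg"), "she": ("female", "sg"), "it": ("neuter", "sg")}

  def resolve(pronoun, history):
      """history: entity mentions in order of occurrence in the text."""
      for mention in reversed(history):          # linear search backward
          if ENTITIES[mention] == PRONOUNS[pronoun]:
              return mention
      return None

  print(resolve("it", ["John", "pie"]))  # 'pie'
  print(resolve("he", ["John", "pie"]))  # 'John'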
Anaphora
- the occurrence of phrases referring to objects that have been mentioned previously.
Focusing of attention. Immediate focusing operates at the level of an utterance while
global focusing operates at the discourse segment level.
Language and Intention
The ways in which the beliefs and intentions of speakers and writers shape the utterances
they produce and, conversely, the ways in which listeners' and readers' beliefs about
a speaker's or writer's beliefs and intentions affect how those utterances are understood.
Under one view, Speech Acts (requests, informings, promises, etc.) [Searle, 1969]
and the intentions behind them, rather than phrases or sentences, are the basic building
blocks of communication.
The system must be able to reason about the connection between knowledge and action.
Research in this area has focused on use of general AI planning techniques in order
to plan speech acts and recognize the intended actions another agent has performed.
There has been work on using STRIPS as the basis for a system that could plan individual requesting and informing actions [Cohen & Perrault].
Flipping the problem around, Perrault worked on ways of applying plan recognition techniques
to discourse analysis in NLP. In a similar light, the notion of a plan in conceptual
dependency theory [Schank & Abelson] was used in PAM [Wilensky, 1983] and was found useful in discourse analysis.
Generation
- Content determination
- Text planning
- Explanation Generation (Gruber, Gautier, Lester, Mallory)
NLP Systems
Two classes of NLP systems include interactive systems and text processing systems.
Interactive systems include NL systems whose primary mode of interaction is through
natural language. Text processing systems are ones in which NL texts are the primary
objects of interest.
Issues in design of an NLP System:
- Modularity
- separation of domain-dependent and domain-independent information. Separation of processing
modules from resources
- Integration
- bringing info from several modules together
- Problem factorization
- how to distribute a task among modules (what can the syntax handler do, what can the
semantic analyzer do)
- Transportability
- moving a system to a new domain
- Habitability
- dealing with information outside one's domain (this is similar to the concept of
brittleness in expert systems - sv)
- Extensibility
- extending a system's linguistic and/or conceptual coverage
- Speed
- especially important for interactive systems
- Veracity
- how close system comes to being an accurate model of human language behavior
SHRDLU
Winograd, 1972. Ph.D. Thesis
To understand language, a program must deal in an integrated way with syntax, semantics,
and reasoning.
The basic viewpoint guiding its implementation was that meaning (of words, phrases,
and sentences) can be embodied in procedural structures and that language is a way
of activating appropriate procedures within the hearer.
- SHRDLU achieved high performance by using procedural representations for syntax,
semantics, and reasoning.
- Works only in the blocks world domain
- Maintains an interactive dialog with the user
- used MICRO-PLANNER (a Lisp based programming language) to encode knowledge about
the world and reason about it.
- Problem solver has a group of theorems about the environment
- Problems are solved by specific procedures that reduce them to either trivial
theorem proving or impossible proofs. Hence the efficiency.
- grammar is based on the notion of systemic grammar
- a system of choice networks that specifies the features of a syntactic unit, how
the unit functions, and how it influences other units.
- contains "semantic markers" to rule out grammatically correct, but semantically incorrect
sentences (i.e. "The table picked up the block")
- designed custom parser called PROGRAMMAR, which depended on semantic procedures
to make non-deterministic decisions instead of parallelism or backtracking
- attempts to combine models of human linguistic and reasoning methods in the language
understanding process
- limited domain, probably won't scale
MARGIE
- Meaning Analysis, Response Generation, and Inference on English
- intent was to provide an intuitive model of the process of natural language understanding
Conceptual Dependency (CD)
- for any two sentences that are identical in meaning, regardless of language, there
should be only one representation of that meaning in CD. There are proven examples
where two sentences of different structure but the same meaning were mapped
to different CD representations (I forget who proved this -- sv)
- Schank used CD reps as an interlingua (important point)
- CD representations are made up of small number of semantic primitives.
- primitives: PTRANS, PROPEL, ATRANS, MTRANS, MBUILD
- see p. 301 for example
- implicit concepts are made explicit
- useful for paraphrasing and question answering
- in contrast to phrase-structure grammars and transformational grammars, the parse
has little resemblance to the syntactic structure of the original sentence.
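A sketch of what such a representation might look like (the encoding is mine; the
primitives are the ones listed above -- sv). "John ate the pie" becomes an INGEST
whose instrument is the implicit PTRANS of the food to John's mouth, so the unstated
action is made explicit:

  sentence_cd = {
      "primitive": "INGEST",
      "actor": "John",
      "object": "pie",
      "instrument": {                  # implicit concept made explicit
          "primitive": "PTRANS",
          "actor": "John",
          "object": "pie",
          "to": "mouth(John)",
      },
  }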
MARGIE
- Converted English sentences to conceptual dependency representation
- Inference mechanism to deduce a large set of facts about a sentence, including facts
not explicitly stated but implied by the sentence
- Text Generation module that converted CD representation to English output
- MARGIE ran in two modes
- inference mode - takes a sentence and makes inferences
- paraphrase mode - attempts to restate sentence
SAM and PAM
- Schank, Abelson (1975, 1977)
- demonstrate the use of scripts and plans in understanding simple stories
- Conceptual dependency
- creates from a set of semantic primitives, unique representations for all sentences
with the same meaning
- theoretical basis for more complex story structures: scripts, plans, goals, themes.
- A script is a standardized sequence of events that describes some stereotypical human
activity (see Figure F6-1 on p. 307 for an example, and the sketch after this list)
- script header - used to match scripts to parsed sentences
- main concept or goal of script
- network of causal connections. Causes are expected, not explicit.
- Problems with Scripts
- Difficulty with events that don't run to completion (how do you know when one script
is done and another begins?)
- Obstacles, an activity that was thwarted by some outside cause
- Plans
- the means by which goals are accomplished
- understanding plan-based stories involves discerning the actor's main goal and subgoals
and matching the actor's actions to the methods chosen to fulfill those goals.
- Goals and Themes
- Goals are stated explicitly
- Themes imply goals
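As referenced above, a sketch of a script as a data structure (the shape is mine;
the restaurant script is Schank and Abelson's standard example -- sv). The header is
matched against parsed sentences, and the causal chain of scenes lets the system
infer events the story never states:

  RESTAURANT_SCRIPT = {
      "header": {"setting": "restaurant", "roles": ["customer", "waiter", "cook"]},
      "goal": "customer INGEST food",
      "scenes": [                       # expected causal chain
          "customer PTRANS into restaurant",
          "customer MTRANS order to waiter",
          "cook prepares food",
          "customer INGEST food",
          "customer ATRANS money to waiter",
          "customer PTRANS out of restaurant",
      ],
  }

  def infer_unstated(story_events, script):
      """Script events the story left implicit but a reader would assume."""
      return [e for e in script["scenes"] if e not in story_events]

  story = ["customer PTRANS into restaurant", "customer INGEST food"]
  print(infer_unstated(story, RESTAURANT_SCRIPT))  # ordering, cooking, paying, leaving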
SAM
- accept a story as input, parse, convert to CD
- fits a story into a script and then produces a summary
- has a memory module (MEMTOK) and a script applier (APPLY)
- see p. 312 for an example paraphrase
PAM
- Wilensky, 1978
- similar to SAM
- understands stories by determining the goals, and attempting to match actions of
the story with methods it knows will achieve the goal
- uses named plans (a set of actions and subgoals for accomplishing a main goal) and
themes
BASEBALL
One of the first Q/A systems. Domain covered one season of statistics in American
League baseball. Analyzed queries and created frame-like representation of them.
Used the structured representation of the query to find the information in the database.
Queries were in natural language, but not all types of queries were allowed:
- No logical connectives (and, or, not)
- Single clause only (i.e. no dependent clauses)
- No queries with relations (most, highest)
- No sequential facts: "Did the Red Sox ever win six games in a row?"
Example good queries include:
- "Who did the Red Sox lose to on July 5?"
- "Did every team play at least once in each park in each month"
Transportability, habitability, extensibility, and speed were not of concern. Major
contribution of BASEBALL was that it was an existence proof that such systems were
possible.
LUNAR
Domain was analyses of rock samples from the Apollo 11 moon landing mission. Question
answering system. Important in terms of dealing with proper scoping of quantifiers
and its well-defined procedural semantics. SHRDLU took a similar approach of attaching
a procedure to each predicate.
ELIZA
Natural language interface to a psychiatrist. Had pattern-matching rules that triggered
based on keywords found in the user's dialog. Used literal text from within the user's
dialog to reformulate questions. No understanding of what was being said; it merely
spit back questions that seemed most relevant to the last user input. [Weizenbaum, 1966]
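A minimal rule in that style (patterns and reflections mine -- sv): a keyword pattern
captures literal text from the input and reflects it back inside a canned question,
with no understanding anywhere:

  import re

  RULES = [
      (re.compile(r"i am (.*)", re.I), "How long have you been {0}?"),
      (re.compile(r"i feel (.*)", re.I), "Why do you feel {0}?"),
  ]
  REFLECT = {"my": "your", "i": "you", "me": "you", "am": "are"}

  def reply(user_input):
      for pattern, template in RULES:
          m = pattern.search(user_input)
          if m:
              captured = " ".join(REFLECT.get(w.lower(), w) for w in m.group(1).split())
              return template.format(captured)
      return "Tell me more."

  print(reply("I am afraid of my computer"))
  # -> How long have you been afraid of your computer?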
PARRY
Attempted to embody a theory of paranoia in a system. Used a natural language interface.
Used pattern matching techniques similar to that found in ELIZA. Apparently, PARRY
passed some form of modified Turing Test. Designed to maximize habitability and
speed. Veracity was also important.
LADDER
Practical NL database Q/A system. Wanted to provide users with a transparent way
of gaining access to information distributed over various databases.
SOPHIE
A prototype instructional system aimed at teaching students procedural knowledge and
reasoning strategies. SOPHIE was fast and habitable at the expense of modularity.
Questions
How are NLP and Vision similar?
What are some approaches to syntactic analysis?
What are some approaches to semantic analysis?
What are some approaches to pragmatic analysis?
What are some approaches to discourse analysis?
Definitions
- Signal Processing - task of taking a spoken bit of language and turning it into a
sequence of words.
- Syntax - the structure of a sentence
- Semantics - making sense of a parse by examining the meanings of the words it contains
- Pragmatics - understanding the sentence or other text in the context of our overall
knowledge base
References
- Readings in Natural Language Processing
- Computers and Thought
- Russell & Norvig, Artificial Intelligence: A Modern Approach
- Barr & Feigenbaum, "Understanding Natural Language"