Summary of "Understanding Natural Language"
Article by Barr and Feigenbaum
Summary by Sunil Vemuri for CS 523, March 11, 1996
(updated to include information from other sources June 7, 1996)
Overview
- Communication is an essential component of intelligent behavior.
- Much of the world's knowledge is in natural language.
- Article presents a brief sketch of the history of research and gives an idea of the
state of the art.
- Most successful NLP systems have limited domain or limited vocabulary. Cyc attempted
to solve the problem in an unrestricted domain.
- Computational linguistics, "the use of computers in the study of language," is a related
field.
- for a survey paper, this one needs more coverage of the field ... maybe it was
covered in other chapters
Applications
- Machine Translation of text from one language to another
- Natural language interfaces to databases
- Reading for the blind (disabilities)
- Natural Language Generation/Explanation
Machine Translation
Started in 1946, based on discussions between Warren Weaver and A. Donald Booth about
the similarity of code breaking (e.g., Enigma codes) and translation.
Early research speculated that computers could be used for machine translation: "translation
from one language to another".
Much of this work done in the 50s and 60s.
Proposed the notion of an interlingua - an intermediate "universal language" that
all humans share. If such an interlingua exists, machine translation could be done
through it.
Problem turned out to be much harder than expected because one cannot simply translate
a sentence word by word, but must understand the meaning of the sentence.
Systems:
- TAUM-METEO
- translates weather reports from English to French
- language highly stylized and regular
- SPANAM (Vasconcellos and Leon, 1985)
- Spanish to English.
- Poor quality, but could be cost effective (i.e. hiring someone to massage the source
language prior to translation and hiring someone to massage the output after translation
is more cost effective than hiring a bilingual translator)
Current work at ISI/USC:
- PANGLOSS Project (1994)
- Attempt to build a large-scale lightweight ontology for the purpose of machine translation.
- They have Spanish-English and Japanese-English translation.
Bar-Hillel's example that showed some of the difficulty in translation:
The pen is in the box.
The box is in the pen.
("pen" means a writing instrument in the first sentence but an enclosure in the second;
choosing correctly requires world knowledge, not grammar alone)
Machine Translation sort of died in 1966 with the ALPAC report:
"There has been no machine translation of general scientific text, and none is in
immediate prospect"
Area was revitalized somewhat with the introduction of transformational grammars and
better knowledge representations, which led to better inferencing capabilities (i.e.,
an interlingua)
Grammars
Formal Grammars
Stuff covered in undergrad automata class.
formal language, context-free grammar, context sensitive grammar
Chomsky hierarchy:
0: Unrestricted. Turing recognizable
1: context-sensitive grammar
2: context-free grammar
3: regular grammar
For regular and context-free grammars, there are practical parsing algorithms to determine
whether or not a given string is an element of the language, and if so, to assign
to it a syntactic structure in the form of a derivation tree.
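To make the "practical parsing algorithms" claim concrete, here is a minimal CYK
recognizer over a toy grammar in Chomsky normal form (grammar and code are my own
illustration, not from the article -- sv):

  # CYK recognition for a CNF grammar: rules are A -> B C or A -> 'word'.
  GRAMMAR = {("NP", "VP"): "S", ("Det", "N"): "NP", ("V", "NP"): "VP"}
  LEXICON = {"the": "Det", "dog": "N", "cat": "N", "saw": "V"}

  def cyk(words):
      n = len(words)
      # table[i][j] = nonterminals deriving words[i..j] (a derivation-tree root)
      table = [[set() for _ in range(n)] for _ in range(n)]
      for i, w in enumerate(words):
          table[i][i].add(LEXICON[w])
      for span in range(2, n + 1):
          for i in range(n - span + 1):
              j = i + span - 1
              for k in range(i, j):          # split point
                  for b in table[i][k]:
                      for c in table[k + 1][j]:
                          if (b, c) in GRAMMAR:
                              table[i][j].add(GRAMMAR[(b, c)])
      return "S" in table[0][n - 1]

  print(cyk("the dog saw the cat".split()))  # True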
Natural languages are not generally context-free.
Transformational Grammars
The combination of a grammar and transformational rules. Proposed by Chomsky in "Syntactic
Structures" (see pp. 246-247 for an example).
- First part was a phrase-structure grammar
- Second, a sequence of transformational rules rearranging strings, adding/deleting
morphemes
- Third, a sequence of morphophonemic rules would map sentence representation to phonemes
- obligatory transformations - transformations that ensure agreement in number between
the subject and verb
- optional transformations - transformations that preserve meaning, but change the
structure
Generative Grammar (definition)
- generates the infinite number of possible sentences in the language
- assigns to each a structural description that captures the underlying knowledge
of the language held by the idealized user.
Context-sensitive (phrase-structure) grammars are inadequate for English. This is
why transformations were introduced.
- CSGs made the description of English clumsy and complex
- CSGs would assign the same structure to sentences that mean different things
- No way of identifying sentences with the same meaning but different surface structure
Katz-Postal hypothesis - the application of transformational rules to deep structures
must preserve meaning.
Two competing positions emerged:
- interpretive semantics (Chomsky's position) - keeps the syntactic component an autonomous
system.
- generative semantics - syntax and semantics cannot be sharply separated and, consequently,
a distinct level of syntactic deep structure does not exist.
Systemic Grammar
A theory in which linguistic structure is studied as it relates to the function or use
of language, often termed pragmatics. (Halliday)
Functions of a language
- ideational function - serves for the expression of content
- interpersonal function - purpose of the utterance (asking question, answering, request,
opinion, giving information)
- textual function - the need for coherence in language use (theme, context)
Categories of Systemic Grammar
- The units of the language (sentence, clause, group, word, morpheme)
- The structure of the units
- Classification of the units (verbal, nominal, adverbial)
- The system (list of choices representing options to the speaker)
Case Grammars
Similar to the notion of case in traditional grammars. In English, case forms are
mainly personal pronouns (I, my, me) and possessive endings ("'s").
Revision of the framework of transformational grammars.
Noun phrases and verb phrases are associated with each other in a particular relationship
Posits a small number of fixed cases (agent, counter-agent, object, result,
instrument, source, goal, experiencer).
"The Case for Case" (Fillmore, 1968) - what case analysis can provide answers for:
- Which noun phrase is the subject
- Different cases may not be conjoined
- Buy/Sell, Teach/Learn. Same meaning, different case frames
Case frames can be represented as semantic nets.
Systems using case at the deepest level may represent the meaning of sentences in
a way that collapses structurally distinct but semantically identical phrases.
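For instance, a case frame can be rendered as a small semantic net, and the buy/sell
pair above can share one underlying frame (the encoding is my own sketch -- sv):

  # One frame underlying both "Mary sold John a book" and
  # "John bought a book from Mary": same cases, different surface verbs.
  transfer_frame = {
      "predicate": "TRANSFER",
      "agent": "Mary",         # the seller
      "object": "book",
      "goal": "John",          # the buyer / recipient
      "instrument": "money",
  }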
Other Grammars (not in Paper)
Augmented Phrase Structure Grammars (APSG)
- provide an alternate way of moving from simple CF grammars to ones sufficiently powerful
for specifying a natural language
- don't have details about how APSGs differ from PSGs
- used in DIAGRAM and TEAM systems
Definite Clause Grammars (DCG)
- Pereira and Warren
- Grammar rules written as logical formulas, usually in PROLOG.
- Process of recognizing whether a string belongs to the language of the DCG becomes
equivalent to proving a theorem in logic.
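A sketch of the idea (mine, not from the paper -- sv): Prolog compiles a rule like
s --> np, vp into the clause s(S0, S) :- np(S0, S1), vp(S1, S) over string positions,
so recognizing a string really is proving a theorem. The same positional encoding
carries over to Python directly:

  # Each "nonterminal" maps a start position to the set of reachable end positions.
  def word(w):
      return lambda toks, i: {i + 1} if i < len(toks) and toks[i] == w else set()

  def seq(p, q):   # conjunction: thread the position through p, then q
      return lambda toks, i: {k for j in p(toks, i) for k in q(toks, j)}

  def alt(p, q):   # two clauses for the same nonterminal
      return lambda toks, i: p(toks, i) | q(toks, i)

  det = alt(word("the"), word("a"))
  n = alt(word("dog"), word("cat"))
  np = seq(det, n)
  s = seq(np, seq(word("chased"), np))

  toks = "the dog chased a cat".split()
  print(len(toks) in s(toks, 0))  # True: the whole string is an s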
Unification Grammar
- Functional Unification Grammar (FUG) - using unification as the main operation in
parsing. Sounds quite similar to DCG
- Kay's work on this predates DCG
Parsing
- The "delinearization" of linguistic input
- Want to form structures like derivation tree
- The best understood aspect of NLP
Issues in parser design
- Uniformity (general parser vs. domain specific)
- Multiple sources of Knowledge
- mixing syntactic parsing with other processing
- word recognition, word meaning
- Precision vs. flexibility
- tolerance to poor syntactic structure
- Type of structure returned
Parsing strategies
- Backtracking vs. parallel processing
- Top-down vs. bottom-up processing
- Choosing how to expand or combine
- how are constituents combined
- Multiple knowledge sources
- many relevant facts, which ones to use?
- phonemic, lexical, syntactic, semantic, etc.
Actual parsing systems
- Template matching
- match input against a predefined template
- SIR, STUDENT, ELIZA
- Simple phrase-structure grammar parsers
- can deal with a moderately large subset of English
- structure corresponds directly to the grammar rules
- Transformational grammar parsers
- try to put theory of transformational grammars into practice
- tend to fail because of combinatorial explosion
- Extended grammar parsers
- one of the most successful methods
- Augmented Transition net grammars (ATNs)
- Well-Formed Substring Table (WFST) [Kuno and Oettinger]
- a record of all complete constituents that have been found thus far
- can be used to prevent a top-down parser from looking for and finding the same constituent
multiple times (see the sketch after this list)
- charts and chart parsers
- Semantic grammar parsers
- successful
- use semantic categories instead of part of speech
- <DESTINATION>, <FLIGHT> instead of <NOUN>
- LIFER, SOPHIE
- Grammarless parsers
- based on special procedures, semantics-based techniques
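As promised above, a minimal illustration of a well-formed substring table (grammar,
sentence, and code are my own sketch -- sv). The memo cache plays the role of the
WFST: the backtracking top-down recognizer never re-derives a constituent at the
same position, and ambiguity shows up as multiple end positions:

  from functools import lru_cache

  GRAMMAR = {
      "S":  [["NP", "VP"]],
      "NP": [["Det", "N"], ["Det", "N", "PP"]],
      "VP": [["V", "NP"], ["V", "NP", "PP"]],
      "PP": [["P", "NP"]],
  }
  LEXICON = {"Det": {"the"}, "N": {"man", "telescope"}, "V": {"saw"}, "P": {"with"}}
  TOKENS = "the man saw the man with the telescope".split()

  @lru_cache(maxsize=None)               # this cache is the WFST
  def spans(cat, i):
      """End positions j such that TOKENS[i:j] can be a `cat`."""
      if cat in LEXICON:
          found = i < len(TOKENS) and TOKENS[i] in LEXICON[cat]
          return frozenset({i + 1}) if found else frozenset()
      ends = set()
      for rhs in GRAMMAR[cat]:
          reached = {i}
          for sym in rhs:
              reached = {k for j in reached for k in spans(sym, j)}
          ends |= reached
      return frozenset(ends)

  print(spans("VP", 2))                  # frozenset({5, 8}): two PP attachments
  print(len(TOKENS) in spans("S", 0))    # True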
Augmented Transition Network (ATN)
- Based on finite-state transition diagrams
- Can be viewed as a grammar formalism or a machine
- First, let's define a Recursive Transition Network
Recursive Transition Networks (RTN)
- A collection of finite-state transition networks (together equivalent in power to a
pushdown automaton)
- Each network can be thought of as a "subroutine" that other networks can "call"
- Simpler for humans to understand
- Modular
- can be nondeterministic and can be parallelized
Augmented Transition Network [Woods]
- an extension of the RTN
- registers that store partially formed derivation trees
- arcs can have arbitrary tests
- actions can be associated with arcs
- as powerful as a Turing Machine
- computationally implementable and efficient
- suited for transformational grammars (actions allow transformations)
- used in LUNAR, text generation systems, speech understanding
- does not handle ungrammatical sentences well (poor flexibility)
- proved to be a convenient means for specifying a large grammar
One can essentially think of an ATN as an RTN with a frame attached to each arc.
The frame can contain any information that would be helpful for parsing (slots on the
frame = registers), for example, partially formed derivation trees. Also, one can
have procedural attachments within the frame to compute the values of slots/registers.
Part of the advantage of ATNs is that they allow the grammar designer to maintain
a simple grammar structure and delegate some of the parsing details (e.g., matching
noun and verb number) to the procedural attachments. One can encode parsing details
in the grammar, as is done with definite clause grammars, but the grammar becomes quite
cumbersome and awkward, and its generally simple structure is lost.
Another major advantage of ATNs is that they are quite efficient.
Several textbooks (Ginsberg, Luger & Stubblefield) discuss ATNs in detail.
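A toy fragment to show the flavor (the network is collapsed into straight-line code
and the lexicon is invented -- sv): registers hold pieces of the parse, and an arc
test enforces subject-verb number agreement without complicating the grammar:

  LEX = {
      "the": ("Det", None),
      "dog": ("N", "sg"), "dogs": ("N", "pl"),
      "barks": ("V", "sg"), "bark": ("V", "pl"),
  }

  def parse_s(tokens):
      """S network: Det N V, with an agreement test on the V arc."""
      regs = {}                                   # the ATN's registers
      cats = [LEX[t] for t in tokens]
      if len(cats) != 3 or [c for c, _ in cats] != ["Det", "N", "V"]:
          return None
      regs["SUBJ"], regs["NUM"] = tokens[1], cats[1][1]
      if cats[2][1] != regs["NUM"]:               # arc test consults a register
          return None                             # test fails, arc is blocked
      regs["VERB"] = tokens[2]
      return regs

  print(parse_s("the dog barks".split()))   # {'SUBJ': 'dog', 'NUM': 'sg', 'VERB': 'barks'}
  print(parse_s("the dogs barks".split()))  # None: agreement test fails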
The General Syntactic Processor
- Kaplan, 1973
Charts
- represent both the grammar and the input sentence
- Precedence made explicit, Dominance removed
- basically a tree that has been modified in two ways
- rearranged to produce a binary tree
- nodes and arcs interchanged
- charts can be used to represent a string of trees
- provides a representation for multiple interpretations of a given word or phrase
(see figure D3-4, p. 270)
- article tries to discuss how parsing is done, but the description is too brief to
get a grasp of what is really going on
Grammatical Rules
- a series of states, with transitions between the states accomplished by following
arcs
- can be represented in the same way as a chart and can use chart manipulation mechanisms
too.
Control Structure of GSP
- REGISTERS - similar to registers in ATNs, pointers to structures
- LEVELSTACK - Used for recursion
- NDLIST - (nondeterminism list), list of choice points in a grammar
- PROCSTACK - list of suspended processes
GSP is similar to an ATN with extensions:
- GSP uses a chart
- Grammar is encoded as a chart too
- process can be suspended and resumed
Semantic Analysis
The general idea of semantic interpretation is to take NL utterances and map them
onto some representation of the meaning or intermediate analysis. A few terms used
in semantic interpretation are:
- Domain - that piece of the world (real or imaginary) that the system knows about
- Context - determined by a combination of the previous utterances in the discourse,
the location in space and time of the utterance, and the beliefs, desires, and intentions
(both private and shared) of the discourse participants.
- Task - what the user is trying to do with his utterances
Some of the problems in semantic analysis include ambiguity and underdeterminism.
Lexical ambiguity refers to words and phrases conveying more than one sense. Scope
ambiguity refers to how to bind quantifiers ("all", "every", "not", "and", "or",
"most"). Underdeterminism refers to cases where all the information has not been specified
explicitly, but may be suggested implicitly.
Conceptual Dependency theory (Schank) is one of the more famous examples of semantic
representations.
Compositional Semantics
- the semantics of any phrase is a function of the semantics of its subphrases.
Compositional semantics allows us to handle an infinite grammar with a finite (and
often small) set of rules.
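A minimal sketch (encoding mine -- sv): each word denotes a function, and a phrase's
meaning is computed only by applying the meanings of its subphrases to one another:

  # Word meanings as lambda terms; phrase meanings built purely by application.
  likes = lambda obj: lambda subj: f"Likes({subj},{obj})"
  john, mary = "John", "Mary"

  vp = likes(mary)   # meaning of the VP "likes Mary"
  s = vp(john)       # meaning of the S "John likes Mary"
  print(s)           # Likes(John,Mary)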
Semantic interpretation alone cannot be certain of the right interpretation of a phrase
or sentence. So we divide the work -- semantic interpretation is responsible for
combining meanings compositionally to get a set of possible interpretations, and
disambiguation is responsible for choosing the best one. This disambiguation is the contract
of the pragmatic analyzer.
In semantic interpretation, we have a quasi-logical form which we wish to convert
to a logical form. Once in a logical form (e.g., FOPL), we buy the semantics of logic,
which allows us to draw deductions, inferences, etc. LUNAR was the first system
to attempt this, as it had an extended notational variant of FOPL.
Ambiguity
- Lexical ambiguity
- word has more than one meaning
- Syntactic ambiguity
(aka structural ambiguity) - the syntax of a sentence leads to multiple alternative
semantic interpretations (e.g., "I smelled a wumpus in 2,2"). Sentences with multiple
parses have syntactic ambiguity.
- Semantic ambiguity
- multiple interpretations of the same phrase. Sentences which are syntactically unambiguous
can still be semantically ambiguous ("John and Jane are cat people. They have 10 cats
each.", "The Cat people attacked before the Martians").
- Referential ambiguity
- basically the problem of anaphora: what does "it" refer to in a given phrase?
This is the contract of a discourse disambiguator.
- Pragmatic ambiguity
- speaker and hearer disagree on what the current situation is (e.g., "I'll meet
you next Friday" ... what does 'next' mean?).
- Local ambiguity
- a substring can be parsed several ways. Basically a form of syntactic ambiguity
(I don't know why Russell & Norvig distinguish between these). "The radio broadcasts
information" vs. "the radio broadcasts inform"
- Vagueness
- natural language is sometimes vague ("It's hot")
Disambiguation typically requires information about the world, the situation, time
and place as well as information about the speaker's and hearer's intentions, desires
and beliefs. To formalize this, we use a combination of four models for disambiguation:
- world model - the probability that a fact occurs in the world
- mental model - probability that the speaker forms the intention of communicating this
fact to the hearer
- language model - probability that a certain string of words will be chosen
- acoustic model - probability that a particular sequence of sounds will be generated
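Read as a generative chain (my gloss of the Russell & Norvig decomposition -- sv),
the four models multiply by the chain rule, and disambiguation picks the
interpretation that maximizes
P(fact) * P(intention | fact) * P(words | intention) * P(signal | words)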
Probabilistic context-free grammar
- Each context-free production rule has a probability associated with it.
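A tiny sketch with invented probabilities (-- sv): the rule probabilities for each
nonterminal sum to 1, and the probability of a derivation is the product of the
probabilities of the rules it uses:

  RULES = {
      "S":  [(["NP", "VP"], 1.0)],
      "NP": [(["Det", "N"], 0.7), (["Name"], 0.3)],
      "VP": [(["V", "NP"], 1.0)],
  }

  def tree_prob(tree):
      """tree = (category, children) with children a list, or a word at a leaf."""
      cat, children = tree
      if isinstance(children, str):      # leaf; lexical probabilities omitted
          return 1.0
      rhs = tuple(child[0] for child in children)
      prob = next(p for r, p in RULES[cat] if tuple(r) == rhs)
      for child in children:
          prob *= tree_prob(child)
      return prob

  t = ("S", [("NP", [("Name", "John")]),
             ("VP", [("V", "saw"), ("NP", [("Det", "the"), ("N", "dog")])])])
  print(tree_prob(t))  # 1.0 * 0.3 * 1.0 * 0.7 = 0.21 (up to float rounding)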
Discourse Interpretation
When people use language, they write or speak not in single isolated utterances, but
in extended sequences of them. Natural languages take advantage of this by providing
ways for one utterance to use all aspects of the context
of previous utterances to augment what is explicitly said in the utterance itself.
Context is determined by a combination of previous utterances in the discourse.
Early work assumed one could find pronoun reference by linearly searching back within
the text. This was proven to be false. It will work in many cases, but not in general.
The structure of the discourse is the primary factor. Also, the referring entity might not be mentioned explicitly (i.e. commonsense knowledge: "John ate the pie".
Implicit in this sentence is that he put the food in his mouth). Conceptual Dependency
theory attempts to alleviate this by representing implicit entities explicitly.
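A sketch of the naive linear-search strategy (entities and features invented -- sv):
resolve a pronoun to the most recent gender/number-compatible mention. It works on
many cases like this one, but ignores the discourse structure that actually governs
reference:

  ENTITIES = {"John": ("male", "sg"), "Mary": ("female", "sg"), "pie": ("neuter", "sg")}
  PRONOUNS = {"he": ("male", "sg"), "she": ("female", "sg"), "it": ("neuter", "sg")}

  def resolve(pronoun, history):
      """history: entity mentions in order of occurrence in the text."""
      for mention in reversed(history):          # linear search backward
          if ENTITIES[mention] == PRONOUNS[pronoun]:
              return mention
      return None

  print(resolve("it", ["John", "pie"]))  # 'pie'
  print(resolve("he", ["John", "pie"]))  # 'John'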
Anaphora
- the occurrence of phrases referring to objects that have been mentioned previously.
Focusing of attention. Immediate focusing operates at the level of an utterance while
global focusing operates at the discourse segment level.
Language and Intention
The ways in which the beliefs and intentions of speakers and writers shape the utterances
they produce and, conversely, the ways in which listeners' and readers' beliefs about
a speaker's or writer's beliefs and intentions affect how those utterances are understood.
Under one view, Speech Acts (requests, informings, promises, etc.) [Searle, 1969]
and the intentions behind them, rather than phrases or sentences, are the basic building
blocks of communication.
The system must be able to reason about the connection between knowledge and action.
Research in this area has focused on use of general AI planning techniques in order
to plan speech acts and recognize the intended actions another agent has performed.
There has been work on using STRIPS as the basis for a system that could plan individual requesting and informing actions [Cohen & Perrault].
Flipping the problem around, Perrault worked on ways of applying plan recognition techniques
to discourse analysis in NLP. In a similar light, the notion of a plan in conceptual
dependency theory [Schank & Abelson] was used in PAM [Wilensky, 1983] and was found useful in discourse analysis.
Generation
- Content determination
- Text planning
- Explanation Generation (Gruber, Gautier, Lester, Mallory)
NLP Systems
Two classes of NLP systems include interactive systems and text processing systems.
Interactive systems include NL systems whose primary mode of interaction is through
natural language. Text processing systems are ones in which NL texts are the primary
objects of interest.
Issues in design of an NLP System:
- Modularity
- separation of domain-dependent and domain-independent information. Separation of processing
modules from resources
- Integration
- bringing info from several modules together
- Problem factorization
- how to distribute a task among modules (what can the syntax handler do, what can the
semantic analyzer do)
- Transportability
- moving a system to a new domain
- Habitability
- dealing with information outside one's domain (this is similar to the concept of
brittleness in expert systems - sv)
- Extensibility
- extending a system's linguistic and/or conceptual coverage
- Speed
- especially important for interactive systems
- Veracity
- how close system comes to being an accurate model of human language behavior
SHRDLU
Winograd, 1972. Ph.D. Thesis
To understand language, a program must deal in an integrated way with syntax, semantics,
and reasoning.
The basic viewpoint guiding its implementation was that meaning (of words, phrases,
and sentences) can be embodied in procedural structures and that language is a way
of activating appropriate procedures within the hearer.
- SHRDLU achieved high performance by using procedural representations for syntax,
semantics, and reasoning.
- Works only in the blocks world domain
- Maintains an interactive dialog with the user
- used MICRO-PLANNER (a Lisp based programming language) to encode knowledge about
the world and reason about it.
- Problem solver has a group of theorems about the environment
- Problems are solved by specific procedures that reduce them to either trivial
theorem proving or impossible proofs. Hence the efficiency.
- grammar is based on the notion of systemic grammar
- a system of choice networks that specifies the features of a syntactic unit, how
the unit functions, and how it influences other units.
- contains "semantic markers" to rule out grammatically correct, but semantically incorrect
sentences (i.e. "The table picked up the block")
- designed custom parser called PROGRAMMAR, which depended on semantic procedures
to make non-deterministic decisions instead of parallelism or backtracking
- attempts to combine models of human linguistic and reasoning methods in the language
understanding process
- limited domain, probably won't scale
MARGIE
- Meaning Analysis, Response Generation, and Inference on English
- intent was to provide an intuitive model of the process of natural language understanding
Conceptual Dependency (CD)
- for any two sentences that are identical in meaning, regardless of language, there
should be only one representation of that meaning in CD. There are proven examples
where two sentences of different structure but the same meaning were mapped
to different CD representations (I forget who proved this -- sv)
- Schank used CD reps as an interlingua (important point)
- CD representations are made up of small number of semantic primitives.
- primitives: PTRANS, PROPEL, ATRANS, MTRANS, MBUILD
- see p. 301 for example
- implicit concepts are made explicit
- useful for paraphrasing and question answering
- in contrast to phrase-structure grammars and transformational grammars, the parse
has little resemblance to the syntactic structure of the original sentence.
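A sketch of what such a representation might look like (the encoding is mine; the
primitives are the ones listed above -- sv). "John ate the pie" becomes an INGEST
whose instrument is the implicit PTRANS of the food to John's mouth, so the unstated
action is made explicit:

  sentence_cd = {
      "primitive": "INGEST",
      "actor": "John",
      "object": "pie",
      "instrument": {                  # implicit concept made explicit
          "primitive": "PTRANS",
          "actor": "John",
          "object": "pie",
          "to": "mouth(John)",
      },
  }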
MARGIE
- Converted English sentences to conceptual dependency representation
- Inference mechanism to deduce a large set of facts about a sentence, including facts
not explicitly stated but implied by the sentence
- Text Generation module that converted CD representation to English output
- MARGIE ran in two modes
- inference mode - takes a sentence and makes inferences
- paraphrase mode - attempts to restate sentence
SAM and PAM
- Schank, Abelson (1975, 1977)
- demonstrate the use of scripts and plans in understanding simple stories
- Conceptual dependency
- creates from a set of semantic primitives, unique representations for all sentences
with the same meaning
- theoretical basis for more complex story structures: scripts, plans, goals, themes.
- A script is a standardized sequence of events that describes some stereotypical human
activity (see Figure F6-1 on p. 307 for an example, and the sketch after this list)
- script header - used to match scripts to parsed sentences
- main concept or goal of script
- network of causal connections. Causes are expected, not explicit.
- Problems with Scripts
- Difficulty with events that don't run to completion (how do you know when one script
is done and another begins?)
- Obstacles, an activity that was thwarted by some outside cause
- Plans
- the means by which goals are accomplished
- understanding plan-based stories involves discerning the actor's main goal and subgoals
and matching the actor's actions to the methods chosen to fulfill those goals.
- Goals and Themes
- Goals are stated explicitly
- Themes imply goals
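As referenced above, a sketch of a script as a data structure (the shape is mine;
the restaurant script is Schank and Abelson's standard example -- sv). The header is
matched against parsed sentences, and the causal chain of scenes lets the system
infer events the story never states:

  RESTAURANT_SCRIPT = {
      "header": {"setting": "restaurant", "roles": ["customer", "waiter", "cook"]},
      "goal": "customer INGEST food",
      "scenes": [                       # expected causal chain
          "customer PTRANS into restaurant",
          "customer MTRANS order to waiter",
          "cook prepares food",
          "customer INGEST food",
          "customer ATRANS money to waiter",
          "customer PTRANS out of restaurant",
      ],
  }

  def infer_unstated(story_events, script):
      """Script events the story left implicit but a reader would assume."""
      return [e for e in script["scenes"] if e not in story_events]

  story = ["customer PTRANS into restaurant", "customer INGEST food"]
  print(infer_unstated(story, RESTAURANT_SCRIPT))  # ordering, cooking, paying, leaving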
SAM
- accept a story as input, parse, convert to CD
- fits a story into a script and then produces a summary
- has a memory module (MEMTOK) and a script applier (APPLY)
- see p. 312 for an example paraphrase
PAM
- Wilensky, 1978
- similar to SAM
- understands stories by determining the goals, and attempting to match actions of
the story with methods it knows will achieve the goal
- uses named plans (a set of actions and subgoals for accomplishing a main goal) and
themes
BASEBALL
One of the first Q/A systems. Domain covered one season of statistics in American
League baseball. Analyzed queries and created frame-like representation of them.
Used the structured representation of the query to find the information in the database.
Queries were in natural language, but not all types of queries were allowed:
- No logical connectives (and, or, not)
- Single clause only (i.e. no dependent clauses)
- No queries with relations (most, highest)
- No sequential facts: "Did the Red Sox ever win six games in a row?"
Example good queries include:
- "Who did the Red Sox lose to on July 5?"
- "Did every team play at least once in each park in each month"
Transportability, habitability, extensibility, and speed were not of concern. Major
contribution of BASEBALL was that it was an existence proof that such systems were
possible.
LUNAR
Domain was analyses of rock samples from the Apollo 11 moon landing mission. Question
answering system. Important in terms of dealing with proper scoping of quantifiers
and its well-defined procedural semantics. SHRDLU took a similar approach of attaching
a procedure to each predicate.
ELIZA
Natural language interface to a psychiatrist. Had pattern-matching rules that triggered
based on keywords found in the user's dialog. Used literal text from within the user's
dialog to reformulate questions. No understanding of what was being said; it merely
spit back questions that seemed most relevant to the last user input. [Weizenbaum, 1966]
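A minimal rule in that style (patterns and reflections mine -- sv): a keyword pattern
captures literal text from the input and reflects it back inside a canned question,
with no understanding anywhere:

  import re

  RULES = [
      (re.compile(r"i am (.*)", re.I), "How long have you been {0}?"),
      (re.compile(r"i feel (.*)", re.I), "Why do you feel {0}?"),
  ]
  REFLECT = {"my": "your", "i": "you", "me": "you", "am": "are"}

  def reply(user_input):
      for pattern, template in RULES:
          m = pattern.search(user_input)
          if m:
              captured = " ".join(REFLECT.get(w.lower(), w) for w in m.group(1).split())
              return template.format(captured)
      return "Tell me more."

  print(reply("I am afraid of my computer"))
  # -> How long have you been afraid of your computer?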
PARRY
Attempted to embody a theory of paranoia in a system. Used a natural language interface.
Used pattern matching techniques similar to that found in ELIZA. Apparently, PARRY
passed some form of modified Turing Test. Designed to maximize habitability and
speed. Veracity was also important.
LADDER
Practical NL database Q/A system. Wanted to provide users with a transparent way
of gaining access to information distributed over various databases.
SOPHIE
A prototype instructional system aimed at teaching students procedural knowledge and
reasoning strategies. SOPHIE was fast and habitable at the expense of modularity.
Questions
How are NLP and Vision similar?
What are some approaches to syntactic analysis?
What are some approaches to semantic analysis?
What are some approaches to pragmatic analysis?
What are some approaches to discourse analysis?
Definitions
- Signal Processing - task of taking a spoken bit of language and turning it into a
sequence of words.
- Syntax - the structure of a sentence
- Semantics - making sense of a parse by examining the meanings of the words it contains
- Pragmatics - understanding the sentence or other text in the context of our overall
knowledge base
References
- Readings in Natural Language Processing
- Computers and Thought
- Russell & Norvig, Artificial Intelligence: A Modern Approach
- Barr & Feigenbaum, "Understanding Natural Language"