Summary of Tom M. Mitchell's "Generalization as Search"

(by Christian Shelton)

This paper beings by giving examples of the wide range of abilities all grouped as learning. Mitchell then limits the scope of the paper to generalization by highlighting it as a central capability in many aspects of learning. Generalization is described here as the ability to take a set of events or descriptions and their classification (binary in this case) and predict the classification of other events or descriptions following the same set of "rules" as the first set.


A set of definitions are required to understand this general conversation about generalization:

instance language
the language used to represent the observations to be generalized
generalization language
the language used to represent the learned generalization. It should be noted that a generalization language often times will not allow all possible generalizations.
matching predicate
a test to determine whether or not an instance is classified as positive in a given generalization language.
training instance
an instance from the instance language and its classification used for learning the generalization
Generalization Problem
  1. A language in which to describe instances
  2. A language in which to describe generalizations
  3. A matching predicate that matches generalizations to instances
  4. A set of positive and negative training instances of a target generalization to be learned

Determine: Generalizations within the provided language that are consistent with the presented training instances (i.e., plausible descriptions of the target generalization)."

Ordering and Search

"The above generalization problem is essentially a search problem. The generalization language corresponds to an hypothesis space (search space) of possible solutions, and the learning task is to examine this hypothesis space, subject to constraints imposed by the training instances, to determine plausible generalizations."

An important feature of this hypothesis space is that it is partially ordered by the "more-specific-than" operator. "This partial ordering is important because it provides a powerful basis for organizing the search through the hypothesis space."

Thus there are three major ways of searching this space:

Considerations when Comparing Search Methods


(where p is the number of positive sets, n is the number of negative sets, s is the maximum size of the set S and g is the maximum size of the set G)

Processing Time:

Storage Space:

Other Generalization Strategies

These are non-data-driven methods (the above three were all data-driven).
a class of strategies that produce hypotheses independently of the data given and then test them against all of the data.
Statistical Pattern Recognition
a class of strategies that represent instances as points in an n-dimensional space and decision surfaces as surfaces in these spaces. These methods tend to be better than the search methods when there are errors in the training set.

Further Issues

The following are open problems in machine learning