Mining for association rules in market basket data has proved a
fruitful area of research. Measures such
as conditional probability (confidence) and correlation have been used
to infer rules of the form "the existence of item A implies the
existence of item B." However, such rules indicate only a
statistical relationship between A and B. They do not specify the
nature of the relationship: whether the presence of A causes the
presence of B, or the converse, or some other attribute or
phenomenon causes both to appear together. In applications, knowing
such causal relationships is extremely useful for enhancing
understanding and effecting change. While distinguishing causality
from correlation is a truly difficult problem, recent work in
statistics and Bayesian learning provides some avenues of attack. In
these fields, the goal has generally been to learn complete causal
models, which are essentially impossible to learn in large-scale data mining
applications with a large number of variables.
In this paper, we consider the problem of determining *causal*
relationships, instead of mere associations, when mining market basket
data. We identify some problems with the direct application of
Bayesian learning ideas to mining large databases, concerning both the
scalability of algorithms and the appropriateness of the statistical
techniques, and introduce some initial ideas for dealing with these
problems. We present experimental results from applying our
algorithms to several large, real-world data sets. The results
indicate that the approach proposed here is both computationally
feasible and successful in identifying interesting causal structures.
An interesting outcome is that it is perhaps easier to infer the
*lack of causality* than to infer causality, and this information is useful
in preventing erroneous decision making.