Mining for association rules in market basket data has proved a
fruitful area of research. Measures such as conditional probability
(confidence) and correlation have been used to infer rules of the form
``the existence of item A implies the existence of item B.'' However,
such rules indicate only a statistical relationship between A and B.
They do not specify the nature of the relationship: whether the
presence of A causes the presence of B, or the converse, or some other
attribute or phenomenon causes both to appear together. In
applications, knowing such causal relationships is extremely useful
for enhancing understanding and effecting change. While
distinguishing causality from correlation is a truly difficult
problem, recent work in statistics and Bayesian learning provides some
avenues of attack. In these fields, the goal has generally been to
learn complete causal models, which are essentially impossible to
learn in large-scale data mining applications with a large number of
variables.
In this paper, we consider the problem of determining *causal*
relationships, instead of mere associations, when mining market basket
data. We identify some problems with the direct application of
Bayesian learning ideas to mining large databases, concerning both the
scalability of algorithms and the appropriateness of the statistical
techniques, and introduce some initial ideas for dealing with these
problems. We present experimental results from applying our
algorithms to several large, real-world data sets. The results
indicate that the approach proposed here is both computationally
feasible and successful in identifying interesting causal structures.
An interesting outcome is that it is perhaps easier to infer the *lack
of causality* than to infer causality, information that is useful in
preventing erroneous decision making.