Robust independence testing for constraint-based learning of causal structure



Authors:

Denver H. Dash and Marek J. Druzdzel
Decision Systems Laboratory
School of Information Sciences
and Intelligent Systems Program
University of Pittsburgh
135 North Bellefield Avenue
Pittsburgh, PA 15260, U.S.A.
e-mail: ddash@isp.pitt.edu, marek@sis.pitt.edu

Abstract:
This paper considers a method that combines ideas from Bayesian learning, Bayesian network inference, and classical hypothesis testing to produce a more reliable and robust test of independence for constraint-based (CB) learning of causal structure. Our method produces a smoothed contingency table N_{XYZ} that can be used with any test of independence that relies on contingency table statistics. N_{XYZ} can be calculated in the same asymptotic time and space required to calculate a standard contingency table, allows the specification of a prior distribution over parameters, and can be calculated when the database is incomplete. We provide theoretical justification for the procedure, and with synthetic data we demonstrate its benefits empirically over both a CB algorithm using the standard contingency table, and over a greedy Bayesian algorithm. We show that, even when used with noninformative priors, it results in better recovery of structural features and it produces networks with smaller KL-Divergence, especially as the number of nodes increases or the number of records decreases. Another benefit is the dramatic reduction in the probability that a CB algorithm will stall during the search, providing a remedy for an annoying problem plaguing CB learning when the database is small.

The paper is available in PostScript (495KB) and PDF (173KB) formats.
Back to list of publications
Back to Marek's home page

marek@sis.pitt.edu / Last update: 9 May 2005