Learning Bayesian network parameters from small data sets: Application of Noisy-OR gates



Authors:
Agnieszka Onisko
Bialystok University of Technology
Institute of Computer Science
Bialystok, 15-351, Poland
e-mail: aonisko@ii.pb.bialystok.pl

Marek J. Druzdzel
Decision Systems Laboratory
School of Information Sciences
and Intelligent Systems Program
University of Pittsburgh
e-mail: marek@sis.pitt.edu

Hanna Wasyluk
The Medical Center of Postgraduate Education
Warsaw, Marymoncka 99, Poland
e-mail: hwasyluk@cmkp.edu.pl

Abstract:
Existing data sets of cases can significantly reduce the knowledge engineering effort required to parameterize Bayesian networks. Unfortunately, when a data set is small, many conditioning cases are represented by too few data records or by none at all, so they do not offer a sufficient basis for learning conditional probability distributions. We propose a method that uses Noisy-OR gates to reduce the data requirements in learning conditional probabilities. We test our method on HEPAR II, a model for diagnosis of liver disorders, whose parameters are extracted from a small, real set of patient records. Diagnostic accuracy of the multiple-disorder model enhanced with the Noisy-OR parameters was 6.7% better than the accuracy of the plain multiple-disorder model and 14.3% better than a single-disorder diagnosis model.
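A Noisy-OR gate helps with small data sets because a conditional probability table over n binary parents, which would otherwise require a separate distribution for each of the 2^n parent configurations, is determined by only n link probabilities plus an optional leak probability. The following is a minimal sketch, not the authors' learning procedure. It assumes binary variables and the leaky Noisy-OR parameterization in which each link probability excludes the leak. The function name `noisy_or_cpt` and the example numbers are hypothetical and do not come from HEPAR II.

```python
from itertools import product
from math import prod


def noisy_or_cpt(link_probs, leak=0.0):
    """Build P(Y = present | parents) for every parent configuration.

    Hypothetical helper. link_probs[i] is the probability that parent i,
    when present, produces Y on its own (leak excluded); leak is the
    probability that Y is present when all modeled parents are absent.
    """
    cpt = {}
    # Enumerate all 2^n absent/present configurations of the parents.
    for config in product((False, True), repeat=len(link_probs)):
        # Y stays absent only if the leak and every present parent all fail
        # to produce it, and these failures are assumed independent.
        p_absent = (1.0 - leak) * prod(
            1.0 - p for p, present in zip(link_probs, config) if present
        )
        cpt[config] = 1.0 - p_absent
    return cpt


# Illustrative numbers only: three parents plus a leak give 4 parameters,
# and these 4 parameters determine all 8 conditioning cases.
links = [0.8, 0.5, 0.3]
cpt = noisy_or_cpt(links, leak=0.1)
for config, p in sorted(cpt.items()):
    print(config, round(p, 3))
```

Only the n + 1 parameters must be estimated from data, so sparse or empty conditioning cases in the full table no longer leave entries without support.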


marek@sis.pitt.edu / Last update: 9 May 2005