An experimental comparison of methods for handling incomplete data in
learning parameters of Bayesian networks
- Authors:
-
Agnieszka Onisko
Bialystok University of Technology
Institute of Computer Science
Bialystok, 15-351, Poland
e-mail: aonisko@ii.pb.bialystok.pl
-
Marek J. Druzdzel
Decision Systems Laboratory
School of Information Sciences
and
Intelligent Systems Program
University of Pittsburgh
e-mail: marek@sis.pitt.edu
-
Hanna Wasyluk
The Medical Center of Postgraduate Education
Warsaw, Marymoncka 99, Poland
e-mail: hwasyluk@cmkp.edu.pl
-
Abstract:
-
Missing values of attributes in data sets, also referred to as
incomplete data, pose difficulties in learning tasks, such as
classification, data mining, or learning Bayesian network
structure and its numerical parameters.
Because incomplete data predominate in practice, many methods
have been proposed to deal with them, but few studies compare
their performance.
The HEPAR II project presents an excellent opportunity to test
experimentally how these methods perform on a real data set.
We briefly review several popular methods for handling incomplete
data and then compare them on the task of learning conditional
probability distributions of a Bayesian network model, where the
comparison criterion is the resulting diagnostic accuracy.
While substituting "normal" values for missing attribute values
seemed to perform best, we observed only a small difference
in performance among the studied methods.
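
As a rough illustration of the "normal"-value substitution mentioned above, the following minimal sketch fills each missing attribute with an assumed "normal" state and then estimates one conditional probability table by relative frequency. It is not the procedure used in the paper: the records, the attribute names (`disease`, `bilirubin`), the `NORMAL` mapping, and the helper functions are all hypothetical and chosen only for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical toy records; None marks a missing attribute value.
records = [
    {"disease": "present", "bilirubin": "elevated"},
    {"disease": "present", "bilirubin": None},
    {"disease": "absent", "bilirubin": "normal"},
    {"disease": "absent", "bilirubin": None},
]

# Assumed "normal" state for each attribute that may be missing.
NORMAL = {"bilirubin": "normal"}


def impute_normal(rows, normal):
    """Return copies of rows with missing values replaced by the 'normal' state."""
    return [
        {k: (normal[k] if v is None and k in normal else v) for k, v in row.items()}
        for row in rows
    ]


def estimate_cpt(rows, child, parent):
    """Estimate P(child | parent) by relative frequency over rows where both are known."""
    counts = defaultdict(Counter)
    for row in rows:
        if row[child] is not None and row[parent] is not None:
            counts[row[parent]][row[child]] += 1
    return {
        p: {c: n / sum(cnt.values()) for c, n in cnt.items()}
        for p, cnt in counts.items()
    }


completed = impute_normal(records, NORMAL)
cpt = estimate_cpt(completed, child="bilirubin", parent="disease")
for parent_state, dist in cpt.items():
    print(parent_state, dist)
```

In this toy example the substitution shifts probability mass toward the "normal" state in every parent configuration that had missing entries, which is the kind of effect a comparison of methods would need to weigh against diagnostic accuracy.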
The full paper is available in Compressed PostScript (93KB) and PDF (109KB) formats.
marek@sis.pitt.edu /
Last update: 11 May 2005