Combining knowledge from different sources in causal probabilistic models



Authors:
Marek J. Druzdzel
Decision Systems Laboratory
School of Information Sciences
and Intelligent Systems Program
University of Pittsburgh
135 North Bellefield Avenue
Pittsburgh, PA 15260, U.S.A.
e-mail: marek@sis.pitt.edu

Francisco J. Diez
Dept. Inteligencia Artificial
Universidad Nacional de Educacion a Distancia
Senda del Rey, 9
28040 Madrid, Spain
e-mail: fjdiez@dia.uned.es


Abstract:
Building probabilistic and decision-theoretic models requires a considerable knowledge engineering effort in which the most daunting task is obtaining the numerical parameters. Authors of Bayesian networks usually combine various sources of information, such as textbooks, statistical reports, databases, and expert judgement. In this paper, we demonstrate the risks of such a combination, even when this knowledge encompasses such seemingly population-independent characteristics as sensitivity and specificity of medical symptoms. We show that the criteria "do not combine knowledge from different sources" or "use only data from the setting in which the model will be used" are neither necessary nor sufficient to guarantee the correctness of the model. Instead, we offer graphical criteria for determining when knowledge from different sources can be safely combined into the general population model. We also offer a method for building subpopulation models. The analysis performed in this paper and the criteria we propose may be useful in such fields as knowledge engineering, epidemiology, machine learning, and statistical meta-analysis.
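
As a rough illustration of why sensitivity and specificity may not carry over between populations, the sketch below compares their values in a general population with their values in a selected subpopulation. It is not taken from the paper: the variable names, the selection mechanism (admission to a hospital depends only on whether the symptom is present), and all numbers are assumptions made up for this example.

```python
# Hypothetical illustration: D = disease, S = symptom, H = selected into the
# hospital subpopulation. Assumption: H depends only on S (given S, H is
# independent of D). All numbers below are invented for illustration.

def prob_symptom_given_selection(p_s, p_h_given_s, p_h_given_not_s):
    """Return P(S | H) within one fixed disease stratum, by Bayes' rule.

    p_s             -- P(S) in that stratum before selection
    p_h_given_s     -- P(H | S)
    p_h_given_not_s -- P(H | not S)
    """
    selected_with_symptom = p_s * p_h_given_s
    selected_without_symptom = (1.0 - p_s) * p_h_given_not_s
    return selected_with_symptom / (selected_with_symptom + selected_without_symptom)

# General-population parameters (assumed).
sensitivity = 0.8      # P(S | D)
specificity = 0.9      # P(not S | not D)

# Selection mechanism (assumed): symptomatic people are admitted more often.
p_h_given_s = 0.5
p_h_given_not_s = 0.05

# Sensitivity among the selected: P(S | D, H).
sensitivity_selected = prob_symptom_given_selection(
    sensitivity, p_h_given_s, p_h_given_not_s
)

# Specificity among the selected: 1 - P(S | not D, H).
specificity_selected = 1.0 - prob_symptom_given_selection(
    1.0 - specificity, p_h_given_s, p_h_given_not_s
)

print(f"general population: sensitivity={sensitivity:.3f}, specificity={specificity:.3f}")
print(f"selected subgroup:  sensitivity={sensitivity_selected:.3f}, "
      f"specificity={specificity_selected:.3f}")
```

Under these assumed numbers, sensitivity rises to roughly 0.98 and specificity falls to roughly 0.47 in the selected subgroup, so plugging hospital-derived values into a general-population model (or the reverse) would distort it.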

The paper is available in PostScript (150KB) and PDF (174KB) format.

marek@sis.pitt.edu / Last update: 14 May 2005