Criteria for Combining Knowledge from Different Sources in Probabilistic Models



Authors:
Marek J. Druzdzel
Decision Systems Laboratory
School of Information Sciences
and Intelligent Systems Program
University of Pittsburgh
135 North Bellefield Avenue
Pittsburgh, PA 15260, U.S.A.
e-mail: marek@sis.pitt.edu

F. Javier Diez
Departamento de Inteligencia Artificial
Universidad Nacional de Educacion a Distancia
Senda del Rey, 9
28040 Madrid, Spain
e-mail: fjdiez@dia.uned.es

Abstract:
Building probabilistic and decision-analytic models requires a considerable knowledge engineering effort in which obtaining numerical parameters is especially daunting. Often knowledge engineers combine various sources of information, such as information reported in textbooks and professional literature, available statistics, and data collected in practical settings. We show that combining probabilistic knowledge that originates from different sources requires utmost care. In particular, we demonstrate that even such seemingly population-independent characteristics as sensitivity and specificity of medical symptoms can vary within a population, depending purely on how the data are collected. We offer guidelines for detecting when different sources of data can be safely combined. Our analysis shows that a knowledge engineer should exercise much care in building practical models.
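
The claim that sensitivity and specificity can depend on how the data are collected can be illustrated with a small calculation. The sketch below is not taken from the paper. It assumes one simple collection mechanism, in which a case enters the data set with a probability that depends only on whether the symptom is present. The function name `observed_sens_spec`, its parameters, and all numbers are hypothetical and chosen for illustration only.

```python
def observed_sens_spec(sens, spec, p_sel_given_s, p_sel_given_not_s):
    """Return (sensitivity, specificity) as they would appear in a data set
    that contains only selected cases.

    Assumption (illustrative, not from the paper): the chance that a case is
    recorded depends only on whether the symptom is present, not on the
    disease itself.
    """
    # Diseased cases: weight symptom-present and symptom-absent cases
    # by their probability of entering the data set.
    true_pos = sens * p_sel_given_s
    false_neg = (1 - sens) * p_sel_given_not_s

    # Non-diseased cases, weighted the same way.
    true_neg = spec * p_sel_given_not_s
    false_pos = (1 - spec) * p_sel_given_s

    observed_sens = true_pos / (true_pos + false_neg)
    observed_spec = true_neg / (true_neg + false_pos)
    return observed_sens, observed_spec


# Hypothetical population values: sensitivity 0.8, specificity 0.9.
# Cases with the symptom are recorded 90% of the time, others 30%.
biased = observed_sens_spec(0.8, 0.9, 0.9, 0.3)

# If every case is equally likely to be recorded, nothing changes.
unbiased = observed_sens_spec(0.8, 0.9, 0.5, 0.5)

print(biased, unbiased)
```

Under these assumed numbers the recorded data suggest a sensitivity of roughly 0.92 and a specificity of 0.75, although the population values are 0.8 and 0.9. With uniform recording the population values are recovered, so parameters estimated under the two collection schemes would not be interchangeable.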

The full paper is available in PostScript (254KB) and PDF (234KB) formats.

marek@sis.pitt.edu / Last update: 4 May 2005