Knowledge engineering for very large decision-analytic medical models



Authors:
Marek J. Druzdzel
Decision Systems Laboratory
School of Information Sciences
and Intelligent Systems Program
University of Pittsburgh
e-mail: marek@sis.pitt.edu

Agnieszka Onisko
Bialystok University of Technology
Institute of Computer Science
Bialystok, 15-351, Poland
e-mail: aonisko@ii.pb.bialystok.pl
FAX: (085) 422-393

Daniel Schwartz
Center for Biomedical Informatics
University of Pittsburgh

John N. Dowling
Center for Biomedical Informatics
University of Pittsburgh

Hanna Wasyluk
The Medical Center of Postgraduate Education
and Institute of Biocybernetics
and Biomedical Engineering,
Polish Academy of Sciences
Warsaw, Marymoncka 99, Poland
e-mail: hwasyluk@cmkp.edu.pl

Graphical decision-analytic models, such as Bayesian networks, are powerful tools for modeling complex diagnostic problems, capable of encoding subjective expert knowledge and combining it with available statistics. Practical models built using this approach often reach the size of tens or even hundreds of variables. Large models pose challenging knowledge engineering problems. Their sheer scale requires numerous interactions with experts, eliciting both the structure of the model and the probabilities that quantify the interactions among its variables. A model of a hundred variables, for example, may require several thousand numerical parameters. If traditional decision-analytic techniques are applied, both the structure and the numbers require countless sessions with an expert. Furthermore, even if an expert were available for that many sessions, trusting the quality of several thousand elicited numbers would be naive. Skills and experience in constructing large Bayesian networks are difficult to gain, and there is almost no literature that would aid a beginning model builder.
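To make the arithmetic behind "several thousand numerical parameters" concrete, here is a minimal sketch that counts the independent entries in the conditional probability tables (CPTs) of a discrete Bayesian network. Each variable needs one probability distribution for every configuration of its parents, and a distribution over r states has only r - 1 free numbers because it must sum to one. The 100-variable network below is entirely hypothetical (three states per variable, each variable conditioned on up to three preceding variables) and is not one of the models discussed in this abstract; the function name cpt_parameters is likewise illustrative.

```python
from math import prod

def cpt_parameters(states, parents):
    """Count independent CPT parameters of a discrete Bayesian network.

    states:  dict mapping each variable to its number of states
    parents: dict mapping each variable to a list of its parents
    Each variable contributes (r - 1) free numbers per configuration of its
    parents; prod() over an empty parent list is 1 (a single prior).
    """
    return sum(
        (states[v] - 1) * prod(states[p] for p in parents[v])
        for v in states
    )

# Hypothetical model: 100 three-state variables, where X_i has parents
# X_{i-3}, X_{i-2}, X_{i-1} (fewer for the first three variables).
n = 100
states = {f"X{i}": 3 for i in range(n)}
parents = {f"X{i}": [f"X{j}" for j in range(max(0, i - 3), i)] for i in range(n)}

print(cpt_parameters(states, parents))  # 5264
print(f"{3 ** n - 1:.3e}")              # unrestricted joint: 5.154e+47
```

Under these assumptions the structured model needs 5,264 numbers, while an unrestricted joint distribution over the same variables would need 3^100 - 1 of them; the graph is what brings elicitation within reach at all, yet the remaining thousands of numbers are still far more than an expert can comfortably supply.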

The goal of our presentation is to describe typical problems that occur in building large medical Bayesian network models and to illustrate some practical techniques for overcoming them. Over the last couple of years, we have collaborated on building systems for the diagnosis of liver disorders and the processing of liver pathology data, as well as various epidemiological models. In our conference presentation, we will focus on typical problems encountered in model building from the point of view of both knowledge engineers (Druzdzel and Onisko) and medical experts (Schwartz, Dowling and Wasyluk). We will illustrate the knowledge engineering process with examples from our networks.

There are three important aspects of model building: building the graphical structure of the model, obtaining the numerical parameters for this structure, and verifying the model. We believe that these three are closely related and should be the focus of effort from the very start of the process, which in turn should be iterative rather than one-shot.

The graphical structure of a Bayesian network, often downplayed in the literature, models important and robust structural properties of the domain: direct interactions among the domain variables and, indirectly, conditional independencies among them. The structure is an important focus of the interaction with experts during all stages of model building, and it is good practice to make it follow the causal structure of the domain. This provides a common denominator among the various experts and users of the model. The graphical structure, and in particular its connectivity, has a direct impact on the number of numerical parameters required to fully quantify the model. It is also the single most important factor in the accuracy of the ultimate model.
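The link between connectivity and the number of parameters follows from the standard factorization of a Bayesian network's joint distribution. For variables X_1, ..., X_n, write Pa(X_i) for the parents of X_i in the graph and r_i for its number of states; the joint distribution and the number N of independent numbers needed to quantify it are then:

$$
P(X_1,\ldots,X_n)=\prod_{i=1}^{n} P\bigl(X_i \mid \mathrm{Pa}(X_i)\bigr),
\qquad
N=\sum_{i=1}^{n} (r_i-1)\prod_{X_j\in\mathrm{Pa}(X_i)} r_j .
$$

For binary variables, a node with k parents needs 2^k independent numbers, so every additional parent doubles the size of its table. A graph with at most three parents per binary node therefore needs at most 8n numbers, compared with 2^n - 1 for an unrestricted joint distribution over the same n variables.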

Through our work, we have learned to appreciate the importance of a reliable modeling tool that allows for easy construction, presentation, and modification of graphs, allows for documenting the model while building it, and supports hierarchical model structure that hides unnecessary detail in large models. While we have not encountered an ideal tool, we found GeNIe, developed at the Decision Systems Laboratory, University of Pittsburgh (described elsewhere in this volume and presented at the conference), suitable for developing medical applications.

The space allocated for this abstract does not allow for in-depth coverage of our presentation. We will make a full-length paper covering the contents of our presentation available to interested readers at http://www.pitt.edu/~druzdzel/publ.html.

Acknowledgments
Our research was supported by the Air Force Office of Scientific Research, grant F49620-97-1-0225; by the National Science Foundation under the Faculty Early Career Development (CAREER) Program, grant IRI-9624629; by the Polish Committee for Scientific Research, grant 8T11E00811; by the Medical Centre of Postgraduate Education of Poland, grant 501-2-1-02-14/99; and by the Institute of Biocybernetics and Biomedical Engineering, Polish Academy of Sciences, grant 16/ST/99.


An extended version of this paper can be found (at the following location).

The paper is available in PostScript (61KB) and PDF (43KB) formats.



marek@sis.pitt.edu / Last update: 14 May 2005