HISA (Vic) home Magazine back issues Previous Article Next article Title Contents

Belief Networks


Malcolm Pradhan

Part 2: Basic Networks

In the last issue I described Bayes rule and the advantages of probability for representing uncertain information in computer systems. This month I will introduce belief networks with a simple (very simple) example and discuss some of their advantages, and problems.

A Simple Belief Network

You may recall that using simple Bayes rule required making assumptions of mutual exclusivity and conditional independence. Belief networks allow us to get around these limitations by adopting a graphical notation to explicitly represent independence between variables. A network consists of nodes, which represent un-known entities, and directed arcs which connect nodes (directed means they have arrows). A belief network can be extended into an influence diagram by adding other kinds of nodes called decision nodes and value nodes [1]. Influence diagrams are used to model decision making using probability so decisions are optimized to maximize the value node. In this article I will concentrate on the subset of belief network without decision nodes, used commonly for inference.

Say we want to know if someone has had a bacterial pneumonia (PN) based on the presence of two symptoms: cough and fever, but a fever can also be due to influenza - as I mentioned, this is a very simplistic example. We have four uncertain entities, the two symptoms and two diseases, and we can represent them as shown in Figure 1.

Figure 1. A very simple belief network

The network in figure 1 is drawn to show that a pneumonia may cause a cough and fever, while flu can cause just the fever. Although we use the term cause in this example the arrows represent influence rather than a stricter definition of causality. For figure 1 we can say that the presence of pneumonia influences the presence of cough and of fever.

Once we have our model represented in a network we can assign probabilities to the nodes. We will assume that each finding and disease can be either present or absent, but belief networks do not restrict the number of states a node has. Fever, for example, could be represented as 3 centigrade values "less than 37.5", "37.6-39", and ">39.1" if we felt that these distinctions would make a difference to our model. We need the following probabilities for figure 1:

p(PN) prior probability of someone having pneumonia, which is the prevalence of the disease in this case

p(flu) prior probability of someone having flu

p(cough|PN) probability of cough given that pneumonia is present

p(cough|ØPN) probability of cough given that pneumonia is not present

p(fever|PN, flu) probability of fever given that pneumonia and flu are present

p(fever|ØPN, flu) probability of fever given that pneumonia is absent but flu is present

p(fever|PN,Øflu) probability of fever given that pneumonia is present but flu is absent, and

p(fever|ØPN,Øflu) probability of fever given that both pneumonia and flu are absent.

The p(finding|disease) is the sensitivity of the finding for that disease, and the p(finding| Ødisease) is the false positive rate (one minus the specificity). If a node has multiple parents a full joint probability is required over all states of the parents.

In this network this is represented by the four probabilities required for the specification of the fever node. The number of probabilites required to specify a node grows exponentially with the number of parents: If we had 4 parents we would need to acquire 24 = 16 probabilities! In my next article I will describe the noisy-Or technique which allows us to avoid this exponential growth under certain circumstances.

One of the complaints against probabilistic methods is 'where do all the numbers come from?'. Most often the numbers come from expert opinion and/or literature review. The advantage of using probability is that the numbers can also be derived from studies or hospital databases when such entities exist. Another consideration is that not all numbers are critical, generally the larger the network the less important is an individual value. If the network creators are unsure how reliable the numbers are they may do

sensitivity analyses to examine the effect of variables on the nodes of interest, and concentrate on obtaining more accurate probabilities if needed.

An advantage of the belief networks is their graphical representation, they are a good communication tools. By representing the important variables and assigning arcs between nodes that are dependent we can build an explicit representation of the problem we are trying to solve. We can discuss these networks with experts who know little about probability or belief networks to refine the structure of our model.

Independence and Complexity

In figure 1 we have a simple belief network without any connection between the nodes pneumonia and flu. The lack of arc between these nodes means they are independent, but they are not independent if we know the person has a fever.

Our simple network only knows of two diagnoses, pneumonia and flu, so if a person is known to have a fever and we know they do not have pneumonia the probability of flu becomes more likely. That is, the absence of pneumonia increases the likelihood of flu given we know they have a fever [2]. Likewise, we can calculate other independencies from the graphical structure of the belief network, which is an advantage because algorithms can understand the independence assumptions without considering the actual numbers involved.

If two nodes are dependent, that is they are connected by an arc, then the direction of the arc is unimportant as we can reverse the arc using Bayes rule. Generally the network is built using arcs from cause to effect, however the network is often used to infer unobserved events (e.g. pneumonia) from observed events (e.g. fever and cough). Reversing arcs can lead to a more complex network [3] so it is often convenient to leave the network in causal form and let algorithms do the inference.

A significant problem of the belief network representation is that they have been proven to be NP-hard [4]. In English, this means that the time required to calculate a network increases exponentially with the complexity of the network. This limitation shows up in real-life networks and is one reason why probabilistic networks have not been used extensively. In recent years several methods have been developed to reduce the time required to calculate belief networks and make them more attractive to real situations.

Next month I will give an overview of such techniques, and describe examples of belief networks that are being used today.

References

[1] Howard, R.A., Matheson, J.E. "Influence diagrams". In Applications of Decision Analysis, volume 2, eds. R.A. Howard and J.E. Matheson, 721-762. Menlo Park, Calif.: Strategic Systems Group, 1981.

[2] Pearl, J. Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. San Mateo, Calif.: Morgan Kaufmann, 1988.

[3] Shacter, R.D., Heckerman, D.E. "Thinking backwards for knowledge acquisition" AI Magazine, 8(Fall), 55-61, 1987.

[4] Cooper, G.F. "Probabilistic inference using belief networks is NP-Hard". Technical Report KSL-87-27, Section on Medical Informatics, Stanford University, 1987.

About the Author

Malcolm Pradhan is a medical graduate of the University of Adelaide. He is currently doing a post-graduate degree in medical informatics at the Section of Medical Informatics, Stanford University, California, USA. He can be contacted by fax (415) 725-7944 or via Internet e-mail at pradhan@camis.stanford.edu.


Previous Article Next article Title Contents Your Comments
HTML by Jon Hilton SCON Solutions, +61-3-94823794
Conversion greatly assisted by rtftoweb.
Copyright © HISA Inc.
Revised Thu Sep 14 21:09:08 1995