About BAMT algorithms

Rimmary edited this page Sep 4, 2022 · 9 revisions

Web-BAMT is a web service for training Bayesian networks on demonstration examples with data of various kinds.

Introduction to Bayesian networks

A Bayesian network is a pair consisting of a directed acyclic graph (DAG) that describes the dependencies between features and a factorization of the joint distribution of those features into a product of conditional distributions induced by these dependencies. Training a Bayesian network therefore splits into two subtasks:

  • Finding the structure of the Bayesian network.
  • Parametric learning of the Bayesian network, in other words, selecting marginal and conditional distributions that describe the data accurately enough.
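
As a small illustration of the factorization, the sketch below builds a hypothetical two-node network rain -> wet_grass. The node names and all probabilities are invented for the example; the point is only that the joint distribution is the product of the conditional distributions read off the DAG.

```python
# Hypothetical two-node network: rain -> wet_grass.
# All probabilities are invented for illustration.
p_rain = {True: 0.2, False: 0.8}
p_wet_given_rain = {
    True: {True: 0.9, False: 0.1},   # P(wet | rain=True)
    False: {True: 0.1, False: 0.9},  # P(wet | rain=False)
}


def joint(rain: bool, wet: bool) -> float:
    """P(rain, wet) = P(rain) * P(wet | rain), following the DAG."""
    return p_rain[rain] * p_wet_given_rain[rain][wet]


# A valid factorization must sum to one over all configurations.
total = sum(joint(r, w) for r in (True, False) for w in (True, False))
```

Structure learning decides which arrows exist; parametric learning fills in tables such as p_rain and p_wet_given_rain.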

Structural learning algorithms

The task of constructing a network is often reduced to optimization: score functions are defined on the space of DAGs to evaluate how well a graph describes the dependencies between features. Web BAMT uses the Hill-Climbing algorithm to search this space. The steps of the Hill-Climbing algorithm are:

  1. Start from a graph without edges;
  2. For each pair of nodes, try adding an edge, removing it, or reversing its direction;
  3. Compute the value of the score function after the selected action on the pair;
  4. If the score is better than that of the previous iteration, keep the result;
  5. Stop when the score changes by less than a threshold value.
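
The steps above can be sketched in plain Python. This is a minimal greedy variant for discrete data with a textbook BIC score, not the BAMT implementation: the function names, the best-improvement strategy, and the toy dataset are all assumptions made for illustration.

```python
import math
from collections import Counter
from itertools import permutations


def bic_score(data, parents):
    """BIC of a discrete network; `parents` maps node -> set of parent nodes."""
    n = len(data)
    states = {v: {row[v] for row in data} for v in parents}
    score = 0.0
    for node, pa in parents.items():
        pa = sorted(pa)
        joint = Counter((tuple(row[p] for p in pa), row[node]) for row in data)
        marg = Counter(tuple(row[p] for p in pa) for row in data)
        loglik = sum(c * math.log(c / marg[cfg]) for (cfg, _), c in joint.items())
        n_params = (len(states[node]) - 1) * math.prod(len(states[p]) for p in pa)
        score += loglik - 0.5 * math.log(n) * n_params
    return score


def is_acyclic(parents):
    """Peel off nodes with no remaining parents; failure means a cycle."""
    remaining = {n: set(pa) for n, pa in parents.items()}
    while remaining:
        roots = [n for n, pa in remaining.items() if not pa]
        if not roots:
            return False
        for r in roots:
            del remaining[r]
        for pa in remaining.values():
            pa.difference_update(roots)
    return True


def neighbours(parents):
    """Yield every DAG reachable by adding, removing or reversing one edge."""
    for a, b in permutations(parents, 2):  # candidate edge a -> b
        new = {n: set(pa) for n, pa in parents.items()}
        if a in parents[b]:
            new[b].discard(a)  # remove a -> b
            yield new
            rev = {n: set(pa) for n, pa in new.items()}
            rev[a].add(b)  # reverse to b -> a
            if is_acyclic(rev):
                yield rev
        elif b not in parents[a]:
            new[b].add(a)  # add a -> b
            if is_acyclic(new):
                yield new


def hill_climb(data, threshold=1e-6):
    current = {n: set() for n in data[0]}  # step 1: graph without edges
    best = bic_score(data, current)
    while True:
        # steps 2-3: score every single-edge change
        candidate = max(neighbours(current),
                        key=lambda g: bic_score(data, g), default=None)
        if candidate is None:
            return current, best
        cand_score = bic_score(data, candidate)
        if cand_score - best < threshold:  # step 5: no meaningful improvement
            return current, best
        current, best = candidate, cand_score  # step 4: keep the better graph


# Toy data: x and y agree 80% of the time, z is independent of both.
rows = []
for z in (0, 1):
    for (x, y), count in {(0, 0): 40, (0, 1): 10, (1, 0): 10, (1, 1): 40}.items():
        rows += [{"x": x, "y": y, "z": z}] * count
dag, score = hill_climb(rows)
```

On this toy data the search links x and y and leaves z isolated. The direction of the x-y edge is arbitrary because both orientations receive the same BIC.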

The following score functions from BAMT package are included in Web BAMT: K2, BIC (Bayesian Information Criterion), MI (Mutual Information).

This approach to structure search makes it possible to introduce expert control by narrowing the search space to structures that include expert-specified edges or fixed root nodes describing key, basic features.
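
One way to picture this kind of expert control is a filter that rejects candidate structures violating the constraints. The helper below is a hypothetical sketch; its name and parameters are invented for illustration and are not the BAMT or Web BAMT API.

```python
def is_allowed(parents, required_edges=(), root_nodes=()):
    """Return True if a candidate DAG respects the expert constraints.

    parents        -- dict mapping node -> set of parent nodes
    required_edges -- (parent, child) pairs that must be present
    root_nodes     -- nodes that must not have any parents
    """
    has_required = all(p in parents[c] for p, c in required_edges)
    roots_ok = all(not parents[r] for r in root_nodes)
    return has_required and roots_ok


# Hypothetical candidate: age -> income, with age fixed as a root node.
candidate = {"age": set(), "income": {"age"}}
ok = is_allowed(candidate, required_edges=[("age", "income")], root_nodes=["age"])
```

A search procedure would simply skip any candidate for which the filter returns False.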

Parametric learning algorithms

Distribution parameters are learned by maximum likelihood estimation within a fixed class of distributions. In classical conditional Gaussian Bayesian networks, multinomial distributions describe discrete features and Gaussian distributions approximate continuous ones.
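
For these two distribution classes the maximum likelihood estimates have closed forms: relative frequencies for a multinomial distribution, and the sample mean and the variance with divisor n for a Gaussian. A minimal sketch with hypothetical function names:

```python
from collections import Counter


def fit_multinomial(values):
    """MLE of a multinomial distribution: relative frequency of each state."""
    counts = Counter(values)
    n = len(values)
    return {state: c / n for state, c in counts.items()}


def fit_gaussian(values):
    """MLE of a Gaussian: sample mean and variance (divides by n, not n - 1)."""
    n = len(values)
    mu = sum(values) / n
    var = sum((v - mu) ** 2 for v in values) / n
    return mu, var


probs = fit_multinomial(["a", "a", "b", "c"])
mu, var = fit_gaussian([1.0, 2.0, 3.0])
```

For a node with discrete parents, the same estimates would be computed separately for each combination of parent values.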

One of the extensions to this basic model available in Web BAMT is the use of a multinomial mixture of Gaussian distributions. The number of mixture components is chosen with an approach based on the BIC and AIC criteria. The parameters of such a model are also learned by likelihood maximization. Because of the large number of unknowns, an EM algorithm is used to find the model parameters. It alternates two steps: an expectation step, in which we estimate the posterior probability that each observation belongs to each mixture component, and a maximization step, in which we recalculate the mixture parameters to maximize the expected log-likelihood given those probabilities.
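
The two EM steps can be sketched for a one-dimensional mixture in plain Python. This is an illustrative toy, not the BAMT code: the function names, the fixed iteration count, the initial means, and the data are assumptions, and BIC is written in the lower-is-better convention.

```python
import math


def normal_pdf(x, mu, var):
    return math.exp(-((x - mu) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def em_gmm_1d(xs, init_mus, n_iter=50):
    """Fit a 1-D Gaussian mixture by EM, starting from the given means."""
    mus = list(init_mus)
    k = len(mus)
    weights = [1.0 / k] * k
    variances = [1.0] * k
    for _ in range(n_iter):
        # E-step: posterior probability of each component for each point.
        resp = []
        for x in xs:
            dens = [w * normal_pdf(x, m, v)
                    for w, m, v in zip(weights, mus, variances)]
            total = sum(dens)
            resp.append([d / total for d in dens])
        # M-step: re-estimate weights, means and variances from those posteriors.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / len(xs)
            mus[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            var = sum(r[j] * (x - mus[j]) ** 2 for r, x in zip(resp, xs)) / nj
            variances[j] = max(var, 1e-6)  # guard against collapsing components
    return weights, mus, variances


def log_likelihood(xs, weights, mus, variances):
    return sum(
        math.log(sum(w * normal_pdf(x, m, v)
                     for w, m, v in zip(weights, mus, variances)))
        for x in xs
    )


def bic(xs, weights, mus, variances):
    n_params = 3 * len(mus) - 1  # k means, k variances, k - 1 free weights
    return n_params * math.log(len(xs)) - 2 * log_likelihood(xs, weights, mus, variances)


# Toy data with two well-separated clusters around 0 and 5.
xs = [-0.5, -0.2, 0.0, 0.2, 0.5, 4.5, 4.8, 5.0, 5.2, 5.5]
fit1 = em_gmm_1d(xs, [2.5])
fit2 = em_gmm_1d(xs, [1.0, 4.0])
best_k = min((1, 2), key=lambda k: bic(xs, *(fit1 if k == 1 else fit2)))
```

Comparing BIC across candidate component counts, as in the last line, is one simple way to pick the number of components.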

Another available modification is the inclusion of classification models; in this implementation they are represented by logistic regression (logit). In general, Bayesian network structure learning methods allow continuous variables to be parents of discrete variables. The main problem arises at the parameter learning stage, because such parent-child pairs require a model that estimates a discrete distribution conditional on continuous data. Such relationships can nevertheless be specified by experts, and ignoring them may reduce the quality and interpretability of the model. A classification model fills this gap by predicting the distribution of the discrete child from the values of its continuous parents.
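
To make the idea concrete, the sketch below fits a logistic regression for a binary child with a single continuous parent, using plain gradient ascent on the log-likelihood. The function names, learning rate, and data are hypothetical; a real service would more likely rely on an existing library implementation.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logit(xs, ys, lr=0.1, n_iter=2000):
    """Fit P(y = 1 | x) = sigmoid(w * x + b) by gradient ascent."""
    w, b = 0.0, 0.0
    for _ in range(n_iter):
        grad_w = grad_b = 0.0
        for x, y in zip(xs, ys):
            p = sigmoid(w * x + b)
            grad_w += (y - p) * x  # derivative of the log-likelihood w.r.t. w
            grad_b += y - p        # derivative of the log-likelihood w.r.t. b
        w += lr * grad_w / len(xs)
        b += lr * grad_b / len(xs)
    return w, b


def predict_proba(x, w, b):
    """Conditional distribution of the binary child given the parent value x."""
    p1 = sigmoid(w * x + b)
    return {0: 1.0 - p1, 1: p1}


# Invented data: larger parent values make y = 1 more likely, with some overlap.
xs = [-2.0, -1.5, -1.0, -0.5, 0.5, 1.0, 1.5, 2.0]
ys = [0, 0, 0, 1, 0, 1, 1, 1]
w, b = fit_logit(xs, ys)
dist_high = predict_proba(2.0, w, b)
dist_low = predict_proba(-2.0, w, b)
```

The dictionary returned by predict_proba plays the role that a row of a conditional probability table plays for a node with discrete parents.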

Sampling algorithms

Sampling in Bayesian networks proceeds top-down in topological order. For the root nodes, the parametric model describes the distribution from which the random value is drawn: a multinomial distribution for discrete nodes, and a Gaussian distribution or a mixture of Gaussian distributions for continuous ones. Once the values of the root nodes are obtained, all subsequent distributions are conditional on the values of the parent nodes.
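
A minimal sketch of this top-down procedure is shown below for a hypothetical three-node network. The node names, probabilities, and Gaussian parameters are invented; only graphlib.TopologicalSorter and the random module from the Python standard library (3.9+) are used.

```python
import random
from graphlib import TopologicalSorter

# Hypothetical network: "a" and "x" are roots, "b" is a discrete child of "a".
parents = {"a": [], "x": [], "b": ["a"]}

# Conditional probability table for b given a (invented numbers).
cpt_b = {"low": [0.9, 0.1], "high": [0.2, 0.8]}

samplers = {
    # Discrete root: multinomial distribution.
    "a": lambda rng, v: rng.choices(["low", "high"], weights=[0.3, 0.7])[0],
    # Continuous root: Gaussian with mean 10 and standard deviation 2.
    "x": lambda rng, v: rng.gauss(10.0, 2.0),
    # Discrete child: row of the table selected by the parent's sampled value.
    "b": lambda rng, v: rng.choices([0, 1], weights=cpt_b[v["a"]])[0],
}


def sample(rng):
    """Draw one joint sample, visiting parents before their children."""
    values = {}
    for node in TopologicalSorter(parents).static_order():
        values[node] = samplers[node](rng, values)
    return values


rng = random.Random(0)
draws = [sample(rng) for _ in range(2000)]
```

Because TopologicalSorter receives each node together with its predecessors, every sampler can rely on the parent values already being present in values.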

Discrete nodes are sampled from the conditional probability table estimated from the data when all parents are discrete, or from the classification model if at least one parent is continuous. For continuous nodes, the distribution parameters for sampling are obtained using Gaussian mixture regression (GMR). If the simple option is chosen, this model is degenerate and has only one mixture component.
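
Gaussian mixture regression conditions a joint mixture over parent and child on the observed parent value. The sketch below handles a single continuous parent with the standard conditional Gaussian formulas; the function name and the tuple layout are assumptions for illustration and do not reflect how BAMT stores its parameters.

```python
import math


def gmr_1d(x, components):
    """Condition a 2-D Gaussian mixture over (x, y) on an observed x.

    components -- list of (weight, mu_x, mu_y, var_x, var_y, cov_xy)
    Returns a list of (weight, mean, variance) describing p(y | x).
    """
    out = []
    for w, mx, my, vx, vy, cxy in components:
        # Density of the observed x under this component's marginal.
        px = math.exp(-((x - mx) ** 2) / (2 * vx)) / math.sqrt(2 * math.pi * vx)
        mean = my + cxy / vx * (x - mx)  # conditional mean of y given x
        var = vy - cxy ** 2 / vx         # conditional variance of y given x
        out.append((w * px, mean, var))
    total = sum(w for w, _, _ in out)
    return [(w / total, m, v) for w, m, v in out]


# Degenerate case with one component: an ordinary conditional Gaussian.
single = gmr_1d(2.0, [(1.0, 0.0, 0.0, 1.0, 1.0, 0.5)])
```

To draw a value for the child, one would pick a component according to the returned weights and then sample from a Gaussian with that component's conditional mean and variance.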