
Add pyAgrum package and MIIC algorithm? #115

Open
bdatko opened this issue Jun 7, 2024 · 14 comments

@bdatko

bdatko commented Jun 7, 2024

I think pyAgrum would be a great addition to the list of algorithms. As far as I can tell, there is no comparison in benchpress using the Multivariate Information-based Inductive Causation (MIIC) algorithm, which pyAgrum has implemented. The library also offers a scikit-learn interface for learning classifiers, which should help with the integration into benchpress.

@felixleopoldo
Owner

Hi, that sounds like a good idea. In pyAgrum they call the useMIIC function on a learner object (link and link), but it's not totally clear how to pass arguments to the algorithm, such as choosing a score or test function. Do you have some sample usage?
MIIC also seems to be implemented here. Do you know which one to prefer?

@bdatko
Author

bdatko commented Jun 8, 2024

@felixleopoldo useMIIC is part of their lower-level API, but there is a convenience class pyAgrum.skbn.BNClassifier whose default learningMethod is MIIC. The other choices for learningMethod are: Chow-Liu, NaiveBayes, Tree-augmented NaiveBayes, MIIC + (MDL or NML), Greedy Hill Climb, and Tabu. You can use scoringType within the initializer of pyAgrum.skbn.BNClassifier to pick your flavor: AIC, BIC, BD, BDeu, K2, Log2.
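
For illustration, here is a minimal sketch (mine, not taken from the pyAgrum docs) of constructing the classifier with a score-based method instead of MIIC; the exact string accepted for greedy hill climbing ('GHC') is an assumption worth checking against the documentation:

import pyAgrum.skbn as skbn

# assumption: 'GHC' selects greedy hill climbing; scoringType picks the score it optimizes
clf = skbn.BNClassifier(learningMethod='GHC', scoringType='BIC')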

There are examples of using pyAgrum.skbn.BNClassifier in the notebook titled Learning classifiers; shown below is a call using MIIC (cell 7 from the linked notebook):

import pyAgrum.skbn as skbn

# we now use another method to learn the BN (MIIC)
BNTest = skbn.BNClassifier(learningMethod='MIIC', prior='Smoothing', priorWeight=0.5,
                           discretizationStrategy='quantile', usePR=True, significant_digit=13)

xTrain, yTrain = BNTest.XYfromCSV(filename = 'res/creditCardTest.csv', target = 'Class')
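
After loading the data, fitting follows the usual scikit-learn pattern; this continuation is my sketch rather than the notebook cell itself:

BNTest.fit(xTrain, yTrain)
# the learned structure is then available as a pyAgrum BayesNet
print(BNTest.bn)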

More examples using BNClassifier can be found in the notebook titled Comparing classifiers (including Bayesian networks) with scikit-learn.

I have only used pyAgrum because I don't know R, so I have never directly compared the two. pyAgrum is a Python wrapper around the aGrUM C++ library, where the MIIC implementation is written in C++. It looks similar to how the original authors of MIIC provide a C++ implementation wrapped in R, but I don't know for sure.

Let me know if you need any more help. =)

@felixleopoldo
Owner

Thanks. It seems like they refer to the Bayesian network as a classifier, where one variable is specified as the target? It would be nice if you could show how to do the following two steps:

  1. Learn the graph of a Bayesian network from a CSV data file (in the Benchpress data format) with the relevant parameters for structure learning
  2. Write the adjacency matrix representation of the graph to a CSV file following Benchpress graph format

@bdatko
Author

bdatko commented Jun 12, 2024

  1. Learn the graph of a Bayesian network from a CSV data file (in the Benchpress data format) with the relevant parameters for structure learning

I hope the example below demos what you need.

  1. Write the adjacency matrix representation of the graph to a CSV file following Benchpress graph format

From what I know, there isn't any convenient writer to save the adjacency matrix to CSV, so shown below is a small helper to save the matrix in the format for benchpress.

The example assumes you have pyAgrum, pandas, and scikit-learn installed in your environment; you will need all three to run it.

import csv
from pathlib import Path

import pandas as pd
import pyAgrum.skbn as skbn
from pyAgrum import BayesNet


def adjacency_to_csv(bn: BayesNet, *, to_file: str):
    id_to_name = {bn.idFromName(name): name for name in bn.names()}

    with Path(to_file).open(mode="w", encoding="utf-8") as csvfile:
        writer = csv.writer(csvfile)
        # write the header: variable names in node-id order
        writer.writerow(id_to_name[col_id] for col_id in range(bn.size()))
        # write the rows of the adjacency matrix
        adj_mat = bn.adjacencyMatrix()
        writer.writerows(row for row in adj_mat)


data = pd.read_csv(
    "https://github.com/mwaskom/seaborn-data/master/titanic.csv"
).dropna()

data.to_csv("fully_obs_titanic.csv", index=False)

classifier = skbn.BNClassifier(learningMethod="MIIC", scoringType="BIC")
xdata, ydata = classifier.XYfromCSV(filename="fully_obs_titanic.csv", target="survived")
classifier.fit(xdata, ydata)

adjacency_to_csv(classifier.bn, to_file="resulting_adjacency.csv")

Here is the resulting adjacency matrix:

❯ cat resulting_adjacency.csv
survived,pclass,sex,age,sibsp,parch,fare,embarked,class,who,adult_male,deck,embark_town,alive,alone
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
0,0,0,0,0,0,1,0,0,0,0,0,0,0,0
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
0,0,0,0,0,0,0,0,0,0,0,0,0,0,1
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
0,1,0,0,0,0,1,0,0,0,0,0,0,0,0
0,0,1,1,0,0,0,0,0,0,1,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,0,0,1,0
0,1,0,0,0,0,0,0,1,0,0,0,0,0,0
0,0,0,0,0,0,0,1,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,0,0,0,0
0,0,0,0,1,1,0,0,0,1,0,0,0,0,0
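
As a quick sanity check (my own addition, not part of the original run), the file can be read back with pandas to confirm it is a square 0/1 matrix whose header matches the variable order:

import pandas as pd

adj = pd.read_csv("resulting_adjacency.csv")
print(adj.shape)               # expected: (15, 15)
print(list(adj.columns)[:3])   # expected: ['survived', 'pclass', 'sex']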

I ran this example with the following environment:

Python 3.11.7
numpy               1.26.4
pandas              2.2.2
pyAgrum             1.14.0
scikit-learn        1.5.0
scipy               1.13.1

@felixleopoldo
Owner

Thanks a lot. So for the target variable (survived), can we just choose the first one in the order?

@bdatko
Author

bdatko commented Jun 12, 2024

For the fit method of BNClassifier you can specify any column within the CSV file as the target, see here. Shown below is the relevant snippet from the docs:

Fits the model to the training data provided. The two possible uses of this function are fit(X,y) and fit(data=…, targetName=…). Any other combination will raise a ValueError

  • targetName (str) – specifies the name of the targetVariable in the csv file. Warning: Raises ValueError if either X or y is not None. Raises ValueError if data is None.
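
Based on the second form described there, the earlier example could also skip XYfromCSV and pass the CSV path and target name directly. This is an untested sketch of that variant, assuming data accepts a CSV path as the quoted doc suggests:

import pyAgrum.skbn as skbn

classifier = skbn.BNClassifier(learningMethod="MIIC", scoringType="BIC")
classifier.fit(data="fully_obs_titanic.csv", targetName="survived")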

@felixleopoldo
Owner

Ok!

@phwuil

phwuil commented Jun 18, 2024

Hi @felixleopoldo, many thanks to @bdatko for this "issue".

Actually, BNClassifier is based on the BNLearner class. If you want to test the learning algorithms of pyAgrum, you should use BNLearner directly.
MIIC is a "constraint-based" method based on mutual information. There is no score, but one can apply corrections (MDL/NML). Of course, you can add some priors for the parameter estimation.

import pyAgrum as gum

learner = gum.BNLearner("test.csv")  # MIIC is used by default (some score-based methods are also implemented)
learner.useMDLCorrection()           # for small datasets
learner.useSmoothingPrior()          # smoothing (default weight=1) for the parameters
bn = learner.learnBN()               # learning
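
If the goal is Benchpress output, the learned bn from this snippet could then be written out with the adjacency_to_csv helper shown earlier in the thread (my combination of the two snippets, assuming that helper is defined in the same script):

# reuse bdatko's adjacency_to_csv helper from earlier to get the Benchpress-style CSV
adjacency_to_csv(bn, to_file="resulting_adjacency.csv")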

Thanks again to @bdatko. Please tell me if you need some other snippets :-)

@felixleopoldo
Owner

Hi @phwuil,
thanks for the snippet. Could you show how MIIC could be run on continuous data too?

@phwuil

phwuil commented Jun 20, 2024

Hi @felixleopoldo, thank you for that. pyAgrum is mainly about discrete variables. However, there are two solutions for continuous data:
1- automatic discretization
2- CLG (experimental Python model)

1- Automatic discretization with pyAgrum.skbn.BNDiscretizer

import pyAgrum as gum
import pyAgrum.skbn as skbn

filename = "test.csv"
# BNDiscretizer has many options
disc = skbn.BNDiscretizer()
template = disc.discretizedBN(filename)

# template contains all the (discrete) variables
# that will be used for the learning
learner = gum.BNLearner(filename, template)
learner.useMDLCorrection()
learner.useSmoothingPrior()
bn = learner.learnBN()
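
The discretizer's defaults can also be overridden at construction. In the sketch below the keyword names (defaultDiscretizationMethod, defaultNumberOfBins, discretizationThreshold) are assumptions about the BNDiscretizer constructor and should be verified against the documentation:

# assumed constructor keywords -- please check the pyAgrum.skbn.BNDiscretizer documentation
disc = skbn.BNDiscretizer(defaultDiscretizationMethod="uniform",
                          defaultNumberOfBins=7,
                          discretizationThreshold=10)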

@phwuil

phwuil commented Jun 20, 2024

2- CLG: a new CLG implementation in pyAgrum 1.14.0
pyAgrum.CLG tutorial

import pyAgrum.clg as gclg

# no hybrid learning: pure CLG data
filename = "test.csv"  # same CSV as in the previous snippets
learner = gclg.CLGLearner(filename)
clg = learner.learnCLG()

@felixleopoldo
Owner

felixleopoldo commented Jun 24, 2024

OK. There is a new pyagrum branch, where you can try pyagrum by running:
snakemake --cores all --use-singularity --configfile workflow/rules/structure_learning_algorithms/pyagrum/pyagrum.json --rerun-incomplete
If you know any data scenario where it performs well, let me know!

@phwuil

phwuil commented Jun 24, 2024

Hi @felixleopoldo, thank you for this. I have to admit that I did not know about it before it was pointed out to me by @bdatko. Thanks to both of you.
So I will have to learn how to use it. :-) (If you have THE good reference to help, please tell me!)

@felixleopoldo
Owner

I see, no worries :) If you mean the main reference to Benchpress, it is here. It is not mentioned there, but you can also run it under WSL on Windows.
