Skip to content

Implement Adjusted mutual information #287

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 4 commits into
base: master
Choose a base branch
from

Conversation

christiangnrd
Copy link

I may have gone overboard with the code reuse. I can refactor if needed.

@codecov-commenter
Copy link

codecov-commenter commented Apr 9, 2025

Codecov Report

Attention: Patch coverage is 98.38710% with 1 line in your changes missing coverage. Please review.

Project coverage is 95.50%. Comparing base (b4df21a) to head (275b9dc).
Report is 20 commits behind head on master.

Files with missing lines Patch % Lines
src/mutualinfo.jl 96.00% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master     #287      +/-   ##
==========================================
+ Coverage   95.40%   95.50%   +0.09%     
==========================================
  Files          20       20              
  Lines        1503     1556      +53     
==========================================
+ Hits         1434     1486      +52     
- Misses         69       70       +1     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@alyst
Copy link
Member

alyst commented Apr 9, 2025

Thank you for the contribution! I think it would be useful addition.
I have not reviewed your PR in the detail, but there are some things that I have noticed:

  • You introduce the new adjustedmutualinfo method, I think it would be better to keep a single mutualinfo(), just add the method kwarg, and allow method-specifc kwargs (averaging, normed).
  • I also think it is better to stay conservative with Clustering deps.
    You can move SpecialFunctions to the weakdeps (because it brings ChainRules, OpenLibm, LogExpFunctions with it etc) and move the _mutualinfo_adjusted() into the ClusteringSpecialFuncsExt.

@christiangnrd
Copy link
Author

Thanks for the quick feedback!

Regarding your first point, isn’t that breaking? And how should I deal with the 4 variants of the adjusted mutual information? Have one symbol for each? (And another that’s just :adjusted as default)

And good point about SpecialFunctions, I somehow assumed it was lightweight and did not verify.

@alyst
Copy link
Member

alyst commented Apr 10, 2025

Regarding your first point, isn’t that breaking? And how should I deal with the 4 variants of the adjusted mutual information? Have one symbol for each? (And another that’s just :adjusted as default)

I am thinking about something like

mutualinfo(a, b; method = :classic, kwargs...) = _mutualinfo(Val{method}, a, b; kwargs...)

function _mutualinfo(::Val{:classic}, a, b; normed::Bool = true)
    ....
end

function _mutualinfo(method::Val, a, b; kwargs...)
    if method[] == :adjusted
        error("mutualinfo(..., method=:adjusted) requires SpecialFunctions package")
    else
        error("mutualinfo(): method=$(method) is not supported")
    end
end

and then in the extension

function _mutualinfo(::Val{:adjusted}, a, b; aggregate::Symbol = :mean)
    ....
end

So since the default method value is :classic, the old code should not be broken (I don't know if :classic is the best name, probably you know better).
I think this is a pretty standard pattern for Julia extensions.
And it allows each method to have its own keyword arguments.
So it will automatically pass aggregate kwarg to your adjusted mutual information implementation
(I suggest to rename average_method to aggregate (taking values :mean, :max, :min, :geomean), since it's a more generic term, which I think better fits here)

@christiangnrd
Copy link
Author

@alyst I made the changes you suggested with a few adjustments. The method parameter can now take :classic, ':normalized, and :adjusted`. I did it in a way to preserve backwards compatibility, but I think this is better since the NMI is no less a different variant of MI than AMI is.

As for the value to use for the regular Mutual Information, :classic could work. I think I slightly prefer :original, but I have no justification other than I like it more, so I'll go with whatever you think is best.

The 1.0 test is failing because LazyStrings were only added in 1.8. Can I raise the minimum Julia to 1.10 (the latest LTS)? I can't imagine anyone is using an older version of Julia while also updating their environment.

@alyst
Copy link
Member

alyst commented Apr 10, 2025

I made the changes you suggested with a few adjustments.

Thank you!

The method parameter can now take :classic, ':normalized, and :adjusted`.

I think it is better to raise ArgumentError if method == :normalized and normed == false.
Otherwise you silently give normed param priority over method, but in this case both method and normed are non-default, so they must have been explicitly given by the user.

Also, if normed kwarg is in the generic mutualinfo() function, it has to be supported by all methods, but it looks like for :adjusted it will be no-op at the moment.
If adjusted would not be supporting normed, I think it is better to keep it specific for "classic" method only, and the adjusted one will automatically throw MethodError if normed = is supplied.

Regarding :normalized: you can have this logic in the generic mutualinfo that if method == :normalized, you detect it and call _mutualinfo(Val(:classic), ...; normed = true, kwargs...).

Can I raise the minimum Julia to 1.10 (the latest LTS)?

Extensions will require us to bump Julia requirement to 1.8 (and we can use backward compatibility packages to support earlier versions, probably 1.6 LTS) -- which is reasonable, given how outdated 1.0 is by now.
LazyStrings are nice, but I don't see a real use-case for Clustering.jl where the advantages of LazyStrings will be noticeable.
The negative effect of setting Julia to 1.10+ is that 1.10 is quite recent, and it will exclude many users of older Julia versions from the latest Clustering.jl.

@christiangnrd
Copy link
Author

christiangnrd commented Apr 10, 2025

I think it is better to raise ArgumentError if method == :normalized and normed == false.
Otherwise you silently give normed param priority over method, but in this case both method and normed are non-default, so they must have been explicitly given by the user.

method=:normalized is the default since previously, normed=true was default. I changed things around such that method and normed can't be used together, and I officially deprecated normed. Now, an ArgumentError is thrown for any invalid combination of arguments.

Extensions will require us to bump Julia requirement to 1.8 (and we can use backward compatibility packages to support earlier versions, probably 1.6 LTS) -- which is reasonable, given how outdated 1.0 is by now.
LazyStrings are nice, but I don't see a real use-case for Clustering.jl where the advantages of LazyStrings will be noticeable.
The negative effect of setting Julia to 1.10+ is that 1.10 is quite recent, and it will exclude many users of older Julia versions from the latest Clustering.jl.

Since I removed the use of LazyString, this package works for every version since 1.0. For Julia versions before 1.9, SpecialFunctions will simply be treated as a regular dependency.

@alyst alyst requested a review from Copilot May 18, 2025 07:26
Copy link

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR implements adjusted mutual information by expanding the mutualinfo API to support additional methods and aggregates while deprecating the old normed parameter. Key changes include:

  • Updates in the tests (test/mutualinfo.jl) to cover classic, normalized, and adjusted mutual information methods along with error cases.
  • Refactoring of src/mutualinfo.jl to support method dispatch based on a new keyword argument and integration of adjusted mutual information.
  • Introduction of a new module (ext/SpecialFunctionsExt.jl) to handle the adjusted MI computation and conditional inclusion in src/Clustering.jl.

Reviewed Changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
test/runtests.jl Added the SpecialFunctions import to support tests with adjusted mutual information.
test/mutualinfo.jl Expanded tests to cover multiple mutual information methods and error conditions.
src/mutualinfo.jl Modified API and internal dispatch to support a new :adjusted method with aggregates.
src/Clustering.jl Added conditionally including the SpecialFunctionsExt.jl module based on Base definition.
ext/SpecialFunctionsExt.jl New module for adjusted mutual information computation using SpecialFunctions.

@christiangnrd
Copy link
Author

christiangnrd commented May 18, 2025

@alyst This is ready to merge and tests are passing!

Changes I've made since the review:

  • I've made some tweaks to comments to make them more clear based on Copilot's wrong comments
  • I've fixed tests and code for older Julia versions.
  • I've reorganized the PR into 4 commits: Code before today, tweaks from today, cleanup of Project.toml, and 2 fixes to CI to silence warnings.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants