
[ENH] design discussion - pdf and pmf in distributions, discrete, continuous, and mixed #229

Open
fkiraly opened this issue Mar 31, 2024 · 1 comment
Labels
API design API design & software architecture module:probability&simulation probability distributions and simulators

Comments

@fkiraly
Collaborator

fkiraly commented Mar 31, 2024

This is a design discussion on how to handle pdf and pmf in distributions, which can be discrete, continuous (short for "absolutely continuous"), or mixed. We assume the domain is the real numbers and that distributions have no singular component.

scipy handles these as follows:

  • for discrete distributions, pmf is present and pdf is not.
  • for continuous distributions, pdf is present and pmf is not.
  • no support for mixed distributions.

I think it would be more consistent with composition, and with unified interfaces à la sklearn, if all distributions had both methods and these corresponded to the measures in the Lebesgue decomposition. That is,

  • pmf and pdf are present in all distributions
  • the sum of measures implied by pmf and pdf is a probability measure
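Written out as a formula, with $p$ the pmf and $f$ the pdf, and under the assumptions stated above (real domain, no singular component), the convention reads as follows; $p$ is nonzero on at most countably many points, so the sums are countable:

```latex
P(A) \;=\; \sum_{x \in A} p(x) \;+\; \int_A f(x)\,\mathrm{d}x
\quad \text{for every Borel set } A \subseteq \mathbb{R},
\qquad \text{hence} \qquad
\sum_{x \in \mathbb{R}} p(x) \;+\; \int_{-\infty}^{\infty} f(x)\,\mathrm{d}x \;=\; 1 .
```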

In particular, this would mean:

  • for discrete distributions, pmf sums to one, and pdf is always zero
  • for continuous distributions, pdf integrates to one, and pmf is always zero
  • for mixed distributions, the integral of the pdf and the sum of the pmf add up to one. In general, neither the pdf integral nor the pmf sum equals one on its own.
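As a concrete instance of the mixed case, here is a minimal sketch of a zero-inflated exponential distribution: an atom of mass p0 at 0, and otherwise an exponential with rate `rate`. The class `ZeroInflatedExponential` and the `integrate` helper are hypothetical, written only for illustration and not part of skpro or any other library.

```python
import math


class ZeroInflatedExponential:
    """Hypothetical mixed distribution (illustration only, not a skpro class).

    With probability p0 the value is exactly 0 (discrete part); otherwise it is
    drawn from Exponential(rate) (absolutely continuous part).
    """

    def __init__(self, p0, rate=1.0):
        self.p0 = p0
        self.rate = rate

    def pmf(self, x):
        # discrete part of the Lebesgue decomposition: a single atom at 0
        return self.p0 if x == 0 else 0.0

    def pdf(self, x):
        # absolutely continuous part, scaled by 1 - p0, so it integrates to 1 - p0
        if x < 0:
            return 0.0
        return (1 - self.p0) * self.rate * math.exp(-self.rate * x)


def integrate(f, a, b, n=100_000):
    """Composite trapezoid rule for f on [a, b] (hypothetical helper)."""
    h = (b - a) / n
    return h * (0.5 * f(a) + sum(f(a + i * h) for i in range(1, n)) + 0.5 * f(b))


d = ZeroInflatedExponential(p0=0.3)
pmf_mass = d.pmf(0.0)                   # sum of the pmf over its only atom
pdf_mass = integrate(d.pdf, 0.0, 50.0)  # the tail beyond 50 is negligible
print(pmf_mass, pdf_mass, pmf_mass + pdf_mass)
```

Here the pmf mass (0.3) and the pdf mass (about 0.7) are each below one; only their sum is one, as the last bullet states.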

Being faithful to the Lebesgue decomposition also has an advantage for mixtures: a mixture m = Mixture([d1, d2], [w1, w2]) has m.pdf = w1 * d1.pdf + w2 * d2.pdf and m.pmf = w1 * d1.pmf + w2 * d2.pmf, irrespective of whether the components d1, d2 are continuous, discrete, or mixed (assuming w1 + w2 == 1).
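A minimal sketch of this mixture rule follows. It assumes hypothetical classes `PointMass`, `Exponential` and `Mixture` whose `pdf`/`pmf` methods take and return plain floats; this is an illustration only, not skpro's `Mixture` API.

```python
import math


class PointMass:
    """Hypothetical discrete distribution: all mass at the single point `loc`."""

    def __init__(self, loc):
        self.loc = loc

    def pmf(self, x):
        return 1.0 if x == self.loc else 0.0

    def pdf(self, x):
        # purely discrete: no absolutely continuous part
        return 0.0


class Exponential:
    """Hypothetical continuous distribution with rate `rate`."""

    def __init__(self, rate=1.0):
        self.rate = rate

    def pmf(self, x):
        # purely continuous: no atoms
        return 0.0

    def pdf(self, x):
        return self.rate * math.exp(-self.rate * x) if x >= 0 else 0.0


class Mixture:
    """Hypothetical mixture: pdf and pmf are weighted sums of the components'."""

    def __init__(self, distributions, weights):
        if abs(sum(weights) - 1.0) > 1e-12:
            raise ValueError("weights must sum to 1")
        self.distributions = list(distributions)
        self.weights = list(weights)

    def pdf(self, x):
        return sum(w * d.pdf(x) for d, w in zip(self.distributions, self.weights))

    def pmf(self, x):
        return sum(w * d.pmf(x) for d, w in zip(self.distributions, self.weights))


# a discrete and a continuous component mixed together give a mixed distribution
m = Mixture([PointMass(0.0), Exponential(2.0)], [0.25, 0.75])
print(m.pmf(0.0), m.pdf(1.0))
```

Note that `Mixture.pdf` and `Mixture.pmf` never check whether a component is discrete or continuous; giving every distribution both methods is what makes this branch-free composition possible.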

In a sense, this seems to be the convention that treats all edge cases consistently.

Thoughts?

@fkiraly fkiraly added module:probability&simulation probability distributions and simulators feature request New feature or request API design API design & software architecture and removed feature request New feature or request labels Mar 31, 2024
@ShreeshaM07
Contributor

ShreeshaM07 commented Mar 31, 2024

> Being faithful to the Lebesgue decomposition also has an advantage in mixtures: the pdf and pmf of a m = Mixture([d1, d2], [w1, w2]) has m.pdf = w1 * d1.pdf + w2 * d2.pdf, and m.pmf = w1 * d1.pmf + w2 * d2.pmf, irrespective of components d1, d2 being continuous, discrete, or mixed. (assuming w1 + w2 == 1).
> In a sense, this seems to be the convention that treats all edge cases consistently.

Yes, that is correct: it handles all edge cases irrespective of d1, d2 being continuous, discrete, or mixed. Wherever the distribution is discrete, the pdf integrates to 0 there and only the pmf contributes.
Wherever the distribution is continuous on an interval, the pmf sums to 0 on that interval and only the pdf contributes.
So for mixed components, m.pdf = w1 * d1.pdf + w2 * d2.pdf and m.pmf = w1 * d1.pmf + w2 * d2.pmf still hold.
And the integral of m.pdf plus the sum of m.pmf equals 1 when taken over the whole real line, i.e., (-inf, inf).
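The total-mass identity can be checked numerically with a sketch like the following. It uses a hypothetical (atoms, density) pair to stand in for a distribution, mixes a discrete, a continuous and a mixed component, and integrates the mixture pdf with a simple trapezoid rule; none of these names come from skpro.

```python
import math

# Hypothetical minimal representation (not skpro's API): a distribution is a
# pair (atoms, density). `atoms` maps each support point to its point mass
# (the pmf); `density` is the pdf of the absolutely continuous part.


def normal_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2 * math.pi)


discrete = ({0: 0.5, 1: 0.5}, lambda x: 0.0)  # fair coin on {0, 1}
continuous = ({}, normal_pdf)                  # standard normal
mixed = ({0: 0.3}, lambda x: 0.7 * math.exp(-x) if x >= 0 else 0.0)  # zero-inflated Exp(1)


def mixture(components, weights):
    """Weighted sum of the pmfs and weighted sum of the pdfs."""
    atoms = {}
    for (comp_atoms, _), w in zip(components, weights):
        for point, mass in comp_atoms.items():
            atoms[point] = atoms.get(point, 0.0) + w * mass

    def density(x):
        return sum(w * dens(x) for (_, dens), w in zip(components, weights))

    return atoms, density


def integrate(f, a, b, n=100_000):
    """Composite trapezoid rule on [a, b]."""
    h = (b - a) / n
    return h * (0.5 * f(a) + sum(f(a + i * h) for i in range(1, n)) + 0.5 * f(b))


m_atoms, m_density = mixture([discrete, continuous, mixed], [0.2, 0.3, 0.5])
pmf_total = sum(m_atoms.values())              # 0.2 * 1 + 0.5 * 0.3
pdf_total = integrate(m_density, -40.0, 60.0)  # approx. 0.3 * 1 + 0.5 * 0.7
print(pmf_total, pdf_total, pmf_total + pdf_total)
```

The pmf part comes out at 0.35 and the pdf part at roughly 0.65: neither is one on its own, but together they are, up to the trapezoid rule's discretisation error at the jump of the zero-inflated density at 0.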

@fkiraly fkiraly changed the title [ENH] design discussion - pdf and pmf in distrubtions, discrete, continuous, and mixed [ENH] design discussion - pdf and pmf in distributions, discrete, continuous, and mixed Apr 4, 2024
fkiraly added a commit that referenced this issue May 4, 2024
This PR adds `pmf` and `log_pmf` methods to the base interface. Fixes #289.

In accordance with #229, these return 0 and `-np.inf`, respectively, if the
distribution is continuous.

Also makes the following, connected changes:
* `pdf` returns 0 for discrete distributions
* removes the discrete/continuous handling logic from the `scipy`
adapter, as this is now in the base class

I've also changed the way in which `TestScipyAdapter` queries the
distributions: by inheritance, not by tag. This is because the tag is
"mechanical" (for internal testing only), and it might confuse users to
see a value in `object_type` that is not related to an external API
property.
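The defaults described in this commit (`pmf` and `log_pmf` returning 0 and negative infinity for continuous distributions, `pdf` returning 0 for discrete ones) can be sketched as below. `BaseDistributionSketch`, the `measure_type` attribute and the `_pdf`/`_pmf` hooks are hypothetical names chosen for illustration; skpro's actual base class and tags may differ. The sketch uses `math.inf` in place of `np.inf` to stay dependency-free.

```python
import math


class BaseDistributionSketch:
    """Hypothetical base class, not skpro's actual implementation.

    Subclasses set the hypothetical class attribute `measure_type` to
    "continuous", "discrete" or "mixed" and override `_pdf` and/or `_pmf`.
    """

    measure_type = "mixed"

    def pdf(self, x):
        # discrete distributions have no absolutely continuous part
        if self.measure_type == "discrete":
            return 0.0
        return self._pdf(x)

    def pmf(self, x):
        # continuous distributions have no atoms
        if self.measure_type == "continuous":
            return 0.0
        return self._pmf(x)

    def log_pmf(self, x):
        # zero point mass maps to -inf on the log scale
        p = self.pmf(x)
        return math.log(p) if p > 0 else -math.inf

    def _pdf(self, x):
        raise NotImplementedError

    def _pmf(self, x):
        raise NotImplementedError


class Exponential(BaseDistributionSketch):
    measure_type = "continuous"

    def __init__(self, rate=1.0):
        self.rate = rate

    def _pdf(self, x):
        return self.rate * math.exp(-self.rate * x) if x >= 0 else 0.0


class Bernoulli(BaseDistributionSketch):
    measure_type = "discrete"

    def __init__(self, p=0.5):
        self.p = p

    def _pmf(self, x):
        return {0: 1 - self.p, 1: self.p}.get(x, 0.0)


e = Exponential(1.0)
b = Bernoulli(0.25)
print(e.pmf(0.0), e.log_pmf(0.0), b.pdf(1), b.pmf(1))
```

Subclasses only implement the part of the Lebesgue decomposition they actually have; the base class fills in the zero part, which mirrors moving the discrete/continuous handling out of the adapter and into the base class.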