Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make inspect.Parameter.__hash__ use _default instead of default #102302

Closed
Gouvernathor opened this issue Feb 27, 2023 · 13 comments
Closed

Make inspect.Parameter.__hash__ use _default instead of default #102302

Gouvernathor opened this issue Feb 27, 2023 · 13 comments
Labels
performance Performance or resource usage stdlib Python modules in the Lib dir type-feature A feature request or enhancement

Comments

@Gouvernathor
Copy link
Contributor

Gouvernathor commented Feb 27, 2023

Feature or enhancement

Enable subclasses of Parameter to add a default property accessing hash(self) rather than having hash rely on the default property

Pitch

I wanted to make a subclass of Parameter which would store as _default the string eval-ing to the actual default value, and which would only eval it when accessed, and memorize it at that time.
So, I coded this :

class Parameter(inspect.Parameter):
    __slots__ = ()
    @property
    @functools.cache
    def default(self):
        return renpy.python.py_eval(super(Parameter, self).default)

(not using @cached_property because I want to keep the empty slots)
This doesn't work, because cache tries to hash self, and Parameter.__hash__ accesses self.default, rather than self._default (so it does an infinite recursion loop).

The last edit to the hash method was purportedly to "fix a discrepancy between eq and hash". However, the eq still accesses the raw _default, and hash did so too before the "fix".
Overall, apart from my selfish use-case, I think it makes more sense for an immutable class to hash relative to the actual value instead of the less-reliable propertied value.
And there's a minor performance bonus too.

Linked PRs

@Gouvernathor Gouvernathor added the type-feature A feature request or enhancement label Feb 27, 2023
@AlexWaygood AlexWaygood added the stdlib Python modules in the Lib dir label Feb 28, 2023
@AlexWaygood
Copy link
Member

Can you expand on your motivation for wanting to subclass inspect.Parameter? I don't think the class is designed to be extensible, and the docs don't advertise it as such.

@AlexWaygood
Copy link
Member

"Consistency with __eq__" isn't a valid reason to make this change unless there's a user-visible bug being fixed. Optimising code for performance is a valid reason to make this change, but I sort of doubt that inspect.Parameter.__hash__ is ever going to be a performance bottleneck in real code, so even there, I'm not sure that this proposed change is worth the code churn.

@Gouvernathor
Copy link
Contributor Author

Subclassing Parameter is just as useful as subclassing Signature, and Signature is expected to be subclassable. Do we need special permission before subclassing something builtin ? In my experience, classes resisting inheritance are the exception rather than the norm, and those would deserve a special warning, instead of an explicit grant whenever subclassing is allowed.
Not counting cases when types are just mentioned, like Parameter.kind which is not described as its own class (and the class object is not public).

@AlexWaygood
Copy link
Member

You haven't addressed my question :)

You don't need permission to subclass anything, of course, but any change to the standard library needs a strong motivation. Can I ask again what your motivation is for subclassing inspect.Parameter, so that I can better understand the bug that's being fixed here?

@Gouvernathor
Copy link
Contributor Author

Gouvernathor commented Feb 28, 2023

The idea is to manually parse a language we defined where things take parameters. However, unlike Python, we don't evaluate defaults at parse time, but at call time, which means the "default" field of parameters are really a string evaluating to their actual default value, or P.empty (or None in our previous implementation) for required parameters.

As life goes on, we find ourselves in the need to manipulate signatures with literal defaults, and signatures with evaluated defaults. We tried an implementation with two different Signature subclasses for these two cases but they share the same Parameter class (the native one, at this point), so if we somehow need to manipulate both kinds of signatures in the same function, there's a chance a mixup happens between the two kinds of parameter. To solve that, subclassing Parameter would be useful, so we can have a literalParameter on one side, and a evaledParameter on the other (the latter probably being the native class, but whatever).

Then, for performance reasons, I had the idea of only computing the evaled value of the default when required, and memorizing it after, which means that the default property evaluates the literal string the first time it's accessed, and then caches it and returns the cached value (the example I used on top of the thread).
But caching doesn't work, because be it using functools.cache or a class-bound WeakSet (tried that too), the Parameter object needs to be hashed, and when you subclass the default property that causes an infinite recursion.

@AlexWaygood
Copy link
Member

AlexWaygood commented Feb 28, 2023

This is a very specific use case that doesn't work currently:

  • You need to specify an empty __slots__ in the subclass
  • You need to override default in the subclass
  • default must use @functools.cache to cache results in the subclass.

All of these are reasonable, but I wonder if you couldn't just cache results in an alternative way in your code?

class Foo(Parameter):
    __slots__ = ("_default",)
    @property
    def default(self):
        try:
            return self._default
        except AttributeError:
            self._default = my_expensive_function_call()
            return self._default

This alternative solution also avoids well-known pitfalls around using functools.cache or functools.lru_cache on methods or properties.

@Gouvernathor
Copy link
Contributor Author

Gouvernathor commented Feb 28, 2023

You need to rename _default to _evaled_default in your code otherwise it collides with inspect.Parameter._default (which I need to pass to my expensive function). But anyway, I get the picture.

Yes, I guess I could do that. But I also think the original recursion error is obscure and easily fixable, and comes from an inconsistency which, if not a good enough reason by itself to change the code, certainly doesn't go in favor of leaving it as it is.
I happen to prefer a dict-like cache solution, such as the one I had with a WeakKeyDict (not a WeakSet, sorry) bound to the class, which does not reproduce the pitfalls you described but still requires hashing to be independent of the method or property you're caching.

Solving the inconsistency would unlock the other ways of implementing "my" feature, for no cost that I can see. I mean, I understand the fear of changing working code, sure, but what bad consequence would the change cause ? If it has any, it means that a property was overriding Parameter's native ones and returning something other than the underlying values (_name, _kind, _default and _annotation respectively). If so, that means the hash was not constant... and that's way worse, right ?

@AlexWaygood
Copy link
Member

AlexWaygood commented Feb 28, 2023

I'm afraid you're approaching this from the wrong angle. The burden is on you to explain why this change would be generally useful to users of inspect.Parameter; the burden is not on me to explain why this change shouldn't be made.

One of the issues I'm having is that there's little point fixing a bug or adding a feature unless we include a test; otherwise, the behaviour we're trying to enable could easily be broken accidentally in the future if somebody fixes a different bug or refactors the code for some other reason. So what would a test look like here? It could look something like this:

def test_Signature_can_be_subclassed_and_functools_cache_can_be_used_on_properties_in_the_subclass(self):
    class ParameterSubclass(inspect.Parameter):
        __slots__ = ()
        @property
        @functools.cache
        def default(self):
            return super().default
    
    self.assertEqual(ParameterSubclass("x", inspect.Parameter.POSITIONAL_ONLY, default=3).default, 3)

...But that seems like such an absurdly specific test that it makes me wonder whether this really is a problem where fixing it would be generally useful to the Python community at large, or whether this would be code churn for little gain.

@Gouvernathor
Copy link
Contributor Author

Gouvernathor commented Feb 28, 2023

My reasons to want this change are that I prefer a dict-like cache solution, such as the one I had with a WeakKeyDict bound to the class, which does not reproduce the property-functools-cache pitfalls but still requires hashing to be independent of the method or property you're caching. Solving the inconsistency between the dunders would unlock using hash(self) within property calls for any other purpose. It would also disable incorrect hash behaviors relying on inconstant overridden properties and yielding variable hashes for the same instance - which is a violation of the data model as far as I understand it.

Sumup, which could end up being the news entry:

  • enable subclassing Parameter, overriding the properties, and:
    • making them hash(self)
    • making them not constant and not automatically have the subclass violate the data model
  • (bonus) code consistency and performance

So, I would write the test that way :

class ParameterSubclass(inspect.Parameter):
    __slots__ = ()
    @property
    def default(self):
        return random.random()
    # same for the other properties

inst = ParameterSubclass("x", POSITIONAL_ONLY, default=5)
self.assertEqual(hash(inst), hash(Parameter("x", POSITIONAL_ONLY, default=5))) # hash doesn't rely on properties but on the actual values
self.assertEqual(hash(inst), hash(inst)) # constant hash for the same instance
# same with the instances themselves to test __eq__, and possibly pickle if we want to test __reduce__ too

@AlexWaygood
Copy link
Member

AlexWaygood commented Feb 28, 2023

Thanks. I'm afraid I'm still unconvinced here. It just doesn't seem to me that inspect.Parameter is designed to be subclassed in this way, so I don't find the argument that we should improve support for this kind of subclassing persuasive. I understand that this is frustrating for your use case, but it seems like a pretty unusual use case, and the bar for changing things in the standard library is high. If we changed things every time a user had an inconvenience like the one you're describing, the code churn would be huge.

Note that even if we did change things here, we likely wouldn't backport it, since this is arguably a feature rather than a bug.

I'll leave this open for a few days in case another core dev feels differently.

@AlexWaygood AlexWaygood added the pending The issue will be closed if no feedback is provided label Feb 28, 2023
@Gouvernathor
Copy link
Contributor Author

Gouvernathor commented Feb 28, 2023

If the change is not done in the end, then I suggest warning against subclassing Parameter in the doc, so that the inconstant hash problem is covered (which doesn't raise an exception, unlike my cache use case).

@AlexWaygood
Copy link
Member

If the change is not done in then end, then I suggest warning against subclassing Parameter in the doc, so that the inconstant hash problem is covered (which doesn't raise an exception, unlike my cache use case).

Subclassing Parameter should work without complications as long as you're only adding new behaviour. The difficulty here is that you're not adding new behaviour: you're overriding behaviour that already exists on the superclass, which is a much more delicate/risky thing to be doing. There are many classes in the stdlib which are "open for extension, closed for modification" in this way, so I'm not sure that a note should be added to the docs for Parameter here.

@AlexWaygood AlexWaygood closed this as not planned Won't fix, can't repro, duplicate, stale Mar 3, 2023
@AlexWaygood
Copy link
Member

I still disagree with the rationale put forward in this issue thread for making the change, but I was persuaded to merge the linked PR anyway due to a microbenchmark showing a decent speedup.

@AlexWaygood AlexWaygood added performance Performance or resource usage and removed pending The issue will be closed if no feedback is provided labels Mar 4, 2023
hugovk pushed a commit to hugovk/cpython that referenced this issue Mar 6, 2023
carljm added a commit to carljm/cpython that referenced this issue Mar 6, 2023
* main: (21 commits)
  pythongh-102192: Replace PyErr_Fetch/Restore etc by more efficient alternatives in sub interpreters module (python#102472)
  pythongh-95672: Fix versionadded indentation of get_pagesize in test.rst (pythongh-102455)
  pythongh-102416: Do not memoize incorrectly loop rules in the parser (python#102467)
  pythonGH-101362: Optimise PurePath(PurePath(...)) (pythonGH-101667)
  pythonGH-101362: Check pathlib.Path flavour compatibility at import time (pythonGH-101664)
  pythonGH-101362: Call join() only when >1 argument supplied to pathlib.PurePath() (python#101665)
  pythongh-102444: Fix minor bugs in `test_typing` highlighted by pyflakes (python#102445)
  pythonGH-102341: Improve the test function for pow (python#102342)
  Fix unused classes in a typing test (pythonGH-102437)
  pythongh-101979: argparse: fix a bug where parentheses in metavar argument of add_argument() were dropped (python#102318)
  pythongh-102356: Add thrashcan macros to filter object dealloc (python#102426)
  Move around example in to_bytes() to avoid confusion (python#101595)
  pythonGH-97546: fix flaky asyncio `test_wait_for_race_condition` test (python#102421)
  pythongh-96821: Add config option `--with-strict-overflow` (python#96823)
  pythongh-101992: update pstlib module documentation (python#102133)
  pythongh-63301: Set exit code when tabnanny CLI exits on error (python#7699)
  pythongh-101863: Fix wrong comments in EUC-KR codec (pythongh-102417)
  pythongh-102302 Micro-optimize `inspect.Parameter.__hash__` (python#102303)
  pythongh-102179: Fix `os.dup2` error reporting for negative fds (python#102180)
  pythongh-101892: Fix `SystemError` when a callable iterator call exhausts the iterator (python#101896)
  ...
carljm added a commit to carljm/cpython that referenced this issue Mar 7, 2023
* main: (37 commits)
  pythongh-102192: Replace PyErr_Fetch/Restore etc by more efficient alternatives in sub interpreters module (python#102472)
  pythongh-95672: Fix versionadded indentation of get_pagesize in test.rst (pythongh-102455)
  pythongh-102416: Do not memoize incorrectly loop rules in the parser (python#102467)
  pythonGH-101362: Optimise PurePath(PurePath(...)) (pythonGH-101667)
  pythonGH-101362: Check pathlib.Path flavour compatibility at import time (pythonGH-101664)
  pythonGH-101362: Call join() only when >1 argument supplied to pathlib.PurePath() (python#101665)
  pythongh-102444: Fix minor bugs in `test_typing` highlighted by pyflakes (python#102445)
  pythonGH-102341: Improve the test function for pow (python#102342)
  Fix unused classes in a typing test (pythonGH-102437)
  pythongh-101979: argparse: fix a bug where parentheses in metavar argument of add_argument() were dropped (python#102318)
  pythongh-102356: Add thrashcan macros to filter object dealloc (python#102426)
  Move around example in to_bytes() to avoid confusion (python#101595)
  pythonGH-97546: fix flaky asyncio `test_wait_for_race_condition` test (python#102421)
  pythongh-96821: Add config option `--with-strict-overflow` (python#96823)
  pythongh-101992: update pstlib module documentation (python#102133)
  pythongh-63301: Set exit code when tabnanny CLI exits on error (python#7699)
  pythongh-101863: Fix wrong comments in EUC-KR codec (pythongh-102417)
  pythongh-102302 Micro-optimize `inspect.Parameter.__hash__` (python#102303)
  pythongh-102179: Fix `os.dup2` error reporting for negative fds (python#102180)
  pythongh-101892: Fix `SystemError` when a callable iterator call exhausts the iterator (python#101896)
  ...
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
performance Performance or resource usage stdlib Python modules in the Lib dir type-feature A feature request or enhancement
Projects
None yet
Development

No branches or pull requests

2 participants