[red-knot] Improve the `unresolved-import` check #13007

AlexWaygood · 2024-08-20T12:55:18Z

Summary

This PR improves the unresolved-import check so that it once again is able to recognise unresolved import from imports. It adds the same tests as #12986, but doesn't add the UnknownTypeKind enum that that PR introduces.

I've had to comment out some assertions in the tests for now. One case that passes with #12986 but doesn't work with this PR is this:

src/a.py: import foo as foo
src/b.py: from a import foo

This should only cause an unresolved-import diagnostic to be emitted on a.py, and this is the case with #12986, but with this PR unresolved-import diagnostics are emitted on both files.

Another case that passes with #12986 but not with this PR is the case where a symbol that exists, but has an inferred type of Unknown, is imported from another module. This should not trigger an unresolved-import diagnostic, but does with this PR, as it's hard to determine the root cause of the Unknown type.

Overall, this PR feels to me like it's an improvement on the status quo, but inferior to #12986. It's also a less significant change than #12986, however, since it's not a significant change to the design of how we think about types in red-knot, so the idea is that it should be possible to land this and then rebase #12986 on top of it.

Test Plan

Several tests added to crates/red_knot_python_semantic/src/types.rs and crates/red_knot_python_semantic/src/types/infer.rs. The assertions in the benchmarks have also been updated.

github-actions · 2024-08-20T13:30:46Z

`ruff-ecosystem` results

Linter (stable)

✅ ecosystem check detected no linter changes.

Linter (preview)

✅ ecosystem check detected no linter changes.

crates/red_knot_python_semantic/src/types/infer.rs

MichaReiser · 2024-08-20T14:57:15Z

crates/red_knot_python_semantic/src/types/infer.rs

@@ -977,11 +996,34 @@ impl<'db> TypeInferenceBuilder<'db> {
        // the runtime error will occur immediately (rather than when the symbol is *used*,
        // as would be the case for a symbol with type `Unbound`), so it's appropriate to
        // think of the type of the imported symbol as `Unknown` rather than `Unbound`
-        let ty = module_ty
+        let member_ty = module_ty
+            .unwrap_or(Type::Unbound)
            .member(self.db, &Name::new(&name.id))


I still find it odd that we have to do the "unbound" check everywhere where we call .member. Do we not always want to emit a diagnostic when failing to resolve a member?

We do, but it will need to be different error codes. It would be very counterintuitive for users if import foo triggers an unresolved-import error code and from foo import bar triggers an unresolved-member error code. Users will expect all import-related failures to be the same error code and all attribute-access-related failures to be a different error code.

That makes sense. We might just want to parameterize the member method. But maybe that's not worth it? I don't know. Have you looked at how mypy/pyright do this?

We might just want to parameterize the member method.

Not sure exactly what you mean; could you clarify?

Have you looked at how mypy/pyright do this?

Mypy has a very different architecture to the one we're building. It resolves all imports eagerly in a first pass before it does any other type checking; module-not-found errors are emitted during this first pass here: https://github.com/python/mypy/blob/fe15ee69b9225f808f8ed735671b73c31ae1bed8/mypy/build.py#L2600-L2700.

I'm not as familiar with pyright's codebase but I can dig in now

Hmm... it seems both mypy and pyright in fact do what you'd prefer here -- they both use "attribute-access" error codes for from x import y imports:

pyright: https://pyright-play.net/?pyrightVersion=1.1.368&code=GYJw9gtgBAxmA28CmMAuBLMA7AzgOgEMAjGKdCABzBFSmDDCA

mypy: https://mypy-play.net/?mypy=latest&python=3.12&gist=3110174b575f8d8eba319462c45a8ddf

Pyright's error code is reportAttributeAccessIssue; mypy's is attr-defined, which both seem equally bad. In terms of error messages, pyright definitely wins, though: pyright has:

"foo" is unknown import symbol

whereas mypy has

Module "collections.abc" has no attribute "foo"

I'm somewhat surprised by this. But given this precedent, I'm okay with emitting the diagnostic from .member(), if you'd prefer. Though I hope we still aspire to be more user-friendly in our error messages than either mypy or pyright.

Oh, I agree that that makes a lot of sense! I can make that change

Hmmm... if the inferred type of a member access is a union, e.g. int | Unbound, would you expect that to be represented as an Ok() or an Err() variant (if we return an Result from .member()?

having a closure or similar to create a diagnostic if the member is unbound.

This might be doable, but after looking at it for a little bit I think it would still be quite a large refactor and design change. I think it's out of the scope of this PR.

I'm not convinced there is any problem with reporting an error on from foo import bar (where the module foo does exist, but has no top-level symbol bar) as an attribute-access error code. If I had to pick between the pyright or mypy error messages above, I would prefer the mypy one (the pyright one gives less useful context). The fact that the attribute access occurred as part of a from-import will be clear from the error location; it doesn't have to be part of the error message or code. I wouldn't want to make design decisions based on the hypothesis that this is confusing to users without clear evidence that it is (e.g. a history of users filing issues on mypy about that error code or message); personally I find it intuitive.

I don't think we should return an Option or Result from member(), for the reason @AlexWaygood mentions above: we can have a possibly-unbound name (currently represented as a union with Unbound), and it's not clear how that should look in a binary representation.

We could design our own representation for "Type plus definitely-bound/maybe-unbound/definitely-unbound state" and use that instead of Type::Unbound. This will complicate some APIs and I think be somewhat less efficient, but it would have the advantage that it would force handling Unbound early, which I think is desirable. In general I don't think we want Unbound escaping from the scope where it originated; it should instead result in a diagnostic and be eliminated from the type (replaced with Unknown if we have no other type information; probably just disappear from the type if we do.) I do think this alternative to Type::Unbound is worth exploring and seeing what kind of impact it will have.

I was thinking that we would push diagnostics inside member() to simplify the handling of Unbound, and I think I still favor that approach. If we do want to make distinctions in error codes/messages based on context from the caller of member(), I think we can always pass in context to make that possible.

MichaReiser

I would still have preferred to implement a holistic solution to how we handle unresolved members rather than implementing a one off solution because I'm worried that we (I for example) end up copying this pattern. That was also the reason why I intentionally didn't implement this diagnostic in my original PR.

I'm okay with merging if you find the parity important but I ask you to at least add a TODO comment to make it clear that this pattern should not be copied.

crates/red_knot_python_semantic/src/types.rs

AlexWaygood · 2024-08-21T11:14:18Z

I would still have preferred to implement a holistic solution to how we handle unresolved members

Well, so would I! #12986 was my attempt 😆 The idea behind this PR was to try to implement as many of the fixes implemented there as possible without fundamentally reworking red-knot's design, so that it would be clearer exactly what the pros and cons of doing so would be.

I ask you to at least add a TODO comment to make it clear that this pattern should not be copied.

I'll add this

Update crates/red_knot_python_semantic/src/types/infer.rs Co-authored-by: Micha Reiser <micha@reiser.io>

codspeed-hq · 2024-08-21T13:15:51Z

CodSpeed Performance Report

Merging #13007 will improve performances by 4.16%

_{Comparing alex/redknot-import-from (2c1fb1d) with main (785c399)}

Summary

⚡ 1 improvements
✅ 31 untouched benchmarks

Benchmarks breakdown

	Benchmark	`main`	`alex/redknot-import-from`	Change
⚡	`linter/default-rules[unicode/pypinyin.py]`	354.7 µs	340.6 µs	+4.16%

crates/red_knot_python_semantic/src/types.rs

carljm · 2024-08-22T00:55:07Z

crates/red_knot_python_semantic/src/types.rs

+        let b_file_diagnostics = super::check_types(&db, b_file);
+        assert_diagnostic_messages(
+            &b_file_diagnostics,
+            &["Could not resolve import of 'thing' from 'a'"],


Here I think describing it as a failed attribute access would actually be much clearer than the vague "Could not resolve" which doesn't really communicate anything! This message also doesn't currently clarify that 'a' is a module (though it could be taken as implied.) I actually can't think of an error message here that I think would be better than "Module a has no attribute thing."

Do people really think of from collections import deque as an "attribute access" of deque from the collections module? I certainly don't. Of course that's what's happening under the hood, but even at runtime this detail is hidden -- the import fails with ImportError rather than AttributeError, and there's no mention of attempting to access an attribute:

% python -c "from collections import foo" Traceback (most recent call last): File "<string>", line 1, in <module> ImportError: cannot import name 'foo' from 'collections' (/Users/alexw/.pyenv/versions/3.12.4/lib/python3.12/collections/__init__.py)

carljm · 2024-08-22T00:57:00Z

crates/red_knot_python_semantic/src/types.rs

+A spurious second 'Unresolved import' diagnostic message is emitted on `b.py`, \
+despite the symbol existing in the symbol table for `a.py`"]


I think when we issue the error on the import in a.py, we should set the type of foo in a to Unknown, not Unbound, which should avoid this?

carljm · 2024-08-22T00:59:21Z

crates/red_knot_python_semantic/src/types/infer.rs

+        &self,
+        tail: Option<&str>,
+        level: NonZeroU32,
+    ) -> Result<ModuleName, ModuleResolutionError> {


Why not just emit the diagnostics inside this function, return an Option, and avoid the need for ModuleResolutionError?

carljm · 2024-08-22T01:01:41Z

crates/red_knot_python_semantic/src/types/infer.rs

+                    module.unwrap_or_default()
+                ),
+            );
+        } else if module_ty.is_ok() && member_ty.is_unknown() {


I don't think we should be emitting a diagnostic at all if the imported symbol is of type Unknown. Unknown might result from an error in the other module, but in that case the diagnostic should occur in the other module; importing a value of unknown type is not an error.

I think just removing this clause would fix the ignored test; what would it break?

Yeah, the only reason we have to check for Unknown here is that we intentionally replaced Unbound with Unknown just a few lines above. If we wait to do it after, the bug in the ignored test goes away, and all tests pass:

diff --git a/crates/red_knot_python_semantic/src/types.rs b/crates/red_knot_python_semantic/src/types.rs index 173c957d1..c4ff40220 100644 --- a/crates/red_knot_python_semantic/src/types.rs +++ b/crates/red_knot_python_semantic/src/types.rs @@ -451,9 +451,6 @@ mod tests { ); } - #[ignore = "\ -A spurious second 'Unresolved import' diagnostic message is emitted on `b.py`, \ -despite the symbol existing in the symbol table for `a.py`"] #[test] fn resolved_import_of_symbol_from_unresolved_import() { let mut db = setup_db(); diff --git a/crates/red_knot_python_semantic/src/types/infer.rs b/crates/red_knot_python_semantic/src/types/infer.rs index 9b183727e..28dc8f2e3 100644 --- a/crates/red_knot_python_semantic/src/types/infer.rs +++ b/crates/red_knot_python_semantic/src/types/infer.rs @@ -1012,15 +1012,9 @@ impl<'db> TypeInferenceBuilder<'db> { asname: _, } = alias; - // If a symbol is unbound in the module the symbol was originally defined in, - // when we're trying to import the symbol from that module into "our" module, - // the runtime error will occur immediately (rather than when the symbol is *used*, - // as would be the case for a symbol with type `Unbound`), so it's appropriate to - // think of the type of the imported symbol as `Unknown` rather than `Unbound` let member_ty = module_ty .unwrap_or(Type::Unbound) - .member(self.db, &Name::new(&name.id)) - .replace_unbound_with(self.db, Type::Unknown); + .member(self.db, &Name::new(&name.id)); if matches!(module_ty, Err(ModuleResolutionError::UnresolvedModule)) { self.add_diagnostic( @@ -1032,7 +1026,7 @@ impl<'db> TypeInferenceBuilder<'db> { module.unwrap_or_default() ), ); - } else if module_ty.is_ok() && member_ty.is_unknown() { + } else if module_ty.is_ok() && member_ty.is_unbound() { self.add_diagnostic( AnyNodeRef::Alias(alias), "unresolved-import", @@ -1044,6 +1038,8 @@ impl<'db> TypeInferenceBuilder<'db> { ); } + let member_ty = member_ty.replace_unbound_with(self.db, Type::Unknown); + self.types.definitions.insert(definition, member_ty); }

Thanks, I think you're right that this fixes the tests I added (and skipped) in this PR. Here's an interesting edge case that you might not have considered, though. What should we do for something like this?

# foo.py for x in range(0): pass # bar.py from foo import x

Running python bar.py fails with ImportError at runtime because x is not defined in foo.py (due to the iterable being empty), but neither mypy nor pyright catches this. Arguably we should infer the type of x after the for loop has completed as being int | Unbound -- should we flag any union that includes Unbound as one of the elements with an unresolved-import diagnostic? Probably not, I don't think -- the import might succeed, it just also might not. So let's say (for argument's sake) we use the error code import-possibly-unbound for that.

So then what if we have something like this, where the symbol definitely exists in foo.py, but is also definitely unbound? (Mypy and pyright do both catch this one.) Should that have the error code unresolved-import, or import-possibly-unbound?

# foo.py x: int # bar.py from foo import x

These are somewhat obscure edge cases, so I think what you propose here is a strict improvement on what I implemented in this PR; I'll make a fixup PR to address your suggestions. But my point is that we may still at some point need some context propagated regarding the cause of the error -- whether that error is represented as Unknown or Unbound.

Yeah, great points and great questions! I had the x: int case in mind last night when thinking again about representation of unbound-ness in a different comment on this PR.

I agree that maybe-unbound is a different error class than definitely-unbound (and arguable whether maybe-unbound should trigger an error on import at all.)

I think we do want a representation of "definitely unbound and typed" that we use for imports, so e.g. we can error if you import from x: int (as long as it's not a stub), but we can still type check the importing module treating x as int rather than Unknown. (Not sure how important this is, but it seems marginally better.) I think this can be handled by declared vs inferred types.

What's less clear to me is if we need a representation of "definitely unbound and typed" for inferred types. It would mean that within a scope we could also check code following x: int and emit diagnostics both for "x is not bound" and for use of x that isn't valid for an int. Maybe that's useful?

Like I mentioned above in a different comment, I'm pretty open to exploring other representations of unbound to make it orthogonal to types; the main question for me is if we can avoid it regressing in perf, and if not, do we gain enough from it to be worth the regression?

carljm · 2024-08-22T01:04:02Z

crates/red_knot_python_semantic/src/types/infer.rs

-        }
+        &self,
+        module_name: ModuleName,
+    ) -> Result<Type<'db>, ModuleResolutionError> {


I don't see the advantage of returning a Result vs just emitting the diagnostic directly in this method, as the previous code did.

carljm · 2024-08-22T01:35:36Z

While I'm making notes on this PR of things that I'd like to address in this area: we've now started to accumulate some tests in types.rs that (as far as I can see) are indistinguishable in purpose and kind from the tests we already have in infer.rs. Maybe all of the tests should be moved out to types.rs instead of infer.rs, or should actually live in their own file (or better should be in a more concise testing framework), but I don't think we should be accumulating these tests in two different files without a clear rationale for which tests go in which file.

AlexWaygood · 2024-08-22T11:02:29Z

While I'm making notes on this PR of things that I'd like to address in this area: we've now started to accumulate some tests in types.rs that (as far as I can see) are indistinguishable in purpose and kind from the tests we already have in infer.rs. Maybe all of the tests should be moved out to types.rs instead of infer.rs, or should actually live in their own file (or better should be in a more concise testing framework), but I don't think we should be accumulating these tests in two different files without a clear rationale for which tests go in which file.

I think the idea is that the tests in types.rs are specifically testing which diagnostics are emitted, whereas the tests in infer.rs are (currently) solely focussed on which types are inferred

MichaReiser · 2024-08-22T11:37:09Z

I think the idea is that the tests in types.rs are specifically testing which diagnostics are emitted, whereas the tests in infer.rs are (currently) solely focussed on which types are inferred

My initial idea was to only test that typecheck emits diagnostic and I picked the only existing diagnostic. My intention isn't to test specific diagnostics.

carljm · 2024-08-22T19:05:05Z

I think the idea is that the tests in types.rs are specifically testing which diagnostics are emitted, whereas the tests in infer.rs are (currently) solely focussed on which types are inferred

Ah! Yes, I see that distinction does seem to apply currently. I think longer-term it doesn't necessarily make sense to split tests according to this distinction, though. At some point I suspect ~all of our tests will be tests of diagnostics, which can include a diagnostic from reveal_type() as one edge case.

AlexWaygood added the red-knot Multi-file analysis & type inference label Aug 20, 2024

AlexWaygood requested review from carljm and MichaReiser as code owners August 20, 2024 12:55

AlexWaygood force-pushed the alex/redknot-import-from branch from 7883c7a to 96dbfdd Compare August 20, 2024 13:16

AlexWaygood force-pushed the alex/redknot-import-from branch from 96dbfdd to 36dca2a Compare August 20, 2024 13:34

MichaReiser reviewed Aug 20, 2024

View reviewed changes

crates/red_knot_python_semantic/src/types/infer.rs Outdated Show resolved Hide resolved

MichaReiser reviewed Aug 20, 2024

View reviewed changes

MichaReiser approved these changes Aug 21, 2024

View reviewed changes

crates/red_knot_python_semantic/src/types.rs Outdated Show resolved Hide resolved

AlexWaygood and others added 2 commits August 21, 2024 13:46

Add failing tests

2efa423

Fix tests as best we can

29ab98d

Update crates/red_knot_python_semantic/src/types/infer.rs Co-authored-by: Micha Reiser <micha@reiser.io>

AlexWaygood force-pushed the alex/redknot-import-from branch from 7e907f8 to ea3a449 Compare August 21, 2024 13:09

AlexWaygood force-pushed the alex/redknot-import-from branch from ea3a449 to fce1b32 Compare August 21, 2024 13:16

Address review

2c1fb1d

AlexWaygood force-pushed the alex/redknot-import-from branch from fce1b32 to 2c1fb1d Compare August 21, 2024 13:37

AlexWaygood enabled auto-merge (squash) August 21, 2024 13:39

AlexWaygood merged commit ecd9e6a into main Aug 21, 2024
17 checks passed

AlexWaygood deleted the alex/redknot-import-from branch August 21, 2024 13:44

carljm reviewed Aug 22, 2024

View reviewed changes

crates/red_knot_python_semantic/src/types.rs Show resolved Hide resolved

carljm reviewed Aug 22, 2024

View reviewed changes

carljm mentioned this pull request Aug 22, 2024

[red-knot] Track root cause of why a type is inferred as Unknown #12986

Closed

AlexWaygood mentioned this pull request Aug 22, 2024

[red-knot] Improve the accuracy of the unresolved-import check #13055

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[red-knot] Improve the `unresolved-import` check #13007

[red-knot] Improve the `unresolved-import` check #13007

AlexWaygood commented Aug 20, 2024

github-actions bot commented Aug 20, 2024 •

edited

Loading

MichaReiser Aug 20, 2024

AlexWaygood Aug 20, 2024

MichaReiser Aug 20, 2024

AlexWaygood Aug 20, 2024

AlexWaygood Aug 20, 2024 •

edited

Loading

AlexWaygood Aug 20, 2024

AlexWaygood Aug 20, 2024

AlexWaygood Aug 20, 2024 •

edited

Loading

carljm Aug 22, 2024

carljm Aug 22, 2024 •

edited

Loading

MichaReiser left a comment

AlexWaygood commented Aug 21, 2024

codspeed-hq bot commented Aug 21, 2024 •

edited

Loading

carljm Aug 22, 2024

AlexWaygood Aug 22, 2024

carljm Aug 22, 2024

carljm Aug 22, 2024 •

edited

Loading

carljm Aug 22, 2024 •

edited

Loading

carljm Aug 22, 2024

AlexWaygood Aug 22, 2024

carljm Aug 22, 2024

carljm Aug 22, 2024 •

edited

Loading

carljm commented Aug 22, 2024 •

edited

Loading

AlexWaygood commented Aug 22, 2024

MichaReiser commented Aug 22, 2024 •

edited

Loading

carljm commented Aug 22, 2024

		A spurious second 'Unresolved import' diagnostic message is emitted on `b.py`, \
		despite the symbol existing in the symbol table for `a.py`"]

[red-knot] Improve the unresolved-import check #13007

[red-knot] Improve the unresolved-import check #13007

Conversation

AlexWaygood commented Aug 20, 2024

Summary

Test Plan

github-actions bot commented Aug 20, 2024 • edited Loading

ruff-ecosystem results

Linter (stable)

Linter (preview)

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

AlexWaygood Aug 20, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

AlexWaygood Aug 20, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

carljm Aug 22, 2024 • edited Loading

Choose a reason for hiding this comment

MichaReiser left a comment

Choose a reason for hiding this comment

AlexWaygood commented Aug 21, 2024

codspeed-hq bot commented Aug 21, 2024 • edited Loading

Merging #13007 will improve performances by 4.16%

Summary

Benchmarks breakdown

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

carljm Aug 22, 2024 • edited Loading

Choose a reason for hiding this comment

carljm Aug 22, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

carljm Aug 22, 2024 • edited Loading

Choose a reason for hiding this comment

carljm commented Aug 22, 2024 • edited Loading

AlexWaygood commented Aug 22, 2024

MichaReiser commented Aug 22, 2024 • edited Loading

carljm commented Aug 22, 2024

[red-knot] Improve the `unresolved-import` check #13007

[red-knot] Improve the `unresolved-import` check #13007

github-actions bot commented Aug 20, 2024 •

edited

Loading

`ruff-ecosystem` results

AlexWaygood Aug 20, 2024 •

edited

Loading

AlexWaygood Aug 20, 2024 •

edited

Loading

carljm Aug 22, 2024 •

edited

Loading

codspeed-hq bot commented Aug 21, 2024 •

edited

Loading

carljm Aug 22, 2024 •

edited

Loading

carljm Aug 22, 2024 •

edited

Loading

carljm Aug 22, 2024 •

edited

Loading

carljm commented Aug 22, 2024 •

edited

Loading

MichaReiser commented Aug 22, 2024 •

edited

Loading