Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

regression: assertion left == right failed but it depends on library-unstable details #128899

Closed
BoxyUwU opened this issue Aug 9, 2024 · 25 comments
Labels
regression-from-stable-to-beta Performance or correctness regression from stable to beta. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

Comments

@BoxyUwU
Copy link
Member

BoxyUwU commented Aug 9, 2024

[INFO] [stdout] thread 'tests::test_primie_factorization' panicked at src/lib.rs:132:9:
[INFO] [stdout] assertion `left == right` failed
[INFO] [stdout]   left: [PrimeFactor { prime: 2, power: 2 }, PrimeFactor { prime: 5, power: 3 }]
[INFO] [stdout]  right: [PrimeFactor { prime: 5, power: 3 }, PrimeFactor { prime: 2, power: 2 }]
@BoxyUwU BoxyUwU added T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. regression-from-stable-to-beta Performance or correctness regression from stable to beta. labels Aug 9, 2024
@rustbot rustbot added I-prioritize Issue: Indicates that prioritization has been requested for this issue. needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. labels Aug 9, 2024
@BoxyUwU BoxyUwU removed the needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. label Aug 9, 2024
@workingjubilee
Copy link
Member

workingjubilee commented Aug 10, 2024

It is probably the case that these include incorrect sort algorithm impls.

I am bringing these to the attention of @orlp and @Voultapher if they wish to inspect the regressions further.

@workingjubilee
Copy link
Member

workingjubilee commented Aug 10, 2024

I think I am wrong for some most, actually.

bevy_serde_macros
assertion `left == right` failed
[INFO] [stdout]   left: {" Component3": Array [Array [Number(1), Object {"target": Number(0), "test_enum": Object {"ATest": String("test")}}]], " Component1": Array [Array [Number(0), Null], Array [Number(1), Null]], " Component2": Array [Array [Number(1), Object {"target": Number(0)}]]}
[INFO] [stdout]  right: {"Component3": Array [Array [Number(1), Object {"target": Number(0), "test_enum": Object {"ATest": String("test")}}]], "Component2": Array [Array [Number(1), Object {"target": Number(0)}]], "Component1": Array [Array [Number(0), Null], Array [Number(1), Null]]}

Either way, these do not seem to be directly from the sort changes.

@Voultapher
Copy link
Contributor

Voultapher commented Aug 10, 2024

coding-problems-practice has a now-failing-tests solution, but it actually is still correct according to the original problem.
the tests seem to rely on sort_unstable_by having sorted things a certain way,

sort_unstable* is unstable as the name says and the new implementation is deterministic as the previous one was, but it produces a different equal element ordering than the one before, so a test that relies on one specific order but uses sort_unstable is only valid for one version of Rust.

however, the original problem specifically mentions that it's okay to get either "eert" or "eetr"

I think the correct way to test that would be to do stable sort or assert_matches!(val, "eert" | "eetr").

@Voultapher
Copy link
Contributor

Voultapher commented Aug 10, 2024

atrocious_sort: appears to be failing in a sort, yes, but is specifically failing on sleep_sort only. did we change something about concurrency? https://github.com/Chromfalke/AtrociousSort/blob/main/src/algorithms/sleep_sort.rs

Looking at the implementation, it sleeps for milliseconds, I fail to see how that could ever work reliably. The OS can decide to wake the threads up at some later point in time, sleep does not guarantee a fixed wakeup time. Plus it doesn't use latches, so it's spawn + sleep before the next thread is spawned. IMO that implementation is just plain broken.

@Voultapher
Copy link
Contributor

I'm a little surprised by how many instances there are of relying on the order of hashmaps, the order is randomized with the default hasher. Looking at the linked code they seem to be using the default hasher, how come these tests ever reliably work during local development? What am I missing here?

@Voultapher
Copy link
Contributor

gaffer
Other initial hypothesis is that the library is doing panic recovery. Perhaps some other library in its dep tree is sorting but then panic recovery occurs? survemobility/gaffer@f238bae

I suspect it's just racy. I checked the project out locally and ran cargo t, on my initial run I got:

test source::test::queued_resets_recurring ... FAILED

failures:

---- source::test::queued_resets_recurring stdout ----
thread 'source::test::queued_resets_recurring' panicked at src/source.rs:320:9:
assertion `left == right` failed
  left: [Tester(3), Tester(2), Tester(1)]
 right: [Tester(2)]


failures:
    source::test::queued_resets_recurring

test result: FAILED. 36 passed; 1 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.02s

Which is a completely different test, and the one in the crater run passed. Second run all tests pass.

@orlp
Copy link
Contributor

orlp commented Aug 10, 2024

@workingjubilee I find the proxmox-api case very strange: https://github.com/datdenkikniet/proxmox-api/blob/7d7540839371e3a58837b049997fe5fb8816d0cf/proxmox-api/src/types/multi.rs#L241-L255

Note that the portion of the serde output that was changed is actually a BTreeMap, not a HashMap.

@workingjubilee
Copy link
Member

workingjubilee commented Aug 10, 2024

@orlp Thanks for catching that! I retract my comments re: "iterating over a hashmap" for that library, but it still seems very odd.

@workingjubilee
Copy link
Member

workingjubilee commented Aug 10, 2024

Somewhat shockingly, the Project Euler code is also operating with a hashmap: https://github.com/chubei/project_euler/blob/d100bed4c6b7c417276ccba051567c53ed869d40/src/lib.rs#L39-L48

I really expected that one to be due to sorting issues...

I'm a little surprised by how many instances there are of relying on the order of hashmaps, the order is randomized with the default hasher. Looking at the linked code they seem to be using the default hasher, how come these tests ever reliably work during local development? What am I missing here?

There is a certain sort of person who only ever writes and runs a test once.

I do not mean to accuse, but if they only ran the test a few times, it is possible the hashes happened to work out for them. Even if the hashing perfectly randomizes the keys, if you assert on the order of only 2 items, then it comes up as a coinflip either way. It's reasonable for that to pass 2-4 times in a row if you are lucky.

@workingjubilee
Copy link
Member

workingjubilee commented Aug 10, 2024

I downloaded workpool and reran its tests a few times. It's another racy case. I think we've correctly identified racy library tests in:

  • atrocious_sort
  • gaffer
  • workpool

And what seem to be definitely hashmap problems:

  • chubei/project-euler
  • rainhead/rust-anagrams
  • rwini

Unless these are fixed promptly, we should ignore those in future crater runs which cover tests.

Correctly root-caused to the sort impl, but known accepted breakage:

  • lht102/coding-problems-practice

MPCHomeControl is a float problem, so I'll bring it to #128898

That leaves the following as still open questions as to why the test is failing to begin with:

  • bevy_serde_macros
  • hopscotch
  • proxmox-api

@saethlin
Copy link
Member

I haven't looked at these crates carefully.

I know there are quite a few crates which are tested by crater that have test suites that are designed in various ways such that they are incompatible with multiple tests running at once. This would be fixed once and for all if crater would set the number of CPUs in each container to 1.

@Voultapher
Copy link
Contributor

Voultapher commented Aug 11, 2024

I know there are quite a few crates which are tested by crater that have test suites that are designed in various ways such that they are incompatible with multiple tests running at once. This would be fixed once and for all if crater would set the number of CPUs in each container to 1.

It's a little off topic and if this turns into a bigger discussion let's please find a better place for it. My 2 cents, if your tests are order dependent and or can't be run in parallel they are broken. I'd rather tell these libraries "you choose to break the tooling's expectations, so you can't expect us to care for your use-case".

Specifically with the racy libraries here I'm not sure it would even help:

  • atrocious_sort: Broken impl, single-core testing won't help.
  • gaffer: Docs say: "some of the tests are very dependent on timing and will fail if run slowly", so not sure how single-core testing would help.
  • workpool: No docs whatsoever, sporadic test failure, not sure how single-core testing would help.

@Voultapher
Copy link
Contributor

  • bevy_serde_macros
image

I think it's our good friend hashmap order see, which constructs two hashmaps and does assert_eq!. Beware this is hashbrown::HashMap with a PartialEq implementation like this. Which is order dependent. The stdlib one has the same impl. Now that I think of it, isn't that quite the big footgun that PartialEq is implemented that way?

@Voultapher
Copy link
Contributor

  • proxmox-api

Is once again also hashmap order, the main branch defines Names as:

#[derive(Debug, Deserialize, Serialize, PartialEq)]
struct Names {
    #[serde(
        flatten,
        deserialize_with = "super::deserialize_multi_btree::<'_, NumberedNames, _>",
        serialize_with = "super::serialize_multi_btree::<NumberedNames, _>"
    )]
    names: BTreeMap<u32, String>,
    #[serde(
        flatten,
        deserialize_with = "super::deserialize_additional_data::<'_, Names, _, _>"
    )]
    additional_data: HashMap<String, String>,
}

Where the field additional_data is a hashmap. But in the test it only accounts for the name_2 key and not for name1 or name2 which are the ones that swap place in the produced JSON string. However if look at the v0.1.1 tag Names has a different definition:

#[derive(Debug, Deserialize, Serialize, PartialEq)]
struct Names {
    #[serde(
        flatten,
        deserialize_with = "super::deserialize_multi::<'_, NumberedNames, _>",
        serialize_with = "super::serialize_multi::<NumberedNames, _>"
    )]
    names: HashMap<u32, String>,
    #[serde(
        flatten,
        deserialize_with = "super::deserialize_additional_data::<'_, Names, _, _>"
    )]
    additional_data: HashMap<String, String>,
}

Now the fields names and additional_data are HashMaps. This is the version that was tested by the crater run. Mystery solved 😄

@Nemo157
Copy link
Member

Nemo157 commented Aug 11, 2024

I think it's our good friend hashmap order see, which constructs two hashmaps and does assert_eq!. Beware this is hashbrown::HashMap with a PartialEq implementation like this. Which is order dependent. The stdlib one has the same impl. Now that I think of it, isn't that quite the big footgun that PartialEq is implemented that way?

Equality doesn't care about which order you check the keys in, the debug output is different but that shouldn't affect the comparison result. There seem to be extra spaces in the keys on one side, which I assume means the test itself is broken. EDIT: looking at it a little more, seems to be related to not accounting for whitespace in stringify! of ty fragments, so tests :: Component1 gets split at the :: and given the name Component1 instead of the expected Component1, maybe #125174 related.

@Voultapher
Copy link
Contributor

  • hopscotch

This is not a regression. Locally the tests partial_ord_correct and ord_correct fail with some chance both on stable and nightly. I've seen all 4 combinations of success and fail on for both versions of Rust. I think it depends on wether quickcheck finds a code path where the properties don't hold, which I assume depends on randomness.

Digging into why it fails, as example input that fails let's take left = [(66, 1)] and right = [(66, 0), (0, 0)]. The tests compare partial_cmp and cmp results. compare_as_iters answers what the keys compared via the the iterator::cmp method yields. In this case [66].iter().cmp([66, 0].iter()) == Ordering::Less. In contrast the hopscotch::Queue::cmp method uses completely different logic and does offset and tag comparisons before going over to value, not key, iter comparison. In this specific instance tag comparison yields Ordering::Greater which causes the mismatch and test failure.

In my opinion either the test does not test what the author wanted it to test, or the implementation does not do what the author intended it to do.

@Voultapher
Copy link
Contributor

Given the lack of active maintenance and sub 10k downloads, it might make sense to deny-list some of these crates that fail because of broken tests and or implementations.

@Voultapher
Copy link
Contributor

Equality doesn't care about which order you check the keys in

My bad, indeed the other.get call makes it order independent.

@orlp
Copy link
Contributor

orlp commented Aug 11, 2024

Would it be possible to add a clippy warning to some code patterns which attempt to assert an expression that depends on hashmap iteration order? Obviously we can't catch everything, but perhaps some of the more common patterns?

@workingjubilee workingjubilee changed the title regression: assertion left == right for stuff that looks sorting related regression: assertion left == right failed because people iterate hashmaps Aug 11, 2024
@workingjubilee workingjubilee changed the title regression: assertion left == right failed because people iterate hashmaps regression: assertion left == right failed because of comparisons that depend on library-unstable details Aug 11, 2024
@workingjubilee workingjubilee changed the title regression: assertion left == right failed because of comparisons that depend on library-unstable details regression: assertion left == right failed but it depends on library-unstable details Aug 11, 2024
@workingjubilee
Copy link
Member

I guess I retract my retraction regarding proxmox-api???

@workingjubilee
Copy link
Member

I guess we should test a revert of #125174

@workingjubilee
Copy link
Member

Hm, #125174 was cratered.

@workingjubilee
Copy link
Member

workingjubilee commented Aug 11, 2024

The proxmox-api tests may work on the next crater run, so let's not exclude those.

I've opened some PRs and issues on the relevant crates in the hopes of them fixing their code so that we can continue using their test data:

It seems unlikely these will all be resolved by the next beta crater run.

In particular, the gaffer tests seem to be completely insoluble as it is an issue in essentially all the tests, so let's just skip that crate's tests from now on.

@workingjubilee
Copy link
Member

I have opened some followups:

Everything else is accepted breakage, or (hopefully) will be published and fixed for the next beta crater, so I think we're done here.

@workingjubilee
Copy link
Member

Thank you for playing, everyone!

@apiraino apiraino removed the I-prioritize Issue: Indicates that prioritization has been requested for this issue. label Aug 12, 2024
bors added a commit to rust-lang/crater that referenced this issue Aug 24, 2024
…, r=Mark-Simulacrum

Exclude flakes found in beta 1.81

Details found in rust-lang/rust#128899
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
regression-from-stable-to-beta Performance or correctness regression from stable to beta. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

No branches or pull requests

8 participants