regression: assertion `left == right` failed but it depends on library-unstable details #128899

BoxyUwU · 2024-08-09T17:31:57Z

[INFO] [stdout] thread 'tests::test_primie_factorization' panicked at src/lib.rs:132:9:
[INFO] [stdout] assertion `left == right` failed
[INFO] [stdout]   left: [PrimeFactor { prime: 2, power: 2 }, PrimeFactor { prime: 5, power: 3 }]
[INFO] [stdout]  right: [PrimeFactor { prime: 5, power: 3 }, PrimeFactor { prime: 2, power: 2 }]

The text was updated successfully, but these errors were encountered:

workingjubilee · 2024-08-10T00:32:37Z

It is probably the case that these include incorrect sort algorithm impls.

I am bringing these to the attention of @orlp and @Voultapher if they wish to inspect the regressions further.

coding-problems-practice has a now-failing-tests solution, but it actually is still correct according to the original problem.
- the tests seem to rely on sort_unstable_by having sorted things a certain way, as it only sorts by frequency, not by frequency and then natural order as the tests seem to imply is required: https://github.com/lht102/coding-problems-practice/blob/5b22b46182abe6d0617d087e6b8a0fa22f2edada/leetcode/src/00451_sort_characters_by_frequency/solution.rs#L10-L27
- however, the original problem specifically mentions that it's okay to get either "eert" or "eetr": https://leetcode.com/problems/sort-characters-by-frequency/description/
hopscotch has an ord_correct test that is now failing, and while I can't find the sort call, it feels likely the answer is wrong or sort may be implicitly called by quickcheck: https://github.com/plaidfinch/hopscotch/blob/54483dabb3252f00f73432984b42bd00f7471466/tests/hopscotch.rs#L136-L146

workingjubilee · 2024-08-10T00:46:18Z

I think I am wrong for ~~some~~ most, actually.

anagrams: uses a HashTrieMap, then asserts on an iter-then-collect-to-vec, a somewhat notorious hazard. https://github.com/rainhead/rust-anagrams/blob/d42b99a1c5d3465d65a6ded34aff0857a0e52bfe/src/lib.rs#L35-L41
atrocious_sort: appears to be failing in a sort, yes, but is specifically failing on sleep_sort only. did we change something about concurrency? https://github.com/Chromfalke/AtrociousSort/blob/main/src/algorithms/sleep_sort.rs
bevy_serde_macros
- seems to be asserting on serialization/deserialization being handled a certain way, but now some whitespace is in the keys?
gaffer
- No sorting appears to occur in https://github.com/survemobility/gaffer/blob/fd1cc240414bb5ee61c840538e0ed4f83e5d58dd/tests/integration.rs or the rest of the library at a glance, so I believe that's instead a concurrency bug?
- Other initial hypothesis is that the library is doing panic recovery. Perhaps some other library in its dep tree is sorting but then panic recovery occurs? survemobility/gaffer@f238bae
proxmox-api: seems to be another case of asserting after iterating over a hashmap: https://github.com/datdenkikniet/proxmox-api/blob/7d7540839371e3a58837b049997fe5fb8816d0cf/proxmox-api/src/types/multi.rs#L241-L255 actually see orlp's comment.
rwini: another case of asserting on the iteration of a hashmap!
- https://github.com/zerodegress/rwini-rs/blob/master/src/data/ini.rs
- https://github.com/zerodegress/rwini-rs/blob/b7a098fe1e2785bd859a1753ace77c8662d2ebde/src/parser/mod.rs#L42-L66
workpool: another case where it's less clear and seems likely to be a concurrency issue? https://github.com/ggriffiniii/lateral/blob/469c20d70b2d36b8bc5726281cb98989f4423bec/workpool/src/pool/dynamic_pool.rs#L504-L515

bevy_serde_macros

assertion `left == right` failed
[INFO] [stdout]   left: {" Component3": Array [Array [Number(1), Object {"target": Number(0), "test_enum": Object {"ATest": String("test")}}]], " Component1": Array [Array [Number(0), Null], Array [Number(1), Null]], " Component2": Array [Array [Number(1), Object {"target": Number(0)}]]}
[INFO] [stdout]  right: {"Component3": Array [Array [Number(1), Object {"target": Number(0), "test_enum": Object {"ATest": String("test")}}]], "Component2": Array [Array [Number(1), Object {"target": Number(0)}]], "Component1": Array [Array [Number(0), Null], Array [Number(1), Null]]}

Either way, these do not seem to be directly from the sort changes.

Voultapher · 2024-08-10T07:59:04Z

coding-problems-practice has a now-failing-tests solution, but it actually is still correct according to the original problem.
the tests seem to rely on sort_unstable_by having sorted things a certain way,

sort_unstable* is unstable as the name says and the new implementation is deterministic as the previous one was, but it produces a different equal element ordering than the one before, so a test that relies on one specific order but uses sort_unstable is only valid for one version of Rust.

however, the original problem specifically mentions that it's okay to get either "eert" or "eetr"

I think the correct way to test that would be to do stable sort or assert_matches!(val, "eert" | "eetr").

Voultapher · 2024-08-10T08:03:46Z

atrocious_sort: appears to be failing in a sort, yes, but is specifically failing on sleep_sort only. did we change something about concurrency? https://github.com/Chromfalke/AtrociousSort/blob/main/src/algorithms/sleep_sort.rs

Looking at the implementation, it sleeps for milliseconds, I fail to see how that could ever work reliably. The OS can decide to wake the threads up at some later point in time, sleep does not guarantee a fixed wakeup time. Plus it doesn't use latches, so it's spawn + sleep before the next thread is spawned. IMO that implementation is just plain broken.

Voultapher · 2024-08-10T08:25:34Z

I'm a little surprised by how many instances there are of relying on the order of hashmaps, the order is randomized with the default hasher. Looking at the linked code they seem to be using the default hasher, how come these tests ever reliably work during local development? What am I missing here?

Voultapher · 2024-08-10T09:58:50Z

gaffer
Other initial hypothesis is that the library is doing panic recovery. Perhaps some other library in its dep tree is sorting but then panic recovery occurs? survemobility/gaffer@f238bae

I suspect it's just racy. I checked the project out locally and ran cargo t, on my initial run I got:

test source::test::queued_resets_recurring ... FAILED

failures:

---- source::test::queued_resets_recurring stdout ----
thread 'source::test::queued_resets_recurring' panicked at src/source.rs:320:9:
assertion `left == right` failed
  left: [Tester(3), Tester(2), Tester(1)]
 right: [Tester(2)]


failures:
    source::test::queued_resets_recurring

test result: FAILED. 36 passed; 1 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.02s

Which is a completely different test, and the one in the crater run passed. Second run all tests pass.

orlp · 2024-08-10T10:03:01Z

@workingjubilee I find the proxmox-api case very strange: https://github.com/datdenkikniet/proxmox-api/blob/7d7540839371e3a58837b049997fe5fb8816d0cf/proxmox-api/src/types/multi.rs#L241-L255

Note that the portion of the serde output that was changed is actually a BTreeMap, not a HashMap.

workingjubilee · 2024-08-10T21:52:09Z

@orlp Thanks for catching that! I retract my comments re: "iterating over a hashmap" for that library, but it still seems very odd.

workingjubilee · 2024-08-10T21:59:16Z

Somewhat shockingly, the Project Euler code is also operating with a hashmap: https://github.com/chubei/project_euler/blob/d100bed4c6b7c417276ccba051567c53ed869d40/src/lib.rs#L39-L48

I really expected that one to be due to sorting issues...

I'm a little surprised by how many instances there are of relying on the order of hashmaps, the order is randomized with the default hasher. Looking at the linked code they seem to be using the default hasher, how come these tests ever reliably work during local development? What am I missing here?

There is a certain sort of person who only ever writes and runs a test once.

I do not mean to accuse, but if they only ran the test a few times, it is possible the hashes happened to work out for them. Even if the hashing perfectly randomizes the keys, if you assert on the order of only 2 items, then it comes up as a coinflip either way. It's reasonable for that to pass 2-4 times in a row if you are lucky.

workingjubilee · 2024-08-10T22:36:23Z

I downloaded workpool and reran its tests a few times. It's another racy case. I think we've correctly identified racy library tests in:

atrocious_sort
gaffer
workpool

And what seem to be definitely hashmap problems:

chubei/project-euler
rainhead/rust-anagrams
rwini

Unless these are fixed promptly, we should ignore those in future crater runs which cover tests.

Correctly root-caused to the sort impl, but known accepted breakage:

lht102/coding-problems-practice

MPCHomeControl is a float problem, so I'll bring it to #128898

That leaves the following as still open questions as to why the test is failing to begin with:

bevy_serde_macros
hopscotch
proxmox-api

saethlin · 2024-08-11T03:03:38Z

I haven't looked at these crates carefully.

I know there are quite a few crates which are tested by crater that have test suites that are designed in various ways such that they are incompatible with multiple tests running at once. This would be fixed once and for all if crater would set the number of CPUs in each container to 1.

Voultapher · 2024-08-11T08:03:54Z

I know there are quite a few crates which are tested by crater that have test suites that are designed in various ways such that they are incompatible with multiple tests running at once. This would be fixed once and for all if crater would set the number of CPUs in each container to 1.

It's a little off topic and if this turns into a bigger discussion let's please find a better place for it. My 2 cents, if your tests are order dependent and or can't be run in parallel they are broken. I'd rather tell these libraries "you choose to break the tooling's expectations, so you can't expect us to care for your use-case".

Specifically with the racy libraries here I'm not sure it would even help:

atrocious_sort: Broken impl, single-core testing won't help.
gaffer: Docs say: "some of the tests are very dependent on timing and will fail if run slowly", so not sure how single-core testing would help.
workpool: No docs whatsoever, sporadic test failure, not sure how single-core testing would help.

Voultapher · 2024-08-11T09:31:54Z

bevy_serde_macros

I think it's our good friend hashmap order see, which constructs two hashmaps and does assert_eq!. Beware this is hashbrown::HashMap with a PartialEq implementation like this. Which is order dependent. The stdlib one has the same impl. Now that I think of it, isn't that quite the big footgun that PartialEq is implemented that way?

Voultapher · 2024-08-11T09:43:18Z

proxmox-api

Is once again also hashmap order, the main branch defines Names as:

#[derive(Debug, Deserialize, Serialize, PartialEq)]
struct Names {
    #[serde(
        flatten,
        deserialize_with = "super::deserialize_multi_btree::<'_, NumberedNames, _>",
        serialize_with = "super::serialize_multi_btree::<NumberedNames, _>"
    )]
    names: BTreeMap<u32, String>,
    #[serde(
        flatten,
        deserialize_with = "super::deserialize_additional_data::<'_, Names, _, _>"
    )]
    additional_data: HashMap<String, String>,
}

Where the field additional_data is a hashmap. But in the test it only accounts for the name_2 key and not for name1 or name2 which are the ones that swap place in the produced JSON string. However if look at the v0.1.1 tag Names has a different definition:

#[derive(Debug, Deserialize, Serialize, PartialEq)]
struct Names {
    #[serde(
        flatten,
        deserialize_with = "super::deserialize_multi::<'_, NumberedNames, _>",
        serialize_with = "super::serialize_multi::<NumberedNames, _>"
    )]
    names: HashMap<u32, String>,
    #[serde(
        flatten,
        deserialize_with = "super::deserialize_additional_data::<'_, Names, _, _>"
    )]
    additional_data: HashMap<String, String>,
}

Now the fields names and additional_data are HashMaps. This is the version that was tested by the crater run. Mystery solved 😄

Nemo157 · 2024-08-11T10:32:48Z

I think it's our good friend hashmap order see, which constructs two hashmaps and does assert_eq!. Beware this is hashbrown::HashMap with a PartialEq implementation like this. Which is order dependent. The stdlib one has the same impl. Now that I think of it, isn't that quite the big footgun that PartialEq is implemented that way?

Equality doesn't care about which order you check the keys in, the debug output is different but that shouldn't affect the comparison result. There seem to be extra spaces in the keys on one side, which I assume means the test itself is broken. EDIT: looking at it a little more, seems to be related to not accounting for whitespace in stringify! of ty fragments, so tests :: Component1 gets split at the :: and given the name Component1 instead of the expected Component1, maybe #125174 related.

Voultapher · 2024-08-11T10:47:56Z

hopscotch

This is not a regression. Locally the tests partial_ord_correct and ord_correct fail with some chance both on stable and nightly. I've seen all 4 combinations of success and fail on for both versions of Rust. I think it depends on wether quickcheck finds a code path where the properties don't hold, which I assume depends on randomness.

Digging into why it fails, as example input that fails let's take left = [(66, 1)] and right = [(66, 0), (0, 0)]. The tests compare partial_cmp and cmp results. compare_as_iters answers what the keys compared via the the iterator::cmp method yields. In this case [66].iter().cmp([66, 0].iter()) == Ordering::Less. In contrast the hopscotch::Queue::cmp method uses completely different logic and does offset and tag comparisons before going over to value, not key, iter comparison. In this specific instance tag comparison yields Ordering::Greater which causes the mismatch and test failure.

In my opinion either the test does not test what the author wanted it to test, or the implementation does not do what the author intended it to do.

Voultapher · 2024-08-11T10:48:38Z

Given the lack of active maintenance and sub 10k downloads, it might make sense to deny-list some of these crates that fail because of broken tests and or implementations.

Voultapher · 2024-08-11T11:31:27Z

Equality doesn't care about which order you check the keys in

My bad, indeed the other.get call makes it order independent.

orlp · 2024-08-11T13:00:13Z

Would it be possible to add a clippy warning to some code patterns which attempt to assert an expression that depends on hashmap iteration order? Obviously we can't catch everything, but perhaps some of the more common patterns?

workingjubilee · 2024-08-11T21:16:40Z

I guess I retract my retraction regarding proxmox-api???

workingjubilee · 2024-08-11T21:18:40Z

I guess we should test a revert of #125174

workingjubilee · 2024-08-11T21:24:31Z

Hm, #125174 was cratered.

workingjubilee · 2024-08-11T22:27:56Z

The proxmox-api tests may work on the next crater run, so let's not exclude those.

I've opened some PRs and issues on the relevant crates in the hopes of them fixing their code so that we can continue using their test data:

It seems unlikely these will all be resolved by the next beta crater run.

In particular, the gaffer tests seem to be completely insoluble as it is an issue in essentially all the tests, so let's just skip that crate's tests from now on.

workingjubilee · 2024-08-11T23:00:52Z

I have opened some followups:

Everything else is accepted breakage, or (hopefully) will be published and fixed for the next beta crater, so I think we're done here.

workingjubilee · 2024-08-11T23:08:49Z

Thank you for playing, everyone!

…, r=Mark-Simulacrum Exclude flakes found in beta 1.81 Details found in rust-lang/rust#128899

BoxyUwU added T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. regression-from-stable-to-beta Performance or correctness regression from stable to beta. labels Aug 9, 2024

rustbot added I-prioritize Issue: Indicates that prioritization has been requested for this issue. needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. labels Aug 9, 2024

BoxyUwU removed the needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. label Aug 9, 2024

BoxyUwU mentioned this issue Aug 9, 2024

Crater runs for 1.81 #128468

Closed

workingjubilee mentioned this issue Aug 10, 2024

Seedable hashmap for testing? #128948

Open

workingjubilee changed the title ~~regression: assertion left == right for stuff that looks sorting related~~ regression: assertion left == right failed because people iterate hashmaps Aug 11, 2024

workingjubilee changed the title ~~regression: assertion left == right failed because people iterate hashmaps~~ regression: assertion left == right failed because of comparisons that depend on library-unstable details Aug 11, 2024

workingjubilee changed the title ~~regression: assertion left == right failed because of comparisons that depend on library-unstable details~~ regression: assertion left == right failed but it depends on library-unstable details Aug 11, 2024

workingjubilee mentioned this issue Aug 11, 2024

partial_ord_correct and ord_correct sometimes fail plaidfinch/hopscotch#19

Open

workingjubilee mentioned this issue Aug 11, 2024

Subtle change in how ty fragments get stringified #128992

Open

workingjubilee mentioned this issue Aug 11, 2024

Exclude flakes found in beta 1.81 rust-lang/crater#732

Merged

workingjubilee closed this as completed Aug 11, 2024

apiraino removed the I-prioritize Issue: Indicates that prioritization has been requested for this issue. label Aug 12, 2024

Voultapher mentioned this issue Aug 12, 2024

Improve Ord docs #129003

Open

bors added a commit to rust-lang/crater that referenced this issue Aug 24, 2024

Auto merge of #732 - workingjubilee:exclude-flakes-found-in-beta-1.81…

cec97cb

…, r=Mark-Simulacrum Exclude flakes found in beta 1.81 Details found in rust-lang/rust#128899

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

regression: assertion `left == right` failed but it depends on library-unstable details #128899

regression: assertion `left == right` failed but it depends on library-unstable details #128899

BoxyUwU commented Aug 9, 2024

workingjubilee commented Aug 10, 2024 •

edited

Loading

workingjubilee commented Aug 10, 2024 •

edited

Loading

Voultapher commented Aug 10, 2024 •

edited

Loading

Voultapher commented Aug 10, 2024 •

edited

Loading

Voultapher commented Aug 10, 2024

Voultapher commented Aug 10, 2024

orlp commented Aug 10, 2024

workingjubilee commented Aug 10, 2024 •

edited

Loading

workingjubilee commented Aug 10, 2024 •

edited

Loading

workingjubilee commented Aug 10, 2024 •

edited

Loading

saethlin commented Aug 11, 2024

Voultapher commented Aug 11, 2024 •

edited

Loading

Voultapher commented Aug 11, 2024

Voultapher commented Aug 11, 2024

Nemo157 commented Aug 11, 2024 •

edited

Loading

Voultapher commented Aug 11, 2024

Voultapher commented Aug 11, 2024

Voultapher commented Aug 11, 2024

orlp commented Aug 11, 2024 •

edited

Loading

workingjubilee commented Aug 11, 2024

workingjubilee commented Aug 11, 2024

workingjubilee commented Aug 11, 2024

workingjubilee commented Aug 11, 2024 •

edited

Loading

workingjubilee commented Aug 11, 2024

workingjubilee commented Aug 11, 2024

regression: assertion left == right failed but it depends on library-unstable details #128899

regression: assertion left == right failed but it depends on library-unstable details #128899

Comments

BoxyUwU commented Aug 9, 2024

workingjubilee commented Aug 10, 2024 • edited Loading

workingjubilee commented Aug 10, 2024 • edited Loading

Voultapher commented Aug 10, 2024 • edited Loading

Voultapher commented Aug 10, 2024 • edited Loading

Voultapher commented Aug 10, 2024

Voultapher commented Aug 10, 2024

orlp commented Aug 10, 2024

workingjubilee commented Aug 10, 2024 • edited Loading

workingjubilee commented Aug 10, 2024 • edited Loading

workingjubilee commented Aug 10, 2024 • edited Loading

saethlin commented Aug 11, 2024

Voultapher commented Aug 11, 2024 • edited Loading

Voultapher commented Aug 11, 2024

Voultapher commented Aug 11, 2024

Nemo157 commented Aug 11, 2024 • edited Loading

Voultapher commented Aug 11, 2024

Voultapher commented Aug 11, 2024

Voultapher commented Aug 11, 2024

orlp commented Aug 11, 2024 • edited Loading

workingjubilee commented Aug 11, 2024

workingjubilee commented Aug 11, 2024

workingjubilee commented Aug 11, 2024

workingjubilee commented Aug 11, 2024 • edited Loading

workingjubilee commented Aug 11, 2024

workingjubilee commented Aug 11, 2024

regression: assertion `left == right` failed but it depends on library-unstable details #128899

regression: assertion `left == right` failed but it depends on library-unstable details #128899

workingjubilee commented Aug 10, 2024 •

edited

Loading

workingjubilee commented Aug 10, 2024 •

edited

Loading

Voultapher commented Aug 10, 2024 •

edited

Loading

Voultapher commented Aug 10, 2024 •

edited

Loading

workingjubilee commented Aug 10, 2024 •

edited

Loading

workingjubilee commented Aug 10, 2024 •

edited

Loading

workingjubilee commented Aug 10, 2024 •

edited

Loading

Voultapher commented Aug 11, 2024 •

edited

Loading

Nemo157 commented Aug 11, 2024 •

edited

Loading

orlp commented Aug 11, 2024 •

edited

Loading

workingjubilee commented Aug 11, 2024 •

edited

Loading