# Fix some issues with `cargo doc` and the new feature resolver. #9077
```diff
@@ -36,10 +36,11 @@ use crate::core::profiles::{Profiles, UnitFor};
 use crate::core::resolver::features::{self, FeaturesFor, RequestedFeatures};
 use crate::core::resolver::{HasDevUnits, Resolve, ResolveOpts};
 use crate::core::{FeatureValue, Package, PackageSet, Shell, Summary, Target};
-use crate::core::{PackageId, PackageIdSpec, TargetKind, Workspace};
+use crate::core::{PackageId, PackageIdSpec, SourceId, TargetKind, Workspace};
 use crate::ops;
 use crate::ops::resolve::WorkspaceResolve;
 use crate::util::config::Config;
+use crate::util::interning::InternedString;
 use crate::util::restricted_names::is_glob_pattern;
 use crate::util::{closest_msg, profile, CargoResult, StableHasher};

```
```diff
@@ -503,6 +504,12 @@ pub fn create_bcx<'a, 'cfg>(
         interner,
     )?;

+    // TODO: In theory, Cargo should also dedupe the roots, but I'm uncertain
+    // what heuristics to use in that case.
+    if build_config.mode == (CompileMode::Doc { deps: true }) {
+        remove_duplicate_doc(build_config, &mut unit_graph);
+    }
+
     if build_config
         .requested_kinds
         .iter()
```
```diff
@@ -1455,3 +1462,108 @@ fn opt_patterns_and_names(
     }
     Ok((opt_patterns, opt_names))
 }
+
+/// Removes duplicate CompileMode::Doc units that would cause problems with
+/// filename collisions.
+///
+/// Rustdoc only separates units by crate name in the file directory
+/// structure. If any two units with the same crate name exist, this would
+/// cause a filename collision, causing different rustdoc invocations to stomp
+/// on one another's files.
+///
+/// Unfortunately this does not remove all duplicates, as some of them are
+/// either user error, or difficult to remove. Cases that I can think of:
+///
+/// - Same target name in different packages. See the `collision_doc` test.
+/// - Different sources. See `collision_doc_sources` test.
+///
+/// Ideally this would not be necessary.
+fn remove_duplicate_doc(build_config: &BuildConfig, unit_graph: &mut UnitGraph) {
+    // NOTE: There is some risk that this can introduce problems because it
+    // may create orphans in the unit graph (parts of the tree get detached
+    // from the roots). I currently can't think of any ways this will cause a
+    // problem because all other parts of Cargo traverse the graph starting
+    // from the roots. Perhaps this should scan for detached units and remove
+    // them too?
+    //
+    // First, create a mapping of crate_name -> Unit so we can see where the
+    // duplicates are.
+    let mut all_docs: HashMap<String, Vec<Unit>> = HashMap::new();
+    for unit in unit_graph.keys() {
+        if unit.mode.is_doc() {
+            all_docs
+                .entry(unit.target.crate_name())
+                .or_default()
+                .push(unit.clone());
+        }
+    }
+    let mut remove = |units: Vec<Unit>, reason: &str| {
+        for unit in &units {
+            log::debug!(
+                "removing duplicate doc due to {} for package {} target `{}`",
+                reason,
+                unit.pkg,
+                unit.target.name()
+            );
+            unit_graph.remove(unit);
+        }
+        for unit_deps in unit_graph.values_mut() {
+            unit_deps.retain(|unit_dep| !units.iter().any(|unit| *unit == unit_dep.unit));
+        }
+    };
```

> **Review comment:** I'm slightly worried about the performance of this, since it could be O(n²) in the worst case. Would it be possible to maintain a set of units to remove and have them all processed together at the very end?
>
> **Reply:** Oops, yeah. Pushed a change to filter them once.
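The suggestion above (accumulate the doomed units, then filter the graph in a single pass) can be sketched with simplified types — plain strings for units and a `HashMap` for the graph, not Cargo's actual `Unit`/`UnitGraph`:

```rust
use std::collections::{HashMap, HashSet};

// Simplified sketch: instead of rescanning every dependency list each time a
// batch of units is removed, collect all doomed units into a HashSet and
// prune the graph once. Each edge is then checked with an O(1) set lookup.
fn remove_units_once(graph: &mut HashMap<String, Vec<String>>, to_remove: &HashSet<String>) {
    // Drop the removed units' own entries.
    graph.retain(|unit, _| !to_remove.contains(unit));
    // One pass over the surviving dependency edges.
    for deps in graph.values_mut() {
        deps.retain(|dep| !to_remove.contains(dep));
    }
}

fn main() {
    let mut graph: HashMap<String, Vec<String>> = HashMap::new();
    graph.insert(
        "root".to_string(),
        vec!["dup-host".to_string(), "dup-target".to_string()],
    );
    graph.insert("dup-host".to_string(), vec![]);
    graph.insert("dup-target".to_string(), vec![]);

    let doomed: HashSet<String> = vec!["dup-host".to_string()].into_iter().collect();
    remove_units_once(&mut graph, &doomed);

    assert!(!graph.contains_key("dup-host"));
    assert_eq!(graph["root"], vec!["dup-target".to_string()]);
    println!("remaining units: {:?}", graph.keys().collect::<Vec<_>>());
}
```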
```diff
+
+    // Iterate over the duplicates and try to remove them from unit_graph.
+    for (_crate_name, mut units) in all_docs {
+        if units.len() == 1 {
+            continue;
+        }
+        // Prefer target over host if --target was not specified.
+        if build_config
+            .requested_kinds
+            .iter()
+            .all(CompileKind::is_host)
+        {
+            let (to_remove, remaining_units): (Vec<Unit>, Vec<Unit>) =
+                units.into_iter().partition(|unit| unit.kind.is_host());
+            // Note these duplicates may not be real duplicates, since they
+            // might get merged in rebuild_unit_graph_shared. Either way, it
+            // shouldn't hurt to remove them early (although the report in the
+            // log might be confusing).
+            remove(to_remove, "host/target merger");
+            units = remaining_units;
+            if units.len() == 1 {
+                continue;
+            }
+        }
+        // Prefer newer versions over older.
+        let mut source_map: HashMap<(InternedString, SourceId, CompileKind), Vec<Unit>> =
+            HashMap::new();
```

> **Review comment:** Could the key here perhaps be …?
>
> **Reply:** I don't think so (per the other comment, there can be multiple entries with the same version).
```diff
+        for unit in units {
+            let pkg_id = unit.pkg.package_id();
+            // Note, this does not detect duplicates from different sources.
+            source_map
+                .entry((pkg_id.name(), pkg_id.source_id(), unit.kind))
+                .or_default()
+                .push(unit);
+        }
+        let mut remaining_units = Vec::new();
+        for (_key, mut units) in source_map {
+            if units.len() > 1 {
+                units.sort_by(|a, b| a.pkg.version().partial_cmp(b.pkg.version()).unwrap());
+                // Remove any entries with version < newest.
+                let newest_version = units.last().unwrap().pkg.version().clone();
+                let (to_remove, keep_units): (Vec<Unit>, Vec<Unit>) = units
+                    .into_iter()
+                    .partition(|unit| unit.pkg.version() < &newest_version);
```
> **Review comment:** Given a source, it's not possible to have multiple crates of the same name at the same version, right? Is it possible for …
>
> **Reply:** I think it should be very rare, but it is possible. One example is … I'm not sure how that should behave. Should it be an error? Should it ignore the binary? Should rustdoc put those in separate directories? There are a bunch of collision opportunities, and this isn't intended to cover all of them, just a few that seemed reasonable.
>
> **Reply:** Hm, nah, that's a good point. I don't know of a better thing to do here other than warn, really, at this time unfortunately.
```diff
+
+                remove(to_remove, "older version");
+                remaining_units.extend(keep_units);
+            } else {
+                remaining_units.extend(units);
+            }
+        }
+        if remaining_units.len() == 1 {
+            continue;
+        }
+        // Are there other heuristics to remove duplicates that would make
+        // sense? Maybe prefer path sources over all others?
+    }
+}
```
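The "prefer newer versions" heuristic above can be shown in isolation. A simplified sketch, assuming versions are `(major, minor, patch)` tuples (standing in for a semver `Version`) and units are bare name/version pairs rather than Cargo's `Unit`:

```rust
use std::collections::HashMap;

// Stand-in for a semver Version. Tuples of u64 implement Ord, so a total
// order is available directly (no `partial_cmp(..).unwrap()` needed here).
type Version = (u64, u64, u64);

#[derive(Debug, Clone, PartialEq)]
struct DocUnit {
    name: String,
    version: Version,
}

// Group units by crate name, then keep only the units whose version equals
// the newest version seen for that name (ties are kept, mirroring the
// partition in the real function).
fn keep_newest(units: Vec<DocUnit>) -> Vec<DocUnit> {
    let mut by_name: HashMap<String, Vec<DocUnit>> = HashMap::new();
    for u in units {
        by_name.entry(u.name.clone()).or_default().push(u);
    }
    let mut kept = Vec::new();
    for (_name, mut group) in by_name {
        group.sort_by(|a, b| a.version.cmp(&b.version));
        let newest = group.last().unwrap().version;
        group.retain(|u| u.version == newest);
        kept.extend(group);
    }
    kept
}

fn main() {
    let units = vec![
        DocUnit { name: "log".into(), version: (0, 4, 14) },
        DocUnit { name: "log".into(), version: (0, 3, 9) },
        DocUnit { name: "serde".into(), version: (1, 0, 123) },
    ];
    let kept = keep_newest(units);
    // The older `log 0.3.9` is dropped; `serde` is untouched.
    assert_eq!(kept.len(), 2);
    assert!(kept.iter().all(|u| u.version != (0, 3, 9)));
    println!("kept: {:?}", kept);
}
```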
> **Review comment:** FWIW, long-term I don't think that this function should exist. Ideally rustdoc would have a feature where we could specify the directory/lookup name, and we could pass that in to avoid all duplication issues.
>
> **Reply:** I agree. However, it is not clear to me how that should work. Because the directories are exposed as URLs, I think it would be unpleasant for those to have Cargo-style hashes in them. There could be some pattern like $NAME-$VERSION, and then extra disambiguators when necessary, like $NAME-$VERSION-bin-foo. But some collisions are harder (like when units only differ in their features). What do you think about having a hash in the URL? This feels like a bit of a mess, and I'm uncertain how to climb out of it.
>
> **Reply:** At least for `cargo doc` locally, I feel like the hash doesn't matter too much, since you'll just be clicking on links anyway (assuming we generate some well-known landing page, which we don't do today). For everything else, though (like doc.rust-lang.org or docs.rs), I wouldn't want Cargo-style hashes in URLs. This function is likely gonna be here for a long, long time, which is by no means the end of the world. I think it'll take some more thinking to figure out what it might look like to not have this function.
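One possible shape of the $NAME-$VERSION scheme with disambiguators, as floated above. This is purely hypothetical — neither Cargo nor rustdoc implements anything like it, and the function name is invented for illustration:

```rust
// Hypothetical sketch of a collision-free doc directory naming scheme:
// `$NAME-$VERSION`, with an extra suffix only when two units would
// otherwise still collide (e.g. a lib and a bin target in one package).
fn doc_dir_name(name: &str, version: &str, disambiguator: Option<&str>) -> String {
    match disambiguator {
        Some(d) => format!("{}-{}-{}", name, version, d),
        None => format!("{}-{}", name, version),
    }
}

fn main() {
    assert_eq!(doc_dir_name("foo", "1.2.3", None), "foo-1.2.3");
    assert_eq!(
        doc_dir_name("foo", "1.2.3", Some("bin-foo")),
        "foo-1.2.3-bin-foo"
    );
    println!("ok");
}
```

Note this still leaves the harder cases open (units differing only in enabled features would need either a hash or a feature-derived suffix in the URL, which is exactly the trade-off debated above).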