Fix some issues with cargo doc and the new feature resolver. #9077

Merged · 3 commits · Jan 20, 2021
Changes from 2 commits
16 changes: 10 additions & 6 deletions src/cargo/core/compiler/unit_dependencies.rs
Original file line number Diff line number Diff line change
Expand Up @@ -225,7 +225,7 @@ fn compute_deps(
return compute_deps_custom_build(unit, unit_for, state);
} else if unit.mode.is_doc() {
// Note: this does not include doc test.
return compute_deps_doc(unit, state);
return compute_deps_doc(unit, state, unit_for);
}

let id = unit.pkg.package_id();
Expand Down Expand Up @@ -395,9 +395,13 @@ fn compute_deps_custom_build(
}

/// Returns the dependencies necessary to document a package.
fn compute_deps_doc(unit: &Unit, state: &mut State<'_, '_>) -> CargoResult<Vec<UnitDep>> {
fn compute_deps_doc(
unit: &Unit,
state: &mut State<'_, '_>,
unit_for: UnitFor,
) -> CargoResult<Vec<UnitDep>> {
let deps = state
.deps(unit, UnitFor::new_normal())
.deps(unit, unit_for)
.into_iter()
.filter(|&(_id, deps)| deps.iter().any(|dep| dep.kind() == DepKind::Normal));

Expand All @@ -414,7 +418,7 @@ fn compute_deps_doc(unit: &Unit, state: &mut State<'_, '_>) -> CargoResult<Vec<U
// Rustdoc only needs rmeta files for regular dependencies.
// However, for plugins/proc macros, deps should be built like normal.
let mode = check_or_build_mode(unit.mode, lib);
let dep_unit_for = UnitFor::new_normal().with_dependency(unit, lib);
let dep_unit_for = unit_for.with_dependency(unit, lib);
let lib_unit_dep = new_unit_dep(
state,
unit,
Expand All @@ -441,11 +445,11 @@ fn compute_deps_doc(unit: &Unit, state: &mut State<'_, '_>) -> CargoResult<Vec<U
}

// Be sure to build/run the build script for documented libraries.
ret.extend(dep_build_script(unit, UnitFor::new_normal(), state)?);
ret.extend(dep_build_script(unit, unit_for, state)?);

// If we document a binary/example, we need the library available.
if unit.target.is_bin() || unit.target.is_example() {
ret.extend(maybe_lib(unit, state, UnitFor::new_normal())?);
ret.extend(maybe_lib(unit, state, unit_for)?);
}
Ok(ret)
}
Expand Down
114 changes: 113 additions & 1 deletion src/cargo/ops/cargo_compile.rs
Original file line number Diff line number Diff line change
Expand Up @@ -36,10 +36,11 @@ use crate::core::profiles::{Profiles, UnitFor};
use crate::core::resolver::features::{self, FeaturesFor, RequestedFeatures};
use crate::core::resolver::{HasDevUnits, Resolve, ResolveOpts};
use crate::core::{FeatureValue, Package, PackageSet, Shell, Summary, Target};
use crate::core::{PackageId, PackageIdSpec, TargetKind, Workspace};
use crate::core::{PackageId, PackageIdSpec, SourceId, TargetKind, Workspace};
use crate::ops;
use crate::ops::resolve::WorkspaceResolve;
use crate::util::config::Config;
use crate::util::interning::InternedString;
use crate::util::restricted_names::is_glob_pattern;
use crate::util::{closest_msg, profile, CargoResult, StableHasher};

Expand Down Expand Up @@ -503,6 +504,12 @@ pub fn create_bcx<'a, 'cfg>(
interner,
)?;

// TODO: In theory, Cargo should also dedupe the roots, but I'm uncertain
// what heuristics to use in that case.
if build_config.mode == (CompileMode::Doc { deps: true }) {
remove_duplicate_doc(build_config, &mut unit_graph);
}

if build_config
.requested_kinds
.iter()
Expand Down Expand Up @@ -1455,3 +1462,108 @@ fn opt_patterns_and_names(
}
Ok((opt_patterns, opt_names))
}

/// Removes duplicate CompileMode::Doc units that would cause problems with
/// filename collisions.
///
/// Rustdoc only separates units by crate name in the file directory
/// structure. If any two units with the same crate name exist, this would
/// cause a filename collision, causing different rustdoc invocations to stomp
/// on one another's files.
///
/// Unfortunately this does not remove all duplicates, as some of them are
/// either user error, or difficult to remove. Cases that I can think of:
///
/// - Same target name in different packages. See the `collision_doc` test.
/// - Different sources. See `collision_doc_sources` test.
///
/// Ideally this would not be necessary.
fn remove_duplicate_doc(build_config: &BuildConfig, unit_graph: &mut UnitGraph) {
Member: FWIW, long-term I don't think this function should exist. Ideally rustdoc would have a feature where we could specify the directory/lookup name and pass that in to avoid all duplication issues.

Contributor Author: I agree. However, it is not clear to me how that should work. Because the directories are exposed as URLs, I think it would be unpleasant for those to have Cargo-style hashes in them. There could be some pattern like $NAME-$VERSION, with extra disambiguators added when necessary, like $NAME-$VERSION-bin-foo. But some collisions are harder to handle (like units that differ only in their features).

What do you think about having a hash in the URL?

This feels like a bit of a mess, and I'm uncertain how to climb out of it.

Member: At least for cargo doc locally, I feel like the hash doesn't matter too much, since you'll just be clicking on links anyway (assuming we generate some well-known landing page, which we don't do today). For everything else, though (like doc.rust-lang.org or docs.rs), I wouldn't want Cargo-style hashes in URLs.

This function is likely going to be here for a long, long time, which is by no means the end of the world. I think it'll take some more thinking to figure out what it might look like to not have this function.
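As a purely illustrative aside, the $NAME-$VERSION idea sketched above might look something like the following; doc_dir_name is a hypothetical helper that exists in neither Cargo nor rustdoc, and the disambiguator shape is an assumption:

fn doc_dir_name(name: &str, version: &str, disambiguator: Option<&str>) -> String {
    match disambiguator {
        // e.g. "foo-1.2.0-bin-foo" when a binary collides with a library
        Some(extra) => format!("{}-{}-{}", name, version, extra),
        // e.g. "foo-1.2.0" in the common, unambiguous case
        None => format!("{}-{}", name, version),
    }
}

As noted above, units that differ only in their enabled features would still collide under this scheme, which is the harder case.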

    // NOTE: There is some risk that this can introduce problems because it
    // may create orphans in the unit graph (parts of the tree get detached
    // from the roots). I currently can't think of any ways this will cause a
    // problem because all other parts of Cargo traverse the graph starting
    // from the roots. Perhaps this should scan for detached units and remove
    // them too?
    //
    // First, create a mapping of crate_name -> Unit so we can see where the
    // duplicates are.
    let mut all_docs: HashMap<String, Vec<Unit>> = HashMap::new();
    for unit in unit_graph.keys() {
        if unit.mode.is_doc() {
            all_docs
                .entry(unit.target.crate_name())
                .or_default()
                .push(unit.clone());
        }
    }
    let mut remove = |units: Vec<Unit>, reason: &str| {
        for unit in &units {
            log::debug!(
                "removing duplicate doc due to {} for package {} target `{}`",
                reason,
                unit.pkg,
                unit.target.name()
            );
            unit_graph.remove(unit);
        }
        for unit_deps in unit_graph.values_mut() {
            unit_deps.retain(|unit_dep| !units.iter().any(|unit| *unit == unit_dep.unit));
        }
    };
Member: I'm slightly worried about the performance of this, since in the worst case it could be O(n^2). Would it be possible to maintain a set of units to remove, so they're all processed together at the very end?

Contributor Author: Oops, yeah. Pushed a change to filter them once.
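A minimal sketch of that batched approach, assuming the UnitGraph alias (a HashMap<Unit, Vec<UnitDep>>) used in this file and that all doomed units are first collected into a HashSet; the actual follow-up commit may differ in its details:

use std::collections::HashSet;

// Delete every unit in `to_remove` with one pass over the graph, instead
// of rescanning all edges once per removed unit.
fn remove_units_once(unit_graph: &mut UnitGraph, to_remove: &HashSet<Unit>) {
    // Drop the removed units' own entries.
    unit_graph.retain(|unit, _| !to_remove.contains(unit));
    // Filter the remaining edges in a single pass.
    for unit_deps in unit_graph.values_mut() {
        unit_deps.retain(|unit_dep| !to_remove.contains(&unit_dep.unit));
    }
}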

    // Iterate over the duplicates and try to remove them from unit_graph.
    for (_crate_name, mut units) in all_docs {
        if units.len() == 1 {
            continue;
        }
        // Prefer target over host if --target was not specified.
        if build_config
            .requested_kinds
            .iter()
            .all(CompileKind::is_host)
        {
            let (to_remove, remaining_units): (Vec<Unit>, Vec<Unit>) =
                units.into_iter().partition(|unit| unit.kind.is_host());
            // Note these duplicates may not be real duplicates, since they
            // might get merged in rebuild_unit_graph_shared. Either way, it
            // shouldn't hurt to remove them early (although the report in the
            // log might be confusing).
            remove(to_remove, "host/target merger");
            units = remaining_units;
            if units.len() == 1 {
                continue;
            }
        }
        // Prefer newer versions over older.
        let mut source_map: HashMap<(InternedString, SourceId, CompileKind), Vec<Unit>> =
            HashMap::new();

Member: Could the key here perhaps be BTreeMap<Version, Unit>? (That would avoid the extra sort later.)

Contributor Author: I don't think so; per the other comment, there can be multiple entries with the same version.

        for unit in units {
            let pkg_id = unit.pkg.package_id();
            // Note, this does not detect duplicates from different sources.
            source_map
                .entry((pkg_id.name(), pkg_id.source_id(), unit.kind))
                .or_default()
                .push(unit);
        }
        let mut remaining_units = Vec::new();
        for (_key, mut units) in source_map {
            if units.len() > 1 {
                units.sort_by(|a, b| a.pkg.version().partial_cmp(b.pkg.version()).unwrap());
                // Remove any entries with version < newest.
                let newest_version = units.last().unwrap().pkg.version().clone();
                let (to_remove, keep_units): (Vec<Unit>, Vec<Unit>) = units
                    .into_iter()
                    .partition(|unit| unit.pkg.version() < &newest_version);
Member: Given a source, it's not possible to have multiple crates of the same name at the same version, right? Is it possible for keep_units to have a length greater than 1? (I'm probably missing something...)

Contributor Author: I think it should be very rare, but it is possible.

One example is cargo doc --bins --lib. This will try to document both the library and the binary, which have the same crate name and thus end up colliding.

I'm not sure how that should behave. Should it be an error? Should it ignore the binary? Should rustdoc put those in separate directories?

There are a bunch of collision opportunities, and this isn't intended to cover all of them, just a few that seemed reasonable.

Member: Hm, nah, that's a good point. I don't know of a better thing to do here at this time other than warn, unfortunately.
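A rough sketch of that cargo doc --bins --lib scenario, written in the style of Cargo's testsuite (the real coverage lives in tests such as collision_doc; this particular test name, body, and expected behavior are illustrative only):

use cargo_test_support::project;

#[cargo_test]
fn doc_lib_bin_collision() {
    // project() defaults to a package named `foo`, so the library and the
    // binary below share the crate name `foo`, and both want to be
    // documented into target/doc/foo/.
    let p = project()
        .file("src/lib.rs", "pub fn lib() {}")
        .file("src/main.rs", "fn main() {}")
        .build();
    // Documenting both targets at once triggers the collision case that
    // keep_units is left to carry through.
    p.cargo("doc --bins --lib").run();
}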

remove(to_remove, "older version");
remaining_units.extend(keep_units);
} else {
remaining_units.extend(units);
}
}
if remaining_units.len() == 1 {
continue;
}
// Are there other heuristics to remove duplicates that would make
// sense? Maybe prefer path sources over all others?
}
}
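On the orphan question raised in the NOTE at the top of this function, one possible shape for a "scan for detached units" pass, assuming the root units are available at the call site (a sketch, not part of this PR):

use std::collections::HashSet;

// Keep only units reachable from the roots, dropping anything that the
// duplicate removal detached from the graph.
fn prune_detached(unit_graph: &mut UnitGraph, roots: &[Unit]) {
    let mut reachable: HashSet<Unit> = HashSet::new();
    let mut queue: Vec<Unit> = roots.to_vec();
    while let Some(unit) = queue.pop() {
        if reachable.insert(unit.clone()) {
            if let Some(deps) = unit_graph.get(&unit) {
                queue.extend(deps.iter().map(|dep| dep.unit.clone()));
            }
        }
    }
    unit_graph.retain(|unit, _| reachable.contains(unit));
}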