-
Notifications
You must be signed in to change notification settings - Fork 14.5k
[clang-tidy] Speed up misc-header-include-cycle
#148757
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
@llvm/pr-subscribers-clang-tools-extra Author: Victor Chernyakin (localspook) ChangesThe clangd folks have classified this check as Slow™, because enabling it carries a runtime overhead of more than 8% on their benchmark: ./build/bin/clangd --check=clang/lib/Sema/Sema.cpp --check-locations=0 --check-tidy-time=misc-header-include-cycle --compile-commands-dir=build (The benchmark is wrapped in a script, but that's what it boils down to.) Why the slowness? This line is to blame:
All our file comparisons are based on Now, I'm not an expert here, but I hope I'm not missing some reason that While we're at it, this PR adds a few other small efficiencies:
Full diff: https://github.com/llvm/llvm-project/pull/148757.diff 1 Files Affected:
diff --git a/clang-tools-extra/clang-tidy/misc/HeaderIncludeCycleCheck.cpp b/clang-tools-extra/clang-tidy/misc/HeaderIncludeCycleCheck.cpp
index 21d443a1f34ff..1bd51c5902cde 100644
--- a/clang-tools-extra/clang-tidy/misc/HeaderIncludeCycleCheck.cpp
+++ b/clang-tools-extra/clang-tidy/misc/HeaderIncludeCycleCheck.cpp
@@ -13,8 +13,8 @@
#include "clang/Lex/Preprocessor.h"
#include "llvm/Support/Regex.h"
#include <algorithm>
-#include <deque>
#include <optional>
+#include <vector>
using namespace clang::ast_matchers;
@@ -23,8 +23,8 @@ namespace clang::tidy::misc {
namespace {
struct Include {
- FileID Id;
- llvm::StringRef Name;
+ const FileEntry *File;
+ StringRef Name;
SourceLocation Loc;
};
@@ -50,31 +50,27 @@ class CyclicDependencyCallbacks : public PPCallbacks {
if (Reason != EnterFile && Reason != ExitFile)
return;
- FileID Id = SM.getFileID(Loc);
+ const FileID Id = SM.getFileID(Loc);
if (Id.isInvalid())
return;
+ const FileEntry *NewFile = SM.getFileEntryForID(Id);
+ const FileEntry *PrevFile = SM.getFileEntryForID(PrevFID);
+
if (Reason == ExitFile) {
- if ((Files.size() > 1U) && (Files.back().Id == PrevFID) &&
- (Files[Files.size() - 2U].Id == Id))
+ if ((Files.size() > 1U) && (Files.back().File == PrevFile) &&
+ (Files[Files.size() - 2U].File == NewFile))
Files.pop_back();
return;
}
- if (!Files.empty() && Files.back().Id == Id)
+ if (!Files.empty() && Files.back().File == NewFile)
return;
- std::optional<llvm::StringRef> FilePath = SM.getNonBuiltinFilenameForID(Id);
- llvm::StringRef FileName =
- FilePath ? llvm::sys::path::filename(*FilePath) : llvm::StringRef();
-
- if (!NextToEnter)
- NextToEnter = Include{Id, FileName, SourceLocation()};
-
- assert(NextToEnter->Name == FileName);
- NextToEnter->Id = Id;
- Files.emplace_back(*NextToEnter);
- NextToEnter.reset();
+ const std::optional<StringRef> FilePath = SM.getNonBuiltinFilenameForID(Id);
+ const StringRef FileName =
+ FilePath ? llvm::sys::path::filename(*FilePath) : StringRef();
+ Files.push_back({NewFile, FileName, std::exchange(NextToEnter, {})});
}
void InclusionDirective(SourceLocation, const Token &, StringRef FilePath,
@@ -85,36 +81,26 @@ class CyclicDependencyCallbacks : public PPCallbacks {
if (FileType != clang::SrcMgr::C_User)
return;
- llvm::StringRef FileName = llvm::sys::path::filename(FilePath);
- NextToEnter = {FileID(), FileName, Range.getBegin()};
+ NextToEnter = Range.getBegin();
if (!File)
return;
- FileID Id = SM.translateFile(*File);
- if (Id.isInvalid())
- return;
-
- checkForDoubleInclude(Id, FileName, Range.getBegin());
+ checkForDoubleInclude(&File->getFileEntry(),
+ llvm::sys::path::filename(FilePath),
+ Range.getBegin());
}
- void EndOfMainFile() override {
- if (!Files.empty() && Files.back().Id == SM.getMainFileID())
- Files.pop_back();
-
- assert(Files.empty());
- }
-
- void checkForDoubleInclude(FileID Id, llvm::StringRef FileName,
+ void checkForDoubleInclude(const FileEntry *File, StringRef FileName,
SourceLocation Loc) {
- auto It =
- std::find_if(Files.rbegin(), Files.rend(),
- [&](const Include &Entry) { return Entry.Id == Id; });
+ const auto It =
+ llvm::find_if(llvm::reverse(Files),
+ [&](const Include &Entry) { return Entry.File == File; });
if (It == Files.rend())
return;
- const std::optional<StringRef> FilePath = SM.getNonBuiltinFilenameForID(Id);
- if (!FilePath || isFileIgnored(*FilePath))
+ const StringRef FilePath = File->tryGetRealPathName();
+ if (FilePath.empty() || isFileIgnored(FilePath))
return;
if (It == Files.rbegin()) {
@@ -145,8 +131,8 @@ class CyclicDependencyCallbacks : public PPCallbacks {
}
private:
- std::deque<Include> Files;
- std::optional<Include> NextToEnter;
+ std::vector<Include> Files;
+ SourceLocation NextToEnter;
HeaderIncludeCycleCheck &Check;
const SourceManager &SM;
std::vector<llvm::Regex> IgnoredFilesRegexes;
|
@llvm/pr-subscribers-clang-tidy Author: Victor Chernyakin (localspook) ChangesThe clangd folks have classified this check as Slow™, because enabling it carries a runtime overhead of more than 8% on their benchmark: ./build/bin/clangd --check=clang/lib/Sema/Sema.cpp --check-locations=0 --check-tidy-time=misc-header-include-cycle --compile-commands-dir=build (The benchmark is wrapped in a script, but that's what it boils down to.) Why the slowness? This line is to blame:
All our file comparisons are based on Now, I'm not an expert here, but I hope I'm not missing some reason that While we're at it, this PR adds a few other small efficiencies:
Full diff: https://github.com/llvm/llvm-project/pull/148757.diff 1 Files Affected:
diff --git a/clang-tools-extra/clang-tidy/misc/HeaderIncludeCycleCheck.cpp b/clang-tools-extra/clang-tidy/misc/HeaderIncludeCycleCheck.cpp
index 21d443a1f34ff..1bd51c5902cde 100644
--- a/clang-tools-extra/clang-tidy/misc/HeaderIncludeCycleCheck.cpp
+++ b/clang-tools-extra/clang-tidy/misc/HeaderIncludeCycleCheck.cpp
@@ -13,8 +13,8 @@
#include "clang/Lex/Preprocessor.h"
#include "llvm/Support/Regex.h"
#include <algorithm>
-#include <deque>
#include <optional>
+#include <vector>
using namespace clang::ast_matchers;
@@ -23,8 +23,8 @@ namespace clang::tidy::misc {
namespace {
struct Include {
- FileID Id;
- llvm::StringRef Name;
+ const FileEntry *File;
+ StringRef Name;
SourceLocation Loc;
};
@@ -50,31 +50,27 @@ class CyclicDependencyCallbacks : public PPCallbacks {
if (Reason != EnterFile && Reason != ExitFile)
return;
- FileID Id = SM.getFileID(Loc);
+ const FileID Id = SM.getFileID(Loc);
if (Id.isInvalid())
return;
+ const FileEntry *NewFile = SM.getFileEntryForID(Id);
+ const FileEntry *PrevFile = SM.getFileEntryForID(PrevFID);
+
if (Reason == ExitFile) {
- if ((Files.size() > 1U) && (Files.back().Id == PrevFID) &&
- (Files[Files.size() - 2U].Id == Id))
+ if ((Files.size() > 1U) && (Files.back().File == PrevFile) &&
+ (Files[Files.size() - 2U].File == NewFile))
Files.pop_back();
return;
}
- if (!Files.empty() && Files.back().Id == Id)
+ if (!Files.empty() && Files.back().File == NewFile)
return;
- std::optional<llvm::StringRef> FilePath = SM.getNonBuiltinFilenameForID(Id);
- llvm::StringRef FileName =
- FilePath ? llvm::sys::path::filename(*FilePath) : llvm::StringRef();
-
- if (!NextToEnter)
- NextToEnter = Include{Id, FileName, SourceLocation()};
-
- assert(NextToEnter->Name == FileName);
- NextToEnter->Id = Id;
- Files.emplace_back(*NextToEnter);
- NextToEnter.reset();
+ const std::optional<StringRef> FilePath = SM.getNonBuiltinFilenameForID(Id);
+ const StringRef FileName =
+ FilePath ? llvm::sys::path::filename(*FilePath) : StringRef();
+ Files.push_back({NewFile, FileName, std::exchange(NextToEnter, {})});
}
void InclusionDirective(SourceLocation, const Token &, StringRef FilePath,
@@ -85,36 +81,26 @@ class CyclicDependencyCallbacks : public PPCallbacks {
if (FileType != clang::SrcMgr::C_User)
return;
- llvm::StringRef FileName = llvm::sys::path::filename(FilePath);
- NextToEnter = {FileID(), FileName, Range.getBegin()};
+ NextToEnter = Range.getBegin();
if (!File)
return;
- FileID Id = SM.translateFile(*File);
- if (Id.isInvalid())
- return;
-
- checkForDoubleInclude(Id, FileName, Range.getBegin());
+ checkForDoubleInclude(&File->getFileEntry(),
+ llvm::sys::path::filename(FilePath),
+ Range.getBegin());
}
- void EndOfMainFile() override {
- if (!Files.empty() && Files.back().Id == SM.getMainFileID())
- Files.pop_back();
-
- assert(Files.empty());
- }
-
- void checkForDoubleInclude(FileID Id, llvm::StringRef FileName,
+ void checkForDoubleInclude(const FileEntry *File, StringRef FileName,
SourceLocation Loc) {
- auto It =
- std::find_if(Files.rbegin(), Files.rend(),
- [&](const Include &Entry) { return Entry.Id == Id; });
+ const auto It =
+ llvm::find_if(llvm::reverse(Files),
+ [&](const Include &Entry) { return Entry.File == File; });
if (It == Files.rend())
return;
- const std::optional<StringRef> FilePath = SM.getNonBuiltinFilenameForID(Id);
- if (!FilePath || isFileIgnored(*FilePath))
+ const StringRef FilePath = File->tryGetRealPathName();
+ if (FilePath.empty() || isFileIgnored(FilePath))
return;
if (It == Files.rbegin()) {
@@ -145,8 +131,8 @@ class CyclicDependencyCallbacks : public PPCallbacks {
}
private:
- std::deque<Include> Files;
- std::optional<Include> NextToEnter;
+ std::vector<Include> Files;
+ SourceLocation NextToEnter;
HeaderIncludeCycleCheck &Check;
const SourceManager &SM;
std::vector<llvm::Regex> IgnoredFilesRegexes;
|
Could you use |
It may be worth to mention improvements in Release Notes. |
It doesn't seem like preprocessor-based checks appear in the profile. You can see this by running, say, Since I can't get a precise profile, I ran a crude benchmark with Update: a bigger benchmark (still |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
First, thank you for a improvement, I were aware about this flaw, but didn't wan't to spend time on something that worked, and there were lot of issues (corner cases) with this check when implementing it, thats why assert in EndOfMainFile existed.
Now, I'm not an expert here, but I hope I'm not missing some reason that FileEntry * is unacceptable, and FileID is the only solution? (The tests pass with both.)
There were no reason for that, just that API were used.
Overall looks fine for me, but need to be tested, if Files is empty on when test against llvm code base, then should be fine.
Release notes entry could be added about performance improvement to a check.
Issue with lack of profiling for predecessor callbacks is known but without adding decorator it could be hard to implement this.
@@ -145,8 +131,8 @@ class CyclicDependencyCallbacks : public PPCallbacks { | |||
} | |||
|
|||
private: | |||
std::deque<Include> Files; | |||
std::optional<Include> NextToEnter; | |||
std::vector<Include> Files; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Deque were used because there are situation when you can have 200 includes and to avoid re-allocation.
You could consider using llvm::SmallVector with some pre-allocation, to speed things up even more.
The clangd folks have classified this check as Slow™, because enabling it carries a runtime overhead of more than 8% on their benchmark:
(The benchmark is wrapped in a script, but that's what it boils down to.)
I can reproduce that result locally (I see 11%). This PR brings it down to 3%.
Why the slowness? This line is to blame:
llvm-project/clang-tools-extra/clang-tidy/misc/HeaderIncludeCycleCheck.cpp
Line 94 in 2c67718
All our file comparisons are based on
FileID
s. That means we need aSourceLocation
→FileID
conversion inFileChanged
and aFileEntryRef
→FileID
conversion inInclusionDirective
, and that second conversion is slow. This PR changes our comparisons to be based onFileEntry *
; bothSourceLocation
→FileEntry *
andFileEntryRef
→FileEntry *
conversions are cheap.Now, I'm not an expert here, but I hope I'm not missing some reason that
FileEntry *
is unacceptable, andFileID
is the only solution? (The tests pass with both.)While we're at it, this PR adds a few other small efficiencies:
Files
is now astd::vector
; there's no reason for it to be astd::deque
.EndOfMainFile
override is deleted; by the time it's called, the check has nothing left to do.NextToEnter
is simplified a bit.