Skip to content

[lld][MachO]Multi-threaded i/o. Twice as fast linking a large project. #147134

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 21 commits into
base: main
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions lld/MachO/Config.h
Original file line number Diff line number Diff line change
Expand Up @@ -186,6 +186,7 @@ struct Configuration {
bool interposable = false;
bool errorForArchMismatch = false;
bool ignoreAutoLink = false;
int readThreads = 0;
// ld64 allows invalid auto link options as long as the link succeeds. LLD
// does not, but there are cases in the wild where the invalid linker options
// exist. This allows users to ignore the specific invalid options in the case
Expand Down
167 changes: 157 additions & 10 deletions lld/MachO/Driver.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -31,6 +31,7 @@
#include "lld/Common/Reproduce.h"
#include "lld/Common/Version.h"
#include "llvm/ADT/DenseSet.h"
#include "llvm/ADT/ScopeExit.h"
#include "llvm/ADT/StringExtras.h"
#include "llvm/ADT/StringRef.h"
#include "llvm/BinaryFormat/MachO.h"
Expand All @@ -41,11 +42,14 @@
#include "llvm/Object/Archive.h"
#include "llvm/Option/ArgList.h"
#include "llvm/Support/CommandLine.h"
#include "llvm/Support/Debug.h"
#include "llvm/Support/FileSystem.h"
#include "llvm/Support/Parallel.h"
#include "llvm/Support/Path.h"
#include "llvm/Support/Process.h"
#include "llvm/Support/TarWriter.h"
#include "llvm/Support/TargetSelect.h"
#include "llvm/Support/Threading.h"
#include "llvm/Support/TimeProfiler.h"
#include "llvm/TargetParser/Host.h"
#include "llvm/TextAPI/Architecture.h"
Expand Down Expand Up @@ -282,11 +286,100 @@ static void saveThinArchiveToRepro(ArchiveFile const *file) {
": Archive::children failed: " + toString(std::move(e)));
}

static InputFile *addFile(StringRef path, LoadType loadType,
bool isLazy = false, bool isExplicit = true,
bool isBundleLoader = false,
bool isForceHidden = false) {
std::optional<MemoryBufferRef> buffer = readFile(path);
class DeferredFile {
public:
DeferredFile(StringRef path, bool isLazy, MemoryBufferRef buffer)
: path(path), isLazy(isLazy), buffer(buffer) {}
StringRef path;
bool isLazy;
MemoryBufferRef buffer;
};
using DeferredFiles = std::vector<DeferredFile>;

// Most input files have been mapped but not yet paged in.
// This code forces the page-ins on multiple threads so
// the process is not stalled waiting on disk buffer i/o.
void multiThreadedPageInBackground(DeferredFiles &deferred) {
static const size_t pageSize = Process::getPageSizeEstimate();
#if 0
ThreadPoolStrategy oldStrategy = llvm::parallel::strategy;
(void)llvm::make_scope_exit([&]() { llvm::parallel::strategy = oldStrategy; });
llvm::parallel::strategy = llvm::hardware_concurrency(config->readThreads);

size_t totalBytes = parallelTransformReduce(deferred, 0,
[](size_t acc, size_t size) { return acc + size; },
[&](DeferredFile &file) {
const StringRef &buffer = file.buffer.getBuffer();
for (const char *page = buffer.data(), *end = page + buffer.size();
page < end; page += pageSize)
LLVM_ATTRIBUTE_UNUSED volatile char t = *page;
return buffer.size();
}
);
#else
static size_t totalBytes = 0;
std::atomic_int index = 0;

parallelFor(0, config->readThreads, [&](size_t I) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because of how you are using parallelFor, config->readThreads is only the maximum number of threads that might be spawn for this, not the exact number of threads, which can be lower, governed by the -threads parameter. parallelFor uses llvm::parallel::strategy internally to decide the actual number of threads, which is setup globally when the driver finds the -threads argument.

Using parallelFor as its authors intended will also avoid the need of the index variable and keeping track of it. I provided a snippet before of how I would switch the strategy to one that fits your idea of having a different number of threads for reading files. It should be in some old comment.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See my other comment. I'd missed yours. I'm only following the benchmarks as this is the thrust of this PR.

while (true) {
int localIndex = index.fetch_add(1);
if (localIndex >= (int)deferred.size())
break;
const StringRef &buff = deferred[localIndex].buffer.getBuffer();
totalBytes += buff.size();

// Reference all file's mmap'd pages to load them into memory.
for (const char *page = buff.data(), *end = page + buff.size();
page < end; page += pageSize)
LLVM_ATTRIBUTE_UNUSED volatile char t = *page;
}
});
#endif
if (getenv("LLD_MULTI_THREAD_PAGE"))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we guard this with #ifndef NDEBUG?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I reverted this as it was generating a warning: totalBytes set but not used.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If totalBytes is only used in the debug builds, its code can be surrounded in the LLVM_DEBUG() macro to avoid those unused warnings.

#ifndef NDEBUG
static size_t totalBytes = 0;
#endif
...
LLVM_DEBUG(totalBytes += buff.size());
...
LLVM_DEBUG(
if (getenv("LLD_MULTI_THREAD_PAGE"))
  llvm::dbgs() << "multiThreadedPageIn " << totalBytes << "/"
                 << deferred.size() << "\n"
);

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm using a RelWithDebugInfo build so I'd like to leave those messages in for now. I personally think they could stay in as they do not involve significant computation, are unlocked by an improbable environment variable and could be useful in a release build but once we've tuned and decided on the inner loop we can decide on this.

llvm::dbgs() << "multiThreadedPageIn " << totalBytes << "/"
<< deferred.size() << "\n";
}

static void multiThreadedPageIn(const DeferredFiles &deferred) {
static std::deque<std::unique_ptr<DeferredFiles>> queue;
static std::thread *running;
static std::mutex mutex;

mutex.lock();
if (running && (queue.empty() || deferred.empty())) {
mutex.unlock();
running->join();
mutex.lock();
delete running;
running = nullptr;
}

if (!deferred.empty()) {
queue.emplace_back(std::make_unique<DeferredFiles>(deferred));
if (!running)
running = new std::thread([&]() {
while (true) {
mutex.lock();
if (queue.empty()) {
mutex.unlock();
break;
}
DeferredFiles deferred(*queue.front());
queue.pop_front();
mutex.unlock();
multiThreadedPageInBackground(deferred);
}
});
}
mutex.unlock();
}

static InputFile *processFile(std::optional<MemoryBufferRef> buffer,
DeferredFiles *archiveContents, StringRef path,
LoadType loadType, bool isLazy = false,
bool isExplicit = true,
bool isBundleLoader = false,
bool isForceHidden = false) {
if (!buffer)
return nullptr;
MemoryBufferRef mbref = *buffer;
Expand Down Expand Up @@ -379,6 +472,8 @@ static InputFile *addFile(StringRef path, LoadType loadType,
continue;
}

if (archiveContents)
archiveContents->emplace_back(path, isLazy, *mb);
if (!hasObjCSection(*mb))
continue;
if (Error e = file->fetch(c, "-ObjC"))
Expand All @@ -390,7 +485,8 @@ static InputFile *addFile(StringRef path, LoadType loadType,
": Archive::children failed: " + toString(std::move(e)));
}
}
file->addLazySymbols();
if (!archiveContents || archiveContents->empty())
file->addLazySymbols();
loadedArchives[path] = ArchiveFileInfo{file, isCommandLineLoad};
newFile = file;
break;
Expand Down Expand Up @@ -441,6 +537,24 @@ static InputFile *addFile(StringRef path, LoadType loadType,
return newFile;
}

static InputFile *addFile(StringRef path, LoadType loadType,
bool isLazy = false, bool isExplicit = true,
bool isBundleLoader = false,
bool isForceHidden = false) {
return processFile(readFile(path), nullptr, path, loadType, isLazy,
isExplicit, isBundleLoader, isForceHidden);
}

static void deferFile(StringRef path, bool isLazy, DeferredFiles &deferred) {
std::optional<MemoryBufferRef> buffer = readFile(path);
if (!buffer)
return;
if (config->readThreads)
deferred.emplace_back(path, isLazy, *buffer);
else
processFile(buffer, nullptr, path, LoadType::CommandLine, isLazy);
}

static std::vector<StringRef> missingAutolinkWarnings;
static void addLibrary(StringRef name, bool isNeeded, bool isWeak,
bool isReexport, bool isHidden, bool isExplicit,
Expand Down Expand Up @@ -564,13 +678,14 @@ void macho::resolveLCLinkerOptions() {
}
}

static void addFileList(StringRef path, bool isLazy) {
static void addFileList(StringRef path, bool isLazy,
DeferredFiles &deferredFiles) {
std::optional<MemoryBufferRef> buffer = readFile(path);
if (!buffer)
return;
MemoryBufferRef mbref = *buffer;
for (StringRef path : args::getLines(mbref))
addFile(rerootPath(path), LoadType::CommandLine, isLazy);
deferFile(rerootPath(path), isLazy, deferredFiles);
}

// We expect sub-library names of the form "libfoo", which will match a dylib
Expand Down Expand Up @@ -1222,14 +1337,16 @@ static void createFiles(const InputArgList &args) {
bool isLazy = false;
// If we've processed an opening --start-lib, without a matching --end-lib
bool inLib = false;
DeferredFiles deferredFiles;

for (const Arg *arg : args) {
const Option &opt = arg->getOption();
warnIfDeprecatedOption(opt);
warnIfUnimplementedOption(opt);

switch (opt.getID()) {
case OPT_INPUT:
addFile(rerootPath(arg->getValue()), LoadType::CommandLine, isLazy);
deferFile(rerootPath(arg->getValue()), isLazy, deferredFiles);
break;
case OPT_needed_library:
if (auto *dylibFile = dyn_cast_or_null<DylibFile>(
Expand All @@ -1249,7 +1366,7 @@ static void createFiles(const InputArgList &args) {
dylibFile->forceWeakImport = true;
break;
case OPT_filelist:
addFileList(arg->getValue(), isLazy);
addFileList(arg->getValue(), isLazy, deferredFiles);
break;
case OPT_force_load:
addFile(rerootPath(arg->getValue()), LoadType::CommandLineForce);
Expand Down Expand Up @@ -1295,6 +1412,28 @@ static void createFiles(const InputArgList &args) {
break;
}
}

if (config->readThreads) {
multiThreadedPageIn(deferredFiles);

DeferredFiles archiveContents;
std::vector<ArchiveFile *> archives;
for (auto &file : deferredFiles) {
auto inputFile = processFile(file.buffer, &archiveContents, file.path,
LoadType::CommandLine, file.isLazy);
if (ArchiveFile *archive = dyn_cast<ArchiveFile>(inputFile))
archives.push_back(archive);
}

if (!archiveContents.empty()) {
multiThreadedPageIn(archiveContents);
for (auto *archive : archives)
archive->addLazySymbols();
}

DeferredFiles reapThreads;
multiThreadedPageIn(reapThreads);
}
}

static void gatherInputSections() {
Expand Down Expand Up @@ -1687,6 +1826,14 @@ bool link(ArrayRef<const char *> argsArr, llvm::raw_ostream &stdoutOS,
}
}

if (auto *arg = args.getLastArg(OPT_read_threads)) {
StringRef v(arg->getValue());
unsigned threads = 0;
if (!llvm::to_integer(v, threads, 0) || threads < 0)
error(arg->getSpelling() + ": expected a positive integer, but got '" +
arg->getValue() + "'");
config->readThreads = threads;
}
if (auto *arg = args.getLastArg(OPT_threads_eq)) {
StringRef v(arg->getValue());
unsigned threads = 0;
Expand Down
3 changes: 3 additions & 0 deletions lld/MachO/Options.td
Original file line number Diff line number Diff line change
Expand Up @@ -396,6 +396,9 @@ def dead_strip : Flag<["-"], "dead_strip">,
def interposable : Flag<["-"], "interposable">,
HelpText<"Indirects access to all exported symbols in an image">,
Group<grp_opts>;
def read_threads : Joined<["--"], "read-threads=">,
HelpText<"Number of threads to use to proactively page in files for faster disk i/o. 20 would be a typical value, use 0 to disable this feature.">,
Group<grp_lld>;
def order_file : Separate<["-"], "order_file">,
MetaVarName<"<file>">,
HelpText<"Layout functions and data according to specification in <file>">,
Expand Down