[lld][MachO]Multi-threaded i/o. Twice as fast linking a large project. #147134

johnno1962 · 2025-07-05T08:48:36Z

This PR adds a new lld option, --read-threads=20, that defers all disk I/O and then performs it on multiple threads, so the process is never stalled waiting for mapped input files to be paged in. This saves elapsed time. For a large link (iterating on Chromium), these are the baseline link times, in seconds, for saving a single file and rebuilding inside Xcode:

26.01, 25.84, 26.15, 26.03, 27.10, 25.90, 25.86, 25.81, 25.80, 25.87

With the proposed code change, and using the --read-threads=20 option, the linking times reduce to the following:

21.13, 20.35, 20.01, 20.01, 20.30, 20.39, 19.97, 20.23, 20.17, 20.23

The secret sauce is in the new function multiThreadedPageIn() in Driver.cpp. Without the option lld behaves as before.

Edit: with subsequent commits I've taken this novel i/o approach to its full potential. Latest linking times are now:

13.2, 11.9, 12.12, 12.01, 11.99, 13.11, 11.93, 11.95, 12.18, 11.97
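
As a quick check on the numbers above, the following sketch averages the three sets of timings and divides the means. The helper names (`mean`, `speedup`) and the constant names are made up for illustration; only the timings come from this description. By this arithmetic the first version is roughly 1.28x faster than the baseline and the latest roughly 2.13x.

```cpp
#include <cassert>
#include <numeric>
#include <vector>

// Link times in seconds, copied from the PR description.
const std::vector<double> kBaseline = {26.01, 25.84, 26.15, 26.03, 27.10,
                                       25.90, 25.86, 25.81, 25.80, 25.87};
const std::vector<double> kFirstVersion = {21.13, 20.35, 20.01, 20.01, 20.30,
                                           20.39, 19.97, 20.23, 20.17, 20.23};
const std::vector<double> kLatest = {13.2,  11.9,  12.12, 12.01, 11.99,
                                     13.11, 11.93, 11.95, 12.18, 11.97};

// Arithmetic mean of a non-empty list of samples.
double mean(const std::vector<double> &xs) {
  return std::accumulate(xs.begin(), xs.end(), 0.0) / xs.size();
}

// Ratio of mean times: values above 1 mean `after` is faster than `before`.
double speedup(const std::vector<double> &before,
               const std::vector<double> &after) {
  return mean(before) / mean(after);
}
```

The means work out to about 26.04, 20.28 and 12.24 seconds respectively.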

Chrome still links and runs, so nothing appears to be broken. Although the page-in is multi-threaded, all memory access is read-only and the original code paths are unchanged. The system is simply asked to page in files proactively, rather than waiting for processing to hit page faults that would otherwise stall the process.
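
The idea can be sketched outside lld with standard C++ threads. This is not the code in the PR: `pageInAll` is a hypothetical name, the buffers are plain strings rather than mapped files, and the 4096-byte page size is an assumption (lld asks llvm::sys::Process for an estimate). Each worker claims the next buffer from a shared atomic index and reads one byte per page, which is what forces the operating system to bring a mapped page into memory.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <string>
#include <thread>
#include <vector>

// Touch one byte per page of every buffer using `nthreads` workers.
// Returns the number of pages touched. `pageSize` is an assumed default.
std::size_t pageInAll(const std::vector<std::string> &buffers,
                      unsigned nthreads, std::size_t pageSize = 4096) {
  std::atomic<std::size_t> next{0};    // index of the next unclaimed buffer
  std::atomic<std::size_t> touched{0}; // pages read across all workers

  auto worker = [&] {
    for (;;) {
      std::size_t i = next.fetch_add(1);
      if (i >= buffers.size())
        return;
      const std::string &buf = buffers[i];
      std::size_t pages = 0;
      for (std::size_t off = 0; off < buf.size(); off += pageSize) {
        // A volatile read cannot be optimised away, so the page is
        // really accessed. The buffer itself is never written.
        const volatile char *p = buf.data() + off;
        unsigned char byte = *p;
        (void)byte;
        ++pages;
      }
      touched.fetch_add(pages, std::memory_order_relaxed);
    }
  };

  if (nthreads == 0)
    nthreads = 1;
  std::vector<std::thread> pool;
  for (unsigned t = 0; t < nthreads; ++t)
    pool.emplace_back(worker);
  for (std::thread &th : pool)
    th.join();
  return touched.load();
}
```

Because the workers only read, the later single-threaded processing sees exactly the same bytes; the only intended effect is that those reads no longer block on disk.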

llvmbot · 2025-07-05T08:49:09Z

@llvm/pr-subscribers-lld-macho

@llvm/pr-subscribers-lld

Author: John Holdsworth (johnno1962)

Changes

This PR adds a new lld option, --read-threads=20, that defers all disk I/O and then performs it on multiple threads, so the process is never stalled waiting for mapped files to be paged in, which saves elapsed time. For a large link (iterating on the Chromium project), these are the baseline link times, in seconds, for saving a single file and rebuilding:

26.01, 25.84, 26.15, 26.03, 27.10, 25.90, 25.86, 25.81, 25.80, 25.87

With the proposed code change, and using the --read-threads=20 option, the linking times reduce to the following:

21.13, 20.35, 20.01, 20.01, 20.30, 20.39, 19.97, 20.23, 20.17, 20.23

The secret sauce is in the new function multiThreadedPageIn() in Driver.cpp. Without the option set lld behaves as before.

Full diff: https://github.com/llvm/llvm-project/pull/147134.diff

3 Files Affected:

(modified) lld/MachO/Config.h (+1)
(modified) lld/MachO/Driver.cpp (+94-10)
(modified) lld/MachO/Options.td (+3)

diff --git a/lld/MachO/Config.h b/lld/MachO/Config.h
index a01e60efbe761..92c6eb85f4123 100644
--- a/lld/MachO/Config.h
+++ b/lld/MachO/Config.h
@@ -186,6 +186,7 @@ struct Configuration {
   bool interposable = false;
   bool errorForArchMismatch = false;
   bool ignoreAutoLink = false;
+  int readThreads = 0;
   // ld64 allows invalid auto link options as long as the link succeeds. LLD
   // does not, but there are cases in the wild where the invalid linker options
   // exist. This allows users to ignore the specific invalid options in the case
diff --git a/lld/MachO/Driver.cpp b/lld/MachO/Driver.cpp
index 9eb391c4ee1b9..a244f2781c22c 100644
--- a/lld/MachO/Driver.cpp
+++ b/lld/MachO/Driver.cpp
@@ -47,6 +47,7 @@
 #include "llvm/Support/TarWriter.h"
 #include "llvm/Support/TargetSelect.h"
 #include "llvm/Support/TimeProfiler.h"
+#include "llvm/Support/Process.h"
 #include "llvm/TargetParser/Host.h"
 #include "llvm/TextAPI/Architecture.h"
 #include "llvm/TextAPI/PackedVersion.h"
@@ -282,11 +283,11 @@ static void saveThinArchiveToRepro(ArchiveFile const *file) {
           ": Archive::children failed: " + toString(std::move(e)));
 }
 
-static InputFile *addFile(StringRef path, LoadType loadType,
-                          bool isLazy = false, bool isExplicit = true,
-                          bool isBundleLoader = false,
-                          bool isForceHidden = false) {
-  std::optional<MemoryBufferRef> buffer = readFile(path);
+static InputFile *deferredAddFile(std::optional<MemoryBufferRef> buffer,
+                                  StringRef path, LoadType loadType,
+                                  bool isLazy = false, bool isExplicit = true,
+                                  bool isBundleLoader = false,
+                                  bool isForceHidden = false) {
   if (!buffer)
     return nullptr;
   MemoryBufferRef mbref = *buffer;
@@ -441,6 +442,14 @@ static InputFile *addFile(StringRef path, LoadType loadType,
   return newFile;
 }
 
+static InputFile *addFile(StringRef path, LoadType loadType,
+                          bool isLazy = false, bool isExplicit = true,
+                          bool isBundleLoader = false,
+                          bool isForceHidden = false) {
+    return deferredAddFile(readFile(path), path, loadType, isLazy,
+                           isExplicit, isBundleLoader, isForceHidden);
+}
+
 static std::vector<StringRef> missingAutolinkWarnings;
 static void addLibrary(StringRef name, bool isNeeded, bool isWeak,
                        bool isReexport, bool isHidden, bool isExplicit,
@@ -564,13 +573,21 @@ void macho::resolveLCLinkerOptions() {
   }
 }
 
-static void addFileList(StringRef path, bool isLazy) {
+typedef struct { StringRef path; std::optional<MemoryBufferRef> buffer; } DeferredFile;
+
+static void addFileList(StringRef path, bool isLazy,
+  std::vector<DeferredFile> &deferredFiles, int readThreads) {
   std::optional<MemoryBufferRef> buffer = readFile(path);
   if (!buffer)
     return;
   MemoryBufferRef mbref = *buffer;
   for (StringRef path : args::getLines(mbref))
-    addFile(rerootPath(path), LoadType::CommandLine, isLazy);
+    if (readThreads) {
+      StringRef rrpath = rerootPath(path);
+      deferredFiles.push_back({rrpath, readFile(rrpath)});
+    }
+    else
+      addFile(rerootPath(path), LoadType::CommandLine, isLazy);
 }
 
 // We expect sub-library names of the form "libfoo", which will match a dylib
@@ -1215,13 +1232,61 @@ static void handleSymbolPatterns(InputArgList &args,
     parseSymbolPatternsFile(arg, symbolPatterns);
 }
 
-static void createFiles(const InputArgList &args) {
+// Most input files have been mapped but not yet paged in.
+// This code forces the page-ins on multiple threads so
+// the process is not stalled waiting on disk buffer i/o.
+void multiThreadedPageIn(std::vector<DeferredFile> &deferred, int nthreads) {
+    typedef struct {
+        std::vector<DeferredFile> &deferred;
+        size_t counter, total, pageSize;
+        pthread_mutex_t mutex;
+    } PageInState;
+    PageInState state = {deferred, 0, 0,
+        llvm::sys::Process::getPageSizeEstimate(), pthread_mutex_t()};
+    pthread_mutex_init(&state.mutex, NULL);
+
+    pthread_t running[200];
+    int maxthreads = sizeof running / sizeof running[0];
+    if (nthreads > maxthreads)
+        nthreads = maxthreads;
+    for (int t=0; t<nthreads; t++)
+        pthread_create(&running[t], nullptr, [](void* ptr) -> void*{
+            PageInState &state = *(PageInState *)ptr;
+            static int total = 0;
+            while (true) {
+                pthread_mutex_lock(&state.mutex);
+                if (state.counter >= state.deferred.size()) {
+                    pthread_mutex_unlock(&state.mutex);
+                    return nullptr;
+                }
+                DeferredFile &add = state.deferred[state.counter];
+                state.counter += 1;
+                pthread_mutex_unlock(&state.mutex);
+
+                int t = 0; // Reference each page to load it into memory.
+                for (const char *start = add.buffer->getBuffer().data(),
+                     *page = start; page<start+add.buffer->getBuffer().size();
+                     page += state.pageSize)
+                    t += *page;
+                state.total += t; // Avoids whole section being optimised out.
+            }
+        }, &state);
+
+    for (int t=0; t<nthreads; t++)
+        pthread_join(running[t], nullptr);
+
+    pthread_mutex_destroy(&state.mutex);
+}
+
+void createFiles(const InputArgList &args, int readThreads) {
   TimeTraceScope timeScope("Load input files");
   // This loop should be reserved for options whose exact ordering matters.
   // Other options should be handled via filtered() and/or getLastArg().
   bool isLazy = false;
   // If we've processed an opening --start-lib, without a matching --end-lib
   bool inLib = false;
+  std::vector<DeferredFile> deferredFiles;
+
   for (const Arg *arg : args) {
     const Option &opt = arg->getOption();
     warnIfDeprecatedOption(opt);
@@ -1229,6 +1294,11 @@ static void createFiles(const InputArgList &args) {
 
     switch (opt.getID()) {
     case OPT_INPUT:
+      if (readThreads) {
+        StringRef rrpath = rerootPath(arg->getValue());
+        deferredFiles.push_back({rrpath,readFile(rrpath)});
+        break;
+      }
       addFile(rerootPath(arg->getValue()), LoadType::CommandLine, isLazy);
       break;
     case OPT_needed_library:
@@ -1249,7 +1319,7 @@ static void createFiles(const InputArgList &args) {
         dylibFile->forceWeakImport = true;
       break;
     case OPT_filelist:
-      addFileList(arg->getValue(), isLazy);
+      addFileList(arg->getValue(), isLazy, deferredFiles, readThreads);
       break;
     case OPT_force_load:
       addFile(rerootPath(arg->getValue()), LoadType::CommandLineForce);
@@ -1295,6 +1365,12 @@ static void createFiles(const InputArgList &args) {
       break;
     }
   }
+
+  if (readThreads) {
+    multiThreadedPageIn(deferredFiles, readThreads);
+    for (auto &add : deferredFiles)
+      deferredAddFile(add.buffer, add.path, LoadType::CommandLine, isLazy);
+  }
 }
 
 static void gatherInputSections() {
@@ -1687,6 +1763,14 @@ bool link(ArrayRef<const char *> argsArr, llvm::raw_ostream &stdoutOS,
     }
   }
 
+  if (auto *arg = args.getLastArg(OPT_read_threads)) {
+    StringRef v(arg->getValue());
+    unsigned threads = 0;
+    if (!llvm::to_integer(v, threads, 0) || threads < 0)
+      error(arg->getSpelling() + ": expected a positive integer, but got '" +
+            arg->getValue() + "'");
+    config->readThreads = threads;
+  }
   if (auto *arg = args.getLastArg(OPT_threads_eq)) {
     StringRef v(arg->getValue());
     unsigned threads = 0;
@@ -2107,7 +2191,7 @@ bool link(ArrayRef<const char *> argsArr, llvm::raw_ostream &stdoutOS,
     TimeTraceScope timeScope("ExecuteLinker");
 
     initLLVM(); // must be run before any call to addFile()
-    createFiles(args);
+    createFiles(args, config->readThreads);
 
     // Now that all dylibs have been loaded, search for those that should be
     // re-exported.
diff --git a/lld/MachO/Options.td b/lld/MachO/Options.td
index 4f0602f59812b..3dc98fccc1b7b 100644
--- a/lld/MachO/Options.td
+++ b/lld/MachO/Options.td
@@ -396,6 +396,9 @@ def dead_strip : Flag<["-"], "dead_strip">,
 def interposable : Flag<["-"], "interposable">,
     HelpText<"Indirects access to all exported symbols in an image">,
     Group<grp_opts>;
+def read_threads : Joined<["--"], "read-threads=">,
+    HelpText<"Number of threads to use paging in files.">,
+    Group<grp_lld>;
 def order_file : Separate<["-"], "order_file">,
     MetaVarName<"<file>">,
     HelpText<"Layout functions and data according to specification in <file>">,
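
One detail in the option handling of the diff above: the value is parsed into an `unsigned`, so the `threads < 0` check can never be true. A minimal sketch of stricter parsing follows, using only the standard library rather than `llvm::to_integer`. The function name `parseThreadCount` is hypothetical, and unlike the base-0 call in the diff it accepts decimal digits only. Zero is accepted because `readThreads = 0` is the "feature off" default in Config.h.

```cpp
#include <cassert>
#include <charconv>
#include <optional>
#include <string_view>
#include <system_error>

// Parse a decimal thread count. Returns std::nullopt for empty input,
// a sign, non-digits, trailing characters, or a value that overflows.
std::optional<unsigned> parseThreadCount(std::string_view text) {
  unsigned value = 0;
  const char *first = text.data();
  const char *last = first + text.size();
  auto [ptr, ec] = std::from_chars(first, last, value);
  if (ec != std::errc() || ptr != last)
    return std::nullopt;
  return value;
}
```

A caller would report an error when the result is empty and otherwise store the value, for example in `config->readThreads`.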

github-actions · 2025-07-05T08:50:55Z

✅ With the latest revision this PR passed the C/C++ code formatter.

johnno1962 · 2025-07-06T17:14:00Z

The last commit was to also use the threaded page-in approach with object files in archives. The last ten linking times were:

19.45, 15.43, 15.45, 13.43, 12.30, 12.98, 12.10, 15.35, 15.13, 15.69

Looking at Activity Monitor while a shell loop cycles through a link followed by sleep 15, I/O is far more concentrated now (as it should be).

johnno1962 · 2025-07-13T10:24:52Z

I would like to hear more feedback about the structure of the code. I would certainly do things differently, but I have not tried to solve the problem, so I might be thinking other structure is possible when you already have looked at those options. Sorry if I am proposing something you tested and didn't work.

I think the structure is fine even after these changes which are still a proof of concept. If this PR merges you may want to make another pass over the code to structure it how you like. If you have any specific changes you want to make now, for example to the parallelFor loop, let me know and I can benchmark them.

johnno1962 · 2025-07-15T17:02:06Z

Some more timing data. Until now I've been benchmarking linking on an external SSD, where the baseline (a cold link without the new --read-threads=20 option) is about 25-26 seconds and a cold link with the option is 12-13 seconds. Re-linking immediately a second time takes 7 seconds, and the new option makes no difference. You only need to wait 15 seconds between links for the external drive's data to be flushed from the disk cache and the link time to revert to 12-13 seconds with the option.

Recently, I moved the entire Chrome build directory onto my local drive. There the baseline is 19 seconds for a cold link, 11 seconds with --read-threads=20, and 6 seconds for a "warm" link where the data is already in memory (my Mac has 64GB). Oddly, the time the machine takes to forget data it has just loaded is different on the local drive: you can perform a subsequent link much later and still get the "warm" link time of 6 seconds, unless you perform some other heavy task, e.g. compiling or foregrounding Xcode, that pushes the disk pages out of memory. I guess this explains how this issue has remained under the radar all this time.

ellishg · 2025-07-17T18:54:20Z

lld/MachO/Driver.cpp

+    }
+  });
+
+  if (getenv("LLD_MULTI_THREAD_PAGE"))


Should we guard this with #ifndef NDEBUG?

I reverted this as it was generating a warning: totalBytes set but not used.

If totalBytes is only used in the debug builds, its code can be surrounded in the LLVM_DEBUG() macro to avoid those unused warnings.

#ifndef NDEBUG
static size_t totalBytes = 0;
#endif
...
LLVM_DEBUG(totalBytes += buff.size());
...
LLVM_DEBUG(
  if (getenv("LLD_MULTI_THREAD_PAGE"))
    llvm::dbgs() << "multiThreadedPageIn " << totalBytes << "/" << deferred.size() << "\n"
);

I'm using a RelWithDebInfo build, so I'd like to leave those messages in for now. I personally think they could stay in: they do not involve significant computation, they are unlocked by an improbable environment variable, and they could be useful in a release build. Once we've tuned and decided on the inner loop we can decide on this.
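
One way to keep such a message in every build type without an unused-variable warning is to hand the counter to a small function that always reads it and decides at run time whether to report. The sketch below is an illustration under that assumption, not the code in the PR: `pageInReport` is a hypothetical helper, and it returns the text rather than writing to `llvm::dbgs()` so that it stays self-contained. The environment variable name and message format are the ones quoted in this thread.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <optional>
#include <string>

// Build the diagnostic line only when LLD_MULTI_THREAD_PAGE is set.
// Both counters are always used, so no #ifndef NDEBUG guard is needed.
std::optional<std::string> pageInReport(std::size_t totalBytes,
                                        std::size_t fileCount) {
  if (!std::getenv("LLD_MULTI_THREAD_PAGE"))
    return std::nullopt;
  return "multiThreadedPageIn " + std::to_string(totalBytes) + "/" +
         std::to_string(fileCount);
}
```

The cost when the variable is unset is a single `getenv` call per link, which matches the argument that the message involves no significant computation.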

ellishg · 2025-07-17T19:19:23Z

By the way, I found TaskQueue in Support/ which seems to do what we want. Could we use that?
https://github.com/microsoft/llvm/blob/f270d88e8d2c496285111e9a600513d460df4633/include/llvm/Support/TaskQueue.h#L32-L35

drodriguez

@ellishg

By the way, I found TaskQueue in Support/ which seems to do what we want. Could we use that?
https://github.com/microsoft/llvm/blob/f270d88e8d2c496285111e9a600513d460df4633/include/llvm/Support/TaskQueue.h#L32-L35

That seems to be from a fork of LLVM. I couldn't find a good parallel queue abstraction already in LLVM. Parallel.h is the closest, with some other bits in Threading.h and ThreadPool.h which might be helpful.

drodriguez · 2025-07-18T19:14:02Z

lld/MachO/Driver.cpp

+    }
+  });
+
+  if (getenv("LLD_MULTI_THREAD_PAGE"))


If totalBytes is only used in the debug builds, its code can be surrounded in the LLVM_DEBUG() macro to avoid those unused warnings.

#ifndef NDEBUG
static size_t totalBytes = 0;
#endif
...
LLVM_DEBUG(totalBytes += buff.size());
...
LLVM_DEBUG(
  if (getenv("LLD_MULTI_THREAD_PAGE"))
    llvm::dbgs() << "multiThreadedPageIn " << totalBytes << "/" << deferred.size() << "\n"
);

drodriguez · 2025-07-18T19:34:39Z

lld/MachO/Driver.cpp

+  static size_t totalBytes = 0;
+  std::atomic_int index = 0;
+
+  parallelFor(0, config->readThreads, [&](size_t I) {


Because of how you are using parallelFor, config->readThreads is only the maximum number of threads that might be spawned for this, not the exact number. The actual number can be lower, governed by the -threads parameter: parallelFor uses llvm::parallel::strategy internally to decide how many threads to use, and that strategy is set up globally when the driver finds the -threads argument.

Using parallelFor as its authors intended will also avoid the need for the index variable and for keeping track of it. I provided a snippet before of how I would switch the strategy to one that fits your idea of having a different number of threads for reading files. It should be in some old comment.

See my other comment. I'd missed yours. I'm only following the benchmarks as this is the thrust of this PR.
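
The difference between the two styles can be sketched without LLVM. In the code below, `parallelForSketch` is a hypothetical stand-in written with std::thread; it is not `llvm::parallelFor` and does not model `llvm::parallel::strategy`. The point it illustrates is the one made in the review: when the loop range is the list of items rather than the list of thread numbers, the shared index lives inside the helper and the caller's lambda only sees the item index.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical stand-in: call fn(i) once for every i in [begin, end),
// spread over at most `maxThreads` threads.
void parallelForSketch(std::size_t begin, std::size_t end, unsigned maxThreads,
                       const std::function<void(std::size_t)> &fn) {
  std::atomic<std::size_t> next{begin};
  unsigned n = std::max(1u, maxThreads);
  std::vector<std::thread> pool;
  for (unsigned t = 0; t < n; ++t)
    pool.emplace_back([&] {
      for (std::size_t i = next.fetch_add(1); i < end; i = next.fetch_add(1))
        fn(i);
    });
  for (std::thread &th : pool)
    th.join();
}

// Caller iterates over files, not over thread numbers, so it keeps no
// index of its own. Summing sizes stands in for the real per-file work.
std::size_t totalSize(const std::vector<std::size_t> &fileSizes,
                      unsigned maxThreads) {
  std::atomic<std::size_t> total{0};
  parallelForSketch(0, fileSizes.size(), maxThreads,
                    [&](std::size_t i) { total += fileSizes[i]; });
  return total;
}
```

Whether this shape is faster than one long-running worker per thread is exactly what the benchmarks in this PR are meant to settle; the sketch only shows the structure.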

llvmbot added lld lld:MachO labels Jul 5, 2025

johnno1962 force-pushed the threaded-paging branch 6 times, most recently from b66eb42 to fd5647a Compare July 5, 2025 11:39

johnno1962 changed the title ~~[lld][Macho]Multi-threaded disk i/o. 20% speedup linking a large project.~~ [lld][Macho]Multi-threaded i/o. 20% speedup linking a large project. Jul 5, 2025

Multi-threaded disk i/o.

c55b5b2

johnno1962 force-pushed the threaded-paging branch 5 times, most recently from 9acbaea to 47bad1d Compare July 6, 2025 10:44

Afterthoughts.

3d11a33

johnno1962 force-pushed the threaded-paging branch from 47bad1d to 3d11a33 Compare July 6, 2025 10:49

carlocab requested review from nico and BertalanD and removed request for nico July 6, 2025 16:10

johnno1962 force-pushed the threaded-paging branch 3 times, most recently from a324caa to 6936449 Compare July 6, 2025 16:56

johnno1962 changed the title ~~[lld][Macho]Multi-threaded i/o. 20% speedup linking a large project.~~ [lld][MachO]Multi-threaded i/o. 40% speedup linking a large project. Jul 6, 2025

johnno1962 force-pushed the threaded-paging branch 2 times, most recently from fdc4c38 to 767b7b1 Compare July 6, 2025 19:14

johnno1962 force-pushed the threaded-paging branch 2 times, most recently from 3099438 to eb420d2 Compare July 12, 2025 20:15

Semms to make a difference.

c07e168

johnno1962 force-pushed the threaded-paging branch from eb420d2 to c07e168 Compare July 12, 2025 20:19

drodriguez reviewed Jul 14, 2025

johnno1962 and others added 2 commits July 15, 2025 09:19

Update lld/MachO/Driver.cpp

eb4827c

Co-authored-by: Daniel Rodríguez Troitiño <drodrigueztroitino@gmail.com>

De-Obfuscate loop and thread reaping.

ce93ae3

drodriguez reviewed Jul 16, 2025

Avoiding possible deadlock.

c47e5c3

ellishg reviewed Jul 17, 2025

johnno1962 and others added 5 commits July 17, 2025 22:45

Update lld/MachO/Options.td

5caf5a6

Co-authored-by: Ellis Hoag <ellis.sparky.hoag@gmail.com>

Update lld/MachO/Driver.cpp

890c492

Co-authored-by: Ellis Hoag <ellis.sparky.hoag@gmail.com>

Update lld/MachO/Driver.cpp

9714785

Co-authored-by: Ellis Hoag <ellis.sparky.hoag@gmail.com>

Update lld/MachO/Driver.cpp

85fd77f

Co-authored-by: Ellis Hoag <ellis.sparky.hoag@gmail.com>

Fourth review.

6f5f7cb

johnno1962 force-pushed the threaded-paging branch from 510a036 to 6f5f7cb Compare July 17, 2025 21:36

ellishg reviewed Jul 17, 2025

johnno1962 added 2 commits July 18, 2025 00:13

Switch to std::atomic_int.

e3e0369

Switch to std::unique_ptr.

febf5a9

johnno1962 force-pushed the threaded-paging branch from de31208 to febf5a9 Compare July 17, 2025 22:54

Remove a couple of warnings.

a5f7a42

johnno1962 force-pushed the threaded-paging branch from 8f6d070 to a5f7a42 Compare July 18, 2025 10:12

Try LLVM_ATTRIBUTE_UNUSED

6b874b2

drodriguez reviewed Jul 18, 2025

johnno1962 and others added 2 commits July 18, 2025 22:12

Update lld/MachO/Driver.cpp

84154d4

Co-authored-by: Daniel Rodríguez Troitiño <drodrigueztroitino@gmail.com>

Comparing inner loops.

ed9f07e
