-
Notifications
You must be signed in to change notification settings - Fork 22
Emulate POSIX_SPAWN_CLOEXEC_DEFAULT in fork/exec #79
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
6e9afb5
to
3e47e05
Compare
3e47e05
to
2d9a937
Compare
hi = getdtablesize(); | ||
} | ||
#else | ||
int hi = 1024; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where does '1024' come from and how safe is it to default to this vs making this #error and requiring a specific implementation for new platforms?
Also, the following additional platforms support getdtablesize: FreeBSD, OpenBSD, NetBSD, Solaris, QNX.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of #error
, how about if I just call getdtablesize
unconditionally here and have it naturally error on platforms that don't support it.
FreeBSD PR #1698 adds support for POSIX O_CLOFORK, which could simplify this implementation. |
Ahh thanks. Good to know. |
2d9a937
to
cd79141
Compare
char * _Nullable const args[_Nonnull], | ||
char * _Nullable const env[_Nullable], | ||
gid_t * _Nullable process_group_id | ||
int _shims_snprintf( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need this shim?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On Linux this function isn't exposed via import Glibc
struct dirent *dir_entry = NULL; | ||
while ((dir_entry = readdir(dir_ptr)) != NULL) { | ||
char *entry_name = dir_entry->d_name; | ||
int number = _positive_int_parse(entry_name); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this really the right way to do this? Seems very ad-hoc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(and racy)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. So there are generally two ways to determine the highest fd number on Linux:
- Use
getrlimit
- Read
/proc/self/fd
The difference between the two is that 1) only tells you the limit whereas 2) tells you the actual highest fd opened. I decided to go for 2) because it's more accurate. Also since this code is executed betweenfork()
andexec()
, there is no threading issues here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(and racy)
It's not racy because you're in a new process with just one thread. It's not ideal because it uses malloc
and in theory the malloc lock might be in a bad state because it's after fork
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And yes, the problem with the limit is that the limit is often VERY high (i.e. 1M) which means you need to issue A LOT of closes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@iCharlesHu I implemented a version of this in AsyncProcess
which should be (v)fork safe because it doesn't allocate: https://github.com/swiftlang/swift-sdk-generator/blob/3f4a6a800414010da0057985f80b2aba5dd4fc60/Sources/CProcessSpawnSync/internal-helpers.h#L57-L98
(glibc (LGPL!) also has an implementation: https://github.com/bminor/glibc/blob/6afece738c2b8408585272a95090ce5d5345dd19/sysdeps/unix/sysv/linux/closefrom_fallback.c#L31)
POSIX_SPAWN_CLOEXEC_DEFAULT is only available on Darwin. Emulate POSIX_SPAWN_CLOEXEC_DEFAULT on other platforms by calling close after fork, before exec. This commit also removes _subprocess_posix_spawn_fallback because we can't emulate POSIX_SPAWN_CLOEXEC_DEFAULT in a thread-safe manner while using posix_spawn.
cd79141
to
c217b39
Compare
Addressed review comments. |
// - Solaris 11.4 (August 2018) | ||
// - NetBSD 10.0 (March 2024) | ||
return posix_spawn_file_actions_addchdir(file_actions, path); | ||
# define _SUBPROCESS_SIG_MAX 32 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
POSIX 2024 added NSIG_MAX:
{NSIG_MAX}
Some historical implementations provided compile-time constants NSIG or SIGMAX to define the maximum number of signals the implementation supported, but these values did not necessarily reflect the number of signals that could be handled using a sigset_t. With the addition of real-time signals and the desire by some applications to be able to allocate additional real-time signals at run-time, neither of these constants provided a useable, portable value. {NSIG_MAX} was added to the standard to allow applications to determine the maximum number of signals that an implementation will support based on the size of the sigset_t type (defined in <signal.h>).
32 for Darwin, 64 for Linux, 128 for FreeBSD.
Here is what Ruby's vm_core.h does, I suggest we replicate that for maximum portability:
#if defined(NSIG_MAX) /* POSIX issue 8 */
# undef NSIG
# define NSIG NSIG_MAX
#elif defined(_SIG_MAXSIG) /* FreeBSD */
# undef NSIG
# define NSIG _SIG_MAXSIG
#elif defined(_SIGMAX) /* QNX */
# define NSIG (_SIGMAX + 1)
#elif defined(NSIG) /* 99% of everything else */
# /* take it */
#else /* Last resort */
# define NSIG (sizeof(sigset_t) * CHAR_BIT + 1)
#endif
https://github.com/ruby/ruby/blob/c584cc079eef2f4e314a97eff310c9947e1d7010/vm_core.h#L144-L156
hi = getdtablesize(); | ||
} | ||
#else | ||
int hi = getdtablesize(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Linux man page recommends that portable applications call sysconf(_SC_OPEN_MAX) instead (which is defined by POSIX).
flags |= POSIX_SPAWN_SETPGROUP; | ||
rc = posix_spawnattr_setpgroup(&spawn_attr, *process_group_id); | ||
if (rc != 0) { return rc; } | ||
static int _highest_possibly_open_fd(void) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why make this platform-conditional? On Linux, /dev/fd exists as symlinks, is there any reason not to always use /dev/fd? That should work on pretty much all platforms.
output: .string, | ||
error: .string | ||
) | ||
close(pipe.writeEnd.rawValue) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will end up closing writeEnd twice because of the close in the defer block above, which looks unsafe. And I can't see how that behavior is relevant to the test. Instead use the closeAfter
API from FileDescriptor. It doesn't provide an async
overload but here's a polyfill I wrote:
extension FileDescriptor {
/// Runs a closure and then closes the FileDescriptor, even if an error occurs.
///
/// - Parameter body: The closure to run.
/// If the closure throws an error,
/// this method closes the file descriptor before it rethrows that error.
///
/// - Returns: The value returned by the closure.
///
/// If `body` throws an error
/// or an error occurs while closing the file descriptor,
/// this method rethrows that error.
public func closeAfter<R>(_ body: () async throws -> R) async throws -> R {
// No underscore helper, since the closure's throw isn't necessarily typed.
let result: R
do {
result = try await body()
} catch {
_ = try? self.close() // Squash close error and throw closure's
throw error
}
try self.close()
return result
}
}
Just a heads up - the PR has merged. I think the timing means it'll likely make it into the upcoming FreeBSD 15 release in December. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The Linux CI failure looks relevant
[7/22] Compiling CSystem shims.c
/__w/swift-subprocess/swift-subprocess/Sources/_SubprocessCShims/process_shims.c:318:28: error: use of undeclared identifier 'val'
318 | if (errno == ERANGE || val <= 0 || val > INT_MAX) {
|
&file_actions, file_descriptors[2], STDOUT_FILENO | ||
); | ||
if (rc != 0) { return rc; } | ||
if (errno == ERANGE || val <= 0 || val > INT_MAX) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is what's causing the Linux CI failure
[7/22] Compiling CSystem shims.c
/__w/swift-subprocess/swift-subprocess/Sources/_SubprocessCShims/process_shims.c:318:28: error: use of undeclared identifier 'val'
318 | if (errno == ERANGE || val <= 0 || val > INT_MAX) {
Should be value
instead?
POSIX_SPAWN_CLOEXEC_DEFAULT is only available on Darwin. Emulate POSIX_SPAWN_CLOEXEC_DEFAULT on other platforms by calling close after fork, before exec.
This commit also removes _subprocess_posix_spawn_fallback because we can't emulate POSIX_SPAWN_CLOEXEC_DEFAULT in a thread-safe manner while using posix_spawn.
Resolves: #46