
making it a shell script #1

Closed
wants to merge 1 commit into from

Conversation

KrishMunot

No description provided.

@ghost ghost added the CLA Signed label Jul 24, 2016
@bolinfest
Contributor

The absence of the file extension is intentional. This way, we can change it to Python, Ruby, etc. without renaming the script and breaking existing workflows.

@bolinfest bolinfest closed this Aug 8, 2016
ghost pushed a commit that referenced this pull request Sep 9, 2016
Summary:
We can't allow ~EdenServer to delete the memory until we're sure that
the other threads are done. To ensure that, we notify the condition
variable while the aux thread still holds the lock. This guarantees that the
thread destroying the EdenServer waits for the aux thread to release the lock
before checking the predicate and proceeding to delete the memory.

```
SUMMARY  ThreadSanitizer: data race //common/concurrency/Event.cpp:107 in facebook::common::concurrency::Event::set() const
==================
I0909 14:51:18.543072 4147554 main.cpp:173] edenfs performing orderly shutdown
I0909 14:51:18.555794 4148654 Channel.cpp:177] session completed
I0909 14:51:18.556011 4148654 EdenServer.cpp:192] mount point "/tmp/eden_test.0ostuc90/mounts/main" stopped
==================
WARNING: ThreadSanitizer: data race (pid=4147554)
  Write of size 8 at 0x7fff9e182d90 by main thread:
    #0 pthread_cond_destroy <null> (edenfs+0x00000007671a)
    #1 facebook::eden::EdenServer::~EdenServer() //eden/fs/service/EdenServer.cpp:93 (libeden_fs_service_server.so+0x0000000b96cd)
    #2 main //eden/fs/service/main.cpp:176 (edenfs+0x000000018515)

  Previous read of size 8 at 0x7fff9e182d90 by thread T73:
    #0 pthread_cond_broadcast <null> (edenfs+0x0000000765b7)
    #1 __gthread_cond_broadcast /home/engshare/third-party2/libgcc/4.9.x/src/gcc-4_9/x86_64-facebook-linux/libstdc++-v3/include/x86_64-facebook-linux/bits/gthr-default.h:852 (libstdc++.so.6+0x0000000e14f8)
    #2 std::condition_variable::notify_all() /home/engshare/third-party2/libgcc/4.9.x/src/gcc-4_9/x86_64-facebook-linux/libstdc++-v3/src/c++11/../../../.././libstdc++-v3/src/c++11/condition_variable.cc:72 (libstdc++.so.6+0x0000000e14f8)
    #3 facebook::eden::EdenServer::mount(std::shared_ptr<facebook::eden::EdenMount>, std::unique_ptr<facebook::eden::ClientConfig, std::default_delete<facebook::eden::ClientConfig> >)::$_0::operator()() const //eden/fs/service/EdenServer.cpp:145 (libeden_fs_service_server.so+0x0000000bcdb5)
    #4 std::_Function_handler<void (), facebook::eden::EdenServer::mount(std::shared_ptr<facebook::eden::EdenMount>, std::unique_ptr<facebook::eden::ClientConfig, std::default_delete<facebook::eden::ClientConfig> >)::$_0>::_M_invoke(std::_Any_data const&) //third-party-buck/gcc-4.9-glibc-2.20-fb/build/libgcc/include/c++/trunk/functional:2039 (libeden_fs_service_server.so+0x0000000bcab0)
    #5 std::function<void ()>::operator()() const //third-party-buck/gcc-4.9-glibc-2.20-fb/build/libgcc/include/c++/trunk/functional:2439 (libeden_fuse_fusell.so+0x00000020fbb9)
    #6 facebook::eden::fusell::MountPoint::start(bool, std::function<void ()> const&)::$_0::operator()() const //eden/fuse/MountPoint.cpp:69 (libeden_fuse_fusell.so+0x000000237447)
    #7 void std::_Bind_simple<facebook::eden::fusell::MountPoint::start(bool, std::function<void ()> const&)::$_0 ()>::_M_invoke<>(std::_Index_tuple<>) //third-party-buck/gcc-4.9-glibc-2.20-fb/build/libgcc/include/c++/trunk/functional:1699 (libeden_fuse_fusell.so+0x000000237048)
    #8 std::_Bind_simple<facebook::eden::fusell::MountPoint::start(bool, std::function<void ()> const&)::$_0 ()>::operator()() //third-party-buck/gcc-4.9-glibc-2.20-fb/build/libgcc/include/c++/trunk/functional:1688 (libeden_fuse_fusell.so+0x000000236ff8)
    #9 std::thread::_Impl<std::_Bind_simple<facebook::eden::fusell::MountPoint::start(bool, std::function<void ()> const&)::$_0 ()> >::_M_run() //third-party-buck/gcc-4.9-glibc-2.20-fb/build/libgcc/include/c++/trunk/thread:115 (libeden_fuse_fusell.so+0x000000236d8c)
    #10 execute_native_thread_routine /home/engshare/third-party2/libgcc/4.9.x/src/gcc-4_9/x86_64-facebook-linux/libstdc++-v3/src/c++11/../../../.././libstdc++-v3/src/c++11/thread.cc:84 (libstdc++.so.6+0x0000000e6ec0)
```

Reviewed By: simpkins

Differential Revision: D3844846

fbshipit-source-id: 545474bc1aff8621dbeb487dcd6b54c82828ff3b
facebook-github-bot pushed a commit that referenced this pull request Apr 26, 2019
Summary:
Sandcastle's [1] code coverage builds compile EdenFS and Hg with Clang's -fprofile-generate flag (or a related flag). By default, running a program built with that flag creates a `default.profraw` file in the current directory. This causes some tests, such as HgBackingStoreTest.getTreeForCommit_reimports_tree_if_it_was_deleted_after_import, to fail because `default.profraw` is unintentionally committed to the test repo:

```
[==========] Running 1 test from 1 test case.
[----------] Global test environment set-up.
[----------] 1 test from HgBackingStoreTest
[ RUN      ] HgBackingStoreTest.getTreeForCommit_reimports_tree_if_it_was_deleted_after_import
eden/fs/store/hg/test/HgBackingStoreTest.cpp:64: Failure
Value of: tree1->getEntryNames()
Expected: has 2 elements where
element #0 is equal to foo,
element #1 is equal to src
  Actual: { default.profraw, foo, src }, which has 3 elements
[  FAILED  ] HgBackingStoreTest.getTreeForCommit_reimports_tree_if_it_was_deleted_after_import (1563 ms)
[----------] 1 test from HgBackingStoreTest (1563 ms total)

[----------] Global test environment tear-down
[==========] 1 test from 1 test case ran. (1563 ms total)
[  PASSED  ] 0 tests.
[  FAILED  ] 1 test, listed below:
[  FAILED  ] HgBackingStoreTest.getTreeForCommit_reimports_tree_if_it_was_deleted_after_import
```

When Sandcastle runs the tests, it sets the `LLVM_PROFILE_FILE` environment variable, which configures where the `.profraw` file is written. EdenFS's HgRepo class was stripping this variable when invoking hg, so hg used the default path and not the configured path.

Make HgRepo pass `LLVM_PROFILE_FILE` through to hg. This fixes some of EdenFS's tests when run by Sandcastle for code coverage purposes. (This does *not* fix running the tests manually without setting `LLVM_PROFILE_FILE`, though.)

[1] Sandcastle is Facebook's continuous integration system.

Reviewed By: simpkins

Differential Revision: D15104832

fbshipit-source-id: 3929b9b0ab0904f2552ace7d8e4e4f4a4221192c
facebook-github-bot pushed a commit that referenced this pull request Mar 12, 2020
Summary:
Let's use the functionality added in D20389226 so that we can generate
diffs even for large files.

I contemplated two approaches: either "silently" generate a
placeholder diff for files that are over the limit, or add a new option
that lets the client request these placeholders. I chose the latter for a few
reasons:

1) To me, option #1 might cause surprises, e.g. diffing a single large file
doesn't fail, but diffing two files whose sizes are each (LIMIT / 2 + 1) will fail.
Option #2 lets the client be very explicit about what it needs, and it also
won't return a placeholder when the actual content is expected!
2) Option #2 makes the client think about what it wants returned,
which seems in line with the source control thrift API design in general:
commit_file_diffs is non-trivial to use by design, to force the client to
think about edge cases. Forcing the client to think about one more
important edge case (large files) makes sense.
3) The code change is simpler with option #2.

I also thought about adding a generate_placeholder_diff parameter to CommitFileDiffsParams,
but that makes integration with scs_helper.py harder.

Reviewed By: markbt

Differential Revision: D20417787

fbshipit-source-id: ab9b32fd7a4768043414ed7d8bf39e3c6f4eb53e
facebook-github-bot pushed a commit that referenced this pull request Apr 10, 2020
Summary:
diffhelpers.c lacks type checks and segfaults on Python 3:

  (gdb) r --config patch.eol=auto import -d '0 0' -m 'test patch.eol' --bypass ../test.diff --debug --traceback
  Program received signal SIGSEGV, Segmentation fault.
  0x00007ffff7263016 in __strcmp_sse42 () from /lib64/libc.so.6
  (gdb) bt
  #0  0x00007ffff7263016 in __strcmp_sse42 () from /lib64/libc.so.6
  #1  0x00007fffee5e3d3b in testhunk (self=<optimized out>, args=<optimized out>) at edenscm/mercurial/cext/diffhelpers.c:151

Looking at the diffhelpers usage, it seems to assume the `bytes` type
(which is `str` on Python 2 but a distinct type on Python 3). Let's implement it
in Rust so `rust-cpython` will complain if the type is not `bytes`.

Reviewed By: xavierd

Differential Revision: D20935366

fbshipit-source-id: 8b474555b52caeab4175d7dad44c4c4e7097e557
facebook-github-bot pushed a commit that referenced this pull request Sep 23, 2020
Summary:
At the moment CommitSyncConfig can be set in two ways:
1) Set in the normal mononoke config. That means the config can be updated
only after a service restart. This is the outdated way, and won't be used
in prod.
2) Set in a separate config which can be updated on demand. This is what we are
planning to use in production.

create_commit_syncer_from_matches was used to build a CommitSyncer object via the
normal mononoke config (i.e. outdated option #1). According to the comments this
was done so that we can read configs from disk instead of configerator, but
that doesn't make sense because you can already read configerator configs
from disk. So create_commit_syncer_from_matches doesn't look particularly
useful, and besides, it also makes further refactoring harder. Let's remove it.

Reviewed By: ikostia

Differential Revision: D23811090

fbshipit-source-id: 114d88d9d9207c831d98dfa1cbb9e8ede5adeb1d
facebook-github-bot pushed a commit that referenced this pull request Mar 17, 2021
Summary:
This can cause a deadlock if `func()` calls `create_callsite`.

Usually this is rare, but a (Python) signal handler makes it realistic.

Example backtrace:

  Traceback (most recent call first):
    File "/opt/fb/mercurial/edenscm/tracing.py", line 337, in event
    File "/opt/fb/mercurial/edenscm/mercurial/ui.py", line 1275, in debug
      tracing.debug(msg.rstrip("\n"), depth=1)
    File "/opt/fb/mercurial/edenscm/mercurial/commandserver.py", line 608, in _reapworkers
      self.ui.debug("worker process exited (pid=%d)\n" % pid)
    File "/opt/fb/mercurial/edenscm/mercurial/commandserver.py", line 591, in _sigchldhandler
      self._reapworkers(os.WNOHANG)
    File "/opt/fb/mercurial/edenscm/tracing.py", line 337, in event
    File "/opt/fb/mercurial/edenscm/mercurial/ui.py", line 1275, in debug
      tracing.debug(msg.rstrip("\n"), depth=1)
    File "/opt/fb/mercurial/edenscm/mercurial/commandserver.py", line 608, in _reapworkers
      self.ui.debug("worker process exited (pid=%d)\n" % pid)
    File "/opt/fb/mercurial/edenscm/mercurial/commandserver.py", line 591, in _sigchldhandler
      self._reapworkers(os.WNOHANG)

  #0  syscall () at ../sysdeps/unix/sysv/linux/x86_64/syscall.S:38
  #1  0x000055d0d65ba339 in <parking_lot::raw_rwlock::RawRwLock>::lock_upgradable_slow ()
  #2  0x000055d0d55b5814 in tracing_runtime_callsite::create_callsite::<tracing_runtime_callsite::callsite_info::EventKindType, pytracing::new_callsite<tracing_runtime_callsite::callsite_info::EventKindType>::{closure#2}> ()
  #3  0x000055d0d5584cb9 in <pytracing::EventCallsite>::__new__ ()
  #4  0x000055d0d55a3eaa in std::panicking::try::<*mut python3_sys::object::PyObject, cpython::function::handle_callback<<pytracing::EventCallsite>::create_instance::TYPE_OBJECT::wrap_newfunc::{closure#0}, pytracing::EventCallsite, cpython::function::PyObjectCallbackConverter>::{closure#0}> ()
  #5  0x000055d0d5589365 in cpython::function::handle_callback::<<pytracing::EventCallsite>::create_instance::TYPE_OBJECT::wrap_newfunc::{closure#0}, pytracing::EventCallsite, cpython::function::PyObjectCallbackConverter> ()
  #6  0x000055d0d55856e1 in <pytracing::EventCallsite>::create_instance::TYPE_OBJECT::wrap_newfunc ()
  #7  0x00007ff88d576230 in type_call (
      kwds={'obj': Frame 0x7ff87c1f8c40, for file /opt/fb/mercurial/edenscm/mercurial/commandserver.py, line 608, in _reapworkers (self=<unixforkingservice(ui=<ui(_buffers=[], _bufferstates=[], _bufferapplylabels=None, _outputui=None, callhooks=True, insecureconnections=False, _colormode=None, _styles={}, _terminaloutput=None, cmdname=None, _uiconfig=<uiconfig(quiet=False, verbose=False, debugflag=False, tracebackflag=False, logmeasuredtimes=False, _rcfg=<localrcfg(_rcfg=<bindings.configparser.config at remote 0x7ff87d7325d0>) at remote 0x7ff87eb73e80>, _unserializable={}, _pinnedconfigs=set(), _knownconfig={'alias': <itemregister(_generics={<configitem(section='alias', name='.*', default=None, alias=[], generic=True, priority=0, _re=<re.Pattern at remote 0x7ff87d69bed0>) at remote 0x7ff87d690a60>}) at remote 0x7ff87dbfde50>, 'annotate': <itemregister(_generics=set()) at remote 0x7ff87d6ad9f0>, 'auth': <itemregister(_generics=set()) at remote 0x7ff87dc037c0>, 'blackbox': <itemregister(_generics=set()) at remote 0x7ff87d...(truncated), args=(),
      type=0x55d0d8ea5b40 <_RNvNvMs1R_CsgCrAUYYhx1D_9pytracingNtB8_13EventCallsite15create_instance11TYPE_OBJECT.llvm.4665269759137401160>) at Objects/typeobject.c:974
  #8  _PyObject_MakeTpCall (callable=<type at remote 0x55d0d8ea5b40>, args=<optimized out>,
      nargs=<optimized out>, keywords=<optimized out>) at Objects/call.c:159
  #9  0x00007ff88d56dc81 in _PyObject_Vectorcall (
      kwnames=('obj', 'name', 'target', 'level', 'fieldnames'), nargsf=<optimized out>,
      args=<optimized out>, callable=<type at remote 0x55d0d8ea5b40>) at ./Include/cpython/abstract.h:125
  #10 _PyObject_Vectorcall (kwnames=('obj', 'name', 'target', 'level', 'fieldnames'),
      nargsf=<optimized out>, args=<optimized out>, callable=<type at remote 0x55d0d8ea5b40>)
      at ./Include/cpython/abstract.h:115
  #11 call_function (kwnames=('obj', 'name', 'target', 'level', 'fieldnames'), oparg=<optimized out>,
      pp_stack=<synthetic pointer>, tstate=<optimized out>) at Python/ceval.c:4963
  #12 _PyEval_EvalFrameDefault (f=<optimized out>, throwflag=<optimized out>) at Python/ceval.c:3515
  #13 0x00007ff88d566268 in PyEval_EvalFrameEx (throwflag=0,
      f=Frame 0x7ff87cced010, for file /opt/fb/mercurial/edenscm/tracing.py, line 337, in event (message='worker process exited (pid=3953080)', name=None, target=None, level=1, depth=1, meta={}, frame=Frame 0x7ff87c1f8c40, for file /opt/fb/mercurial/edenscm/mercurial/commandserver.py, line 608, in _reapworkers (self=<unixforkingservice(ui=<ui(_buffers=[], _bufferstates=[], _bufferapplylabels=None, _outputui=None, callhooks=True, insecureconnections=False, _colormode=None, _styles={}, _terminaloutput=None, cmdname=None, _uiconfig=<uiconfig(quiet=False, verbose=False, debugflag=False, tracebackflag=False, logmeasuredtimes=False, _rcfg=<localrcfg(_rcfg=<bindings.configparser.config at remote 0x7ff87d7325d0>) at remote 0x7ff87eb73e80>, _unserializable={}, _pinnedconfigs=set(), _knownconfig={'alias': <itemregister(_generics={<configitem(section='alias', name='.*', default=None, alias=[], generic=True, priority=0, _re=<re.Pattern at remote 0x7ff87d69bed0>) at remote 0x7ff87d690a60>}) at remote 0x7ff87dbfde50>, 'annotate': ...(truncated))
      at Python/ceval.c:741
  #14 _PyEval_EvalCodeWithName (_co=<optimized out>, globals=<optimized out>, locals=<optimized out>,
      args=<optimized out>, argcount=<optimized out>, kwnames=<optimized out>, kwargs=0x7ff87c0fdc78,
      kwcount=<optimized out>, kwstep=1, defs=0x7ff87d572558, defcount=4, kwdefs=0x0, closure=0x0,
      name='event', qualname='event') at Python/ceval.c:4298
  #15 0x00007ff88d57fdce in _PyFunction_Vectorcall (func=<optimized out>, stack=0x7ff87c0fdc70,
      nargsf=<optimized out>, kwnames=<optimized out>) at Objects/call.c:435
  #16 0x00007ff88d57574f in _PyObject_FastCallDict (callable=<function at remote 0x7ff87d5741f0>,
      args=0x7ff87eb63838, nargsf=<optimized out>, kwargs=<optimized out>) at Objects/call.c:104
  #17 0x00007ff88d66d1ab in partial_fastcall (kwargs={'level': 1, 'depth': 1}, nargs=<optimized out>,
      args=<optimized out>, pto=0x7ff87d572630) at ./Modules/_functoolsmodule.c:169
  #18 partial_call (pto=0x7ff87d572630, args=<optimized out>, kwargs=<optimized out>)
      at ./Modules/_functoolsmodule.c:224
  #19 0x00007ff88d576331 in _PyObject_MakeTpCall (callable=<functools.partial at remote 0x7ff87d572630>,
      args=<optimized out>, nargs=<optimized out>, keywords=<optimized out>) at Objects/object.c:2207
  #20 0x00007ff88d56dc81 in _PyObject_Vectorcall (kwnames=('depth',), nargsf=<optimized out>,
      args=<optimized out>, callable=<functools.partial at remote 0x7ff87d572630>)
      at ./Include/cpython/abstract.h:125
  #21 _PyObject_Vectorcall (kwnames=('depth',), nargsf=<optimized out>, args=<optimized out>,
      callable=<functools.partial at remote 0x7ff87d572630>) at ./Include/cpython/abstract.h:115
  #22 call_function (kwnames=('depth',), oparg=<optimized out>, pp_stack=<synthetic pointer>,
      tstate=<optimized out>) at Python/ceval.c:4963
  #23 _PyEval_EvalFrameDefault (f=<optimized out>, throwflag=<optimized out>) at Python/ceval.c:3515
  #24 0x00007ff88d566268 in PyEval_EvalFrameEx (throwflag=0,
      f=Frame 0x7ff87c0e43c0, for file /opt/fb/mercurial/edenscm/mercurial/ui.py, line 1275, in debug (self=<ui(_buffers=[], _bufferstates=[], _bufferapplylabels=None, _outputui=None, callhooks=True, insecureconnections=False, _colormode=None, _styles={}, _terminaloutput=None, cmdname=None, _uiconfig=<uiconfig(quiet=False, verbose=False, debugflag=False, tracebackflag=False, logmeasuredtimes=False, _rcfg=<localrcfg(_rcfg=<bindings.configparser.config at remote 0x7ff87d7325d0>) at remote 0x7ff87eb73e80>, _unserializable={}, _pinnedconfigs=set(), _knownconfig={'alias': <itemregister(_generics={<configitem(section='alias', name='.*', default=None, alias=[], generic=True, priority=0, _re=<re.Pattern at remote 0x7ff87d69bed0>) at remote 0x7ff87d690a60>}) at remote 0x7ff87dbfde50>, 'annotate': <itemregister(_generics=set()) at remote 0x7ff87d6ad9f0>, 'auth': <itemregister(_generics=set()) at remote 0x7ff87dc037c0>, 'blackbox': <itemregister(_generics=set()) at remote 0x7ff87dc039a0>, 'bookmarks': <itemregister(_generics=se...(truncated))
      at Python/ceval.c:741
  #25 _PyEval_EvalCodeWithName (_co=<optimized out>, globals=<optimized out>, locals=<optimized out>,
      args=<optimized out>, argcount=<optimized out>, kwnames=<optimized out>, kwargs=0x7ff87c1f8de8,
      kwcount=<optimized out>, kwstep=1, defs=0x0, defcount=0, kwdefs=0x0, closure=0x0, name='debug',
      qualname='ui.debug') at Python/ceval.c:4298
  #26 0x00007ff88d57fdce in _PyFunction_Vectorcall (func=<optimized out>, stack=0x7ff87c1f8dd8,
      nargsf=<optimized out>, kwnames=<optimized out>) at Objects/call.c:435
  #27 0x00007ff88d56821a in _PyObject_Vectorcall (kwnames=0x0, nargsf=<optimized out>,
      args=0x7ff87c1f8dd8, callable=<function at remote 0x7ff87d57e9d0>)
      at ./Include/cpython/abstract.h:127
  #28 call_function (kwnames=0x0, oparg=<optimized out>, pp_stack=<synthetic pointer>,
      tstate=0x7ff88ca08780) at Python/ceval.c:4963
  #29 _PyEval_EvalFrameDefault (f=<optimized out>, throwflag=<optimized out>) at Python/ceval.c:3486
  #30 0x00007ff88d57fd38 in PyEval_EvalFrameEx (throwflag=0,
      f=Frame 0x7ff87c1f8c40, for file /opt/fb/mercurial/edenscm/mercurial/commandserver.py, line 608, in _reapworkers (self=<unixforkingservice(ui=<ui(_buffers=[], _bufferstates=[], _bufferapplylabels=None, _outputui=None, callhooks=True, insecureconnections=False, _colormode=None, _styles={}, _terminaloutput=None, cmdname=None, _uiconfig=<uiconfig(quiet=False, verbose=False, debugflag=False, tracebackflag=False, logmeasuredtimes=False, _rcfg=<localrcfg(_rcfg=<bindings.configparser.config at remote 0x7ff87d7325d0>) at remote 0x7ff87eb73e80>, _unserializable={}, _pinnedconfigs=set(), _knownconfig={'alias': <itemregister(_generics={<configitem(section='alias', name='.*', default=None, alias=[], generic=True, priority=0, _re=<re.Pattern at remote 0x7ff87d69bed0>) at remote 0x7ff87d690a60>}) at remote 0x7ff87dbfde50>, 'annotate': <itemregister(_generics=set()) at remote 0x7ff87d6ad9f0>, 'auth': <itemregister(_generics=set()) at remote 0x7ff87dc037c0>, 'blackbox': <itemregister(_generics=set()) at remote 0x7ff87dc039a0>,...(truncated))
      at Python/ceval.c:738
  #31 function_code_fastcall (globals=<optimized out>, nargs=2, args=<optimized out>, co=<optimized out>)
      at Objects/call.c:283
  #32 _PyFunction_Vectorcall (func=<optimized out>, stack=<optimized out>, nargsf=<optimized out>,
      kwnames=<optimized out>) at Objects/call.c:410
  #33 0x00007ff88d56821a in _PyObject_Vectorcall (kwnames=0x0, nargsf=<optimized out>,
      args=0x7ff87c0f26d8, callable=<function at remote 0x7ff87d73b310>)
      at ./Include/cpython/abstract.h:127
  #34 call_function (kwnames=0x0, oparg=<optimized out>, pp_stack=<synthetic pointer>,
      tstate=0x7ff88ca08780) at Python/ceval.c:4963
  #35 _PyEval_EvalFrameDefault (f=<optimized out>, throwflag=<optimized out>) at Python/ceval.c:3486
  #36 0x00007ff88d57fd38 in PyEval_EvalFrameEx (throwflag=0,
      f=Frame 0x7ff87c0f2550, for file /opt/fb/mercurial/edenscm/mercurial/commandserver.py, line 591, in _sigchldhandler (self=<unixforkingservice(ui=<ui(_buffers=[], _bufferstates=[], _bufferapplylabels=None, _outputui=None, callhooks=True, insecureconnections=False, _colormode=None, _styles={}, _terminaloutput=None, cmdname=None, _uiconfig=<uiconfig(quiet=False, verbose=False, debugflag=False, tracebackflag=False, logmeasuredtimes=False, _rcfg=<localrcfg(_rcfg=<bindings.configparser.config at remote 0x7ff87d7325d0>) at remote 0x7ff87eb73e80>, _unserializable={}, _pinnedconfigs=set(), _knownconfig={'alias': <itemregister(_generics={<configitem(section='alias', name='.*', default=None, alias=[], generic=True, priority=0, _re=<re.Pattern at remote 0x7ff87d69bed0>) at remote 0x7ff87d690a60>}) at remote 0x7ff87dbfde50>, 'annotate': <itemregister(_generics=set()) at remote 0x7ff87d6ad9f0>, 'auth': <itemregister(_generics=set()) at remote 0x7ff87dc037c0>, 'blackbox': <itemregister(_generics=set()) at remote 0x7ff87dc039a...(truncated))
      at Python/ceval.c:738
  #37 function_code_fastcall (globals=<optimized out>, nargs=3, args=<optimized out>, co=<optimized out>)
      at Objects/call.c:283
  #38 _PyFunction_Vectorcall (func=<optimized out>, stack=<optimized out>, nargsf=<optimized out>,
      kwnames=<optimized out>) at Objects/call.c:410
  #39 0x00007ff88d592153 in _PyObject_Vectorcall (kwnames=<optimized out>, nargsf=<optimized out>,
      args=<optimized out>, callable=<optimized out>) at ./Include/cpython/abstract.h:115
  #40 method_vectorcall (method=<optimized out>, args=<optimized out>, nargsf=<optimized out>,
      kwnames=<optimized out>) at Objects/classobject.c:67
  #41 0x00007ff88d5963fb in PyVectorcall_Call (kwargs=0x0, tuple=<optimized out>,
      callable=<method at remote 0x7ff87c70d5c0>) at Objects/dictobject.c:1802
  #42 PyObject_Call (callable=<method at remote 0x7ff87c70d5c0>, args=<optimized out>, kwargs=0x0)
      at Objects/call.c:227
  #43 0x00007ff88d6405ea in _PyErr_CheckSignals () at ./Modules/signalmodule.c:1689
  #44 0x00007ff88d5a41a1 in _PyErr_CheckSignals () at Objects/object.c:577
  #45 PyErr_CheckSignals () at ./Modules/signalmodule.c:1649
  #46 PyObject_Str (v='_reapworkers') at Objects/object.c:561
  #47 0x000055d0d557c821 in pytracing::tostr_opt ()
  #48 0x000055d0d55b5a7d in tracing_runtime_callsite::create_callsite::<tracing_runtime_callsite::callsite_info::EventKindType, pytracing::new_callsite<tracing_runtime_callsite::callsite_info::EventKindType>::{closure#2}> ()
  #49 0x000055d0d5584cb9 in <pytracing::EventCallsite>::__new__ ()
  #50 0x000055d0d55a3eaa in std::panicking::try::<*mut python3_sys::object::PyObject, cpython::function::handle_callback<<pytracing::EventCallsite>::create_instance::TYPE_OBJECT::wrap_newfunc::{closure#0}, pytracing::EventCallsite, cpython::function::PyObjectCallbackConverter>::{closure#0}> ()
  #51 0x000055d0d5589365 in cpython::function::handle_callback::<<pytracing::EventCallsite>::create_instance::TYPE_OBJECT::wrap_newfunc::{closure#0}, pytracing::EventCallsite, cpython::function::PyObjectCallbackConverter> ()
  #52 0x000055d0d55856e1 in <pytracing::EventCallsite>::create_instance::TYPE_OBJECT::wrap_newfunc ()
  #53 0x00007ff88d576230 in type_call (
      kwds={'obj': Frame 0x7ff87c1f8440, for file /opt/fb/mercurial/edenscm/mercurial/commandserver.py, line 608, in _reapworkers (self=<unixforkingservice(ui=<ui(_buffers=[], _bufferstates=[], _bufferapplylabels=None, _outputui=None, callhooks=True, insecureconnections=False, _colormode=None, _styles={}, _terminaloutput=None, cmdname=None, _uiconfig=<uiconfig(quiet=False, verbose=False, debugflag=False, tracebackflag=False, logmeasuredtimes=False, _rcfg=<localrcfg(_rcfg=<bindings.configparser.config at remote 0x7ff87d7325d0>) at remote 0x7ff87eb73e80>, _unserializable={}, _pinnedconfigs=set(), _knownconfig={'alias': <itemregister(_generics={<configitem(section='alias', name='.*', default=None, alias=[], generic=True, priority=0, _re=<re.Pattern at remote 0x7ff87d69bed0>) at remote 0x7ff87d690a60>}) at remote 0x7ff87dbfde50>, 'annotate': <itemregister(_generics=set()) at remote 0x7ff87d6ad9f0>, 'auth': <itemregister(_generics=set()) at remote 0x7ff87dc037c0>, 'blackbox': <itemregister(_generics=set()) at remote 0x7ff87d...(truncated), args=(),
      type=0x55d0d8ea5b40 <_RNvNvMs1R_CsgCrAUYYhx1D_9pytracingNtB8_13EventCallsite15create_instance11TYPE_OBJECT.llvm.4665269759137401160>) at Objects/typeobject.c:974
  #54 _PyObject_MakeTpCall (callable=<type at remote 0x55d0d8ea5b40>, args=<optimized out>,
      nargs=<optimized out>, keywords=<optimized out>) at Objects/call.c:159
  #55 0x00007ff88d56dc81 in _PyObject_Vectorcall (
      kwnames=('obj', 'name', 'target', 'level', 'fieldnames'), nargsf=<optimized out>,
      args=<optimized out>, callable=<type at remote 0x55d0d8ea5b40>) at ./Include/cpython/abstract.h:125
  #56 _PyObject_Vectorcall (kwnames=('obj', 'name', 'target', 'level', 'fieldnames'),
      nargsf=<optimized out>, args=<optimized out>, callable=<type at remote 0x55d0d8ea5b40>)
      at ./Include/cpython/abstract.h:115
  #57 call_function (kwnames=('obj', 'name', 'target', 'level', 'fieldnames'), oparg=<optimized out>,
      pp_stack=<synthetic pointer>, tstate=<optimized out>) at Python/ceval.c:4963
  #58 _PyEval_EvalFrameDefault (f=<optimized out>, throwflag=<optimized out>) at Python/ceval.c:3515
  #59 0x00007ff88d566268 in PyEval_EvalFrameEx (throwflag=0,
      f=Frame 0x7ff87ccec890, for file /opt/fb/mercurial/edenscm/tracing.py, line 337, in event (message='worker process exited (pid=3953122)', name=None, target=None, level=1, depth=1, meta={}, frame=Frame 0x7ff87c1f8440, for file /opt/fb/mercurial/edenscm/mercurial/commandserver.py, line 608, in _reapworkers (self=<unixforkingservice(ui=<ui(_buffers=[], _bufferstates=[], _bufferapplylabels=None, _outputui=None, callhooks=True, insecureconnections=False, _colormode=None, _styles={}, _terminaloutput=None, cmdname=None, _uiconfig=<uiconfig(quiet=False, verbose=False, debugflag=False, tracebackflag=False, logmeasuredtimes=False, _rcfg=<localrcfg(_rcfg=<bindings.configparser.config at remote 0x7ff87d7325d0>) at remote 0x7ff87eb73e80>, _unserializable={}, _pinnedconfigs=set(), _knownconfig={'alias': <itemregister(_generics={<configitem(section='alias', name='.*', default=None, alias=[], generic=True, priority=0, _re=<re.Pattern at remote 0x7ff87d69bed0>) at remote 0x7ff87d690a60>}) at remote 0x7ff87dbfde50>, 'annotate': ...(truncated))
      at Python/ceval.c:741
  #60 _PyEval_EvalCodeWithName (_co=<optimized out>, globals=<optimized out>, locals=<optimized out>,
      args=<optimized out>, argcount=<optimized out>, kwnames=<optimized out>, kwargs=0x7ff87c0fdd78,
      kwcount=<optimized out>, kwstep=1, defs=0x7ff87d572558, defcount=4, kwdefs=0x0, closure=0x0,
      name='event', qualname='event') at Python/ceval.c:4298
  #61 0x00007ff88d57fdce in _PyFunction_Vectorcall (func=<optimized out>, stack=0x7ff87c0fdd70,
      nargsf=<optimized out>, kwnames=<optimized out>) at Objects/call.c:435
  #62 0x00007ff88d57574f in _PyObject_FastCallDict (callable=<function at remote 0x7ff87d5741f0>,
      args=0x7ff87eb59d18, nargsf=<optimized out>, kwargs=<optimized out>) at Objects/call.c:104
  #63 0x00007ff88d66d1ab in partial_fastcall (kwargs={'level': 1, 'depth': 1}, nargs=<optimized out>,
      args=<optimized out>, pto=0x7ff87d572630) at ./Modules/_functoolsmodule.c:169
  #64 partial_call (pto=0x7ff87d572630, args=<optimized out>, kwargs=<optimized out>)
      at ./Modules/_functoolsmodule.c:224
  #65 0x00007ff88d576331 in _PyObject_MakeTpCall (callable=<functools.partial at remote 0x7ff87d572630>,
      args=<optimized out>, nargs=<optimized out>, keywords=<optimized out>) at Objects/object.c:2207
  #66 0x00007ff88d56dc81 in _PyObject_Vectorcall (kwnames=('depth',), nargsf=<optimized out>,
      args=<optimized out>, callable=<functools.partial at remote 0x7ff87d572630>)
      at ./Include/cpython/abstract.h:125
  #67 _PyObject_Vectorcall (kwnames=('depth',), nargsf=<optimized out>, args=<optimized out>,
      callable=<functools.partial at remote 0x7ff87d572630>) at ./Include/cpython/abstract.h:115
  #68 call_function (kwnames=('depth',), oparg=<optimized out>, pp_stack=<synthetic pointer>,
      tstate=<optimized out>) at Python/ceval.c:4963
  #69 _PyEval_EvalFrameDefault (f=<optimized out>, throwflag=<optimized out>) at Python/ceval.c:3515
  #70 0x00007ff88d566268 in PyEval_EvalFrameEx (throwflag=0,
      f=Frame 0x7ff87c0e4200, for file /opt/fb/mercurial/edenscm/mercurial/ui.py, line 1275, in debug (self=<ui(_buffers=[], _bufferstates=[], _bufferapplylabels=None, _outputui=None, callhooks=True, insecureconnections=False, _colormode=None, _styles={}, _terminaloutput=None, cmdname=None, _uiconfig=<uiconfig(quiet=False, verbose=False, debugflag=False, tracebackflag=False, logmeasuredtimes=False, _rcfg=<localrcfg(_rcfg=<bindings.configparser.config at remote 0x7ff87d7325d0>) at remote 0x7ff87eb73e80>, _unserializable={}, _pinnedconfigs=set(), _knownconfig={'alias': <itemregister(_generics={<configitem(section='alias', name='.*', default=None, alias=[], generic=True, priority=0, _re=<re.Pattern at remote 0x7ff87d69bed0>) at remote 0x7ff87d690a60>}) at remote 0x7ff87dbfde50>, 'annotate': <itemregister(_generics=set()) at remote 0x7ff87d6ad9f0>, 'auth': <itemregister(_generics=set()) at remote 0x7ff87dc037c0>, 'blackbox': <itemregister(_generics=set()) at remote 0x7ff87dc039a0>, 'bookmarks': <itemregister(_generics=se...(truncated))
      at Python/ceval.c:741
  #71 _PyEval_EvalCodeWithName (_co=<optimized out>, globals=<optimized out>, locals=<optimized out>,
      args=<optimized out>, argcount=<optimized out>, kwnames=<optimized out>, kwargs=0x7ff87c1f85e8,
      kwcount=<optimized out>, kwstep=1, defs=0x0, defcount=0, kwdefs=0x0, closure=0x0, name='debug',
      qualname='ui.debug') at Python/ceval.c:4298
  #72 0x00007ff88d57fdce in _PyFunction_Vectorcall (func=<optimized out>, stack=0x7ff87c1f85d8,
      nargsf=<optimized out>, kwnames=<optimized out>) at Objects/call.c:435
  #73 0x00007ff88d56821a in _PyObject_Vectorcall (kwnames=0x0, nargsf=<optimized out>,
      args=0x7ff87c1f85d8, callable=<function at remote 0x7ff87d57e9d0>)
      at ./Include/cpython/abstract.h:127
  #74 call_function (kwnames=0x0, oparg=<optimized out>, pp_stack=<synthetic pointer>,
      tstate=0x7ff88ca08780) at Python/ceval.c:4963
  #75 _PyEval_EvalFrameDefault (f=<optimized out>, throwflag=<optimized out>) at Python/ceval.c:3486
  #76 0x00007ff88d57fd38 in PyEval_EvalFrameEx (throwflag=0,
      f=Frame 0x7ff87c1f8440, for file /opt/fb/mercurial/edenscm/mercurial/commandserver.py, line 608, in _reapworkers (self=<unixforkingservice(ui=<ui(_buffers=[], _bufferstates=[], _bufferapplylabels=None, _outputui=None, callhooks=True, insecureconnections=False, _colormode=None, _styles={}, _terminaloutput=None, cmdname=None, _uiconfig=<uiconfig(quiet=False, verbose=False, debugflag=False, tracebackflag=False, logmeasuredtimes=False, _rcfg=<localrcfg(_rcfg=<bindings.configparser.config at remote 0x7ff87d7325d0>) at remote 0x7ff87eb73e80>, _unserializable={}, _pinnedconfigs=set(), _knownconfig={'alias': <itemregister(_generics={<configitem(section='alias', name='.*', default=None, alias=[], generic=True, priority=0, _re=<re.Pattern at remote 0x7ff87d69bed0>) at remote 0x7ff87d690a60>}) at remote 0x7ff87dbfde50>, 'annotate': <itemregister(_generics=set()) at remote 0x7ff87d6ad9f0>, 'auth': <itemregister(_generics=set()) at remote 0x7ff87dc037c0>, 'blackbox': <itemregister(_generics=set()) at remote 0x7ff87dc039a0>,...(truncated))
      at Python/ceval.c:738
  #77 function_code_fastcall (globals=<optimized out>, nargs=2, args=<optimized out>, co=<optimized out>)
      at Objects/call.c:283
  #78 _PyFunction_Vectorcall (func=<optimized out>, stack=<optimized out>, nargsf=<optimized out>,
      kwnames=<optimized out>) at Objects/call.c:410
  #79 0x00007ff88d56821a in _PyObject_Vectorcall (kwnames=0x0, nargsf=<optimized out>,
      args=0x7ff87c0f2528, callable=<function at remote 0x7ff87d73b310>)
      at ./Include/cpython/abstract.h:127
  #80 call_function (kwnames=0x0, oparg=<optimized out>, pp_stack=<synthetic pointer>,
      tstate=0x7ff88ca08780) at Python/ceval.c:4963
  #81 _PyEval_EvalFrameDefault (f=<optimized out>, throwflag=<optimized out>) at Python/ceval.c:3486
  #82 0x00007ff88d57fd38 in PyEval_EvalFrameEx (throwflag=0,
      f=Frame 0x7ff87c0f23a0, for file /opt/fb/mercurial/edenscm/mercurial/commandserver.py, line 591, in _sigchldhandler (self=<unixforkingservice(ui=<ui(_buffers=[], _bufferstates=[], _bufferapplylabels=None, _outputui=None, callhooks=True, insecureconnections=False, _colormode=None, _styles={}, _terminaloutput=None, cmdname=None, _uiconfig=<uiconfig(quiet=False, verbose=False, debugflag=False, tracebackflag=False, logmeasuredtimes=False, _rcfg=<localrcfg(_rcfg=<bindings.configparser.config at remote 0x7ff87d7325d0>) at remote 0x7ff87eb73e80>, _unserializable={}, _pinnedconfigs=set(), _knownconfig={'alias': <itemregister(_generics={<configitem(section='alias', name='.*', default=None, alias=[], generic=True, priority=0, _re=<re.Pattern at remote 0x7ff87d69bed0>) at remote 0x7ff87d690a60>}) at remote 0x7ff87dbfde50>, 'annotate': <itemregister(_generics=set()) at remote 0x7ff87d6ad9f0>, 'auth': <itemregister(_generics=set()) at remote 0x7ff87dc037c0>, 'blackbox': <itemregister(_generics=set()) at remote 0x7ff87dc039a...(truncated))
      at Python/ceval.c:738
  #83 function_code_fastcall (globals=<optimized out>, nargs=3, args=<optimized out>, co=<optimized out>)
      at Objects/call.c:283
  #84 _PyFunction_Vectorcall (func=<optimized out>, stack=<optimized out>, nargsf=<optimized out>,
      kwnames=<optimized out>) at Objects/call.c:410
  #85 0x00007ff88d592153 in _PyObject_Vectorcall (kwnames=<optimized out>, nargsf=<optimized out>,
      args=<optimized out>, callable=<optimized out>) at ./Include/cpython/abstract.h:115
  #86 method_vectorcall (method=<optimized out>, args=<optimized out>, nargsf=<optimized out>,
      kwnames=<optimized out>) at Objects/classobject.c:67
  #87 0x00007ff88d5963fb in PyVectorcall_Call (kwargs=0x0, tuple=<optimized out>,
      callable=<method at remote 0x7ff87c70d5c0>) at Objects/dictobject.c:1802
  #88 PyObject_Call (callable=<method at remote 0x7ff87c70d5c0>, args=<optimized out>, kwargs=0x0)
      at Objects/call.c:227
  #89 0x00007ff88d6405ea in _PyErr_CheckSignals () at ./Modules/signalmodule.c:1689
  #90 0x00007ff88d59b7fd in _PyErr_CheckSignals () at ./Modules/signalmodule.c:1660
  #91 PyErr_CheckSignals () at ./Modules/signalmodule.c:1649
  ....

Reviewed By: DurhamG

Differential Revision: D27111187

fbshipit-source-id: 1aa29ab24088b57b98de3741eb81c0a7be01237d
facebook-github-bot pushed a commit that referenced this pull request Mar 19, 2021
Summary:
This introduces a basic building block for query batching. I called this
rendezvous, since it's about multiple queries meeting up in the same place :)

There are a few (somewhat conflicting) goals this tries to satisfy, so let's go
over them:

1), we'd like to reduce the total number of queries made by batch jobs. For
example, group hg bonsai lookups made by the walker. Those jobs are
characterized by the fact that they have a lot of queries to make, all the
time. Here's an example: https://fburl.com/ods/zuiep7yh.

2), we'd like to reduce the overall number of connections held to MySQL by
our tasks. The main way we achieve this is by reducing the maximum number of
concurrent queries. Indeed, a high total number of queries doesn't necessarily
result in a lot of connections as long as they're not concurrent, because we
can reuse connections. On the other hand, if you dispatch 100 concurrent
queries, that _does_ use 100 connections. This is something that applies to
batch jobs due to their query volume, but also to "interactive" jobs like
Mononoke Server or SCS, just not all the time. Here's an example:
https://fburl.com/ods/o6gp07qp (you can see the query count is overall low, but
sometimes spikes substantially).

2.1) It's also worth noting that concurrent queries are often the result of
many clients wanting the same data, so deduplication is also useful here.

3), we also don't want to impact the latency of interactive jobs when they
need to do a little query here or there (i.e. it's largely fine if our jobs all
hold a few connections to MySQL and use them somewhat consistently).

4), we'd like this to make it easier to do batching right. For example, if
you have 100 Bonsais to map to hg, you should be able to just map and call
`future::try_join_all` and have that do the right thing.

5), we don't want "bad" queries to affect other queries negatively. One
example would be the occasional queries we make to Bonsai <-> Hg mapping in
`known` for thousands (if not more) of rows.

6), we want this to be easy to incorporate into the codebase.

So, how do we try to address all of this? Here's how:

- We ... do batching, and we deduplicate requests in a batch. This is the
  easier bit and should address #1, #2, #2.1 and #4.
- However, batching is conditional. We notably don't batch very large requests
  with the rest (addresses #5). We also don't batch small queries all the time:
  we only batch if we are observing a throughput of queries that suggests we
  can find some benefit in batching (this targets #3).
- Finally, we have some utilities for common cases like having to group by repo
  id (this is `MultiRendezVous`), and this is all configurable via tunables
  (and the default is to not do anything).
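The first bullet - batching plus per-batch deduplication - can be sketched in Python with asyncio (an illustrative toy of the idea only, not the actual Rust `RendezVous` implementation; all names here are invented):

```python
import asyncio

class Rendezvous:
    """Toy batcher: concurrent lookups rendezvous into a single backend
    query, and duplicate keys share one in-flight result."""

    def __init__(self, batch_query, max_delay=0.01):
        self.batch_query = batch_query  # async: set of keys -> {key: value}
        self.max_delay = max_delay      # how long a batch waits to fill up
        self.pending = {}               # key -> Future shared by all waiters
        self.flusher = None

    async def get(self, key):
        if key in self.pending:         # deduplicate: join the in-flight batch
            return await self.pending[key]
        fut = asyncio.get_running_loop().create_future()
        self.pending[key] = fut
        if self.flusher is None:        # first key of a batch arms the flush
            self.flusher = asyncio.create_task(self._flush_soon())
        return await fut

    async def _flush_soon(self):
        await asyncio.sleep(self.max_delay)           # let more keys arrive
        batch, self.pending, self.flusher = self.pending, {}, None
        results = await self.batch_query(set(batch))  # one query per batch
        for key, fut in batch.items():
            fut.set_result(results.get(key))
```

Callers simply `await rv.get(key)` (or `asyncio.gather` a hundred of them, mirroring goal #4); however many concurrent callers there are, each distinct key hits the backend once per batch.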

Reviewed By: StanislavGlebik

Differential Revision: D27010317

fbshipit-source-id: 4a2397255f9785c6722c02e4d419438fd0aafa07
facebook-github-bot pushed a commit that referenced this pull request Apr 8, 2021
Summary:
We already log the file count, but file sizes are another useful piece of
information.

I evaluated two options - either do as I did in this diff, or change the ScribeToScuba
logging Python script to query scs to get file sizes. I opted for option #1
mainly because scs doesn't have a method to query file sizes for many files at once, and
querying one by one might be too expensive. We can add a method like that, but
that's a bit of a bigger change than I'd like.

Reviewed By: andll

Differential Revision: D27620507

fbshipit-source-id: 2618e60845bc293535b190d4da85df5667a7ab60
facebook-github-bot pushed a commit that referenced this pull request Jul 12, 2021
Summary:
Implement uploading of changesets via EdenAPI in the `hg cloud upload` command.

This is the last part for `hg cloud upload` - uploading changesets via EdenAPI.

Test:

```
# machine #2
liubovd {emoji:1f352}  ~/fbsource
 [15] → hg pull -r 0b6075b4bda143d5212c1525323fb285d96a1afb
pulling from mononoke://mononoke.c2p.facebook.net/fbsource
connected to twshared27150.03.cln3.facebook.com session RaIPDgvF6l8rmXkA
abort: 0b6075b4bda143d5212c1525323fb285d96a1afb not found!
```

```
# machine #1
devvm1006.cln0 {emoji:1f440}   ~/fbsource/fbcode/eden/scm
 [6] →  EDENSCM_LOG="edenapi::client=info" ./hg cloud upload
Jul 11 13:26:26.322  INFO edenapi::client: Requesting lookup for 1 item(s)
commitcloud: head '0b6075b4bda1' hasn't been uploaded yet
Jul 11 13:26:26.472  INFO edenapi::client: Requesting lookup for 6 item(s)
commitcloud: queue 1 commit for upload
Jul 11 13:26:26.648  INFO edenapi::client: Requesting lookup for 1 item(s)
commitcloud: queue 0 files for upload
Jul 11 13:26:26.698  INFO edenapi::client: Requesting lookup for 4 item(s)
commitcloud: queue 4 trees for upload
Jul 11 13:26:27.393  INFO edenapi::client: Requesting trees upload for 4 item(s)
commitcloud: uploaded 4 trees
commitcloud: uploading commit '0b6075b4bda143d5212c1525323fb285d96a1afb'...
Jul 11 13:26:28.426  INFO edenapi::client: Requesting changesets upload for 1 item(s)
commitcloud: uploaded 1 commit
```
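The transcript above shows the shape of each phase: first a lookup for the candidate items, then an upload of only the items the server is missing. A rough sketch of that flow (the `client` API here is hypothetical):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Item:
    """A thing to upload (file, tree or changeset), keyed by its hash."""
    id: str
    data: bytes = b""

def upload_missing(client, items):
    """Upload only the items the server doesn't already have.

    `client.lookup(ids)` returns the subset of ids the server knows about;
    `client.upload(item)` uploads one item. Both are hypothetical."""
    present = client.lookup({item.id for item in items})
    queued = [item for item in items if item.id not in present]
    for item in queued:  # in practice: files, then trees, then changesets
        client.upload(item)
    return len(queued)
```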

```
# machine #2
liubovd {emoji:1f352}  ~/fbsource
 [16] → hg pull -r 0b6075b4bda143d5212c1525323fb285d96a1afb
pulling from mononoke://mononoke.c2p.facebook.net/fbsource
connected to twshared16001.08.cln2.facebook.com session QCpy1x9yrflRF6xF
searching for changes
adding commits
adding manifests
adding file changes
added 895 commits with 0 changes to 0 files
(running background incremental repack)
prefetching trees for 4 commits

liubovd {emoji:1f352}  ~/fbsource
 [17] → hg up 0b6075b4bda143d5212c1525323fb285d96a1afb
warning: watchman has recently started (pid 93231) - operation will be slower than usual
connected to twshared32054.08.cln2.facebook.com session Hw91G8kRYzt4c5BV
1 files updated, 0 files merged, 0 files removed, 0 files unresolved

liubovd {emoji:1f352}  ~/fbsource
 [18] → hg diff -c .
connected to twshared0965.07.cln2.facebook.com session rrYSvRM6pnBYZ2Fn
 diff --git a/fbcode/eden/scm/test b/fbcode/eden/scm/test
new file mode 100644
 --- /dev/null
+++ b/fbcode/eden/scm/test
@@ -0,0 +1,1 @@
+test
```

Initial perf wins:

Having a large stack of 6 commits (24 files changed in total), I tested *adding a single line to a file in the top commit*. We can see at least a 2x win, and it should be more in practice, because I tested with a local instance of the edenapi service running on my devserver.

```
╷
╷ @  5582fc8ee  6 minutes ago  liubovd
╷ │  test
╷ │
╷ o  d55f9bb65  86 minutes ago  liubovd  D29644738
╷ │  [hg] edenapi: Implement using of uploading changesets in `hg cloud upload` command
╷ │
╷ o  561149783  Friday at 15:10  liubovd  D29644797
╷ │  [hg] edenapi: Add request handler for uploading hg changesets
╷ │
╷ o  c3dda964a  Friday at 15:10  liubovd  D29644800
╷ │  [edenapi_service] Add new /:repo/upload/changesets endpoint
╷ │
╷ o  28ce2fa0c  Friday at 15:10  liubovd  D29644799
╷ │  [hg] edenapi/edenapi_service: Add new API for uploading Hg Changesets
╷ │
╷ o  13325b361  Yesterday at 15:23  liubovd  D29644798
╭─╯  [edenapi_service] Implement uploading of hg changesets
```

```
# adding new line to a file test in the test commit, and then run:
devvm1006.cln0 {emoji:1f440}   ~/fbsource/fbcode/eden/scm
 [8] → time hg cloud upload
commitcloud: head '4e4f947d73e6' hasn't been uploaded yet
commitcloud: queue 1 commit for upload
commitcloud: queue 0 files for upload
commitcloud: queue 4 trees for upload
commitcloud: uploaded 4 trees
commitcloud: uploading commit '4e4f947d73e676b63df7c90c4e707d38e6d0a93b'...
commitcloud: uploaded 1 commit

real	0m3.778s
user	0m0.017s
sys	0m0.027s
```

```
# adding another new line to a file test in the test commit, and then run:
devvm1006.cln0 {emoji:1f440}   ~/fbsource/fbcode/eden/scm
 [11] → time hg cloud backup
connected to twshared30574.02.cln2.facebook.com session uvOvhxtBfeM7pMgl
backing up stack rooted at 13325b3612d2
commitcloud: backed up 1 commit

real	0m7.507s
user	0m0.013s
sys	0m0.030s
```

Test the force mode of the new command, which reuploads everything:
```
devvm1006.cln0 {emoji:1f440}   ~/fbsource/fbcode/eden/scm
 [13] → time hg cloud upload --force
commitcloud: head '5582fc8ee382' hasn't been uploaded yet
commitcloud: queue 6 commits for upload
commitcloud: queue 24 files for upload
commitcloud: uploaded 24 files
commitcloud: queue 61 trees for upload
commitcloud: uploaded 61 trees
commitcloud: uploading commit '13325b3612d20c176923d1aab8a28383cea2ba9a'...
commitcloud: uploading commit '28ce2fa0c6a02de57cdc732db742fd5c8f2611ad'...
commitcloud: uploading commit 'c3dda964a71b65f01fc4ccadc9429ee887ea982c'...
commitcloud: uploading commit '561149783e2fb5916378fe27757dcc2077049f8c'...
commitcloud: uploading commit 'd55f9bb65a0829b1731baa686cb8a6e0c5500cc2'...
commitcloud: uploading commit '5582fc8ee382c4c367a057db2a1781377bf55ba4'...
commitcloud: uploaded 6 commits

real	0m7.830s
user	0m0.011s
sys	0m0.032s
```

We can see the time is similar to the current `hg cloud backup` command.

Reviewed By: markbt

Differential Revision: D29644738

fbshipit-source-id: cbbfcb2e8018f83f323f447848b3b6045baf47c5
facebook-github-bot pushed a commit that referenced this pull request Jul 28, 2021
…mputed

Summary:
# Goal of the stack

The goal of this stack is to make the megarepo API safer to use. In particular, we want to achieve:
1) If the same request is executed a few times, then it won't corrupt the repo in any way (i.e. it won't create commits that the client didn't intend to create, and it won't move bookmarks to unpredictable places)
2) If a request finished successfully but we failed to send the success to the client, then repeating the same request will finish successfully.

Achieving #1 is necessary because async_requests_worker might execute a few requests at the same time (though this should be rare). Achieving #2 is necessary because if we fail to send a successful response to the client (e.g. because of network issues), we want the client to retry and get this successful response back, so that the client can continue with their next request.

In order to achieve #1 we make all bookmark moves conditional, i.e. we move a bookmark only if the current location of the bookmark is where the client expects it. This should help achieve goal #1, because even if we have two requests executing at the same time, only one of them will successfully move the bookmark.

However, once we achieve #1 we have a problem with #2 - if a request was successful but we failed to send a successful reply back to the client, the client will retry the request, and that retry will fail, because the bookmark is already at the new location (the previous request was successful) while the client expects it to be at the old location (the client doesn't know the request was successful). To fix this, before executing the request we check whether it was already successful, and we do so heuristically by checking the request parameters and verifying the commit remapping state. This doesn't protect against malicious clients, but it should protect from issue #2 described above.

So the whole stack of diffs is the following:
1) take a method from megarepo api
2) implement a diff that makes bookmark moves conditional
3) Fix the problem #2 by checking if a previous request was successful or not
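The conditional move in step 2 is essentially a compare-and-swap on the bookmark. A minimal in-memory sketch of that idea (toy code with invented names, not Mononoke's bookmark API):

```python
import threading

class Bookmarks:
    """Toy bookmark store where every move is a compare-and-swap."""

    def __init__(self):
        self._marks = {}
        self._lock = threading.Lock()

    def move(self, name, expected, new):
        """Move `name` to `new` only if it currently points at `expected`.

        Returns True on success, False if someone else moved it first."""
        with self._lock:
            if self._marks.get(name) != expected:
                return False  # bookmark isn't where the client expects it
            self._marks[name] = new
            return True
```

If two identical requests race, only one CAS succeeds; the retry of an already-applied request fails the CAS, which is exactly why step 3 adds a check for whether the request already succeeded.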

# This diff

If a previous add_sync_target() call was successful on the Mononoke side, but we
failed to deliver this result to the client (e.g. due to network issues), then the
client would just retry this call. Before this diff that wouldn't work (we would
simply fail to create the bookmark because it already exists). This diff fixes
that by looking at the commit the bookmark points to and checking whether it looks
like it was created by a previous add_sync_target call. In particular, it checks
that the remapping state file matches the request parameters, and that the config
version is the same.
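That heuristic can be sketched as a pure check (field names here are invented for illustration; the real code compares the remapping state file and the config version recorded in the commit):

```python
def looks_already_applied(bookmark_commit, remapping_state, request):
    """Return True if the commit the target bookmark points to appears to
    have been created by a previous, identical add_sync_target request."""
    if bookmark_commit is None:
        return False  # bookmark was never created, so nothing was applied
    return (
        remapping_state.get("source_changesets") == request["source_changesets"]
        and remapping_state.get("config_version") == request["config_version"]
    )
```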

Differential Revision: D29848377

fbshipit-source-id: 16687d975748929e5eea8dfdbc9e206232ec9ca6
facebook-github-bot pushed a commit that referenced this pull request Jul 28, 2021
…as computed

Summary:
# Goal of the stack

The goal of this stack is to make the megarepo API safer to use. In particular, we want to achieve:
1) If the same request is executed a few times, then it won't corrupt the repo in any way (i.e. it won't create commits that the client didn't intend to create, and it won't move bookmarks to unpredictable places)
2) If a request finished successfully but we failed to send the success to the client, then repeating the same request will finish successfully.

Achieving #1 is necessary because async_requests_worker might execute a few requests at the same time (though this should be rare). Achieving #2 is necessary because if we fail to send a successful response to the client (e.g. because of network issues), we want the client to retry and get this successful response back, so that the client can continue with their next request.

In order to achieve #1 we make all bookmark moves conditional, i.e. we move a bookmark only if the current location of the bookmark is where the client expects it. This should help achieve goal #1, because even if we have two requests executing at the same time, only one of them will successfully move the bookmark.

However, once we achieve #1 we have a problem with #2 - if a request was successful but we failed to send a successful reply back to the client, the client will retry the request, and that retry will fail, because the bookmark is already at the new location (the previous request was successful) while the client expects it to be at the old location (the client doesn't know the request was successful). To fix this, before executing the request we check whether it was already successful, and we do so heuristically by checking the request parameters and verifying the commit remapping state. This doesn't protect against malicious clients, but it should protect from issue #2 described above.

So the whole stack of diffs is the following:
1) take a method from megarepo api
2) implement a diff that makes bookmark moves conditional
3) Fix the problem #2 by checking if a previous request was successful or not

# This diff

Same as with D29848377 - if result was already computed and client retries the
same request, then return it.

Differential Revision: D29874802

fbshipit-source-id: ebc2f709bc8280305473d6333d0725530c131872
facebook-github-bot pushed a commit that referenced this pull request Jul 28, 2021
Summary:
# Goal of the stack

The goal of this stack is to make the megarepo API safer to use. In particular, we want to achieve:
1) If the same request is executed a few times, then it won't corrupt the repo in any way (i.e. it won't create commits that the client didn't intend to create, and it won't move bookmarks to unpredictable places)
2) If a request finished successfully but we failed to send the success to the client, then repeating the same request will finish successfully.

Achieving #1 is necessary because async_requests_worker might execute a few requests at the same time (though this should be rare). Achieving #2 is necessary because if we fail to send a successful response to the client (e.g. because of network issues), we want the client to retry and get this successful response back, so that the client can continue with their next request.

In order to achieve #1 we make all bookmark moves conditional, i.e. we move a bookmark only if the current location of the bookmark is where the client expects it. This should help achieve goal #1, because even if we have two requests executing at the same time, only one of them will successfully move the bookmark.

However, once we achieve #1 we have a problem with #2 - if a request was successful but we failed to send a successful reply back to the client, the client will retry the request, and that retry will fail, because the bookmark is already at the new location (the previous request was successful) while the client expects it to be at the old location (the client doesn't know the request was successful). To fix this, before executing the request we check whether it was already successful, and we do so heuristically by checking the request parameters and verifying the commit remapping state. This doesn't protect against malicious clients, but it should protect from issue #2 described above.

So the whole stack of diffs is the following:
1) take a method from megarepo api
2) implement a diff that makes bookmark moves conditional
3) Fix the problem #2 by checking if a previous request was successful or not

# This diff

We already have it for change_target_config, and it's useful for preventing races
and inconsistencies. That's especially important given that our async request
worker might run a few identical sync_changeset requests at the same time, and
target_location helps handle this situation correctly.

Let's add target_location to sync_changeset, and while there I also updated the
comment for these fields in other methods. The comment said

```
// This operation will succeed only if the
// `target`'s bookmark is still at the same location
// when this operation tries to advance it
```

This is not always the case - the operation might also succeed if the same operation has been
re-sent twice; see the previous diffs for more explanation and motivation.

Reviewed By: krallin

Differential Revision: D29875242

fbshipit-source-id: c14b2148548abde984c3cb5cc62d04f920240657
facebook-github-bot pushed a commit that referenced this pull request Jul 28, 2021
Summary:
# Goal of the stack

The goal of this stack is to make the megarepo API safer to use. In particular, we want to achieve:
1) If the same request is executed a few times, then it won't corrupt the repo in any way (i.e. it won't create commits that the client didn't intend to create, and it won't move bookmarks to unpredictable places)
2) If a request finished successfully but we failed to send the success to the client, then repeating the same request will finish successfully.

Achieving #1 is necessary because async_requests_worker might execute a few requests at the same time (though this should be rare). Achieving #2 is necessary because if we fail to send a successful response to the client (e.g. because of network issues), we want the client to retry and get this successful response back, so that the client can continue with their next request.

In order to achieve #1 we make all bookmark moves conditional, i.e. we move a bookmark only if the current location of the bookmark is where the client expects it. This should help achieve goal #1, because even if we have two requests executing at the same time, only one of them will successfully move the bookmark.

However, once we achieve #1 we have a problem with #2 - if a request was successful but we failed to send a successful reply back to the client, the client will retry the request, and that retry will fail, because the bookmark is already at the new location (the previous request was successful) while the client expects it to be at the old location (the client doesn't know the request was successful). To fix this, before executing the request we check whether it was already successful, and we do so heuristically by checking the request parameters and verifying the commit remapping state. This doesn't protect against malicious clients, but it should protect from issue #2 described above.

So the whole stack of diffs is the following:
1) take a method from megarepo api
2) implement a diff that makes bookmark moves conditional
3) Fix the problem #2 by checking if a previous request was successful or not

# This diff

Now that we have target_location in the sync_changeset() method,
let's move the bookmark in sync_changeset conditionally, just as in D29874803 (5afc48a).

This prevents race conditions when the same sync_changeset
method is executed twice.

Reviewed By: krallin

Differential Revision: D29876413

fbshipit-source-id: c076e14171c6615fba2cedf4524d442bd25f83ab
facebook-github-bot pushed a commit that referenced this pull request Oct 11, 2021
Summary:
I've investigated the deadlock we hit recently and attempted to fix it with D31310626 (8ca8816).
Now there's a bunch of facts, an unproven theory and a potential fix. Even though the theory is not proven and hence the fix might be wrong, the theory sounds plausible and the fix should be safe to land even if the theory is wrong.

But first, the facts.

### Facts

a)  There's a repro for a deadlock - running `buck-out/gen/eden/mononoke/git/gitimport/gitimport#binary/gitimport --use-mysql-client --repo-name aosp --mononoke-config-path configerator://scm/mononoke/repos/tiers/prod --panic-fate=exit --cachelib-only-blobstore /data/users/stash/gitrepos/boot_images git-range 46cdf6335e1c737f6cf22b89f3438ffabe13b8ae b4106a25a4d8a1168ebe9b7f074740237932b82e` very often deadlocks (maybe not every time, but at least every 3rd run). So even though D31310626 (8ca8816) unblocked the mega_grepo_sync, it doesn't seem like it fixed or even mitigated the issue consistently. Note that I'm running it on devbig which has quite a lot of cpus - this may or may not make deadlock more likely.

b) The code deadlocks on [`find_file_changes()`](https://fburl.com/code/7i6tt7om) function, and inside this function we do two things - first we find what needs to be uploaded using [`bonsai_diff()`](https://fburl.com/code/az3v3sbk) function, and then we use [`do_upload()`](https://fburl.com/code/kgb98kg9) function to upload contents to the mononoke  repo. Note that both `bonsai_diff()` and `do_upload()` use git_pool. Bonsai_diff produces a stream, and then we apply do_upload to each element of the stream and finally we apply [`buffer_unordered(100)`](https://fburl.com/code/6tuqp3jd) to upload them all in parallel.

In simplified pseudo-code it looks like
```
bonsai_diff(..., git_pool, ...)
  .map(|entry| do_upload(..., git_pool, entry, ....)
  .try_buffer_unordered(100)
```

c) I've added a few printlns in P462159232, and ran the repro command until it, well, repros. These printlns just show when there was an attempt to grab the semaphore and when it was successfully grabbed, and also who the caller was - the `bonsai_diff()` or the `do_upload()` function. This is the example output I'm getting: P462159671. The output is a mess, but what's interesting is that there are exactly 100 more entries of "grabbing semaphore uploading" than "grabbed semaphore uploading". And 100 is exactly the size of the buffer in buffer_unordered.

### Theory
So above are the facts, and below is the theory that fits them (though, as I said, the theory is still unproven). Let me start with a few assumptions:

1) Fact `c)` seems to suggest that when we run into the deadlock, there is a full buffer of `do_upload()` futures that are all waiting on a semaphore.
1) If we look at the [`buffer_unordered` code](https://docs.rs/futures-util/0.3.17/src/futures_util/stream/stream/buffer_unordered.rs.html#71), we can see that if the buffer is full, it stops polling the underlying stream until any of the buffered futures is done.
1) Once the semaphore is released, it [wakes up just a handful of futures](https://docs.rs/tokio/1.12.0/src/tokio/sync/batch_semaphore.rs.html#242) that are waiting for it.

Given these assumptions, I believe the following situation is happening:
1) We get into a state where we have 100 `do_upload()` futures in the buffer_unordered stream, all waiting for a semaphore. At the same time, all the semaphore permits are held by the `bonsai_diff()` stream (because of assumption #1)
2) Because of assumption #2, we don't poll the underlying `bonsai_diff()` stream. However, this stream has already [spawned a few tasks](https://fburl.com/code/9iq3tfad) which successfully grabbed semaphore permits and are executing
3) Once these `bonsai_diff()` tasks finish, they [drop the semaphore permit](https://fburl.com/code/sw5fwccw), which causes the semaphore to be released and one of the waiting tasks to be woken up (see assumption #3). Now two things can happen - either a `do_upload()` future is woken up, or a `bonsai_diff()` future is. If the former happens, `buffer_unordered` polls the `do_upload()` future and execution continues. If the latter happens (i.e. a `bonsai_diff()` future was woken up), we get a deadlock: `buffer_unordered()` isn't going to poll the `bonsai_diff()` stream, so that stream can't make progress, while the `do_upload()` futures are never polled because they were never woken up in the first place.

There are a few things in this theory that seem unlikely (e.g. why do we never wake up any of the `do_upload()` futures?), so I do think it's worth validating. We could either add a bunch of println!s in the buffer_unordered and semaphore code, or try to set up [tokio-console](https://github.com/tokio-rs/console).

### Potential fix

The potential fix is simple - just add another spawn around the `GitPool::with()` function. If the theory is right, this fix should work: once the semaphore is released and a `bonsai_diff()` future is woken up, that future is spawned and can therefore run to completion regardless of whether `buffer_unordered()` has its buffer full. And once `bonsai_diff()` fully completes, the `do_upload()` futures will finally be woken up and `buffer_unordered()` will progress to completion.

If the theory is wrong, then we've just added one useless `tokio::spawn()`. We have a lot of spawns all over the place and `tokio::spawn` is cheap, so landing this fix shouldn't make things worse, but it might actually improve them (and the testing I did suggests that it indeed does - see the `Test plan` section below). I don't think it should be seen as a reason not to find the root cause, though.
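The hazard and the fix translate directly into asyncio terms: work that is merely awaited only runs while its consumer polls it, whereas work spawned as a task runs to completion on its own and releases its permit either way. A toy sketch of the spawned version (not the actual `GitPool`):

```python
import asyncio

class Pool:
    """Semaphore-limited pool that spawns each job as an independent task,
    so a job can finish (and release its permit) even while the code that
    requested it isn't being polled."""

    def __init__(self, limit):
        self._sem = asyncio.Semaphore(limit)

    def run(self, fn, *args):
        async def job():
            async with self._sem:  # permit held only while the job runs
                return fn(*args)
        # The spawn is the point: the task makes progress on its own instead
        # of waiting for whoever holds the returned handle to await it.
        return asyncio.create_task(job())
```

Each call returns a task handle that can be awaited later; permits are released as jobs finish, even if awaiting the results is delayed.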

Reviewed By: mojsarn

Differential Revision: D31541432

fbshipit-source-id: 0260226a21e6e5e0b41736f86fb54a3abccd0a0b
facebook-github-bot pushed a commit that referenced this pull request May 7, 2022
Summary:
We trim off the fragment header from the NFS requests we get. Sometimes during
the test_takeover_with_io tests this was crashing in this code with:

```
F0418 20:55:24.216660 25733 IOBuf.h:765] Check failed: amount <= length_ (4 vs. 1)
*** Check failure stack trace: ***
*** Aborted at 1650340524 (Unix time, try 'date -d 1650340524') ***
*** Signal 6 (SIGABRT) (0x50aa) received by PID 20650 (pthread TID 0x7f5f83aa9700) (linux TID 25733) (maybe from PID 20650, UID 0) (code: -6), stack trace: ***
    @ 00000000000192fd folly::symbolizer::(anonymous namespace)::innerSignalHandler(int, siginfo_t*, void*)
                       ./fbcode/folly/experimental/symbolizer/SignalHandler.cpp:449
    @ 00000000000102ab folly::symbolizer::(anonymous namespace)::signalHandler(int, siginfo_t*, void*)
                       ./fbcode/folly/experimental/symbolizer/SignalHandler.cpp:470
    @ 0000000000000000 (unknown)
    @ 000000000003e530 __GI_raise
    @ 000000000002551c __GI_abort
    @ 000000000000dbac google::LogMessage::Fail()
                       /home/engshare/third-party2/glog/0.3.2_fb/src/glog-0.3.2/src/logging.cc:1519
    @ 00000000000109f2 google::LogMessage::SendToLog()
                       /home/engshare/third-party2/glog/0.3.2_fb/src/glog-0.3.2/src/logging.cc:1473
    @ 000000000000d7f2 google::LogMessage::Flush()
                       /home/engshare/third-party2/glog/0.3.2_fb/src/glog-0.3.2/src/logging.cc:1346
    @ 000000000000eaa8 google::LogMessageFatal::~LogMessageFatal()
                       /home/engshare/third-party2/glog/0.3.2_fb/src/glog-0.3.2/src/logging.cc:2013
    @ 000000000009dead folly::IOBuf::trimStart(unsigned long)
                       ./buck-out/v2/gen/fbcode/8107dfcc75db0fad/folly/io/__iobuf__/headers/folly/io/IOBuf.h:765
                       -> ./fbcode/eden/fs/nfs/rpc/Server.cpp
    @ 000000000009ce3f facebook::eden::RpcTcpHandler::tryConsumeReadBuffer()::$_0::operator()()
                       ./fbcode/eden/fs/nfs/rpc/Server.cpp:243
    @ 000000000009bedc void folly::detail::function::FunctionTraits<void ()>::callSmall<facebook::eden::RpcTcpHandler::tryConsumeReadBuffer()::$_0>(folly::detail::function::Data&)
                       ./buck-out/v2/gen/fbcode/8107dfcc75db0fad/folly/__function__/headers/folly/Function.h:364
                       -> ./fbcode/eden/fs/nfs
...
...
...
y/__named_thread_factory__/headers/folly/executors/thread_factory/NamedThreadFactory.h:40
                       -> ./fbcode/configerator/distribution/api/ConfigeratorStaticData.cpp
    @ 000000000033b32c void std::__invoke_impl<void, folly::NamedThreadFactory::newThread(folly::Function<void ()>&&)::{lambda()#1}>(std::__invoke_other, folly::NamedThreadFactory::newThread(folly::Function<void ()>&&)::{lambda()#1}&&)
                       ./fbcode/third-party-buck/platform009/build/libgcc/include/c++/trunk/bits/invoke.h:60
                       -> ./fbcode/configerator/distribution/api/ConfigeratorStaticData.cpp
    @ 000000000033b2bc std::__invoke_result<folly::NamedThreadFactory::newThread(folly::Function<void ()>&&)::{lambda()#1}>::type std::__invoke<folly::NamedThreadFactory::newThread(folly::Function<void ()>&&)::{lambda()#1}>(std::__invoke_result&&, (folly::NamedThreadFactory::newThread(folly::Function<void ()>&&)::{lambda()#1}&&)...)
                       ./fbcode/third-party-buck/platform009/build/libgcc/include/c++/trunk/bits/invoke.h:95
                       -> ./fbcode/configerator/distribution/api/ConfigeratorStaticData.cpp
    @ 000000000033b294 void std::thread::_Invoker<std::tuple<folly::NamedThreadFactory::newThread(folly::Function<void ()>&&)::{lambda()#1}> >::_M_invoke<0ul>(std::_Index_tuple<0ul>)
                       ./fbcode/third-party-buck/platform009/build/libgcc/include/c++/trunk/thread:244
                       -> ./fbcode/configerator/distribution/api/ConfigeratorStaticData.cpp
    @ 000000000033b264 std::thread::_Invoker<std::tuple<folly::NamedThreadFactory::newThread(folly::Function<void ()>&&)::{lambda()#1}> >::operator()()
                       ./fbcode/third-party-buck/platform009/build/libgcc/include/c++/trunk/thread:251
                       -> ./fbcode/configerator/distribution/api/ConfigeratorStaticData.cpp
    @ 000000000033b0ed std::thread::_State_impl<std::thread::_Invoker<std::tuple<folly::NamedThreadFactory::newThread(folly::Function<void ()>&&)::{lambda()#1}> > >::_M_run()
                       ./fbcode/third-party-buck/platform009/build/libgcc/include/c++/trunk/thread:195
                       -> ./fbcode/configerator/distribution/api/ConfigeratorStaticData.cpp
    @ 00000000000d9660 execute_native_thread_routine
    @ 000000000000920b start_thread
    @ 000000000011816e __GI___clone
```

The reason we are crashing is that the first IOBuf in the chain does not
necessarily contain the full fragment header. So we might attempt to trim off
more data than is in the IOBuf. We can fix this by trimming from the IOBuf chain
instead.
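
The idea of the fix can be illustrated with a small standalone sketch (a Rust stand-in for folly's `IOBuf` chain; `trim_start_chain` is a hypothetical helper, not the actual EdenFS code):

```rust
// Model of the fix: trim `amount` bytes off the *chain*, walking across
// buffers, rather than trimming the first buffer alone (which crashes
// when the first buffer holds fewer bytes than the 4-byte header).
fn trim_start_chain(chain: &mut Vec<Vec<u8>>, mut amount: usize) {
    while amount > 0 {
        let take = {
            let first = chain.first_mut().expect("not enough data in chain");
            let take = amount.min(first.len());
            first.drain(..take);
            take
        };
        if chain[0].is_empty() {
            chain.remove(0); // buffer exhausted, continue into the next one
        }
        amount -= take;
    }
}

fn main() {
    // As in the crash: the first buffer holds only 1 of the 4 header bytes.
    let mut chain: Vec<Vec<u8>> = vec![vec![0xAA], vec![0xBB, 0xCC, 0xDD, 1, 2, 3]];
    trim_start_chain(&mut chain, 4);
    assert_eq!(chain, vec![vec![1, 2, 3]]);
    println!("{chain:?}");
}
```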

To be honest, I don't know why only this test triggers the problem; it seems like EdenFS should
crash more often than that. But it should be fixed now anyway.

Reviewed By: xavierd

Differential Revision: D35863737

fbshipit-source-id: 21640b252a703fe4fa52f66111e6ae50a94bc347
facebook-github-bot pushed a commit that referenced this pull request May 19, 2022
Summary:
When given a (commit, path) pair, the `AclRegions` facet returns all Hipster ACLs that permit accessing that (commit, path). See D34272143 for how configuration works.

To answer those queries effectively, we precompute a trie over all path prefixes from rules in the config. Then, for each (commit, path) query:
1. Use the trie to extract all rules that match by path.
2. For each rule we got from #1, naively check that the given commit matches it.

Design doc: https://docs.google.com/document/d/1MQBk-N-kRS7L-7IoFzkN0BdCif1UfyDQd0P1qOnx8TQ/edit?usp=sharing
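
A minimal sketch of such a prefix trie (illustrative only; `PrefixTrie` and the rule ids are made-up names, not the real `AclRegions` implementation):

```rust
use std::collections::HashMap;

// A trie over path components: each rule is stored at the node for its
// path prefix, and a path query collects every rule attached to an
// ancestor of (or equal to) the queried path.
#[derive(Default)]
struct PrefixTrie {
    rules: Vec<String>, // rule ids attached to this prefix
    children: HashMap<String, PrefixTrie>,
}

impl PrefixTrie {
    fn insert(&mut self, prefix: &str, rule: &str) {
        let mut node = self;
        for part in prefix.split('/').filter(|p| !p.is_empty()) {
            node = node.children.entry(part.to_string()).or_default();
        }
        node.rules.push(rule.to_string());
    }

    // Step 1 of the query: all rules that match the path by prefix.
    fn matching_rules(&self, path: &str) -> Vec<String> {
        let mut out = self.rules.clone();
        let mut node = self;
        for part in path.split('/').filter(|p| !p.is_empty()) {
            match node.children.get(part) {
                Some(child) => {
                    out.extend(child.rules.iter().cloned());
                    node = child;
                }
                None => break,
            }
        }
        out
    }
}

fn main() {
    let mut trie = PrefixTrie::default();
    trie.insert("secret", "acl_secret");
    trie.insert("secret/keys", "acl_keys");
    trie.insert("www", "acl_www");
    assert_eq!(trie.matching_rules("secret/keys/k1"), vec!["acl_secret", "acl_keys"]);
    assert_eq!(trie.matching_rules("www/index.php"), vec!["acl_www"]);
    assert!(trie.matching_rules("flib/a.rs").is_empty());
}
```

Step 2 (the per-commit check) then runs only over the handful of rules returned here, rather than over every rule in the config.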

Reviewed By: markbt

Differential Revision: D34674134

fbshipit-source-id: b7ca5c6a1651cd6ef379e5a4d12068f15492c496
facebook-github-bot pushed a commit that referenced this pull request Jul 12, 2022
Summary:
During shutdown, the overlay background thread is terminated, and the overlay
closed, but EdenServer timers are still running. One of which being the
manageOverlay one which performs a maintenance on the Overlay. In some cases,
shutdown and this timer are racing each other, causing the timer to run after
the overlay has been closed, causing a use after free and crashing EdenFS.

To solve this, we can either add locks around closing the overlay and the
maintenance to guarantee that they do not race, or we can move the maintenance
operation to a thread that is known to be running only when the overlay is
opened. This diff takes the second approach.

The bug manifest itself like so:

  I0712 01:16:48.253296 21620 EdenServiceHandler.cpp:3394] [000002517432E1E0] initiateShutdown() took 368 µs
  I0712 01:16:48.253838 24700 PrjfsChannel.cpp:1185] Stopping PrjfsChannel for: C:\\cygwin\\tmp\\eden_test.stop_at_ea.m3931hyz\\mounts\\main
  I0712 01:16:48.258533 19188 EdenServer.cpp:1624] mount point "C:\\cygwin\\tmp\\eden_test.stop_at_ea.m3931hyz\\mounts\\main" stopped
  V0712 01:16:48.258814 19188 EdenMount.cpp:851] beginning shutdown for EdenMount C:\\cygwin\\tmp\\eden_test.stop_at_ea.m3931hyz\\mounts\\main
  V0712 01:16:48.259895 19188 EdenMount.cpp:855] shutdown complete for EdenMount C:\\cygwin\\tmp\\eden_test.stop_at_ea.m3931hyz\\mounts\\main
  V0712 01:16:48.287378 19188 EdenMount.cpp:861] successfully closed overlay at C:\\cygwin\\tmp\\eden_test.stop_at_ea.m3931hyz\\mounts\\main
  Unhandled win32 exception code=0xC0000005. Fatal error detected at:
  #0 00007FF707BA3D81 (7a686d9) facebook::eden::SqliteDatabase::checkpoint Z:\shipit\eden\eden\fs\sqlite\SqliteDatabase.cpp:99
  #1 00007FF7072D4090 facebook::eden::EdenServer::manageOverlay Z:\shipit\eden\eden\fs\service\EdenServer.cpp:2142
  #2 00007FF7074765D0 facebook::eden::PeriodicTask::timeoutExpired Z:\shipit\eden\eden\fs\service\PeriodicTask.cpp:32
  #3 00007FF707500F93 folly::HHWheelTimerBase<std::chrono::duration<__int64,std::ratio<1,1000> > >::timeoutExpired Z:\shipit\folly\folly\io\async\HHWheelTimer.cpp:286
  #4 00007FF70757BB44 folly::AsyncTimeout::libeventCallback Z:\shipit\folly\folly\io\async\AsyncTimeout.cpp:174
  #5 00007FF708E7AD45 (645b6fc) event_priority_set
  #6 00007FF708E7AA3A event_priority_set
  #7 00007FF708E77343 event_base_loop
  #8 00007FF707515FC5 folly::EventBase::loopMain Z:\shipit\folly\folly\io\async\EventBase.cpp:405
  #9 00007FF707515C62 folly::EventBase::loopBody Z:\shipit\folly\folly\io\async\EventBase.cpp:326
  #10 00007FF7072DC5EE facebook::eden::EdenServer::performCleanup Z:\shipit\eden\eden\fs\service\EdenServer.cpp:1212
  #11 00007FF707219BED facebook::eden::runEdenMain Z:\shipit\eden\eden\fs\service\EdenMain.cpp:395
  #12 00007FF7071C624A main Z:\shipit\eden\eden\fs\service\oss\main.cpp:23
  #13 00007FF708E87A94 __scrt_common_main_seh d:\A01\_work\12\s\src\vctools\crt\vcstartup\src\startup\exe_common.inl:288
  #14 00007FFC96DC7034 BaseThreadInitThunk
  #15 00007FFC98C5CEC1 RtlUserThreadStart

Reviewed By: fanzeyi

Differential Revision: D37793444

fbshipit-source-id: cd33302789c2c7a29d566d5bac6e119eccf0a5f2
facebook-github-bot pushed a commit that referenced this pull request Oct 20, 2022
Summary:
It turns out that initializing objc types before (disabling App Nap) and after
(via libcurl [2]) fork() would abort the program like:

  objc[<pid>]: +[__NSCFConstantString initialize] may have been in progress in another thread when fork() was called.

It seems objc really dislikes fork. Disabling the objc logic before fork seems
to be the only way to unblock the issue.

Other approaches considered:
- Avoid `fork()`: not really feasible for performance (startup, Python GIL) reasons.
- Ensure chgserver does not create threads (see https://bugs.python.org/issue33725).
  Not possible since appnope implicitly creates a (short-lived?) thread.
- Disable AppNap without using objc: does not seem trivial.
- Set `OBJC_DISABLE_INITIALIZE_FORK_SAFETY` to `YES`. Abort with a different
  message [1].

[1]:

```
The process has forked and you cannot use this CoreFoundation functionality safely. You MUST exec(). Break on __THE_PROCESS_HAS_FORKED_AND_YOU_CANNOT_USE_THIS_COREFOUNDATION_FUNCTIONALITY___YOU_MUST_EXEC__() to debug.
```

[2]:

```
(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 1.1
  * frame #0: 0x00007ff81395f067 libobjc.A.dylib`objc_initializeAfterForkError
    frame #1: 0x00007ff81395f187 libobjc.A.dylib`performForkChildInitialize(objc_class*, objc_class*) + 274
    frame #2: 0x00007ff81394a479 libobjc.A.dylib`initializeNonMetaClass + 617
    frame #3: 0x00007ff813949f12 libobjc.A.dylib`initializeAndMaybeRelock(objc_class*, objc_object*, mutex_tt<false>&, bool) + 232
    frame #4: 0x00007ff813949c93 libobjc.A.dylib`lookUpImpOrForward + 1087
    frame #5: 0x00007ff81394e02f libobjc.A.dylib`object_getMethodImplementation + 153
    frame #6: 0x00007ff813b12934 CoreFoundation`_NSIsNSString + 55
    frame #7: 0x00007ff813b128dc CoreFoundation`-[NSTaggedPointerString isEqual:] + 36
    frame #8: 0x00007ff813b05609 CoreFoundation`CFEqual + 533
    frame #9: 0x00007ff813b0b2a3 CoreFoundation`_CFBundleCopyBundleURLForExecutableURL + 220
    frame #10: 0x00007ff813b069cb CoreFoundation`CFBundleGetMainBundle + 116
    frame #11: 0x00007ff813b28ade CoreFoundation`_CFPrefsGetCacheStringForBundleID + 71
    frame #12: 0x00007ff813b2f52f CoreFoundation`-[CFPrefsPlistSource setDomainIdentifier:] + 92
    frame #13: 0x00007ff813b2f46c CoreFoundation`-[CFPrefsPlistSource initWithDomain:user:byHost:containerPath:containingPreferences:] + 99
    frame #14: 0x00007ff813b2f351 CoreFoundation`__85-[_CFXPreferences(PlistSourceAdditions) withManagedSourceForIdentifier:user:perform:]_block_invoke + 156
    frame #15: 0x00007ff813c622a7 CoreFoundation`-[_CFXPreferences withSources:] + 60
    frame #16: 0x00007ff813cac80f CoreFoundation`-[_CFXPreferences withManagedSourceForIdentifier:user:perform:] + 240
    frame #17: 0x00007ff813b2a1ab CoreFoundation`-[CFPrefsSearchListSource addManagedSourceForIdentifier:user:] + 98
    frame #18: 0x00007ff813c83a18 CoreFoundation`__108-[_CFXPreferences(SearchListAdditions) withSearchListForIdentifier:container:cloudConfigurationURL:perform:]_block_invoke.160 + 287
    frame #19: 0x00007ff813c836e8 CoreFoundation`-[_CFXPreferences withSearchLists:] + 60
    frame #20: 0x00007ff813b29f50 CoreFoundation`__108-[_CFXPreferences(SearchListAdditions) withSearchListForIdentifier:container:cloudConfigurationURL:perform:]_block_invoke + 279
    frame #21: 0x00007ff813c83879 CoreFoundation`-[_CFXPreferences withSearchListForIdentifier:container:cloudConfigurationURL:perform:] + 374
    frame #22: 0x00007ff813b29a42 CoreFoundation`-[_CFXPreferences copyAppValueForKey:identifier:container:configurationURL:] + 137
    frame #23: 0x00007ff813b29978 CoreFoundation`_CFPreferencesCopyAppValueWithContainerAndConfiguration + 101
    frame #24: 0x00007ff8145c672b SystemConfiguration`SCDynamicStoreCopyProxiesWithOptions + 155
    frame #25: 0x000000010272a315 hg`Curl_resolv(data=0x00007f8dea888a00, hostname="api.github.com", port=443, allowDOH=true, entry=0x000000030a12be10) at hostip.c:675:30 [opt]
    frame #26: 0x000000010272a68c hg`Curl_resolv_timeout(data=<unavailable>, hostname=<unavailable>, port=<unavailable>, entry=<unavailable>, timeoutms=<unavailable>) at hostip.c:908:8 [opt] [artificial]
    frame #27: 0x0000000102753e1e hg`create_conn at url.c:3440:12 [opt]
    frame #28: 0x0000000102753cfe hg`create_conn(data=0x00007f8dea888a00, in_connect=0x000000030a12bed8, async=0x000000030a12bf8f) at url.c:4077:12 [opt]
    frame #29: 0x000000010275026b hg`Curl_connect(data=0x00007f8dea888a00, asyncp=0x000000030a12bf8f, protocol_done=0x000000030a12bfb2) at url.c:4156:12 [opt]
    frame #30: 0x000000010273dbd1 hg`multi_runsingle(multi=<unavailable>, nowp=0x000000030a12c020, data=0x00007f8dea888a00) at multi.c:1858:16 [opt]
    frame #31: 0x000000010273d6ae hg`curl_multi_perform(multi=0x00007f8deb0699c0, running_handles=0x000000030a12c074) at multi.c:2636:14 [opt]
    frame #32: 0x0000000102726ada hg`curl_easy_perform at easy.c:599:15 [opt]
    frame #33: 0x0000000102726aa8 hg`curl_easy_perform [inlined] easy_perform(data=0x00007f8dea888a00, events=false) at easy.c:689:42 [opt]
    frame #34: 0x00000001027269a4 hg`curl_easy_perform(data=0x00007f8dea888a00) at easy.c:708:10 [opt]
    frame #35: 0x00000001025e1cf6 hg`http_client::request::Request::send::h13a4e600f6bc5508 [inlined] curl::easy::handler::Easy2$LT$H$GT$::perform::h2ba0ae1da25a8852(self=<unavailable>) at handler.rs:3163:37 [opt]
```

Reviewed By: bolinfest

Differential Revision: D40538471

fbshipit-source-id: cd8611c8082fbe2d610efb78cb84defdb16d7980
facebook-github-bot pushed a commit that referenced this pull request Mar 9, 2023
Summary:
There are 2 reasons why this test is disabled:

1) `eden chown` crashes on macOS
2) Even if `eden chown` worked on macOS, it would require passwordless `sudo` which is not available on macOS Sandcastle hosts where we run our contbuild and diff time tests.

Since #2 is a big issue that we can't work around, let's disable the test.

Due to the test war room, I won't be fixing #1 in this diff. However, I created T147554174 to track fixing it.

Reviewed By: fanzeyi

Differential Revision: D43926151

fbshipit-source-id: f1572d2b876d45affdcdebf9b567e23c7730076a
facebook-github-bot pushed a commit that referenced this pull request Oct 19, 2023
Summary:
## Context
Implicit deletes for a given changeset are found by:
- Iterating over all the file changes and keeping only the ones that modify/add,
- For each of these file changes, lookup an entry in the parent's manifest for that path.
  - If an entry is found and it's a leaf, no implicit deletes.
  - If an entry is found and it's a tree, return all the leaf entries from that tree, i.e. all the files/directories under the directory being implicitly deleted.
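
The per-parent lookup described above can be sketched like this (a deliberately simplified model: the parent manifest is flattened to a set of file paths, and `implicit_deletes` is a hypothetical helper, not the Mononoke code):

```rust
use std::collections::BTreeMap;

// Simplified parent manifest: the set of leaf (file) paths; directories
// are implied by path prefixes. For each added/modified path in the new
// commit, if the parent has a *directory* at that path, every file under
// that directory is implicitly deleted.
fn implicit_deletes(parent_files: &BTreeMap<String, ()>, changed: &[&str]) -> Vec<String> {
    let mut out = Vec::new();
    for path in changed {
        if parent_files.contains_key(*path) {
            continue; // the parent entry is a leaf: no implicit deletes
        }
        // The parent entry (if any) is a tree: collect all leaves under it.
        let dir_prefix = format!("{path}/");
        for file in parent_files.keys() {
            if file.starts_with(&dir_prefix) {
                out.push(file.clone());
            }
        }
    }
    out
}

fn main() {
    let parent: BTreeMap<String, ()> = ["a/b/c", "a/b/d", "x"]
        .into_iter()
        .map(|p| (p.to_string(), ()))
        .collect();
    // The new commit turns directory "a/b" into a file: c and d are
    // implicitly deleted.
    assert_eq!(implicit_deletes(&parent, &["a/b"]), vec!["a/b/c", "a/b/d"]);
    // "x" was already a file in the parent: nothing is implicitly deleted.
    assert!(implicit_deletes(&parent, &["x"]).is_empty());
}
```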

## Problem
`find_entries` ([code](https://fburl.com/code/ugxvq3jv)) seems to be a more efficient way to do manifest lookups for multiple paths.

The current implementation iterates over each file change and looks up its entry using `find_entry`, which just calls `find_entries` passing a single path.

**Improvement #1:** Pass all the paths directly to `find_entries`.

The other problem is that each call to `list_leaf_entries` returns a stream, and the streams were being flattened using `try_flatten_stream`, which IIUC preserves the order of the streams and **executes them serially**.

**Improvement #2:** Use `try_flatten_unordered` to introduce concurrency in the execution of the streams.

#thanks mitrandir77 for helping me flatten streams concurrently and without forcing execution!

## Notes
I initially used 100 as the concurrency limit, but increasing it to 10000 led to a big improvement, so I decided to go with that for now.

`get_implicit_deletes_single_parent` gets called for each parent of the changeset. I'm assuming that most changesets don't have a high number of parents and, because there's already a high degree of concurrency in `get_implicit_deletes_single_parent`, I decided to set the concurrency limit to 10 when flattening the results of each parent.

Reviewed By: markbt

Differential Revision: D50083311

fbshipit-source-id: 04f20f691597001b0f60651a0b635bff2d6e1596
facebook-github-bot pushed a commit that referenced this pull request Nov 8, 2023
Summary:
I found this code that still used structopt  as the `arg_enum!` macro caused a clippy warning:
```
warning: useless use of `vec!`
  --> third-party/rust/vendor/clap-2.34.0/src/macros.rs:317:1
   |
   = note: in this expansion of `arg_enum!` (#1)
   |
   = note: in this expansion of `arg_enum!` (#2)
  ::: third-party/rust/vendor/clap-2.34.0/src/macros.rs:380:9
   |
   = note: in this macro invocation (#2)
   |
  ::: fbcode/eden/mononoke/facebook/derived_data_service/client/main.rs:33:1
   |
33 | / arg_enum! {
34 | |     #[derive(Debug)]
35 | |     enum ArgDerivationTypes {
36 | |         Single,
...  |
39 | |     }
40 | | }
   | |_- in this macro invocation (#1)
   |
   = help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#useless_vec
   = note: `#[warn(clippy::useless_vec)]` on by default

warning: 1 warning emitted
```

Time to move-on.

We try to minimize behaviour changes as this diff is a refactoring.
However, the derivation type now expects a lowercase argument, as that's the default behaviour for ValueEnum. Rather than going out of our way to stick to the old behaviour, we update the tests instead.

Reviewed By: mitrandir77

Differential Revision: D51109785

fbshipit-source-id: 8ad55f842466c2305e97b85c7a733d2840d41b90
facebook-github-bot pushed a commit that referenced this pull request Feb 26, 2024
Summary:
Part of the hooks cleanup hackaweek: https://fburl.com/gsd/nviep5e1

## Steps
1. Implement support for json config, supporting the legacy config using the new implementation.
2. Update the config in configerator, so it uses the new json config
Set the `display_title_length` default in configerator.
3. Delete the legacy config code and integration test.

This diff is for step #1.

##  EDIT
Looks like this hook isn't used at all: https://fburl.com/code/8qnzw8cv

So I'll ship this and skip step 2.
I could remove the legacy code right now, but I want to let this be rolled out in case I missed something (e.g. not sure if dynamically generated hooks exist?).

Differential Revision: D54118071

fbshipit-source-id: 3edb35f1776dcc6ad8f07ea2deb71ca7a81910b4
facebook-github-bot pushed a commit that referenced this pull request Feb 26, 2024
Summary:
Part of the hooks cleanup hackaweek: https://fburl.com/gsd/nviep5e1

## Steps
1. Implement support for json config, supporting the legacy config using the new implementation.
2. Update the config in configerator, so it uses the new json config
3. Delete the legacy config code and integration test.

This diff is for step #1.

FYI, I added the doc comment based on the summary of D19969689, which introduced this hook.

Differential Revision: D54118734

fbshipit-source-id: ea67676ec1f864040a8718ce87e6997c5aaff1e8
facebook-github-bot pushed a commit that referenced this pull request Mar 26, 2024
Summary:
# Context

We received several reports of scm-aware Watchman queries returning incorrect results when FilteredFS was being used and filters were actively enabled[1]. In these reports, users would expect that modified files between commits would be reported as changed; however, Watchman (err, EdenFS in this case) reported these files as unmodified between commits. This was causing build tools to use stale file content.

After further investigation, I found that disabling filters (running `hg filteredfs disable`) caused Watchman to report correct results. Therefore, the bug only occurred when a non-null filter was enabled on FilteredFS. This significantly narrowed down the possible causes.

# Root Cause

The root cause was a bug in the ObjectID comparison logic for FilteredBackingStores. FilteredBackingStores use FilteredObjectIDs. When determining differences between these IDs, we must take into account three things:

1) Do the FilteredObjectIDs have the same type (Blob, Tree, or Unfiltered Tree)?
2) Do the Filters associated w/ each FilteredObjectID resolve to the exact same coverage (i.e. do they filter the same descendents)?
3) Do the underlying ObjectIDs associated w/ each FilteredObjectID refer to the same object(s).

We were successfully determining #1 and #2, but we failed to determine #3 correctly in one specific edge case. This was causing us to incorrectly calculate two FilteredObjectIDs as identical when they were actually different.
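
The comparison requirement can be modeled with a toy type (illustrative names only; not the actual FilteredBackingStore code):

```rust
// Equality of a filtered id must consider (1) the object type, (2) the
// resolved filter coverage, and (3) the underlying object id. The bug
// was treating two ids as equal when (1) and (2) matched even though
// (3) differed.
#[derive(Debug, Clone, PartialEq, Eq)]
enum ObjectType {
    Blob,
    Tree,
    UnfilteredTree,
}

#[derive(Debug, Clone, PartialEq, Eq)]
struct FilteredId {
    ty: ObjectType,
    filter_id: String, // stand-in for "resolved filter coverage"
    object_id: String, // underlying object id
}

fn same_object(a: &FilteredId, b: &FilteredId) -> bool {
    // All three components must match; dropping the object_id check
    // reproduces the "identical when actually different" bug.
    a.ty == b.ty && a.filter_id == b.filter_id && a.object_id == b.object_id
}

fn main() {
    let a = FilteredId {
        ty: ObjectType::Tree,
        filter_id: "f1".into(),
        object_id: "abc".into(),
    };
    let b = FilteredId { object_id: "def".into(), ..a.clone() };
    // Same type, same filter, different underlying object: NOT equal.
    assert!(!same_object(&a, &b));
    assert!(same_object(&a, &a.clone()));
}
```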

# This diff

This diff introduces a test that shows the bad behavior. We would expect this test to fail if we EXPECT_NE() instead of EXPECT_EQ() on line 824 of FilteredBackingStoreTest.cpp

Reviewed By: kmancini

Differential Revision: D55345652

fbshipit-source-id: 998427cd09c9f8965ec9c84cb21f3c25222ff7d0
facebook-github-bot pushed a commit that referenced this pull request Mar 26, 2024
Summary:
# Context

We received several reports of scm-aware Watchman queries returning incorrect results when FilteredFS was being used and filters were actively enabled[1]. In these reports, users would expect that modified files between commits would be reported as changed; however, Watchman (err, EdenFS in this case) reported these files as unmodified between commits. This was causing build tools to use stale file content.

After further investigation, I found that disabling filters (running `hg filteredfs disable`) caused Watchman to report correct results. Therefore, the bug only occurred when a non-null filter was enabled on FilteredFS. This significantly narrowed down the possible causes.

# Root Cause

The root cause was a bug in the ObjectID comparison logic for FilteredBackingStores. FilteredBackingStores use FilteredObjectIDs. When determining differences between these IDs, we must take into account three things:

1) Do the FilteredObjectIDs have the same type (Blob, Tree, or Unfiltered Tree)?
2) Do the Filters associated w/ each FilteredObjectID resolve to the exact same coverage (i.e. do they filter the same descendents)?
3) Do the underlying ObjectIDs associated w/ each FilteredObjectID refer to the same object(s).

We were successfully determining #1 and #2, but we failed to determine #3 correctly in one specific edge case. This was causing us to incorrectly calculate two FilteredObjectIDs as identical when they were actually different.

# This diff

This diff ensures that object equality (aka #3 from above) is checked in all cases. Therefore we don't hit a scenario where #1 and #2 are equal and #3 is ignored altogether.

Reviewed By: kmancini

Differential Revision: D55350885

fbshipit-source-id: bdafab99e56ddfa98446b4ba26dc0bde96121dad
facebook-github-bot pushed a commit that referenced this pull request Apr 8, 2024
Summary:
## This stack
This stack will introduce proper validation of bonsais to make sure that the submodule expansion is always consistent with the submodule metadata file.

I'll start with some refactoring diffs to make everything easier to review.

The goal of validation is to make sure that **the submodule metadata file is always kept in sync with the expansion**.
How this will be done is more complicated than it seems, but on a high level we will
1. Get the fsnode in the submodule repo from the commit in the metadata file.
2. Derive the fsnode for the new commit
3. Get the entry for the expansion and compare with the fsnode from step #1.
4. We'll perform many validations of consistency in this process (e.g. if expansion is changed, metadata file has to be changed as well).
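
One of those consistency checks (step 4) might look roughly like this (all names, including the metadata-file naming scheme, are illustrative assumptions, not the real Mononoke code):

```rust
// Step 4, sketched: if the expansion changed in a bonsai, the submodule
// metadata file must change in the same bonsai, and vice versa.
fn metadata_and_expansion_in_sync(
    changed_paths: &[&str],
    metadata_file: &str, // hypothetical, e.g. ".x-repo-submodule-mysub"
    expansion_dir: &str, // hypothetical, e.g. "mysub"
) -> bool {
    let expansion_changed = changed_paths
        .iter()
        .any(|p| p.starts_with(&format!("{expansion_dir}/")));
    let metadata_changed = changed_paths.contains(&metadata_file);
    expansion_changed == metadata_changed
}

fn main() {
    // Both the expansion and the metadata file changed together: valid.
    assert!(metadata_and_expansion_in_sync(
        &[".x-repo-submodule-mysub", "mysub/src/lib.rs"],
        ".x-repo-submodule-mysub",
        "mysub",
    ));
    // The expansion was touched without updating the metadata file: invalid.
    assert!(!metadata_and_expansion_in_sync(
        &["mysub/src/lib.rs"],
        ".x-repo-submodule-mysub",
        "mysub",
    ));
}
```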

## This diff
Encapsulates everything needed by the submodule expansion module into a struct.
I'm doing this because I don't want to keep adding more and more parameters to multiple function parameters every time I need something else.

One thing I want to make sure is that this type is cheap to clone, because it only contains references.

Reviewed By: mitrandir77

Differential Revision: D55696045

fbshipit-source-id: 85ab8756ca80a19ad561e585f301adad501fb718
facebook-github-bot pushed a commit that referenced this pull request Apr 10, 2024
Summary:
## This stack
See D55696045

## This diff

- Adds most of the step #1 in the validation.
- Adds comments with the outlines of each validation step.

See the comments throughout the code for more context.

Reviewed By: mitrandir77

Differential Revision: D55757447

fbshipit-source-id: f6e6ddc2e33d05206c38c1e936b960b82e01276d
facebook-github-bot pushed a commit that referenced this pull request Apr 10, 2024
Summary:
As a precaution, we don't want to allow any backsyncing (e.g. through SCS cross-repo commands) for repos that have submodule expansion enabled.

To achieve this I temporarily added a new `large_repo_id` field to the `SubmoduleExpansionData` and updated all the callsites to make sure we can check that we're not performing backsync before running the submodule expansion code.

I started setting up unit tests for this, but it became much more complicated than expected (see diffs above) and, since this is not **needed** to start Phase #1 for a few repos (i.e. read-only, forward syncing expansion), I'll be writing tests after I finish implementing Validation of expansion (which is the purpose of this stack) and have finished updating all unit/integration tests for forward syncing.

Reviewed By: mitrandir77

Differential Revision: D55802349

fbshipit-source-id: 50ae5b9bc523aed1aeac34f03e2814c27dc044d8
facebook-github-bot pushed a commit that referenced this pull request Apr 17, 2024
Summary:
## This stack
Continues the implementation of the primitives that will validate submodule expansion in large repos' bonsais.

The goal of validation is to make sure that **the submodule metadata file is always kept in sync with the expansion**.
How this will be done is more complicated than it seems, but on a high level we will
1. Get the fsnode in the submodule repo from the commit in the metadata file.
2. Derive the fsnode for the new commit
3. Get the entry for the expansion and compare with the fsnode from step #1.
4. We'll perform many validations of consistency in this process (e.g. if expansion is changed, metadata file has to be changed as well).

## Challenge of validation
To efficiently verify that the working copies match, we want to get fsnodes of both the commit being expanded in the submodule (easy) and the new commit being synced to the large repo.
The second one is a problem, because in order to get it, we would need access to the large repo's blobstore and to derive the fsnode using the large repo's derived data.

The problem is that the primitives that rewrite commits (and thus expand the submodules) are, on purpose, only given the target repo's id instead of the actual repo (and its blobstore). This was done a while back (D43002436) to make sure we don't accidentally write to the target repo.

We want to maintain that safety net, so the problem this diff tries to solve is:
**provide READ-ONLY access to some parts of the large repo to the commit sync primitives** (e.g. CommitInMemorySyncer).

## This diff
Introduces a new Repo facet container, called `InMemoryRepo`, which can be passed to the validation primitives and will ensure that any writes performed during validation only happen in memory, i.e. are not persisted to the large repo's blobstore or tables.

This InMemoryRepo needs to have `RepoDerivedData` and `Changesets`.
But to create a `RepoDerivedData`, I need to provide a bunch of other things, most of which I won't need.
So instead of implementing a "wrapper" for all of these things (e.g. `BonsaiGitMapping`, `BonsaiHgMapping`), I defined a dummy data type that will implement these to satisfy the type checker.

This is good because it will be very clear what the validation primitives will be using, and if in the future this changes, the tests will crash because they'll try to read from or write to storage that they're not supposed to; in that case the person making the change will implement a wrapper to ensure the operations only write to memory.

Reviewed By: markbt

Differential Revision: D55701183

fbshipit-source-id: 706b8d4cbef8cde6a9562a52009db90901d4a685
facebook-github-bot pushed a commit that referenced this pull request May 1, 2024
Summary:
We have discovered a deadlock in EdenFS in S412223. The deadlock appears when
all fsChannelThreads have a stack trace that looks like:

```
  thread #225, name = 'FsChannelThread', stop reason = signal SIGSTOP
    frame #0: 0x00007f7ca85257b9
    frame #1: 0x0000000007d73aa4 edenfs`bool folly::DynamicBoundedQueue<folly::CPUThreadPoolExecutor::CPUTask, false, false, true, 8ul, 7ul, folly::DefaultWeightFn<folly::CPUThreadPoolExecutor::CPUTask>, std::atomic>::canEnqueue<std::chrono::_V2::steady_clock, std::chrono::duration<long, std::ratio<1l, 1000000000l>>>(std::chrono::time_point<std::chrono::_V2::steady_clock, std::chrono::duration<long, std::ratio<1l, 1000000000l>>> const&, unsigned long) + 212
    frame #2: 0x0000000007d73557 edenfs`bool folly::DynamicBoundedQueue<folly::CPUThreadPoolExecutor::CPUTask, false, false, true, 8ul, 7ul, folly::DefaultWeightFn<folly::CPUThreadPoolExecutor::CPUTask>, std::atomic>::tryEnqueueUntilSlow<std::chrono::_V2::steady_clock, std::chrono::duration<long, std::ratio<1l, 1000000000l>>, folly::CPUThreadPoolExecutor::CPUTask>(folly::CPUThreadPoolExecutor::CPUTask&&, std::chrono::time_point<std::chrono::_V2::steady_clock, std::chrono::duration<long, std::ratio<1l, 1000000000l>>> const&) + 39
    frame #3: 0x0000000007d722c0 edenfs`facebook::eden::EdenTaskQueue::add(folly::CPUThreadPoolExecutor::CPUTask) + 576
    frame #4: 0x0000000005172cbd edenfs`folly::CPUThreadPoolExecutor::add(folly::Function<void ()>) + 557
    frame #5: 0x000000000503c4d8 edenfs`void folly::Executor::KeepAlive<folly::Executor>::add<folly::Function<void (folly::Executor::KeepAlive<folly::Executor>&&)>>(folly::Function<void (folly::Executor::KeepAlive<folly::Executor>&&)>&&) && + 216
    frame #6: 0x000000000503c257 edenfs`folly::futures::detail::DeferredExecutor::addFrom(folly::Executor::KeepAlive<folly::Executor>&&, folly::Function<void (folly::Executor::KeepAlive<folly::Executor>&&)>) + 135
    frame #7: 0x000000000503baf5 edenfs`folly::futures::detail::CoreBase::doCallback(folly::Executor::KeepAlive<folly::Executor>&&, folly::futures::detail::State) + 565
    frame #8: 0x00000000052e5e0d edenfs`folly::futures::detail::CoreBase::proxyCallback(folly::futures::detail::State) + 493
    frame #9: 0x00000000052e5bb2 edenfs`folly::futures::detail::CoreBase::setProxy_(folly::futures::detail::CoreBase*) + 66
    ...
```

This stack means that an FsChannelThread is blocked waiting to enque a task to
the EdenTaskQueue. The EdenTaskQueue is the task queue that FsChannelThread's
pull work from. So the deadlock is that all FsChannelThreads are blocked trying
to add to a full queue which can only be emptied by FsChannelThreads.

There are two contributing reasons for this happening:
1. The FsChannelThread will queue a bunch of network fetches into the Sapling
queue. When those all finish, the future callback chain runs on the
FsChannelThreads. So the backing store accumulates a bunch of work, then when
the fetches complete it dumps a bunch of tasks onto the FsChannelThread's queue.
In other words, the backing store fills up the FsChannelThread task queue. Other threads
could theoretically do this too, but the backing store is the main one I have seen.

2. An FsChannelThread might enqueue work onto its own queue. Folly futures have
some smarts to try to prevent a task running on executor A from enqueueing work
onto executor A (and instead just run the callback inline), see: https://www.internalfb.com/code/fbsource/[c7c20340562d2eab5f5d2f7f45805546687942d9]/fbcode/folly/futures/detail/Core.cpp?lines=147-148.
However, that does not prevent a future that is unknowingly running on an
executor's thread from enqueueing to that thread's executor queue. I believe that
kind of thing happens when we do stuff like this: https://www.internalfb.com/code/fbsource/[824f6dc95f161e141bf9b821a7826c40b570ddc3]/fbcode/eden/fs/inodes/TreeInode.cpp?lines=375-376
The outer lambda is aware of which executor it's running on, but the future we start
inside the lambda is not, so when we add that thenValue, it doesn't realize it's
enqueuing to its own executor. I wrote up this toy program to show when folly
will enqueue vs run inline: https://www.internalfb.com/intern/commit/cloud/FBS/bce3a906f53913ab8dc74944b8b50d09d78baf9a
(script: P1222930602, output: P1222931093). This shows that if you return a future
from a future callback, the next callback is enqueued; and if you have a callback
on a future returned by another future's callback, the callback on the returned
future is enqueued.

So, in summary: the backing store fills up the FsChannelThread work queue, then
an FsChannelThread tries to push work onto its own queue and deadlocks.

Potential solutions:
**1- Queued Immediate Executor.**

This behavior was likely introduced here: D51302394. We moved from the queued
immediate executor to the FsChannelThreads. The queued immediate executor uses an
unbounded queue, as seen here: https://www.internalfb.com/code/fbsource/[c7c20340562d2eab5f5d2f7f45805546687942d9]/fbcode/folly/executors/QueuedImmediateExecutor.h?lines=39
which is why we didn't have the problem before.

I don't love going back to this solution because we are moving away from the
queued immediate executor precisely because it makes it easy to cause deadlocks,
for example the one introduced in D50199539.

**2-  Make the queue error instead of block when full.**

We used to do that for the Eden CPU thread pool, and it resulted in a lot of
errors from eden that caused issues for clients. See: D6490979. I think we are
kicking the can down the road if we error rather than block.

**3- Don't bother switching to the fschannelthreads to complete the fuse request.**

This is likely going to be the same as 1, unless we are going to undo the
semifuture-ization we have been doing. Or perhaps we could start the fuse request
on the FsChannelThreads, then finish it on the Eden CPU threads, which is pretty
much the same thing as this diff except with more sets of threads involved. So I
prefer this change.

**4- add backpressure somewhere else.**

If we prevent the backing store/other threads from being able to fill up the
FsChannelThread queue, then it should be impossible for the queue to fill up,
because there would be no fan-out (one task out, then one task in).

However, this is fragile; we could easily introduce fan-out again and then
end up back here. Also, this would mean we basically block all requests in the
fuse layer instead of the lower layers of eden, and we would need to redo the
queueing in the backing store layer.

The fragility and complexity make me dislike this solution.

**5 - linearize all future chains.**
The reason that the FsChannelThreads are enqueueing to their own queue is that we
have nested futures. If we didn't nest, then folly futures would run callbacks
inline. So if we de-nest all our futures, this issue should theoretically go
away, because the FsChannelThreads will not enqueue to their own queue.
However, I don't know if we can completely get rid of returning a future in
a future callback.

I don't love this solution, as I know there are some explicit places where we
choose to nest (I'm looking at you, PrjFSDispatcher), so it would be VERY
confusing where I am supposed to nest and where I am not. It would be easy to
do the wrong thing and re-introduce this bug. Also, there are a ton of places we
would need to change, and they are not easy to find. So I don't like this option
because it's not very "pit of success" - too fragile and too easy to get wrong
the first time.

**6 - unbound the fschannelthread queue.**

It's not great, but there is precedent for this. We unbounded the Eden CPU
thread pool for basically the same reason. See D6513572.

The risk here is that we are opening ourselves up to OOMs. The queue might
grow very large and then get eden OOM-killed. We should probably add a
config to this change so we can roll it out carefully and watch for OOMs as
we roll out. Additionally, long term we likely want to rethink how we do
threading to architect eden away from this.

I prefer this solution the most. That's what I have implemented here.

---------------------------
note: I am removing the limit on the number of outstanding fs requests we
process at once in this diff. That config was not exactly working how we wanted
anyway. Queueing in the backing store let us handle essentially infinite
requests at once, as the Sapling request queue does not have a max size.
I can follow up with a semaphore in the fuse/nfs layers to rate-limit the
number of active requests. Though, fwiw, I will likely bump the limit a lot, at
least initially, when I do that, since we were realistically allowing clients
to do infinite requests previously.

Reviewed By: jdelliot, genevievehelsel

Differential Revision: D56553375

fbshipit-source-id: 9c6c8a76bd7c93b00d48654cd5fc31d1a68dc0b2
facebook-github-bot pushed a commit that referenced this pull request May 15, 2024
Summary:
## This stack
See D57204338.

## This diff
As mentioned in D56826268 and D56881703, there's currently a bug in `get_git_submodule_action_by_version` (just renamed from `get_strip_git_submodules_by_version`) that leads to using the default submodule action (i.e. STRIP) when it's called during backsyncing (large to small).

Luckily, this led to behaviour very close to what we're looking for **now**, while backsyncing submodule changes is not supported:
1. If the commit doesn't touch submodule expansions, everything works normally.
2. If it does, it will sync it but the synced commit will be "broken". It will fail to derive fsnodes and other manifests.

We still want to keep #1, but we don't want #2. What we want right now is to **explicitly fail sync if someone tries to backsync any changes modifying submodule expansions**.

This diff implements this.

Reviewed By: mitrandir77

Differential Revision: D57151944

fbshipit-source-id: b4931d3cc58815b2beaeaae90de99dda15b5fbb9
facebook-github-bot pushed a commit that referenced this pull request Jul 1, 2024
Summary:
Implement support for eagerepo.

This is the read path; it adds a new type to store the augmented manifest blob together with its own digest (a similar approach to Mononoke's).

Reviewed By: markbt

Differential Revision: D58868556

fbshipit-source-id: 14f1b85109ebc5d033dada95564c4328d96892ce
facebook-github-bot pushed a commit that referenced this pull request Aug 8, 2024
Summary:
We were `std::move()`ing the `logger` in line 192, but attempted to dereference it in line 198. This caused a crash if the `WindowsNotifier` creation failed:

```
V0807 02:22:25.039028   752 StartupLogger.cpp:116] Starting edenfs 20240806-234409, pid 1416, session_id 1790547674
E0807 02:22:25.049512  5256 WindowsNotifier.cpp:70] Failed to add E-Menu icon: Error (0x80004005) Unspecified error
W0807 02:22:25.049825   752 EdenServer.cpp:197] Couldn't start E-Menu: class std::exception: Failed to add E-Menu icon: Error (0x80004005) Unspecified error
Unhandled win32 exception code=0xC0000005. Fatal error detected at:
param=0x0
param=0x8
#0 00007FF71DB7DA1E Ordinal0
#1 00007FF89E0B1060 <failed to resolve symbol: Attempt to access invalid address.>
```

Reviewed By: jdelliot

Differential Revision: D60935437

fbshipit-source-id: 395437203a6e8949ea2c5c92bc3e7e6817532f58
facebook-github-bot pushed a commit that referenced this pull request Aug 8, 2024
Summary:
First test with the in-memory CAS server and Mononoke CAS sync.

The current expected behaviour is that it is unable to connect.

Reviewed By: mitrandir77

Differential Revision: D60902768

fbshipit-source-id: 82cb464c369e562ee944523f6d02809483344e0b
facebook-github-bot pushed a commit that referenced this pull request Sep 27, 2024
Summary:
# Context

We noticed that ~100 crashes per month are associated with the following stack trace:

```
*** Aborted at 1727306461 (Unix time, try 'date -d 1727306461') ***
*** Signal 6 (SIGABRT) (0x38a4c000ee033) received by PID 974899 (pthread TID 0x7fb577cc5a80) (linux TID 974899) (maybe from PID 974899, UID 232012) (code: -6), stack trace: ***
    @ 0000000003f1c66f folly::symbolizer::(anonymous namespace)::signalHandler(int, siginfo_t*, void*)
                       ./fbcode/folly/debugging/symbolizer/SignalHandler.cpp:453
    @ 000000000004455f (unknown)
                       /home/engshare/third-party2/glibc/2.34/src/glibc-2.34/signal/../sysdeps/unix/sysv/linux/libc_sigaction.c:8
                       -> /home/engshare/third-party2/glibc/2.34/src/glibc-2.34/signal/../sysdeps/unix/sysv/linux/x86_64/libc_sigaction.c
    @ 000000000009c993 __GI___pthread_kill
                       /home/engshare/third-party2/glibc/2.34/src/glibc-2.34/nptl/pthread_kill.c:46
    @ 00000000000444ac __GI_raise
                       /home/engshare/third-party2/glibc/2.34/src/glibc-2.34/signal/../sysdeps/posix/raise.c:26
    @ 000000000002c432 __GI_abort
                       /home/engshare/third-party2/glibc/2.34/src/glibc-2.34/stdlib/abort.c:79
    @ 000000000430a8f2 folly::LogCategory::admitMessage(folly::LogMessage const&) const
                       ./fbcode/folly/logging/LogCategory.cpp:71
    @ 000000000430978d folly::LogStreamProcessor::logNow()
                       ./fbcode/folly/logging/LogStreamProcessor.cpp:190
    @ 0000000004309b5c folly::LogStreamVoidify<true>::operator&(std::ostream&)
                       ./fbcode/folly/logging/LogStreamProcessor.cpp:222
    @ 0000000004aa4832 facebook::eden::MountdServerProcessor::registerMount(facebook::eden::detail::AbsolutePathBase<std::basic_string_view<char, std::char_traits<char> > >, facebook::eden::InodeNumber)
                       ./fbcode/eden/fs/nfs/Mountd.cpp:208
    @ 0000000004aa5a0c facebook::eden::Mountd::registerMount(facebook::eden::detail::AbsolutePathBase<std::basic_string_view<char, std::char_traits<char> > >, facebook::eden::InodeNumber)
                       ./fbcode/eden/fs/nfs/Mountd.cpp:255
    @ 0000000004aa1f6b facebook::eden::NfsServer::registerMount(facebook::eden::detail::AbsolutePathBase<std::basic_string_view<char, std::char_traits<char> > >, facebook::eden::InodeNumber, std::unique_ptr<facebook::eden::NfsDispatcher, std::default_delete<facebook::eden::NfsDispatcher> >, folly::Logger const*, std::shared_ptr<facebook::eden::ProcessInfoCache>, std::shared_ptr<facebook::eden::FsEventLogger>, std::shared_ptr<facebook::eden::StructuredLogger> const&, std::chrono::duration<long, std::ratio<1l, 1000l> >, std::shared_ptr<facebook::eden::Notifier>, facebook::eden::CaseSensitivity, unsigned int, unsigned long)
                       ./fbcode/eden/fs/nfs/NfsServer.cpp:102
    @ 000000000483ca1e _ZN5folly6detail8function5call_IZNS_7futures6detail4CoreINS_4UnitEE11setCallbackIZNS4_10FutureBaseIS6_E18thenImplementationIZNOS_6FutureIS6_E9thenValueIZNS_3viaIZN8facebook4eden12_GLOBAL__N_114makeNfsChannelEPNSH_9EdenMountESt8optionalINS_4FileEEE3$_0EENSC_INS_20isFutureOrSemiFutureIDTclscT_fp0_EEE5InnerEEENS_8Executor9KeepAliveISV_EEOSQ_EUlSY_E_EENSC_INS4_19valueCallableResultIS6_SQ_E10value_typeEEESY_EUlOSX_ONS_3TryIS6_EEE_NS4_25tryExecutorCallableResultIS6_S18_EEEENSt9enable_ifIXntsrNT0_13ReturnsFutureE5valueENSC_INS1C_10value_typeEEEE4typeESY_S1C_NS4_18InlineContinuationEEUlS14_S17_E_EEvSY_OSt10shared_ptrINS_14RequestContextEES1H_EUlRNS4_8CoreBaseES14_PNS_17exception_wrapperEE_Lb0ELb0EvJS1O_S14_S1Q_EEET2_DpT3_RNS1_4DataE
                       ./fbcode/eden/fs/inodes/EdenMount.cpp:2125
    @ 0000000002d50848 void folly::detail::function::call_<folly::Executor::KeepAlive<folly::Executor>::add<folly::futures::detail::CoreBase::doCallback(folly::Executor::KeepAlive<folly::Executor>&&, folly::futures::detail::State)::$_0>(folly::futures::detail::CoreBase::doCallback(folly::Executor::KeepAlive<folly::Executor>&&, folly::futures::detail::State)::$_0&&) &&::{lambda()#1}, true, false, void>(, folly::detail::function::Data&)
                       fbcode/folly/Function.h:370
    @ 0000000002d56df9 folly::EventBase::FuncRunner::operator()(folly::Function<void ()>&&)
                       fbcode/folly/Function.h:370
    @ 0000000002d56b57 bool folly::AtomicNotificationQueue<folly::Function<void ()> >::drive<folly::EventBase::FuncRunner&>(folly::EventBase::FuncRunner&)
                       fbcode/folly/io/async/AtomicNotificationQueue-inl.h:254
    @ 0000000002d5970f folly::EventBaseAtomicNotificationQueue<folly::Function<void ()>, folly::EventBase::FuncRunner>::execute()
                       fbcode/folly/io/async/EventBaseAtomicNotificationQueue-inl.h:269
    @ 0000000002d5b90c non-virtual thunk to folly::EventBaseAtomicNotificationQueue<folly::Function<void ()>, folly::EventBase::FuncRunner>::handlerReady(unsigned short)
                       fbcode/folly/io/async/EventBaseAtomicNotificationQueue-inl.h:280
    @ 0000000002d5fc11 folly::EventHandler::libeventCallback(int, short, void*)
                       ./fbcode/folly/io/async/EventHandler.cpp:159
    @ 0000000002d65947 event_process_active
                       /home/engshare/third-party2/libevent/1.4.14b_hphp/src/libevent-1.4.14b-stable/event.c:390
    @ 0000000002d65b67 event_base_loop
                       /home/engshare/third-party2/libevent/1.4.14b_hphp/src/libevent-1.4.14b-stable/event.c:532
    @ 0000000002d5880c folly::EventBase::loopMain(int, folly::EventBase::LoopOptions)
                       ./fbcode/folly/io/async/EventBase.cpp:0
    @ 0000000002d57d89 folly::EventBase::loopBody(int, folly::EventBase::LoopOptions)
                       ./fbcode/folly/io/async/EventBase.cpp:513
    @ 0000000002d59c05 folly::EventBase::loopForever()
                       ./fbcode/folly/io/async/EventBase.cpp:474
    @ 0000000004417e98 apache::thrift::ThriftServer::serve()
                       ./fbcode/thrift/lib/cpp2/server/ThriftServer.cpp:1465
    @ 000000000427f748 facebook::fb303::withThriftFunctionStats(char const*, facebook::fb303::BaseService*, folly::Function<void ()>&&)
                       fbcode/folly/Function.h:370
    @ 0000000002c6305f facebook::eden::EdenMain::runServer(facebook::eden::EdenServer const&)
                       ./fbcode/eden/fs/service/EdenMain.cpp:117
    @ 0000000002c672d4 facebook::eden::runEdenMain(facebook::eden::EdenMain&&, int, char**)
                       ./fbcode/eden/fs/service/EdenMain.cpp:454
    @ 0000000002c51351 main
                       ./fbcode/eden/fs/service/oss/main.cpp:15
    @ 000000000002c656 __libc_start_call_main
                       /home/engshare/third-party2/glibc/2.34/src/glibc-2.34/csu/../sysdeps/nptl/libc_start_call_main.h:58
                       -> /home/engshare/third-party2/glibc/2.34/src/glibc-2.34/csu/../sysdeps/x86/libc-start.c
    @ 000000000002c717 __libc_start_main_alias_2
                       /home/engshare/third-party2/glibc/2.34/src/glibc-2.34/csu/../csu/libc-start.c:409
                       -> /home/engshare/third-party2/glibc/2.34/src/glibc-2.34/csu/../sysdeps/x86/libc-start.c
    @ 0000000002c50fe0 _start
                       /home/engshare/third-party2/glibc/2.34/src/glibc-2.34/csu/../sysdeps/x86_64/start.S:116
```

Further investigation showed that this crash was occurring during mount time in the registerMount() method. This method is used when mounting an NFS mount to "register" the mount with the NFS mount daemon.

In particular, the crash was occurring due to the following XCHECK:

```
  auto map = mountPoints_.wlock();
  auto [iter, inserted] = map->emplace(path.copy(), ino);
  XCHECK_EQ(inserted, true);
```

This XCHECK ensures that the registered mount was properly inserted into the mount map that contains all registered mounts. However, this insertion fails if a mount at the given path already exists in the mount map (i.e. emplace() will return false, which causes the check to fail).

This shouldn't be able to happen, since mounts are removed from the mount map when they are unmounted; however, some circumstances will cause mounts to fail to get removed from the mount map.

Specifically, the following scenario causes this:

1) A mount request is made for a mount on startup
2) makeNFSChannel creates an NFS channel for the requested mount and completes successfully. This fully registers the mount with the mount daemon and inserts it into the mount map
3) A request to privhelper is made to assist in actually performing the privileged mount operation for the given mount.
4) The privhelper request fails, and the mount is never actually advanced past the initializing state
5) The code for removing the mount from the mount map never gets called
6) A separate request is made to remount the failed mount. The request causes the daemon to crash because the daemon attempts to add a duplicate key to the mount map.

# This diff

This diff adds a test that shows this behavior exists and will be fixed by a diff further up the stack.

Reviewed By: genevievehelsel

Differential Revision: D63432633

fbshipit-source-id: aa3ec37ebd3f2f420165931d8175ba6a8190b65e
facebook-github-bot pushed a commit that referenced this pull request Sep 27, 2024
Summary:
# Context

We noticed that ~100 crashes per month are associated with the following stack trace:

```
*** Aborted at 1727306461 (Unix time, try 'date -d 1727306461') ***
*** Signal 6 (SIGABRT) (0x38a4c000ee033) received by PID 974899 (pthread TID 0x7fb577cc5a80) (linux TID 974899) (maybe from PID 974899, UID 232012) (code: -6), stack trace: ***
    @ 0000000003f1c66f folly::symbolizer::(anonymous namespace)::signalHandler(int, siginfo_t*, void*)
                       ./fbcode/folly/debugging/symbolizer/SignalHandler.cpp:453
    @ 000000000004455f (unknown)
                       /home/engshare/third-party2/glibc/2.34/src/glibc-2.34/signal/../sysdeps/unix/sysv/linux/libc_sigaction.c:8
                       -> /home/engshare/third-party2/glibc/2.34/src/glibc-2.34/signal/../sysdeps/unix/sysv/linux/x86_64/libc_sigaction.c
    @ 000000000009c993 __GI___pthread_kill
                       /home/engshare/third-party2/glibc/2.34/src/glibc-2.34/nptl/pthread_kill.c:46
    @ 00000000000444ac __GI_raise
                       /home/engshare/third-party2/glibc/2.34/src/glibc-2.34/signal/../sysdeps/posix/raise.c:26
    @ 000000000002c432 __GI_abort
                       /home/engshare/third-party2/glibc/2.34/src/glibc-2.34/stdlib/abort.c:79
    @ 000000000430a8f2 folly::LogCategory::admitMessage(folly::LogMessage const&) const
                       ./fbcode/folly/logging/LogCategory.cpp:71
    @ 000000000430978d folly::LogStreamProcessor::logNow()
                       ./fbcode/folly/logging/LogStreamProcessor.cpp:190
    @ 0000000004309b5c folly::LogStreamVoidify<true>::operator&(std::ostream&)
                       ./fbcode/folly/logging/LogStreamProcessor.cpp:222
    @ 0000000004aa4832 facebook::eden::MountdServerProcessor::registerMount(facebook::eden::detail::AbsolutePathBase<std::basic_string_view<char, std::char_traits<char> > >, facebook::eden::InodeNumber)
                       ./fbcode/eden/fs/nfs/Mountd.cpp:208
    @ 0000000004aa5a0c facebook::eden::Mountd::registerMount(facebook::eden::detail::AbsolutePathBase<std::basic_string_view<char, std::char_traits<char> > >, facebook::eden::InodeNumber)
                       ./fbcode/eden/fs/nfs/Mountd.cpp:255
    @ 0000000004aa1f6b facebook::eden::NfsServer::registerMount(facebook::eden::detail::AbsolutePathBase<std::basic_string_view<char, std::char_traits<char> > >, facebook::eden::InodeNumber, std::unique_ptr<facebook::eden::NfsDispatcher, std::default_delete<facebook::eden::NfsDispatcher> >, folly::Logger const*, std::shared_ptr<facebook::eden::ProcessInfoCache>, std::shared_ptr<facebook::eden::FsEventLogger>, std::shared_ptr<facebook::eden::StructuredLogger> const&, std::chrono::duration<long, std::ratio<1l, 1000l> >, std::shared_ptr<facebook::eden::Notifier>, facebook::eden::CaseSensitivity, unsigned int, unsigned long)
                       ./fbcode/eden/fs/nfs/NfsServer.cpp:102
    @ 000000000483ca1e _ZN5folly6detail8function5call_IZNS_7futures6detail4CoreINS_4UnitEE11setCallbackIZNS4_10FutureBaseIS6_E18thenImplementationIZNOS_6FutureIS6_E9thenValueIZNS_3viaIZN8facebook4eden12_GLOBAL__N_114makeNfsChannelEPNSH_9EdenMountESt8optionalINS_4FileEEE3$_0EENSC_INS_20isFutureOrSemiFutureIDTclscT_fp0_EEE5InnerEEENS_8Executor9KeepAliveISV_EEOSQ_EUlSY_E_EENSC_INS4_19valueCallableResultIS6_SQ_E10value_typeEEESY_EUlOSX_ONS_3TryIS6_EEE_NS4_25tryExecutorCallableResultIS6_S18_EEEENSt9enable_ifIXntsrNT0_13ReturnsFutureE5valueENSC_INS1C_10value_typeEEEE4typeESY_S1C_NS4_18InlineContinuationEEUlS14_S17_E_EEvSY_OSt10shared_ptrINS_14RequestContextEES1H_EUlRNS4_8CoreBaseES14_PNS_17exception_wrapperEE_Lb0ELb0EvJS1O_S14_S1Q_EEET2_DpT3_RNS1_4DataE
                       ./fbcode/eden/fs/inodes/EdenMount.cpp:2125
    @ 0000000002d50848 void folly::detail::function::call_<folly::Executor::KeepAlive<folly::Executor>::add<folly::futures::detail::CoreBase::doCallback(folly::Executor::KeepAlive<folly::Executor>&&, folly::futures::detail::State)::$_0>(folly::futures::detail::CoreBase::doCallback(folly::Executor::KeepAlive<folly::Executor>&&, folly::futures::detail::State)::$_0&&) &&::{lambda()#1}, true, false, void>(, folly::detail::function::Data&)
                       fbcode/folly/Function.h:370
    @ 0000000002d56df9 folly::EventBase::FuncRunner::operator()(folly::Function<void ()>&&)
                       fbcode/folly/Function.h:370
    @ 0000000002d56b57 bool folly::AtomicNotificationQueue<folly::Function<void ()> >::drive<folly::EventBase::FuncRunner&>(folly::EventBase::FuncRunner&)
                       fbcode/folly/io/async/AtomicNotificationQueue-inl.h:254
    @ 0000000002d5970f folly::EventBaseAtomicNotificationQueue<folly::Function<void ()>, folly::EventBase::FuncRunner>::execute()
                       fbcode/folly/io/async/EventBaseAtomicNotificationQueue-inl.h:269
    @ 0000000002d5b90c non-virtual thunk to folly::EventBaseAtomicNotificationQueue<folly::Function<void ()>, folly::EventBase::FuncRunner>::handlerReady(unsigned short)
                       fbcode/folly/io/async/EventBaseAtomicNotificationQueue-inl.h:280
    @ 0000000002d5fc11 folly::EventHandler::libeventCallback(int, short, void*)
                       ./fbcode/folly/io/async/EventHandler.cpp:159
    @ 0000000002d65947 event_process_active
                       /home/engshare/third-party2/libevent/1.4.14b_hphp/src/libevent-1.4.14b-stable/event.c:390
    @ 0000000002d65b67 event_base_loop
                       /home/engshare/third-party2/libevent/1.4.14b_hphp/src/libevent-1.4.14b-stable/event.c:532
    @ 0000000002d5880c folly::EventBase::loopMain(int, folly::EventBase::LoopOptions)
                       ./fbcode/folly/io/async/EventBase.cpp:0
    @ 0000000002d57d89 folly::EventBase::loopBody(int, folly::EventBase::LoopOptions)
                       ./fbcode/folly/io/async/EventBase.cpp:513
    @ 0000000002d59c05 folly::EventBase::loopForever()
                       ./fbcode/folly/io/async/EventBase.cpp:474
    @ 0000000004417e98 apache::thrift::ThriftServer::serve()
                       ./fbcode/thrift/lib/cpp2/server/ThriftServer.cpp:1465
    @ 000000000427f748 facebook::fb303::withThriftFunctionStats(char const*, facebook::fb303::BaseService*, folly::Function<void ()>&&)
                       fbcode/folly/Function.h:370
    @ 0000000002c6305f facebook::eden::EdenMain::runServer(facebook::eden::EdenServer const&)
                       ./fbcode/eden/fs/service/EdenMain.cpp:117
    @ 0000000002c672d4 facebook::eden::runEdenMain(facebook::eden::EdenMain&&, int, char**)
                       ./fbcode/eden/fs/service/EdenMain.cpp:454
    @ 0000000002c51351 main
                       ./fbcode/eden/fs/service/oss/main.cpp:15
    @ 000000000002c656 __libc_start_call_main
                       /home/engshare/third-party2/glibc/2.34/src/glibc-2.34/csu/../sysdeps/nptl/libc_start_call_main.h:58
                       -> /home/engshare/third-party2/glibc/2.34/src/glibc-2.34/csu/../sysdeps/x86/libc-start.c
    @ 000000000002c717 __libc_start_main_alias_2
                       /home/engshare/third-party2/glibc/2.34/src/glibc-2.34/csu/../csu/libc-start.c:409
                       -> /home/engshare/third-party2/glibc/2.34/src/glibc-2.34/csu/../sysdeps/x86/libc-start.c
    @ 0000000002c50fe0 _start
                       /home/engshare/third-party2/glibc/2.34/src/glibc-2.34/csu/../sysdeps/x86_64/start.S:116
```

Further investigation showed that this crash was occurring during mount time in the registerMount() method. This method is used when mounting an NFS mount to "register" the mount with the NFS mount daemon.

In particular, the crash was occurring due to the following XCHECK:

```
  auto map = mountPoints_.wlock();
  auto [iter, inserted] = map->emplace(path.copy(), ino);
  XCHECK_EQ(inserted, true);
```

This XCHECK ensures that the registered mount was properly inserted into the mount map that contains all registered mounts. However, this insertion fails if a mount at the given path already exists in the mount map (i.e. emplace() will return false, which causes the check to fail).

This shouldn't be able to happen, since mounts are removed from the mount map when they are unmounted; however, some circumstances will cause mounts to fail to get removed from the mount map.

Specifically, the following scenario causes this:

1) A mount request is made for a mount on startup
2) makeNFSChannel creates an NFS channel for the requested mount and completes successfully. This fully registers the mount with the mount daemon and inserts it into the mount map
3) A request to privhelper is made to assist in actually performing the privileged mount operation for the given mount.
4) The privhelper request fails, and the mount is never actually advanced past the initializing state
5) The code for removing the mount from the mount map never gets called
6) A separate request is made to remount the failed mount. The request causes the daemon to crash because the daemon attempts to add a duplicate key to the mount map.

# This diff

This diff fixes the bad behavior described above by adding some logic to the initialization state to ensure the failed mount is removed from the mount daemon's map. To do this, we introduce a tryUnregisterMount method that will not fail if the mount we attempt to remove is not in the map.

Reviewed By: genevievehelsel

Differential Revision: D63432632

fbshipit-source-id: 90fe04a35bf215f939be93faba6f4051f4b262e2