
fix edenfs oss build #6

Closed
wants to merge 1 commit

Conversation

@ahornby (Owner) commented Jul 11, 2023

PR for signal

@ahornby force-pushed the fix_eden_oss branch 7 times, most recently from 4e31bd0 to eab737c on July 16, 2023 14:58
@ahornby force-pushed the fix_eden_oss branch 13 times, most recently from 9ad0003 to 716b58b on July 29, 2023 19:33
@ahornby force-pushed the fix_eden_oss branch 8 times, most recently from 82d3d52 to 67c4274 on August 18, 2023 10:24
ahornby pushed a commit that referenced this pull request Sep 3, 2023
Summary:
Creating an initial version of the gitexport CLI (for more context on why we need it, see T160586594).

This tool takes a repo and a list of paths as input, and it should export the entire history of those paths to a git repo.

## What does it do now?
Currently, this binary doesn't do anything useful. It just gets the history of a single path to be exported and prints the changeset ids and commit messages (for manual debugging).

The main point of this diff is to **set most of the structure/flow of the tool to get some early feedback** before I start implementing anything more complex.

Most of the functions don't have a real implementation yet, but just do something simple (e.g. return the first element of a vector) so that everything typechecks.

## What's my current plan?
1) Get the history of all the given paths. (This is mostly done in this diff already)
2) Merge the changesets into a single, topologically sorted list of changesets. (A sketch of this merge follows the list.)
3) Strip irrelevant changes from every commit (T161205476).
4) Create a CommitGraph from this list (T161204758).
5) Export that CommitGraph to a new, temporary, Mononoke repo (T160787114).
6) Use existing tools to export that temporary repo to a git repo (T160787114).
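
For step 2, here is a minimal sketch of one way the merge could work: a plain Kahn-style topological sort over the union of the per-path changesets, using their parent links. This is illustrative only (the tool itself is Rust; this is C++ with made-up names, not code from this diff):

```
#include <map>
#include <queue>
#include <string>
#include <vector>

// commit id -> parent commit ids, restricted to the commits we collected
using Parents = std::map<std::string, std::vector<std::string>>;

// Emit a commit once all of its parents in our set have been emitted.
std::vector<std::string> topoMerge(const Parents& parents) {
  std::map<std::string, int> pending; // commit -> unemitted parent count
  std::map<std::string, std::vector<std::string>> children;
  for (const auto& [c, ps] : parents) {
    pending.try_emplace(c, 0);
    for (const auto& p : ps) {
      if (parents.count(p)) { // ignore parents outside our set
        ++pending[c];
        children[p].push_back(c);
      }
    }
  }
  std::queue<std::string> ready;
  for (const auto& [c, n] : pending)
    if (n == 0) ready.push(c);
  std::vector<std::string> order;
  while (!ready.empty()) {
    auto c = ready.front();
    ready.pop();
    order.push_back(c);
    for (const auto& child : children[c])
      if (--pending[child] == 0) ready.push(child);
  }
  return order; // topologically sorted, parents first
}
```

This handles branching for free, since a merge commit simply waits until all of its in-set parents have been emitted.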

The tricky bits are steps 2, 3, and 4, which is where I expect to spend most of my time.

First, I'm not sure if I even need to create a CommitGraph at all to be able to export the processed changesets to a new repo. If I do need to, I'm not sure (a) whether I should strip the irrelevant file changes before or after creating the graph and (b) how to create a new repo and populate it with the commits from the graph I created. (b) is more of an implementation detail, so I won't worry about it for now...

The main unknowns for me are steps 2 and 4. Basically: how can I create a proper commit graph from a set of commits that are not direct descendants of each other? Assuming a linear history, I don't think it would be very complicated, but we also have to support branching, so I'm not sure how to do this efficiently...

## Examples
Let me put a simple example below. Commits with uppercase letters are relevant (i.e. should be exported) and commits with lowercase letters should not be.

```
A -> b -> C -> D -> e
|-> f -> G
```

In this case, I want to have the following commit graph in the end:
```
A' -> C' -> D'
|-> G'
```
where X' is X stripped of irrelevant changes.
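
To make the stripping concrete, here is a minimal sketch of one way to do the parent remapping so the example above comes out right. Again, illustrative only (C++ with made-up names, not code from this diff): each relevant commit keeps, as parents, its nearest relevant ancestors, and irrelevant commits disappear.

```
#include <functional>
#include <map>
#include <set>
#include <string>
#include <vector>

// commit id -> parent commit ids
using Graph = std::map<std::string, std::vector<std::string>>;

// Keep only relevant commits; each one's new parents are its nearest
// relevant ancestors, found by skipping over irrelevant commits.
Graph stripIrrelevant(const Graph& g, const std::set<std::string>& relevant) {
  std::function<std::vector<std::string>(const std::string&)> nearest =
      [&](const std::string& c) {
        std::vector<std::string> out;
        for (const auto& p : g.at(c)) {
          if (relevant.count(p)) {
            out.push_back(p);
          } else {
            for (const auto& a : nearest(p)) out.push_back(a);
          }
        }
        return out;
      };
  Graph result;
  for (const auto& [commit, unused] : g) {
    (void)unused;
    if (relevant.count(commit)) result[commit] = nearest(commit);
  }
  return result;
}
```

With parents {A: [], b: [A], C: [b], D: [C], e: [D], f: [A], G: [f]} and relevant = {A, C, D, G}, this returns {A: [], C: [A], D: [C], G: [A]}: the A' -> C' -> D' chain plus the A' -> G' branch.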

## RFC
- This is my first Rust diff ever, so please let me know what horrible things I'm doing, because I'm very likely doing a few 😂
- Does the plan I described above make sense?
- Any suggestions/ideas on how to efficiently stitch the changesets together would be appreciated! But I'll probably set up some time to discuss this problem specifically once I've spent more time thinking about it...

## Next steps
- Implement steps 5 and 6 (T160787114) to get the entire E2E solution working with the simplest case (i.e. one path with linear history). This is basically exporting the commit graph to a git repo (maybe through a temporary Mononoke repo).
  - Update integration test case to actually run and test the tool with the simple case.
- Figure out how to properly create a commit graph from the per-path lists of changesets.
- Add test cases for multiple paths and edge cases, like having multiple branches.

Reviewed By: RajivTS

Differential Revision: D48226070

fbshipit-source-id: eed970a8e4697ab10682e3b93863e6d621adaacc
fix edenfs oss build

Summary:
Fix missing header errors caused by edencommon not being in the target link libraries.

Test Plan:

Run local cmake build with: `./build/fbcode_builder/getdeps.py --allow-system-packages build eden`

Before, broken:
```
[12/387] Building CXX object eden/fs/nfs/CMakeFiles/eden_nfs_utils.dir/NfsUtils.cpp.o
FAILED: eden/fs/nfs/CMakeFiles/eden_nfs_utils.dir/NfsUtils.cpp.o
/usr/bin/c++ -DBOOST_ATOMIC_NO_LIB -DBOOST_CONTEXT_NO_LIB -DBOOST_FILESYSTEM_NO_LIB -DBOOST_PROGRAM_OPTIONS_NO_LIB -DBOOST_REGEX_NO_LIB -DBOOST_SYSTEM_NO_LIB -DBOOST_THREAD_NO_LIB -DFOLLY_XLOG_STRIP_PREFIXES=\"/home/alex/local/sapling:/home/alex/local/tmp/fridge/fbcode_builder_getdeps-ZhomeZalexZlocalZsaplingZbuildZfbcode_builder/build/eden\" -DGFLAGS_IS_A_DLL=0 -I/home/alex/local/tmp/fridge/fbcode_builder_getdeps-ZhomeZalexZlocalZsaplingZbuildZfbcode_builder/build/eden -I/home/alex/local/sapling -I/home/alex/local/tmp/fridge/fbcode_builder_getdeps-ZhomeZalexZlocalZsaplingZbuildZfbcode_builder/installed/fb303/include -I/home/alex/local/tmp/fridge/fbcode_builder_getdeps-ZhomeZalexZlocalZsaplingZbuildZfbcode_builder/installed/fizz/include -I/home/alex/local/tmp/fridge/fbcode_builder_getdeps-ZhomeZalexZlocalZsaplingZbuildZfbcode_builder/installed/wangle/include -I/home/alex/local/tmp/fridge/fbcode_builder_getdeps-ZhomeZalexZlocalZsaplingZbuildZfbcode_builder/installed/fbthrift/include -I/home/alex/local/tmp/fridge/fbcode_builder_getdeps-ZhomeZalexZlocalZsaplingZbuildZfbcode_builder/installed/googletest-VdBlGUPMPJitek6Vixf2wJ0d-fkcsAem4x561Ipyl_c/include -I/home/alex/local/tmp/fridge/fbcode_builder_getdeps-ZhomeZalexZlocalZsaplingZbuildZfbcode_builder/installed/lmdb-EeqttKQ21Ssy0Z9v9Td5arfjclo9sZwEycvDeQZ707Q/include -isystem /home/alex/local/tmp/fridge/fbcode_builder_getdeps-ZhomeZalexZlocalZsaplingZbuildZfbcode_builder/installed/gflags-4vXpjWHz7iQ-flj45DrshPtdnV00lhHzdluegsKfQv8/include -isystem /home/alex/local/tmp/fridge/fbcode_builder_getdeps-ZhomeZalexZlocalZsaplingZbuildZfbcode_builder/installed/glog-rc1q5fRnLVjdS68JV9erzuj0hzjLqATGYHhZAKGSWHo/include -isystem /home/alex/local/tmp/fridge/fbcode_builder_getdeps-ZhomeZalexZlocalZsaplingZbuildZfbcode_builder/installed/folly/include -isystem /home/alex/local/tmp/fridge/fbcode_builder_getdeps-ZhomeZalexZlocalZsaplingZbuildZfbcode_builder/installed/boost-v767l9QNa5ozbwDxCGIOiS4txj6QsxfvDdR4F7KRbYY/include -isystem /home/alex/local/tmp/fridge/fbcode_builder_getdeps-ZhomeZalexZlocalZsaplingZbuildZfbcode_builder/installed/zlib-I9YVwQvp3FsdWhYv3JD35W5331pwpP-NLFKrJNd9ubU/include -isystem /usr/include/libdwarf-0 -isystem /home/alex/local/tmp/fridge/fbcode_builder_getdeps-ZhomeZalexZlocalZsaplingZbuildZfbcode_builder/installed/fmt-XC7kuS1-TdG_KzAiD4Q16Eb-s6mNRaeEz1OrK4d5f7c/include -g -Wall -Wextra -Wno-deprecated -Wno-deprecated-declarations -Wno-unknown-pragmas -O2 -g -DNDEBUG -std=gnu++17 -MD -MT eden/fs/nfs/CMakeFiles/eden_nfs_utils.dir/NfsUtils.cpp.o -MF eden/fs/nfs/CMakeFiles/eden_nfs_utils.dir/NfsUtils.cpp.o.d -o eden/fs/nfs/CMakeFiles/eden_nfs_utils.dir/NfsUtils.cpp.o -c /home/alex/local/sapling/eden/fs/nfs/NfsUtils.cpp
In file included from /home/alex/local/sapling/eden/fs/nfs/NfsUtils.cpp:8:
/home/alex/local/sapling/eden/fs/nfs/NfsUtils.h:13:10: fatal error: eden/common/utils/DirType.h: No such file or directory
   13 | #include "eden/common/utils/DirType.h"
      |          ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
compilation terminated.
[13/387] Building CXX object eden/fs/nfs/test/CMakeFiles/eden_nfs_test.dir/AccessTest.cpp.o
FAILED: eden/fs/nfs/test/CMakeFiles/eden_nfs_test.dir/AccessTest.cpp.o
/usr/bin/c++ -DBOOST_ATOMIC_NO_LIB -DBOOST_CONTEXT_NO_LIB -DBOOST_FILESYSTEM_NO_LIB -DBOOST_PROGRAM_OPTIONS_NO_LIB -DBOOST_REGEX_NO_LIB -DBOOST_SYSTEM_NO_LIB -DBOOST_THREAD_NO_LIB -DFOLLY_XLOG_STRIP_PREFIXES=\"/home/alex/local/sapling:/home/alex/local/tmp/fridge/fbcode_builder_getdeps-ZhomeZalexZlocalZsaplingZbuildZfbcode_builder/build/eden\" -DGFLAGS_IS_A_DLL=0 -I/home/alex/local/tmp/fridge/fbcode_builder_getdeps-ZhomeZalexZlocalZsaplingZbuildZfbcode_builder/build/eden -I/home/alex/local/sapling -I/home/alex/local/tmp/fridge/fbcode_builder_getdeps-ZhomeZalexZlocalZsaplingZbuildZfbcode_builder/installed/fb303/include -I/home/alex/local/tmp/fridge/fbcode_builder_getdeps-ZhomeZalexZlocalZsaplingZbuildZfbcode_builder/installed/fizz/include -I/home/alex/local/tmp/fridge/fbcode_builder_getdeps-ZhomeZalexZlocalZsaplingZbuildZfbcode_builder/installed/wangle/include -I/home/alex/local/tmp/fridge/fbcode_builder_getdeps-ZhomeZalexZlocalZsaplingZbuildZfbcode_builder/installed/fbthrift/include -I/home/alex/local/tmp/fridge/fbcode_builder_getdeps-ZhomeZalexZlocalZsaplingZbuildZfbcode_builder/installed/lmdb-EeqttKQ21Ssy0Z9v9Td5arfjclo9sZwEycvDeQZ707Q/include -isystem /home/alex/local/tmp/fridge/fbcode_builder_getdeps-ZhomeZalexZlocalZsaplingZbuildZfbcode_builder/installed/gflags-4vXpjWHz7iQ-flj45DrshPtdnV00lhHzdluegsKfQv8/include -isystem /home/alex/local/tmp/fridge/fbcode_builder_getdeps-ZhomeZalexZlocalZsaplingZbuildZfbcode_builder/installed/glog-rc1q5fRnLVjdS68JV9erzuj0hzjLqATGYHhZAKGSWHo/include -isystem /home/alex/local/tmp/fridge/fbcode_builder_getdeps-ZhomeZalexZlocalZsaplingZbuildZfbcode_builder/installed/folly/include -isystem /home/alex/local/tmp/fridge/fbcode_builder_getdeps-ZhomeZalexZlocalZsaplingZbuildZfbcode_builder/installed/googletest-VdBlGUPMPJitek6Vixf2wJ0d-fkcsAem4x561Ipyl_c/include -isystem /home/alex/local/tmp/fridge/fbcode_builder_getdeps-ZhomeZalexZlocalZsaplingZbuildZfbcode_builder/installed/boost-v767l9QNa5ozbwDxCGIOiS4txj6QsxfvDdR4F7KRbYY/include -isystem /home/alex/local/tmp/fridge/fbcode_builder_getdeps-ZhomeZalexZlocalZsaplingZbuildZfbcode_builder/installed/zlib-I9YVwQvp3FsdWhYv3JD35W5331pwpP-NLFKrJNd9ubU/include -isystem /usr/include/libdwarf-0 -isystem /home/alex/local/tmp/fridge/fbcode_builder_getdeps-ZhomeZalexZlocalZsaplingZbuildZfbcode_builder/installed/fmt-XC7kuS1-TdG_KzAiD4Q16Eb-s6mNRaeEz1OrK4d5f7c/include -g -Wall -Wextra -Wno-deprecated -Wno-deprecated-declarations -Wno-unknown-pragmas -O2 -g -DNDEBUG -std=gnu++17 -MD -MT eden/fs/nfs/test/CMakeFiles/eden_nfs_test.dir/AccessTest.cpp.o -MF eden/fs/nfs/test/CMakeFiles/eden_nfs_test.dir/AccessTest.cpp.o.d -o eden/fs/nfs/test/CMakeFiles/eden_nfs_test.dir/AccessTest.cpp.o -c /home/alex/local/sapling/eden/fs/nfs/test/AccessTest.cpp
In file included from /home/alex/local/sapling/eden/fs/nfs/test/AccessTest.cpp:14:
/home/alex/local/sapling/eden/fs/nfs/NfsUtils.h:13:10: fatal error: eden/common/utils/DirType.h: No such file or directory
   13 | #include "eden/common/utils/DirType.h"
      |          ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~

```

After: the build succeeds.
ahornby pushed a commit that referenced this pull request Jun 8, 2024
Summary:
We have discovered a deadlock in EdenFS in S412223. The deadlock appears when
all FsChannelThreads have a stack trace that looks like:

```
  thread #225, name = 'FsChannelThread', stop reason = signal SIGSTOP
    frame #0: 0x00007f7ca85257b9
    frame #1: 0x0000000007d73aa4 edenfs`bool folly::DynamicBoundedQueue<folly::CPUThreadPoolExecutor::CPUTask, false, false, true, 8ul, 7ul, folly::DefaultWeightFn<folly::CPUThreadPoolExecutor::CPUTask>, std::atomic>::canEnqueue<std::chrono::_V2::steady_clock, std::chrono::duration<long, std::ratio<1l, 1000000000l>>>(std::chrono::time_point<std::chrono::_V2::steady_clock, std::chrono::duration<long, std::ratio<1l, 1000000000l>>> const&, unsigned long) + 212
    frame #2: 0x0000000007d73557 edenfs`bool folly::DynamicBoundedQueue<folly::CPUThreadPoolExecutor::CPUTask, false, false, true, 8ul, 7ul, folly::DefaultWeightFn<folly::CPUThreadPoolExecutor::CPUTask>, std::atomic>::tryEnqueueUntilSlow<std::chrono::_V2::steady_clock, std::chrono::duration<long, std::ratio<1l, 1000000000l>>, folly::CPUThreadPoolExecutor::CPUTask>(folly::CPUThreadPoolExecutor::CPUTask&&, std::chrono::time_point<std::chrono::_V2::steady_clock, std::chrono::duration<long, std::ratio<1l, 1000000000l>>> const&) + 39
    frame #3: 0x0000000007d722c0 edenfs`facebook::eden::EdenTaskQueue::add(folly::CPUThreadPoolExecutor::CPUTask) + 576
    frame #4: 0x0000000005172cbd edenfs`folly::CPUThreadPoolExecutor::add(folly::Function<void ()>) + 557
    frame #5: 0x000000000503c4d8 edenfs`void folly::Executor::KeepAlive<folly::Executor>::add<folly::Function<void (folly::Executor::KeepAlive<folly::Executor>&&)>>(folly::Function<void (folly::Executor::KeepAlive<folly::Executor>&&)>&&) && + 216
    frame #6: 0x000000000503c257 edenfs`folly::futures::detail::DeferredExecutor::addFrom(folly::Executor::KeepAlive<folly::Executor>&&, folly::Function<void (folly::Executor::KeepAlive<folly::Executor>&&)>) + 135
    frame #7: 0x000000000503baf5 edenfs`folly::futures::detail::CoreBase::doCallback(folly::Executor::KeepAlive<folly::Executor>&&, folly::futures::detail::State) + 565
    frame #8: 0x00000000052e5e0d edenfs`folly::futures::detail::CoreBase::proxyCallback(folly::futures::detail::State) + 493
    frame #9: 0x00000000052e5bb2 edenfs`folly::futures::detail::CoreBase::setProxy_(folly::futures::detail::CoreBase*) + 66
    ...
```

This stack means that an FsChannelThread is blocked waiting to enqueue a task to
the EdenTaskQueue. The EdenTaskQueue is the task queue that FsChannelThreads
pull work from. So the deadlock is that all FsChannelThreads are blocked trying
to add to a full queue that can only be drained by the FsChannelThreads themselves.
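
To see why this is a hard deadlock rather than just slowness, here is a minimal, self-contained sketch of the shape (plain C++, not EdenFS code; the real queue is folly's DynamicBoundedQueue): a blocking bounded queue whose only consumers are also its producers.

```
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <functional>
#include <mutex>

// A blocking bounded queue, like the DynamicBoundedQueue enqueue in
// frame #1 of the stack above.
class BoundedQueue {
 public:
  explicit BoundedQueue(std::size_t cap) : cap_(cap) {}

  void add(std::function<void()> task) {
    std::unique_lock lk(m_);
    notFull_.wait(lk, [&] { return q_.size() < cap_; }); // blocks while full
    q_.push_back(std::move(task));
    notEmpty_.notify_one();
  }

  std::function<void()> take() { // only the worker threads call this
    std::unique_lock lk(m_);
    notEmpty_.wait(lk, [&] { return !q_.empty(); });
    auto t = std::move(q_.front());
    q_.pop_front();
    notFull_.notify_one();
    return t;
  }

 private:
  std::size_t cap_;
  std::deque<std::function<void()>> q_;
  std::mutex m_;
  std::condition_variable notFull_, notEmpty_;
};

// Deadlock: once the queue is full and every worker thread is parked inside
// add() (a callback enqueueing to its own executor), no thread is left to
// call take(), so notFull_ is never signalled again.
```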

There are two contributing reasons for this happening:
1. The FsChannelThreads will queue a bunch of network fetches into the Sapling
queue. When those all finish, the future callback chain runs on the
FsChannelThreads. So the backing store accumulates a bunch of work, and when the
fetches complete it dumps a bunch of tasks onto the FsChannelThread queue; the
backing store is filling up the FsChannelThread task queue. Other threads could
theoretically do this too, but the backing store is the main one I have seen.

2. The FsChannelThreads might enqueue to their own queue. Folly futures have
some smarts to try to prevent a task running on executor A from enqueueing work
onto executor A (and instead just run the callback inline), see: https://www.internalfb.com/code/fbsource/[c7c20340562d2eab5f5d2f7f45805546687942d9]/fbcode/folly/futures/detail/Core.cpp?lines=147-148.
However, that does not prevent a future that is unknowingly running on an
executor's thread from enqueueing to that thread's executor queue. I believe that
kind of thing happens when we do stuff like this: https://www.internalfb.com/code/fbsource/[824f6dc95f161e141bf9b821a7826c40b570ddc3]/fbcode/eden/fs/inodes/TreeInode.cpp?lines=375-376
The outer lambda is aware of which executor it's running on, but the future we
start inside the lambda is not, so when we add that thenValue, it doesn't realize
it's enqueueing to its own executor. I wrote up a toy program to show when folly
will enqueue vs. run inline: https://www.internalfb.com/intern/commit/cloud/FBS/bce3a906f53913ab8dc74944b8b50d09d78baf9a
(script: P1222930602, output: P1222931093). It shows that if you return a future
from a future callback, the next callback is enqueued; and if you have a callback
on a future returned by another future's callback, the callback on the returned
future is enqueued.
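
Since that toy program is internal-only, here is a rough sketch of the same experiment, assuming folly's standard via/thenValue API (hypothetical and untested, not the actual script):

```
#include <folly/executors/CPUThreadPoolExecutor.h>
#include <folly/futures/Future.h>

int main() {
  folly::CPUThreadPoolExecutor pool(1);

  folly::via(&pool)
      .thenValue([&](folly::Unit) {
        // Returning a future from a callback: the continuation attached to
        // this inner future doesn't know it is already running on `pool`,
        // so it gets enqueued back onto pool's queue instead of inlined.
        return folly::via(&pool).thenValue([](folly::Unit) { return 42; });
      })
      .thenValue([](int v) {
        // This callback hangs off a future returned by another future's
        // callback, so it is also enqueued rather than run inline.
        return v;
      })
      .get();
}
```

With a bounded queue (like the EdenTaskQueue) and every pool thread stuck in that enqueue, this is exactly the deadlock above.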

So in summary: the backing store fills up the FsChannelThread work queue, then
an FsChannelThread tries to push work onto its own queue and deadlocks.

Potential solutions:
**1 - Queued immediate executor.**

This behavior was likely introduced in D51302394, where we moved from the
queued immediate executor to the FsChannelThreads. The queued immediate executor
uses an unbounded queue (see https://www.internalfb.com/code/fbsource/[c7c20340562d2eab5f5d2f7f45805546687942d9]/fbcode/folly/executors/QueuedImmediateExecutor.h?lines=39),
which is why we didn't have this problem before.

I don't love going back to this solution because we are moving away from the
queued immediate executor precisely because it makes it easy to cause deadlocks,
for example the one introduced in D50199539.

**2 - Make the queue error instead of block when full.**

We used to do that for the Eden CPU thread pool, and it resulted in a lot of
errors from eden that caused issues for clients (see D6490979). I think we are
kicking the can down the road if we error rather than block.

**3 - Don't bother switching to the FsChannelThreads to complete the FUSE request.**

This is likely going to be the same as 1, unless we are going to undo the
semifuture-ization we have been doing. Or perhaps we could start the FUSE
request on the FsChannelThreads, then finish it on the Eden CPU threads, which
is pretty much the same thing as this diff except with more sets of threads
involved. So I prefer this change.

**4 - Add backpressure somewhere else.**

If we prevent the backing store and other threads from being able to fill up
the FsChannelThread queue, then it should be impossible for the queue to fill
up, because there would be no fan-out (one task out, then one task in).

However, this is fragile: we could easily introduce fan-out again and end up
back here. It would also mean we basically block all requests in the FUSE layer
instead of the lower layers of eden, and we would need to redo the queueing in
the backing store layer.

The fragility and complexity make me dislike this solution.

**5 - Linearize all future chains.**

The reason the FsChannelThreads are enqueueing to their own queue is that we
have nested futures. If we didn't nest, then folly futures would run the
callbacks inline. So if we de-nest all our futures, this issue should
theoretically go away, because the FsChannelThreads will not enqueue to their
own queue. However, I don't know if we can completely get rid of returning a
future from inside a future callback.

I don't love this solution, as I know there are some explicit places where we
choose to nest (I'm looking at you, PrjFSDispatcher), so it would be VERY
confusing to know where I am supposed to nest and where I am not. It would be
easy to do the wrong thing and re-introduce this bug. Also, this is a ton of
places we would need to change, and they are not easy to find. So I don't like
this option: it's not very "pit of success", too fragile and too easy to get
wrong the first time.

**6 - Unbound the FsChannelThread queue.**

It's not great, but there is precedent for this: we unbounded the Eden CPU
thread pool for basically the same reason (see D6513572).

The risk here is that we are opening ourselves up to OOMs: the queue might grow
extremely large and get edenfs OOM-killed. We should probably add a config to
this change so we can roll it out carefully and watch for OOMs as we go.
Additionally, long term we likely want to rethink how we do threading to
architect eden away from this.

I prefer this solution the most, and it's what I have implemented here.
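
For reference, a sketch of roughly what option 6 looks like, assuming folly's stock executor building blocks (names are illustrative, not the actual EdenFS patch):

```
#include <folly/executors/CPUThreadPoolExecutor.h>
#include <folly/executors/task_queue/UnboundedBlockingQueue.h>
#include <folly/executors/thread_factory/NamedThreadFactory.h>

#include <memory>

// Back the pool with an unbounded queue: a thread enqueueing to its own
// pool can never block, trading the deadlock for possible memory growth.
std::unique_ptr<folly::CPUThreadPoolExecutor> makeFsChannelPool(size_t n) {
  return std::make_unique<folly::CPUThreadPoolExecutor>(
      n,
      std::make_unique<folly::UnboundedBlockingQueue<
          folly::CPUThreadPoolExecutor::CPUTask>>(),
      std::make_shared<folly::NamedThreadFactory>("FsChannelThread"));
}
```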

---------------------------
Note: I am removing the limit on the number of outstanding fs requests we
process at once in this diff. That config was not exactly working how we wanted
anyway: queueing in the backing store lets us handle essentially unlimited
requests at once, since the Sapling request queue does not have a max size. I
can follow up with a semaphore in the fuse/nfs layers to rate limit the number
of active requests, though FWIW I will likely bump the limit a lot, at least
initially, since we were realistically allowing clients to issue unlimited
requests previously.
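
That follow-up rate limit could be as simple as a counting semaphore at the dispatch boundary. A hypothetical sketch (C++20, with a placeholder limit; not part of this diff):

```
#include <semaphore>

// Cap concurrently active FUSE/NFS requests. 65536 is a placeholder for a
// deliberately generous, configurable limit.
std::counting_semaphore<1 << 20> activeFsRequests{65536};

template <typename F>
auto withFsRequestSlot(F&& handler) {
  activeFsRequests.acquire(); // back-pressure once the cap is reached
  auto result = handler();
  activeFsRequests.release();
  return result;
}
```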

Reviewed By: jdelliot, genevievehelsel

Differential Revision: D56553375

fbshipit-source-id: 9c6c8a76bd7c93b00d48654cd5fc31d1a68dc0b2
@ahornby ahornby closed this Sep 19, 2024
@ahornby ahornby deleted the fix_eden_oss branch September 22, 2024 22:49