Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix race condition when the redis filter is destroyed. #11466

Merged
merged 14 commits into from
Jun 26, 2020

Conversation

HenryYYang
Copy link
Contributor

@HenryYYang HenryYYang commented Jun 5, 2020

Signed-off-by: Henry Yang hyang@lyft.com

For an explanation of how to fill out the fields, please see the relevant section
in PULL_REQUESTS.md

Commit Message: Fix race condition when the redis filter is destroyed.
Additional Description: This fix the bugs with the following stack traces.
Risk Level: Low
Testing:
Docs Changes: N/A
Release Notes: N/A

Backtrace:
#0 0x000055ff2c9cd3d6 in Envoy::Extensions::NetworkFilters::RedisProxy::ConnPool::InstanceImpl::ThreadLocalPool::ThreadLocalPool (this=0x55ff2ef432d8, parent=..., dispatcher=..., cluster_name=...) at external/envoy/source/extensions/filters/network/redis_proxy/conn_pool_impl.cc:74
#0 0x000055ff2c9cd3d6 in Envoy::Extensions::NetworkFilters::RedisProxy::ConnPool::InstanceImpl::ThreadLocalPool::ThreadLocalPool (this=0x55ff2ef432d8, parent=..., dispatcher=..., cluster_name=...) at external/envoy/source/extensions/filters/network/redis_proxy/conn_pool_impl.cc:74
#1 0x000055ff2c9d01ea in __compressed_pair_elem<Envoy::Extensions::NetworkFilters::RedisProxy::ConnPool::InstanceImpl&, Envoy::Event::Dispatcher&, std::__1::basic_string const&, 0, 1, 2> (this=0x55ff2ef432d8, __args=...) at /opt/llvm/bin/../include/c++/v1/memory:2155
#2 __compressed_pair<std::__1::allocatorEnvoy::Extensions::NetworkFilters::RedisProxy::ConnPool::InstanceImpl::ThreadLocalPool&, Envoy::Extensions::NetworkFilters::RedisProxy::ConnPool::InstanceImpl&, Envoy::Event::Dispatcher&, std::__1::basic_string const&> (this=0x55ff2ef432d8, __first_args=..., __second_args=..., __pc=...) at /opt/llvm/bin/../include/c++/v1/memory:2258
#3 std::__1::__shared_ptr_emplace<Envoy::Extensions::NetworkFilters::RedisProxy::ConnPool::InstanceImpl::ThreadLocalPool, std::__1::allocatorEnvoy::Extensions::NetworkFilters::RedisProxy::ConnPool::InstanceImpl::ThreadLocalPool >::__shared_ptr_emplace<Envoy::Extensions::NetworkFilters::RedisProxy::ConnPool::InstanceImpl&, Envoy::Event::Dispatcher&, std::__1::basic_string<char, std::__1::char_traits, std::__1::allocator > const&> (this=0x55ff2ef432c0, __a=..., __args=..., __args=..., __args=...) at /opt/llvm/bin/../include/c++/v1/memory:3543
#4 0x000055ff2c9d013a in make_shared<Envoy::Extensions::NetworkFilters::RedisProxy::ConnPool::InstanceImpl&, Envoy::Event::Dispatcher&, std::__1::basic_string const&> (__args=..., __args=..., __args=...) at /opt/llvm/bin/../include/c++/v1/memory:4202
#5 make_shared<Envoy::Extensions::NetworkFilters::RedisProxy::ConnPool::InstanceImpl::ThreadLocalPool, Envoy::Extensions::NetworkFilters::RedisProxy::ConnPool::InstanceImpl&, Envoy::Event::Dispatcher&, std::__1::basic_string const&> (__args=..., __args=..., __args=...) at /opt/llvm/bin/../include/c++/v1/memory:4581
#6 operator() (this=, dispatcher=...) at external/envoy/source/extensions/filters/network/redis_proxy/conn_pool_impl.cc:53
#7 __invoke<(lambda at external/envoy/source/extensions/filters/network/redis_proxy/conn_pool_impl.cc:51:13) &, Envoy::Event::Dispatcher &> (__f=..., __args=...) at /opt/llvm/bin/../include/c++/v1/type_traits:3530
#8 __call<(lambda at external/envoy/source/extensions/filters/network/redis_proxy/conn_pool_impl.cc:51:13) &, Envoy::Event::Dispatcher &> (__args=..., __args=...) at /opt/llvm/bin/../include/c++/v1/__functional_base:317
#9 operator() (this=, __arg=...) at /opt/llvm/bin/../include/c++/v1/functional:1533
#10 std::__1::__function::__func<Envoy::Extensions::NetworkFilters::RedisProxy::ConnPool::InstanceImpl::InstanceImpl(std::__1::basic_string<char, std::__1::char_traits, std::__1::allocator > const&, Envoy::Upstream::ClusterManager&, Envoy::Extensions::NetworkFilters::Common::Redis::Client::ClientFactory&, Envoy::ThreadLocal::SlotAllocator&, envoy::extensions::filters::network::redis_proxy::v3::RedisProxy_ConnPoolSettings const&, Envoy::Api::Api&, std::__1::unique_ptr<Envoy::Stats::Scope, std::__1::default_deleteEnvoy::Stats::Scope >&&, std::__1::shared_ptrEnvoy::Extensions::NetworkFilters::Common::Redis::RedisCommandStats const&, std::__1::shared_ptrEnvoy::Extensions::Common::Redis::ClusterRefreshManager)::$_0, std::__1::allocator<Envoy::Extensions::NetworkFilters::RedisProxy::ConnPool::InstanceImpl::InstanceImpl(std::__1::basic_string<char, std::__1::char_traits, std::__1::allocator > const&, Envoy::Upstream::ClusterManager&, Envoy::Extensions::NetworkF
ilters::Common::Redis::Client::ClientFactory&, Envoy::ThreadLocal::SlotAllocator&, envoy::extensions::filters::network::redis_proxy::v3::RedisProxy_ConnPoolSettings const&, Envoy::Api::Api&, std::__1::unique_ptr<Envoy::Stats::Scope, std::__1::default_deleteEnvoy::Stats::Scope >&&, std::__1::shared_ptrEnvoy::Extensions::NetworkFilters::Common::Redis::RedisCommandStats const&, std::__1::shared_ptrEnvoy::Extensions::Common::Redis::ClusterRefreshManager)::$_0>, std::__1::shared_ptrEnvoy::ThreadLocal::ThreadLocalObject (Envoy::Event::Dispatcher&)>::operator()(Envoy::Event::Dispatcher&) (this=, __arg=...) at /opt/llvm/bin/../include/c++/v1/functional:1707
#11 0x000055ff2ca8bd58 in operator() (this=, __args=...) at /opt/llvm/bin/../include/c++/v1/functional:1860
#12 operator() (this=, __arg=...) at /opt/llvm/bin/../include/c++/v1/functional:2419
#13 operator() (this=, dispatcher=...) at external/envoy/source/common/thread_local/thread_local_impl.cc:101
#14 __invoke<(lambda at external/envoy/source/common/thread_local/thread_local_impl.cc:100:14) &, Envoy::Event::Dispatcher &> (__f=..., __args=...) at /opt/llvm/bin/../include/c++/v1/type_traits:3530
#15 __call<(lambda at external/envoy/source/common/thread_local/thread_local_impl.cc:100:14) &, Envoy::Event::Dispatcher &> (__args=..., __args=...) at /opt/llvm/bin/../include/c++/v1/__functional_base:317
#16 operator() (this=, __arg=...) at /opt/llvm/bin/../include/c++/v1/functional:1533
#17 std::__1::__function::__func<Envoy::ThreadLocal::InstanceImpl::Bookkeeper::set(std::__1::function<std::__1::shared_ptrEnvoy::ThreadLocal::ThreadLocalObject (Envoy::Event::Dispatcher&)>)::$_7, std::__1::allocator<Envoy::ThreadLocal::InstanceImpl::Bookkeeper::set(std::__1::function<std::__1::shared_ptrEnvoy::ThreadLocal::ThreadLocalObject (Envoy::Event::Dispatcher&)>)::$_7>, std::__1::shared_ptrEnvoy::ThreadLocal::ThreadLocalObject (Envoy::Event::Dispatcher&)>::operator()(Envoy::Event::Dispatcher&) (this=, __arg=...) at /opt/llvm/bin/../include/c++/v1/functional:1707
#18 0x000055ff2ca8cf88 in operator() (this=, __args=...) at /opt/llvm/bin/../include/c++/v1/functional:1860
#19 operator() (this=, __arg=...) at /opt/llvm/bin/../include/c++/v1/functional:2419
#20 operator() (this=) at external/envoy/source/common/thread_local/thread_local_impl.cc:214
#21 __invoke<(lambda at external/envoy/source/common/thread_local/thread_local_impl.cc:214:21) &> (__f=...) at /opt/llvm/bin/../include/c++/v1/type_traits:3530
#22 __call<(lambda at external/envoy/source/common/thread_local/thread_local_impl.cc:214:21) &> (__args=...) at /opt/llvm/bin/../include/c++/v1/__functional_base:348
#23 operator() (this=) at /opt/llvm/bin/../include/c++/v1/functional:1533
#24 std::__1::__function::__func<Envoy::ThreadLocal::InstanceImpl::SlotImpl::set(std::__1::function<std::__1::shared_ptrEnvoy::ThreadLocal::ThreadLocalObject (Envoy::Event::Dispatcher&)>)::$_13, std::__1::allocator<Envoy::ThreadLocal::InstanceImpl::SlotImpl::set(std::__1::function<std::__1::shared_ptrEnvoy::ThreadLocal::ThreadLocalObject (Envoy::Event::Dispatcher&)>)::$_13>, void ()>::operator()() (this=) at /opt/llvm/bin/../include/c++/v1/functional:1707
#25 0x000055ff2cb0ef76 in operator() (this=0x7f3532c25ad0) at /opt/llvm/bin/../include/c++/v1/functional:1860
#26 operator() (this=0x7f3532c25ad0) at /opt/llvm/bin/../include/c++/v1/functional:2419
#27 Envoy::Event::DispatcherImpl::runPostCallbacks (this=0x55ff2e669080) at external/envoy/source/common/event/dispatcher_impl.cc:233
#28 0x000055ff2ce880a6 in event_process_active_single_queue (base=0x55ff2e2dcdc0, activeq=0x55ff2e4fd030, max_to_process=2147483647, endtime=0x0) at /root/.cache/bazel/_bazel_root/e500e327e6981ab742dbacbb6723d21d/sandbox/processwrapper-sandbox/586/execroot/envoy_lyft/external/com_github_libevent_libevent/event.c:1713
#29 0x000055ff2ce86c2e in event_process_active (base=0x55ff2e2dcdc0) at /root/.cache/bazel/_bazel_root/e500e327e6981ab742dbacbb6723d21d/sandbox/processwrapper-sandbox/586/execroot/envoy_lyft/external/com_github_libevent_libevent/event.c:1808
#30 event_base_loop (base=0x55ff2e2dcdc0, flags=) at /root/.cache/bazel/_bazel_root/e500e327e6981ab742dbacbb6723d21d/sandbox/processwrapper-sandbox/586/execroot/envoy_lyft/external/com_github_libevent_libevent/event.c:2047
#31 0x000055ff2cb035b1 in Envoy::Server::WorkerImpl::threadRoutine (this=0x55ff2e537bd0, guard_dog=...) at external/envoy/source/server/worker_impl.cc:119
#32 0x000055ff2cf397e3 in operator() (this=) at /opt/llvm/bin/../include/c++/v1/functional:1860
#33 operator() (this=) at /opt/llvm/bin/../include/c++/v1/functional:2419
#34 operator() (this=, arg=) at external/envoy/source/common/common/posix/thread_impl.cc:33
#35 Envoy::Thread::ThreadImplPosix::ThreadImplPosix(std::__1::function<void ()>)::$_0::__invoke(void*) (arg=) at external/envoy/source/common/common/posix/thread_impl.cc:32
#36 0x00007f353380a184 in start_thread (arg=0x7f3532c37700) at pthread_create.c:312
#37 0x00007f35335370ad in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

Signed-off-by: Henry Yang <hyang@lyft.com>
@mattklein123 mattklein123 self-assigned this Jun 5, 2020
Copy link
Member

@mattklein123 mattklein123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks this makes sense but needs more tests including the failure to lock cases, thank you!

/wait

Signed-off-by: Henry Yang <hyang@lyft.com>
@yuval-k
Copy link
Contributor

yuval-k commented Jun 8, 2020

for future reference, this is related to #9940

Signed-off-by: Henry Yang <hyang@lyft.com>
Signed-off-by: Henry Yang <hyang@lyft.com>
Signed-off-by: Henry Yang <hyang@lyft.com>
Copy link
Member

@mattklein123 mattklein123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks looks like this is on the right track. I don't think you need all the locks() that you have, so would prefer to simplify if possible. Thank you!

/wait

});
}

// this method is always called from a InstanceSharedPtr we don't have to worry about tls_->getTyped
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: s/this/This, same below

@@ -99,6 +118,11 @@ void InstanceImpl::ThreadLocalPool::onClusterAddOrUpdateNonVirtual(
if (cluster.info()->name() != cluster_name_) {
return;
}
// ensure the filter is still exists
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: start with capital, end with period, etc. Same elsewhere.

@@ -265,6 +296,12 @@ Common::Redis::Client::PoolRequest* InstanceImpl::ThreadLocalPool::makeRequestTo
return nullptr;
}

// ensure filter is not removed
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not convinced you need all of these locks everywhere. How can we be making a request with a removed filter? I'm pretty sure you only need it in the construction path for the original set call?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem is that when the connection pool instance is destroyed the TLS slot is not immediately destroyed. This leads to the code that cancel the pending requests and closing connections to be delayed as well. During this delay, a redirection from a pending request would cause a new request to be sent to the new host.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK. Please add more comments around this and also fix format.

@@ -379,7 +419,9 @@ void InstanceImpl::PendingRequest::onResponse(Common::Redis::RespValuePtr&& resp
void InstanceImpl::PendingRequest::onFailure() {
request_handler_ = nullptr;
pool_callbacks_.onFailure();
parent_.parent_.onFailure();
if (auto shared_conn_pool = parent_.parent_.lock()) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment here, if the filter goes away we should be cancelling things. I don't think these are all needed.

Co-authored-by: Matt Klein <mattklein123@gmail.com>
Signed-off-by: Henry Yang <hyang@lyft.com>

Signed-off-by: Henry Yang <4411287+HenryYYang@users.noreply.github.com>
Signed-off-by: Henry Yang <hyang@lyft.com>
Signed-off-by: Henry Yang <hyang@lyft.com>
Copy link
Member

@mattklein123 mattklein123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems reasonable with one nit, but per offline convo did you plan on reworking this?

/wait

Signed-off-by: Henry Yang <hyang@lyft.com>
Signed-off-by: Henry Yang <hyang@lyft.com>
# Conflicts:
#	source/extensions/filters/network/redis_proxy/conn_pool_impl.cc

Signed-off-by: Henry Yang <hyang@lyft.com>
@HenryYYang
Copy link
Contributor Author

This is already the reworked code. We're no longer doing a lock on each request, the lock is checked prior to construction of ThreadLocalPool and during the cluster update callback.

Signed-off-by: Henry Yang <hyang@lyft.com>
Signed-off-by: Henry Yang <hyang@lyft.com>
Copy link
Member

@mattklein123 mattklein123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/wait

Comment on lines +235 to +237
if (resolve_timer_) {
resolve_timer_->disableTimer();
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this needed? If so comment?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll add a comment. Sorry I missed this one.

Signed-off-by: Henry Yang <hyang@lyft.com>
Copy link
Member

@mattklein123 mattklein123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@mattklein123 mattklein123 merged commit 2705235 into envoyproxy:master Jun 26, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants